Graphics processing units (GPUs) are key to both generative and predictive AI. Data scientists, machine learning engineers, and AI engineers rely on GPUs to experiment with AI models, and to train, tune, and deploy them. Managing these essential resources can be complex, however, often stalling development and innovation.

Infrastructure limitations shouldn't hold your organization back. Your team needs to focus on building, refining, and using AI models, not managing complex GPU infrastructure. This is why information technology operations (ITOps) plays a crucial role in enabling rapid AI development and inference by providing on-demand GPU access, also known as GPU-as-a-Service.

The GPU challenge: A multifaceted problem for ITOps

Setting up an efficient GPU infrastructure for AI workloads is not trivial, and ITOps teams face several significant challenges:

  • GPU scarcity and cost constraints: GPUs can be difficult to access due to limited supply, cloud capacity constraints, and internal competition. This is often compounded by a lack of customer choice and control over the underlying accelerator architecture. GPUs also carry high acquisition and operational costs, yet are frequently underused.
  • Lack of GPU access drives shadow IT: If data scientists, ML engineers, and AI engineers cannot readily access GPUs when they need them, they may turn to "shadow IT." This can mean using third-party services, potentially exposing sensitive company data, or independently procuring GPU resources from various cloud providers, leading to increased costs and security risks. This results in a loss of control over resource usage, data security, and compliance.
  • Fragmented GPU infrastructure: GPU resources are often scattered across on-premise data centers, multiple public clouds, and even edge locations. This heterogeneous environment, spanning varying accelerator types and architectures, makes management complex, hinders efficient resource allocation, and drives up costs.
  • The GPU utilization black box: Organizations often face difficulty tracking GPU usage across the enterprise, making it hard to maximize return on investment (ROI) and identify underused resources. In a multi-tenancy situation, it becomes challenging to establish fair usage policies, accurately allocate resources, and attribute costs.
  • Achieving secure GPU multi-tenancy: Strengthening data security in a multi-tenant environment is complex. It involves isolating tenants' network traffic to prevent unauthorized access and data leakage, protecting sensitive data from theft, and adhering to regulatory requirements while maintaining evidence of compliance.

Red Hat's solution: Solving the GPU puzzle with GPU-as-a-Service

Red Hat provides a complete strategy for addressing these challenges. Our approach focuses on consolidating and simplifying the underlying GPU infrastructure. By pooling accelerators—including different GPU types, sizes, and locations—from on-premise, cloud, and edge environments, organizations can simplify GPU management and orchestration through a single, unified platform.

The Red Hat AI platform optimizes performance and efficiency by intelligently matching workloads to the most suitable GPU resources, maximizing utilization through efficient scheduling and placement. To help organizations maintain visibility and control, it also provides real-time GPU monitoring that surfaces bottlenecks and informs resource allocation. Ongoing enhancements will track consumption and usage patterns to support cost optimization.
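
As an illustration of what programmatic GPU monitoring can look like, the sketch below queries cluster-wide GPU utilization from a Prometheus endpoint, assuming NVIDIA's DCGM exporter is publishing metrics; the URL and token are hypothetical placeholders:

```python
import requests

# Hypothetical Prometheus endpoint and bearer token; substitute your cluster's values.
PROM_URL = "https://prometheus.example.com"
TOKEN = "example-token"

# DCGM_FI_DEV_GPU_UTIL is the per-GPU utilization metric exposed by
# NVIDIA's DCGM exporter; average it across every GPU in the cluster.
resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": "avg(DCGM_FI_DEV_GPU_UTIL)"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()

result = resp.json()["data"]["result"]
if result:
    print(f"Cluster-wide average GPU utilization: {float(result[0]['value'][1]):.1f}%")
else:
    print("No GPU utilization samples returned.")
```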

This system also enables secure and efficient GPU multi-tenancy: it isolates tenant environments with robust network security and data isolation, and implements granular access controls and resource quotas for each tenant, simplifying compliance and maintaining audit trails for security and governance.
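
As a minimal sketch of one such guardrail, the snippet below caps a tenant namespace at four GPUs using a standard Kubernetes ResourceQuota, assuming the NVIDIA device plugin exposes GPUs as the nvidia.com/gpu extended resource (the namespace name is hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

# Cap the hypothetical "team-a" tenant namespace at 4 GPUs in total.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="gpu-quota", namespace="team-a"),
    spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "4"}),
)
v1.create_namespaced_resource_quota(namespace="team-a", body=quota)
```

Pods in that namespace whose combined GPU requests would exceed the quota are rejected at admission time, keeping per-tenant consumption predictable.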

Key components for delivering GPU-as-a-Service

Red Hat uses powerful open source technologies to deliver its GPU-as-a-Service offering, primarily within Red Hat OpenShift and Red Hat OpenShift AI.

Kueue is an open source, intelligent workload scheduler for Kubernetes that handles job dispatching, queuing, and scheduling: it prioritizes jobs and preempts lower-priority work when necessary so critical workloads run first. It also manages quotas for fair resource allocation across teams, preventing bottlenecks and maximizing efficiency.
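
As a rough sketch of how a GPU quota is expressed in Kueue, the snippet below creates a ClusterQueue granting a nominal quota of 8 GPUs through the Kubernetes Python client; the queue and flavor names are hypothetical, and a matching ResourceFlavor is assumed to already exist:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# A ClusterQueue with a nominal quota of 8 GPUs; "default-flavor" is a
# hypothetical ResourceFlavor assumed to exist in the cluster.
cluster_queue = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "ClusterQueue",
    "metadata": {"name": "team-a-queue"},
    "spec": {
        "namespaceSelector": {},  # admit workloads from any namespace
        "resourceGroups": [{
            "coveredResources": ["nvidia.com/gpu"],
            "flavors": [{
                "name": "default-flavor",
                "resources": [{"name": "nvidia.com/gpu", "nominalQuota": 8}],
            }],
        }],
    },
}

api.create_cluster_custom_object(
    group="kueue.x-k8s.io",
    version="v1beta1",
    plural="clusterqueues",
    body=cluster_queue,
)
```

Workloads submitted through a LocalQueue pointing at this ClusterQueue then wait in line until GPU quota is free, rather than failing or over-committing the cluster.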

GPU partitioning enables more efficient sharing: a physical GPU is divided into smaller virtual GPUs that are allocated dynamically, so multiple users can share a single device, improving overall utilization.
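
For example, with NVIDIA Multi-Instance GPU (MIG) partitioning enabled, a workload can request a single slice rather than a whole device. The sketch below assumes the cluster advertises MIG slices as extended resources such as nvidia.com/mig-1g.5gb; the namespace and image are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# A pod requesting one MIG slice (1 compute slice, 5 GB of GPU memory)
# instead of a full GPU; the resource name depends on the MIG profile in use.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mig-demo", namespace="team-a"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="quay.io/example/train:latest",  # placeholder image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/mig-1g.5gb": "1"},
            ),
        )],
    ),
)
v1.create_namespaced_pod(namespace="team-a", body=pod)
```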

The Red Hat solution also uses a range of open source technologies to help optimize the AI lifecycle—from training and fine-tuning to inference. For training and fine-tuning, the stack includes technologies like PyTorch, Ray, Kubeflow Trainer, and KubeRay. It uses CodeFlare for job dispatching, and Kueue for queuing and scheduling.
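
To illustrate how this stack hands accelerators to training work, here is a minimal Ray sketch that reserves one GPU per task; the function body is a stand-in for real fine-tuning logic:

```python
import ray

ray.init()  # connect to a local or remote Ray cluster

@ray.remote(num_gpus=1)  # Ray schedules this task onto a node with a free GPU
def fine_tune(shard: int) -> str:
    # Stand-in for real fine-tuning code (e.g., a PyTorch training loop).
    return f"shard {shard} trained"

# Dispatch four tasks; Ray queues them against available GPU capacity,
# so they run as GPUs free up rather than oversubscribing devices.
results = ray.get([fine_tune.remote(i) for i in range(4)])
print(results)
```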

For inference, Red Hat AI uses vLLM for the memory-efficient serving of large language models and KServe for broader model serving. It also supports frameworks like PyTorch, Hugging Face TGI, and ONNX.
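
A minimal sketch of vLLM's offline batch-inference API, assuming a small example model (any Hugging Face causal language model works):

```python
from vllm import LLM, SamplingParams

# vLLM batches requests and uses PagedAttention for memory-efficient
# KV-cache management on the GPU.
llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is GPU-as-a-Service?"], params)
for out in outputs:
    print(out.outputs[0].text)
```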

Additionally, Red Hat OpenShift AI offers robust accelerator and GPU management capabilities, including idle culling for workbenches and the ability to configure available GPU slices, helping optimize resource allocation. The platform provides out-of-the-box images with the libraries needed for accelerator support, along with observability tools for monitoring individual user workload status, cluster-wide workload status, queues, and GPU usage.

Red Hat: Your partner in AI innovation

Red Hat, the world's leading provider of enterprise open source software solutions, can help you set up your GPU-as-a-Service system. By providing on-demand GPUs for AI workloads with a strong emphasis on security and privacy, Red Hat helps your data scientists, ML engineers, and AI engineers focus on AI, not infrastructure.

Learn more about our AI solutions at Red Hat AI and talk to a Red Hatter today.

Resource

Get started with AI Inference: Red Hat AI experts explain

Discover how to build smarter, more efficient AI inference systems. Learn about quantization, sparsity, and advanced techniques like vLLM with Red Hat AI.

About the author

My entrepreneurial spirit led me to co-found an AI startup. This experience, combined with my work driving key go-to-market initiatives at Red Hat and building strategic partnerships, has shaped my ability to translate complex technologies into effective market strategies. I enjoy sharing these insights, whether speaking at UC Berkeley and Stanford or engaging with C-level executives. My background in AI research, including a collaboration between the Royal Institute of Technology and Stanford (with findings presented at SSDL 2017), continues to inform my passion for innovation.
