We are excited to announce the addition of a new AI server to our infrastructure, enhancing our capabilities in generative AI applications.
Server Specifications
- Model: HP ProLiant DL380 Gen9
- GPUs: 2× NVIDIA RTX A5000
- Virtualization Platform: XCP-ng
Purpose and Applications
This server is dedicated to running advanced generative AI models, including:
- Stable Diffusion: For high-quality image generation.
- Flux: A state-of-the-art text-to-image model that generates detailed images directly from text prompts.
- Mochi 1: A state-of-the-art AI video generation model capable of creating high-quality videos with smooth motion and precise prompt adherence.
- LTX: A video generation model designed for creating short scenes based on text or starting images.
- F5-TTS: A text-to-speech model that produces natural-sounding speech and supports zero-shot voice cloning.
- Tabby: A self-hosted AI coding assistant that provides code completion for software development.
- Roop-Unleashed: A tool for deep learning-based face swapping.
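Most of these models run as services inside dedicated VMs on this server. As one illustrative deployment, Tabby ships an official Docker image; the sketch below assumes a CUDA-capable guest with Docker and the NVIDIA container toolkit installed, and the model name and flags may differ between Tabby versions:

```shell
# Sketch: run Tabby with GPU acceleration inside a guest VM.
# The model name (StarCoder-1B) and port are illustrative defaults;
# check the Tabby documentation for your version.
docker run -d --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve --model StarCoder-1B --device cuda
```

Editor plugins then connect to the completion API the container exposes on port 8080.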
Technical Highlights
- HP ProLiant DL380 Gen9: Renowned for its reliability and performance, this server offers the flexibility and efficiency required for demanding AI workloads.
- NVIDIA RTX A5000 GPUs: With 24 GB of GDDR6 memory each, these GPUs provide the processing power needed for efficient training and inference of complex AI models.
- XCP-ng Virtualization: Utilizing XCP-ng allows for effective resource management and scalability, facilitating seamless deployment of virtual machines tailored to specific AI tasks.
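As a concrete example of that deployment flexibility, spinning up a task-specific VM on XCP-ng takes only a few `xe` commands. The template name, VM label, and resource sizes below are placeholders; adjust them to the workload:

```shell
# Sketch: create and size a VM for an AI workload (placeholder values).
VM_UUID=$(xe vm-install template="Ubuntu Focal Fossa 20.04" \
          new-name-label="sd-worker")
xe vm-param-set uuid="$VM_UUID" VCPUs-max=8 VCPUs-at-startup=8
xe vm-memory-limits-set uuid="$VM_UUID" \
   static-min=16GiB dynamic-min=32GiB dynamic-max=32GiB static-max=32GiB
xe vm-start uuid="$VM_UUID"
```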
Implementation
Integrating GPUs into a virtualized environment requires precise configuration:
- GPU Passthrough: Enabling GPU passthrough in XCP-ng involves configuring the system to allow virtual machines direct access to GPU resources, essential for performance-intensive AI applications.
- Driver Compatibility: Ensuring that the guest operating systems have the appropriate drivers installed is crucial for optimal GPU performance.
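On XCP-ng, these two steps come down to hiding the GPUs from dom0 and then assigning a PCI address to each VM. The PCI addresses and VM UUID below are placeholders; look up the real values with `lspci` and `xe vm-list`:

```shell
# 1. Identify the NVIDIA GPUs (vendor ID 10de); addresses here are examples.
lspci -d 10de:

# 2. Hide both GPUs from dom0 so they can be passed through to guests.
/opt/xensource/libexec/xen-cmdline --set-dom0 \
  "xen-pciback.hide=(0000:3b:00.0)(0000:af:00.0)"
reboot

# 3. After reboot, attach one GPU to a VM (the VM must be shut down).
xe vm-param-set uuid=<vm-uuid> other-config:pci=0/0000:3b:00.0

# 4. Inside the guest, install the NVIDIA driver, then verify the GPU:
nvidia-smi
```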
Future Expansion Plans
Looking ahead, we plan to significantly expand our computational capabilities by deploying four 4U rack servers, each equipped with nine NVIDIA H100 GPUs. The H100, based on the Hopper architecture, delivers exceptional performance for AI workloads: roughly 30 teraflops of FP64 compute (more with Tensor Cores) and 80 GB of HBM3 memory with around 3 TB/s of bandwidth. This expansion will let us tackle more complex generative AI tasks, support larger models, and shorten training and inference times, further advancing our AI research and application development.