Hello, everyone. I'm someone who works hard every day. Recently, I've been quite forgetful, so I'm writing this down as a reminder.
Behind the software that everyone uses is hardware. With the recent popularity of cloud computing, the hard work that goes into hardware is often forgotten. However, if you look at the world's market capitalization rankings, the companies at the top are not purely software companies; they are the ones that invest heavily in physical infrastructure such as server data centers, logistics warehouses, smartphone devices, and GPU hardware.
Our company has been purchasing hardware, specifically GPU machines, for a few years now. Matching machine specs to business needs involves many considerations, and even after several years it is still quite a task.
Personally, I got used to building machines during my previous job in CG. Many people face the dilemma of whether to run self-built GPU machines or move to servers mounted in racks.
For personal use, you buy a case, motherboard, CPU, cooler, memory, SSD, and GPU, and put it all together. Windows may be convenient for everyday use, but for work we use Linux, which makes management relatively easy.
When using a personal machine, you need to consider the balance between price and specs, as well as the size, heat, and noise that it will produce in your room.
For business use, if the machine has to sit in the office, you can think of it as an extension of personal use. If you can prepare a dedicated room, noise and heat become problems for that room rather than for the people working nearby. Additionally, you typically don't need to connect monitors or keyboards, which makes things easier.
For individual or simple corporate use, GPUs are usually connected via PCIe slots. Performance is influenced by factors such as the CPU, SSD, memory capacity, and PCIe transfer speed. If you're working with large language models (LLMs), the GPU's speed and VRAM capacity are crucial; for consumer cards, whether a model fits in 12 GB or 24 GB of VRAM is likely to be a significant decision point.
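As a quick sanity check when matching cards to a model, something like the following can list each GPU's name and VRAM. This is a minimal sketch, assuming the NVIDIA driver and the nvidia-smi tool are installed.

```python
# Minimal sketch: list each GPU's name and total VRAM so you can check it
# against the model you plan to run. Assumes nvidia-smi is installed.
import subprocess

def list_gpus():
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.strip().splitlines():
        index, name, mem = (field.strip() for field in line.split(","))
        print(f"GPU {index}: {name}, VRAM {mem}")

if __name__ == "__main__":
    list_gpus()
```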
When operating such machines as servers or workstations, IPMI (Intelligent Platform Management Interface) lets you manage power on/off and OS installation remotely through a small controller, the BMC, built into the motherboard.
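For example, remote power control over IPMI often looks something like this. This is a minimal sketch using ipmitool; the BMC address, user, and password are placeholders for your own environment.

```python
# Minimal sketch: remote power control over IPMI via ipmitool.
# The BMC address, user, and password below are placeholders.
import subprocess

BMC_ARGS = ["ipmitool", "-I", "lanplus", "-H", "192.0.2.10", "-U", "admin", "-P", "secret"]

def chassis_power(action: str) -> str:
    """action is one of: status, on, off, cycle, soft."""
    result = subprocess.run(BMC_ARGS + ["chassis", "power", action],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(chassis_power("status"))  # e.g. "Chassis Power is on"
```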
As servers, these machines run headless, with no screens attached, which keeps things simple. You manage them over SSH instead of with keyboards and monitors, and network ports serve as the communication windows to the outside world. A keyboard and monitor are only needed during initial setup, and a small table on casters is handy for that.
For services using GPUs, the number of exposed ports also matters. For example, a single motherboard can hold two GPUs. If one program drives both GPUs, a single service port is enough; if you instead want each GPU to serve requests with its own VRAM, you open a separate port for each GPU. One port for two GPUs works, but it becomes troublesome if one card fails, so running a separate program per GPU is simpler.
In that case, the machine exposes one SSH port plus one service port per GPU, two in this example, as sketched below.
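A minimal sketch of that per-GPU layout: each process is pinned to one card via CUDA_VISIBLE_DEVICES and listens on its own port. The serve_model.py entry point and the port numbers are hypothetical stand-ins for whatever inference server you actually run.

```python
# Minimal sketch: one service process per GPU, each pinned to its own card
# and listening on its own port. "serve_model.py" is a hypothetical stand-in
# for your actual inference server.
import os
import subprocess

BASE_PORT = 8000  # assumed port numbering; pick whatever fits your network plan

procs = []
for gpu_id in (0, 1):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # this process sees only one GPU
    procs.append(subprocess.Popen(
        ["python", "serve_model.py", "--port", str(BASE_PORT + gpu_id)],
        env=env,
    ))

# If one card (or its process) fails, the other service keeps running on its own port.
for proc in procs:
    proc.wait()
```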
Depending on the service, the CPU, memory, or storage may sit mostly idle. In such cases, putting two GPUs in one system rather than one GPU per system saves on SSH ports and electricity costs. Designing around the actual usage load takes a fair amount of trial and error.
Finding consumer motherboards with four or eight usable PCIe slots can be challenging. Consolidating saves on running costs, but the initial cost can spike, so careful design is necessary. In consumer and workstation setups, being free to experiment until you find an environment that suits you is essential, though it takes patience and money.
Network management also involves allocating ports per system, exposing the services, and maintenance. Remapping an external SSH port for every machine adds work and security risk, so it may be advisable to set up a login server as a single stepping-stone (bastion) host.
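With a login server in place, connections to the internal machines can hop through it. A minimal sketch using OpenSSH's ProxyJump option is below; the host names and user are placeholders.

```python
# Minimal sketch: reach an internal GPU machine through a login (bastion) host
# using OpenSSH's -J (ProxyJump) option. Host names and user are placeholders.
import subprocess

BASTION = "user@login.example.com"  # the only host exposed to the internet
TARGET = "user@gpu-node-01"         # internal GPU machine, reachable only via the bastion

subprocess.run(["ssh", "-J", BASTION, TARGET, "nvidia-smi"], check=True)
```

The same ProxyJump setting can also live in ~/.ssh/config so that every connection passes through the login server automatically.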
Once the network is taken into account, the optimal GPU machine configuration changes again, and things get confusing as the number of machines grows. PCIe gives you flexibility in configuration, but it is the balance of pre-processing and post-processing across CPU, memory, GPU load, and the network structure that ultimately optimizes management effort and cost.
Moving beyond consumer and workstation setups, you enter the realm of business servers. Servers come either as barebones units (a chassis with the motherboard and power supply pre-installed) or as complete systems. Server rooms use racks with standardized sizes, and since space and operation cost money, higher density means more efficient operation.
Barebones systems can also be assembled yourself, much like a self-built PC. Unlike consumer cases, they impose strict airflow management and height constraints: as a rule, air is taken in at the front and exhausted at the back. Consumer heat sinks typically come with their own fans, whereas barebones chassis rely on dedicated fans to strictly enforce this front-to-back airflow.
In barebones systems, GPUs are still attached via PCIe, but because of the height constraints, riser cards (extension cables) are used to lay the GPUs flat or otherwise rearrange them. The basic structure is similar to a consumer build, yet the differences in airflow management change the design slightly, so moving from consumer systems to barebones requires updating some of your knowledge.
Consumer machines are not designed for the constant, high-density load of servers in terms of power consumption and stability. Server hardware is built for stable, fault-free operation, with lower power consumption and tighter restrictions on specs, which makes it expensive. Moving from consumer setups to business setups means getting used to these configurations while you settle on specs.
Once the configuration is settled, complete servers with integrated GPUs become an option. Procurement and running costs are significant, so it's best to commit to them only after specs and purposes are determined. GPU servers consume a lot of power, and securing enough power for a machine with eight GPUs is not easy. They also generate a lot of heat and need large heat sinks, which results in big 4U or 6U enclosures. The weight is substantial, so take care not to injure yourself when mounting them in server racks.
Each rack has a power limit, which is why typical server-room photos often show only about two 4U or 6U enclosures per rack. Cooling also has its limits, so power consumption per unit is a constant consideration; a rough calculation below illustrates the point.
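As a back-of-the-envelope sketch of why density tops out so quickly, the numbers below are assumptions for illustration only, not measurements from any particular data center.

```python
# Rough sketch: how many 8-GPU boxes fit under a rack's power budget.
# All numbers are assumptions for illustration, not measurements.
RACK_POWER_KW = 15.0    # assumed usable power budget per rack
SERVER_POWER_KW = 6.0   # assumed draw of one 8-GPU 4U/6U server under load
OVERHEAD_KW = 1.0       # assumed switches, fans, and other rack overhead

servers_per_rack = int((RACK_POWER_KW - OVERHEAD_KW) // SERVER_POWER_KW)
print(f"Servers per rack under this budget: {servers_per_rack}")  # -> 2
```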
Recent water cooling systems use cold plates instead of heat sinks, transferring heat to a coolant, reducing height constraints. This increases machine density per rack, but also increases unit weight, necessitating a review of the data center's floor load design.
Enclosures with integrated GPUs typically link the GPUs with high-speed internal interconnects. These cards don't handle graphics in the traditional sense, so there are no display outputs; all computation is completed internally. You can either pool multiple GPUs over the internal interconnect or manage all eight separately, as described earlier. In addition, recent GPUs can be split further into smaller partitions, which broadens the configuration options.
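One common form of this splitting is NVIDIA's Multi-Instance GPU (MIG). As a minimal sketch, the following simply lists whatever GPUs, and MIG instances if partitioning is enabled, the driver reports; it assumes nvidia-smi is available.

```python
# Minimal sketch: list the GPUs that the driver reports.
# "nvidia-smi -L" also shows MIG instances when partitioning is enabled.
import subprocess

out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True).stdout
for line in out.strip().splitlines():
    print(line.strip())
```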
Managing GPUs is not just a matter of installing them; operating them cost-effectively as a service is its own challenge. This field keeps evolving and demands a lot of know-how, so we have to keep working hard every day.
That's all.