Situation
Imagine a host with 2 CPU sockets and PCIe slots for each.
Each CPU socket is attached to its PCIe slots through PCIe lanes. Accessing a device on PCIe #1 (here GPU#1) from CPU#2 is sequenced as follows:
- The VM running on CPU#2 requests GPU#1 for the current clock cycle.
- CPU#2 will call CPU#1 to request access to GPU#1, as GPU#1 is attached to CPU#1’s socket through PCIe.
- According to our benchmarks, this back-and-forth adds an overhead of 3%.
*The estimated 3% was measured before the L1TF Intel vulnerability, which has since been mitigated with a countermeasure that invalidates/flushes the L1 CPU cache at the kernel level. The relative overhead may therefore be a little lower now, since both VMs have to rebuild the L1 cache even when a VM stays on the same CPU.
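The request sequence above can be sketched as a small lookup over the host topology. This is only an illustration, not a measurement tool: the names `GPU_SOCKET`, `access_path` and `is_remote` are hypothetical, and GPU#2 is added purely for contrast. Only the attachment of GPU#1 to CPU#1 comes from the text.

```python
# Hypothetical topology: which CPU socket each GPU's PCIe lanes are wired to.
# Only GPU#1 -> CPU#1 comes from the article; GPU#2 is an assumption for contrast.
GPU_SOCKET = {"GPU#1": "CPU#1", "GPU#2": "CPU#2"}


def access_path(running_cpu: str, gpu: str) -> list[str]:
    """Return the hops a request takes from the CPU running the VM to the GPU."""
    owner = GPU_SOCKET[gpu]
    if running_cpu == owner:
        # Local access: the GPU hangs off this socket's own PCIe lanes.
        return [running_cpu, gpu]
    # Remote access: the request is relayed through the socket that owns the GPU.
    return [running_cpu, owner, gpu]


def is_remote(running_cpu: str, gpu: str) -> bool:
    """True when the request has to cross from one socket to the other."""
    return len(access_path(running_cpu, gpu)) == 3


print(access_path("CPU#2", "GPU#1"))  # relayed through CPU#1
print(access_path("CPU#1", "GPU#1"))  # direct
```

Only the relayed path pays the back-and-forth overhead described above; the direct path does not.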
Multiple GPUs and VMs in a virtualization full life cycle
In the next section, we’ll look at 4 clock cycle sequences and detail the communication flow between the GPU(s), the VM they’re attached to, and the CPU and RAM managed by the hypervisor to execute these VMs.
Setup
We’ll start with 3 VMs on the same physical host. The physical host has:
- 2 processors (2 CPU sockets)
- 4 RAM components
- 4 GPUs (2 attached to each CPU socket)
The scenario starts at clock cycle t0. Every cycle lasts δt (delta t). Therefore:
- t0 is the initial setup
- t0 + δt is the system state after 1 CPU cycle
- t0 + 2δt is the system state after 2 CPU cycles
- t0 + 3δt is the system state after 3 CPU cycles
Let’s examine the complete sequence and communication flows.
Initial communication
VM#1 will have GPU#4 and RAM#2 on the first cycle.
Two-way communication
In the second cycle, VM#1 will have access to GPU#4 and RAM#2 through CPU#1.
Third-cycle communication: speculative execution
During the third cycle, VM#1 will still have RAM#2 and CPU#2 assigned, and CPU#1 will request access.
Execution speculation
Depending on your execution strategy, speculative execution may happen during every cycle (though no longer on Intel CPUs since the L1TF vulnerability).
The CPU anticipates the execution plan (called the execution pipeline) and stores potential future outcomes in the L1 cache (the cache of the CPU itself).
This compute-ahead work is done while the CPU would otherwise sit idle, waiting for execution results to pass to the next stage. The branch computed ahead may turn out to be useless if the preceding stage’s results mean it never needs to be evaluated (essentially, if the branch’s condition, like an if in your code, is not met).
Speculative execution is only relevant if the following 3 conditions are met:
- Speculative execution is enabled (which is no longer the case since the L1TF Intel vulnerability, a flaw in the same family as Spectre)
- The next cycle’s CPU instructions must be scheduled on the same CPU as the previous cycle.
- The condition of the branch must be satisfied.
Speculative execution may have streamlined this 3rd cycle, because VM#1 is operating on the same CPU (CPU#1) as in the previous cycle (the 2nd cycle).
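The three conditions above amount to a simple conjunction, which can be written down as a short check. This is a sketch for illustration only: the `CycleContext` type, its field names and `speculation_pays_off` are hypothetical and do not correspond to any hypervisor or CPU interface.

```python
from dataclasses import dataclass


@dataclass
class CycleContext:
    """Hypothetical description of one VM cycle, used only for this example."""
    speculation_enabled: bool  # condition 1: speculative execution is enabled
    previous_cpu: str          # CPU that ran the VM during the previous cycle
    current_cpu: str           # CPU that runs the VM during this cycle
    branch_taken: bool         # condition 3: the branch's condition is satisfied


def speculation_pays_off(ctx: CycleContext) -> bool:
    """Speculated work is only useful when all three conditions hold."""
    same_cpu = ctx.previous_cpu == ctx.current_cpu  # condition 2
    return ctx.speculation_enabled and same_cpu and ctx.branch_taken


# Third cycle from the scenario: VM#1 stays on CPU#1.
third_cycle = CycleContext(True, "CPU#1", "CPU#1", True)
print(speculation_pays_off(third_cycle))
```

If any one of the three fields flips, the function returns False, which matches the point that the precomputed results sitting in one CPU’s L1 cache are of no use to a VM that has moved to the other socket.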
Cycle four
In this cycle, VM#1 is scheduled back to CPU#2 without any overhead, because its associated hardware is directly tied to that CPU socket.
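The four cycles can be replayed with a short trace that flags, for each cycle, which of VM#1’s resources sit on the other socket. This is a sketch under stated assumptions: GPU#4 and RAM#2 are taken to be attached to CPU#2, the CPU used in cycle 1 is inferred from the phrase "scheduled back to CPU#2", and the names `RESOURCE_SOCKET`, `SCHEDULE` and `trace` are hypothetical.

```python
# Assumed placement of VM#1's resources: both tied to CPU#2's socket.
RESOURCE_SOCKET = {"GPU#4": "CPU#2", "RAM#2": "CPU#2"}

# CPU running VM#1 at each cycle. Cycles 2 to 4 follow the article;
# cycle 1 on CPU#2 is an inference, not something the article states.
SCHEDULE = ["CPU#2", "CPU#1", "CPU#1", "CPU#2"]


def trace(schedule, resource_socket):
    """List, per cycle, the resources reached through the other socket."""
    rows = []
    previous = None
    for cycle, cpu in enumerate(schedule, start=1):
        remote = sorted(r for r, s in resource_socket.items() if s != cpu)
        rows.append({
            "cycle": cycle,
            "cpu": cpu,
            "remote": remote,  # non-empty means back-and-forth overhead
            "same_cpu_as_previous": cpu == previous,
        })
        previous = cpu
    return rows


for row in trace(SCHEDULE, RESOURCE_SOCKET):
    print(row)
```

Under these assumptions, cycles 2 and 3 reach both resources through the other socket, cycle 3 is the only one that stays on the same CPU as the cycle before it, and cycle 4 has no remote resources at all.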
CPU-pinning
To avoid the back-and-forth, CPU pinning was adopted.
OpenStack
OpenStack is NUMA-aware and can instruct libvirt to statically pin vCPUs to physical CPUs so they no longer “move around”.
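To make the pinning concrete, here is a sketch that builds the kind of `<cputune>` fragment libvirt accepts in a domain definition, where each `<vcpupin>` element ties one vCPU to one host core. The helper name `cputune_xml` and the core numbers are hypothetical, and this is not the configuration used on the platform described here. In OpenStack, this pinning is normally requested through the flavor (for example the `hw:cpu_policy=dedicated` extra spec) and not written by hand.

```python
import xml.etree.ElementTree as ET


def cputune_xml(pinning: dict[int, int]) -> str:
    """Build a libvirt <cputune> fragment pinning each vCPU to one host core."""
    cputune = ET.Element("cputune")
    for vcpu, host_core in sorted(pinning.items()):
        # One <vcpupin> per vCPU: 'vcpu' is the guest CPU index,
        # 'cpuset' is the host core it is allowed to run on.
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(host_core))
    return ET.tostring(cputune, encoding="unicode")


# Hypothetical mapping: guest vCPUs 0 and 1 pinned to host cores 8 and 9,
# assumed here to belong to the socket that owns the VM's GPU.
print(cputune_xml({0: 8, 1: 9}))
```

Choosing host cores on the same socket as the GPU is what removes the cross-socket hop; pinning to cores on the other socket would make the overhead permanent.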
VMware
This feature is available on HostRooster Private Cloud and on managed VMware on Bare Metal.
GPU virtualization: performance impacts
GPU virtualization allows a single GPU to be split across many VMs. The cost difference is noteworthy when comparing Nvidia V100s with 32GB of VRAM to V100s with 16GB (check this blog post for further details).
We decided against this because…
- As shown in the graphic below, splitting GPU#4 across several VMs could overwhelm CPU#2.
- The second reason is that computing power will be shared (CUDA cores for Nvidia), reducing GPU performance per VM.
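The second point can be illustrated with simple arithmetic. The sketch below assumes an even split of compute and memory between VMs, which is a deliberate simplification: real GPU virtualization schedulers may share the device differently (for example by time-slicing). The function name `per_vm_share` is hypothetical; the figure of 5120 CUDA cores is the published count for a V100.

```python
V100_CUDA_CORES = 5120  # published CUDA core count for an Nvidia V100


def per_vm_share(cuda_cores: int, vram_gb: int, vm_count: int) -> tuple[float, float]:
    """Return (CUDA cores, VRAM in GB) per VM, assuming an even split."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return cuda_cores / vm_count, vram_gb / vm_count


# A 32GB V100 shared by 1, 2 and 4 VMs (even split is an assumption).
for vms in (1, 2, 4):
    cores, vram = per_vm_share(V100_CUDA_CORES, 32, vms)
    print(f"{vms} VM(s): {cores:.0f} CUDA cores and {vram:.0f} GB VRAM each")
```

With PCIe passthrough, `vm_count` is always 1, so each VM keeps the whole card.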
Conclusion
In this article, we explained how virtualization works, why compute cycle orchestration matters, and why we chose PCIe passthrough mode for our VM GPU solutions.