Building a Dynamic GPU VM Allocation System

Inspired by a recent posting at Nuro about building a dynamic GPU VM allocation system, I decided to create a proof-of-concept and get hands-on with Kubernetes by writing a custom operator and controller. Running VMs on Kubernetes raises some interesting challenges, and below is the work I’ve done so far. It assumes integration into Nuro’s infrastructure as described in their ML Scheduler blog post, but it could be generalized to any batch job scheduler. ...

November 9, 2024 · 7 min · 1349 words · Pranav Sundararajan