Nvidia partners with Run:ai and Weights & Biases for MLops Stack


Running a full machine learning workflow lifecycle can often be a complicated operation, involving multiple disconnected components.

Users need hardware optimized for machine learning, the ability to orchestrate workloads across that hardware, and some form of machine learning operations (MLops) technology to manage the models. In a bid to make that workflow easier for data scientists, artificial intelligence (AI) compute orchestration vendor Run:ai, which raised $75 million in March, and MLops platform vendor Weights & Biases (W&B) are partnering with Nvidia.

“With this three-way partnership, data scientists can use Weights & Biases to plan and execute their models,” Omri Geller, CEO and cofounder of Run:ai, told VentureBeat. “On top of that, Run:ai orchestrates all the workloads in an efficient way on the GPU resources of Nvidia, so you get the full solution from the hardware to the data scientist.”

Run:ai is designed to help organizations use Nvidia hardware for machine learning workloads in cloud-native environments – a deployment approach that uses containers and microservices managed by the Kubernetes container orchestration platform.

Among the most common ways for organizations to run machine learning on Kubernetes is with the Kubeflow open-source project. Run:ai has an integration with Kubeflow that can help users to optimize Nvidia GPU usage for machine learning, Geller explained.

Geller added that Run:ai has been engineered as a plug-in for Kubernetes that enables the virtualization of Nvidia GPU resources. Once a GPU is virtualized, its resources can be split into fractions so that multiple containers can share the same GPU. Run:ai also enables management of virtual GPU instance quotas to help ensure that workloads always get access to the resources they require.
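The idea of fractional GPU sharing with quotas can be illustrated with a small Python sketch. This is a toy model only, not Run:ai's API or scheduler: the class names (Gpu, FractionalScheduler), the team names, the first-fit placement rule and the quota values are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """One physical GPU whose capacity is tracked as a fraction from 0.0 to 1.0."""
    name: str
    free: float = 1.0


@dataclass
class FractionalScheduler:
    """Hypothetical toy scheduler: places fractional requests, enforces team quotas."""
    gpus: list
    quotas: dict                      # team -> maximum total GPU fraction allowed
    used: dict = field(default_factory=dict)

    def allocate(self, team: str, fraction: float):
        """Return the name of the GPU hosting the request, or None if refused."""
        already = self.used.get(team, 0.0)
        if already + fraction > self.quotas.get(team, 0.0) + 1e-9:
            return None               # request would exceed the team's quota
        for gpu in self.gpus:         # first-fit placement (an assumed policy)
            if gpu.free + 1e-9 >= fraction:
                gpu.free -= fraction
                self.used[team] = already + fraction
                return gpu.name
        return None                   # no single GPU has enough free capacity


scheduler = FractionalScheduler(
    gpus=[Gpu("gpu-0"), Gpu("gpu-1")],
    quotas={"vision": 1.0, "nlp": 0.5},
)
first = scheduler.allocate("vision", 0.5)   # half of gpu-0
second = scheduler.allocate("nlp", 0.5)     # other half of the same GPU
third = scheduler.allocate("nlp", 0.25)     # refused: over the nlp quota
print(first, second, third)                 # gpu-0 gpu-0 None
```

In this sketch, two workloads from different teams land on the same physical GPU, while a third request is turned away because it would push its team past its quota.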

Geller said that the partnership’s goal is to make a full machine learning operations workflow more consumable for enterprise users. To that end, Run:ai and Weights & Biases are building an integration that makes it easier to run the two technologies together. Before the partnership, Geller said, organizations that wanted to use Run:ai and Weights & Biases had to go through a manual process to get them working together.

Seann Gardiner, vice president of business development at Weights & Biases, commented that the partnership allows users to combine the training automation provided by Weights & Biases with the GPU resources orchestrated by Run:ai.

Nvidia is not monogamous and partners with everyone

Nvidia’s work with Run:ai and Weights & Biases is part of the company’s larger strategy of partnering across the machine learning ecosystem of vendors and technologies.

“Our strategy is to partner fairly and evenly with the overarching goal of making sure that AI becomes ubiquitous,” Scott McClellan, senior director of product management at Nvidia, told VentureBeat.  

McClellan said that the partnership with Run:ai and Weights & Biases is particularly interesting as, in his view, the two vendors provide complementary technologies. Both vendors can now also plug into the Nvidia AI Enterprise platform, which provides software and tools to help make AI usable for enterprises.

With the three vendors working together, McClellan said, a data scientist trying to use Nvidia AI Enterprise containers doesn’t have to figure out their own orchestration and deployment frameworks or their own scheduling.

“These two partners kind of complete our stack – or we complete theirs and we complete each other’s – so the whole is greater than the sum of the parts,” he said.

Avoiding the “Bermuda Triangle” of MLops

For Nvidia, partnering with vendors like Run:ai and Weights & Biases is all about helping to solve a key challenge that many enterprises face when first embarking on an AI project.

“The point in time when a data science or AI project tries to go from experimentation into production, that is sometimes a little bit like the Bermuda Triangle where a lot of projects die,” McClellan said. “I mean, they just disappear in the Bermuda Triangle of — how do I get this thing into production?”

With the use of Kubernetes and cloud-native technologies, which are commonly used by enterprises today, McClellan is hopeful that it is now easier than it has been in the past to develop and operationalize machine learning workflows.

“MLops is devops for ML — it’s literally how do these things not die when they move into production, and go on to live a full and healthy life,” McClellan said.
