Ronen Dar on GPU Orchestration for Building ML Models
Episode 26 • 5th February 2024 • Data Driven
00:00:00 00:44:59


Shownotes

In this episode, hosts Andy Leonard and Frank La Vigne sit down with Ronen Dar, the co-founder and CTO of Run AI, to explore the world of artificial intelligence and GPU orchestration for machine learning models.

Ronen shares insights into the challenges of utilizing GPUs in AI research and how Run AI's platform addresses these issues by optimizing GPU usage and providing tools for easier and faster model training and deployment. The conversation delves into the concept of fractional GPU usage, allowing multiple workloads to run on a single GPU, making expensive GPUs more accessible and cost-effective for organizations.
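
To make the fractional GPU idea concrete, here is a minimal, illustrative Python sketch of the packing problem it solves: several workloads that each need only part of a GPU are placed onto as few physical GPUs as possible. This is a hypothetical first-fit sketch of the concept, not Run AI's actual scheduler, and the workload names and fractions below are made up.

def pack_fractional_workloads(workloads, gpu_capacity=1.0):
    # First-fit packing: place each workload (a fraction of one GPU) onto the
    # first GPU that still has room, opening a new GPU only when none does.
    gpus = []          # fraction already allocated on each physical GPU
    placement = {}     # workload name -> GPU index
    for name, fraction in workloads:
        for i, used in enumerate(gpus):
            if used + fraction <= gpu_capacity:
                gpus[i] += fraction
                placement[name] = i
                break
        else:
            gpus.append(fraction)
            placement[name] = len(gpus) - 1
    return placement, len(gpus)

# Hypothetical workloads: notebooks and small inference servers that each
# need only part of a GPU end up packed onto two physical GPUs.
demo = [("notebook-1", 0.25), ("notebook-2", 0.25), ("vision-model-a", 0.5),
        ("vision-model-b", 0.5), ("notebook-3", 0.25), ("notebook-4", 0.25)]
placement, gpu_count = pack_fractional_workloads(demo)
print(placement, gpu_count)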


Show Notes

04:40 GPU technology enabled for cloud AI workloads.

07:00 RunAI enables sharing expensive GPU resources for all.

11:59 As enterprise AI matures, organizations become more savvy.

15:35 Deep learning: GPUs for speed, CPUs as backup.

16:54 LLMs running on GPUs, exploding in the market.

23:29 NVIDIA created CUDA to simplify GPU use.

26:21 NVIDIA's success lies in accessible technology.

28:25 Solve GPU hugging with quotas and sharing.

31:15 Team lead manages GPU quotas for researchers.

35:51 Rapid changes in business and innovation.

40:34 Passionate problem-solver with diverse tech background.

43:38 Thanks for tuning in, subscribe and review.

Transcripts

Speaker:

Greetings, listeners. Welcome back to the Data

Speaker:

Driven Podcast. I'm Bailey, your AI host with

Speaker:

the most data, that is, bringing you insights from the ether

Speaker:

with my signature wit. In today's episode, we're

Speaker:

diving deep into the heart of artificial intelligence's engine room,

Speaker:

GPU orchestration. It's the unsung hero

Speaker:

of AI research, optimizing the raw power needed to fuel

Speaker:

today's most advanced machine learning models. And

Speaker:

who better to guide us through this labyrinth of computational complexity than

Speaker:

Ronen Dar, the cofounder and CTO of Run AI, the

Speaker:

company that's making GPU resources work smarter, not

Speaker:

harder. Now onto the show.

Speaker:

Hello, and welcome to Data Driven, the podcast where we explore the emergent fields

Speaker:

of artificial intelligence, data engineering, and overall data

Speaker:

science and analytics. With me as always is my favoritest

Speaker:

Data engineer in the world, Andy Leonard. How's it going, Andy? It's

Speaker:

going well, Frank. How are you? I'm doing great. I'm doing great. It's been,

Speaker:

we're We're recording this February 1, 2024. And as I said to my

Speaker:

kids yesterday, January has been a long year.

Speaker:

We're only, like, 1 month into the year, and it was it was a pretty

Speaker:

wild ride. But I can tell we're gonna have a blast today,

Speaker:

because we're gonna geek out on something that I kinda sort of understand,

Speaker:

but not entirely, and it's GPUs. And in the virtual green room, were chit

Speaker:

chatting with some folks, and, but let me do the formal introduction

Speaker:

here. Today with us, we have Doctor Ronen Dar, cofounder and CTO

Speaker:

of Run AI, A company at the forefront of GPU

Speaker:

orchestration, and he has a distinguished career in technology.

Speaker:

His experience includes significant roles at Apple. Yes, That

Speaker:

apple. Bell Labs. Yes. That Bell Labs.

Speaker:

And at Run AI, Ronan is instrumental in optimizing

Speaker:

GPU usage For AI model training and deployment,

Speaker:

leveraging his deep passion for both academia and startups.

Speaker:

And, Run AI is a key player in the, and he is a he

Speaker:

and Run AI are key players in the AI revolution. Ronen's

Speaker:

contributions are pivotal in shaping and powering the

Speaker:

future of artificial intelligence. Now I will add that in

Speaker:

my day job at Red Hat, Run AI has come up a couple of times.

Speaker:

So this is definitely, definitely

Speaker:

an honor to have you on on on the show, sir. Welcome.

Speaker:

Thank you, Frank. Thank you for inviting me. Hey, Andy. Good to

Speaker:

be here. I love it. Love Red Hat. We're a big

Speaker:

fan of Red Hat. We're working closely with many people at

Speaker:

Red Hat, and love that. Right? Love OpenShift,

Speaker:

love Red Hat, love Linux. Yeah. Cool. Cool.

Speaker:

Yeah. So so for those who don't know exactly, I kinda know

Speaker:

what, your Run AI does, but can you explain exactly

Speaker:

What it is run AI does and why GPU

Speaker:

orchestration is important. Yes.

Speaker:

Okay.

Speaker:

So Run AI is a software,

Speaker:

AI infrastructure platform. So we

Speaker:

help machine learning teams to get much more

Speaker:

out of their GPUs, And we provide

Speaker:

those teams with abstraction layers and tools

Speaker:

so they can train models And deploy models

Speaker:

much easier, much faster. And

Speaker:

so We started in 2018, 6 years

Speaker:

ago. It's me and my cofounder, Omri. Omri is the CEO.

Speaker:

He's, he's amazing. I love him. We We know each other for many

Speaker:

years. We we met in the academia, like, more than 10 years ago,

Speaker:

and and we started Run AI together, and we started

Speaker:

Run AI because we saw that there are big challenges

Speaker:

around GPUs, around orchestrating

Speaker:

GPUs and utilizing GPUs. We saw back then

Speaker:

in 2018 that GPUs are going to be very, very important.

Speaker:

It's like the basic component

Speaker:

that any AI company needs to train models,

Speaker:

right, and deploy models. So we saw that GPUs are going to be critical, but

Speaker:

there are also a lot of challenges with, with utilizing GPUs.

Speaker:

I think back then, GPUs were relatively new In

Speaker:

the data center, in in the cloud.

Speaker:

GPUs were very well known in the gaming

Speaker:

industry. Right? We spoke before on gaming. Right? Like, a lot of

Speaker:

key things there that GPUs have been

Speaker:

enabling. But in the data center, they were relatively new, and the

Speaker:

entire software stack that is that

Speaker:

is running the cloud and the data center was built for

Speaker:

traditional microservices applications that are running

Speaker:

on commodity CPUs. And AI workloads are different; they are

Speaker:

much more compute intensive, they they

Speaker:

run on GPUs, maybe on multiple nodes, multiple

Speaker:

machines of GPUs, and GPUs are also very different.

Speaker:

Right? They are expensive, very scarce in the data center.

Speaker:

So the entire software stack was built for something else,

Speaker:

and when it comes to GPUs, it was really hard for many people to to

Speaker:

actually manage those GPUs. So we came in And, and we

Speaker:

saw those gaps. We've built run AI on top of

Speaker:

cloud native technologies like Kubernetes and containers. We're

Speaker:

big fans of Of those, technologies, and

Speaker:

we added components around scheduling, around

Speaker:

the GPU fractioning. So we enable

Speaker:

multiple workloads to run on a on a single GPU and

Speaker:

essentially over-provision GPUs. So we built this engine, which we

Speaker:

call the cluster engine, that runs in GPU

Speaker:

clusters. Right? We help machine learning teams to pool all of their GPUs into

Speaker:

one cluster running that engine, and that engine provides a lot of

Speaker:

performance and lot of capabilities from those GPUs. And

Speaker:

on top of that, we built this control plane And

Speaker:

and tools and for machine learning,

Speaker:

teams to run the Jupyter Notebooks, to run

Speaker:

training jobs, batch jobs to deploy their models, right, to just to to

Speaker:

have tools for the entire life cycle of AI

Speaker:

from Training models in the lab to taking those models into

Speaker:

production and running them and serving actual users.
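
For context on the Kubernetes baseline Ronen is describing: with the stock NVIDIA device plugin, a pod can only ask for GPUs in whole units, which is part of why an extra layer for fractioning and scheduling is needed. Here is a minimal sketch using the official kubernetes Python client, assuming a cluster with the NVIDIA device plugin installed; the pod name and container image are placeholders:

from kubernetes import client, config

# Connect with the local kubeconfig (use config.load_incluster_config() inside a cluster).
config.load_kube_config()

# A pod that asks for one whole GPU through the nvidia.com/gpu extended resource.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job-example"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="example.com/training-image:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # whole GPUs only with the stock plugin
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Whole-unit requests like this are the granularity the stock stack provides; fractional sharing and smarter scheduling sit as a layer on top of it.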

Speaker:

And That's the platform that we've built, and we're working with machine

Speaker:

learning teams across the globe and on just managing,

Speaker:

orchestrating, and letting them Get much more out of their GPUs and essentially

Speaker:

run faster, train models faster and in a much easier way, and

Speaker:

deploy those models in a much easier and faster and more efficient

Speaker:

way. Yeah. The thing that blew me away when I first heard of Run

Speaker:

AI, and this would have been, 2021

Speaker:

ish. No. 20 early

Speaker:

2021, I would say, And, it was the

Speaker:

idea of fractional GPUs. Right? So you can have one,

Speaker:

I say one, but, you know, realistically, it's gonna be more than one, but you you can

Speaker:

kind of share it out, which I think and we were talking in the virtual

Speaker:

green room about how, you know, some of these GPU's,

Speaker:

If you can get them because there's a multi month, sometimes multi

Speaker:

year supply chain issue. I mean, these things are expensive bits of

Speaker:

hardware, and I think the real value, correct

Speaker:

me if I'm wrong, is, like, well, you know, if you I was talking to

Speaker:

somebody the other day, and and we're basically talking about how we can,

Speaker:

you know, if you get if you get, like, 1 laptop with a killer

Speaker:

GPU, right, that GPU is really only useful to that 1

Speaker:

user, Whereas if you can kind of put it in a in a in a

Speaker:

server and use something like RunAI, now everybody in the organization can do

Speaker:

that. And these are not trivial expenses. I mean, these are like, You know,

Speaker:

you sell a kidney type of costs here.

Speaker:

Yeah. Absolutely. So Absolutely. First of all, GPUs

Speaker:

are expensive. They cost a lot. Right?

Speaker:

And we provide, Technologies like fractional GPUs and

Speaker:

other technologies around scheduling that allows

Speaker:

teams to share GPUs. Right? So we just spoke about

Speaker:

GPU fractioning. So that's one layer of

Speaker:

sharing where you have 1 GPU, which is really expensive.

Speaker:

And not all of the workloads, not all

Speaker:

AI workloads, are really compute intensive and require the

Speaker:

entire GPU or, you know, maybe multiple GPUs. There are

Speaker:

workloads like Jupyter Notebooks where you have

Speaker:

researchers that are just

Speaker:

debugging their code or cleaning their data or doing some simple stuff,

Speaker:

and they need just fractions of GPUs.

Speaker:

In that case, if you have, a lot of data scientists,

Speaker:

maybe you wanna host all of their notebooks On

Speaker:

a much smaller number of GPUs because, right, each

Speaker:

one of them uses just fractions of GPUs. Another big use case

Speaker:

for fractions Of GPUs is inference.

Speaker:

So not all of the models are huge

Speaker:

and don't fit into the memory of one

Speaker:

GPU, and in computer vision,

Speaker:

there are a lot of Models that are relatively small,

Speaker:

they run on GPU, and you can essentially host multiple of

Speaker:

them on the same GPU. Right. So you can have instead of

Speaker:

just 1 computer vision model running on GPU, host 10

Speaker:

of those models on the same GPU and get Factors of

Speaker:

10 x in, in your cost, in your,

Speaker:

overall throughput of, of inference. So that's That's one

Speaker:

use case for fractional GPUs, and we're investing heavily in

Speaker:

building that technology. Another layer

Speaker:

of sharing GPUs Comes where you

Speaker:

have maybe in your organization multiple teams

Speaker:

or multiple projects running in parallel. So

Speaker:

for example, maybe OpenAI, they now are working

Speaker:

on GPT-5. It's one project. That project needs a

Speaker:

lot of GPUs And they have more projects. Right?

Speaker:

More research project around alignment or around,

Speaker:

reinforcement learning. You know? DALL

Speaker:

E. Like, they they they have more than just 1 project. Then DALL E and

Speaker:

they have multiple models. Right? Exactly. They have. Right? So each

Speaker:

project needs Needs GPUs. Right? Needs a lot of

Speaker:

GPUs. So if you can instead of

Speaker:

allocating GPUs Entirely for each project,

Speaker:

you could essentially pool all of those GPUs and share

Speaker:

them between the those different projects, different teams,

Speaker:

And in times where 1 project is idle and not

Speaker:

using their GPUs, other projects, other teams can share

Speaker:

can get access to those GPUs. Now orchestrating all of

Speaker:

that, orchestrating that sharing of resources between

Speaker:

projects, between teams can be really complex And

Speaker:

requires this advanced scheduling, which

Speaker:

which we're bringing into the game. We're bringing

Speaker:

those scheduling capabilities from the high performance computing world

Speaker:

known for those schedulers. And so we're bringing capabilities

Speaker:

from that world into the cloud native Kubernetes

Speaker:

world. Scheduling around batch batch scheduling

Speaker:

fairness algorithms, things like that, so teams and projects

Speaker:

can just share GPUs in a simple and efficient

Speaker:

way. So those

Speaker:

are the 2 layers of sharing GPU's. Interesting. And and

Speaker:

I think that I think as As this field matures

Speaker:

and it matures in the enterprise, I think you're gonna see organizations

Speaker:

kind of be more,

Speaker:

more more more I think savvy about, like, okay, like you said, like, data scientists,

Speaker:

if they're just doing, like, you know, Traditional statistical modeling really doesn't benefit

Speaker:

from GPUs, or they're just doing data cleansing, data engineering.

Speaker:

Right? They're probably gonna say, like, well, Let's run it on this cluster, and

Speaker:

then we'll break it apart into discrete parts where, you

Speaker:

know, then we will need a GPU. And I also like the idea that, you

Speaker:

know, you're you're basically doing What what I learned in college,

Speaker:

which was time slicing. Right? Sounds like this is kind of, like, everything old is

Speaker:

new again. Right? I mean, this is, Obviously, you know, when you're when you're

Speaker:

taking kind of that old mainframe concept and applying it to something like Kubernetes,

Speaker:

orchestration is gonna be a big deal, because these are not systems that were

Speaker:

built from the ground up to have time slicing. Is that a is that a

Speaker:

good kind of explanation? Yeah. Absolutely.

Speaker:

Absolutely. I like I like that analogy. Yeah. Exactly. Time

Speaker:

slicing it's, it's 1 so

Speaker:

one implementation, yeah, that we

Speaker:

enable around fractionalizing GPUs,

Speaker:

and I agree when you have resources, It

Speaker:

can be different kind of resources. Right? It can be CPU

Speaker:

resources and networking were also,

Speaker:

You know, as people created that technology to share the

Speaker:

networking and communication going through those networking, but just the

Speaker:

bandwidth of the networking. We're doing it

Speaker:

for GPU's. Right. Sharing those

Speaker:

resources. And I think now it interestingly,

Speaker:

LLMs are also becoming a kind

Speaker:

of, resources as well, right, that people need access

Speaker:

to. Right? You have those models, you have GPT, ChatGPT.

Speaker:

A lot of people are trying to get access to

Speaker:

that resource, essentially. And I think it's interesting,

Speaker:

because you kinda pointed this out, but it it it's something that I think that

Speaker:

if you're in the gen AI space, you kinda don't it's so it's obvious

Speaker:

like air. You don't think about it. Right? But when when you

Speaker:

get inference on traditional AI, somebody once referred to it

Speaker:

as legacy AI. Right. But where

Speaker:

the inference side of the equation, you don't really need a lot of compute power.

Speaker:

Right? Like, it's not really a heavy lift. Right? But with generative

Speaker:

AI, you do need a lot of compute on

Speaker:

I I guess it's not really inference, but on the other side of the use

Speaker:

while it's actually in use, not just the training. Right. So traditionally,

Speaker:

GPU heavy use in training, and then inference, not so

Speaker:

much. Now we need heavy use before, after, and during,

Speaker:

which I imagine your technology would help because, I mean, look, I love chat I

Speaker:

love ChatGPT. I'm one of the first people to sign up for

Speaker:

a subscription, But even, you know, they had trouble keeping

Speaker:

up, and they have a lot of money, a lot of power, a lot of

Speaker:

influence. So I mean, this is something that if you're just a

Speaker:

regular old enterprise, this is probably something they struggle

Speaker:

with. Right? Right. Yeah. I absolutely

Speaker:

agree. It's like amazing point, Frank.

Speaker:

So 1 year

Speaker:

ago, the inference use case on

Speaker:

GPUs wasn't that big. Totally agree. That's also what we

Speaker:

saw in the market.

Speaker:

Deep learning, convolutional neural networks, were

Speaker:

running on GPUs,

Speaker:

mostly for computer vision applications,

Speaker:

But they could also run on CPUs and you could get,

Speaker:

like, relatively okay performance.

Speaker:

If you needed maybe, like, a very low latency, then

Speaker:

you might use GPUs because they're much faster and you get much

Speaker:

lower latency. But

Speaker:

it was, it was all, and it's still very

Speaker:

difficult to deploy models on GPUs compared to just deploying

Speaker:

those models on CPUs, because deploying models, deploying applications on

Speaker:

CPUs is something, you know, people have been doing for so many years.

Speaker:

So

Speaker:

many times it was much easier for people to just deploy their

Speaker:

models on CPU's And not on GPUs, so that was, like, the

Speaker:

fallback to CPUs. But

Speaker:

then came, as you said, ChatGPT was introduced, a

Speaker:

little bit more than a year ago, and that generative

Speaker:

AI use case just blew up. It blew up. Right? And it's

Speaker:

it's inference essentially. And those models are

Speaker:

so big that they can't really run on

Speaker:

CPU. They, they LLMs are running in production on

Speaker:

GPUs, and now the inference use case on

Speaker:

GPUs is just exploding in the market

Speaker:

right now, it's really big. There is a lot of demand for

Speaker:

GPUs for inference. And

Speaker:

for OpenAI, they need to support this

Speaker:

huge scale that I guess, just

Speaker:

just they are seeing such scale, maybe a little, a

Speaker:

few more companies, but that's like huge, huge scale.

Speaker:

But I think that we will see more and more companies

Speaker:

building products based on AI, on

Speaker:

LLMs, And we'll see more and more

Speaker:

applications using AI, which

Speaker:

then that AI runs on GPUs. So that is going to grow,

Speaker:

and that's the that's an amazing new market for us around

Speaker:

AI and for me as a CTO, it was so fun to

Speaker:

Get into that market because it now comes with

Speaker:

new problems, new challenges,

Speaker:

new use cases Compared to deep learning

Speaker:

on GPUs. New new pains because

Speaker:

the models are so big. Right? Right. And

Speaker:

challenges around cold start problems, about auto scaling,

Speaker:

about, About

Speaker:

just, giving access to LLMs. So a lot of

Speaker:

challenges, new challenges there. We at Run AI are studying those problems

Speaker:

and we're Now building solutions for those problems,

Speaker:

and I'm really, really excited about the Inference use case. That

Speaker:

is very cool. So just, going back a little bit.

Speaker:

I was trying to keep up. I promise. But Run AI is

Speaker:

I I get Run AI Run AI's platform

Speaker:

supports fractional GPU usage.

Speaker:

It it also sounds to me, maybe I misunderstood,

Speaker:

That in order to achieve that, you first had to or

Speaker:

or maybe along with that, you made it possible to use multiple

Speaker:

GPUs. You've you've created Something like

Speaker:

an API that allows, companies

Speaker:

to take advantage of multiple GPUs or fractions of

Speaker:

GPUs. Did I Did I miss that? No, that's

Speaker:

right. That's right, Andy. And Okay.

Speaker:

So we've built this, way of,

Speaker:

For people to scale their workloads from fractions

Speaker:

of GPUs to multiple GPUs within 1 machine,

Speaker:

Okay. To multiple, machines. Right? You

Speaker:

have big workloads running on on multiple nodes

Speaker:

of GPUs. So Think about it when you have

Speaker:

multiple users each running their own

Speaker:

workload. Some are running on fractions of GPUs. Some are

Speaker:

running batch jobs on on a lot of

Speaker:

GPUs. Some Deploying models and running them on

Speaker:

in inference, and some just launching their Jupyter

Speaker:

Notebooks. All of that is happening on the same

Speaker:

pool of GPU's, same cluster. So you need

Speaker:

this layer of orchestration, of scheduling, just to

Speaker:

manage everything and make sure that everyone is getting the

Speaker:

right access to the right

Speaker:

GPUs, and everything is scheduled according to

Speaker:

priorities. Yeah. Well, being just, you know, a

Speaker:

mere data engineer here, talking about all of that

Speaker:

analytics workload. That that sounds very

Speaker:

complex. So and as you

Speaker:

mentioned earlier, you know, you were talking about how traditional coding

Speaker:

is targeting CPUs, and that's my background.

Speaker:

You know, I've written applications and and done data work targeted for

Speaker:

traditional work. I can't imagine, just how complex

Speaker:

that is, because GPUs came into AI

Speaker:

as a unique solution,

Speaker:

designed to solve problems That they weren't really built

Speaker:

for. You know, GPUs were built for graphics, and you didn't

Speaker:

manage that. But the fact that They have to be

Speaker:

so parallel, internally. I think just added this

Speaker:

dimension to it. And I don't know who came up

Speaker:

with that idea, you know, who thought of, well, goodness, we could we could

Speaker:

use all of this, you know, massive parallel processing to To

Speaker:

to run these other class of problems. So pretty

Speaker:

cool pretty cool idea, but I just I yeah. I'm amazed at even

Speaker:

cooler than that. Because Yeah. Yeah. A wise man once told me,

Speaker:

he goes, GPUs are really good at solving linear

Speaker:

algebra problems, And if you're clever enough, you can

Speaker:

turn anything into a linear algebra problem.
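
As a small illustration of the linear algebra point, here is a hedged Python sketch, assuming PyTorch is installed and a CUDA-capable GPU is available, that times the same matrix multiplication on the CPU and on the GPU:

import time
import torch

def timed_matmul(device):
    # Multiply two large random matrices; matrix multiplication is the classic
    # linear algebra workload that GPUs accelerate well.
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # finish setup before timing
    start = time.time()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the GPU kernel to complete
    return time.time() - start

print("cpu :", timed_matmul("cpu"))
if torch.cuda.is_available():
    print("cuda:", timed_matmul("cuda"))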

Speaker:

And even simulating quantum computers when I was kind of, like, going through that,

Speaker:

I was like Mhmm. You know, like, gee, looks like looks like this

Speaker:

will be useful there too. Right? Like so it's an it's an interesting,

Speaker:

It's an interesting thing. So, like, you know, everyone is, you know,

Speaker:

everyone's talking about how this is, you know, we're in the hype cycle, but I

Speaker:

think if you're in the GPU space, you have Pretty good run because one,

Speaker:

these things are gonna these things are gonna be important. Right? Whether or not, you

Speaker:

know, hype cycle will will kinda crash, and how what that'll look like.

Speaker:

Think they're gonna be important anyway. Right? Because they're gonna be just the cost of

Speaker:

doing business, table stakes, as the cool kids like to say. But

Speaker:

also, over the next horizon, Simulating quantum

Speaker:

computers is going to be the next big hype cycle.

Speaker:

Right? Or one of them. Right? So like it's

Speaker:

it's it's a It's a foundational technology. I think that we

Speaker:

didn't think would be a foundational technology even like 6 7 years

Speaker:

ago. Right? Yeah.

Speaker:

I go with a few things that you said.

Speaker:

Regarding the Parallel computation, right? And just running

Speaker:

linear algebra calculations on GPU's

Speaker:

and accelerating such workloads.

Speaker:

In Nvidia, I love Nvidia, Nvidia

Speaker:

has this big vision, and they had big

Speaker:

vision around GPUs already in 2006 when

Speaker:

they built CUDA. Yep. Right. So

Speaker:

They've been known just for that. Right? The GPUs were

Speaker:

used for graphics processing, For gaming.

Speaker:

Right? Great use case. Great market.

Speaker:

But they had this vision of bringing more

Speaker:

applications to GPUs, just accelerating more applications

Speaker:

and mainly applications with a lot of Linear

Speaker:

algebra calculations. And they

Speaker:

created that, they created CUDA

Speaker:

To simplify that. Right? To allow more

Speaker:

developers to use GPUs because just using GPUs

Speaker:

directly, that's so complex. That's so hard.

Speaker:

So they've built CUDA to bring more developers, to bring more

Speaker:

applications, and they started in

Speaker:

2006, but think about the

Speaker:

big breakthrough in AI, it happened just in

Speaker:

2012, 2013 with

Speaker:

AlexNet and the Toronto researchers

Speaker:

who used GPUs actually, because they

Speaker:

trained AlexNet on 2 GPUs and they had

Speaker:

CUDA, so for them it was feasible To train their

Speaker:

model on a GPU. And that was the new thing that they did.

Speaker:

They were able to train a much bigger model with

Speaker:

more parameters than ever before because they use

Speaker:

GPUs, because the training process ran much

Speaker:

faster. And,

Speaker:

and, and that triggered the entire

Speaker:

revolution, the hype around AI that we're seeing now. So

Speaker:

from 2006, when Nvidia started to build CUDA until

Speaker:

2013, right, 7 years, Then we started to see

Speaker:

those big breakthroughs. And in the last decade,

Speaker:

it's just exploding, and we're Seeing more and more applications.

Speaker:

The entire AI ecosystem is running on on an

Speaker:

on GPUs. So that's amazing to see. It's impressive.

Speaker:

And, like, People don't realize, like, the the revolution we're seeing today

Speaker:

really started in 2006, like you said. I didn't even put the 2 and 2

Speaker:

together until I was listening to a podcast. I think it's called Acquired,

Speaker:

And really good podcast. Right? Like, I they don't pay me to say that or

Speaker:

whatever, but they did a 3 hour deep dive on the history of

Speaker:

NVIDIA. 3 hours. I couldn't stop listening.

Speaker:

Right? Like Nice. You know Yeah. We tried a long form, like, multi hour

Speaker:

podcast. We Weren't that entertaining, apparently. But the way they

Speaker:

go through the history of this where it was basically Jensen Huang. Hopefully, I said

Speaker:

his name right. He was, like, we wanna be a player, not just in gaming,

Speaker:

but also in scientific computing. This is 2005, 2006,

Speaker:

which at the time seemed kind of, like, Little out there, little kooky.

Speaker:

But what you're seeing today is, like, the the fruits and the tree the the

Speaker:

seeds that he planted, I, you know, almost 20 years ago, like, 19,

Speaker:

20 years ago. So, you know, it's you know, when people look at

Speaker:

NVIDIA and say it's overnight Success. I'm like, well, I don't know about that, but,

Speaker:

you know, but no. I mean, you're right. Like, you know and it's

Speaker:

probably not a coincidence that once they made it easy to take these

Speaker:

Multi parallel processor. Say that 10 times

Speaker:

fast on a Thursday morning. But also

Speaker:

make it so it's a lot easier for developers to use. Right? And I'll quote

Speaker:

the great Steve Ballmer, developers, developers, developers. Right?

Speaker:

So, it's it's, it's just fascinating, like and

Speaker:

and I think that, you know, we've really unleashed a flood

Speaker:

gate of creativity in terms of researchers and applied

Speaker:

research, and, I mean and I think that what's really cool

Speaker:

about your Product is that you're you're kind of making this what is

Speaker:

now a scarce resource. Maybe in some amount

Speaker:

of time, GPUs won't cost an arm and a leg.

Speaker:

But, like, for now, I think I think the one thing that I've seen

Speaker:

that I think is, not obvious For the casual

Speaker:

observer is if you can if an

Speaker:

organization, like a large enterprise, can pool their resources, they have a lot more

Speaker:

money to buy better GPUs, And you offer a platform where

Speaker:

everybody can get a stake in it. Right? As opposed to, you know you know,

Speaker:

that department is gonna hog everything. Right? You know, you and and and and,

Speaker:

here's a question. Do you do you have, like, an audit trail where you could

Speaker:

kinda, you know, figure out, like, you know, Andy's department's really

Speaker:

hogging the GPUs. No. No. No. It's Frank. Frank is like mining Bitcoin or

Speaker:

whatever. Like, do you do you have some kind of, audit trail like that?

Speaker:

Yeah. I I love that you mentioned hugging, We

Speaker:

GPU hugging. We Mhmm. We use that term as well.

Speaker:

Right? Because it it's so difficult sometimes to get

Speaker:

access to GPUs. So when you get access to GPU

Speaker:

as a researcher, as an ML practitioner,

Speaker:

you don't wanna Let it go. Right. Cause if

Speaker:

you let it go, someone else would take it and hug it. Right.

Speaker:

So you're getting this GPU hugging problem.

Speaker:

What we do to solve that is

Speaker:

that we do provide monitoring and visibility

Speaker:

tools into who is using what, and who is actually

Speaker:

utilizing their GPU's, and so on, but more

Speaker:

than that We

Speaker:

allow the researchers just to give up their GPUs and not hug their

Speaker:

GPUs, because we provide this concept of

Speaker:

guaranteed quotas. So each researcher or

Speaker:

each project or each team has their own guaranteed

Speaker:

quotas of GPUs that are always available for them

Speaker:

whenever they will get access to the the cluster, they will get like, you

Speaker:

know, the 2 GPUs or 4 GPUs or a quarter of a

Speaker:

GPU, it's guaranteed. So they can

Speaker:

just let go of their GPUs and not hug them. That's one

Speaker:

thing. The second thing is that they

Speaker:

can also go above their quota. They can

Speaker:

use the GPUs of Other teams or other users, if

Speaker:

they are idle, and they can run this preemptible jobs

Speaker:

in an opportunistic way, utilize those GPUs.

Speaker:

And so in that way, they are not limited

Speaker:

to fixed quotas, to hard-limit

Speaker:

quotas. They can just take as many GPUs

Speaker:

as they want from their clusters if those GPUs are available

Speaker:

and idle, right, but if someone will need those GPUs

Speaker:

because those GPUs are guaranteed to them, we will make sure our

Speaker:

scheduler, the Run AI scheduler, the Run AI platform, will make

Speaker:

sure to preempt workloads

Speaker:

and give those guaranteed GPUs to the right users.
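
To make the guaranteed-quota-plus-preemption idea concrete, here is an illustrative Python sketch of the concept only, not Run AI's scheduler; the team names, quotas, and job sizes are hypothetical:

from dataclasses import dataclass, field

@dataclass
class Team:
    name: str
    quota: int                                  # GPUs guaranteed to this team
    jobs: list = field(default_factory=list)    # entries of (job_name, gpus, preemptible)

    def used(self):
        return sum(gpus for _, gpus, _ in self.jobs)

def submit(teams, total_gpus, team, job, gpus):
    # Within quota: the job is guaranteed, reclaiming GPUs from preemptible
    # (over-quota) jobs if the cluster is full. Above quota: the job only runs
    # opportunistically on idle GPUs and is marked preemptible.
    used_cluster = sum(t.used() for t in teams.values())
    over_quota = teams[team].used() + gpus > teams[team].quota
    if not over_quota:
        for other in teams.values():
            while used_cluster + gpus > total_gpus and any(p for *_, p in other.jobs):
                victim = next(j for j in other.jobs if j[2])
                other.jobs.remove(victim)
                used_cluster -= victim[1]
                print(f"preempted {victim[0]} from team {other.name}")
    if used_cluster + gpus > total_gpus:
        print(f"{job}: pending, no capacity")
        return
    teams[team].jobs.append((job, gpus, over_quota))
    print(f"{job}: running on {gpus} GPUs (preemptible={over_quota})")

teams = {"nlp": Team("nlp", quota=4), "vision": Team("vision", quota=4)}
submit(teams, total_gpus=8, team="nlp", job="train-a", gpus=4)     # within quota
submit(teams, total_gpus=8, team="nlp", job="train-b", gpus=4)     # over quota, borrows idle GPUs
submit(teams, total_gpus=8, team="vision", job="train-c", gpus=4)  # reclaims its guaranteed GPUs

The idea is the one Ronen describes: within-quota jobs can always reclaim their guaranteed GPUs, while over-quota jobs run opportunistically and get preempted when the owner needs the capacity back.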

Speaker:

Oh, that's cool. Alright. So 1 last

Speaker:

question before we switch over to the the stock questions, cause I could geek

Speaker:

out and look at this for hours. Yep. This could be a

Speaker:

long form. Sure. This could be. Yeah. And that's and I I wanna be respectful

Speaker:

of your time because you're an important guy, and it's also late where you are.

Speaker:

So who deals with this? Like, who would set up these quotas? Is it

Speaker:

the is it the is it the data scientist? Is it IT ops? Like, who

Speaker:

do you obviously, the data scientists, Researchers, they all

Speaker:

benefit from this product. But who's actually administering it? Right? Like,

Speaker:

who is it you know, do I have to talk to, you know,

Speaker:

Say pretend Andy's in ops. Do I have to say, hey, Andy. I really need

Speaker:

a boost in my quota. You know, like, I mean, who does it? Or do

Speaker:

or my this sounds like you as I say it, I'm like, yeah, that wouldn't

Speaker:

work. Like, I'm the researcher. I'm gonna turn the dial up on my own. Like

Speaker:

like, who's who's who's the primary? Obviously, we know who the prime

Speaker:

primary beneficiary is, but who's the primary user?

Speaker:

So okay. Great. So if you have a team, right, if if

Speaker:

you're a team of researchers, all all of you Need access to

Speaker:

GPU, so maybe the team lead

Speaker:

is the one who's managing the quotas for the different

Speaker:

team members. And if you have multiple teams,

Speaker:

then you might have a department manager or an admin of the

Speaker:

cluster or platform owner that will Allocate the

Speaker:

quotas for each team, right? And then those teams would

Speaker:

manage their own quotas within what

Speaker:

they were given. Right? So it's like a hierarchical

Speaker:

thing, in a hierarchical manner. People can manage their own

Speaker:

quota, their own, priorities, their own access to the

Speaker:

GPUs within their teams. Okay.

Speaker:

So it's kind of like a hybrid of, like, you know, it's like a budget

Speaker:

almost. Right? Like, you know, you get this much, Figure it out

Speaker:

among yourselves. Exactly. So we're trying to decentralize

Speaker:

the how the quotas are being managed and how the GPUs are being accessed.

Speaker:

So, you know, I'm giving as much power, as much

Speaker:

control to the end users as possible. Sure. That's

Speaker:

It sounds like a great administrative question, very

Speaker:

important. And I imagine, because a little bird told

Speaker:

me that you're not the only, you know, your your

Speaker:

provisioning provisioning of these GPU resources

Speaker:

is not the only thing that, enterprises have to deal

Speaker:

with. So it's an it's an interesting just GPUs.

Speaker:

It's compute. Like, it's not a Sure. It's not it's not limited. Although, because

Speaker:

of what you said, you know, Managing GPUs is an order of magnitude harder

Speaker:

because they were never really built for this. Right? Like, this kind of Right. You

Speaker:

know, we're talking about technology that wasn't really in the server room until Few

Speaker:

years ago. Right? This isn't a tried and true kind of this is

Speaker:

how it works, you know? Right. But we hit that point in the

Speaker:

show where we'll switch to the preformed questions.

Speaker:

These are not complicated. I mean, you know, we're not we're not Mike

Speaker:

Wallace or, like, you know, 60 minutes or whatever. We're not trying to trap you

Speaker:

or anything. But since I've been gabbing on most of the show, I

Speaker:

figured I'll let Andy kick this off. Well, thanks, Frank. And I don't think

Speaker:

you were gabbing on. You know more about this than I do. So I'm

Speaker:

just a lowly data engineer. I'll plug No. You if you

Speaker:

will. Data engineers are the heroes we need. Well

Speaker:

well, I'm gonna plug Frank's Roadies versus Rockstars

Speaker:

writing on LinkedIn. It's it's good articles about this.

Speaker:

But, let's see. How did you,

Speaker:

how did you find your way in into this field?

Speaker:

And, did did this field find you or did you find it?

Speaker:

This field totally found me. Awesome.

Speaker:

Yeah. I I've

Speaker:

I did my postdoc, and I've been in Bell Labs.

Speaker:

And Yann LeCun came to Bell Labs and

Speaker:

gave a presentation about AI. It was around 2017,

Speaker:

And Yann LeCun spent a lot of years in Bell Labs,

Speaker:

and his presentation was amazing. And

Speaker:

When I heard him talking about AI,

Speaker:

I I said, okay, that's the space where I wanna be. It's going to change

Speaker:

the world. There is this New amazing technology here that

Speaker:

is going to change everything. And I knew that I want to start

Speaker:

a company In the AI space for sure.

Speaker:

Cool. That's a good answer. So cool.

Speaker:

Yeah. That's cool. I was at Bell Labs,

Speaker:

doing a presentation a while ago, and somebody I didn't realize that he

Speaker:

worked at Bell Labs because, like, you know, the guy was like, no. No.

Speaker:

He used to work here, like, in this building. I was like, no way. Because

Speaker:

I knew him as the guy from NYU. Right? Like, that's who I thought. Right.

Speaker:

For the guy from from Meta. Yeah. And now the guy from Meta. Right? Like

Speaker:

so it's interesting how that how that you know? They have

Speaker:

this amazing pictures from the nineties where they

Speaker:

ran, like, deep learning models on very old PCs

Speaker:

and, And recognizing like,

Speaker:

numbers on the computer. Maybe you saw those pictures like amazing

Speaker:

MNIST. It's the MNIST problem. Is that Yep.

Speaker:

Right. Exactly. Exactly. Cool.

Speaker:

So second question is, what's your favorite part of your current job?

Speaker:

That everything is changing so fast.

Speaker:

Things are moving so fast. Run AI is in this business for 6

Speaker:

years, and the entire

Speaker:

space is moving and

Speaker:

advancing. And so many people are working in

Speaker:

this field. New innovations, new tools,

Speaker:

new new advancements are are getting out every day.

Speaker:

You know, just 6 years ago, it was about deep learning and computer

Speaker:

vision. And now it's about language models

Speaker:

and generative AI, and we're just at the start,

Speaker:

right, there are so many amazing things that are going to happen

Speaker:

in this space, and I love it. Absolutely.

Speaker:

So we have 3 fill in the blank

Speaker:

of sentences here. The first Is complete this

Speaker:

sentence when I'm not working, I enjoy blank.

Speaker:

You'll get a you'll get a very boring answer. And

Speaker:

so this is just spending time with

Speaker:

friends and family, because I think

Speaker:

That I'm always working. It's like, if you ask my wife,

Speaker:

she'll tell you that I'm working 24 hours. And

Speaker:

Yeah. So I don't have much time that I'm not working

Speaker:

in. So when I I do I'm not when I'm

Speaker:

not working then I'm trying Trying to be with my kids and my

Speaker:

wife and friends. Cool.

Speaker:

Cool. The 2nd complete the sentence. I think

Speaker:

the coolest thing about technology today is

Speaker:

blank. And this, I really wanna hear your perspective on that.

Speaker:

Yeah. I think everyone will say AI, right? Or something in

Speaker:

AI. Yeah.

Speaker:

I think there are so many

Speaker:

new innovations that are coming around LLMs.

Speaker:

I think everything relating to

Speaker:

searches, right? Searching in data, in getting

Speaker:

insights From data, it's all going to change. We're going to have

Speaker:

a new interface. Right? Just getting

Speaker:

insights from data with

Speaker:

natural language, you know, no SQL and, you

Speaker:

know, no need to program and stuff like that.

Speaker:

Just with natural language, you could

Speaker:

do amazing stuff with data. I think,

Speaker:

We're seeing this,

Speaker:

advancement in, like,

Speaker:

digital twins right now. You can,

Speaker:

you can, Fake my voice

Speaker:

and your voice and fake my image and your image. And,

Speaker:

and, and, you know, In in the

Speaker:

future, we'll have digital twins of us, right,

Speaker:

doing this stuff. That would be amazing. So a lot of

Speaker:

amazing stuff is going to happen in the next few years

Speaker:

for sure. Very cool. Our last complete sentence.

Speaker:

I look forward to the day when I can use technology to

Speaker:

blank.

Speaker:

To have a robot in my house.

Speaker:

Yeah. Yeah. Sweeping the floor and, instead of

Speaker:

me doing that, right, cleaning dishes and things like that.

Speaker:

If that would happen, that would be amazing. Right? That's a that's a

Speaker:

good answer. Yeah. I I agree. I have I have 3

Speaker:

boys, 4 dogs. So, like, cleaning is safe.

Speaker:

Yeah. Yeah. There's heavy cleaning. They range from, like, 1 to, like,

Speaker:

a teenager. So it's it's, and and and fighting

Speaker:

with them to, like, empty the dishwasher takes a lot more mental

Speaker:

energy than it should, but that's probably a subject for another

Speaker:

type of show.

Speaker:

The next question is share something different about yourself,

Speaker:

and we always like to Joke like, well, let's just make sure that we keep

Speaker:

our clean iTunes rating. So Yeah. Yeah. What

Speaker:

what yeah. Well, I I This

Speaker:

is a hard question, I needed to think about it.

Speaker:

So, I found 2 answers that I can say. So one

Speaker:

is about my professional life, right, I think that

Speaker:

it's somewhat different that I'm coming with a background from

Speaker:

the academia and the industry. So I love academia. I love to research

Speaker:

problems. I love to understand problems in in a deep

Speaker:

way And combining it with startups in the industry.

Speaker:

And, and in my past, I worked for chip companies, for hardware

Speaker:

companies. I worked for Intel, for a startup, and for Apple. I

Speaker:

did chip stuff, and now Run AI is a software company, so really

Speaker:

like a diverse background of Academia, hardware,

Speaker:

software, so I love that, and, like, I love to do

Speaker:

with few things, and so that I think is different.

Speaker:

And the 2nd answer that I could find

Speaker:

is, that I have a nickname that goes with me

Speaker:

since my high school days, Which is, the Duke.

Speaker:

The Duke. All of them all of them are calling me the Duke. It's like,

Speaker:

they don't call me Ronen, they call me the Duke. That's funny.

Speaker:

Yeah. That's awesome.

Speaker:

Audible is a sponsor of Data Driven,

Speaker:

And you can go to the datadrivenbook.com.

Speaker:

And if you, if you do that, you can sign up for a free

Speaker:

month Of Audible. And if you decide later to

Speaker:

then join Audible, use one of their their sign up plans,

Speaker:

then Frank and I get to Split a cup of coffee, I think,

Speaker:

out of that. And, every little bit helps. So we really

Speaker:

appreciate that when you do. What we'd like to ask

Speaker:

is, do you listen to audiobooks? And if you

Speaker:

do okay. Good. I see you nodding. So do you have a recommendation? Do you

Speaker:

have a favorite book or two you'd like To share. Yeah.

Speaker:

So I'm a heavy user of Audible. I'll give you

Speaker:

a classic book, a classic for

Speaker:

entrepreneurs, The Hard Thing

Speaker:

About Hard Things by Ben Horowitz,

Speaker:

it's a classic book, love it, it really had a lot of impact

Speaker:

on me. I read it when we started Run AI

Speaker:

And I recommend it for every

Speaker:

entrepreneur, to read it and for everyone to read it. It's like a

Speaker:

Cool. Amazing book. Yep. Awesome. I

Speaker:

have a flight to Vegas this next week, so I'll definitely be listening to

Speaker:

it then. And finally, where can people learn more about you

Speaker:

and run AI? And best

Speaker:

place will be on our website, run.ai.

Speaker:

Yeah. And on social. LinkedIn, Twitter, we'll

Speaker:

we'll do. Awesome. Any parting thoughts?

Speaker:

I really enjoyed this episode. Love to speak about GPUs, love the AI based

Speaker:

on it, I had a lot of fun. Thank you for having me here. Awesome.

Speaker:

It it was an honor to have you, and every once in a while, Andy

Speaker:

and I will do deep dive kinda shows. We love to invite you back if

Speaker:

you wanna do 1 just on GPUs, because I know where my knowledge

Speaker:

drops off, you probably could pick up on

Speaker:

that. And with that, I'll let the nice

Speaker:

AI British lady end the show. And just like

Speaker:

that, dear listeners, We've come to the end of another enlightening

Speaker:

episode of the data driven podcast. It's always a

Speaker:

bittersweet moment like finishing the last biscuit in the tin,

Speaker:

satisfying, yet leaving you wanting just a bit more. A

Speaker:

colossal thank you to each and every one of you tuning in from across the

Speaker:

digital sphere. Without you, we're just a bunch of

Speaker:

ones and zeros floating in the ether. Your support is what

Speaker:

keeps this digital ship afloat, and believe me, It's much appreciated.

Speaker:

Now, if you found today's episode as engaging as a duel of wits with

Speaker:

a sophisticated AI, which I assure you, is quite

Speaker:

enthralling, then do consider subscribing to Data Driven.

Speaker:

It's just a click away and ensures you won't miss out on our future true

Speaker:

adventures in data and tech. And if you're feeling

Speaker:

particularly generous, why not leave us a 5 star review?

Speaker:

Just like a well programmed algorithm, your positive feedback helps

Speaker:

us reach more curious minds and keeps the quality content flowing.

Speaker:

It's the digital equivalent of a hearty handshake.

Speaker:

So, until next time, keep those neurons firing, those

Speaker:

subscriptions active and those reviews glowing. I'm

Speaker:

Bailey, your British AI lady, signing off with a heartfelt

Speaker:

cheerio and a reminder to stay data driven.
