HockeyStick #2 - LLMs in production - Chris Brousseau & Matt Sharp
Episode 28th April 2024 • HockeyStick Show • Miko Pawlikowski
Duration: 01:39:28


Shownotes

Decoding the Past, Present, and Future of Language Models

Delve into the realm of language models with a comprehensive exploration spanning from the foundational Bag of Words approach to the revolutionary technologies of Transformers and GPT. This episode not only unpacks the technical evolution and mathematical underpinnings of natural language processing but also projects the future trajectory of these models. It highlights expert insights on the societal impacts, the convergence of artificial intelligence with human cognition, and the ethical considerations of AI progression. Moreover, the discussion extends to the significance of open-source efforts in shaping this dynamic field. Aiming to provide a profound understanding, this guide navigates through the complex landscape of AI, language models, and their implications for future technology and society.

00:00 Welcome to HockeyStick: Unveiling the Power of LLMs

01:08 Meet the Experts: From Meetups to Authorship

03:16 The Hockey Stick Moment for LLMs: Breakthroughs and Realizations

07:48 Coding with LLMs: The New Frontier for Developers

15:39 The Pitfalls and Limitations of LLMs in Practice

21:43 Building vs. Buying LLMs: Navigating the Trade-offs

32:43 The Cost of Crafting Your Own LLM: Insights and Advice

42:48 Deciphering LLMs: A Crash Course in Language Features

50:44 Defining Language: A Philosophical Dive

51:33 Exploring the Essence of Language and Communication

54:31 Diving into Language Models and Their Evolution

55:08 From Bag of Words to N-Grams: The Evolution of Language Understanding

58:35 The Leap to Bayesian Techniques and Markov Chains

01:01:24 The Breakthrough of Continuous Bag of Words and Embeddings

01:09:43 Unveiling the Power of Multilayer Perceptrons

01:15:08 The Revolution of Attention Mechanisms and Transformers

01:26:37 The Hall of Fame: Landmark Models in the LLM Landscape

01:35:06 Predicting the Future of Language Models and OpenAI's Position

01:38:48 Concluding Thoughts and the Future of AI Research

Transcripts

Speaker:

I'm Miko Pawlikowski, and this is HockeyStick.

Speaker:

LLMs, or Large Language Models, are taking the world by storm.

Speaker:

This breakthrough artificial intelligence technology promises to fundamentally

Speaker:

reshape the way we work with computers.

Speaker:

Over the last year, we've witnessed its Hockey Stick moment, and as

Speaker:

of early 2024, we're firmly in the Cambrian explosion phase.

Speaker:

Today, we're taking a deep dive into how these models came from humble beginnings to

Speaker:

making people scared of imminent Skynet.

Speaker:

I'm joined by two experts, Chris Brousseau, staff machine learning

Speaker:

engineer at JP Morgan, and Matthew Sharp, MLOps engineer at LTK, the

Speaker:

authors of "Production LLMs" currently available in early access at manning.com.

Speaker:

In this conversation, we'll cover the intricacies of human language

Speaker:

and how machines can understand it.

Speaker:

Give you the vocab to sound smart at the next family gathering, and discuss the

Speaker:

various mathematical ideas and models ultimately leading to LLMs, as well as

Speaker:

some noteworthy examples beyond ChatGPT.

Speaker:

Welcome to this episode and please enjoy.

Speaker:

where should we start?

Speaker:

How did you guys meet?

Speaker:

we happen to both live in Utah, and we

Speaker:

actually met at a meetup.

Speaker:

It was actually an MLOps meetup, that was the primary one where we met.

Speaker:

It happens once a month and we'd get together, and so that's our origin story.

Speaker:

we became friends through there, started helping each other, with,

Speaker:

content creation, Chris was starting a YouTube channel, I write on

Speaker:

LinkedIn, just giving each other feedback and helping each other out.

Speaker:

It was especially helpful because I was trying to figure out how

Speaker:

best to present a lot of the material that's in our book now.

Speaker:

how do you explain a transformer model?

Speaker:

And Matt was fantastic about helping me, find my voice on YouTube.

Speaker:

Okay, so going from meeting someone at a meetup, to committing

Speaker:

to spending a couple of years working on a book with someone:

Speaker:

that's a little bit of a difference.

Speaker:

Was there any particular moment where it just clicked?

Speaker:

"Oh, we need to write a book".

Speaker:

How did you come up with the idea?

Speaker:

I was approached and, I would love to write a book, but I don't

Speaker:

know a lot about that process.

Speaker:

And obviously, I didn't really have an authorship voice.

Speaker:

I am not experienced in content creation.

Speaker:

And while I was going through the process of talking with some different

Speaker:

publishers, Matt approached me and said: "Hey, I was a technical reviewer

Speaker:

on Fundamentals of Data Engineering by Joe Reis and Matt Housley."

Speaker:

And so he had experience and he had, subject matter expertise, and he was

Speaker:

giving me some advice and I said, "You know what, why don't you just

Speaker:

come on as a coauthor? You obviously could help a lot here, and I need

Speaker:

it, so let's just do it together".

Speaker:

yeah, I think that it worked out really well because Chris has that background in

Speaker:

linguistics, he understands the natural language processing side better than

Speaker:

anyone else I've met in person, and I was coming more from the MLOps side,

Speaker:

how do we actually deploy these things?

Speaker:

And so I think it's really rounded out our book better than, anything else I'm seeing

Speaker:

out there that you could buy and read.

Speaker:

getting that diverse perspective, I think, really helps our book out.

Speaker:

I was very excited when you said 'yes' to coming onto this because since last

Speaker:

year, I think in most people's minds, sometime early last year, with ChatGPT.

Speaker:

All of a sudden, everybody started talking about large language

Speaker:

models, and some people started worrying about, impending doom and

Speaker:

robot apocalypse, and all of that.

Speaker:

But from the perspective of someone who's worked with this for the best

Speaker:

part of a decade now, I'm wondering:

Speaker:

what was the point when you realized that these LLMs, they're really onto

Speaker:

something and they're moving from, a demo to an actual legitimate technology

Speaker:

that's going to change things?

Speaker:

What was the hockey stick moment for LLMs?

Speaker:

Oh, boy.

Speaker:

for me, without a doubt, that was the release of T5.

Speaker:

And looking at Google's paper about the text-to-text transformer, that really set

Speaker:

the groundwork for prompting, right?

Speaker:

They had a whole bunch of different tasks that you didn't have to change

Speaker:

anything other than some statement.

Speaker:

for the model to do that task, and then a colon, and then whatever

Speaker:

your input was going to be anyway.
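
To make that prefix idea concrete, here is a minimal sketch of T5-style task prefixes, assuming the Hugging Face transformers library and the public t5-small checkpoint; the prompts are illustrative, not from the episode.

```python
# A minimal sketch of T5-style task prefixes, assuming the Hugging Face
# transformers library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One model, many tasks: only the prefix before the colon changes.
for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: Large language models are taking the world by storm ...",
]:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```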

Speaker:

that was groundbreaking to me.

Speaker:

I had been messing around with GPT-2.

Speaker:

I'd been playing with that and trying to shoehorn it into a

Speaker:

product where I was working.

Speaker:

T5 did everything that we were trying to do with GPT-2, and it was incredibly

Speaker:

flexible, it was easy to fine tune, and for me, that was the hockey stick moment

Speaker:

that "oh wow, no, they're really cooking".

Speaker:

when is that?

Speaker:

for anybody who hasn't heard of

Speaker:

T5?

Speaker:

I think it was 2019. Yeah, "Exploring the Limits of Transfer Learning

Speaker:

with a Unified Text-to-Text Transformer" was October 2019.

Speaker:

it came out in October.

Speaker:

I think I picked it up in November-December of 2019.

Speaker:

Yeah, I think for my hockey stick moment, like I was, in the industry

Speaker:

been paying attention, obviously GPT-2 coming around, T5, etc.

Speaker:

But wasn't really seeing the adoption that someone who's working in MLOps

Speaker:

cares more about I was seeing, , these models can do really cool things,

Speaker:

but people weren't caring about them.

Speaker:

Sam Altman even said it was like, "we didn't think GPT-3

Speaker:

would be that big of a success.

Speaker:

We thought that would happen once GPT-4 came out."

Speaker:

but I just remember, January 2023.

Speaker:

ChatGPT's been out a month.

Speaker:

it's still essentially in beta.

Speaker:

They just released it to get feedback and to start collecting data.

Speaker:

to start improving their model.

Speaker:

but it blew up, right?

Speaker:

I just remember being at a church function and this guy sitting

Speaker:

across the table from me, who doesn't know anything about AI, right?

Speaker:

I was stuck at this table for an hour and all he could talk about was GPT-3.

Speaker:

he was obsessed with it.

Speaker:

I'm like, oh, wow.

Speaker:

even people who don't know anything about, machine learning or AI or the

Speaker:

industry were like, really going gung ho and his wife was an English teacher.

Speaker:

she was really scared of it and was like, "how are we gonna help kids

Speaker:

learn how to, write and read when they can just go online and now cheat

Speaker:

and write these things and stuff".

Speaker:

The very beginning of what, like everyone's had conversations about now,

Speaker:

but like he talked about how his brother in law owned a website that made fake

Speaker:

articles, you can think like The Onion, and so once it came out in that month, like

Speaker:

I said, ChatGPT still wasn't a product yet, and anyone who's been following

Speaker:

it knows a lot of those demos just shut down and then never came back up

Speaker:

His brother in law ended up firing like a hundred writers because he's like,

Speaker:

"Oh, ChatGPT can make these funny fake articles and we're good, right?"

Speaker:

that was my hockey stick moment of "okay, we really are changing,

Speaker:

when some random guy at church is talking about it all the time".

like:

Yeah, I love that example.

like:

But even for people who are in tech who weren't directly following that

like:

very closely, that was a scary moment.

like:

I remember when I first used Copilot, I was like, what, it just does that?

like:

And three out of four times, it would actually work.

like:

that was a scary moment.

like:

It reverberated through a lot of levels of society, including, our own.

like:

And, I think in many ways, technology and writing code might be the easiest

like:

use case for this kind of model, right?

like:

Do you agree with that?

like:

I don't know if I completely agree with it, because, code is incredibly

like:

syntactically dependent, right?

like:

every developer who's worked with JavaScript or C++ and then moves

like:

to Python, they feel it, right?

like:

That's one of the biggest complaints is "I hate Python syntax".

like:

"I hate that white space matters", it's a little bit more complex than just

like:

repeating whatever natural language happened, but you're absolutely right

like:

that is one of the best use cases so far.

like:

because, it's better structured than just spoken language, or are there any

like:

other reasons that make it so well suited for that particular application?

like:

programming languages are not real languages, right?

like:

one of the things that makes it simultaneously well and ill-suited

Speaker:

for it is how much gets repeated. You use the exact same words.

like:

The exact same tokens to define every function that you make, but then the

like:

function's name can be whatever you want.

like:

And so using the exact same tokens is awesome.

like:

That provides landmarks for the probability as it's

like:

going through all of this.

like:

But then that input to just say whatever you want and put it in camel

like:

case or snake case or whatever, tons of different formatting for functions.

like:

it makes it a little bit more difficult.

like:

Especially while you're trying to tokenize that,
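
As a rough illustration of that tokenization point, here is a minimal sketch using GPT-2's BPE tokenizer from the Hugging Face transformers library; the identifiers are invented, and the exact splits depend on the vocabulary.

```python
# A minimal sketch of how identifier style changes what the model sees,
# assuming GPT-2's BPE tokenizer from the transformers library.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# "def" recurs verbatim (a reliable landmark); the function name
# fragments differently depending on camel case vs. snake case.
print(tok.tokenize("def getUserName():"))
print(tok.tokenize("def get_user_name():"))
```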

like:

one of the big benefits with code is the amount of data we have around code.

like:

lots of people are writing code.

like:

they all have very similar ideas of what they're trying to do, of

like:

what they're trying to architect, of what they're trying to design.

like:

and so we're not necessarily worrying about, hallucinations or

like:

fake news or, people disagreeing or other things like that.

like:

there's just a lot of data, that all agrees with each other and

like:

pushes in the same direction.

like:

It makes it good.

like:

there's obviously some negatives of just assuming, some of these LLMs writing

like:

code is going to do things well, but, I think Chris highlighted that already.

like:

it's actually really similar to how regular languages work.

like:

If we have more Python data, like Matt's saying, it's going to do better at Python.

like:

And that can create a little bit of a positive feedback loop with LLMs, where

like:

a lot of people want to get into Python, and they're very good at it, but then

like:

when you look at emerging languages like Mojo, for example, it's really difficult

like:

to find that data and so LLMs are worse at it, similar to natural languages

like:

that have a lower number of speakers, a lower presence on the internet,

like:

So is the solution to use an LLM to generate a lot of Mojo and make it

like:

a significant percentage of GitHub?

like:

that'd be fun, dude.

like:

I think there are some problems with synthetic data that can lead

like:

to stuff like model collapse.

like:

I don't know if we're going to see that in the code space, though.

like:

I think we could see that in natural language.

like:

So that might be a valid solution.

like:

Okay.

like:

the date is 13 February, the day before Valentine's Day 2024.

like:

I'm going to ask you for a wild prediction.

like:

Where do you see that going?

like:

Should, all kinds of, or maybe any subset of programmers who, produce code as a

like:

job, should they start at least worrying?

like:

Is that something that's going to decrease the pool of available jobs?

like:

no, I don't think it's really going to impact the amount of work.

like:

I just think about my job, and even when I'm in very technical roles, and I'm

like:

spending 50% of my time on the keyboard, still, it feels like a majority of the

like:

work is still just communicating with stakeholders, understanding exactly what

like:

the problems are, technical writing, design docs, really understanding at

like:

a high level, what you want to build.

like:

To be fair, programmers have been automating the 'writing the

like:

code' portion forever, right?

like:

From the beginning.

like:

yeah, with massive amounts of like scripts and configs that they use.

like:

And that's why they love Vim or Emacs still, right?

like:

It's because they have it configured just right.

like:

And they can move really quickly, because it provides a lot of that

like:

automation for them already, but this is just helping junior engineers

like:

already have all that configuration and set up really quickly, right?

like:

It mostly will just make our jobs a little bit easier, it doesn't remove the need to

like:

really understand the engineering aspect, the architecture aspect, the design

like:

aspect that still is involved with coding.

like:

Oh, yeah.

like:

this is why we love comparing LLMs to a printing press.

like:

The Johannes Gutenberg one.

like:

Because did that destroy the writing industry?

like:

All it did was it destroyed the monopoly that certain organizations

like:

had on publishing books.

like:

Before you had to get a scribe and you had to pay the scribe and you had to

like:

have access to scribes. You couldn't just walk up to a printing press and

like:

hit it, and then boom, you have a book.

like:

You have to have knowledge. You have to have an idea.

like:

The printing press just gives you a lower barrier to entry

like:

Which is what we love, right?

like:

For coding, I think Matt is exactly right, that it's a lower barrier to

like:

entry for junior engineers to be able to produce significantly better work.

like:

and in some ways it actually accelerates it, because when you copy and paste what

like:

an LLM gave you and it doesn't work, you have to go figure it out, right?

like:

With the junior engineers, it also helps speed up senior engineers, and

like:

staff engineers and principal engineers.

like:

it's good, and lowers the barrier for the entire industry, we like that.

like:

Yeah.

like:

I've lately been spending lots of time writing chapter 10 of our book,

like:

and in chapter 10, we actually go through a project, where we help you

like:

build your own copilot, and we build the VS Code extension to get it in.

like:

if you want to be running your own LLM on your own computer with your own data,

like:

so that way, you can get your own things.

like:

we walk through all the steps to do that.

like:

And in some aspects, it's interesting, cause sometimes

Speaker:

adding an extra feature made the model work, right?

like:

there's still just so much to learn about it.

like:

ultimately, it comes down to your data, right?

like:

how good your coding data is,

Speaker:

is really how well the copilot works, right?

like:

SQL is one of the most repetitive of all of the programming languages.

like:

but true skill with SQL does not involve being good at SQL.

like:

It involves knowing the data, right?

like:

It's knowing which tables to query, how to merge them, how window functions work, all of

like:

that stuff, knowing exactly what you need to be looking at is the true skill in SQL.

like:

And we're hopefully getting to a point where we can help the

like:

model know the data, right?

like:

We can give it some sort of context for the data that it's going to be looking

like:

at, so that it can generate good SQL
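
One way to picture that, as a minimal sketch: ship the schema along with the question. The table definitions and the ask_llm() call here are hypothetical placeholders, not anything from the episode or the book.

```python
# A minimal sketch of giving a model context about the data before
# asking for SQL. The schema and ask_llm() are hypothetical placeholders.
schema = """
Table orders(order_id INT, customer_id INT, total NUMERIC, created_at DATE)
Table customers(customer_id INT, name TEXT, region TEXT)
"""
question = "Total revenue per region for January 2024."

prompt = (
    "Given this database schema:\n" + schema +
    "\nWrite a SQL query to answer: " + question
)
# ask_llm() stands in for whatever completion call you have available;
# the point is that the schema travels with the question.
print(prompt)
```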

like:

that's a really good point.

like:

I've actually had lots of mentees who are trying to learn SQL for the first time, and I said:

Speaker:

"just use ChatGPT". Generating SQL is actually something they're really

I said:

good at, you don't need GPT-4, like even GPT-3, like even GPT-2, it's not

I said:

hard to generate really good SQL syntax.

I said:

Cause it's so simple, it follows a very similar structure.

I said:

But ultimately, you can have it write the SQL, but you're going to have to

I said:

go back and figure out how to connect all the pieces and understand your

I said:

database and understand your data.

I said:

that's a perfect example, understanding how to write the

I said:

code is only half the problem.

I said:

Understanding how to integrate it is really the bigger problem.

I said:

What's the most terrible use case that people are currently

I said:

trying to use LLMs for?

I said:

What does LLM in general, or LLMs, what do they suck at the most?

I said:

I'm going to say they, they suck at, sequence prediction, which sounds so off.

I said:

Because that's what they're made for, but one of the things that I'm seeing

I said:

people do, is try and automate entire workflows with LLMs, and they're trying

I said:

to get the LLM to just do the whole workflow, and they suck at that.

Speaker:

They need all of this stuff to help it.

I said:

They need tools, they need RAG, they need specific fine-tuning landmarks

I said:

and they need few-shot prompting, they need all sorts of stuff to make

I said:

it work, and then it's still up in the air about whether or not it will

I said:

do the right task in the right order.
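
For one of those pieces, few-shot prompting, a minimal sketch might look like the following; the routing task and the examples are invented for illustration.

```python
# A minimal sketch of few-shot prompting: show the model worked examples
# of the exact task before the real input. The ticket data is invented.
examples = [
    ("Refund request for order 1234", "route: billing"),
    ("App crashes when I upload a photo", "route: engineering"),
]
new_ticket = "I was charged twice this month"

prompt = "Route each support ticket to a team.\n\n"
for ticket, label in examples:
    prompt += f"Ticket: {ticket}\n{label}\n\n"
prompt += f"Ticket: {new_ticket}\nroute:"
print(prompt)  # this string would be sent to the model
```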

I said:

Yeah, I was thinking, I don't know how much I'm seeing this.

I said:

But, three months, six months ago, I was hearing a hundred horror stories

I said:

about, essentially, CEOs being like, "we need LLMs", and like they're magic,

I said:

they can do anything. And so it didn't matter what the problem was, "oh, we need

Speaker:

to do outlier detection using LLMs".

I said:

No,

I said:

use stats for that.

I said:

yeah, outlier detection is really a statistical problem.

I said:

It's really a data and math problem.

I said:

LLMs are good at natural language.

I said:

And so when we can solve a problem using words and communication,

I said:

that's when LLMs can get in.

I said:

But problems like, outlier detection or weather prediction or these

I said:

other things, we have algorithms.

I said:

stock market prediction, Super Bowl prediction,

I said:

All these things, we have better ways to make predictions.

I said:

And it's called math, right?

I said:

Fourier transforms, other machine learning algorithms, other things like that.

I said:

LLMs are not good at doing those things, cause we don't talk

I said:

about them in natural language.

I said:

we've invented other languages, like math, just to describe them.

I said:

And that's why they're not good.

I said:

we can make tools, you can build functions for an LLM to use to do Fourier

I said:

transforms and whatever else, right?

I said:

But getting the LLM to know that it needs to do that is really difficult.

I said:

Probably just as difficult as explaining what the Fourier transform

I said:

is to an LLM within your training data to get it to be able to replicate it.

I said:

This is one thing that makes it almost miraculous when stuff does

I said:

work, and that's that feeling that we're chasing right now, and that's

I said:

the replicability that we're trying to help people get to in the book.

I said:

how do you actually do it, and how do you make sure that your scope

I said:

is small enough, that it will work repeatedly and you can build a

I said:

product off of it, that's difficult.

I said:

I'm a big fan of chess.

I said:

And, since ChatGPT came out, lots of people have been making memes, or just like,

Speaker:

"Hey, I'll play ChatGPT in chess", and ChatGPT can play chess because we

like:

can talk about it in language, right?

like:

Like e4, move the pawn, or knight to g6, whatever it is.

like:

we have language for it, but ChatGPT has no idea.

like:

It has no idea of the model behind those letter-number combinations.

like:

all it knows is that there's certain things it can do, right?

like:

it writes words, and so when they do this, and these like videos or

like:

memes, like they just let ChatGPT do whatever it says, right?

like:

it just magically creates a knight out of nowhere, and magically, will take

like:

its own pieces as it moves its pieces around, it's always pretty funny.

like:

And even though it's cheating the entire way, it almost always loses, right?

like:

Cause it doesn't have an understanding of chess, like it doesn't

like:

have that model underneath it.

like:

sure we can talk about it in language, but not really, right?

like:

So we, we still have better ways to play chess, AlphaZero, et cetera.

like:

Stockfish, like there are engines out there that play chess really well.

like:

And we don't need to make LLMs good at chess, but that's a very good example

like:

of one of the things it's not good at.

like:

I've seen someone on Twitter who said "I'm gonna give an LLM $1000 or

like:

whatever initial amount, and I'm gonna ask it how to best invest it".

like:

I didn't follow where it went.

like:

But I think a lot of people had the same idea.

like:

this is some kind of genius system.

like:

I'm just gonna be its flesh and bones agent in the real world.

like:

and hope for the best.

like:

So I think that kind of goes back to your chess thing.

like:

So excuse me for that, but I have to ask you about AGI,

like:

Artificial General Intelligence.

like:

Any chance for that happening anytime soon?

like:

What's your prediction?

like:

not with our current systems.

like:

No, I don't think AGI is ever going to come out of quadratic

like:

equations, like not a single chance.

like:

maybe if there are better drop-in sub-quadratic replacements, stuff

Speaker:

like Hyena, I've tested that out.

like:

I think it's really cool.

like:

But, the fact that attention, the query-key-value attention,

like:

ultimately generates complex numbers.

like:

I think that is a little too much for AGI at the moment.

like:

So you're not one of those people who secretly hope that OpenAI has

like:

something they're gonna release soon.

like:

I don't think they have it, right?

like:

I'll be hopeful, sure.

like:

If it comes out, that's great.

like:

Yeah, I'm of the same mind as Chris.

like:

I hope they keep pursuing it.

like:

we've gotten major breakthroughs from what they pursued.

like:

It's very possible AGI will happen in my lifetime, I'm still pretty young. We

like:

keep on making advances really quickly, but are we relatively close to it?

like:

Probably not.

like:

No

like:

Oh, the thing about progress though is that it's very rarely linear, it

like:

tends to have a very weird curve.

like:

So that's why all the predictions are so funny, but hey, I had to ask you anyway.

like:

No, I think it's a great question.

like:

Okay, let's delve a little bit into, a portion of your book,

like:

It's basically describing the two options that you have today.

like:

you can either go and pay some money to OpenAI, maybe Google, or

like:

somebody else, or you can build,

like:

So you've got buy versus build.

like:

Could you talk to me a little bit about how someone would decide

like:

about this as of February 13, 2024.

like:

What are the things to consider, and what are the weights that

Speaker:

you would put in, and biases?

like:

the basic consideration is just your use case, right?

like:

If you just want to test something out, you're a student and you don't have a

like:

lot of budget, and you want something up and running so that you have LLM

like:

experience, I would say just shell out for that: ChatGPT Plus, or Anthropic,

Speaker:

or Google Bard, which has a fantastic API, or I guess Gemini now. Just do it.

like:

it's not that big of a thing.

like:

If your product that you're trying to ship is inconsequential and you

like:

don't need it to be right every time, you just want to sprinkle the

like:

AI pixie dust on it, just buy it.

like:

If your use case goes deeper than that, though, if you want to be able to build

like:

your own, if you need to make sure that it says the right things all the time,

like:

if you need it to behave a little bit more deterministically. There have been

like:

probably a thousand case studies in the last year of people building products on

like:

top of ChatGPT, and then OpenAI rolling out an update that changes how

Speaker:

ChatGPT behaves, and they don't have any way to measure all of the different

like:

ways that it will change it, right?

like:

There are 175 billion parameters in GPT-3 alone; they don't know if it's going

like:

to break your program down the line.

like:

they're just going to update it for what they consider to be better.

like:

And those programs break constantly.

like:

that doesn't mean you can't fix them.

like:

It's just a much bigger problem of maintenance, than I think a lot of

like:

people are expecting going into it.

like:

So if you want to have to maintain it less, build your own.

like:

Yeah, I think the other aspect is like you want that control, right?

like:

there's lots of examples of companies who, essentially built a small shell

like:

around ChatGPT that did something unique.

like:

And then, months down the line, now ChatGPT just does

like:

that out of the gate, right?

like:

their value proposition just completely disappeared.

like:

And that's because they didn't have control over the model.

like:

They didn't have control over what it did. It's just interesting, right?

like:

Because I say these things and things have changed over time.

like:

But when ChatGPT first came out, it was free, it was a demo, and they were

like:

specifically doing it to collect data.

like:

And that's what they did, they used collected data to improve their models.

like:

And that's what they continued to do for a while, right?

like:

Oh no, they're back.

like:

They, it's terms of service, right?

like:

If you want them to save your chat, so that you can return to

like:

it and ask more questions, they get to train off of your data.

like:

So if you want to put anything private or sensitive in there, like

like:

it's over, you've just leaked it.

like:

they're back and forth about what data they're collecting, what data they're

like:

not collecting, and if you're with an enterprise customer, like maybe you

like:

can make certain rules and things like that, and oftentimes they won't, it's

like:

a minefield, for how people are using it, and so it's just something important

like:

to take into consideration. If your LLM is doing something magical,

like:

that's really core to your business, that is really driving customers.

like:

You want to control that.

like:

You want to make sure that the model is working exactly as intended.

like:

You're not getting updates randomly, that break your application.

like:

You're also controlling the data flow, you're making sure that you're not

like:

accidentally training your competitor's model, and other things like that.

like:

And there's just lots of aspects where it's just important to

like:

make sure that you own it.

like:

And, no, that's not necessarily everyone's concern, right?

like:

if you're a student or you're just doing some side project or anything, there's

like:

lots of APIs out there that are very cheap that can get you up and running,

like:

there are literally hundreds of Hugging Face Spaces that are free APIs,

Speaker:

with LLMs running behind them, and you can just hit

like:

them whenever you want, right?

like:

unless you're queuing behind a thousand other people.

like:

yeah, exactly.

like:

I liked the example you gave in the book, I think people at Latitude, the Dungeons

like:

& Dragons people would agree with a lot of what you're saying now, but can you tell

like:

the story of what happened with them?

like:

Latitude is a local company that was here in Utah.

like:

it was put together by, two guys from BYU.

like:

GPT-2 came out several years ago.

like:

They're like, "Oh, this is mind-boggling.

like:

Let's build a game off of it!"

like:

And what they came up with was like a dungeon crawler, a text

like:

based game. It was really neat, because it would just generate an

like:

infinite amount of opportunities.

like:

And so it created this 'choose your own adventure'.

like:

It got relatively big in the space, and lots of people enjoyed playing it.

like:

things were going really well, and then OpenAI's GPT-3 came out. They offered it to

Speaker:

them: "hey, we have this new model, it's a lot better, why don't you try it?"

like:

they played around with it, and "oh yeah, this is, it's much more descriptive,

like:

it's much more interesting, it's really great", There was a lot of excitement

like:

around it, however, it turned out that the model itself, had a propensity

like:

to generate smut, and it got really concerning. People would write, like,

like:

"I'm an eight year old girl", and then the model would complete it saying

like:

"....and I'm wearing a skimpy outfit",

like:

And oh, whoa, like the player didn't want that, but like the model generated it.

like:

there became this big feud between OpenAI and Latitude about creating filters.

like:

"hey, we don't want your players doing that.

like:

We don't like that".

like:

And, Latitude's "okay, we'll create some filters" and things like that.

like:

And it devolved really quickly.

like:

Latitude, being very much a startup, not necessarily knowing everything

like:

they were doing, they built a very shaky filtering system, and then

like:

OpenAI was "that's not good enough".

like:

So then they started banning players, and so eventually we got to this

like:

territory where players, paying customers, would be playing a game, the

like:

model would randomly generate, something that the filtering system didn't

like:

like, and then they would get banned.

like:

Cause it's like the game just did it itself.

like:

It was a very complicated time, and there was lots of back and

like:

forth between Latitude, who's a small company, and OpenAI.

like:

There's lots of 'he said, they said' going on, but ultimately, it's just this

like:

position where Latitude had this game that was completely dependent on OpenAI's

like:

model to generate good output, and it really caused a lot of drama between

like:

the players and Latitude and OpenAI in the background, and that is a critical

Speaker:

example of an LLM being critical to their business. If they owned it, then they

like:

could have controlled it, they could have made sure that from the model aspect,

like:

they could have trained the model to make sure it didn't do any of those things.

like:

And then they would never need to play the little blame game, right?

like:

Nobody likes to play that game.

like:

That's: whose fault is it that the model is generating bad stuff?

like:

Is it the player who's prompting it?

like:

Is it Latitude who has some systems for tokenizing and preparing player

like:

output before it goes to OpenAI?

like:

Is it OpenAI because their model is generating that?

like:

Is it Latitude for post processing the content from OpenAI before

like:

they serve it to the player?

like:

I don't even know if it really matters who's to blame.

like:

it's just a sucky game to play.

like:

and that's like the ultimate example of why you might want to consider

like:

build versus buy is if you buy from any provider, we're picking on OpenAI here,

like:

because they're a big player, but you buy from Anthropic, you buy from the guys down

like:

the street, the startup that just barely came up and they're offering for half

like:

the price of whatever. Buy from anybody,

like:

and you will eventually have to play that blame game.

like:

we had another example in there of some lawyers who generated cases that didn't

Speaker:

exist. They asked ChatGPT about cases and it came up with a perfect response.

like:

a little too perfect.

like:

It hallucinated stuff that didn't exist.

like:

and, is it ChatGPT's fault?

like:

Is it OpenAI's fault for, allowing their model to make

like:

stuff up and behave dishonestly?

like:

Or is it the lawyer's fault for not checking it?

like:

who cares?

like:

the problem is that it's not locked down.

like:

It's non-deterministic.

like:

Yeah, in a way, as I was reading the chapter on that, it makes

like:

me think of using a machine to maybe do some farm work.

like:

Let's say that you're plowing a field and you're using a

like:

horse versus a machine, right?

like:

A machine might break, but in a predictable way.

like:

And if you've got a mechanic around, they'll come and fix it.

like:

A horse can get scared, or it has a bad day, or it can be moody.

like:

And it can come up with something new.

like:

So you always have to be careful with that.

like:

is that an accurate feeling of someone who's working with these LLMs day-to-day?

like:

You work with some kind of animal?

like:

One of the most annoying things is even if you set the seed of it, so

like:

the random generator is going to be the same every single time, you

like:

can still give it the same prompt and get something different out.
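
A minimal sketch of that seeding point, assuming PyTorch and the transformers library with the small public GPT-2 checkpoint:

```python
# A minimal sketch of seeded sampling, assuming PyTorch and transformers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tokenizer("The future of AI is", return_tensors="pt").input_ids

for _ in range(2):
    torch.manual_seed(42)  # same seed before each generation
    out = model.generate(ids, do_sample=True, max_new_tokens=20)
    print(tokenizer.decode(out[0]))
# On one machine these two runs usually match; across hardware, library
# versions, or batching setups, the same seed can still give different text.
```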

like:

The truly awesome thing about LLMs is the number of non-linear activations

like:

that are going through the model, right?

like:

It's creating incredible, non-linear jumps throughout that dimensional

like:

space that the embeddings are in.

like:

you just can't really predict it.

like:

It is a little bit like an animal.

like:

the fact that like we can prompt engineer at all.

like:

it's a little bit telling of where we are, right?

like:

Cause like prompt engineering, you can change the spaces, the white space

like:

inside of your prompt and it can end up giving you a completely different result.

like:

we're still in a very interesting area, where we're trying to create

like:

better ways to communicate with the LLM and get predictable outputs.

like:

But, the fact that we can do that at all...

like:

This is a bit of a miracle, right?

like:

you can't do that with a human.

like:

a human isn't going to be tricked into saying something different.

like:

humans are tricked all the time, but not necessarily in the

like:

same way that we do with LLMs.

like:

it's a very interesting world we are in, and a lot of people are having

like:

that horse versus machine experience.

like:

let's talk about the cost a little bit.

like:

you mentioned that it's super cheap to pay some big company to use their thing.

like:

let's focus for a minute on the cost of actually building your own LLM.

like:

if I wanted to build one of these foundational models,

like:

Let's say that I take one of those 75TB corpora from the internet and I'm

like:

feeling particularly GPU poor that day.

like:

How much money do I need to have in my little piggy bank to get something useful?

like:

That's difficult, man.

like:

because you're either paying for a GPU, right?

like:

Or a suite of GPUs in order to parallelize it so that you can ingest

like:

that over a short period of time.

like:

Or technically, with a lot of this stuff, you can load it onto a [GeForce] 3090,

like:

I've done this personally, you can train in FP16, you can train up to, about, 13

like:

billion parameters pretty effectively, and pretty cheaply, on a 3090.

like:

You have to be a little bit smart about your data loading, you have to make

like:

sure you're streaming stuff, you have to pay for the data storage anyway, it's

like:

incredibly slow, you have to do gradient checkpointing, you have to do, like,

like:

gradient accumulation steps, which slow down the training even more, I trained a

like:

little bit bigger than that, it was about a 20 billion parameter model on my 3090,

like:

but what I don't generally talk about is that it took a year of just running to do that.
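
The tricks named there, FP16, gradient checkpointing, and gradient accumulation, map onto trainer settings; here is a minimal sketch with the Hugging Face Trainer API, with the model and dataset omitted and the values purely illustrative.

```python
# A minimal sketch of the memory-saving settings mentioned above,
# assuming the Hugging Face Trainer API; values are illustrative.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,                       # half-precision training
    gradient_checkpointing=True,     # recompute activations to save VRAM
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # act like a batch of 32 on one GPU
    num_train_epochs=1,
)
# Pass `args` to a transformers.Trainer along with your model and dataset.
```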

like:

it was horrendous and that all culminated in a company giving me a

like:

cease and desist, so I couldn't even release it. So you're either paying

Speaker:

a lot of money, hundreds of thousands of dollars, in order to get something quick.

like:

Especially with 75TB of text or more, grab your own data, get

like:

more data, and you're paying to store and to process all of that.

like:

And that costs tons of money.

like:

Or you are not paying the money, but it takes a really long time and makes all

like:

of your shareholders really frustrated because you're ruining your go-to-market.

like:

You're taking too long.

like:

You're not going to be the first in the space. It's a huge trade-off.

like:

as with many things, you can trade time or money, and

like:

training an LLM is very similar.

like:

I think they estimated, huge models that we see, like the ChatGPT things,

Speaker:

you're probably paying somewhere like, what was it, like a half million?

Speaker:

I think they say. And that's just for the training, we're not even

like:

talking about all the experts you have to pay, and buying data in order to do

Speaker:

data curation,

like:

man.

like:

on the very far end on the expensive side.

like:

it gets really expensive really quickly to train these models, just because of

Speaker:

buying enough GPUs in order to parallelize this to do it within reasonable time, and

like:

just the sheer volume of data you have to run through to train all the parameters.

like:

It gets really expensive, but on the other end there's lots of good

like:

open source models that have done that main pre-training already.

like:

And so you can grab one of those, you can train it with something like

like:

LoRA, where you only need a handful of samples and maybe like 10 minutes

like:

if that, and you can train it on a very simple GPU, and you have something

like:

fine-tuned for what you need, and you can get under $200, which is very reasonable.

like:

$150, $20.

like:

It's very possible to train, these models with certain

like:

methods to get what you need.
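
As a minimal sketch of that LoRA route, using the Hugging Face peft library; the base model and target modules here are assumptions for illustration, not a recipe from the book.

```python
# A minimal sketch of LoRA fine-tuning setup with the peft library.
# The base model and target modules are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the weights
```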

like:

So does it mean that in a kind of natural, almost biological like evolution we're

like:

going to end up with a few primary models that a lot of the different models branch

like:

off of, instead of, reinventing the wheel?

like:

That's where we're at currently.

like:

I hope that it doesn't stay that way, because I really enjoy seeing

like:

new people create new models for new use cases and all this stuff.

like:

so I hope it doesn't stay that way, but I do see a lot of value in creating industry

like:

standards, at least around how you are actually writing the binary files, how

like:

are the weights actually being stored?

like:

What do the different layers look like?

like:

I think that standardizing what the model looks like so that you can load

like:

it as flexibly as possible is awesome.

like:

I would like to see more open source models, which is funny considering

like:

there are thousands of open source fine tuned versions and hundreds

like:

of open source foundational models on the Hugging Face Hub right now.

like:

I want more, right?

like:

I'm greedy, man.

like:

To me, it sounds like basically every week there is another one that's better

like:

at something, and if you look at the Hugging Face LLM leaderboard, it's

Speaker:

changing by the hour, literally, and it looks like a gold rush in many ways, but

Speaker:

I like this gold rush much better than the crypto one a couple of years ago.

like:

Yeah, man, there's a lot higher chance that you'll come out

like:

of this gold rush with a great product than with the crypto one.

like:

yeah, there's a lot there, and just to summarize that into one sentence,

like:

you can probably fine tune even a gigantic model for around $200 to $500.

like:

And you can go lower than that.

like:

Even if you are smart about how you're doing it, versus training from scratch,

like:

which either is going to take an inordinate amount of time or will cost

like:

thousands and thousands of dollars.

like:

So I'm willing to bet money that a lot of our listeners are going to pause

like:

this now and start Googling furiously.

like:

How do I fine tune a model?

like:

Where would you point them as a good starting point?

like:

any particular paper, any particular, company, anything that's, a

like:

good place to start with that?

like:

a bit selfishly, I would say you should buy our book.

like:

We talk about probably the main ways to train in chapter 5 of our book,

like:

I was going to say that, but, I

like:

was going to say it last, right?

like:

Cause we do go over it.

like:

The book is primarily about production environments, but you can't really

like:

put a model in production if you don't know how to work with it.

like:

So we have stuff on fine tuning.

like:

We have stuff on parameter-efficient fine-tuning, on low-

Speaker:

rank adaptation, the whole deal.

like:

YouTube is actually probably one of your best resources right now, because

like:

it has amazing content creators that show you how to do it in whatever

like:

format you're comfortable in.

like:

So if you're a C++ developer, there are YouTube videos on how to fine-tune a model

like:

and create a LoRA using llama.cpp, right?

like:

It's not even all that difficult.

like:

You just have to convert a model into GGUF format and boom, you're there.

like:

You can do it on a CPU.

like:

it'll take a long time, but you can do it in whatever quantization

like:

you want and everything.
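
A minimal sketch of running such a quantized GGUF file on CPU, assuming the llama-cpp-python bindings; the model path is a placeholder.

```python
# A minimal sketch of CPU inference on a GGUF-quantized model via the
# llama-cpp-python bindings; the file path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="models/my-model-q4_0.gguf")  # 4-bit quantized file
out = llm("Q: What is a token? A:", max_tokens=32)
print(out["choices"][0]["text"])
```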

like:

YouTube will meet you where you're at if you want to learn something a little bit

like:

more industry-standard so that you could potentially, get employment in this area,

like:

PyTorch has amazing documentation, fantastic tutorials, and they're one of the

like:

best at really making it feel like you're playing with, let's say, "big boy Legos".

Speaker:

You're like building the model using their little Lego pieces, pretty cool. If you need

Speaker:

something a bit more high-level than that,

like:

Hugging Face, I think, is the industry standard for working in between a whole

like:

bunch of different frameworks, whether that's PyTorch or TensorFlow or, whatever

like:

other framework you're working with, like ONNX.

like:

HuggingFace has abstracted away a lot of the difficulty of setting

like:

up models for fine-tuning, cause in PyTorch you have to build out the

like:

exact model architecture just to load the weights and then fine tune it.

like:

HuggingFace already has the class built for you.
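
A minimal sketch of that contrast, assuming the transformers library: one call loads architecture and weights together, ready for fine-tuning, where plain PyTorch would need the full module definition first.

```python
# A minimal sketch of Hugging Face's abstraction: architecture and
# pretrained weights load in one call, with no hand-built module classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# `model` is a regular torch.nn.Module, ready to pass to a Trainer or
# an ordinary PyTorch training loop for fine-tuning.
print(sum(p.numel() for p in model.parameters()), "parameters loaded")
```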

like:

I would point to those if you need more explanation, like

like:

Coursera is a fantastic place.

like:

DeepLearning.AI, on Coursera and on their own site,

Speaker:

that's Andrew Ng's education stuff.

like:

That's where I got my start with machine learning was Andrew Ng's

like:

machine learning course on Coursera.

like:

It was Awesome.

like:

Fantastic.

like:

Jeremy Howard is also amazing in that area of creating content for

like:

people starting out and learning from beginner to advanced level.

like:

He's at fast.ai.

like:

I, yeah, I strongly recommend all of those

like:

and your book.

like:

yeah, we ingested a lot of those in order to write the book,

like:

our book is a very nice high-level overview of the key things you want

like:

to be looking at and like different methodologies from training from

like:

scratch, to basic fine-tuning, to

Speaker:

model distillation, to LoRA and PEFT and things like that.

like:

we definitely give a high level overview, we give code samples and show you that.

like:

But, ultimately if you really wanted to get into it, yeah, there

like:

are other resources out there.

like:

I know Manning has another book coming out, specifically

like:

all about training LLMs.

like:

there are definitely other places you can go, but.

like:

If you're looking for the quick, summarized version of all of

like:

these things, our book is actually a really good resource for it.

like:

One other thing that I like about your book is, the part where you

like:

build up the different, breakthrough moments, throughout the world of

like:

mathematics, that ultimately led to 'Attention Is All You Need', and

like:

what is it, seven years later now?

like:

the gold rush that we're observing.

like:

but just before we jump into that, there is a little bit of vocabulary

like:

that one needs to have in order to basically talk, or even read

like:

a lot of these papers. Could you

like:

Talk us through briefly that vocabulary.

like:

I'm talking about phonetics, syntax, semantics, pragmatics, morphology, that

like:

until I read your book actually made me think mostly of blood tests and semiotics.

like:

Could you give us like the MVP version of what you need to know about these

like:

things to be able to read papers?

like:

Oh, absolutely.

like:

Matt has been learning a lot of this too, he might be better at it than me.

like:

I will throw other jargon into it.

like:

writing this book with Chris over the last year has been, mind-opening for me.

like:

until you can understand these words, like you were saying, it's really

Speaker:

hard to dive into the deep end, but we go over them in our book just because

Speaker:

we do find it so valuable. It really helped me understand very quickly:

like:

"Oh, this is what my LLMs are good at.

like:

This is what LLMs are not", and that was one of the first things we started with

like:

but the first one, semantics, that is just like the structure of words, how things

like:

go, whether or not it sounds correct.

like:

that is what LLMs are really good at.

like:

They're really good at making sure like the semantics of words align really well.

like:

but after that, you got pragmatics, which is what LLMs have no idea about.

like:

That is all the information around

Speaker:

that isn't said, right?

like:

So when you say I'm going to find the eggs the Easter Bunny left, right?

like:

you have to understand what, Easter is, what the Easter

like:

Bunny is, why a bunny has eggs.

like:

there's a lot of context around it that you have to understand,

like:

and that's all pragmatics.

like:

it's information that isn't said.

like:

And that's what LLMs generally lack.

like:

Actually, I'm gonna, I'm gonna jump in here real quick.

like:

Miko, did you like the Veľká noc example that I gave in there?

like:

Yeah, I thought it was

like:

Yeah.

like:

Was that pretty good?

like:

I just wanted to ask because I remember experiencing that in Slovakia.

like:

Like I lived there for years and that was a hugely beneficial portion to me

like:

to help figure out that 'no, tons of people have tons of ways of looking at

like:

things', and LLMs don't know about it.

like:

you would have to explain every bit of it to them in order to get them

like:

to understand the same things as you.

like:

Anyway, sorry, Matt.

like:

I find like those two words in general, semantics and pragmatics, understanding

like:

those is going to get you significantly farther in just understanding

like:

how LLMs work, what they're doing.

like:

there's obviously a lot of other words that we talk about,

like:

like morphology and stuff.

like:

And I'll hand it off to Chris to talk about what he wants to add to there.

like:

I would agree with Matt.

like:

Just understanding semantics and pragmatics would get you probably 60%

like:

of the way there, and you could read new papers that come out and immediately

like:

see like where are they amazing?

like:

Where are they failing?

like:

I end up using the relationship between those two, just the literal

like:

encoded meaning of your words.

like:

if I say, "I'm married to my ex-wife", there's immediately,

like:

boom, semantic problem there.

like:

How can I be married to my ex-wife?

like:

The words don't agree with each other.

like:

Versus, exactly as Matt was saying, if we talk about Easter, if we talk about

like:

traditions, if we talk about rituals that people have, just like the stuff

like:

that you say, if you ask someone in Slovakia, they're going to respond to you.

like:

That's normal.

like:

it's a question, they respond.

like:

LLMs don't have that, and you have to have them ingest tons and tons of data in order

like:

to even get as far as giving a response.

like:

the other ones that we can think about, syntax, I would say

like:

that syntax is largely solved.

like:

At this point, syntax is your structure around the words, like what order do

like:

the words go in for them to be correct?

like:

Is it 'I go to the store' or is it 'I to the store go' or all of that stuff.

like:

That's syntax.

like:

It's the structure that holds your sentences, your utterances together.

like:

Morphology is delving into something that I consider to be very important in LLMs.

like:

I'm not going to say the most important, cause I think that's still semantics.

like:

There's a lot of work there.

like:

but morphology would be how words are built.

like:

what are the fundamental units of meaning, the morphemes, do those

Speaker:

even exist, that sort of stuff.

like:

and we don't have to delve really deep into that.

like:

That's largely solved by tokenization, but we can see,

Speaker:

with newer models that come out, that really matters.

like:

You have much smaller models that have more novel tokenization, more novel

like:

morphology that end up outperforming larger models on tasks that they

like:

didn't even train on all that much.

like:

if we can put it all together really quick.

like:

The model solves syntax.

like:

Embeddings try to solve semantics, but semantics is difficult,

like:

and so they're not perfect.

like:

Pragmatics is stuff like RAG, your Retrieval Augmented Generation, and

like:

having repeated sequences within your training data, it gives it landmarks, it's

like:

context around the syntax and semantics.
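
A minimal sketch of that RAG idea, with retrieve() and ask_llm() as hypothetical stand-ins for a document store and a completion call, not any specific library's API:

```python
# A minimal sketch of retrieval-augmented generation: fetch relevant
# passages first, then prepend them to the prompt. retrieve() and
# ask_llm() are hypothetical stand-ins, not a specific library's API.
def answer(question, retrieve, ask_llm, k=3):
    passages = retrieve(question, k)  # top-k relevant documents
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return ask_llm(prompt)
```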

like:

Morphology is your tokenization, which, if I would give that an example, your

like:

tokenization provides your model with stuff that it sees, it changes from text

like:

into what the model actually sees.

like:

And, your embedding strategy is moot if you don't have it.

like:

Just your morphology gives your model glasses, if you want to call it that.
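
A minimal sketch of that "glasses" point, assuming GPT-2's tokenizer from the transformers library; the exact subword splits depend on the vocabulary.

```python
# A minimal sketch of tokenization as "what the model actually sees",
# assuming GPT-2's tokenizer; exact splits depend on the vocabulary.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("unbelievably"))   # subword pieces, not whole words
print(tok("unbelievably").input_ids)  # the integer IDs the model ingests
```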

like:

And then phonetics is the one that we haven't even talked about.

like:

Phonetics is the reason why we are doing a podcast and we're talking instead of just

like:

texting each other or emailing each other.

like:

Can you imagine trying to ingest a podcast that's just emails?

like:

It's horrendous.

like:

And it's because there's so much richness and depth in meaning in the language that

like:

is just lost when you strip it of its phonetic, I'm going to call it a medium.

like:

And that can lead people to think that it has to do with sound, that's the

like:

most common modality for people, but sign language has phonetics, they have

like:

particular places where they make signs.

like:

They have particular ways that they do them to inflect and express more emotion.

like:

Their phonetics exists even outside of the verbal modality.

like:

that's important because that's where I see the most improvements coming to LLMs

like:

in the future is being able to process

like:

phonetic information without having to convert it into text

like:

or process phonetic information and compare it against the text.

like:

that can be incredibly helpful for your model's understanding.

like:

those are the five features of language that we break

like:

things down into in the book.

like:

And they're largely agreed upon.

like:

There are some other linguistic features that are incredibly important, stuff like

like:

dialogue, that we haven't even covered.

like:

beyond that.

like:

Yeah, we can talk about semiotics too.

like:

That's Charles Sanders Peirce, smart dude from the 1800s who created a lot

Speaker:

of structure and organization; we dive into that very lightly in the book.

like:

I don't think that you need a grounding in semiotics in order to improve

like:

your ability to interact with LLMs.

like:

But it is helpful for organizing all of these other concepts.

like:

how do we create a mental map for how stuff needs to be processed

like:

within a machine learning pipeline?

like:

How do we make sure that we're not mixing things up and inadvertently destroying

like:

our model's ability to see things, right?

like:

If we put embeddings before tokenization, it breaks your process.

like:

it's helpful for organizing things and it's also helpful for understanding

like:

how conversation happens and how I say something and it moves through

like:

your mind to create an interpretation.

like:

that's by far like the most theoretical out there concept that

like:

we get into in the whole book.

like:

And together you came up with this language definition as being, as a

like:

concept, "an abstraction of feelings and thoughts that occur to us in our heads".

like:

And I'll be honest, I initially thought it sucked.

like:

because it's a little bit, it's a little bit wishy washy.

like:

I wanted something a bit more concrete.

like:

But then, as I looked up all the other definitions in different contexts, I

like:

was like, Okay, I can clearly not come up with anything better than that.

like:

So I think I'm ready to yield now and say that this is actually

like:

capturing it pretty well.

like:

Putting abstraction in it, sounds also vaguely techie, so that helps.

like:

How did you come up with that definition?

like:

I didn't. I would love to take credit for it. No, that definition has been around for a long time within the linguistics community, and one of the best examples of why it really works is babies. Babies have no idea how to express their thoughts, but somehow they get them across. When a baby is happy, we can tell; when a baby is crying, we can infer that it needs something. Babies are able to communicate without language, meaning that language is something we created to shorten the conversation.

The reason I call it an abstraction is that we have abstract ideas. You've probably come up against a situation where you're feeling something and you don't know the words to really express it. I think that's a pretty universal human adult thing that has happened at least once in your life. It's happened to me a bunch of times, and it really illustrates that the language we use is describing what's in here; it isn't itself what's in here. It's a hard concept. Once you get there, though, it really helps with LLMs, because you realize that the language we're using is a crutch, and that's all the LLMs have in the first place. This is another thing that goes towards the miraculous nature of them working at all: they're dealing with an abstraction of an abstraction, at least, in order to communicate with us.

So let's say that I buy that. My first question, going back to your baby example: isn't what the baby's doing some form of a language? What's the line between what is and what isn't? What's the line between a language and communication?

I like that. That's a question that I bet a lot of people have, and we'll probably talk about it in an appendix for curious readers. There are a lot of lines between straight-up communication and a language, but one of my favorites is the ability to talk about something that is not physically present. Bees have communication. Gibbons have communication. Babies have communication. Babies, though, are unable to express any ideas about stuff that is not physically present; you can't talk to a baby about theoretical physics. I mean, you can, but what are you going to get back? You can talk to a baby about my Star Wars posters, because I can point at them, they're right there; but if I'm in a different room, the baby's not going to be able to talk to me about them. And that's the difference, or one of the differences. That's the one I'd like to highlight, though: the fact that we can speak about things that are not physically right here with us, that we can point at, is the distinction between communication and language, because babies are communicating. But once they get to that point, it really deepens the interaction that you're able to have with them.

So now, equipped with all that knowledge, I'm going to try to prompt engineer you and give you this prompt: I'm a five-year-old baby who has language now, and who's very curious about understanding how we got from bag of words and counting frequencies all the way to LLMs and ChatGPT and people worrying about the Terminator actually coming to life. Could you walk me through the high-level ideas that were important and build up to what we're seeing today?

The bag of words is really easy to think about, especially if you keep your tokenization incredibly simple. Sorry, I'm already out of five-year-old territory. You just count words. If I take that sentence, "you; just; count; words", each of those has a count of one. If I add another sentence, "I like Star Wars", all of those still have a count of just one. And then if I add another, "do you like Star Wars?", then "you" and "star" and "wars" all go up to two. That's it. That's a bag of words model.
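To make that concrete, here's a minimal sketch of that exact counting exercise; the whitespace tokenization and lowercasing are simplifying assumptions, not part of any particular library's behavior.

```python
from collections import Counter

# A tiny bag-of-words model: tokenize naively on whitespace and count.
corpus = [
    "you just count words",
    "I like Star Wars",
    "do you like Star Wars",
]

bag = Counter()
for sentence in corpus:
    bag.update(sentence.lower().split())

print(bag)
# Counter({'you': 2, 'like': 2, 'star': 2, 'wars': 2, 'just': 1, ...})
```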

Why is it important? What can it do?

I think that bag of words is the first model that we really have to explain being data-driven. It's just keeping track of things. If you look at a bag of words model for your workouts, it's just: how often do you do certain things? How often are you doing a bicep workout versus a pectoral workout? How often are you doing which thing? It's just being data-driven. It's the first step, right? You're not looking at any features. You're not really caring about how these things interact with each other. You're just keeping track.

So I guess, with that information from your example, I can guess whether you are skipping leg days, and I can see what's important to you. Or, if I'm counting words in U.S. presidents' speeches, I can say, like you described in your book, whether it's a wartime or a peacetime president, and what they really try to get across.

This is something that you can use for anything you count. In soccer: which players score goals, and how often? That is a bag of words model. You're not tracking words; it's a bag of goals, or a bag of whatever else.

So what's the next step from there?

Bag of words was really monumental just because it's so simple, but it's so powerful, because the words you use when you're describing sports are very different from the words you use describing politics. So just picking up on certain words and their counts helps us understand the overall subject. But it really lacked any sort of structure, because the order of words also matters, right? "The cat in the hat" versus "the cat's hat": they both have the word 'cat', they both have 'hat', but they mean different things because of the order of the words. And so that kind of led to n-gram models. Instead of just simple words, we would also take n-grams, which are n number of words in a certain order, and we would start cataloging those. So, more than just words, we're getting n-grams. And that improves our understanding of the language, because now we have embedded some syntax in it. We understand some ordering of words, and that's able to improve our categorization. From there, though, we're not really able to make any predictions about what next word is about to come up or anything like that; when it comes to bag of words or n-grams, they're really more for categorization.
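A sliding-window sketch of n-gram extraction, again assuming simple whitespace tokens:

```python
def ngrams(tokens, n):
    """Slide a window of width n across the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat in the hat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'in'), ('in', 'the'), ('the', 'hat')]
# Counting these bigrams (instead of lone words) bakes word order
# into the same simple counting machinery as the bag of words.
```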

And so that kind of led to Bayesian techniques. Not to go really deeply into Bayesian statistics, but, yeah, sorry to all the Bayesian fanboys, we're going to go about as deep into this as we did into pragmatics. It's just that, based on the priors of the words that came before, we can predict the next word to come up. So if every single time we saw 'I am a' in text it was followed by 'man', then it's going to predict that the next word is 'man' instead of other words that easily could have come up, like 'woman' or 'girl' or 'boy' or 'cook' or 'professional athlete'. Certain things that could come up are going to be a lot rarer, like 'I am an astronaut': a lot fewer people have been astronauts, so it's going to have a very low probability of being the next word predicted. But it gives us this opportunity to look at what the next word predicted is.
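As a sketch, that "priors of the words that came before" idea can be as simple as conditional frequencies over bigram counts; the toy corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Estimate P(next_word | previous_word) from raw bigram counts.
corpus = "i am a man . i am a man . i am an astronaut .".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def next_word_probs(prev):
    counts = transitions[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

print(next_word_probs("a"))   # {'man': 1.0}
print(next_word_probs("am"))  # {'a': 0.66..., 'an': 0.33...}
```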

From there, we move on to what's called Markov chains. We're swinging back towards the n-gram model, but it gives us a bit of prediction next. I actually really love Markov chains, because they provide very fast predictive text. Markov chains are essentially what's been fueling predictive text for Google search and things like that; it's been the technology leading that charge for a really long time. And it's just a very basic way of using n-grams to make predictions of the future.

That is obviously reducing it; that's not exactly how it works. But it's a bag of n-grams where you take a state at each point in a sequence, look at all the times the preceding n-grams have occurred in that sequence, and then from that you can model a probability about what comes next. Instead of just looking at each n-gram by itself, you give it state, and it's a bag of n-grams. It's really fun. It's a probabilistic bag of n-grams. That's how the chains work.
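A minimal sketch of that "probabilistic bag of n-grams" with a word-level state, using an invented toy sentence:

```python
import random
from collections import Counter, defaultdict

# The state is the last word seen; the next word is sampled from the
# counts of what has followed that state before.
text = "the cat in the hat sat on the mat and the cat ran".split()

chain = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    chain[prev][nxt] += 1

def generate(start, length=6):
    word, out = start, [start]
    for _ in range(length):
        counts = chain[word]
        if not counts:
            break
        word = random.choices(list(counts), weights=counts.values())[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat in the hat sat on"
```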

One of my favorite parts, and I like that you kept track of this quote, is that Markov models represent the first comprehensive attempt to actually model language. Which is funny, because Markov was not trying to model language initially; he was just trying to win an argument. He eventually used it to look at distributions in particular Russian authors, and at distributions in Russian government official speeches. He knew what he had and he believed in it, and I love that. What a great piece of history, anyway.

Continuous bag of words is where we start essentially taking the logic of a Markov chain: if we keep track of where things appear and how often they appear there, it helps us model what could appear next. And this is the first moment where we're really coming full circle, going right back to bag of words and just adding context for position. From the context of the bag of words, the literal counting of things, we're able to create embeddings.

I don't know if a lot of people are aware, but bag of words is how Word2vec came to be. Word2vec was huge in, I think, 2015, 2016, and it stayed huge; Gensim is still one of the most downloaded natural language processing libraries in Python, for Word2vec and for GloVe. Continuous bag of words, just adding that one little thing, adds all this context so that we can create embeddings: vectors that we can compare between words.

This all comes from the logic of, I forgot that dude's name: tell me the company that a word keeps, and I'll tell you what that word means. Just what's around a word influences its meaning, which goes directly against a lot of previous linguists' thought that syntax and semantics are absolutely not related at all. That's one of the big things from Chomsky, the "colorless green ideas sleep furiously" nonsense. There's some sense to it, and taking advantage of that with continuous bag of words, we can create, like I said, these vectors that we can then compare, and that's really interesting. That is what fuels LLMs now: this exact same continuous bag of words modeling technique. It's been built upon a little bit, but that bag of words is still fundamental to how embeddings are created. Bag of words and positionality, and we can get into RoPE scaling, all of these rotational plugins that you can use to get longer sequences embedded correctly, or at least better.

That's one of the hard things when we're talking about language modeling: what is good and what is better? A lot of people like to appeal to "this is how humans do it". I don't know if humans are incredibly efficient when we do it, but it's fine. Then we get into the 1960s, the very first perceptrons.

Before we go there, can we spend a little longer on what the embeddings actually are? You mentioned Word2vec, you mentioned word vectors and embeddings, but for somebody listening to us from the start, it's probably not clear what that is. Can we delve in a little bit?

Yeah, absolutely. So embeddings are the vectors that come out of models like continuous bag of words. When you look at a modern machine learning pipeline, there are multiple models that you go through, and we just abstract all of it and call it one model. When you look at GPT-3, ChatGPT, it has a model, what they call a byte pair encoding model, to do its tokenization. And then it has a model to do embeddings. That model is fundamentally a continuous bag of words. It's built on top of it a little bit by, like I said, keeping track of not just how many times a word occurs, but how many times a word occurs in particular positions. And then on top of that, it keeps track of the flip: whether it's in an odd or an even position within a sentence, and it assigns it cosine or sine based on that, in order to try to insert back some of the meaning that was taken out by the tokenization. Because tokenization just assigns each token a number in a dictionary, and you have a way to get all words into that dictionary and then come back out of it. So it takes all of the meaning out; it's just one number. The embeddings attempt to put some of the meaning back into it using positionality, using continuous language modeling techniques.
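For the curious reader, here's a sketch of the standard sinusoidal positional encoding from the original Transformer paper, which is the usual concrete form of the sine/cosine idea described above; note that in the published scheme the sine/cosine alternation runs across embedding dimensions. The sizes here are tiny for display.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]        # token positions
    i = np.arange(d_model)[None, :]          # embedding dimensions
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])    # even dims get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])    # odd dims get cosine
    return pe                                # added to token embeddings

# d_model would be something like 768 in practice; 8 keeps it readable.
print(positional_encoding(4, 8).round(2))
```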

Embeddings, really simply: they're not perfect, they're just an approximation of that meaning. And because we're able to put these words into a vectorized space, we can start doing things that make sense and make us feel like we're headed in the right direction. The classic example is, when we first discovered embeddings, we took the embedding of 'king', we subtracted 'man' from it, we then added the embedding of 'woman', and the closest embedding to that was 'queen'. So we start to get this vectorized space that makes sense. These words start to have connections to each other, and they start to make semantic sense to us as humans.
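If you want to try that analogy yourself, here's a short sketch using gensim's downloadable GloVe vectors; the dataset name and the exact neighbor score depend on your install, so treat both as assumptions to verify.

```python
import gensim.downloader

# Small pretrained GloVe vectors; any KeyedVectors would work here.
vectors = gensim.downloader.load("glove-wiki-gigaword-50")

# king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=1))
# -> [('queen', ...)], exact score depending on the vectors used
```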

However, embeddings are still an approximation, right? So if you were to do that with every combination, it's interesting: what do you get when you start adding or subtracting words that don't necessarily make any sense together?

A good quintessential example of that: you take the vector for 'king', you subtract the vector for 'wolf', and you add the vector for 'prince', and you get the vector for 'village', or at least pretty close to it. That doesn't make any sense. So there's still lots of, okay, these are starting to add meaning, not always, but sometimes. Embeddings are ultimately an approximation, and it's something we're constantly trying to learn and improve.

it's something we're constantly trying to learn and improve

like:

If your listeners are wondering how to keep up in space, like embeddings are

like:

probably the number one thing to keep track of OpenAI recently released, logic

like:

for being able to change the size of embeddings, to me, like being pretty

like:

deep into this, it feels groundbreaking.

like:

Because normally you have to structure these vectors so that they're all the same

like:

size and each point within that vector represents meaning negative or positive

like:

and it's very structured and not malleable and so the idea that you could take you

like:

all of your embedding space and change the size of it at your whim Is just amazing.

like:

that's one of the things that I see as a huge groundbreaking piece of technology

like:

that OpenAI is continuing to lead in.

like:

yeah, and if you're ever in doubt for oh man, is this paper important?

like:

If it's about embeddings and doing really cool things with embeddings, probably.

like:

I think the one question for anybody trying to picture that: what's the dimension of all these vectors? Is it the entire vocabulary? Are there different techniques?

Yeah. Currently the number one dimensionality, an unspoken industry standard, is 768. That's a number that pretty much every NLP practitioner knows. The reason OpenAI's embeddings initially seemed really cool, and people thought they were super dense, is that they were, what, 1536, which is 768 doubled, right? You're going to see multiples of 768 all over the place here. And that's not because the number is super significant; it's just the first embedding size that we found that tended to work better than the others.

So that's the "more art than science" part of this.

It's brute-force testing, yeah. People went through and tested 767, 766, 765, landed on that one, and it worked; it's the best one that we've found so far. Even the doubled embeddings from OpenAI offer only a marginal improvement in that understanding space.

I think we can move on to the multilayer perceptrons.

Okay. A perceptron is essentially just a linear transformation of data. From a statistical standpoint, if you have three things about something, you can just add those things together and you get a description of that thing. Just summing them; that's abstracting it a little much, especially if machine learning practitioners are listening: we can do linear transformations. The easiest way for me to think about it is that you perform one action on a group of features and you get something out of it. That's not, by itself, really helpful. But once you get into having multiple layers, this is the MLP, the multilayer perceptron. Once you get into multiple layers where you are adding these transformations together, and in between those layers you have non-linear activation functions so that you can create non-linear relationships between sets of linear transformations, you can get into really cool spaces.

And one of the first things that any machine learning practitioner learns, at least among a lot of the people I've talked to, is that just adding more layers does not make it better. In fact, the cool part is finding the minimum number of layers that you need in order to model the relationship between two points. That's a little abstract, so I think the quintessential example is detecting which type of iris flower it is from an image. We don't necessarily know how many features there are, but we can vectorize the entire picture of an iris flower, and then discover that the minimum number of layers is, I think, about five in order to get really good accuracy on detecting which iris flower it is.

Yeah, multilayer perceptrons are the feed-forward networks. They're the basis of everything that comes after: whether it's recurrent networks, or even Transformers, they have feed-forward networks inside them, and that's the basis of it right there.

How do you choose the sizes? Is it all just trial and error as well, for the number of layers and the sizes of the hidden layers? Are there not any rules that always work?

Yeah, so, going through a feed-forward network, and this comes from trial and error, from a lot of people trying different stuff: generally your initial dimensionality could be something like 768. That's your initial hidden layer; it's a good number for it, an embedding dimension that we're familiar with. But then we want the next hidden layer to be double that, and then we want to go smaller and smaller until we hit our final output classification layer. So we want one big jump, and then small.

The way to think about that theoretically is: you want to model the number of features that you're looking for, and then modeling double that is just a good way of covering all the features that we might not know about, that we might not even be keeping track of. Let's see if the model can figure them out mathematically. And then we want to narrow it down, narrow it down, narrow it down, until we get to our actual classification, which in language modeling is: what is the next word?

Got it. So double it, and then boil it down to the size that you're actually looking for across a bunch of layers, and hope for the best.
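A sketch of that heuristic as a concrete network, assuming PyTorch; all the layer widths here are illustrative, not a recipe.

```python
import torch.nn as nn

# Big jump up from the embedding width, then narrow down to the output.
mlp = nn.Sequential(
    nn.Linear(768, 1536),   # double the input width first
    nn.ReLU(),              # non-linearity between linear layers
    nn.Linear(1536, 512),   # then narrow it down...
    nn.ReLU(),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Linear(128, 10),     # ...to the final classification layer
)
```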

Okay. And that's why, when OpenAI doubled the embedding layers, it was a marginal improvement, but a predictable one, because that's normal; people do that.

Are there any particular well-known configurations of these neural networks that just work for a bunch of problems, something that you keep seeing over and over? Or is it more custom for every problem, and you just follow the heuristics that you described?

As far as model architecture, no, it's basically the heuristics that I described, and then people will experiment and tune them and find that, statistically, if this layer of the model is bigger, then it works better, but it follows that general structure. I think one of the papers that I would point to for this is ULMFiT, which is basically a methodology for fine-tuning. It experiments with gradual unfreezing of layers, where, when you're training, you start with only the very last classification layer, everything else frozen exactly as it is, and you only train that one. Then you unfreeze, unfreeze, and test each layer as you're training. That tends to help. Even now, that's abstracted within the Hugging Face trainer class, and within pretty much every model.fit methodology, because it works.
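A minimal hand-rolled sketch of gradual unfreezing in that style, assuming PyTorch; the model and the training loop placeholder are illustrative.

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool):
    for p in module.parameters():
        p.requires_grad = trainable

# Any stack of layers ending in a classifier head will do.
model = nn.Sequential(
    nn.Linear(768, 768), nn.ReLU(),
    nn.Linear(768, 768), nn.ReLU(),
    nn.Linear(768, 2),          # classification head
)

set_trainable(model, False)     # freeze everything first
for layer in reversed(list(model.children())):
    set_trainable(layer, True)  # thaw one layer group per stage
    # ... train for a few epochs here before the next thaw ...
```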

Awesome. What's next in our journey?

Probably just the fact that multilayer perceptrons struggle with sequences. Even if you try to embed things and keep some of that positional encoding within your embeddings, they struggle to model multiple things where the order matters. And in language, the order matters, sometimes. Sometimes it's normal to say gibberish, and knowing which is which is extremely difficult. To solve that, I don't know if we need to go into recurrent neural networks generally, but we definitely need to talk about LSTMs, long short-term memory networks. They are recurrent neural networks to start with, but they added some really important things. For example, when I'm talking, you are kind of consciously predicting what I might be saying; you can hear what I'm saying and you're trying to figure it out as it goes, to understand it. We call that active listening. That's what happens. LSTMs model that a little bit, in that they take the sequences and allow the model to try to predict both going forwards and backwards, instead of just the one way. That bidirectionality is computationally expensive; it takes a lot longer, which is why I think these are not used as much anymore. But it was really novel, and it did help a lot in predicting sequences. It was phenomenal for language modeling.
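A sketch of that forwards-and-backwards reading as a bidirectional LSTM, assuming PyTorch; the sizes are illustrative.

```python
import torch
import torch.nn as nn

# One layer reads the sequence left-to-right, a second right-to-left,
# and their hidden states are concatenated at every position.
lstm = nn.LSTM(input_size=128, hidden_size=256,
               num_layers=1, bidirectional=True, batch_first=True)

x = torch.randn(1, 10, 128)   # (batch, seq_len, embedding_dim)
out, (h, c) = lstm(x)
print(out.shape)              # torch.Size([1, 10, 512]): both directions
```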

Beyond that, there's attention within LSTMs. When attention came out, adding attention to whatever you were doing was phenomenal. It added an extra layer of non-linearity: when the model was going through and trying to search for what word might come next, it not only had all the modeling that we've already talked about, it also had the ability to search, and to search not for that exact thing, but for something similar. That just exploded in popularity because it works; it was phenomenal.

However, the difficulty with LSTMs is that they're computationally expensive. They're slow; it's a lot of math to get through every single layer, let alone trying to predict and stream those predictions in a sequence. You're going at one token per 30 seconds, and that's difficult for models that are the same size as transformers, for example. So it was a lot of really cool stuff that helped us solve, basically, how to get to the next step. It was just computationally expensive and slow: not very practical in use, but important.

Talking about practicality: it's great that it's accurate, and I think accuracy is incredibly practical. But from a customer-experience standpoint it isn't practical, right? Customers don't like waiting a long time for the right answer, because they might be able to find the right answer themselves in that amount of time anyway.

And then from there, do we jump to attention?

At this point, we've gone through the history of the field modeling language, building up, and we've finally reached attention. Attention is the backbone of transformers, which is what LLMs are built on. Attention adds a non-linearity, and it was a breakthrough in how we're able to connect the words. Attention, really quickly, is just creating these dictionaries, key-value pairs of every word to every other word in the token space, and then being able to query them. For each word, we're able to build up the importance of the other words that matter to it. It's in a quadratic space, so it's more than a linear space, but it's a reasonable amount of time to compute these dictionaries, the key-values, and then query them and understand the importance of other words. It's the backbone of what all these different models are doing. And even, as Chris mentioned, we could inject attention into those previous RNNs, LSTMs, et cetera. But it was the backbone of building the transformer model, which came out in the catchy paper "Attention Is All You Need", where essentially all they use is attention.

It's a meme, right? We've seen a whole bunch of other papers afterwards going "no, this is all you need", or "no, you don't need that". But the reason it's a meme is that they took out everything that was supposedly novel about the long short-term memory, the LSTM. They used only attention and feed-forward networks.

Could you give us an example of what that would look like on a very stripped-down thing? What does that dictionary look like, for visualization? Not the encode and decode, just the attention itself. You mentioned a key-value from basically every combination. Do you have to pre-compute every combination within the vocabulary?

You can take a sentence that you're feeding into the attention algorithm: "the cat in the hat", since I used that earlier. Essentially you would have a dictionary where "the" is compared to every other word, "cat", "in", "the", "hat", and it comes up with similarity metrics for the importance of all the other words. Then you do that for "cat", and for "in", and for "the", and for "hat", and it comes up with a dictionary, essentially, of key-value pairs for all the other words, helping you understand the importance of the other words in there. And then the query algorithm that runs essentially helps us predict the next word that comes afterwards, based on how important all of those dictionaries are, adding them up. And all of this happens to happen in quadratic time.

One of the nice, novel things about this is that the query and key vectors, where your query vector is the word that you're looking at in the utterance and your key vector is the key in the dictionary, are not one-hot encoded. We haven't even mentioned this, but that's a vector that is 0, 0, 0, 0, 0, 0, 1, 0, 0, 0. That's how a lot of these things had been represented previously, coming off of the bag of words: the idea that we can model these things with vectors that just capture whether this word appeared or not, and where it appeared. That was positionality. With "Attention Is All You Need", you can immediately see a problem with one-hot encoding: it's very sparse, especially as you're getting into 768 dimensions. You have just one 1 and a whole bunch of zeros, and those zeros don't really matter. So one of the breakthroughs here was using dense vectors for queries and keys, in order to get values that are also dense.

I think one of my favorite visualizations of it is from Jesse Vig; it's called BertViz, on GitHub. I've used it in production environments to show that, hey, our model is not understanding this, because look at the attention: all of the queries are relating to the key of the wrong word. If you look at words with semantic ambiguity, I think the quintessential example is "time flies like an arrow", where "flies" is also a word that could mean multiple small little bugs buzzing around. How do we know that it's not that word? It's because of its position in the sentence that we know it is a verb, and that it's referring to "time" and to "arrow". And we can see that predictably within attention, because that word is determined to be important: that query is determined to be important as it relates to the keys of "time" and "arrow" within query-key-value attention. That's what that dictionary looks like, and that's why it's useful.

And, I guess, the representation of the importance: how do we actually come up with that?

I think it's the dot product; we're comparing the vectors between the query and the key. Dot-product attention is, I'm pretty sure, not where it started, but I think that's where we're at right now. It's the industry standard that everybody uses. It's just multiplying the vectors together: essentially you take the dot product of the two vectors, and that's where we get the comparison and the relative importance values. It's not magic, it's math.
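Here's that math as a minimal sketch of scaled dot-product attention, the form used in "Attention Is All You Need"; the random matrices stand in for learned projections of a five-token sentence.

```python
import numpy as np

# Compare every query against every key with a dot product, softmax the
# scores per query, and use them to take a weighted sum of the values.
def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per query
    return weights @ V                              # weighted sum of values

seq_len, d_k = 5, 8                                 # e.g. "the cat in the hat"
Q = np.random.randn(seq_len, d_k)
K = np.random.randn(seq_len, d_k)
V = np.random.randn(seq_len, d_k)
print(attention(Q, K, V).shape)                     # (5, 8)
```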

Kind of the same thing every time? Okay. And then with that, we've got GPT, the generative pre-trained transformer model. What's so groundbreaking about that?

As opposed to the original transformer, they only use a decoder. The original transformer had attention-based encoders, which changed your embeddings into essentially another embedding, which was then taken by your decoder and used to predict the next word. So it had two networks linked together in the middle in order to produce your next word. The reason this is important is that it goes back to that original idea we talked about, of language as an abstraction. The authors of "Attention Is All You Need" looked at that abstraction and said, hey, can we model that? That's what an encoder is. When you look at models like BERT, it's taking your input and putting it into a new abstract space with lots of non-linear transformations, and that's incredibly useful. And so the GPT models were groundbreaking because they said: we don't need that. We just need the decoder, and we're basically just going to use syntax. The thought process there is that syntax is related to semantics more deeply than linguists are able to conceptualize in an easy-to-understand way. We know that it's true, and we know that it's predictive, especially looking at how good GPT-3 and GPT-4 are. Even looking at the open source stuff: Llama is a decoder-only network, and it rocks.

I have a suspicion that we're going to hit a point later where Google is going to blow everybody out of the water with another T5, another version of that that puts the encoder back in. I don't know how we're going to get to that point, though, because the decoder-only models work so well. And they're faster, they're less computationally expensive, because you're taking probably a third of the model and just throwing it away.
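Mechanically, the decoder-only setup comes down to a causal mask on the attention scores, so each position can only look backwards and the model predicts the next token; a minimal sketch with invented scores:

```python
import numpy as np

seq_len = 5
scores = np.random.randn(seq_len, seq_len)  # raw attention scores

# Mask everything above the diagonal: no peeking at future tokens.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))  # lower-triangular: row i only attends to 0..i
```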

So you mentioned Llama, and I think that might be a good segue from what is essentially about a third of your book. For everybody who wants to go and jump into more details and see actual Python implementations of a lot of what we just covered, the book is called Production LLMs. It's available on manning.com, and I'm pretty sure you're going to love it. So, going back to Llama, let's do a little hall-of-fame rundown of the landmark, important models from the last few years. Where should we start?

I would probably start with the original transformer; they deserve credit. A lot of the people who wrote that paper, Vaswani et al., have gone on to found or co-found companies that are now competing in this space, whether that's Anthropic or Character.ai. Those are the people that created the Transformer, and they're still building on it. I think that's the first one I'd put in the Hall of Fame. What would you say, Matt?

I think part of this question is what is the first LLM versus what is the first Hall of Fame model. Yeah, Transformers, and BERT. BERT is incredibly powerful, and because it's so small, it's not in the LLM space, so it's often overlooked. I think many companies are still looking at these massive LLM models for problems they could solve with a simple BERT model. But because they're only getting into this space now, they immediately think, hey, we have to use an LLM, right? They didn't care in 2017 about what was there.

And I'll go back, I said it before: I love Markov chains. They're amazing, and they're really powerful at what they do well. Even now, a lot of people could just use Markov chains for a lot of the problems they're trying to solve with LLMs. But LLMs do give you that flexibility, just from their massive levels of computation.

If I was to point to a model that I thought was just really powerful, it would be BLOOM, actually. BLOOM was essentially the first massive large language model that was built completely transparently. It was a research project, funded in large part by the French government, and it was built completely in the open. And even though the BLOOM model today isn't seen as a very competitive model, a lot of the open source learnings, a lot of what we have nowadays, is because of what those researchers figured out while they were working on BLOOM. We got amazing libraries out of it, like DeepSpeed and other things. It really boosted the open source community, which has been one of the major driving factors of LLMs today, and probably a large part of why we could even write our book. If the open source community wasn't at where it is today, there wouldn't be much we could really tell people, other than, oh, you've got to go work for Google or Microsoft; how would we know any of it, right?

Yeah, we know about it largely because we've been involved in open source, and we built off of what those scientists did. BigScience.

So that's 2022, right? That's a couple of years now.

Yeah. And then we had Llama that became important, and Llama 2, even more important.

Yeah, and it's largely just because, I don't remember the username of who did it, but whoever put that PR on the original Llama GitHub repo with the torrent link that leaked the weights, that's the hockey stick moment for LLMs. That's what made them available to everybody. That's what enabled Stanford to create Alpaca and show that, oh man, you can make the model better with only 50k responses; you don't need tons and tons of data in order to fine-tune, get very good results, and improve on every metric. Everything since then has been building off of that exact same momentum from whoever leaked that first Llama. And Meta has benefited greatly from it too. They now have a very open, I wouldn't say completely, but a very open attitude towards the space, because they recognize how advantageous it is to have other people building on top of their model and to be considered an industry standard.

Yeah, they've really leaned into it recently, right? And how big was their stock jump?

Right. And all of the underlying architecture: these open source programmers, or even just the NVIDIA programmers, are able to go in, and because they know everything about Llama, they're able to optimize CUDA kernels and everything. So Llama has gotten faster and more proficient. With llama.cpp we're able to run it just on a CPU. There are lots of benefits, because they gave us the architecture. It was leaked, but now they've leaned into it; essentially they've given it to us.

Yeah, we just need them to release the data that they trained on, and then it's completely open, right?

But even on the data, they've told us a lot about what it is. We don't have the exact data, but we know essentially, from RedPajama, what those datasets were built off of, what they were. So we're able to replicate it really closely in the open source community.

Beyond Llama, I don't know if we have a really good list of Hall of Famers, because it's difficult to see what's going to stick around, partially because it's so difficult to evaluate these models, as opposed to BERT. Large BERT had 300 million parameters. You can run stuff to see how good those parameters are, you can hyper-tune them, you can run evaluations to see how each one is performing, and still go relatively fast. When we're getting into the 7 billion parameter range, and the 13 billion, and the 70 billion, it's much more difficult and computationally expensive to evaluate at that level. And we don't even have the ability to describe what all the parameters are doing, so our evaluation metrics are difficult to gauge. You look at MMLU, you look at a lot of the benchmarks that people are running, and they're useful. But ultimately, at this stage, we still have to go download those models and test them against our own use cases to see if they perform better, and that's incredibly time-consuming. We could talk about a lot of the models that have come out, like Capybara, Nous-Hermes, WizardCoder, and they're all great. I don't know which ones are going to be the hall of fame, or the next industry standard, though.

The next industry standard though,

like:

there's definitely some other models that we love and we talk about in our

like:

book, like Falcon, which came out of

like:

the TII and Abu Dabi, right?

like:

Like amazing model.

like:

It's,

like:

Micu.

like:

the latest Falcon is one of the largest open source models and it's

like:

come, under the Apache 2 license.

like:

So it's completely open source.

like:

the very first model that's fully open source.

like:

there's definitely amazing, progress being

like:

made and lots of different models to be paying attention to.

like:

But yeah, one of the biggest ones to pay attention to right now, I think, is OLMo. Not because it's competitive and performant, but because, like Falcon, it is 100% open source. You can see the data they trained on; you can replicate their experiments exactly. That's going to be one of the biggest drivers in this field. You look at a lot of the innovation that's happening, and it's happening on files that people are passing around on torrents. It's random users on Reddit coming up with NTK-aware scaling, and RoPE scaling after that. And they're coming up with more stuff because they have time and they want to help. A lot of these people are experts, and they're just anonymous. That's incredibly important for the space, because we're finding that people who deal with these models and use them 24/7 have skills that the researchers don't necessarily have. That's difficult to admit, being on the research part of it, but it's true.

So that's the one coming from the Allen Institute for AI, right?

Yeah, and I think they're also open-sourcing the actual training code as well. The whole thing.

That's pretty awesome.

So, with that caveat out of the way, hedging your predictions, we don't know what's going to happen tomorrow: do you see any one company getting ahead of the others? GPT-4 is still holding up well against a lot of these models, which makes me think, personally, that they have a few tweaks and hacks they haven't shared, which helps with their multi-billion valuation. Do you see anybody running away from the crowd, or is it too late now? The cat's out of the bag, and the progress is going to come from the mass of people?

I don't know. I know that I was texting with a couple of people the other day, talking about GPT-4 and how it's still relevant. Even with people talking about the performance decrease, it's still relevant, and every week, every model that comes out gets compared against GPT-4. And they're finding that most models are more performant than GPT-4 on certain things, right? It's like comparing Rain Man to an average human and asking what tasks they're good at. If the task is going to McDonald's and ordering your own food, Rain Man is not great. You've just got to find the model that's better. A good example of that with GPT-4 is math: if you need a model to perform calculations for you, that's not it. You have Wolfram Alpha, you have Goat, you have even just vanilla Llama 2, which is better at math than GPT-4, even though they weren't explicitly training on it. I think OpenAI currently has that first-to-market advantage more than anything. That's not to say that it's bad, and it's not to diminish the work that OpenAI has done, because it is phenomenal. But what's really keeping them afloat is being first to market, and the ease of use.

One other question I was holding as you were speaking: you mentioned Mixtral. What is it called, mixture of experts? What's that?

Yeah, Mixtral. It's routing. It's being smart and saying, hey, we don't need a dense feed-forward network for every single thing. Let's have a whole bunch of sparse networks, and, based on the input, route it and tell it which expert is actually going to be the best. It results in much larger models that are smaller on disk and faster to run.
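A toy sketch of that routing idea in PyTorch; the sizes, top-k, and the per-sample loop are all illustrative simplifications rather than how Mixtral is actually implemented.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, d_model)
        gates = self.router(x).softmax(dim=-1)
        topv, topi = gates.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):              # run only the chosen experts
            for k in range(self.top_k):
                e = topi[b, k].item()
                out[b] += topv[b, k] * self.experts[e](x[b])
        return out

print(TinyMoE()(torch.randn(2, 64)).shape)       # torch.Size([2, 64])
```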

Is that more similar to how the human brain works? Because the brain is obviously not fully connected; it's got different regions and things like that.

I would love to appeal to that authority, but I don't know. You look at MRIs and you can see, oh man, this portion is lighting up when you're experiencing that emotion or seeing that input. But we don't really have a great mapping of every person's brain.

I think the connection between a neural net and actual neurons was lost a long time ago, right? How does the human brain work, and how does it really compare to modern-day models? It's hard to really make that argument; we're still learning about how we learn. And as we do, as the neuroscience field advances, it ultimately leads to advances in the AI space, and vice versa. There are definitely connections there. But yeah, as far as your question goes, I think it's anybody's guess.

I think this is a perfect note to end on: a little bit of suspense. We're going to have to get you back at some point, when you've finished your book, to talk a little more about the actual technical problems and challenges; we haven't really touched on any of that yet. But today I certainly learned a lot from you, and I hope a lot of our listeners did as well. It was an absolute pleasure to meet you both. Thank you so much, and see you next time.
