Decoding the Past, Present, and Future of Language Models
Delve into the realm of language models with a comprehensive exploration spanning from the foundational Bag of Words approach to the revolutionary technologies of Transformers and GPT. This script not only unpacks the technical evolution and mathematical underpinnings of natural language processing but also projects the future trajectory of these models. It highlights expert insights on the societal impacts, the convergence of artificial intelligence with human cognition, and the ethical considerations of AI progression. Moreover, the discussion extends to the significance of open-source efforts in shaping this dynamic field. Aiming to provide a profound understanding, this guide navigates through the complex landscape of AI, language models, and their implications on future technology and society.
00:00 Welcome to HockeyStick: Unveiling the Power of LLMs
01:08 Meet the Experts: From Meetups to Authorship
03:16 The Hockey Stick Moment for LLMs: Breakthroughs and Realizations
07:48 Coding with LLMs: The New Frontier for Developers
15:39 The Pitfalls and Limitations of LLMs in Practice
21:43 Building vs. Buying LLMs: Navigating the Trade-offs
32:43 The Cost of Crafting Your Own LLM: Insights and Advice
42:48 Deciphering LLMs: A Crash Course in Language Features
50:44 Defining Language: A Philosophical Dive
51:33 Exploring the Essence of Language and Communication
54:31 Diving into Language Models and Their Evolution
55:08 From Bag of Words to N-Grams: The Evolution of Language Understanding
58:35 The Leap to Bayesian Techniques and Markov Chains
01:01:24 The Breakthrough of Continuous Bag of Words and Embeddings
01:09:43 Unveiling the Power of Multilayer Perceptrons
01:15:08 The Revolution of Attention Mechanisms and Transformers
01:26:37 The Hall of Fame: Landmark Models in the LLM Landscape
01:35:06 Predicting the Future of Language Models and OpenAI's Position
01:38:48 Concluding Thoughts and the Future of AI Research
I'm Miko Pawlikowski, and this is HockeyStick.
Speaker:LLMs, or Large Language Models, are taking the world by storm.
Speaker:This breakthrough artificial intelligence technology promises to fundamentally
Speaker:reshape the way we work with computers.
Speaker:Over the last year, we've witnessed its Hockey Stick moment, and as
Speaker:of early 2024, we're firmly in the Cambrian explosion phase.
Speaker:Today, we're taking a deep dive into how these models came from humble beginnings to
Speaker:making people scared of imminent Skynet.
Speaker:I'm joined by two experts, Chris Brousseau, staff machine learning
Speaker:engineer at JP Morgan, and Matthew Sharp, MLOps engineer at LTK, the
Speaker:authors of "Production LLMs" currently available in early access at manning.com.
Speaker:In this conversation, we'll cover the intricacies of human language
Speaker:and how machines can understand it.
Speaker:We'll give you the vocab to sound smart at the next family gathering, and discuss the
Speaker:various mathematical ideas and models ultimately leading to LLMs, as well as
Speaker:some noteworthy examples beyond ChatGPT.
Speaker:Welcome to this episode and please enjoy.
Speaker:Where should we start?
Speaker:How did you guys meet?
Speaker:We happen to both live in Utah, and we
Speaker:actually met at a meetup.
Speaker:It was actually an MLOps meetup, the primary one where we met.
Speaker:It happens once a month, and we'd get together, and so that's our origin story.
Speaker:We became friends through there, started helping each other with
Speaker:content creation. Chris was starting a YouTube channel, I write on
Speaker:LinkedIn, just giving each other feedback and helping each other out.
Speaker:It was especially helpful because I was trying to figure out how
Speaker:best to present a lot of the material that's in our book now:
Speaker:how do you explain a transformer model?
Speaker:And Matt was fantastic about helping me find my voice on YouTube.
Speaker:Okay, so going from meeting someone at a meetup to committing
Speaker:to spending a couple of years working on a book with someone:
Speaker:that's a little bit of a difference.
Speaker:Was there any particular moment where it just clicked?
Speaker:"Oh, we need to write a book".
Speaker:How did you come up with the idea?
Speaker:I was approached, and I thought, I would love to write a book, but I don't
Speaker:know a lot about that process.
Speaker:And obviously, I didn't really have an authorship voice.
Speaker:I am not experienced in content creation.
Speaker:And while I was going through the process of talking with some different
Speaker:publishers, Matt approached me and said: "Hey, I was a technical reviewer
Speaker:on Fundamentals of Data Engineering by Joe Reis and Matt Housley".
Speaker:And so he had experience and he had, subject matter expertise, and he was
Speaker:giving me some advice, and I said, "You know what, why don't you just
Speaker:come on as a coauthor? You obviously could help a lot here, and I need
Speaker:it, so let's just do it together".
Speaker:Yeah, I think that it worked out really well, because Chris has that background in
Speaker:linguistics, he understands the natural language processing side better than
Speaker:anyone else I've met in person, and I was coming more from the MLOps side:
Speaker:how do we actually deploy these things?
Speaker:And so I think it's really rounded out our book better than anything else I'm seeing
Speaker:out there that you could buy and read.
Speaker:Getting that diverse perspective, I think, really helps our book out.
Speaker:I was very excited when you said 'yes' to coming onto this, because in most
Speaker:people's minds it started sometime early last year with ChatGPT.
Speaker:All of a sudden, everybody started talking about large language
Speaker:models, and some people started worrying about, impending doom and
Speaker:robot apocalypse, and all of that.
Speaker:But from the perspective of someone who's worked with this for the best
Speaker:part of a decade now, I'm wondering:
Speaker:what was the point when you realized that these LLMs are really onto
Speaker:something, and they're moving from a demo to an actual legitimate technology
Speaker:that's going to change things?
Speaker:What was the hockey stick moment for LLMs?
Speaker:Oh, boy.
Speaker:For me, without a doubt, that was the release of T5,
Speaker:and looking at Google's paper about the text-to-text transformer. That really set
Speaker:the groundwork for prompting, right?
Speaker:They had a whole bunch of different tasks where you didn't have to change
Speaker:anything other than some statement
Speaker:for the model to do that task, and then a colon, and then whatever
Speaker:your input was going to be anyway.
Speaker:That was groundbreaking to me.
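As an aside for readers: the text-to-text format described here is just a task prefix, a colon, and the input. A toy sketch of that interface (the task prefixes below follow conventions from the T5 paper; no model call is made):

```python
def t5_prompt(task: str, text: str) -> str:
    """Build a T5-style text-to-text input: task prefix, a colon, then the text."""
    return f"{task}: {text}"

# One interface for every task; only the prefix changes.
print(t5_prompt("summarize", "state authorities dispatched emergency crews ..."))
print(t5_prompt("translate English to German", "That is good."))
print(t5_prompt("cola sentence", "The course is jumping well."))
```

In the actual model, the same sequence-to-sequence network handles all of these, which is what made the single-interface design so flexible.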
Speaker:I had been messing around with GPT-2.
Speaker:I'd been playing with that and trying to shoehorn it into a
Speaker:product where I was working.
Speaker:T5 did everything that we were trying to do with GPT-2, and it was incredibly
Speaker:flexible, it was easy to fine-tune, and for me, that was the hockey stick moment
Speaker:of "oh wow, no, they're really cooking".
Speaker:When was that?
Speaker:For anybody who hasn't heard of T5?
Speaker:I think it was 2019. Yeah, "Exploring the Limits of Transfer Learning
Speaker:with a Unified Text-to-Text Transformer" was October 2019.
Speaker:it came out in October.
Speaker:I think I picked it up in November-December of 2019.
Speaker:Yeah, I think for my hockey stick moment: I was in the industry,
Speaker:been paying attention, obviously GPT-2 coming around, T5, etc.
Speaker:But I wasn't really seeing the adoption that someone who's working in MLOps
Speaker:cares more about. I was seeing, these models can do really cool things,
Speaker:but people weren't caring about them.
Speaker:Sam Altman even said, "we didn't think GPT-3
Speaker:would be that big of a success.
Speaker:We thought that would happen once GPT-4 came out".
Speaker:But I just remember, January 2023.
Speaker:ChatGPT's been out a month.
Speaker:It's still essentially in beta.
Speaker:They just released it to get feedback, and to start collecting data,
Speaker:to start improving their model.
Speaker:But it blew up, right?
Speaker:I just remember being at a church function, and this guy sitting
Speaker:across the table from me who has no idea anything about AI, right?
Speaker:I was stuck at this table for an hour, and all he could talk about was GPT-3.
Speaker:he was obsessed with it.
Speaker:I'm like, oh, wow.
Speaker:Even people who don't know anything about machine learning or AI or the
Speaker:industry were really going gung ho. And his wife was an English teacher.
Speaker:She was really scared of it, and was like, "how are we gonna help kids
Speaker:learn how to write and read when they can just go online and now cheat
Speaker:and write these things and stuff".
Speaker:The very beginning of what everyone's had conversations about now.
Speaker:He talked about how his brother-in-law owned a website that made fake
Speaker:articles, you can think like The Onion. And so once it came out, in that month, like
Speaker:I said, ChatGPT still wasn't a product yet, and anyone who's been following
Speaker:it knows a lot of those demos just shut down and then never came back up.
Speaker:His brother-in-law ended up firing like a hundred writers, because he's
Speaker:like, "Oh, ChatGPT can make these funny fake articles and we're good, right?"
Speaker:That was my hockey stick moment of "okay, we really are changing,
Speaker:when some random guy at church is talking about it all the time".
Speaker:Yeah, I love that example.
Speaker:But even for people who are in tech, who weren't directly following that
Speaker:very closely, that was a scary moment.
Speaker:I remember when I first used Copilot, I was like, what, it just does that?
Speaker:And three times out of four, it would actually work.
Speaker:That was a scary moment.
Speaker:It reverberated through a lot of levels of society, including our own.
Speaker:And I think in many ways, technology and writing code might be the easiest
Speaker:use case for this kind of model, right?
Speaker:Do you agree with that?
Speaker:I don't know if I completely agree with it, because code is incredibly
Speaker:syntactically dependent, right?
Speaker:Every developer who's worked with JavaScript or C++ and then moves
Speaker:to Python, they feel it, right?
Speaker:That's one of the biggest complaints: "I hate Python syntax".
Speaker:"I hate that white space matters". It's a little bit more complex than just
Speaker:repeating whatever natural language happened, but you're absolutely right,
Speaker:that is one of the best use cases so far.
Speaker:Because it's better structured than just spoken language? Or is there any
Speaker:other reason that makes it so well suited for that particular application?
Speaker:Programming languages are not real languages, right?
Speaker:One of the things that makes it simultaneously very well and ill-suited
Speaker:for it is how much gets repeated. You use the exact same words,
Speaker:the exact same tokens, to define every function that you make, but then the
Speaker:function's name can be whatever you want.
Speaker:And so using the exact same tokens is awesome.
Speaker:That provides landmarks for the probability as it's
Speaker:going through all of this.
Speaker:But then that input to just say whatever you want, and put it in camel
Speaker:case or snake case or whatever, tons of different formatting for functions,
Speaker:it makes it a little bit more difficult.
Speaker:Especially while you're trying to tokenize that.
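To make the camel-case versus snake-case point concrete, here's a crude stand-in for a subword tokenizer (a real BPE tokenizer is more involved; this regex splitter is only illustrative):

```python
import re

def toy_tokenize(identifier: str) -> list[str]:
    """Split an identifier on underscores and case boundaries, roughly the way
    a subword tokenizer fragments function names it hasn't memorized."""
    tokens = []
    for part in identifier.split("_"):
        # A capitalized run, a lowercase run, or a digit run each becomes a token.
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return tokens

print(toy_tokenize("parse_user_input"))  # ['parse', 'user', 'input']
print(toy_tokenize("parseUserInput"))    # ['parse', 'User', 'Input']
```

The two spellings of the same function name fragment into different token sequences, which is part of why naming conventions matter to a code model.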
Speaker:One of the big benefits with code is the amount of data we have around code.
Speaker:Lots of people are writing code.
Speaker:They all have very similar ideas of what they're trying to do, of
Speaker:what they're trying to architect, of what they're trying to design.
Speaker:And so we're not necessarily worrying about hallucinations or
Speaker:fake news or people disagreeing or other things like that.
Speaker:There's just a lot of data that all agrees with each other and
Speaker:pushes in the same direction.
Speaker:It makes it good.
Speaker:There's obviously some negatives of just assuming some of these LLMs writing
Speaker:code are going to do things well, but I think Chris highlighted that already.
Speaker:It's actually really similar to how regular languages work.
Speaker:If we have more Python data, like Matt's saying, it's going to do better at Python.
Speaker:And that can create a little bit of a positive feedback loop with LLMs, where
Speaker:a lot of people want to get into Python, and they're very good at it, but then
Speaker:when you look at emerging languages like Mojo, for example, it's really difficult
Speaker:to find that data, and so LLMs are worse at it, similar to natural languages
Speaker:that have a lower number of speakers, a lower presence on the internet.
Speaker:So is the solution to use an LLM to generate a lot of Mojo and make it
Speaker:a significant percentage of GitHub?
Speaker:That'd be fun, dude.
Speaker:I think there are some problems with synthetic data that can lead
Speaker:to stuff like model collapse.
Speaker:I don't know if we're going to see that in the code space, though.
Speaker:I think we could see that in natural language.
Speaker:So that might be a valid solution.
like:Okay.
Speaker:The date is 13 February, the day before Valentine's Day 2024.
Speaker:I'm going to ask you for a wild prediction.
Speaker:Where do you see that going?
Speaker:Should all kinds of, or maybe any subset of, programmers who produce code as a
Speaker:job, should they start at least worrying?
Speaker:Is that something that's going to decrease the pool of available jobs?
Speaker:No, I don't think it's really going to impact the amount of work.
Speaker:I just think about my job, and even when I'm in very technical roles, and I'm
Speaker:spending 50% of my time on the keyboard, still, it feels like a majority of the
Speaker:work is just communicating with stakeholders, understanding exactly what
Speaker:the problems are, technical writing, design docs, really understanding at
Speaker:a high level what you want to build.
Speaker:To be fair, programmers have been automating the 'writing the
Speaker:code' portion forever, right?
Speaker:From the beginning.
Speaker:Yeah, with massive amounts of scripts and configs that they use.
Speaker:And that's why they love Vim or Emacs still, right?
Speaker:It's because they have it configured just right.
Speaker:And they can move really quickly, because it provides a lot of that
Speaker:automation for them already. This is just helping junior engineers
Speaker:have all that configuration and setup really quickly, right?
Speaker:It mostly will just make our jobs a little bit easier. It doesn't remove the need to
Speaker:really understand the engineering aspect, the architecture aspect, the design
Speaker:aspect that is still involved with coding.
like:Oh, yeah.
Speaker:This is why we love comparing LLMs to the printing press,
Speaker:the Johannes Gutenberg one.
Speaker:Because did that destroy the writing industry?
Speaker:All it did was destroy the monopoly that certain organizations
Speaker:had on publishing books.
Speaker:Before, you had to get a scribe, and you had to pay the scribe, and you had to
Speaker:have access to scribes. You couldn't just walk up to a printing press and
Speaker:hit it, and then boom, you have a book.
Speaker:You have to have knowledge. You have to have an idea.
Speaker:The printing press just gives you a lower barrier to entry,
Speaker:which is what we love, right?
Speaker:For coding, I think Matt is exactly right, that it's a lower barrier to
Speaker:entry for junior engineers to be able to produce significantly better work.
Speaker:And in some ways it actually accelerates learning, because when you copy and paste what
Speaker:an LLM gave you and it doesn't work, you have to go figure it out, right?
Speaker:Along with the junior engineers, it also helps speed up senior engineers, and
Speaker:staff engineers and principal engineers.
Speaker:It's good, and lowers the barrier for the entire industry. We like that.
Speaker:Yeah.
Speaker:I've lately been spending lots of time writing chapter 10 of our book,
Speaker:and in chapter 10, we actually go through a project where we help you
Speaker:build your own copilot, and we build the VS Code extension to get it in,
Speaker:if you want to be running your own LLM on your own computer with your own data,
Speaker:so that way, you can get your own things.
Speaker:We walk through all the steps to do that.
Speaker:And in some aspects, it's interesting, cause sometimes
Speaker:adding an extra feature made the model work, right?
Speaker:There's still just so much to learn about it.
Speaker:Ultimately, it comes down to your data, right?
Speaker:How good is your coding data?
Speaker:That's really how well the copilot works, right?
Speaker:SQL is one of the most repetitive of all of the programming languages,
Speaker:but true skill with SQL does not involve being good at SQL.
Speaker:It involves knowing the data, right?
Speaker:It's knowing which tables to query, how to merge them, how window functions work, all of
Speaker:that stuff. Knowing exactly what you need to be looking at is the true skill in SQL.
Speaker:And we're hopefully getting to a point where we can help the
Speaker:model know the data, right?
Speaker:We can give it some sort of context for the data that it's going to be looking
Speaker:at, so that it can generate good SQL.
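One common way to "help the model know the data" is simply to paste the schema into the prompt. A minimal sketch (the prompt layout and table names here are made up for illustration, not any specific product's format):

```python
def sql_prompt(schemas: dict[str, list[str]], question: str) -> str:
    """Assemble a code-generation prompt that includes table schemas as context."""
    lines = ["You write SQL. Available tables:"]
    for table, columns in schemas.items():
        lines.append(f"  {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("SQL:")
    return "\n".join(lines)

prompt = sql_prompt(
    {"orders": ["id", "user_id", "total", "created_at"],
     "users": ["id", "email", "region"]},
    "Total revenue per region last month?",
)
print(prompt)
```

The model never sees your database directly; whatever it "knows" about the tables has to arrive through context like this.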
Speaker:That's a really good point.
Speaker:I've actually had lots of mentees who are trying to learn SQL for the first time.
Speaker:I said, "just use ChatGPT". Generating SQL is actually something it's really
Speaker:good at. You don't need GPT-4, like even GPT-3, like even GPT-2, it's not
Speaker:hard to generate really good SQL syntax.
Speaker:Cause it's so simple, it follows a very similar structure.
Speaker:But ultimately, you can have it write the SQL, but you're going to have to
Speaker:go back and figure out how to connect all the pieces and understand your
Speaker:database and understand your data.
Speaker:That's a perfect example: understanding how to write the
Speaker:code is only half the problem.
Speaker:Understanding how to integrate it is really the bigger problem.
Speaker:What's the most terrible use case that people are currently
Speaker:trying to use LLMs for?
Speaker:What does LLM in general, or LLMs, what do they suck at the most?
Speaker:I'm going to say they suck at sequence prediction, which sounds so off,
Speaker:because that's what they're made for. But one of the things that I'm seeing
Speaker:people do is try and automate entire workflows with LLMs. They're trying
Speaker:to get the LLM to just do the whole workflow, and they suck at that.
Speaker:They need all of this stuff to help it.
Speaker:They need tools, they need RAG, they need specific fine-tuning landmarks,
Speaker:and they need few-shot prompting. They need all sorts of stuff to make
Speaker:it work, and then it's still up in the air about whether or not it will
Speaker:do the right task in the right order.
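Of the scaffolding listed here, few-shot prompting is the simplest to show: you prepend worked input/output pairs so the model imitates the pattern. A sketch (this layout is one common convention, not a fixed standard):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Prepend worked examples, then leave the final output blank for the model."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    [("great movie!", "positive"), ("total waste of time", "negative")],
    "surprisingly fun",
)
print(prompt)
```

Tools and RAG work on the same principle: they shape what lands in the context window, because that context is the only lever you have over a frozen model.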
Speaker:Yeah, I was thinking, I don't know how much I'm seeing this now.
Speaker:But three months, six months ago, I was hearing a hundred horror stories
Speaker:about, essentially, CEOs being like, "we need LLMs", and like they're magic,
Speaker:they can do anything. And so it didn't matter what the problem was: "oh, we need
Speaker:to do outlier detection using LLMs".
Speaker:No,
Speaker:use stats for that.
Speaker:Yeah, outlier detection is really a statistical problem.
Speaker:It's really a data and math problem.
Speaker:LLMs are good at natural language.
Speaker:And so when we can solve a problem using words and communication,
Speaker:that's when LLMs can get in.
Speaker:But problems like outlier detection or weather prediction or these
Speaker:other things, we have algorithms for.
Speaker:Stock market prediction, Super Bowl prediction,
Speaker:all these things, we have better ways to make predictions.
Speaker:And it's called math, right?
Speaker:Fourier transforms, other machine learning algorithms, other things like that.
Speaker:LLMs are not good at doing those things, cause we don't talk
Speaker:about them in natural language.
Speaker:We've invented other languages, like math, just to describe them.
Speaker:And that's why they're not good.
Speaker:We can make tools, you can build functions for an LLM to use to do Fourier
Speaker:transforms and whatever else, right?
Speaker:But getting the LLM to know that it needs to do that is really difficult.
Speaker:Probably just as difficult as explaining what the Fourier transform
Speaker:is to an LLM within your training data, to get it to be able to replicate it.
Speaker:This is one thing that makes it almost miraculous when stuff does
Speaker:work, and that's that feeling that we're chasing right now, and that's
Speaker:the replicability that we're trying to help people get to in the book.
Speaker:How do you actually do it, and how do you make sure that your scope
Speaker:is small enough that it will work repeatedly, and you can build a
Speaker:product off of it? That's difficult.
Speaker:I'm a big fan of chess.
Speaker:And since ChatGPT came out, lots of people have been making memes, or just,
Speaker:"Hey, I'll play ChatGPT in chess". And ChatGPT can play chess, because we
Speaker:can talk about it in language, right?
Speaker:Like e4, move the pawn, or knight to g6, whatever it is.
Speaker:We have language for it, but ChatGPT has no idea.
Speaker:It has no idea of the model behind those letter-number combinations.
Speaker:All it knows is that there are certain things it can do, right?
Speaker:It writes words. And so when they do this, in these videos or
Speaker:memes, they just let ChatGPT do whatever it says, right?
Speaker:It just magically creates a knight out of nowhere, and magically will take
Speaker:its own pieces as it moves its pieces around. It's always pretty funny.
Speaker:And even though it's cheating the entire way, it almost always loses, right?
Speaker:Cause it doesn't have an understanding of chess, it doesn't
Speaker:have that model underneath it.
Speaker:Sure, we can talk about it in language, but not really, right?
Speaker:So we still have better ways to play chess: AlphaZero, et cetera.
Speaker:Stockfish. There are engines out there that play chess really well.
Speaker:And we don't need to make LLMs good at chess, but that's a very good example
Speaker:of one of the things it's not good at.
Speaker:I've seen someone on Twitter who said, "I'm gonna give an LLM $1000 or
Speaker:whatever initial amount, and I'm gonna ask it how to best invest it".
Speaker:I didn't follow where it went.
Speaker:But I think a lot of people had the same idea:
Speaker:this is some kind of genius system,
Speaker:I'm just gonna be its flesh-and-bones agent in the real world,
Speaker:and hope for the best.
Speaker:So I think that kind of goes back to your chess thing.
Speaker:So excuse me for that, but I have to ask you: the AGI,
Speaker:Artificial General Intelligence.
Speaker:Any chance of that happening anytime soon?
Speaker:What's your prediction?
Speaker:Not with our current systems.
Speaker:No, I don't think AGI is ever going to come out of quadratic
Speaker:equations, like not a single chance.
Speaker:Maybe if there are better drop-in sub-quadratic replacements, stuff
Speaker:like Hyena. I've tested that out.
Speaker:I think it's really cool.
Speaker:But the fact that attention, the query-key-value attention,
Speaker:ultimately generates complex numbers,
Speaker:I think that is a little too much for AGI at the moment.
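The "quadratic" being referred to is the attention score matrix: for a sequence of n tokens, query-key-value attention builds an n-by-n table of scores. A minimal NumPy sketch (real implementations add masking, multiple heads, and learned projections):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention. The scores matrix is (n, n),
    which is where the quadratic cost in sequence length comes from."""
    scores = q @ k.T / np.sqrt(q.shape[-1])         # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                              # shape (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)  # (8, 4)
```

Doubling the sequence length quadruples the score matrix, which is exactly what sub-quadratic alternatives like the Hyena family aim to avoid.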
Speaker:So you're not one of those people who secretly hope that OpenAI has
Speaker:something they're gonna release soon?
Speaker:I don't think they have it, right?
Speaker:I'll be hopeful, sure.
Speaker:If it comes out, that's great.
Speaker:Yeah, I'm of the same mind as Chris.
Speaker:I hope they keep pursuing it.
Speaker:We've gotten major breakthroughs from what they pursued.
Speaker:It's very possible AGI will happen in my lifetime, I'm still pretty young. We
Speaker:keep on making advances really quickly. But are we relatively close to it?
Speaker:Probably not.
like:No
Speaker:Oh, the thing about progress, though, is that it's very rarely linear. It
Speaker:tends to have a very weird curve.
Speaker:So that's why all the predictions are so funny. But hey, I had to ask you anyway.
like:No, I think it's a great question.
Speaker:Okay, let's delve a little bit into a portion of your book.
Speaker:It's basically describing the two options that you have today:
Speaker:you can either go and pay some money to OpenAI, maybe Google, or
Speaker:somebody else, or you can build.
Speaker:So you've got buy versus build.
Speaker:Could you talk to me a little bit about how someone would decide
Speaker:about this, as of February 13, 2024?
Speaker:What are the things to consider, and what are the weights that
Speaker:you would put in, and biases?
Speaker:The basic consideration is just your use case, right?
Speaker:If you just want to test something out, you're a student and you don't have a
Speaker:lot of budget, and you want something up and running so that you have LLM
Speaker:experience, I would say just shell out for that: ChatGPT Plus, or buy Anthropic,
Speaker:or Google Bard, which has a fantastic API, or I guess Gemini now. Just do it.
Speaker:It's not that big of a thing.
Speaker:If the product that you're trying to ship is inconsequential, and you
Speaker:don't need it to be right every time, you just want to sprinkle the
Speaker:AI pixie dust on it, just buy it.
Speaker:If your use case goes deeper than that, though, if you want to be able to build
Speaker:your own, if you need to make sure that it says the right things all the time,
Speaker:if you need it to behave a little bit more deterministically: there have been
Speaker:probably a thousand case studies in the last year of people building products on
Speaker:top of ChatGPT, and then OpenAI rolling out an update that changes how ChatGPT
Speaker:behaves, and they don't have any way to measure all of the different
Speaker:ways that it will change it, right?
Speaker:There are 175 billion parameters in GPT-3 alone. They don't know it's going
Speaker:to break your program down the line.
Speaker:They're just going to update it for what they consider to be better.
Speaker:And those programs break constantly.
Speaker:That doesn't mean you can't fix them.
Speaker:It's just a much bigger problem of maintenance than I think a lot of
Speaker:people are expecting going into it.
Speaker:So if you want to have to maintain it less, build your own.
Speaker:Yeah, I think the other aspect is, you want that control, right?
Speaker:There's lots of examples of companies who essentially built a small shell
Speaker:around ChatGPT that did something unique.
Speaker:And then, months down the line, now ChatGPT just does
Speaker:that out of the gate, right?
Speaker:Their value proposition just completely disappeared.
Speaker:And that's because they didn't have control over the model.
Speaker:They didn't have control over what it did. It's just interesting, right?
Speaker:Because I say these things, and things have changed over time.
Speaker:But when ChatGPT first came out, it was free, it was a demo, and they were
Speaker:specifically doing it to collect data.
Speaker:And that's what they did, they used collected data to improve their models.
Speaker:And that's what they continued to do for a while, right?
Speaker:And now they've gone back and forth.
Speaker:It's in the terms of service, right?
Speaker:If you want them to save your chat, so that you can return to
Speaker:it and ask more questions, they get to train off of your data.
Speaker:So if you want to put anything private or sensitive in there, like,
Speaker:it's over, you've just leaked it.
Speaker:They're back and forth about what data they're collecting, what data they're
Speaker:not collecting, and if you're an enterprise customer, like maybe you
Speaker:can make certain rules and things like that, and oftentimes they won't. It's
Speaker:a minefield for how people are using it, and so it's just something important
Speaker:to take into consideration. If your LLM is doing something magical
Speaker:that's really core to your business, that is really driving customers,
Speaker:you want to control that.
Speaker:You want to make sure that the model is working exactly as intended,
Speaker:you're not getting updates randomly that break your application.
Speaker:You're also controlling the data flow, you're making sure that you're not
Speaker:accidentally training your competitor's model, and other things like that.
Speaker:And there's just lots of aspects where it's just important to
Speaker:make sure that you own it.
Speaker:And, no, that's not necessarily everyone's concern, right?
Speaker:If you're a student, or you're just doing some side project or anything, there's
Speaker:lots of APIs out there that are very cheap, that can get you up and running.
Speaker:There are literally hundreds of Hugging Face Spaces that are free APIs
Speaker:that have LLMs running behind them, and you can just hit
Speaker:them whenever you want, right?
Speaker:Unless you're queuing behind a thousand other people.
Speaker:Yeah, exactly.
Speaker:I liked the example you gave in the book. I think the people at Latitude, the Dungeons
Speaker:& Dragons people, would agree with a lot of what you're saying now. But can you tell
Speaker:the story of what happened with them?
Speaker:Latitude is a local company that was here in Utah.
Speaker:It was put together by two guys from BYU.
Speaker:GPT-2 came out several years ago.
Speaker:They're like, "Oh, this is mind-boggling.
Speaker:Let's build a game off of it!"
Speaker:And what they came up with was like a dungeon crawler, a text-based
Speaker:game. It was really neat, because it would just generate an
Speaker:infinite amount of opportunities.
Speaker:And so it created this 'choose your own adventure'.
Speaker:It got relatively big in the space, and lots of people enjoyed playing it.
Speaker:Things were going really well, and then OpenAI's GPT-3 came out. They offered it to
Speaker:them: hey, we have this new model, it's a lot better, why don't you try it?
Speaker:They played around with it, and, "oh yeah, it's much more descriptive,
Speaker:it's much more interesting, it's really great". There was a lot of excitement
Speaker:around it. However, it turned out that the model itself had a propensity
Speaker:to generate smut, and it got really concerning. People would write,
Speaker:"I'm an eight year old girl", and then the model would complete it saying,
Speaker:"....and I'm wearing a skimpy outfit".
Speaker:And oh, whoa, like the player didn't want that, but the model generated it.
Speaker:There became this big feud between OpenAI and Latitude about creating filters.
Speaker:"Hey, we don't want your players doing that.
Speaker:We don't like that".
Speaker:And Latitude's like, "okay, we'll create some filters", and things like that.
Speaker:And it devolved really quickly.
Speaker:Latitude being a very early startup, not necessarily knowing everything
Speaker:they were doing, they built a very shaky filtering system, and then
Speaker:OpenAI was like, "that's not good enough".
Speaker:So then they started banning players. And so eventually we got to this
Speaker:territory where players - paying customers - would be playing a game, the
Speaker:model would randomly generate something that the filtering system didn't
Speaker:like, and then they would get banned.
Speaker:Cause it's like the game just did it itself.
Speaker:It was a very complicated time, and there was lots of back and
Speaker:forth between Latitude, who's a small company, and OpenAI.
Speaker:There's lots of 'he said, they said' going on. But ultimately, it's just this
Speaker:position where Latitude had this game that was completely dependent on OpenAI's
Speaker:model to generate good output, and it really caused a lot of drama between
Speaker:the players and Latitude and OpenAI in the background. And that is a critical
Speaker:example: the LLM was very critical to their business. If they owned it, they
Speaker:could have controlled it. They could have made sure, from the model aspect,
Speaker:they could have trained the model to make sure it didn't do any of those things.
Speaker:And then they would never need to play the little blame game, right?
Speaker:Nobody likes to play that game:
Speaker:whose fault is it that the model is generating bad stuff?
like:Is it the player who's prompting it?
like:Is it Latitude who has some systems for tokenizing and preparing player
like:output before it goes to OpenAI?
like:Is it OpenAI because their model is generating that?
like:Is it Latitude for post-processing the content from OpenAI before
like:they serve it to the player?
like:I don't even know if it really matters who's to blame.
like:it's just a sucky game to play.
like:And that's the ultimate example of why you might want to consider
like:build versus buy: if you buy from any provider, and we're picking on OpenAI here
like:because they're a big player, but you buy from Anthropic, you buy from the guys down
like:the street, the startup that just barely came up and they're offering for half
like:the price of whatever, buy from anybody,
like:and you will eventually have to play that blame game.
like:We had another example in there of some lawyers who generated cases that didn't
like:exist. They asked ChatGPT about cases and it came up with a perfect response.
like:A little too perfect.
like:It hallucinated stuff that didn't exist.
like:and, is it ChatGPT's fault?
like:Is it OpenAI's fault for, allowing their model to make
like:stuff up and behave dishonestly?
like:Or is it the lawyer's fault for not checking it?
like:who cares?
like:the problem is that it's not locked down.
like:It's non-deterministic.
like:Yeah, in a way, as I was reading the chapter on that, it makes
like:me think of using a machine to maybe do some farm, work.
like:Let's say that you're plowing a field and you're using a
like:horse versus a machine, right?
like:A machine might break, but in a predictable way.
like:And if you've got a mechanic around, they'll come and fix it.
like:A horse can get scared, or it has a bad day, or it can be moody.
like:And it can come up with something new.
like:So you always have to be careful with that.
like:is that an accurate feeling for someone who's working with these LLMs day-to-day?
like:You work with some kind of animal?
like:One of the most annoying things is even if you set the seed of it, so
like:the random generator is going to be the same every single time, you
like:can still give it the same prompt and get something different out.
like:The truly awesome thing about LLMs is the number of non-linear activations
like:that are going through the model, right?
like:It's creating incredible, non-linear jumps throughout that dimensional
like:space that the embeddings are in.
like:you just can't really predict it.
like:It is a little bit like an animal.
like:The fact that we can prompt engineer at all,
like:it's a little bit telling of where we are, right?
like:Cause like prompt engineering, you can change the spaces, the white space
like:inside of your prompt and it can end up giving you a completely different result.
like:we're still in a very interesting area, where we're trying to create
like:better ways to communicate with the LLM and get predictable outputs.
like:But the fact that we can do that at all is
like:a bit of a miracle, right?
like:you can't do that with a human.
like:a human isn't going to be tricked into saying something different.
like:humans are tricked all the time, but not necessarily in the
like:same way that we do with LLMs.
like:it's a very interesting world we are in, and a lot of people are having
like:that horse versus machine experience.
like:let's talk about the cost a little bit.
like:you mentioned that it's super cheap to pay some big company to use their thing.
like:let's focus for a minute on the cost of actually building your own LLM.
like:if I wanted to build one of these foundational models,
like:Let's say that I take one of those 75TB corpora from the internet and I'm
like:feeling particularly GPU poor that day.
like:How much money do I need to have in my little piggy bank to get something useful?
like:That's difficult, man.
like:because you're either paying for a GPU, right?
like:Or a suite of GPUs in order to parallelize it so that you can ingest
like:that over a short period of time.
like:Or technically with a lot of this stuff, you can load it onto a [Geforce] 3090,
like:I've done this personally, you can train in FP16, you can train up to, about, 13
like:billion parameters pretty effectively, and pretty cheaply, on a 3090.
like:You have to be a little bit smart about your data loading; you have to make
like:sure you're streaming stuff; you have to pay for the data storage anyway. It's
like:incredibly slow: you have to do gradient checkpointing, you have to do
like:gradient accumulation steps, which slow down the training even more. I trained a
like:little bit bigger than that, it was about a 20 billion parameter model on my 3090,
like:but what I don't generally talk about is it took a year of just running to do that.
like:it was horrendous and that all culminated in a company giving me a
like:cease and desist, so I couldn't even release it, so you're either paying.
like:A lot of money, hundreds of thousands of dollars in order to get something quick.
like:Especially with 75TB of text or more, grab your own data, get
like:more data, and you're paying to store and to process all of that.
like:And that costs tons of money.
like:Or you are not paying the money, but it takes a really long time and makes all
like:of your shareholders really frustrated because you're ruining your go-to-market.
like:You're taking too long.
like:You're not going to be the first in the space. It's a huge trade-off:
like:as with many things, you can trade time or money, and
like:training an LLM is very similar.
like:I think they estimated that for the huge models we see, like ChatGPT,
like:you're probably paying somewhere around, what was it, a half million?
like:I think that's what they say, and that's just for the training. We're not even
like:talking about all the experts you have to pay for
like:data curation,
like:man.
like:That's on the very far end, on the expensive side.
like:It gets really expensive really quickly to train these models, just because of
like:buying enough GPUs to parallelize this to do it within a reasonable time, and
like:just the sheer volume of data you have to run through to train all the parameters.
like:It gets really expensive, but on the other end there's lots of good
like:open source models that have done that main pre-training already.
like:And so you can grab one of those, you can train it with something like
like:LoRA, where you only need a handful of samples and maybe 10 minutes,
like:if that, and you can train it on a very simple GPU and you have something
like:fine-tuned for what you need. Getting under $200 is very reasonable.
like:$150, $20.
like:It's very possible to train, these models with certain
like:methods to get what you need.
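To make the low-rank adaptation (LoRA) point concrete, here's a minimal sketch of the core arithmetic: instead of updating a full d×d weight matrix, you learn two small rank-r matrices, so the trainable parameter count collapses. The hidden size and rank below are made-up, illustrative numbers, not from the conversation:

```python
# Sketch of the LoRA idea: the effective weight is W + A @ B, where
# A is d x r and B is r x d with r much smaller than d, so you only
# train 2*d*r parameters instead of d*d. Numbers are illustrative.
d, r = 4096, 8  # hypothetical hidden size and LoRA rank

full_finetune_params = d * d   # updating the whole matrix
lora_params = d * r + r * d    # updating only A and B

print(full_finetune_params)  # 16777216
print(lora_params)           # 65536
print(full_finetune_params // lora_params)  # 256x fewer trainable parameters
```

In practice you'd reach for a library such as Hugging Face's peft to attach adapters like this to a pretrained model; the arithmetic above is just why it's so cheap.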
like:So does it mean that, in a kind of natural, almost biological evolution, we're
like:going to end up with a few primary models that a lot of the different models
like:branch off of, instead of reinventing the wheel?
like:That's where we're at currently.
like:I hope that it doesn't stay that way, because I really enjoy seeing
like:new people create new models for new use cases and all this stuff.
like:so I hope it doesn't stay that way, but I do see a lot of value in creating industry
like:standards, at least around how you are actually writing the binary files, how
like:are the weights actually being stored?
like:What do the different layers look like?
like:I, think that standardizing what the model looks like so that you can load
like:it as flexibly as possible is awesome.
like:I would like to see more open source models, which is funny considering
like:there are thousands of open source fine tuned versions and hundreds
like:of open source foundational models on the Hugging Face Hub right now.
like:I want more, right?
like:I'm greedy, man.
like:To me, it sounds like basically every week there is another one that's better
like:at something, and if you look at the Hugging Face LLM leaderboard, it's
like:changing by the hour, literally. It looks like a gold rush in many ways, but
like:I like this gold rush much better than the crypto one a couple of years ago.
like:Yeah, man, there's a lot higher chance that you'll come out
like:of this gold rush with a great product than with the crypto one.
like:yeah, there's a lot there, and just to summarize that into one sentence,
like:you can probably fine tune even a gigantic model for around $200 to $500.
like:And you can go even lower than that
like:if you are smart about how you're doing it. Versus training from scratch,
like:which either is going to take an inordinate amount of time or will cost
like:thousands and thousands of dollars.
like:So I'm willing to bet money that a lot of our listeners are going to pause
like:this now and start Googling furiously.
like:How do I fine tune a model?
like:Where would you point them as a good starting point?
like:Any particular paper, any particular company, anything that's a
like:good place to start with that?
like:a bit selfishly, I would say you should buy our book.
like:We talk about probably the main ways to train in chapter 5 of our book,
like:I was going to say that, but, I
like:was going to say it last, right?
like:Cause we do go over it.
like:The book is primarily about production environments, but you can't really
like:put a model in production if you don't know how to work with it.
like:So we have stuff on fine tuning.
like:We have stuff on parameter-efficient fine-tuning, on low-rank
like:adaptation, the whole deal.
like:YouTube is actually probably one of your best resources right now, because
like:it has amazing content creators that show you how to do it in whatever
like:format you're comfortable in.
like:So if you're a C++ developer, there are YouTube videos on how to fine tune a model
like:and create a LoRA using llama.cpp, right?
like:It's not even all that difficult.
like:You just have to convert a model into a GGUF format and Boom, you're there.
like:You can do it on a CPU.
like:it'll take a long time, but you can do it in whatever quantization
like:you want and everything.
like:YouTube will meet you where you're at. If you want to learn something a little bit
like:more industry-standard, so that you could potentially get employment in this area,
like:PyTorch has amazing documentation, fantastic tutorials, and they're one of the
like:best at really making it feel like you're playing with, let's say, "big boy Legos".
like:You're building the model using their little Lego pieces. Pretty cool. If you need
like:something a bit more high-level than that,
like:Hugging Face, I think, is the industry standard for working in between a whole
like:bunch of different frameworks, whether that's PyTorch or TensorFlow or whatever
like:other framework you're working with, like ONNX.
like:Hugging Face has abstracted away a lot of the difficulty of setting
like:up models for fine tuning, cause in PyTorch you have to build out the
like:exact model architecture just to load the weights and then fine tune it.
like:Hugging Face already has the class built for you.
like:I would point to those. If you need more explanation,
like:Coursera is a fantastic place.
like:DeepLearning.AI, on Coursera and on their own site,
like:that's Andrew Ng's education stuff.
like:That's where I got my start with machine learning was Andrew Ng's
like:machine learning course on Coursera.
like:It was Awesome.
like:Fantastic.
like:Jeremy Howard is also amazing in that area of creating content for
like:people starting out and learning from beginner to advanced level.
like:He's at fast.ai.
like:I, yeah, I strongly recommend all of those
like:and your book.
like:yeah, we ingested a lot of those in order to write the book,
like:our book is a very nice high-level overview of the key things you want
like:to be looking at, and different methodologies, from training from
like:scratch to basic fine tuning to
like:model distillation to LoRA and PEFT and things like that.
like:we definitely give a high level overview, we give code samples and show you that.
like:But, ultimately if you really wanted to get into it, yeah, there
like:are other resources out there.
like:I know Manning has another book coming out, specifically
like:all about training LLMs.
like:there are definitely other places you can go. But
like:if you're looking for the quick, summarized version of all of
like:these things, our book is actually a really good resource for it.
like:One other thing that I like about your book is, the part where you
like:build up the different, breakthrough moments, throughout the world of
like:mathematics, that ultimately led to 'attention is all you need', and
like:what is it, seven years later now?
like:the gold rush that we're observing.
like:but just before we jump into that, there is a little bit of vocabulary
like:that one needs to have in order to basically talk about, or even read,
like:a lot of these papers. Could you
like:talk us through that vocabulary briefly?
like:I'm talking about phonetics, syntax, semantics, pragmatics, morphology, that
like:until I read your book actually made me think mostly of blood tests and semiotics.
like:Could you give us like the MVP version of what you need to know about these
like:things to be able to read papers?
like:Oh, absolutely.
like:Matt has been learning a lot of this too, he might be better at it than me.
like:I will throw other jargon into it.
like:writing this book with Chris over the last year has been, mind-opening for me.
like:until you understand these words, like you were saying, it's really
like:hard to dive into the deep end. We go over them in our book just because
like:we do find them so valuable. It really helped me understand very quickly,
like:"Oh, this is what my LLMs are good at.
like:This is what LLMs are not", and that was one of the first things we started with.
like:The first one is semantics: that is just the structure of words, how things
like:go together, whether or not it sounds correct.
like:that is what LLMs are really good at.
like:They're really good at making sure like the semantics of words align really well.
like:but after that, you got pragmatics, which is what LLMs have no idea about.
like:That is all the information
like:around that isn't said, right?
like:So when you say I'm going to find the eggs the Easter Bunny left, right?
like:you have to understand what, Easter is, what the Easter
like:Bunny is, why a bunny has eggs.
like:there's a lot of context around it that you have to understand,
like:and that's all pragmatics.
like:it's information that isn't said.
like:And that's what LLMs generally lack.
like:Actually, I'm gonna, I'm gonna jump in here real quick.
like:Miko, did you like the Velkanot example that I gave in there?
like:Yeah, I thought it was
like:Yeah.
like:Was that pretty good?
like:I just wanted to ask because I remember experiencing that in Slovakia.
like:Like I lived there for years and that was a hugely beneficial portion to me
like:to help figure out that 'no, tons of people have tons of ways of looking at
like:things', and LLMs don't know about it.
like:you would have to explain every bit of it to them in order to get them
like:to understand the same things as you.
like:Anyway, sorry, Matt.
like:I find that those two words in general, semantics and pragmatics, understanding
like:those is going to get you significantly farther in just understanding
like:how LLMs work, what they're doing.
like:there's obviously a lot of other words that we talk about,
like:like morphology and stuff.
like:And I'll hand it off to Chris to talk about what he wants to add to there.
like:I would agree with Matt.
like:Just understanding semantics and pragmatics would get you probably 60%
like:of the way there, and you could read new papers that come out and immediately
like:see like where are they amazing?
like:Where are they failing?
like:I end up using the relationship between those two. Semantics is just the literal
like:encoded meaning of your words.
like:if I say, "I'm married to my ex-wife", there's immediately,
like:boom, semantic problem there.
like:How can I be married to my ex-wife?
like:The words don't agree with each other.
like:Versus, exactly as Matt was saying, if we talk about Easter, if we talk about
like:traditions, if we talk about rituals that people have, just like the stuff
like:that you say, if you ask someone in Slovakia, they're going to respond to you.
like:That's normal.
like:it's a question, they respond.
like:LLMs don't have that, and you have to have them ingest tons and tons of data in order
like:to even get as far as giving a response.
like:the other ones that we can think about, syntax, I would say
like:that syntax is largely solved.
like:At this point, syntax is your structure around the words, like what order do
like:the words go in for them to be correct?
like:Is it 'I go to the store' or is it 'I to the store go' or all of that stuff.
like:That's syntax.
like:It's the structure that holds your sentences, your utterances together.
like:Morphology is delving into something that I consider to be very important in LLMs.
like:I'm not going to say the most important, cause I think that's still semantics.
like:There's a lot of work there.
like:but morphology would be how words are built:
like:what are the fundamental units of meaning, the morphemes, do those
like:even exist, that sort of stuff.
like:and we don't have to delve really deep into that.
like:That's largely solved by tokenization, but we can see
like:with newer models that come out that it really matters.
like:You have much smaller models that have more novel tokenization, more novel
like:morphology that end up outperforming larger models on tasks that they
like:didn't even train on all that much.
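As a toy illustration of how tokenization stands in for morphology, here's a greedy longest-match subword tokenizer over a tiny hand-picked vocabulary. The vocabulary is invented for the example; real tokenizers learn theirs from data, e.g. with byte-pair encoding:

```python
# Toy subword tokenizer: greedily match the longest vocabulary entry,
# falling back to single characters for anything unknown.
vocab = {"un", "happi", "ness", "happy", "cat", "s"}  # hypothetical vocab

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character fallback
            i += 1
    return tokens

print(tokenize("unhappiness"))  # ['un', 'happi', 'ness']
print(tokenize("cats"))         # ['cat', 's']
```

The split into morpheme-like pieces is the "glasses" the model sees through: everything downstream operates on these units, not on raw characters.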
like:if we can put it all together really quick.
like:The model solves syntax.
like:Embeddings try to solve semantics, but semantics is difficult,
like:and so they're not perfect.
like:Pragmatics is stuff like RAG, your Retrieval-Augmented Generation, and
like:having repeated sequences within your training data; it gives it landmarks, it's
like:context around the syntax and semantics.
like:Morphology is your tokenization. If I would give that an analogy: your
like:tokenization provides your model with the stuff that it sees; it changes text
like:into what the model actually sees.
like:And your embedding strategy is moot if you don't have it.
like:Your morphology gives your model glasses, if you want to call it that.
like:And then phonetics is the one that we haven't even talked about.
like:Phonetics is the reason why we are doing a podcast and we're talking instead of just
like:texting each other or emailing each other.
like:Can you imagine trying to ingest a podcast that's just emails?
like:It's horrendous.
like:And it's because there's so much richness and depth in meaning in the language that
like:is just lost when you strip it of its phonetic, I'm going to call it a medium.
like:And that can lead people to think that it has to do with sound, that's the
like:most common modality for people, but sign language has phonetics, they have
like:particular places where they, make signs.
like:They have particular ways that they do them to inflect and express more emotion.
like:Their phonetics exists even outside of the verbal modality.
like:that's important because that's where I see the most improvements coming to LLMs
like:in the future: being able to process
like:phonetic information without having to convert it into text,
like:or to process phonetic information and compare it against the text.
like:that can be incredibly helpful for your model's understanding.
like:those are the five features of language that we break
like:things down into in the book.
like:And they're largely agreed upon.
like:There are some other linguistic features that are incredibly important, stuff like
like:dialogue, that we haven't even covered.
like:Beyond that,
like:yeah, we can talk about semiotics too.
like:That's Charles Sanders Peirce, a smart dude from the 1800s who created a lot
like:of structure and organization. We dive into that very lightly in the book.
like:I don't think that you need a grounding in semiotics in order to improve
like:your ability to interact with LLMs.
like:But it is helpful for organizing all of these other concepts.
like:how do we create a mental map for how stuff needs to be processed
like:within a machine learning pipeline?
like:How do we make sure that we're not mixing things up and inadvertently destroying
like:our model's ability to see things, right?
like:If we put embeddings before tokenization, it breaks your process.
like:it's helpful for organizing things and it's also helpful for understanding
like:how conversation happens and how I say something and it moves through
like:your mind to create an interpretation.
like:that's by far like the most theoretical out there concept that
like:we get into in the whole book.
like:And together you came up with this language definition as being, as a
like:concept, "an abstraction of feelings and thoughts that occur to us in our heads".
like:And I'll be honest, I initially thought it sucked.
like:because it's a little bit, it's a little bit wishy washy.
like:I wanted something a bit more concrete.
like:But then, as I looked up all the other definitions in different contexts, I
like:was like, Okay, I can clearly not come up with anything better than that.
like:So I think I'm ready to yield now and say that this is actually
like:capturing it pretty well.
like:Putting abstraction in it, sounds also vaguely techie, so that helps.
like:How did you come up with that definition?
like:I didn't.
like:I would love to take credit for that.
like:No, that definition has been around for a long time within the linguistic
like:community, and one of the best examples of why it really works is babies, right?
like:Babies have no idea how to express their thoughts, but somehow they get it across.
like:When a baby is happy, we can tell; when a baby is crying, we can infer that
like:it needs something. Babies are able to communicate without language, meaning
like:that language is something that we created to shorten the conversation.
like:The reason I called it an abstraction is we have abstract ideas.
like:You probably come up to a situation where you're feeling something, and you don't
like:know the words to really express it.
like:I think that's a pretty universal human adult thing that has happened
like:at least once in your life.
like:That's happened to me a bunch of times, and it really illustrates that
like:"Oh man, the language that we use is actually describing what's in
like:here; it isn't what's out here".
like:it's a hard concept.
like:Once you get there though, it really helps with LLMs, because you realize that the
like:language that we're using is a crutch.
like:And that's all that the LLMs have in the first place.
like:And so this is another thing that goes towards the miraculous
like:nature of them working at all:
like:they're dealing with an abstraction of an abstraction, at least,
like:in order to communicate with us.
like:So let's say that I buy that.
like:my first question, would be going back to your baby example, isn't what the
like:baby's doing some form of a language?
like:what's the line between what is and what isn't?
like:What's the line between a language and communication?
like:I like that.
like:That's a question that a lot of people I bet have and It'll
like:probably go in the appendix.
like:We'll probably talk about this in an appendix for curious readers. So the line
like:between just straight-up communication and a language is the ability to talk.
like:There are a lot of criteria, but one of my favorite ones is the ability to talk about
like:something that is not physically present.
like:bees have communication.
like:gibbons have communication.
like:Babies have communication.
like:Babies, though, are unable to express any ideas about stuff that is not
like:physically present, you can't talk to a baby about theoretical physics.
like:I mean you can, but what are you gonna get back?
like:You can talk to a baby about my Star Wars posters, right?
like:I can point at them because they're right there, but if I'm in a different
like:room, the baby's not gonna be able to talk to me about them. And that's
like:the difference. It's one of them.
like:That's the one that I'd like to highlight though is that the fact that
like:we can speak about things that are not physically right here with us, that we
like:can point at, that's the distinction between communication and language,
like:because babies are communicating.
like:But once they get to that point, it really deepens the interaction
like:that you're able to have with them.
like:So now, equipped with all that knowledge, I'm gonna try to prompt
like:engineer you and give you this prompt:
like:I'm a five-year-old baby that has language now, and who's very curious
like:about understanding how we got from bag of words, counting frequencies, all the way to
like:LLMs and ChatGPT and people worrying about the Terminator actually coming to life.
like:Could you walk me through the high-level ideas that were important,
like:that built up to what we're seeing today?
like:The bag of words is really easy to think about, especially if you keep
like:your tokenization incredibly easy.
like:Sorry, this is, I'm already out of five year old territory.
like:You just count words.
like:If I take that sentence, "you ; just ; count ; words".
like:Each of those has a count of one.
like:If I add another sentence, "I like Star Wars".
like:All of those still have a count of just one word.
like:And then if I add another, "do you like Star Wars?"
like:You and star and wars all go up to two.
like:That's it.
like:That's a bag of words model.
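That counting can be written down in a few lines; this sketch just replays the three sentences from the example (note it also bumps "like" up to two, which the spoken example skipped over):

```python
# A minimal bag of words over the example sentences:
# deliberately simple tokenization (lowercase + split on spaces).
from collections import Counter

sentences = [
    "you just count words",
    "I like Star Wars",
    "do you like Star Wars",
]

bag = Counter()
for sentence in sentences:
    bag.update(sentence.lower().split())

print(bag["you"], bag["star"], bag["wars"])  # 2 2 2
print(bag["just"])                           # 1
```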
like:why is it important?
like:what can it
like:do?
like:I think that bag of words is the first model that we really have
like:to explain being data-driven.
like:It's just keeping track of things.
like:if you look at a bag of words model for your workouts, it's just how
like:often do you do certain things?
like:how often are you doing a bicep workout versus doing a pectoral workout?
like:How often are you doing which thing?
like:it's just being data driven.
like:It's the first step, right?
like:You're not looking at any features.
like:You're not really caring about how these things interact with each other.
like:You're just keeping track.
like:So I guess, with that information from your example, I can guess whether
like:you are skipping leg days, and I can see what's important to you.
like:Or, if I'm counting words in U.S.
like:presidents' speeches, I can say, like you described in your book, whether it's a
like:wartime or a peacetime president, and what they really try to get across.
like:this is something that you can use for anything. If you count, in soccer,
like:which players make goals and how often, that is a bag of words model.
like:You're not tracking words.
like:It's a bag of goals, or it's a bag of whatever else.
like:So what's the next step from there?
like:bag of words was really monumental just because it's so simple, but it's so
like:powerful, because the words you use when you're describing sports are very different
like:from the words you use describing politics. And so just picking up on certain words
like:and their counts helps us understand the overall subject of what it is.
like:But it really lacked any sort of structure, because the order
like:of words also matters, right?
like:So 'the cat in the hat' versus 'the cat's hat', they both have the word 'cat', they
like:both have 'hat', but mean different things because of the order of the words, and
like:so that kind of led to n-gram models.
like:instead of just simple words, we would also take n-grams, which are,
like:n number of words in a certain order, and we would start cataloging those.
like:And so, more than just words, we're getting n-grams.
like:And that is improving our understanding of the language because now we
like:have embedded some syntax in it.
like:We understand some ordering of words and that's able to improve our categorization.
like:However, from there, we're not really able to make any predictions
like:of what next word is about to come up or anything like that. When it comes
like:to bag of words or n-grams, they're really more for categorization.
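The n-gram step can be sketched like this: the same 'cat' and 'hat' words produce different bigrams depending on their order, which a plain bag of words would throw away:

```python
# Bigrams (n = 2) keep some word order that a bag of words loses.
def ngrams(text, n):
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("the cat in the hat", 2))
# [('the', 'cat'), ('cat', 'in'), ('in', 'the'), ('the', 'hat')]
print(ngrams("the cat's hat", 2))
# [('the', "cat's"), ("cat's", 'hat')]
```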
like:And so that kind of led to Bayesian techniques
like:and so not to really go deeply
like:into Bayesian statistics, but
like:Yeah.
like:I'm sorry.
like:Sorry to all Bayesian fanboys.
like:We're going to go about as deep into this as we did to pragmatics.
like:it's just, you know, based off of the priors of the words that came before, we
like:can then predict the next word to come up. And so if every single time after
like:'I am a' in text we saw 'man', then it's going to predict that the next word is
like:'man', instead of other words that easily could have come up, like woman or girl
like:or boy or cook or professional athlete.
like:Certain things that could come up are gonna be a lot rarer. Like 'I am an
like:astronaut': a lot fewer people have been astronauts, so saying that,
like:it's gonna have a very low probability of being the next word predicted. But
like:it gives us this opportunity to look at what is the next word predicted.
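At its simplest, that prior-based prediction is just conditional counting; a toy corpus (invented here to echo the example) makes the point:

```python
# Next-word prediction from counts: given a word, look at which words
# followed it in the corpus and pick the most frequent continuation.
from collections import Counter, defaultdict

corpus = "i am a man . i am a man . i am an astronaut".split()

following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

# "a" was followed by "man" twice, so that's the top prediction;
# "an astronaut" appears only once, so it's the rarer continuation.
print(following["a"].most_common(1))  # [('man', 2)]
```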
like:From there, we move on to what's called Markov chains. We're swinging
like:back towards the n-gram model, but it gives us a bit of prediction next.
like:I actually really love Markov chains, because they provide very fast
like:predictive text. Markov chains are essentially what's been fueling
like:the predictive text for Google search and things like that; they've been the
like:technology that's really been leading that charge for a really long time.
like:And it's just a very basic way that we're using n-grams now to
like:make predictions of the future.
like:You can think about it that way,
like:though that is obviously me
like:reducing it.
like:That's not exactly how it works, but it's a bag of n-grams where you
like:take a state at each point in a sequence, and look at all the times
like:that previous n-grams have occurred in that sequence, and then from that you can
like:model probability about what comes next.
like:Instead of just looking at each n-gram by itself, you give it state.
like:and it's a bag of n-grams.
like:It's really fun.
like:It's a probabilistic bag of n-grams.
like:That's how the chains work.
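A minimal word-level Markov chain looks like this: the state is the current word, the transition table is built from bigram counts, and generation just follows the most likely next word. The text is a toy corpus, and greedy decoding is used for simplicity (real predictive text samples from the distribution):

```python
# Tiny Markov chain: learn bigram transition counts, then walk the chain.
from collections import Counter, defaultdict

text = "do you like star wars do you like star trek".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    transitions[prev][nxt] += 1

def generate(word, steps):
    out = [word]
    for _ in range(steps):
        if not transitions[out[-1]]:  # dead end: no observed continuation
            break
        out.append(transitions[out[-1]].most_common(1)[0][0])
    return " ".join(out)

print(generate("do", 3))  # do you like star
```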
like:One of my favorite parts, and I like that you kept track of this quote
like:here, that Markov models represent the first comprehensive attempt to
like:actually model language, which is funny, because Markov was not trying
like:to model language initially, he was just trying to win an argument.
like:And he eventually used it: he looked at distributions in
like:particular Russian authors.
like:He looked at distributions in Russian government officials' speeches.
like:He knew what he had and he believed in it, and I love that. What a
like:great piece of history. Anyway,
like:continuous bag of words
like:is where we start essentially taking the logic of a Markov chain where,
like:"oh, if we keep track of where things appear and how often they appear there,
like:then it helps us, be able to model for what could appear next", right?
like:And this is the first moment where we're really coming full circle all
like:together and going right back to bag of words, and just adding
like:context for position.
like:from the context of the bag of words, the literal counting of things, we're
like:able to create embeddings, right?
like:I don't know if a lot of people are aware, but bag of words
like:is how Word2vec came to be.
like:Word2vec was huge in, I think, 2015, 2016, and it stayed huge, Gensim
like:is still one of the most downloaded natural language processing libraries
like:in Python for Word2vec and for GloVe.
like:Continuous bag of words, just adding that one little thing,
like:adds all this context so that we can create embeddings.
like:We can create vectors that we can compare between words.
like:this all comes from the logic of I forgot that dude's name.
like:Tell me the company that a word keeps, and I'll tell you what that word means.
like:just what's around the word.
like:influences its meaning, which goes directly against a lot
like:of previous linguists' thought that, syntax and semantics are
like:absolutely not related at all.
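The "company a word keeps" idea can be made concrete with a toy co-occurrence count, a crude precursor to learned embeddings. This is an illustrative sketch, not anything from the speakers' book: for each word, count which other words appear within a small window around it.

```python
from collections import Counter, defaultdict

def cooccurrence(tokens, window=2):
    """For each word, count the words that appear within +/- window of it.
    These count vectors are a crude, purely count-based stand-in for the
    distributional signal that Word2vec-style models learn."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                table[word][tokens[j]] += 1
    return table

toks = "the cat sat on the mat".split()
print(cooccurrence(toks)["cat"])  # Counter({'the': 1, 'sat': 1, 'on': 1})
```

Words that occur in similar company end up with similar count vectors, which is exactly the signal embedding models compress into dense vectors.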
That's one of the big things from Chomsky, the "colorless green ideas sleep furiously" nonsense; there's some semblance to it, some sense to it. Taking advantage of that with continuous bag of words, we can create, like I said, these vectors that we can then compare, and that's really interesting. That is what fuels LLMs now, this exact same continuous bag of words modeling technique. It's been built upon a little bit, but that bag of words is still fundamental to how embeddings are created. Bag of words and positionality, and, like we can get into, RoPE scaling, all of these rotational plugins that you can use to get longer sequences embedded correctly, or at least better. That's one of the hard things when we're talking about language modeling: what is good and what is better. A lot of people like to appeal to "this is how humans do it." I don't know if humans are incredibly efficient when we do it, but, like, it's fine.
Then we get into the 1960s, the very first perceptrons.

Before we go there, can we spend a little longer on what the embeddings actually are? You mentioned Word2vec, you mentioned word vectors and embeddings, but for somebody listening to us from the start, that's probably not clear. Can we delve in a little bit?
Yeah, absolutely. So embeddings are the vectors that come out of models like continuous bag of words. When you look at a modern machine learning pipeline, there are multiple models that you go through, and we just abstract all of it and call it one model. When you look at GPT-3, ChatGPT, it has a model they call a byte pair encoding model to do its tokenization. And then it has a model to do embeddings. That model is fundamentally a continuous bag of words. It's built on top of it a little bit, like I said, keeping track of not just how many times a word occurs, but how many times a word occurs in particular positions. And then on top of that, it adds a positional signal: alternating dimensions of the encoding get sine and cosine functions of the position, at different frequencies, so position can be read back out of the embedding. That tries to insert some of the meaning back in that was taken out by the tokenization, because tokenization just assigns each token a number in a dictionary. You have a way to get all words into that dictionary and then come back out of it, so it takes all of the meaning out; it's just one number. The embeddings attempt to put some of the meaning back into it using positionality, using continuous language modeling techniques.
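That sine/cosine alternation can be sketched directly from the formula in "Attention Is All You Need". This is an illustrative minimal version, with a tiny 8-dimensional vector rather than a realistic 768: even dimensions get sine, odd dimensions get cosine, each at a different frequency of the token's position.

```python
import math

def positional_encoding(position, d_model=8):
    """Sinusoidal positional encoding from "Attention Is All You Need":
    even embedding dimensions get sin, odd dimensions get cos, each at a
    frequency that falls off with the dimension index."""
    vec = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

# Position 0 encodes as sin(0)=0 on even dims and cos(0)=1 on odd dims.
print(positional_encoding(0))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

In a real transformer this vector is added to (or combined with) the token embedding, so two occurrences of the same token at different positions get distinguishable representations.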
Embeddings, really simply: they're not perfect, they're just an approximation of that meaning. And because we're able to put these words into a vectorized space, we can start doing things that make sense, things that make us feel like we're headed in the right direction. The classic example is, when we first discovered embeddings, we took the embedding of 'king', we subtracted 'man' from it, we then added the embedding of 'woman', and the closest embedding to that was 'queen'. We start to get this vectorized space that makes sense. These words start to have connections to each other, and they start to make semantic sense to us as humans.
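The king - man + woman ≈ queen arithmetic can be demonstrated with cosine similarity. The 3-dimensional "embeddings" below are hand-made toy values chosen to make the example work (real Word2vec vectors are learned and hundreds of dimensions wide), so treat this purely as a sketch of the mechanics:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product of the vectors over their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy, hand-made 3-d "embeddings" (dims loosely: royalty, maleness, person-ness).
emb = {
    "king":  [0.9, 0.9, 0.8],
    "man":   [0.1, 0.9, 0.8],
    "woman": [0.1, 0.1, 0.8],
    "queen": [0.9, 0.1, 0.8],
    "apple": [0.0, 0.5, 0.0],
}

# king - man + woman, then find the nearest other embedding by cosine.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
best = max((w for w in emb if w != "king"), key=lambda w: cosine(target, emb[w]))
print(best)  # → queen
```

The same nearest-neighbor search over real learned vectors is what produced the famous analogy results, and also the nonsense ones discussed next.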
However, embeddings are still an approximation, right? So if you were to do that with every combination, it's interesting: what do you get when you start adding or subtracting words that don't necessarily make any sense together? What do you get?

A good quintessential example of that: you take the vector for 'king', you subtract the vector for 'wolf', and you add the vector for 'prince', and you get the vector for 'village', or at least pretty close to it. That doesn't make any sense. So there's still lots of, okay, these are starting to add meaning, not always, but sometimes. It's an approximation, and embeddings are ultimately something we're constantly trying to learn and improve.
If your listeners are wondering how to keep up in this space, embeddings are probably the number one thing to keep track of. OpenAI recently released logic for being able to change the size of embeddings. To me, being pretty deep into this, it feels groundbreaking, because normally you have to structure these vectors so that they're all the same size, and each point within that vector represents meaning, negative or positive. It's very structured and not malleable, and so the idea that you could take all of your embedding space and change the size of it at your whim is just amazing. That's one of the things that I see as a huge groundbreaking piece of technology that OpenAI is continuing to lead in. Yeah, and if you're ever in doubt, "oh man, is this paper important?", if it's about embeddings and doing really cool things with embeddings, probably.
I think the one question, for anybody trying to picture that: what's the dimension of all these vectors? Is that the entire vocabulary? Are there different techniques?

Yeah, currently the number one dimensionality that is an unspoken industry standard is 768. That's a number that pretty much every NLP practitioner knows. The reason OpenAI's embeddings initially seemed really cool, and people thought they were super dense, is that they were 1536, which is 768 doubled, right? You're going to see multiples of 768 all over the place here. And that's not because that number is super significant; it's just the first embedding size that we found that tended to work better than the others.

So that's the more art than science part of this.

It's the brute force testing. Yeah, people went through and tested 767, 766, 765, and landed on that one, and it worked; that's the best one that we've found so far. Even the doubled embeddings from OpenAI offer a marginal improvement in that understanding space.
I think we can move on to the multilayer perceptrons.

Okay. A perceptron is essentially just a linear transformation of data. If you look at it from a statistical standpoint: if you have three things about something, you can just add those things together and you get a description of that thing, right? Just summing them. That's abstracting it a little much, especially if machine learning practitioners are listening; really, we can do linear transformations. The easiest way to think about it, for me, is you perform one action on a group of features and you get something out of it. That's not, by itself, really helpful.
Once you get into having multiple layers, this is the MLP, the multilayer perceptron. Once you get into multiple layers where you're adding these transformations together, and in between those layers you have nonlinear activation functions so that you can create nonlinear relationships between sets of linear transformations, you can get into really cool spaces. And one of the first things that any machine learning practitioner learns, at least in a lot of the cases I've talked to, is that just adding more layers does not make it better. In fact, the cool part is finding the minimum number of layers that you need in order to model the relationship between two points. That's a little bit abstract; I think the quintessential example is detecting which type of iris flower it is from an image. We don't necessarily know how many features there are, but we can vectorize the entire picture of an iris flower, and then discover that the minimum number of layers is, I think, something like five in order to get really good accuracy on detecting which iris flower it is. Yeah, multilayer perceptrons are the feed-forward networks. Those are the basis of everything that comes after: whether it's recurrent networks or even Transformers, they have feed-forward networks inside them, and that's the basis of it right there.
How do you choose the sizes? Is it all just trial and error as well, for the number of layers, the sizes of the hidden layers? Are there not any rules that always work?

Yeah, so going through a feed-forward network, and this comes from trial and error, from a lot of people trying different stuff, but generally your initial dimensionality could be something like 768, right? Your initial hidden layer, that's a good number for it; that's an embedding dimension we're familiar with. But then we want the next hidden layer to be double that, and then we want to go smaller and smaller until we hit our final output classification layer. So we want a big jump and then small. The way to think about that theoretically is: you want to model the number of features that you're looking for, and doubling that is just a good way of covering all the features that we might not know about, that we might not even be keeping track of. Let's see if the model can figure them out mathematically. And then we want to narrow it down, narrow it down, narrow it down, until we get to our actual classification, which in language modeling is: what is the next word, right?

Got it. So double it, and then boil it down to the size that you're actually looking for across a bunch of layers, and hope for the best.
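That "double it, then narrow it down" heuristic can be written as a tiny helper. This is only a sketch of the rule of thumb from the conversation, not a recipe from the book; the halving factor and the stopping condition are assumptions for illustration.

```python
def layer_sizes(n_features, n_classes, shrink=2):
    """Sketch of the sizing heuristic from the discussion: start at the
    input width, double it once (room for features we aren't tracking),
    then halve layer by layer until we reach the output layer."""
    sizes = [n_features, n_features * 2]
    width = n_features * 2
    while width // shrink > n_classes:
        width //= shrink
        sizes.append(width)
    sizes.append(n_classes)
    return sizes

# e.g. a 768-wide embedding narrowed down to a 10-class output:
print(layer_sizes(768, 10))
# [768, 1536, 768, 384, 192, 96, 48, 24, 12, 10]
```

Actual architectures tune these widths empirically, as the speakers note just below, but the big-jump-then-taper shape is the common starting point.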
Okay. And that's why, when OpenAI doubled the embeddings, it was a marginal improvement, but it's predictable, because that's normal. People do that.
Are there any particular well-known configurations of these neural networks that just work for a bunch of problems, something that you keep seeing over and over? Or is it more custom for every problem, where you just follow the heuristics that you described?

As far as model architecture, no, it's basically the heuristics that I described, and then people will experiment and tune them and find that, oh man, statistically, if this layer of the model is bigger, then it works better, but it follows that general structure. I think one of the papers I would point to for this is ULMFiT, which is basically a methodology for fine-tuning. It experiments with gradual unfreezing of layers: when you're training, you start with only the very last classification layer, everything else stays exactly the same, and you only train that one. And then you unfreeze, unfreeze, and test each layer as you're training. That tends to help. Even now, that is abstracted within the Hugging Face Trainer class, and it's abstracted within pretty much every model.fit methodology, because it works.
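The gradual-unfreezing schedule can be sketched without any deep-learning framework. This is a hypothetical toy, not ULMFiT itself or the Hugging Face implementation: each layer carries a trainable flag, stage one trains only the last layer, and each later stage unfreezes one more layer from the top down.

```python
class Layer:
    """Stand-in for a network layer; only tracks whether it is trainable."""
    def __init__(self, name):
        self.name = name
        self.trainable = False

def unfreeze_schedule(layers):
    """Yield, per training stage, the names of the layers currently trainable:
    first only the last layer, then one more earlier layer per stage."""
    for stage in range(len(layers)):
        for layer in layers[len(layers) - 1 - stage:]:
            layer.trainable = True
        yield [l.name for l in layers if l.trainable]

net = [Layer(n) for n in ["embed", "hidden1", "hidden2", "classifier"]]
for trainable in unfreeze_schedule(net):
    print(trainable)
# Stage 1 trains only ['classifier']; each later stage adds one more layer.
```

In a real framework the same effect comes from toggling gradient flags (e.g. `requires_grad` in PyTorch) between training stages.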
Awesome. What's next in our journey?

Probably just the fact that multilayer perceptrons struggle with sequences, right? Even if you try to embed things and keep some of that positional encoding within your embeddings, they struggle to model multiple things where the order matters, right? And in language, the order matters sometimes, right? Sometimes it's normal to say gibberish, and knowing when is which is extremely difficult. To solve that, I don't know if we need to go into recurrent neural networks in general, but we definitely need to talk about LSTMs, long short-term memory networks. They're recurrent neural networks to start with, but they added some really important things. For example, when I'm talking, you are kind of consciously predicting what I might be saying; you can hear what I'm saying and you're trying to figure it out as it goes on, to understand it. We call that active listening. That's what happens. LSTMs model that a little bit, in that they take the sequences and allow the model to try to predict both going forwards and backwards, instead of just the one way. That bidirectionality is computationally expensive. It takes a lot longer, which is why I think these are not used as much anymore, but it's really novel and it did help a lot in predicting sequences. It was phenomenal for language modeling.
Beyond that, there's attention within LSTMs. When attention came out, adding attention to whatever you were doing was phenomenal. It added an extra layer of nonlinearity: when the model was going through and trying to search for what word might come next, it not only had all the modeling that we've already talked about, it also had the ability to search, and search not for that exact thing but for something similar. That just exploded in popularity because it works; it was phenomenal. However, the difficulty with LSTMs is that they're computationally expensive and slow. It's a lot of math to get through every single layer, let alone trying to predict and stream those predictions in a sequence; you're going at one token per 30 seconds. And that's difficult for models that are the same size as transformers, for example. So yeah, it was a lot of really cool stuff that helped us figure out how to get to the next step. It was just computationally expensive and slow. Basically, not very practical in use, but important.

Talking about practicality, I think it's great that it's accurate, right? I think accuracy is incredibly practical. But I don't think that, from a customer experience standpoint, it's practical. Customers don't like waiting a long time for the right answer, because they might be able to find the right answer themselves in that amount of time anyway.
And then from there, do we jump to attention?

At this point, we've gone through the history of the field modeling language, building up, and we've finally reached attention, right? Attention is the backbone of transformers, which is what LLMs are built off of. Attention just adds a nonlinearity, and it was a breakthrough in how we're able to connect the words. So, attention really quickly: it's creating these dictionaries, key-values, of every word to every other word in the token space, and then it's able to query them. For each word, we're able to build up the importance of the other words that matter to it. And it's in quadratic space, so it's more than linear, but it's a reasonable amount of time to compute these dictionaries, the key-values, and then query them and understand the importance of other words. It's the backbone of what all these different models are doing. And even, as Chris mentioned, we could inject attention into these previous RNNs, LSTMs, et cetera, but it was the backbone of building the transformer model, which came out in the catchy paper "Attention Is All You Need", where essentially all they use...
Which is a meme now, right? We've seen a whole bunch of other papers afterwards going, "no, this is all you need," or "no, you don't need that." But the reason it became a meme is that they took out everything that was supposedly novel about the long short-term memory, the LSTM. They used only attention and feed-forward networks.
Could you give us an example of what that would look like, on a very stripped-down thing? What does that dictionary look like, for visualization?

...and decode.

No, just for the attention itself, right? You mentioned a key-value for basically every combination. Do you have to precompute every combination within the vocabulary?

You can take a sentence that you're feeding into the attention algorithm, "the cat in the hat", since I used that earlier. Essentially you would have a dictionary where "the" is compared to every other word, "cat", "in", "hat", and it's coming up with similarity metrics for the importance of all the other words. And then you would do that for "cat", and for "in", and for "hat", and it's going to come up with a dictionary, essentially, of key-value pairs for all the other words, helping you understand the importance of the other words that are in there. And then the query that runs essentially helps us predict the next word that comes afterwards, based on how important all of those dictionaries are, and adding them. And all of this happens in quadratic time.
One of the nice, novel things about this concerns the query and key vectors. Your query vector is the word that you're looking at in the utterance, and your key vector is the key in the dictionary. Those two vectors are not one-hot encoded. We haven't even mentioned this, but a one-hot vector is a vector like 0, 0, 0, 0, 0, 1, 0, 0, 0. That's how a lot of these things had been represented previously, coming off of bag of words: the idea that, hey, we can model these things with vectors that just say, did this word appear or did it not, and where did it appear. That was positionality. With "Attention Is All You Need", you can immediately see a problem with one-hot encoding in that it's very sparse, especially as you're getting into 768 dimensions, right? You have just one 1 and a whole bunch of zeros, and those zeros don't really matter. And so one of the breakthroughs here was using dense vectors for queries and keys in order to get values that are also dense.
I think one of my favorite visualizations of it is from Jesse Vig; it's called BertViz, on GitHub. I've used it in production environments to show that, hey, our model is not understanding this, because look at the attention: all of the queries are related to the key of the wrong word. Look at words with semantic ambiguity; I think the quintessential one is "time flies like an arrow", where "flies" could also mean small little bugs buzzing around. How do we know that it's not that word? It's because of its position in the sentence that we know it's a verb, and that it's referring to "time" and to "arrow". And we can see that predictably within attention, because that word is determined to be important: that query is determined to be important as it relates to the keys of "time" and "arrow" within query-key-value attention. That's what that dictionary looks like, and that's why it's useful.
And, I guess, the representation of the importance, how do we actually come up with that?

It's the dot product. We're comparing the vectors between the query and the key. Dot-product attention is, I'm pretty sure, not where it started, but I think that's where we're at right now; that's the industry standard that everybody uses. It's just multiplying the vectors together. Essentially you take the dot product of the two vectors, and that's where we get the comparison and the relative importance values. It's not magic, it's math.
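The whole query/key/value mechanic can be written out as scaled dot-product attention from "Attention Is All You Need". The numbers below are toy 2-dimensional vectors chosen for illustration; real models use learned projections hundreds of dimensions wide and many attention heads.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: compare each query to every key by dot
    product, softmax the scores into weights, and return the weighted sum
    of the value vectors for each query."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# One query attending over a 2-token sequence (illustrative numbers only).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # the first value gets the larger weight
```

The quadratic cost the speakers mention is visible here: every query is scored against every key, so an n-token sequence does n × n comparisons.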
Kind of the same thing from time to time? Okay. And then with that, we've got GPT, the generative pre-trained transformer model. What's so groundbreaking about that?
As opposed to the original transformer, they only use a decoder. The original transformer had attention-based encoders, which changed your embeddings into essentially another embedding that was then taken by your decoder and used to predict the next word. So it had two networks linked together in the middle in order to produce your next word. The reason this is important is that it goes back to that original idea we talked about, of language as an abstraction, right? The authors of "Attention Is All You Need" looked at that abstraction and were like, hey, can we model that? That's what an encoder is. When you look at models like BERT, it's taking your input and putting it into a new abstract space with lots of nonlinear transformations, and that's incredibly useful. And so the GPT models were groundbreaking because they said, we don't need that; we just need the decoder, and we're basically just going to use syntax. The thought process there is that syntax is related to semantics more deeply than linguists are able to conceptualize in an easy-to-understand way. We know that it's true, and we know that it's predictive, especially looking at how good GPT-3 and GPT-4 are. And even looking at the open source stuff: Llama is a decoder-only network and it rocks, right? I have a suspicion that we're going to hit a point later where Google is going to blow everybody out of the water with another T5, another version that puts the encoder back in. I don't know how we're going to get to that point, though, because the decoder-only models work so well. And they're faster and less computationally expensive, because you're taking probably a third of the model and just throwing it away.
So you mentioned Llama, and I think that might be a good segue from what is essentially about a third of your book. For everybody who wants to jump into more details and see actual Python implementations of a lot of what we just covered, the book is called LLMs in Production. It's available on manning.com, and I'm pretty sure you're going to love it. So, going back to Llama, let's do a little hall of fame rundown of the landmark, important models from the last few years. Where should we start?
I would probably start with the original transformer; they deserve credit. A lot of the people who wrote that paper, Vaswani et al., have gone on to found or co-found companies that are now competing in this space, whether that's Anthropic or Character.ai. Those are the people that created that Transformer, and they're still building on it. I think that's the first one I'd put in the Hall of Fame. What would you say, Matt?

I think part of this question is what is the first LLM versus what is the first Hall of Fame model. And yeah, like Transformers, BERT. BERT is incredibly powerful, and I think, because it's so small that it's not in the LLM space, it's often overlooked. Many companies are still looking at these massive LLM models for problems they could solve with a simple BERT model. But because they're only getting into this space now, they think immediately, hey, we have to use an LLM, right? And they didn't care in 2017 about what was there.
And I'll go back, I said it before, I love Markov chains. They're amazing, and really powerful for what they do well. Even now, a lot of people could just use Markov chains for a lot of the problems they're trying to solve with LLMs. But LLMs do give that flexibility, just their massive levels of computation. I think if I was to point to a model that I thought was just really powerful, it would be BLOOM, actually. BLOOM was essentially the first massive large language model that was built completely transparently. It was a research project, funded in large part by the French government, and it was built completely in the open. And even though the BLOOM model today isn't seen as a very competitive model, a lot of the open source learnings, a lot of what we have nowadays, is because of what those researchers figured out while they were working on BLOOM. We got amazing libraries out of it, like DeepSpeed and other things like that. It really boosted the open source community, which has been one of the major driving factors of LLMs today, and probably a large part of why we could even write our book. If the open source community wasn't where it is today, there wouldn't be much we could really tell people, other than, oh, you've got to go work for Google or Microsoft. How would we know any of it, right? Yeah, we know about it largely because we've been involved in open source, and we built off of what those scientists at BigScience did.
So that's 2022, right? That's a couple of years now.

Yeah. And then we had Llama, which became important, and Llama 2, even more important.

Yeah, and it's largely because, I don't remember the username of who did it, but whoever put that PR on the original Llama GitHub with the torrent link that leaked the weights, that's the hockey stick moment for LLMs, right? That's what made them available to everybody. That's what enabled Stanford to create Alpaca and show that, oh man, you can make the model better with only around 50K responses. You don't need tons and tons of data to fine-tune and get very good results and improve on every metric. Everything since then has just been building off of that exact same momentum, from whoever leaked that first Llama. And Meta has benefited greatly from it too. They now have a very open, I wouldn't say completely, but a very open attitude towards the space, because they recognize how advantageous it is to have other people building on top of their model and to be considered an industry standard.
Yeah, they've really leaned into it recently, right? And how big was their stock jump?

Right. All of the underlying architecture, right? These open source programmers, or even the Nvidia programmers, because they know everything about Llama, they're able to go in and optimize CUDA kernels and everything. And so Llama has gotten faster and more proficient. With llama.cpp we're able to run it just on a CPU. There are lots of benefits because they gave us the architecture. It was leaked, but now they've leaned into it; essentially, they've given it to us.

Yeah, we just need them to release the data that they trained it on, and it's completely open, right?

But even the data, they've told us a lot about what the data is, right? We don't have the exact data, but we know, essentially from RedPajama, what those datasets were built off of, what they were. And so we're able to replicate it really closely in the open source community.
With Llama, I don't know if we have a really good list of Hall of Famers, because it's difficult to see what's going to stick around, partially because it's so difficult to evaluate these models, as opposed to BERT, right? BERT-large had around 340 million parameters. You can run stuff to see how well those parameters are doing; you can hyper-tune them; you can run evaluations to see how each one is performing, and still go relatively fast. When we're getting into the 7 billion, the 13 billion, the 70 billion parameter range, it's much more difficult and computationally expensive to evaluate on that level. And we don't even have the ability to describe what all the parameters are doing, so our evaluation metrics are difficult to gauge. You look at MMLU, you look at a lot of the benchmarks that people are running, and they're useful. But ultimately, at this stage, we still have to go download those models and test them against our own use cases to see if they perform better, and that's incredibly time-consuming. We could talk about a lot of the models that have come out, like Capybara, Nous Hermes, WizardCoder, and they're all great. I don't know which ones are going to be the hall of fame, the next industry standard, though.
There are definitely some other models that we love and talk about in our book, like Falcon, which came out of TII in Abu Dhabi, right? An amazing model.

Miqu.

The latest Falcon is one of the largest open source models, and it came out under the Apache 2.0 license, so it's completely open source, one of the first models that's fully open source. There's definitely amazing progress being made, and lots of different models to be paying attention to. But yeah.
One of the biggest ones to pay attention to right now, I think, is OLMo. Not because it's competitive and performant, but because, like Falcon, it is 100% open source. You can see the data they trained on; you can replicate their experiments exactly. That's going to be one of the biggest drivers in this field. You look at a lot of the innovation that's happening, and it's happening on files that people are passing around on torrents. It's happening with random users on Reddit coming up with NTK-aware scaling, and RoPE scaling after that. And they're coming up with more stuff because they have time and they want to help, and a lot of these people are experts who are just anonymous. That's incredibly important for the space, because we're finding that people who deal with these models and use them 24/7 have skills that the researchers don't necessarily have. That's difficult to admit, being on the research side of it, but it's true.
So that's the one coming from the Allen Institute for AI, right?

Yeah, and I think they're also open-sourcing the actual training code as well.

They are, the whole thing.

That's pretty awesome.
So with that caveat out of the way, hedging your predictions, we don't know what's going to happen tomorrow. Do you see any one company kind of getting ahead of the others? GPT-4 is still holding up well against a lot of these models, which makes me think personally that they have a few tweaks and hacks they haven't shared, which helps with their multi-billion valuation. Do you see anybody running away from the crowd, or is it too late now? The cat's out of the bag, and the progress is going to come from the mass of people.
I don't know. I know that I was texting with a couple of people the other day, talking about GPT-4 and how it is still relevant. Even though people talk about the performance decrease, it's still relevant, and every week, every model that's coming out is getting compared against GPT-4. And they're finding that most models are more performant than GPT-4 on certain things, right? It's like comparing Rain Man to an average human and asking what tasks each is good at, right? If it's going to McDonald's and ordering your own food, Rain Man is not great. You just have to find the model that's better.
A good example of that with GPT-4 is math. If you need a model to perform calculations for you, that's not it.
You have Wolfram Alpha, you have Goat, you have, even just vanilla Llama 2 is better at math than GPT-4, even though they weren't explicitly training on it.
And I think that they currently have that first-to-market advantage more than anything. That's not to say that it's bad. That's not to reduce the work that OpenAI has done, because it is phenomenal. But that's what's really keeping them afloat: the first-to-market advantage and the ease of use.
One other question I was holding as you were speaking: you mentioned Mixtral and, what is it called, mixture of experts? What's...
Yeah, Mixtral.
Yeah, it's routing. It's being smart and saying, hey, we don't need a dense feed-forward network for every single thing. Let's have a whole bunch of sparse networks and, based on the input, route it and tell it which expert is actually going to be the best. It results in much larger models that are smaller on disk and faster to run.
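The routing idea described here can be sketched in a few lines of NumPy. This is a toy top-k mixture-of-experts forward pass, not Mixtral's actual code; the experts are plain matrices and the router is a single linear layer, both hypothetical stand-ins:

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """Toy sparse mixture-of-experts layer.

    A router scores every expert, but only the top_k experts actually
    run, so compute stays low even though total parameters are large.
    """
    logits = router_w @ x                      # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # weighted combination of only the selected experts' outputs
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
y = moe_forward(x, experts, router_w)
print(y.shape)  # (8,)
```

With `top_k=2` out of 4 experts, only half the expert weights touch any given input, which is the "larger model, faster to run" trade-off described above.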
Is that more similar to how the human brain works? Because it's obviously not fully connected. It's got different regions and stuff like that.
I would love to appeal to that authority. I don't know, though, because you look at MRIs and you can see, oh man, this portion is lighting up when you're experiencing that emotion or seeing that input. But we don't really have a great mapping of every person's brain.
I think the connection between a neural net and actual neurons was lost a long time ago, right? How does the human brain work, and how does it really compare to modern-day models? It's hard to really make that argument; we're still learning about how we learn. And as we do, and as the neuroscience field advances, that ultimately leads to advances in the AI space, and vice versa. There are definitely connections there. But yeah, as far as your question goes, I think it's anybody's guess.
I think this is a perfect note to end on. A little bit of suspense. We're going to have to get you back at some point when you've finished your book, to talk a little bit more about the actual technical problems and challenges. We haven't really touched on any of that yet, but today I certainly learned a lot from you, and I hope a lot of our listeners will as well. It was an absolute pleasure to meet you both. Thank you so much, and see you next time.