Chris Parmer (Chief Product Officer & Co-Founder, Plotly) and Domenic Ravita (VP of Marketing, Plotly) discuss the evolution of AI-powered data analytics and how natural language interfaces are democratizing advanced analytics.
Philipp (00:15.068) Good, thanks, Damien. Happy to be here.
Domenic Ravita (00:30.575) Thanks very much.
Damien (00:33.934) Awesome, so before we get started, some quick bios. Chris is the Chief Product Officer and Co-Founder of Plotly. Since 2013, he's guided Plotly's transformation from an open source graphing library into one of the leading data analytics and visualization platforms, and more recently into an AI-powered development platform for data applications. He's also the creator of Dash. Chris has spearheaded development efforts at Plotly to establish Dash as the fastest way to build, deploy, and scale interactive analytics applications.
Domenic has over 20 years of experience scaling B2B data infrastructure and cloud service companies and is currently the VP of Marketing at Plotly. He is a Silicon Valley veteran who serves as a go-to-market leader for data and AI infrastructure solutions. Throughout his career, he's held key leadership positions spanning engineering, sales, and marketing across the tech industry, and has established several new market categories in the ultra-competitive cloud and data sectors.
Chris Parmer (02:00.827) So myself and my co-founders, we're traditional engineers and scientists. I was an electrical engineer. And in every job I had, I was dealing with a lot of R&D data. Around 2013, there were a few big technological changes in the world that were happening. Everybody in the industry was moving over to Python as the language of choice for data analytics across every sector, for me, especially in science and engineering, but also in data science.
The second thing that was happening was the web browser itself, thanks to Chrome and the V8 engine that came out a few years before, was becoming remarkably powerful and was becoming the new application platform. With those two things, we saw a gap in the technology and started building out a web-based visualization layer that had sophisticated interactive data visualizations rendered entirely in the web browser, which was totally new at that time. Over the years, we've expanded our scope from data visualizations into data applications and now into AI-powered data analytics.
Domenic Ravita (03:42.465) Yeah, a lot has changed in this area. You'll hear it spoken of as business intelligence or data analytics, advanced data analytics, general reporting. I think what's changed recently, obviously, is that AI is starting to impact all of these adjacent categories.
So you have your traditional business intelligence sorts of tools, generally used for reporting and descriptive analytics. These are things like Power BI and Tableau. That category went through an innovative shift maybe 20 years ago with drag-and-drop capability, what many people called visual analytics, where you could just take a CSV or Excel file and drop it in. It felt like magic at the time.
I feel like we're at that point again. We're seeing that same kind of eureka moment as users start to use an AI-enabled data analytics tool. It is that next quantum step we think is happening. And because AI can do so many of these things well, it's starting to merge what is thought of as data analysis or business intelligence with adjacent categories like data science platforms, which have typically been code first.
If you're a statistician, you've probably been working in R, because R has such a rich set of packages and libraries for statistics. And if you've been working in machine learning, you're probably coming through the Python realm. Well, in the last few years, Python has dominated all of that. And AI makes this more accessible.
The third part of that, which is really new in the past maybe a year and a half, maybe two years, has been AI coding tools, mainly initially for software developers, things like Cursor and Bolt, and then AI generating tools for other types of users, like designers, so things like Lovable. So these are the three categories that we see: business intelligence, advanced analytics or data science platforms, and AI coding tools.
Philipp (06:38.422) Domenic, you already talked a little bit about the past and your journey with Plotly. If you're looking back now, was there a decision or a moment that changed the trajectory of Plotly more than you would have expected?
Domenic Ravita: And that more recent push has been because of AI. AI is generating Python code, even when you don't ask. If you ask for a task like, "Create a synthetic data set and then give it to me as a file," it will write Python code to do that for you. And you don't have to know what that is, but you get the result and you can download the file, right? So Python plus AI adoption has propelled Plotly, most strongly over the last couple of years, and has led to our transition and transformation to take advantage of that wave.
Chris Parmer (08:17.441) So at Plotly, really since the beginning, we've focused on having code-first approaches to data visualization and application development. We released our first Python library for creating visualizations in 2014. It's downloaded tens of millions of times a month right now. And what's amazing about that in the current moment is that one of the things that AI is remarkably good at is generating code.
And a lot of our libraries, Dash and the Plotly graphing library, are open source, with tens of thousands of examples written over the years, and they're all indexed and trained on by LLMs. And so over the last year, LLMs have become pretty good at knowing and writing Python code to generate applications and graphs.
And so we've built upon that in our latest product, Plotly Studio, which is an AI-native application for generating visualizations and dashboards and data apps. But what we found in building out this AI-native product is that the LLMs are only about 30% of it. The other 70% is the tooling that you build around the LLMs. So it's not just generating the code, but actually running the code, verifying it, testing it, doing this all in a loop, which is how some folks define agentic AI: not just generating code, but generating code in a loop, in an environment where it can run things.
One of the things we do with Plotly Studio is it's all bundled and self-contained. You download it and Python is part of it. When it generates code, it renders the visualizations for you. It auto-corrects them. It provides a nice user interface for updating those visualizations and graphs. And so we're adding this extra level of packaging and feedback and auto-correction that makes these underlying capabilities of agentic analytics actually accessible to anybody.
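To make the loop Chris describes concrete, here's a minimal sketch of a generate-run-fix cycle. The `ask_llm` function is a hypothetical stand-in for a model client, not Plotly Studio's actual implementation:

```python
import subprocess
import sys
import tempfile

def ask_llm(prompt: str) -> str:
    """Hypothetical call to a code-generating model client."""
    raise NotImplementedError

def generate_and_fix(task: str, max_attempts: int = 3) -> str:
    prompt = f"Write a Python script that does the following:\n{task}"
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        # Run the generated script and capture any failure.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=60)
        if result.returncode == 0:
            return code  # ran cleanly: accept it
        # Feed the error back to the model and try again.
        prompt = (f"This script failed:\n{code}\n\n"
                  f"Error output:\n{result.stderr}\n\n"
                  "Return a corrected version of the full script.")
    raise RuntimeError("could not produce runnable code")
```

The point of the loop is exactly what's described above: the model isn't trusted to be right the first time; the environment runs its output and hands the error back.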
Chris Parmer (11:06.273) Yeah, I think so. Where we're focusing immediately is going from having a data set or data connection to allowing data scientists and analysts to explore that visually. And those visualizations can be of the raw data itself, but they could also be of simulations that run on top of that raw data, scenarios that run on top of it. And it's those types of analytical capabilities that sit on top of the data that are uniquely powered by having a programming language run custom code as part of the platform.
Many BI tools are really great at visualizing the data, but not as much about running analytics on top of it. And that's one of the really special things about doing code-based analytics. And now with AI, you can kind of instruct that code-based analytics in natural language.
Domenic Ravita (12:31.135) Yeah, I think like most shifts in the software market over the years, we see a convergence while we're also seeing a divergence and fragmentation. And so I do think that you're of course seeing AI being embedded and bolted on to existing tools. And there's lots of examples in the data analysis and BI space and other categories of software. But we are also seeing totally new re-envisioning of what can be done.
At Plotly, we talk about that as being AI native: rebuilding from scratch using the LLM as a foundational building block. It's a new way to do software engineering. Some people are calling this the agentic SDLC, or software development lifecycle. But that's the internal product-building perspective. For users, it enables new kinds of ways to engage with data, for us focused on data analysis.
And we've seen the innovation over the years from Xerox PARC with the mouse to the GUI interfaces, and now in recent years the AI chat kind of interface, and we see that being added to a lot of products. What we're envisioning, and what we've implemented in this product that we're soon to release, is a new kind of way to engage with data.
And we're thinking of this as spec-driven business intelligence. And that is the kind of specialization and divergence part of this, which is in every category for every sort of task that people are doing, you're going to start to see more and more specialized AI apps or AI-native apps. Some people call them LLM apps, but whatever the name, it's basically building on the LLM as a new capability that gives us new, faster, accelerated, augmented capability.
Some people are projecting that this will become fully autonomous for whatever that task is, like the autonomous data analyst, let's say. We may get there as an industry in the next few years, but even if we don't right now, it's enormously advantageous. And every data analyst should be thinking about how this changes their workflow, because they should be using this to their advantage for that acceleration and augmentation.
There's a concept that's been floating around in the last few months that's sort of a riff on, or derivative of, the idea of vibe coding. Vibe coding is what software developers do with an AI coding IDE to quickly create AI-generated code; Andrej Karpathy coined the term back in January. There's an analogous term floating around called vibe analysis, which is more akin to what a data analyst would do to find insight more quickly.
The concept reflects a lot of what we're seeing; it encapsulates this changed way of working. It doesn't explain everything you need to do in data analysis. But as far as that initial insight, it's so fluid and so simple that it keeps you in the flow. On that concept of vibe analysis: there's a visiting professor at MIT, Michael Schrage, who's written an article in just the last few weeks providing some insight on vibe analytics, what it is and what it means for workflow. It's published in the MIT Sloan Management Review, and I think your listeners would want to check it out. It's a really great case study; he looks at three different examples. Now, in those cases, they're piecing different things together to create this new analytic process. But that's in the direction of what we see in the data analysis space: more specialized tools, like what Plotly is releasing in that vein.
Philipp: Do you see that the data scientists and business users might at some point be using the same interface? And what's your view on the differentiation between data scientist work, like deep mathematical data analysis, versus data storytelling?
Domenic Ravita: I do think that is a large part of where data storytelling is applied most often: how do we understand what's happening with these numbers? And mostly they're not exploring new questions. There's no data science there; that's just business intelligence about what's happening, and the data storytelling is applied to it. That's not to say that data storytelling doesn't fit in when exploring new patterns in data with machine learning. It certainly does. It's just for a different audience in a different situation. So that's my take, in terms of what I encounter most often: the operational and descriptive analytics use of data storytelling.
Chris Parmer (19:11.02) Yeah, I've always viewed data science as a really broad category from a technical perspective. I personally define it as anything that involves another level of computation on top of the data. In many cases it's been really associated with things like machine learning, but across industries there's a lot of computational work in science and engineering and finance that might not necessarily be machine learning but is heavily computational. Things like bioinformatics, or if you're in finance, doing quantitative analytics, optimizing and evaluating different portfolios.
And then I think on the business side, looking at different scenarios, that might be in Excel formulas that are pretty complex, or it could be something more sophisticated that you can only do in code. I've long believed that all of those types of computational work have a lot of parameters to them, right? You're not just looking at one scenario. You want to explore those different scenarios.
And the self-service visualization tools that have been around forever, the drag and drop kind of tools that Domenic talked about earlier, can't really run computations at all, which is what you're doing in all these industries with programming tools like Python. And so at Plotly, we've built out these first-class interfaces with our application framework Dash that enable stakeholders to play around with those scenarios that the data scientists, quote unquote, are creating, whether that data scientist is a quantitative analyst or a scientist or an engineer.
And what's remarkable now is that the barrier to entry for doing quantitative computational work is becoming a lot lower thanks to AI. So what we've seen with our latest product is that data scientists can define these types of models in natural language. The end user, the stakeholder, can still interact with the published application and play around with the scenarios in the GUI that was generated by Plotly.
But then if they want to remix and create their own analysis, now they can do so using the same sort of natural language prompts and create their own scenario. There might be a little bit of a limit of complexity today for the types of data science models that can get created, but that's getting smaller and smaller. That gap is getting narrower every day.
And so it's tremendously exciting because the backend, which is Python, is the same sort of backend that can be used by the stakeholder as well as the data scientist, right? And there's a lot less switching of technologies. Everybody can kind of be working with the same technology, through code directly or through natural language.
Chris Parmer (22:45.481) It's a really good question. I think the way I look at it today is that a picture still tells a thousand words, right? And so the primary interface for understanding data and understanding scenarios is going to be through visualization. And AI can now generate the code that generates new visualizations, which allows you to look at the data a lot of different ways really rapidly. I view that as the primary interface.
I think right now, the insights themselves are still created by humans, by interpreting the visualizations. That may change in the future, where vision models interpret the graphs themselves and the analytics that get produced at the end of a pipeline and feed that back. But today I view these tools as more rapidly accelerating the different ways that you can look at the data and come up with insights yourself.
Chris Parmer (24:08.374) I'd say the aha moments I have are when the AI creates visualizations that I might not have thought of myself. There's a lot of sophisticated or even basic visualizations that I might not have in my mind's eye before I create them. And the AI systems that we've been building produce a set of charts and present them to a user. And there's a wide variety of charts that are presented.
Thanks to an LLM's world knowledge, a lot of the charts are pretty relevant to the industry or the domain of the data. So I'll come in with a data set and kind of have an idea of how I want to look at it. But then I'll be presented with a lot of different ways to look at that same data that will then spur additional questions. And that's a pretty remarkable workflow because I can come in and then see new pathways to explore automatically.
Chris Parmer (25:24.811) Yeah, one of the data sets I've really been enjoying using the tool with is San Francisco's public data around 311 calls, which is the city's complaints line. If there's trash on the sidewalk or somebody's blocking your driveway, things like that. And just throwing in that data set and then immediately seeing some analyses doing neighborhood comparisons, one neighborhood versus another, and seeing different response rates and how they might differ between neighborhoods.
Seeing comparisons like: how has this year been different than last year? We have a new mayor and administration in the city, so has that changed any of the response times in different neighborhoods? Has it just cleaned up the area around the Civic Center, where the mayor's office is, or not? So, you know, I went in with some broad curiosity about the data, but then was immediately presented with eight different charts that showed me different directions I could then go explore, including some I didn't immediately think of.
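As an illustration of the kind of neighborhood comparison Chris describes, here's a hedged pandas sketch. The file name, column names (`neighborhood`, `opened`, `closed`), and years are assumptions for illustration, not the real SF 311 schema:

```python
import pandas as pd

# Assumed schema: one row per case, with open/close timestamps.
df = pd.read_csv("sf_311_cases.csv", parse_dates=["opened", "closed"])
df["response_days"] = (df["closed"] - df["opened"]).dt.total_seconds() / 86400
df["year"] = df["opened"].dt.year

this_year, last_year = 2025, 2024  # assumption: the two years being compared
summary = (df[df["year"].isin([last_year, this_year])]
           .groupby(["neighborhood", "year"])["response_days"]
           .median()
           .unstack("year"))  # one column per year, one row per neighborhood
print(summary.sort_values(this_year))
```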
Chris Parmer (26:47.478) Yeah, definitely. I think a lot of that then comes down to the data itself. And that's the classic thing in data science, right: data preparation, data quality, and cleaning is 80% of the job. So we've looked at some data sets where there'll be these missing bars in the visualizations, and users will think, maybe the AI messed up this graph, there's this missing data. But then you look into it and you see that, for some reason, March actually just isn't in the data itself, right? The AI will generate code that visualizes the data, and that visualization will only be as good as the underlying data itself.
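A quick check of the "missing March" kind is a one-liner in pandas; the file and `date` column are assumed for illustration:

```python
import pandas as pd

df = pd.read_csv("data.csv", parse_dates=["date"])  # assumed file and column
missing = sorted(set(range(1, 13)) - set(df["date"].dt.month.unique()))
print("calendar months absent from the data:", missing)  # e.g. [3] if March is gone
```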
Philipp (:Chris Parmer (28:11.734) Yeah. And when you're looking at data, you're presented with a lot of different options and assumptions with exactly the things that you mentioned. How do you bin the data? If you're taking a moving average, what window should you choose? If you're filtering the data, what should the filters be? If you're comparing two different visualizations against each other, should the y-axis of the visualization be shared so that you're viewing the same range of the data, or should they be independent?
For a lot of these questions, there isn't really a right answer, right? And the AI will pick something, just the same way those assumptions would get made if you gave a visualization assignment to yourself or to a colleague.
As product designers, the way we're thinking about this is trying to present the ambiguity to the user in the form of different controls. So when there are options like any of those things, the binning, the moving average window, instead of just picking an assumption, we encode that in a dropdown within the visualization so that the user can immediately see: okay, the binning is one month, the binning is two weeks, the moving average window is seven days, the y-axes are shared or not shared.
And then the user immediately sees a user interface where these underlying assumptions bubble up and are presented to them, and they can choose from there how they might want to view the data. I think that's a really cool approach that we're taking. In other AI systems, like chat, a lot of those assumptions might be encoded deep within the language model's understanding or reasoning, or in the code that it generated, and not really surfaced to the user. And some of those assumptions could be dangerous.
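Here's a minimal Dash sketch, under assumed column names, of what surfacing those assumptions as controls might look like: the bin size and moving-average window become dropdowns instead of hidden defaults:

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Assumed data: a "date" column and a numeric "value" column.
df = pd.read_csv("metrics.csv", parse_dates=["date"])

app = Dash(__name__)
app.layout = html.Div([
    # The binning assumption, surfaced as a control.
    dcc.Dropdown(id="bin", options=["W", "2W", "4W"], value="W", clearable=False),
    # The moving-average window, surfaced as a control.
    dcc.Dropdown(id="window", options=[3, 7, 14], value=7, clearable=False),
    dcc.Graph(id="chart"),
])

@app.callback(Output("chart", "figure"),
              Input("bin", "value"), Input("window", "value"))
def update(bin_rule, window):
    binned = df.set_index("date")["value"].resample(bin_rule).mean()
    smoothed = binned.rolling(window).mean().rename(f"{window}-period average")
    return px.line(smoothed)

if __name__ == "__main__":
    app.run(debug=True)
```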
Domenic Ravita (30:31.983) Yeah, I want to return a little bit to that discussion we were just having around messy data. Most data turns out to be messy, and as Chris said, dealing with it turns out to be like 80% of the job of a data scientist or analyst. And even in that realm, in discovering and exploring the data, the AI really helps to accelerate things.
So, to identify: I have missing values here; in Chris's example, I'm missing March. Getting to that faster helps to speed up the exploratory phase of understanding what's in the data. But think about the full process: getting a data set, gathering it, establishing some hypothesis, doing the analysis of your problem, having a diagnosis, and then maybe synthesizing that and sharing it. We see the opportunity at every one of those steps to make the work augmented and accelerated.
When you do this at each of these steps, it just changes the nature of the work, and the engagement with the data through that whole process is what we're looking to innovate. So I don't want folks to come away thinking it's just helping create visualizations faster. It's much more than that. I mean, even with our own products, what comes out as the end result is a completely functional data application. So if you're trying to communicate and synthesize ideas, you have a ready-made artifact that you can now communicate with, or extend and modify.
Chris Parmer (32:51.291) Yeah, it's totally essential to this product. And so there's pieces in the user interface in the final data application that gets generated, like I talked about: the encoding of the different parameters of the model so the user can play around with it and get a sense for it. The other unique thing we've been doing is we auto-generate a specification file, in natural language and in the language of the user's choice, that describes exactly what the code is doing. And that's done by a different AI agent, so it's not biased by the agent that generated the code. It looks at the code verbatim, and it starts to allow folks who might not be as code literate to understand the underlying analytics and to see what other assumptions might be encoded.
Another area we're building out is a very transparent logs interface that will show the user the data transformations that are happening step by step in the code that gets generated. I think that when AI and ChatGPT came out and captured everybody's imagination a few years ago, the thought around data analysis was you just give a raw data set to an LLM and then you'd say, hey, what's the answer? And you'd hope that the LLM would look at this long string of numbers and then come up with a result. But that doesn't functionally work, right? LLMs are just generating tokens. And so they're not actually running computations on numbers.
The era that we're in now is that the LLMs aren't processing data. The LLMs are generating Python code and that Python code is processing and analyzing data. The LLMs have some knowledge about the data set, the column names and the types. So the code it generates is domain specific and it's taking in what the user wants to visualize. But the actual processing of the data itself isn't done by the LLM. It's done by the code that it generates. And so it's a lot more rigorous.
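A sketch of that division of labor, with a hypothetical `ask_llm` client: the model sees only the schema, never the rows, and the number crunching happens in Python:

```python
import pandas as pd

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model client, not a real API

df = pd.read_csv("measurements.csv")  # assumed file

# The model sees only the schema, never the rows.
schema = "\n".join(f"- {name}: {dtype}" for name, dtype in df.dtypes.items())
prompt = ("Given a pandas DataFrame `df` with these columns:\n"
          f"{schema}\n"
          "write Python code that bins the data by hour and plots the average.")

code = ask_llm(prompt)   # the model returns code as text...
exec(code, {"df": df})   # ...and the computation happens here, in Python
# (a real product would sandbox this execution rather than call exec directly)
```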
And so the tools and the interfaces that we're building are really actually about providing more transparency to what the code is doing. And that's things like putting in these debuggers and putting in these logs and trying to make a lot of these things that were available in software engineering for a long time much more accessible and transparent to a much wider audience.
Chris Parmer (36:17.077) It's a good question. I think the assumption that we take right now is trusting the user and the storytelling that they want to do. There's always been a risk in storytelling, even without data, about the stories that you want to tell. And if a user wants to tell their own story, they had tools before to do that, and I think they'll have tools to do that with AI as well.
There's the famous example of the 3D pie chart that Steve Jobs presented in 2008. At Plotly we don't have 3D pie charts, so you can't do that type of thing. We put a lot of care into the defaults that we provide in our visualizations and in the AI-generated application so that that type of awkward storytelling isn't really possible. In our pie charts, we order the sectors from largest to smallest and we label them. So we put a lot of care into the defaults, and I think that enables good and honest storytelling. But if a user wants to lie with their data, they can certainly lie with their data.
Damien (38:26.742) Yeah, Sam Altman did that quite brilliantly. Straight out of the Steve Jobs playbook, that one. Yeah.
Domenic Ravita: So to your point, I think clearly that's not going away immediately. It's easier now than ever to create these visualizations in these different ways. It is still up to the creator and the communicator to make sure it's accurate, to verify the results.
Damien (41:00.044) Yeah, so it seems like the one point of possible risk here is that the LLM or the agent is generating the code; that's the one part of the lifecycle. If that's the case, how does a non-technical user who has no clue about coding, who gets a fantastic chart, tell whether the code that's been generated is accurate? And do you have any idea how accurate the code your system generates to do the analysis is?
Chris Parmer: With all the tooling that we're putting around it today, the auto-correction loops and things like that, the accuracy is in the 90s. More than nine out of 10 datasets that you put in will create applications where all of the charts, eight out of eight or six out of six, generate successfully; and when there is an error, maybe one out of those eight or six might have a syntax error. And that number's going up.
But then there's the deeper question of how you can ensure that the analysis itself is correct. And that's where building in additional tooling for really transparent verification of the code is really important, and we're putting that into the product. So, an English-language description of what the code is doing. And that's actually relatively straightforward for analytics, because analytics code is essentially a script: a set of steps that happen. It'll say things like, we are going to bin the data by hour and then we're going to take the average. That's pretty much one to one with the lines of code that get written, and there isn't much hallucination that happens in between those two steps.
And then there's actual verification of the transformation of the data as it happens, step by step. Users can see the raw data, then they can see the data that's been binned, and then they can see the data that's been transformed, allowing them to really easily spot-check the data step by step, which is truly how you would verify any data analytics project, even if it was created by a human.
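A hedged sketch of that step-by-step spot checking, assuming `timestamp` and `value` columns: print each intermediate frame so a reviewer can audit every transformation, just as they would with a human analyst's script:

```python
import pandas as pd

df = pd.read_csv("events.csv", parse_dates=["timestamp"])
print("step 0: raw data")
print(df.head())

# Step 1: bin by hour and average, then show the result.
hourly = (df.set_index("timestamp")["value"]
            .resample("1h").mean()
            .reset_index())
print("step 1: binned by hour, averaged")
print(hourly.head())

# Step 2: 7-point moving average, then show the result.
smoothed = hourly.assign(rolling=hourly["value"].rolling(7).mean())
print("step 2: 7-point moving average")
print(smoothed.head(10))
```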
Philipp (44:44.774) I mean, you mentioned earlier the intent of the user. So you could think of scenarios where non-technical users just accidentally communicate a certain agenda or outcome that they want to see to the AI, and it tries to please the user and work the analysis towards that. But you could also see people sitting there with bad intentions, who have a climate dataset and want to come up with some plot that confirms to their supporters that it's not real, or something like that. In that kind of context, do you think AI should also sometimes deliberately push back against the user's narrative, or confront it somehow?
Domenic Ravita: In the computer industry, when we talk about reasoning, I take that as a metaphor. Because if you want to do logical reasoning, there's already an area of computer science that solves for that, which is symbolic AI. And that's not what we're dealing with with LLMs. They're neural nets; it's a totally different model. There are some companies and researchers trying to find a synthesis of those, a sort of neuro-symbolic AI. That's an area of active research, and I have not seen a lot of commercial products that do this yet. I just think it really is up to the human in the loop right now to interpret the value and to make a value-based judgment about it. In that way, we can't rely on the AI to think for us. But it's a great tool to accelerate all these steps, guided by somebody who knows what they're doing in a particular domain.
Chris Parmer: But for our system, there's this level of interpretation that happens with the visualizations. So in that scenario, it would generate code that shows this year versus last year and visualizes it for me. It won't interpret that chart for me; it'll show me this year versus last year, based on the underlying data. Now, could it end up making up data in the meantime and really trying to tweak that? Yeah, probably. We discourage that from happening, and the end viewer of the visualization should be able to see the type of data it was based on to know whether it was right or not. And if somebody wants to make up data today to tell their own story, they could do that without AI as well. People have been doing that forever.
Philipp (49:46.271) Do we have any technical guardrails in place in the AI pipeline to prevent misleading results?
Philipp (50:51.644) Makes a lot of sense, yeah. Maybe a couple questions on topics that we have already touched on. Both of you mentioned that AI-generated data analysis is more relevant in some sectors or industries than in others. Could you give some examples on that?
Domenic Ravita: And often those proprietary formats are increasingly being replaced with an open format specification, right? I mean, just look at the world of databases: internal database storage formats used to be arcane proprietary data structures that only a few database engineers in the world understood. In the last few years, though, there's been this abundance of open table data encodings for optimal compressed storage, and the database has sort of been turned inside out. I think if it can be done in general-purpose databases like that, it will certainly be done in the specialized applications that you find in industries like manufacturing.
Chris Parmer (53:41.392) Yeah, I think once you start working just with natural language, it isn't always clear what's possible or not possible with the underlying code itself. If you're coding directly, you have an idea of whether what you're doing is possible, because you're having to write that code yourself; you're having to come up with a strategy. And so we've seen some cases where folks are asking for things that the code can't do, user interactions in the chart or things like that, and it's a little bit hard to get feedback on whether it's actually possible or not. And you can get into a bit of a rabbit hole that way and spend a lot of time spinning wheels.
This is also related to the jagged frontier concept in AI today: it's remarkably great at some things and less capable in other areas, and it's not apparent at all to anybody where those areas are until you start to have experience with it. This is something I tell our users too when they're wondering how much of their previous expertise in Python or data science or analytics or Plotly's libraries is still useful in an era where AI is writing more of that code. The better you know the underlying fundamentals and what's possible under the hood, the better you'll be at steering the AI and arriving at answers that are actually achievable.
Domenic Ravita: And even internally, within the marketing department at Plotly, our content marketers analyze data as part of content work and SEO; we're looking at what performs well and managing campaigns. So there's data analysis even in the content marketing role, and with no instruction, they were able to just get Plotly Studio running, start to do analysis, and share it in the team call, just ad hoc.
As a marketing leader, I want everybody on the whole marketing team to be hands-on with the product that the company is providing. That's hard, especially if it's B2B data infrastructure. But in this case, it also just validates the point that this kind of AI-powered data analytics really opens the possibility to the one billion knowledge workers on the planet.
Just anybody with data can get started. As we've discussed, as you get further along in that analysis, you will start to encounter some edge cases and speed bumps. But our objective with Plotly Studio is to smooth out those friction points for you, anticipate them before you get there, and basically let you keep vibing to get to those insights.
Chris Parmer: And that chart builder is completely different from one tool to the next, whether you're plotting X versus Y, or in Tableau, rows versus columns, which I never could really wrap my head around, or dimensions versus measures. Every single tool has its own complicated UI-based chart building tool, and natural language is kind of the equalizer here. And so we've pulled in all of this data from different systems and visualized it. It's remarkable to be able to visualize data from different systems so easily, because you can just ask for the visualization or the data app that you want in natural language, without needing to go through the learning curve of a new tool.
And I think we see that in so many industries and companies where there's a couple of specialists within the company that are the Salesforce expert. And if you want a report made, you ask that expert to make that report for you, or you spend a couple of days figuring out that new tool, right? And I think this has the opportunity to really enable anybody at any company to visualize any data because it's a really simple, common interface. It's just natural language.
Domenic Ravita: That specification is all natural language. And it's very easy to use the content in the specification, any piece of it, to refine, remix, or regenerate for your next iteration. So in that way, it's sort of teaching the user, giving them the nouns and verbs, the vocabulary, to use for the next regeneration, for the next refinement. The way of learning how to do this is just by doing, rather than, oh, I need to go consult a documentation page and learn the syntax of this language.
Years ago, and now I'm really dating myself, I worked for a telecom company, and my job as the new developer on the team was to develop SQL reports on the performance of database queries for a call management system, essentially a call center app for a regional telco. They used a product from Brio Technology with a reporting scripting language called SQR, which I think stood for something like structured query reporting. That sort of proprietary report scripting language is what you see now in essentially Power BI's DAX: just an evolution of some proprietary language where you have to learn the syntax. And this is the big step up, the quantum leap, that we're seeing now. You don't have to go study and read a book and go through the tutorial. You just get in the tool and use the specification that's created to iterate.
Philipp (01:02:35.472) Yeah, it sounds almost like both of you are imagining a future where business leaders would just skip static dashboarding or reporting and instead use an AI tool to perform ad hoc data analysis. Or maybe even, when in meetings making strategic decisions, you could already have a tool like that at your side that provides the insights ad hoc and in real time.
Damien (01:03:59.599) We've got 10 minutes left, guys, so I'm just wondering, because we've been kind of freewheeling here, which is great: is there anything that we haven't got to yet that you really wanted to talk about?
Damien (01:04:55.502) So let me phrase that as a question then, so we've got it. A question for you, Chris: can you give us an overview of some of the under-the-hood development you're doing to the platform that sets it apart?
Chris Parmer: What you'll see if you're using other vibe coding tools out there is that sometimes they'll generate code that's in a single file, for example, thousands of lines long, which makes refactoring really difficult. So some of the steering we do, for example, is we split everything out into separate files. We use a really nice project structure by default. We provide nice templates so that the code and the applications that get generated are pretty consistent. And with that consistency, you get a high level of accuracy.
There's also just a ton of tips and tricks that we've learned over the years of building apps ourselves and for our customers that we're also encoding: different parameters to use in Plotly charts, for example. We have WebGL-based visualizations that are much faster than our SVG visualizations, and you can turn those on with a certain flag. We just do that type of stuff by default within the product.
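The WebGL point maps to a real Plotly API: `go.Scattergl` is the WebGL-backed counterpart of the SVG-based `go.Scatter` trace. A small example:

```python
import numpy as np
import plotly.graph_objects as go

x = np.random.randn(200_000)
y = np.random.randn(200_000)

# go.Scattergl renders through WebGL; the SVG-based go.Scatter
# would struggle at this point count.
fig = go.Figure(go.Scattergl(x=x, y=y, mode="markers"))
fig.show()
```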
I think LLMs today are really powerful if you're a really good operator and you have a lot of that existing knowledge. The better software engineer you are, the better you'll be at using an LLM to do software engineering with you. And so we're taking a lot of that knowledge and just encoding it by default into the product itself so that our users get all of these hard-learned lessons and defaults included in the out-of-the-box experience.
The other thing we're doing is just packaging up all of this stuff so that it just works when you download it. Installing Python is a notoriously difficult process, especially for new users. And we're just packaging that up as part of the application so that you can download it and you don't even necessarily need to know that it's included. It's part of the runtime. It just works. And there's been a lot of engineering to get that to work across platforms out of the box, dealing with different issues with certs, self-signed certs, corporate networks, permissions, and things like that.
And then once we take the code, we actually run it and auto-correct it. We're doing a lot there in providing the right context to the LLM if there's a syntax error in the code, so that it can do a much better job correcting itself: pulling out the scope of the variables at the time of the error, providing really nice debugging traces automatically in the loop so that the system can correct itself.
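A hedged sketch of harvesting that error context, assuming the generated code is run with `exec`: the traceback plus the local variables at the failing frame get returned for the next correction attempt:

```python
import sys
import traceback

def run_with_context(code: str, env: dict) -> str | None:
    """Run generated code; on failure, return the traceback plus the
    local variables in scope at the failing frame for the model to read."""
    try:
        exec(code, env)
        return None
    except Exception:
        tb_text = traceback.format_exc()
        tb = sys.exc_info()[2]
        while tb.tb_next is not None:  # walk to the innermost frame
            tb = tb.tb_next
        local_vars = {k: repr(v)[:200] for k, v in tb.tb_frame.f_locals.items()}
        return f"{tb_text}\nlocals at point of failure: {local_vars}"
```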
And we're structuring all of the code in a way that it can be generated by multiple agents in parallel. We're generating two to five thousand lines of Python code for these applications, and with today's inference speeds, if you were to generate that amount serially, it would take about 10 minutes. But because we structure the code so it can be generated in parallel, these applications can be generated in a minute and a half to two minutes.
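A hedged sketch of the parallel idea, with the same hypothetical `ask_llm` client and made-up module names: independent files can be generated concurrently, so wall-clock time is roughly one module's generation rather than the sum:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model client

# Made-up module names; one file per chart keeps the pieces independent.
modules = ["overview.py", "by_neighborhood.py", "trends.py", "response_times.py"]

def generate_module(name: str) -> tuple[str, str]:
    return name, ask_llm(f"Write the module {name} following the shared template.")

# Each generation call is independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=len(modules)) as pool:
    generated = dict(pool.map(generate_module, modules))
```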
And so something I said earlier, I think, is really true: in these AI products that you're building, 30% of it is the LLM, and 70% of it is the tooling, these agentic tools that run in a loop to provide a really nice product out of the box. And a lot of the errors that you might see with agents today compound, right? If you have an agentic AI loop that takes 10 steps and each step is 99% accurate, well, after 10 steps the accuracy of that full process completing is 0.99 to the 10th power, so it's going to be about 90% accurate, right? And that's one of the really hard-learned lessons that the industry is facing right now in building autonomous AI agents: these errors compound in a really tricky way. So we're doing things to limit the number of steps and to improve the self-correction ability so that we can have a higher level of accuracy out of the box.
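The compounding arithmetic, worked out:

```python
per_step = 0.99
print(per_step ** 10)  # 0.904...: ten 99%-reliable steps leave ~90% overall
print(per_step ** 50)  # 0.605...: at fifty steps the same loop is down to ~60%
```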
Chris Parmer (01:10:27.99) Yeah, it's our hand-rolled engine around this that will take the code and run it in our own environment, and we provide and expect a common structure around it. So it's not necessarily a freewheeling agent that will run on its own. We provide certain instructions, expected inputs and expected outputs, and really constrain things so that the accuracy can be high and so that the way we test the system can be consistent.
The AI systems work best if they can work in a loop, so that the system not only generates code, but can run that code, test it, and then fix itself. If you ask an LLM to generate code, it won't do that by default. And so the structure we provide is around generating the code in a way that can then be easily tested. And then we have our own testing engine that will test that code and provide the right error-correction loops for auto-correction.
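A minimal sketch of what such a constrained contract might look like; the `make_figure(df)` entry point is an assumption for illustration, not Plotly Studio's real interface:

```python
import importlib
import pandas as pd

def check_contract(module_name: str, sample: pd.DataFrame) -> str | None:
    """Import a generated chart module and verify it honors the contract.
    Returns an error message for the correction loop, or None on success."""
    try:
        mod = importlib.import_module(module_name)
        fig = mod.make_figure(sample)  # the expected entry point
        assert fig is not None, "make_figure returned nothing"
        return None
    except Exception as exc:
        return f"{module_name} failed the contract check: {exc!r}"
```

Because every generated module is expected to expose the same entry point, one harness can test all of them the same way, which is what keeps the loop's accuracy measurable.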
Philipp (01:11:43.44) Maybe one question: if you're looking out into the future, is there one thing that you think people will get wrong over the next years in terms of AI and data visualization?
Domenic Ravita (01:13:05.913) My view on it is that I think over the next maybe year to 18 months, a mistake that could be made is a lot of people thinking that to do this effectively, you have to have this extra set of data infrastructure. You need a well-formed data warehouse. You need an external semantic model or semantic layer. I'm already seeing these sorts of statements about, to make it useful, you must have this other thing. And I think we have to just think about what those data sources and data structures provide.
And as we talked about earlier, garbage in, garbage out is still true. If your dataset is messy and missing values, the first step is to clean it up; Plotly Studio can even help you identify that more quickly. But in terms of the structure, and what's a metric or measure that you can rely on, that needs to be in the data. By what dataset you feed Plotly Studio, you can essentially create the right semantics. You could also create the wrong semantics. But it doesn't mean that to do it you have to have some third data element or structure. It can just exist in a regular Postgres database. It could exist in a parquet file. And of course it can exist in some semantic layer. My point is just that you can get that served without necessarily having a different infrastructure there.
One more thing: under the covers, we're essentially also creating a semantic model for the dataset that's being provided. Where we take that, and how far we go with it, is to be determined. But our vision, as we connect to data sources, is that some of these data sources we connect to are your semantic model. If you've spent the time and effort modeling your corporate data, then we'll be able to access it.
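A hedged sketch of what inferring a lightweight semantic model straight from a dataset could look like; the naive measure/dimension split here is for illustration only, not Plotly Studio's actual internals:

```python
import pandas as pd
from pandas.api import types as ptypes

df = pd.read_parquet("sales.parquet")  # or a plain Postgres table via read_sql

semantic_model = {
    "measures": [c for c in df.columns if ptypes.is_numeric_dtype(df[c])],
    "dates": [c for c in df.columns if ptypes.is_datetime64_any_dtype(df[c])],
    "dimensions": [c for c in df.columns
                   if not ptypes.is_numeric_dtype(df[c])
                   and not ptypes.is_datetime64_any_dtype(df[c])],
}
print(semantic_model)
```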
That said, having worked around a lot of enterprise data projects, from web service discovery systems to metadata management systems, master data management systems, data warehouses, and everything in between: the vision of those is often that they're the one source for all corporate data, for all subjects and all domains. In reality, each is one slice and one perspective. If you can get it right for a couple of subjects, you're doing great. But there's still other data in other systems, and you need ad hoc analysis to get to it, so they're always playing catch-up. I don't think it's realistic in enterprises that there's one universal data source, the data lake, the data warehouse, or one semantic model, that serves all of the data analytic needs. Those will be there, but you're going to have additional sources.
Domenic Ravita (01:16:52.88) Thanks so much. Enjoyed it.
Damien (01:16:58.024) And thanks also to my co-host, Philipp Diesinger, and of course to you for listening. We look forward to catching you on the next podcast.