170 Sandy Ryza - Data Fusion: Minds on Orchestration and AI Innovations
Episode 170 • 11th April 2024 • SaaS Fuel • Jeff Mains
Duration: 00:48:35


Shownotes

In today's SaaS Fuel Expert Series episode, we dive deep into data fusion: how organizations orchestrate their data and innovate with AI. Join host Jeff Mains and our esteemed guest, Sandy Ryza, lead engineer at Dagster Labs.

Together, they explore how data orchestration works inside modern organizations. From reconciling conflicting versions of the truth across teams to building reliable data pipelines, Sandy shares his insights on the challenges, the solutions, and the future of data management.

Tune in to learn what data orchestration really involves, how AI is changing the way data pipelines are built and refined, and why reliable data remains such a hard-won goal for today's businesses.

Get ready to spark your curiosity, ignite your imagination, and fuel your passion for innovation here on SaaS Fuel!

Key Takeaways

00:00 Shifted career to work with big data.

04:02 Embed machine learning into apps for data facilitation.

07:09 Data engineers focus on productive data pipelines.

12:48 Making specific decisions often requires creating intermediate data.

18:26 AI and generative models reshape data handling.

21:55 Data monitoring and quality checks are crucial.

26:14 A centralized data platform streamlines decision-making and productivity.

28:43 Diverse data platforms require a common, flexible layer.

30:47 Manage data complexity with code and trust standard processes.

37:06 Data diversity challenges healthcare organizations; innovation is needed.

38:35 Data orchestration simplifies complex data transformation processes.

41:13 Develop a vision for the data platform and stay flexible.

Tweetable Quotes

"The best thing you can do to improve the performance of your machine learning model is to get better data for that model." — Sandy Ryza 00:11:06

"So things very heavily in terms of data assets, both the source data and the final data and then also these intermediate data assets that can be useful for a bunch of different things."— Sandy Ryza 00:13:36

"I think you mentioned AI, and that's, of course, a big one. The rise of generative AI and large language models create this whole new world of data and this whole new world of data pipelines." — Sandy Ryza 00:18:26

"You can't make decisions if you don't have confidence in the data. And I think one of the things that happens a lot today is, you know, an organization may say we're data-driven, but if you really look into the way they make decisions, there may be some data, and sometimes they pay attention, but a lot of times it's, yeah, I don't know about that. I'm just gonna go with my gut." — Jeff Mains 00:23:26

"The human connection into that really helps to drive AI, and they feed off of each other. So either one alone is not nearly as powerful as the combination of the two together." — Jeff Mains 00:35:22

SaaS Leadership Lessons

1. Importance of Unifying Data Platforms: Sandy Ryza emphasizes the challenge of reconciling different versions of the truth across teams when each team has its own data platform and datasets. A key leadership lesson is to advocate for a single data platform, which makes machine learning practitioners and data engineers more productive and improves data governance for the entire organization.

2. Embracing Code for Data Transformation: Sandy stresses the importance of using code for managing complex data transformation and data pipelines. A leadership lesson from this is the need for SaaS leaders to embrace code and technical solutions for managing data effectively, rather than relying solely on web UI software.

3. Recognizing the Role of Human Intervention: The discussion highlights the role of human intervention in labeling for AI systems, addressing edge cases, and backfills in data engineering and data pipelines. SaaS leaders should recognize the value of human intervention as a driving force for AI and as a necessary component for handling specific challenges in data orchestration.

4. Leveraging Data for Decision-Making: Sandy and Jeff discuss the importance of using data for decision-making, product development, and building recommendation engines. An essential leadership lesson is the need for SaaS leaders to prioritize using data effectively across their organizations to inform strategic decisions and drive product development, focusing on customer needs and behavior.

5. Prioritizing Data Trustworthiness and Quality: Sandy Ryza emphasizes the challenges of ensuring data accuracy and reliability, stressing the importance of data trustworthiness and quality. A lesson for SaaS leaders is the necessity of prioritizing data quality and accuracy to build trust in their organizations' data-driven decision-making, ultimately impacting the business's success.

6. Building a Network of Datasets for Orchestration: Sandy explains the concept of software-defined data and building a network of datasets for productive data pipeline orchestration. SaaS leaders can learn the importance of creating a flexible and adaptable data platform that allows for seamless orchestration and data utilization across the organization.
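To make lessons 5 and 6 more concrete, here is a minimal plain-Python sketch of a "network of datasets" defined in code: each dataset declares its upstream dependencies, the datasets are computed in dependency order, and a simple quality check guards the source data. This is an illustration only, not Dagster's API; the `data_asset` decorator, the `ASSETS` and `CHECKS` registries, `materialize_all`, and the toy order data are all hypothetical names and values chosen for the example.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical registry: asset name -> (compute function, upstream asset names).
ASSETS: dict[str, tuple[Callable[..., Any], list[str]]] = {}
# Hypothetical quality checks: asset name -> predicate its value must satisfy.
CHECKS: dict[str, Callable[[Any], bool]] = {}


def data_asset(*deps: str):
    """Register the decorated function as a data asset built from `deps`."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        ASSETS[fn.__name__] = (fn, list(deps))
        return fn
    return register


@data_asset()
def raw_orders():
    # Source asset: toy rows standing in for data loaded from a real system.
    return [
        {"customer": "a", "amount": 30.0},
        {"customer": "b", "amount": 12.5},
        {"customer": "a", "amount": 7.5},
    ]


@data_asset("raw_orders")
def revenue_by_customer(orders):
    # Intermediate asset that several downstream consumers could reuse.
    totals: dict[str, float] = {}
    for row in orders:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


@data_asset("revenue_by_customer")
def top_customer(revenue):
    # Final asset: the customer with the highest total revenue.
    return max(revenue, key=revenue.get)


# Quality check: reject source data containing negative order amounts.
CHECKS["raw_orders"] = lambda rows: all(r["amount"] >= 0 for r in rows)


def materialize_all() -> dict[str, Any]:
    """Compute every asset after its upstream assets, running checks as we go."""
    graph = {name: deps for name, (_, deps) in ASSETS.items()}
    results: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = ASSETS[name]
        value = fn(*(results[d] for d in deps))
        check = CHECKS.get(name)
        if check is not None and not check(value):
            raise ValueError(f"quality check failed for asset {name!r}")
        results[name] = value
    return results


if __name__ == "__main__":
    print(materialize_all()["top_customer"])  # prints "a"
```

Because each dataset declares its own dependencies, adding another consumer of `revenue_by_customer` means writing one more decorated function; the execution order is derived from the graph rather than maintained by hand. Dagster provides its own asset and check abstractions, so refer to its documentation for the real interfaces.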

Guest Resources

sandy@dagsterlabs.com

https://dagster.io/about

https://www.linkedin.com/in/sandyryza/

https://twitter.com/s_ryz?lang=en

Resources Mentioned

Advanced Analytics with PySpark

Cloudera

Remix

Clover Health

KeepTruckin

SPACE UNICORN SONG

Episode Sponsor

Small Fish, Big Pond – https://smallfishbigpond.com/ (use the promo code ‘SaaSFuel’)

Champion Leadership Group – https://championleadership.com/

SaaS Fuel Resources

Website - https://championleadership.com/

Jeff Mains on LinkedIn - https://www.linkedin.com/in/jeffkmains/

Twitter - https://twitter.com/jeffkmains

Facebook - https://www.facebook.com/thesaasguy/

Instagram - https://instagram.com/jeffkmains



This podcast uses the following third-party services for analysis:

Chartable - https://chartable.com/privacy
