Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Vikas' LinkedIn: https://www.linkedin.com/in/vksnov9/
Vikas' Twitter: @vikaskumar9 / https://twitter.com/vikaskumar9
Vikas' email: vikaskumar9 [at] gmail
In this episode, Scott interviewed Vikas Kumar, AVP and Head of Data, AI, and ML at CNA Insurance. To be clear, he was only representing his own views in this episode.
Some key takeaways/thoughts from Vikas' point of view:
According to Vikas, from 2010 through the early 2020s the focus was on moving data to the cloud to better drive value. Now that more and more of our data is in the cloud, we are starting to see much broader adoption of things like ML and AI. The cloud gives us the promised but under-delivered scalability of the "big data" technologies along with the flexibility to move quickly and experiment. Cloud can also make it easier to bring non-data people into the mix to drive better collaboration between the data people and the business people/domain. So cloud gives us this massive scale and data availability but we still have to learn to better leverage our data and drive value from it - we are still in pretty early days there as an industry.
A big outcome of the mass movement of data to the cloud is a shift in how much time is spent on data management versus getting value from the data according to Vikas. DBAs used to spend 60%+ of their time just managing the data but data people's time is now focused on getting value and probably only 10-20% is spent managing the data specifically. But cloud can be a double-edged sword too - if it's very easy to create new data products or beta data products, you have to be very careful not to create overlapping or duplicate work and data products. It all comes down to governance and your operating processes to prevent that.
As an industry, we are getting much better at serving data reliably at scale according to Vikas but we still struggle with the gap between data being available and data being usable by consumers in the business domains. We are still working on figuring out where to meet in the middle between handing people reports and maybe dashboards - a kind of old school approach - versus upskilling them to very high data fluency so they can build everything themselves.
When asked that question - do the data people have to learn all the business context or vice versa - Vikas gave the very data mesh answer of "it depends." But that makes sense because there shouldn't be a single prescribed method, you have to look at how your organization works and fit with that model. And you probably want to meet somewhere around the middle. Otherwise, you will cause unnecessary friction. So look to your general ways of working, cross train people, get people exchanging context about what they are trying to achieve and instill a culture of feedback and collaboration. That's how you can actually execute well on a data mesh strategy.
Vikas talked about your data strategy north star being about getting value from your data, reliably and at scale. So, you need to be realistic about where you are in that capability journey right now. As a data producer, you need to assess whether your data consumers can do everything necessary if you give them raw data or whether you should be curating it for them so they can actually leverage the insights. Work to find the high-return data work early instead of trying to tackle the most complicated aspects of data. It's okay to start small, no shame there.
A data product should always map to a target business outcome according to Vikas. But that shouldn't be the only factor. The reason for creating a data product should be trying to achieve that outcome, so use that as the north star for the data product, but we must build in a way where data products can be reused - sometimes with some additional work - for additional use cases. And it's really crucial to have a data product owner who is discovering and focusing on the objective of the data product. Providing the business meaningful data that meets their objectives should be a key objective of every data product.
When asked how we balance focusing on the long-term wins instead of the quick - but typically small - wins, Vikas talked about the need to create a holistic view of your data and build a very strong foundation for how you will deal with data in general. That makes it so you can jump on the quick wins when you find them but you also have a steady foundation for making much bigger bets going after long-term big wins. With a shaky foundational layer for your data, those long-term big wins are much less likely to pay off. And that foundational aspect comes in at the data product level too - when it makes sense, build data products to be extensible from the start so they are easy to extend later. Kent Graziano in the recent data modeling panel railed against having to rebuild every time you extend a data product. Don't do that :)
For Vikas, there are many value streams for a data product - most people focus on the data set itself but the value could also come from the governance work or the collaboration conversations between producer and consumer. We need to focus less on the data product as the exact output and more on the data product as the vehicle for delivering value; the overall product work itself significantly enhances the value of the data product.
Data governance seems to be the part of data mesh that confuses a fair number of organizations, so they ignore it, at their significant peril according to Vikas. While you might not have to build every aspect of your governance upfront, it's crucial to think about how you will apply governance. And to truly get to the ideal of a self-serve platform, governance needs to be a simple part of the ways of working. Saving that for later is not going to end well for many organizations. And while access control is hard, we need to get far better at understanding who is using what and _why_. How long should someone get access to data? Forever access should be a non-starter. And how do we make it easy to grant that expiring access?
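To make the expiring access idea a bit more concrete, here is a minimal Python sketch of a time-bound access grant that records who is using what and why. It was not discussed in the episode; every name in it (`AccessGrant`, `grant_access`, `MAX_GRANT_DAYS`, the 90-day cap, the example user and data product) is a hypothetical assumption for illustration, not anything Vikas or a specific platform prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy cap: no grant may outlive this many days.
MAX_GRANT_DAYS = 90


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of who may use which data product, and why."""
    user: str
    data_product: str
    purpose: str          # the "why" behind the access
    expires_at: datetime  # every grant has an end date

    def is_active(self, now: datetime) -> bool:
        # A grant is only valid strictly before its expiry time.
        return now < self.expires_at


def grant_access(user: str, data_product: str, purpose: str,
                 days: int, now: datetime) -> AccessGrant:
    """Create an expiring grant; reject missing purposes and 'forever' access."""
    if not purpose.strip():
        raise ValueError("a purpose is required for every grant")
    if days <= 0 or days > MAX_GRANT_DAYS:
        raise ValueError(f"grant length must be 1..{MAX_GRANT_DAYS} days")
    return AccessGrant(user, data_product, purpose,
                       expires_at=now + timedelta(days=days))


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    grant = grant_access("analyst_a", "claims_summary",
                         "quarterly loss review", days=30, now=start)
    print(grant.is_active(start))                       # within the window
    print(grant.is_active(start + timedelta(days=31)))  # after expiry
```

Requiring a purpose and an expiry at grant time is one way to keep the "who is using what and why" question answerable later; a real platform would also need renewal and audit flows, which are out of scope for this sketch.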
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf