Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Corrin's LinkedIn: https://www.linkedin.com/in/corrin/
In this episode, Scott interviewed Corrin Shlomo Goldenberg, Senior Product Manager of the Data Platform at BigPanda.
It's important to note that BigPanda is not yet at the stage where data mesh makes sense. Instead, this is a story of getting the production of data into the heads and hearts of the application development team, which is crucial to doing data mesh well, whether it happens before adopting data mesh or as part of the journey.
Some key takeaways/thoughts from Corrin's point of view:
Corrin started with the tale of BigPanda and how she began building out their data, ML, and analytics capabilities. When she came in, they didn't have the infrastructure for, or really the focus on, a scalable platform for storing and analyzing their internal data. They were doing a lot of this for external clients but hadn't moved to doing it internally, which is pretty common in B2B startups. But BigPanda wanted to do a data-driven transformation of their business model, so they had to change the situation around their internal data.
In Corrin's mind, there is always a balance around when to start collecting data at scale. At a B2B startup, you need to ask how early is right for the company, and the same applies to an early-stage offering at a larger organization. Most development teams aren't tasked with creating the necessary data until far later in an offering's lifecycle, though it would be nice to include it from the start. It definitely isn't free, so there is always a balance, and the conversations need to happen, hopefully sooner rather than later.
Corrin's tipping point for when you should really start to press development teams on creating necessary data is when it becomes hard to answer simple 'how many' type questions. A concrete pain point like that also makes for an easier conversation than a hypothetical one. If it takes more than a day to get basic information on how your customers are using your product, that's obviously an issue that's only going to grow, and it's a pretty tangible place to start.
When they started to build out the data platform, Corrin said it just made sense to start centralized. If the R&D team wasn't really thinking about data, trying to upskill them enough to take over the work entirely was probably a bridge too far. Plus, if your data requirements aren't complex enough to require decentralization, decentralization is often just an extra layer of complexity. So they moved to a high-communication model where people can see what data work is happening even though it's controlled by the central team. That way, they can slowly upskill the development teams to understand data instead of trying to hand over ownership prematurely.
Corrin talked about working with the team to understand the product mindset to data. Start from the why - it's easy to fall into the trap of trying to do everything because it might have value. That's what happened with data lakes that became data swamps. Focus people on the why and you can bring them more and more into working with data.
Similarly, while Corrin and team didn't have a lot of pushback on getting things done, she was very cognizant of prioritization and cost/benefit. Again, focusing on 'the why': what is most important and when? Why are the requirements like this? Can we cut the cost down by storing for less time and/or refreshing less often? When you say 'real time', what do you actually mean? Etc.
Corrin has been seeing good results from having strong ownership conversations. While the central team still owns the data, they are partnering with the domains, as the domains still need to own the concepts and the understanding of the information. While this might not work at a large scale, it's perfectly normal and functional at a 300-person company. Scott note: centralization isn't the enemy until it becomes a bottleneck 😎
As with all global companies, BigPanda has some challenges around communication, per Corrin. Time zone differences and, of course, differences in focus are just two of them. So she recommends spending a lot of time communicating with stakeholders about what you are building and why. It's easy to assume that because you built a data product, people will use it, but you have to work with people to ensure they actually use what you built.
Corrin pointed to the fact that many companies in the B2B space feel they aren't "data oriented" enough. She gave a few tips for how to become more data oriented but also has empathy for people feeling that way - it's pretty common, as most B2B companies feel they aren't as data oriented as everyone else. It's similar to data mesh, where everyone believes all the other companies are far further down their path. It's simply optics - companies project a better image than the reality of their situation with data.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on Pixabay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf