#126 Evolving from Data Projects to Data As a Product - A Data Platform Six Years in the Making - Interview w/ Blanca Mayayo and Pablo Alvarez Doval
Episode 126 • 8th September 2022 • Data Mesh Radio • Data as a Product Podcast Network

Shownotes

Data Mesh Radio Patreon - get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo's contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Blanca's LinkedIn: https://www.linkedin.com/in/blancamayayo/

Pablo's LinkedIn: https://www.linkedin.com/in/pablodoval/

In this episode, Scott interviewed Blanca Mayayo - Product Manager, Data Platforms - and Pablo Alvarez Doval - Lead of Data Platforms and Principal Data Architect - at Plain Concepts.

From here forward in this write-up, B&P will refer to Blanca and Pablo rather than trying to specifically call out who said which part.

Some key takeaways/thoughts from B&P's point of view:

  1. It's easy to fall into adding fit-for-purpose capabilities to your data platform but don't. Stay focused on managing your platform as a product - all aspects of it have lifecycles - and you can't try to fit every use case, especially before there is a need.
  2. If transformation, especially data transformation, is not tied to the business strategy, that is a major recipe for failure. You likely won't deliver good business outcomes.
  3. "Beware the proof of concept" - too many try to do a proof without the actual concept. What are you trying to prove and how will you decide/measure if you proved it?
  4. You can have everything necessary for a data initiative to succeed lined up - the sponsors, the will, the budget - and still fail. Nothing is 100%.
  5. Three common data initiative failure modes: 1) focusing only on the technology and not on whether it meets needs and whether you can maintain and pay for it; 2) treating everything as an urgent tactical need instead of playing into the broader data strategy; and 3) not considering how to actually do change management.
  6. Your platform is a product too - data as a product isn't just about mesh data products - think about capability lifecycle and how you communicate upcoming changes - especially deprecation - and help users migrate to the new capabilities.
  7. Acclimatize people to change and evolution. Most people in data aren't good with - or at least used to - evolution and preparing for said evolution because the cost of change for data has been so high.
  8. To do data as a product, you need the right balance of curious developers, risk/risk management, and capabilities.
  9. The most likely places to find reusability in your platform will be the mechanisms around data product production and maintenance - the lineage, CI/CD, data quality monitoring, security/compliance, etc.
  10. Reuse is crucial for a data platform - look to have data transformation and storage reuse of course but also really focus on providing templates and then letting users create their own templates. The transformations and handling data don't need to be overly exposed to users.

As important background for the conversation, B&P discussed Plain Concepts' journey from a consulting company towards a product company - they had been working with their clients for years, building out a reference architecture for each client's own data platform implementation. They were doing fit-for-purpose, solving very specific challenges with point solutions, not managing the evolution like a product. So the two pillars from data mesh that resonated with them most were data as a product thinking and the self-serve platform - platform thinking is key. They started to push back on their previous practice of putting in additional capabilities to the platform without a key business reason - it's cool tech! - and began to manage the platform much more as a product.

Per B&P, part of switching to product and platform thinking was starting to focus on "what to remove" from and what not to add to the platform. Now, they think about where they need to go to support all the users - and align everyone developing the platform on the same vision - instead of trying to support every possible use case. And it's okay for people to use technologies or services that aren't part of the platform if necessary. Shadow IT is only really in the shadows if it's not known about.


Another big learning has been the transition of pieces of the platform to keep up with the best available offerings. For example, they started using Hadoop and now mostly use Spark. The platform team was asked to support very specific use cases and built a temporary solution with an eye - from the beginning - on deprecation. They recommend you start communicating deprecation as early as possible too and work with users to migrate when the time comes. Ask yourself: which - typically homegrown - pieces of the platform are not up to the best practices in the industry? One they mentioned specifically was replacing their own data testing and validation system with Great Expectations.
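
One way to make early deprecation communication concrete is to have the old capability announce its own retirement every time it is called. The snippet below is only a minimal sketch of that idea, not how Plain Concepts does it: the decorator, the function `validate_with_homegrown_checks`, the replacement name `new_validation_suite`, and the removal date are all hypothetical and chosen purely for illustration.

```python
import functools
import warnings


def deprecated(replacement: str, removal_date: str):
    """Mark a platform capability as deprecated and point users to its replacement."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn on every call so users hear about the change long before removal.
            warnings.warn(
                f"{func.__name__} is deprecated and scheduled for removal on "
                f"{removal_date}; migrate to {replacement}.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical legacy validator; the replacement name and date are placeholders.
@deprecated(replacement="new_validation_suite", removal_date="2099-01-01")
def validate_with_homegrown_checks(rows):
    # Keep only rows that have no missing values.
    return [row for row in rows if all(v is not None for v in row.values())]
```

The old capability keeps working while the warning gives users a named migration target and a date to plan around.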


B&P discussed a major failure mode around data transformation in general: your data transformation not being a business strategy transformation. On the micro level, why is the use case you are developing good for the business? Then keep thinking about that up to higher and higher levels of transformation. Look at data mesh - if making a major transformation like data mesh isn't part of the business transformation, will you be able to even make necessary organizational changes? And you must constantly communicate about the transformation - the why and the what - and how the transformation will evolve too. You will learn along the way - leaving no room for evolution clearly doesn't work well.


And in general, people in data aren't used to evolution per B&P. At a few of their clients, every team except the data team has been good with the new ways of change management. And it's understandable: the cost of change in data has been high historically, especially with the data warehouse. To get the data team on your side, thinking evolution is good, you need to collaborate with them and get them to understand the reasons for change. Scott note: I know that for most listeners, you are the data team that wants to drive change, but you are the bold leaders in data :)


When talking about change, B&P pointed to a common thread in interviews on this podcast: most data teams have not adopted modern software engineering practices - we're still essentially doing the same thing we were 20-30 years ago. The data architects and data engineers like to play with the bleeding edge technologies but they often aren't trying to adopt them for the right reasons. In B&P's view, instead of bleeding edge, it's better to use more proven tech most of the time unless there really is a capability that's necessary and only available in something emerging.


When asked about keeping an eye out for what should be added to the platform regarding reuse, B&P said they were seeing the same general patterns repeatedly, and there are even some parallels with what other, more general software or operational excellence initiatives require. The challenges weren't exactly the same, but once you zoomed out, they started to look pretty similar. And more often than not, they were mechanisms around data product production, not the exact data transformation and storage technologies: lineage, CI/CD, data quality, security, etc.


Plain Concepts' data platform is split into three layers of reuse. The first is the technical layer, which they initially made too technical for people without strong data engineering skills to use easily - don't fall into the same trap. The second layer is all about templates that they've built out through constantly watching for patterns of use. There is even a template catalog. The final layer is all about enabling customers to create their own reuse-focused templates with an SDK.
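
The template and SDK layers can be pictured with a small sketch. This is an illustrative assumption only and not Plain Concepts' actual SDK or catalog: the `Template` and `TemplateCatalog` classes, the `csv_ingest` template, and the shape of the generated pipeline spec are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Template:
    """A reusable recipe: user parameters in, pipeline spec out (hypothetical shape)."""
    name: str
    description: str
    build: Callable[[dict], dict]


@dataclass
class TemplateCatalog:
    """Holds platform-provided templates and ones users register themselves."""
    _templates: Dict[str, Template] = field(default_factory=dict)

    def register(self, template: Template) -> None:
        # Refuse duplicates so two templates can never silently share a name.
        if template.name in self._templates:
            raise ValueError(f"template {template.name!r} already registered")
        self._templates[template.name] = template

    def names(self) -> List[str]:
        return sorted(self._templates)

    def instantiate(self, name: str, params: dict) -> dict:
        return self._templates[name].build(params)


# A user-defined template: the transformation details stay behind the template.
catalog = TemplateCatalog()
catalog.register(Template(
    name="csv_ingest",
    description="Ingest a CSV file with a basic quality check attached",
    build=lambda p: {"source": p["path"], "format": "csv", "quality_checks": ["not_null"]},
))
spec = catalog.instantiate("csv_ingest", {"path": "sales.csv"})
```

In this sketch users only supply parameters such as a file path, while the quality checks travel with the template, which mirrors the point that transformations and data handling don't need to be overly exposed to users.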


B&P shared a few major underlying issues that will likely cause failure for your data initiatives. First, focusing your data strategy only on the technical aspect: choosing a cool technology that you can't maintain and that is too expensive, whether in time, cloud cost, licenses, etc. Second, trying to react to everything as an urgent tactical need or a short-term change instead of having it play into the broader vision/strategy. And third, not really focusing on change management at all - you need to align incentives, get your early wins to drive momentum, etc. It's a process to drive change.


"Beware the proof of concept", B&P said - many companies run proofs of concept (PoCs) but they don't have a specific goal or plan. What are you trying to prove out? Why will that drive value? Far too often, organizations cut corners, don't allocate enough people to the PoC, don't have a strategy or goals, etc. Again, what are you trying to actually prove in your proof of concept? And how will you measure if you proved it? What will you do next if you prove it?


When determining if data mesh or anything similar is right for clients, B&P start from the pain points. What are the pain points and then, much more importantly, what is causing those pain points? Much like Scott frequently says: if data team centralization isn't your pain point, don't decentralize your data team! If you are looking at a new approach or technology, what are you really trying to even solve? And even if you line everything up, the sponsors, the budget, the will, things can still go poorly. Nothing is 100%.


Other Tidbits:

A few additional failure modes mentioned: 1) not continuing investment in your platform - it needs maintenance and innovation; and 2) not having broad-scale collaboration and sharing.


Really consider how IT is seen. Is it just about cost optimization or is it a value driver? Can you move to a product mindset?


Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under "add payment"): AstraDB
