#40 Getting Data-as-a-Product Right and Other Learnings From Adevinta's Data Mesh Journey - Interview w/ Xavier Gumara Rigol
Episode 40 • 10th March 2022 • Data Mesh Radio • Data as a Product Podcast Network
00:00:00 01:09:41

Shownotes

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here (info gated)

Xavier's Twitter: @xgumara / https://twitter.com/xgumara

Xavier's LinkedIn: https://www.linkedin.com/in/xgumara/

Adevinta meetup presentation: https://www.youtube.com/watch?v=av6cT_r4orQ

Xavier's Medium Articles:

https://medium.com/adevinta-tech-blog/building-a-data-mesh-to-support-an-ecosystem-of-data-products-at-adevinta-4c057d06824d

https://medium.com/adevinta-tech-blog/treating-data-as-a-product-at-adevinta-c1dce5d394c5

https://towardsdatascience.com/data-as-a-product-vs-data-products-what-are-the-differences-b43ddbb0f123

Scott interviewed Xavier Gumara Rigol, who has been helping lead Adevinta's data mesh implementation as Area Manager for Experimentation and Analytics Enablement. They discussed the data as a product concept and learnings from Adevinta's journey thus far. Xavi has published some great articles and gave a Data Mesh Learning meetup presentation, all linked above.

One key aspect of data as a product is understanding the need for data product evolution, both relative to maturity and to what is consumed. This is a common theme in many data mesh conversations because, historically, data consumption has resisted evolution and change. Consumers need to understand that the business is evolving, so what they consume will evolve too. If you manage data products well, the change won't be sudden, but if the goal is to share insights into a domain, those insights will change as the domain does. When thinking about data product maturity, it's totally okay to start by thinking of a data product as a single table or view.

Xavi also mentioned some pitfalls of forced data product evolution - e.g. getting a change wrong can be quite costly because of the backfill. Adding new attributes is easy, but recomputing something for the previous 3 to 6 months can run up significant compute charges. To do evolution right, versioning and deprecation plans are key.
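To make the versioning and deprecation point concrete, here is a minimal sketch of what a deprecation plan could look like as data. It is not Adevinta's implementation; the class, field names, product name, and dates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataProductVersion:
    """One published version of a data product (all names are hypothetical)."""
    name: str
    major: int
    released: date
    deprecated_on: Optional[date] = None  # consumers are warned from this date
    sunset_on: Optional[date] = None      # the version is removed on this date

    def status(self, today: date) -> str:
        """Return 'active', 'deprecated', or 'removed' for the given day."""
        if self.sunset_on is not None and today >= self.sunset_on:
            return "removed"
        if self.deprecated_on is not None and today >= self.deprecated_on:
            return "deprecated"
        return "active"


# Hypothetical plan: v2 ships while v1 stays readable for an overlap window,
# so consumers can migrate instead of facing a sudden change.
v1 = DataProductVersion("orders_daily", 1, date(2022, 1, 1),
                        deprecated_on=date(2022, 4, 1),
                        sunset_on=date(2022, 7, 1))
v2 = DataProductVersion("orders_daily", 2, date(2022, 4, 1))
```

The overlap between `deprecated_on` and `sunset_on` is the design choice that matters here: it gives consumers a known migration window rather than a breaking change.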

To get data as a product right, Xavi recommends starting by prioritizing which data you want to make available; this is a process, not a switch to flip. You should figure out which data is important for each domain and at the broader organization level.

Applying data as a product thinking to your data sets is easier said than done. While data mesh is a leading proponent, companies not doing data mesh can also use data as a product thinking - Adevinta started down this path before embarking on their data mesh journey.

Adevinta started its data mesh journey with every data product being a single table. Data was originally centrally managed, so interoperability was already established. However, the documentation was lacking and the general usability wasn't great. They spent their first few quarters focused on splitting their monolithic data production into separate pipelines for each domain instead of one giant cluster.

The giant cluster was becoming a major bottleneck as changes were hard and maintainability was getting harder every day. Now, each domain essentially has one data product but with multiple dimensions/tables. Each product is layered and each layer has different granularity and SLAs.
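As a rough illustration of "one data product per domain, layered, with different granularity and SLAs per layer", the sketch below models such a product as a small descriptor. The domain, table names, granularities, and SLA numbers are invented for the example and do not come from the episode.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    """One layer of a data product (hypothetical fields)."""
    table: str
    granularity: str          # e.g. "event", "daily"
    freshness_sla_hours: int  # maximum promised staleness


@dataclass(frozen=True)
class DataProduct:
    domain: str
    layers: List[Layer]

    def layers_meeting(self, max_staleness_hours: int) -> List[str]:
        """Tables whose freshness SLA is at least as strict as requested."""
        return [layer.table for layer in self.layers
                if layer.freshness_sla_hours <= max_staleness_hours]


# Hypothetical product: raw events refresh fastest, aggregates more slowly.
listings = DataProduct(
    domain="listings",
    layers=[
        Layer("listings_events", "event", 1),
        Layer("listings_daily", "daily", 24),
        Layer("listings_monthly", "monthly", 72),
    ],
)
```

A consumer who needs data no more than a day old could call `listings.layers_meeting(24)` to see which layers qualify.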

A few other notable points:

  • Xavi believes all data products should be accessible via SQL but definitely not only SQL.
  • Templates/blueprints for data products are incredibly useful and important.
  • The tooling/practices to prevent application changes from breaking the data are still lacking - Adevinta uses data model reviews, but the process is still not perfect.
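One way to picture part of what a data model review looks for is an automated check that flags non-additive schema changes. This is only a sketch under assumptions: the function name, the column-to-type dictionary format, and the example schemas are hypothetical, and nothing here describes Adevinta's actual review process.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes that would break existing consumers.

    Added columns are treated as safe; removed or retyped columns are not.
    """
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column: {column}")
        elif new[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new[column]}")
    return problems


# Hypothetical schemas, written as column name -> type.
current = {"ad_id": "string", "price": "double", "created_at": "timestamp"}
proposed = {"ad_id": "string", "price": "string", "category": "string"}

issues = breaking_changes(current, proposed)
```

In this example the new `category` column passes, while the retyped `price` and the dropped `created_at` are reported for a human reviewer to look at.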



Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used in this episode was found on Pixabay and was created by (with slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
