Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.
Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info gated) here.
JGP's LinkedIn: https://www.linkedin.com/in/jgperrin/
Amy's LinkedIn: https://www.linkedin.com/in/amy-raygada/
Andrew's LinkedIn: https://www.linkedin.com/in/andrewrhysjones/
Andrew's website: https://andrew-jones.com/daily/
Andrew's book: https://data-contracts.com/
Data contract standard project Bitol: https://lfaidata.foundation/projects/bitol/
JGP's blog: https://jgp.ai/
In this episode, guest host Jean-Georges Perrin, Data Innovation Consultant at ProfitOptics (guest of episode #130 and panelist in episode #227), facilitated a discussion with Amy Raygada, Senior Data Product Manager at Swiss Marketplace Group (guest of episode #165), and Andrew Jones, Principal Engineer and Author of the book on Data Contracts (guest of episode #29). As per usual, all guests were only reflecting their own views.
The topic for this panel was data contracts and how to go about getting them in place. Much of it covered the general concept, but some of it was specifically about how data contracts apply to data mesh. This was the first topic I really did a deep dive into in early 2022; it has evolved since then but is definitely still evolving.
Scott note: As per usual, I share my takeaways rather than trying to reflect the nuance of the panelists' views individually.
Scott's Top Takeaways:
- Data contracts are about trust and understanding. Trust that there is an owner and there are rules, that there is a minder who knows this data matters. Trust that things aren't going to break - at least not as often as things in data historically have - and that consumers will be told if something does break. And understanding that what you're getting isn't perfect: there are rules but also limitations. It's no longer buyer beware; consumers can understand what they should get.
- To do data products well, you almost certainly need some concept of a data contract. Otherwise, you are essentially just putting out a data asset and calling it a product. Products come with guarantees of some sort.
- Data contracts are about ensuring better outputs with less effort for all parties. They are a quality assurance mechanism but also a scaling mechanism. It's a printing press for data in a sense: instead of carving every page in wood from scratch, you assemble reusable tiles to say what you want. The work becomes arranging tiles rather than defining everything each time. Standardized aspects of contracts help both producers and consumers communicate about the aspects of a data product.
- Like with anything related to data mesh - or really any good data practices - you can roll out data contracts over time. It's not a switch you flip and suddenly everything is covered. Start small and find value. Start with one or two teams / data products, figure out how this can work in your organization, and then scale from there.
- While many may see data contracts as additional overhead for data producers, they are quite often a safety mechanism for them. Producers (hopefully) don't want to break things for downstream consumers but they often don't know exactly how their data is used. Now we have a way for them to understand the impacts of their changes and easy mechanisms to get in touch with the users of their data. The result is far fewer emergency response tickets for data breakages.
- Data contracts are very useful - potentially necessary? - when we think about interoperability between data products in a larger context. The contract isn't only about what is in the specific data product but how it relates to the rest of your data products, mesh or not. If you have interoperability standards or linking keys, those are important aspects to mention in a contract.
- To realize the vision of data mesh, we have to be technology agnostic. There will be tons of vendors releasing their own versions and visions. But at the end of the day, to actually be able to let teams have the freedom to develop their data products to best serve users, we need approaches over tools. Scott note: If you can't tell, I am skeptical of tooling in this space…
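To make "products come with guarantees" a bit more concrete, here is a minimal, tool-agnostic sketch of a contract that lives in the data product's own code. It is an illustration only, not the Bitol standard or anything the panelists proposed: the class names, field names, type labels, and the `orders` example are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical type labels for this sketch, mapped to Python types.
TYPES = {"string": str, "integer": int, "number": (int, float)}


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str              # one of the keys in TYPES
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    product: str
    version: str
    owner: str                           # someone consumers can actually reach
    fields: Tuple[Field, ...]
    linking_keys: Tuple[str, ...] = ()   # keys shared with other data products
    limitations: Tuple[str, ...] = ()    # known caveats, stated up front

    def violations(self, record: dict) -> List[str]:
        """Return readable problems for one record; an empty list means it conforms."""
        problems = []
        for f in self.fields:
            value = record.get(f.name)
            if value is None:
                if f.required:
                    problems.append(f"missing required field: {f.name}")
            elif not isinstance(value, TYPES[f.dtype]):
                problems.append(
                    f"{f.name}: expected {f.dtype}, got {type(value).__name__}"
                )
        return problems


# A made-up contract for a made-up "orders" data product.
orders = DataContract(
    product="orders",
    version="1.0.0",
    owner="checkout-team@example.com",
    fields=(
        Field("order_id", "string"),
        Field("customer_id", "string"),
        Field("amount", "number"),
        Field("coupon", "string", required=False),
    ),
    linking_keys=("customer_id",),
    limitations=("Refunds can arrive up to 24 hours late.",),
)

print(orders.violations({"order_id": "o-1", "customer_id": "c-9", "amount": 12.5}))
print(orders.violations({"order_id": "o-2", "amount": "12.5"}))
```

The first record conforms; the second is reported as missing `customer_id` and carrying `amount` as a string. The owner, linking keys, and limitations sit next to the schema on purpose, so the trust and interoperability points above have a place to live alongside the rules.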
Other Important Takeaways (many touch on similar points from different aspects):
- Define your contracts where they're most likely to be updated. That's probably in the code for the data product, not in some separate tool.
- Circling back to understanding, data contracts set expectations. Literally, they contain the expectations of what you should get with the data product. Expectations setting and boundaries are crucial to good human communication :)
- As always with data work, data contracts don't come for free. They take time for producers to engage with. Reduce the friction of dealing with contracts for producers but also incentivize them to actually leverage data contracts. Otherwise, it's just a request not a requirement.
- There are two main approaches to data contracts when it comes to breaking changes - collaborating on changes before they happen or alerting people that a breaking change has occurred. The first is better, but you might start with only the second capability and that's okay.
- ?Controversial?: Using data contracts only as a blame mechanism when data breaks is missing the point. They can be a GREAT collaboration tool for negotiating between producers and consumers. They are a great starting point for those negotiations and then an agreement tracking and enforcement mechanism.
- ?Controversial?: As I've noted many times before, contracts can be a double-edged sword. If you have consumers that never meet with producers and share information, that can lead to someone leveraging data they don't fully understand. Contracts can give people trust in the data products they discover without digging deep enough. It's a very nuanced and hidden issue.
- Like any product practice, you will probably start out pretty raw and unsophisticated when doing data contracts. It's about getting to good, not starting there. Find value, find scale, find repeatability. Iterate to good; here's your permission to suck when you start.
- Data contracts can behave as great automated communication tools. Instead of trying to find all your users to update them about an upcoming change, it's automatic. Without automation, Amy said data contracts are "just a bit more paperwork."
- Data contract standards are important but must be extensible. Don't expect a standard to solve all your problems or fit all your needs, especially as they are just emerging.
- There are many choices you have to make for your organization around your data contract setup. Who owns the data contract is especially important. It should probably lie with the owner, but if you don't have clear ownership of a data product/asset, then it's more likely a fact sheet about your data product/asset, not a contract. That said, consumer-driven testing is great in software; will we have some aspect of it in data?
- Circling back to communication, to get producers to lean into contracts, look to have real conversations with producers about the challenges the organization is having with data and things breaking. Work with them to find a better solution. They are typically software engineers and they like solving problems. Give them the KPIs that let them focus on data and solving things through contracts. It can't simply be more work; it needs prioritization.
- Data consumers need to be accountable for watching for changes to the data products they use. You need a good mechanism to alert them but if they aren't paying attention and something breaks, that's on them. Everyone has accountabilities.
- Should you have your security and privacy encoded in the data contract itself? I think it's early days there. It might live in the contract as the place of record for the platform or not. It's an interesting concept.
- ?Controversial?: Should we try to create the contract automatically and have a human change and validate or the other way around? Probably human with a template works best right now. Automation would be great but there's probably too much room for error.
- We want super clean implementations of something like data contracts - one standard contract across the organization. But it's just not realistic at the end of the day, especially early in your data contract journey. Every organization is messy in its own way, especially with multi-cloud and many platforms, and data contracts are no different.
- How data quality plays into data contracts is a bit more complicated than people think. There are quality standards, but checking whether a data product actually complies with its SLAs and those standards is a separate question, and people are approaching it differently, including whether that quality enforcement lives in the contract or not.
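The two approaches to breaking changes mentioned above, and the point about contracts as automated communication, can be sketched roughly as follows. This is a hedged illustration under stated assumptions: the schema shape, the function names, the rule for what counts as "breaking", and the consumer names are all hypothetical, and actually delivering the messages is left out.

```python
from typing import Dict, List

# Hypothetical schema shape: field name -> {"type": ..., "required": ...}
Schema = Dict[str, dict]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes that could break an existing consumer of `old`."""
    changes = []
    for name, spec in old.items():
        if name not in new:
            changes.append(f"removed field '{name}'")
        elif new[name]["type"] != spec["type"]:
            changes.append(
                f"field '{name}' changed type: {spec['type']} -> {new[name]['type']}"
            )
        elif spec["required"] and not new[name]["required"]:
            changes.append(f"field '{name}' is no longer guaranteed to be present")
    return changes  # added fields are treated as non-breaking in this sketch


def draft_notices(product: str, changes: List[str], consumers: List[str]) -> List[str]:
    """Build one message per registered consumer; sending them is out of scope."""
    if not changes:
        return []
    summary = "; ".join(changes)
    return [f"To {c}: breaking change in '{product}': {summary}" for c in consumers]


current = {
    "order_id": {"type": "string", "required": True},
    "amount": {"type": "number", "required": True},
    "coupon": {"type": "string", "required": False},
}
proposed = {
    "order_id": {"type": "string", "required": True},
    "amount": {"type": "string", "required": True},
    "channel": {"type": "string", "required": False},
}

changes = breaking_changes(current, proposed)
for notice in draft_notices("orders", changes, ["finance-reporting", "ml-forecasting"]):
    print(notice)
```

Run against a proposed contract before release, a check like this supports the collaborate-first approach; run only after deployment, it is the alert-only fallback. Either way, the list of consumers has to come from somewhere, which is one more reason ownership and registration matter.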
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (with slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf