In this episode, Scott interviewed Mohammad Syed, Lead Strategist - Data at Caruthers and Jackson, about how data mesh governance has to be different from what we've done historically.
Per Mohammad, data governance in data mesh is very different from doing governance for either a data lake or a data warehouse. The warehouse focuses on high-level quality and usability, but at the expense of context and agility. The data lake is about metadata and lineage, but at the severe expense of usability - schema on query is not fun for consumers - and often of quality.
For most data organizations, governance has been very macro focused - governing the data warehouse or lake as a whole. That is part of why data governance has become a major bottleneck - the focus is on the macro but the individual requests are the micro.
In data mesh, governance can shift to being about maximizing the value of the data instead of mostly preventing risk. Of course, there is a balance between local maximization - the value of each data product - and global maximization - the value at the overall data mesh level.
A key focus of data mesh governance is enablement - especially enabling the domains to govern their own data products. Mohammad made the point that you need to enable your domains by creating the technical and business definitions of a "good" data product. Then the governance team needs to teach teams about the quality definitions, e.g. data product consumability. There is a need for policies of course, but the focus should mostly be on frameworks that enable policy creation and enforcement - decentralize!
A key point Mohammad made was: governance only works with informed governors - you must teach domains to govern properly. Transparency is key to making data governance work.
Mohammad emphasized that the "good" data product definition leads to the separation of data quality and data product quality. A data product with relaxed data quality standards might be more valuable for other reasons, or simply less costly. In a data warehouse implementation, there is really only a single definition of "good" quality, but that just won't work in data mesh. We really need to develop better frameworks for what data quality means at the micro level.
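To make the idea of per-product quality definitions concrete, here is a minimal sketch of what it could look like if each domain declared its own thresholds. Nothing here comes from the episode: the `QualityPolicy` and `Measurement` classes, the two metrics, and all the numbers are hypothetical, chosen only to show the same measured data being "good" for one data product and not for another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityPolicy:
    """Thresholds a domain declares for one data product (hypothetical)."""
    min_completeness: float    # minimum fraction of required fields populated
    max_staleness_hours: int   # maximum acceptable age of the newest record


@dataclass(frozen=True)
class Measurement:
    """Observed quality metrics for a data product (hypothetical)."""
    completeness: float
    staleness_hours: int


def meets_policy(measured: Measurement, policy: QualityPolicy) -> bool:
    """Return True when the measurement satisfies every threshold in the policy."""
    return (
        measured.completeness >= policy.min_completeness
        and measured.staleness_hours <= policy.max_staleness_hours
    )


# Two data products, two different definitions of "good" (assumed values).
finance_policy = QualityPolicy(min_completeness=0.999, max_staleness_hours=24)
clickstream_policy = QualityPolicy(min_completeness=0.90, max_staleness_hours=1)

# The same measured quality evaluated against each product's own policy.
measured = Measurement(completeness=0.95, staleness_hours=1)
print(meets_policy(measured, finance_policy))      # False
print(meets_policy(measured, clickstream_policy))  # True
```

In this sketch the governance team would own the shape of the policy (which metrics exist and how they are checked) while each domain owns the threshold values, which is one possible reading of "frameworks to enable policy creation and enforcement".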
To get data governance right, strategy and maturity are crucial - what are you actually trying to accomplish? Data mesh for the sake of data mesh is worthless, just like any other paradigm.
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under "add payment"): AstraDB