In this episode, Scott interviewed Lorenzo Nicora, Principal Data Consultant at data mesh- and AI-focused consultancy Mesh-AI.
Scott asked Lorenzo to be on to continue the series of interviews on domain-driven design (DDD) for data. It is a topic many are struggling with, so having lots of perspectives on it is crucial. On the episode title: a key output was explicit permission to skip a lot of the tactical patterns of DDD. Others have said similar things, but Scott wanted to make sure it was explicit.
Before we jump into the DDD parts, Lorenzo made a good point about your data mesh proof of concept and the start of your journey in general: you need to start with manageable problems. Start with a consumer-driven problem but a source/producer-aligned data product. There is a lot of nuance in the interview on why this matters.
Per Lorenzo, identifying the domains is crucial but it is the hardest part of DDD. That shouldn't scare you because you can start with things being a bit blurry. It's important to understand your high-level domains but you can get moving without mapping out all of your domains.
A key theme from Lorenzo: the language is at the center of everything in DDD. It is part of the data modeling and it goes all the way down to the code.
Per Lorenzo, DDD is all about communication, knowledge capture, and knowledge sharing. Knowledge capture is about extracting knowledge and then writing it down. Knowledge sharing is about finding scalable ways to share context.
Some advice/pointers from Lorenzo:
Teams have to truly understand the language of their own domain - remove the ambiguities, even if that feels like too much work.
Event storming is a great way to approach tackling DDD for Data.
Event sourcing is crucial for modeling the problem of the domain.
Terminology is key - identify the domain experts who can find/choose the right name for each concept.
Keep a live document of terms and meanings - keep it updated!
Encourage everyone to use the identified terminology when naming and in the code directly.
Find your high-level domain first instead of your granular sub-domains.
Ask your consumers for their specific data asks and then work backward to what would be a good data product or set of data products to start with. Again, look for high return and low effort/investment to get some wins under your belt and build your muscle memory.
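To make the "live document of terms" and "use the terminology in the code" pointers a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Term` class, the example glossary entries, the `GENERIC_WORDS` set, and the `unknown_terms` helper are assumptions, not something discussed in the episode. It simply flags words in field names that the domain glossary does not know about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One glossary entry agreed with the domain experts (hypothetical shape)."""
    name: str
    meaning: str
    domain: str


# Hypothetical glossary entries; in practice this would be the team's live document.
GLOSSARY = {
    term.name: term
    for term in [
        Term("customer", "A party with at least one active account", "sales"),
        Term("order", "A confirmed request to purchase one or more items", "sales"),
    ]
}

# Generic technical words that do not need a glossary entry (an assumption).
GENERIC_WORDS = {"id", "name", "total", "date"}


def unknown_terms(field_names, glossary=GLOSSARY):
    """Return words used in snake_case field names that are not in the glossary."""
    unknown = []
    for field_name in field_names:
        for word in field_name.lower().split("_"):
            known = word in glossary or word in GENERIC_WORDS
            if not known and word not in unknown:
                unknown.append(word)
    return unknown


flagged = unknown_terms(["customer_id", "order_total", "client_name"])
print(flagged)
```

Here "client" would be flagged because the glossary only defines "customer", which is the kind of ambiguity the domain experts would need to resolve.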
Some key things to understand:
Language changes - it changes across time and across the organization. The same words mean different things or different words mean the same thing. A major weakness of the central/enterprise data warehouse is the inability to easily deal with changes through time or nuance across the organization.
When you first identify your domains, the boundaries might be blurry and that's okay!
Data contracts are crucial, and the semantic issues, not the schema, are the most important - and hardest - part. You also can't just break contracts; there has to be a good reason, or no one will trust that it is an actual contract instead of just a pub/sub model.
Study up on and really think about your data on the inside versus data on the outside. If you aren't familiar, there is a link in the show notes to Pat Helland's work on the concept.
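As a rough illustration of why the semantics of a data contract matter more than the schema, the Python sketch below compares two versions of a contract. The `FieldSpec` class, the `breaking_changes` function, and the "revenue" example are all hypothetical names made up for this sketch; it is not a real data contract tool or anything prescribed in the episode. The point is that a field can keep its name and type while its meaning changes, and a schema-only check would not notice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a data contract: schema (name, dtype) plus semantics (meaning, unit)."""
    name: str
    dtype: str
    meaning: str
    unit: str = ""


def breaking_changes(old, new):
    """List changes between two contract versions that could break consumers."""
    new_by_name = {spec.name: spec for spec in new}
    problems = []
    for spec in old:
        current = new_by_name.get(spec.name)
        if current is None:
            problems.append(f"{spec.name}: removed")
        elif current.dtype != spec.dtype:
            problems.append(f"{spec.name}: type changed {spec.dtype} -> {current.dtype}")
        elif (current.meaning, current.unit) != (spec.meaning, spec.unit):
            # Name and type are identical, so a schema-only check would pass here.
            problems.append(f"{spec.name}: semantics changed (schema unchanged)")
    return problems


v1 = [FieldSpec("revenue", "float", "Gross revenue before refunds", "USD")]
v2 = [FieldSpec("revenue", "float", "Net revenue after refunds", "USD")]

for problem in breaking_changes(v1, v2):
    print(problem)
```

Comparing free-text meanings is of course crude; in practice the hard part is getting people to agree on and write down those meanings in the first place.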