Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Manisha's LinkedIn: https://www.linkedin.com/in/evermanisha/
'A streamlined developer experience in Data Mesh' articles by Manisha:
Part 1 - Platform: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-platform
Part 2 - Product: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-product
Article on Lean Value Tree: https://rolandbutler.medium.com/what-is-the-lean-value-tree-e90d06328f09
Blog post on mentioned data mesh workshops: https://martinfowler.com/articles/data-mesh-accelerate-workshop.html
In this episode, Scott interviewed Manisha Jain, Data Engineer at Thoughtworks.
Some key takeaways/thoughts from Manisha's point of view:
Manisha started the conversation with her thoughts on how to get going with data mesh, onboarding any domain but especially your first domain. Work with a small team aligned with that domain to find how data mesh can align with how the organization works and thinks - this will be different for every organization. That alignment is crucial to getting people comfortable and driving buy-in. People have to be comfortable with how it will work and what their responsibilities will be. As Manisha said, "…only when they're comfortable with that concept … will [it] make sense to go ahead and explore more."
According to Manisha, when you are bringing teams up to speed, it's really crucial to get on the same page about what you mean and what you expect from them. Teams often confuse data and data products, for example. The differences can be subtle but are important to understand. As Chris Haas also stated in his episode, they are using the Lean Value Tree method to break target outcomes down into explicit assumptions and more manageable pieces of work. What are the bets you want to make and what are the hypotheses you are testing?
Your initial workshop(s) with a domain can also be a lesson in how to deliver value using a data mesh approach and prioritization. Manisha talked about how when working with a domain, you might identify multiple potential use cases. But you need to choose what is a priority to do now and why. This can surface the top one or two use cases and also show the domain how to prioritize as use cases continue to emerge in the future. The use case(s) they select to prioritize then directly lead to discovering the data products needed to support the use case. Then you identify what skills and tooling are needed to actually execute - to build and then maintain the necessary data products. From there, you can start to back into what a team working on the necessary data products (and potentially platform) looks like. You can use that Lean Value Tree concept to really get specific because far too often in data work, things are left too vague. Scott note: Get specific, get explicit, chase away vagueness - but of course leave LOTS of room for experimentation and iteration as you learn and build.
When asked more about workshop dynamics, Manisha shared how they try to keep them from being too heavy on the domain - get a few people, maybe 2-4, who really understand the domain and can represent the business aspects, not just the data and/or technical aspects. Each workshop has its own goal as an outcome but it's important to first align data mesh to organizational goals, the business strategy. Then you can get into data mesh specifics. They call their workshops 1) accelerate, 2) discovery, and 3) inception.
Manisha shared some crucial dynamics when working with your first domain that do get easier as you bring on additional domains. In the first domain, it's crucial to really narrow in on understanding and definitions, including roles and responsibilities. Data product owner is a new role - what does it actually mean? And there's the initial platform work too. But as you bring on your third, fourth, fifth, etc. domain, there is internal learning to share with the new domains. There is more clarity around what a data product is - they can even see already-built data products - and around roles/responsibilities. But you will definitely need to do a gap analysis to figure out how to best enable each domain, as each domain is unique. So there is a balance - look to maximize reuse of platform, processes, organizational changes, etc. but don't look to force new domains to adhere to exactly how previous domains went through the journey.
For Manisha, it's very important for the platform team to think in terms of capabilities. Deliver capabilities, not technology, to the domains. Work closely with early data product teams and focus on what they are trying to do instead of how you want to solve the technical aspects. Focus on specifically what they are trying to achieve. Also, the platform team needs to consider which mesh-level capabilities are necessary when. Don't try to deliver a complete platform at the start - your platform is a product, so start with a minimum viable product, make sure you understand what minimum means, and don't go overboard.
The platform team can focus on a few simple things to drive to a good initial outcome/partnership with domains in Manisha's view: 1) how does the work create business value? What do the domains need to do to actually drive value? 2) How will users trust data - what does trust mean and what's needed? 3) How do we make it possible for domains to create and manage a data product that is usable and discoverable? By focusing on the task at hand and then mapping to capabilities to support that task, you can prioritize and deliver something useful and valuable without boiling the ocean. You don't need to try to include every capability at the start - that's an anti-pattern. Get close to the use case and find friction. You will also learn to recognize reusable components of the platform, but some reusable components might not be evident at the start.
Manisha then went further into finding and identifying reusable components. The things that are most unique to each data product are the data modeling and data transformation in her experience. Almost every other aspect of spec-ing out and building a data product is reusable, merely customized to the data product itself. Finding the necessary SLAs and SLOs by working with consumers - that is a reusable process. How your SLAs are actually measured and the definitions around those SLAs are reusable. The infrastructure and CI/CD are reusable. The overall data product blueprints are reusable. So look to make these reliable as your organization learns how to build data products, to make for easy reuse.
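As a rough illustration of that split between reusable and custom pieces (the class and names here are hypothetical, not from the episode): the way an SLO like freshness is *measured* can live once in shared platform code, while each data product only supplies its own threshold - its transformation logic stays fully custom.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class FreshnessSLO:
    """Hypothetical reusable SLO: each data product only sets its threshold."""
    max_staleness: timedelta

    def is_met(self, last_updated: datetime, now: Optional[datetime] = None) -> bool:
        # The measurement logic is written once in shared platform code,
        # not re-implemented per data product.
        now = now or datetime.now(timezone.utc)
        return (now - last_updated) <= self.max_staleness

# Two hypothetical data products reuse the same measurement with different thresholds.
orders_slo = FreshnessSLO(max_staleness=timedelta(hours=1))
inventory_slo = FreshnessSLO(max_staleness=timedelta(days=1))
```

The point isn't the specific class - it's that the SLO definition and measurement are the reusable blueprint, while what the data actually contains remains each domain's job.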
On data modeling and interoperability, Manisha shared that it's crucial to let domains evolve how they model their data as they learn. And interoperability, especially to support a use case, is of course important; but you will likely see a need for interoperability standards emerge when it's needed - basically, don't try to build all your standards ahead of time. That might be creating an enterprise data model with a different name :)
When asked specifically about sample data models and automated data modeling tooling, Manisha pointed to them being a double-edged sword. While they can be helpful, most (all?) data products need more custom data modeling to maximize their value. Essentially, the tools can get you to a decent initial data model but domains should look to improve on it. If platform teams offer automated modeling tools, they should make sure there is a big caveat on their usage.
Manisha recommends you make sure your initial domain has strong enough data talent - whether existing or embedded - to communicate the basic needs to the platform team. Regular developers are often not going to be data fluent enough at the start to drive to exact data infrastructure needs like a data engineer could. But be careful not to over-index on tech either. Every domain will need people skilled in creating value through data modeling, but you probably won't need people as advanced in data infrastructure later - the platform is already built by that point :D
It's important to differentiate what the platform should offer and what the data product developers should handle, according to Manisha. The platform, at least the aspects around data product creation, should be focused on making it quicker, easier, and more reliable to create, deploy, maintain, and evolve data products. It sounds easy but it's actually easy to lose focus on that. Look for friction points in the creation and management lifecycle and automate what doesn't add incremental value. For example, a data product developer shouldn't have to manually add their data product to the catalog, so look to automate that - and yes, not everything should be built upfront :) Scott note: she added some good flavor around data product boundaries but it's very hard to summarize.
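The catalog example can be sketched very simply - this is a hypothetical illustration, not Thoughtworks' implementation, and a plain dict stands in for a real catalog API: the deploy pipeline reads the data product's own spec and registers it, so no developer ever does that step by hand.

```python
# Hypothetical sketch: registration happens as a side effect of deployment.
# A plain dict stands in for a real catalog service here.
catalog = {}

def register_on_deploy(spec):
    """Called by the CI/CD pipeline after a successful deploy."""
    # Catalog entry is derived entirely from the product's own spec,
    # so there is no manual cataloging step for the developer.
    name = f"{spec['domain']}.{spec['name']}"
    catalog[name] = {
        "owner": spec["owner"],
        "output_ports": spec["output_ports"],
    }

register_on_deploy({
    "domain": "sales",
    "name": "orders",
    "owner": "sales-data-team",
    "output_ports": ["orders_v1"],
})
```

This is the kind of friction-removing automation Manisha describes: the platform handles the mechanical step, the developer only maintains the spec.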
Within the platform, Manisha believes it's very important to maintain team boundaries because shared resources become a bottleneck and can pretty quickly become very hard to manage. This is why Zhamak has been so clear on the data product as an independently deployable unit of architecture. Manisha gave the example that even the namespace for data products in the data catalog should be reserved for a single team, so teams have a dedicated space to put all their data products.
Manisha gave some early mesh journey advice:
1) back to the data product specification: you should create something that gives teams a very clear idea of what a data product is and what it encompasses. Scott note: still waiting for someone to open source their data product creation template…
2) if, as Zhamak says, data products are our unit of value exchange in data mesh, then making it easier to exchange value is crucial. Start to create standardized input and output ports so you can easily ingest and serve data. ETL shouldn't be a separate concept - it's just ingesting or serving.
3) really focus on making it easy to discover and then implement SLOs and SLAs. Being able to understand and trust data is crucial to being willing to rely on it. That trust comes from good communication around SLAs.
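A minimal sketch of what such a data product specification could capture - the shape and field names are illustrative assumptions, not a standard, loosely following the elements Manisha mentions (ports, domain, owner, SLAs/SLOs):

```python
from dataclasses import dataclass, field

@dataclass
class OutputPort:
    """Hypothetical output port: where and in what format data is served."""
    name: str
    data_format: str  # e.g. "parquet", "json"

@dataclass
class DataProductSpec:
    """One possible shape for a spec, not a standard."""
    name: str
    domain: str
    owner: str
    input_ports: list = field(default_factory=list)   # where data is ingested from
    output_ports: list = field(default_factory=list)  # where data is served
    slos: dict = field(default_factory=dict)          # e.g. {"freshness_hours": 1}

# An example data product a consumer could read the spec of:
spec = DataProductSpec(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    output_ports=[OutputPort("orders_v1", "parquet")],
    slos={"freshness_hours": 1, "completeness_pct": 99.5},
)
```

Whatever form a spec takes, the value is that a producer knows exactly what they're creating and a consumer knows exactly what they'll receive and can rely on.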
Manisha believes learning the language of the business is crucial for data people. You need to extract the actual business value drivers and build to those, so you have to be talking the same language - unfortunately for data people, the language that aligns to business value is usually the business language :) Look to ask more business user-focused questions rather than trying to get technical.
Quick Tidbits:
"… the data product spec should [at a] minimum talk about the data set ports, domain, service level agreements, how do I share my data, what does data sharing look like…" - Make your data product specification easy to understand: what someone will create and what a consumer will receive.
Again, focus on a streamlined developer experience that keys in on autonomy. That's the way.