Due to health-related issues, we are on a temporary hiatus for new episodes. Please enjoy this rerelease of episode 133 with Ammara Gafoor. There is a ton to learn from this one and reflect back on.
Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
ammara.gafoor@thoughtworks.com
Ammara's LinkedIn: https://www.linkedin.com/in/ammara-gafoor/
Data Mesh in practice article series from Ammara and colleagues:
In this episode, Scott interviewed Ammara Gafoor, Principal Business Analyst at Thoughtworks, who has been working on a few client projects related to data mesh, including one for well over a year.
Before jumping in, it's important to note that many of Ammara's learnings come from an implementation in a 100K+ employee company split into 21 high-level domains. So the definition of domain in this episode revolves around that context of a very large business unit, not a two-pizza-team-sized subdomain.
Some key takeaways/thoughts from Ammara's point of view:
Ammara started off the conversation sharing about how she and her team "had it all laid out" for the plan to standardize how they'd bring each domain up to speed on data mesh - from the introduction of new ways of working to being ready to participate in the data mesh implementation in just six weeks. And then reality struck. Each domain is different and much like trying to explain the benefits or implementation of data mesh, a single approach for all audiences just didn't work well so they adapted. Every domain is unique and required its own unique approach to make implementing data mesh in that domain work. There are of course some commonalities but each of the 13-14 domains that are part of the data mesh implementation thus far has had its own unique challenges.
So, Ammara shared some stories about working with different stakeholders. Often, the first stakeholder they encountered was an IT sponsor for the domain itself - either an IT leader in the domain or an IT counterpart for the domain. This persona typically wanted to bring them in and welcomed them with open arms. And while they were often bought in on data mesh, there was a push - from IT and often the business side - to only speak with IT. So Ammara and team had to work to get permission to also include the business people in the conversations about their proposed data transformation. Because without the business support and knowledge, your data mesh implementation is likely to fail. How many episodes have said tie your data strategy to your business strategy? But, the business people often have what they need currently via shadow IT. So why would they want to give that up? It's an emotional response to be asked to give up what you have for the greater good and the long-term.
There is the concept of immediate returns - you build a dashboard and there is immediate potential value - versus the mid- to longer-term returns from things like building your data platform and building out your data governance capabilities. Ammara has seen many times that there is no incentive to wait and focus on the mid- to long-term returns - if your funding this year is based on results this year, focusing on your results 2-3 years out often doesn't feel like an option. Domains won't get rewarded for that long-term work. And most domains don't even have the capabilities to do said mid- to long-term high-value work. But to do data mesh right, we need to incentivize patience - and incentivize and provide the capabilities to do things right for the long haul instead of just chasing the short-term, low-stakes wins.
According to Ammara, as part of a successful data mesh implementation, there is the technical stream - the Team Topologies meaning of work stream - but you must also work on the operational stream at the same time. And the product stream too. If you don't look to change a domain's KPIs to align their operational work to data mesh "you won't prioritize it - you cannot prioritize it." You need to put a metric into place to measure progress - it doesn't even have to be a great measure! It's a way to start the conversation. There is too much of a hangup in data mesh around trying to get things perfect the first time. Get it done, measure it, iterate on it, and move forward. Don't let perfect be the enemy of done and/or good. Don't fall into bikeshedding.
The cost of change and the cost of failure in data historically have been very high per Ammara. But we have new economic models with cloud that make that no longer true. We now have "the privilege to be able to fail". Failure wasn't an option historically. But that's such a foreign concept to many that it will cause some to push back. They have lacked the psychological safety to fail. And we have to understand why they are pushing back and work with them to understand that failure in a highly agile environment is incremental learning.
After picking the 2 most obvious use cases in a domain - again, the very large business unit concept of a domain - Ammara believes the work will reveal 5-6 of the foundational source-aligned or "source oriented" data products of the domain that will be able to power most use cases. So just start building the MVP of those source-aligned data products because they will support other use cases down the road as well.
On Personas, Ammara laid out a few she and team have run into:
The IT sponsor - typically a Data Architect or Data/Analytics Lead; bought in to data mesh, likely after feeling the pain points as Zhamak has laid out. Trying their best to go wide on getting people bought in on data mesh and has some - but not a ton of - social capital to influence. Their social capital is more with the IT/data people and less on the business side of the domain. They are critical to get things moving.
The Business Owner - generally supportive of the data mesh initiative but doesn't have the time - or the incentive - to spend time on the data mesh implementation. You're trying to get their support by the promise of making their lives easier.
The Sideline Watcher - sees data mesh as probably 'yet another data trend'. Not pushing back but not taking a stance. Waiting for the tide to turn one way or another before making their own waves.
The "Yes to Your Face" - will say yes to you and then just go do whatever they were going to do anyway… These are inevitable - try not to take it personally.
The Product Owners - they are building the dashboards or the analytical solutions and are desperate for the data. They really WANT to work with you but don't know exactly how - they aren't sure how to get the resourcing, and we're asking them to rethink the way they do their work. Help them figure out how they can partner where possible.
The data lake (or other historical data paradigm) builders - have spent so much time and effort to build a viable data lake/warehouse/etc. They often fight you because you're going against everything they've built. It's not personal against the data mesh team but it is personal if you put all their hard work aside. But they can build data initiatives very well, so try to work with them and let them know you're building off the knowledge they've gained if not their direct work.
For Ammara, a lot of the data mesh literature and conversations feel like they say there are new roles and therefore there isn't room for many existing data roles, like the data warehouse or data lake builders/maintainers. But she thinks that's not a great idea - and Scott agrees. People in those roles are subject matter experts in how the domain's data flows and systems actually work and can be excellent guides to bringing more people into the data fold as they themselves pick up new skills. Trying to hire your way to a data mesh is not a great idea… No one is redundant; for Ammara, everyone has valuable knowledge.
You need to make your IT sponsor successful in order for your data mesh implementation to go broad in that domain so that means learning the - and communicating in the - language of the business according to Ammara. That might mean you have to deal with the horror of PowerPoint Presentations. And as many guests have said, the selling points and implementation details of data mesh don't stick with the broader audience the first time. Repetition, reframing, holding of hands, etc. You won't succeed if you try to just message once. Be prepared to repeat yourself. And then repeat yourself again.
Ammara gave an example of why data mesh can really help improve communication and drive toward a common language. In manufacturing, there is the concept of "on time, in full delivery" as a very crucial KPI. And the domain had analytics teams constantly asking to build this for the different manufacturing lines while at the same time, the business side said they didn't have the information. How could that be when there were 10+ completed "on time, in full delivery" projects that had been funded? So once Ammara and team removed the data team from the picture, the business folks were able to talk with the regular IT team and they came to a shared, common understanding of what was actually needed and what was missing. It's pretty easy to lose sight of what the actual need and use case is when people are siloed by function.
It is crucial to understand the three streams of work model, per Ammara. The operating stream is "building the cadence for IT and business to communicate" in order to prioritize. This helps identify which data products will be built. The product stream is identifying the actual data products that need to be built, as in what their scope and boundaries are. The technical stream is about building the data product and the platform needs. Each of the three streams should have equal weighting. This is another way to think about your MVP thin slice: it must encapsulate some of each capability, each stream.
As previous guests have noted, many domains build data products that benefit themselves first in Ammara's experience. This obviously makes it easier because there is more buy-in and no cross-domain communication and prioritization friction. But that is just the initial stages of a data mesh implementation - still in phase 1 before going truly broad. More domains are moving to support use cases across domains so phase 2 might be up soon.
Ammara believes source oriented data products, ones that are difficult to understand outside the domain, should not be made freely available on the mesh; they should not be made available to business users within the domain or to other domains. And her reasoning is very sound: if the data products are difficult to understand, it's easy to misuse them, and they are more likely to change with the source systems, so breaking changes/versions are more common. Other domains can consume the information from those source oriented data products via specially designed consumer oriented data products instead of directly from the source oriented data products. Data scientists are a bit of another story as they are data literate enough to do some spelunking but even then, data scientist beware.
Ammara is also seeing an interesting pattern relative to source oriented data products. When you really start to map out a lot of obvious use cases for a domain - and remember, the size of a domain in this context is quite large - it might seem like you need a large number of source oriented data products. But when you zoom out further, it becomes clear that you can actually shrink those into a much smaller number, the 5-6 data products mentioned earlier for that domain.
Things are evolving at Ammara's current client toward 3 layers: use cases, consumer oriented data products, and source oriented data products. For each use case, there are one or more - typically two it sounds like - consumer oriented data products. Then each consumer oriented data product is derived from or powered by typically three to four source oriented data products. So the domains are able to create multiple consumer oriented data products off the same set of 5-6 source oriented data products. But it's still early days and will likely evolve further :)
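To make the layering a bit more concrete, here is a minimal Python sketch of how use cases, consumer oriented data products, and source oriented data products might map onto each other. Every name in it (the use cases, the data products, and the helper functions) is hypothetical and made up for illustration; none of it comes from Ammara's client. It only aims to show how a small shared set of source oriented data products can sit under several consumer oriented ones.

```python
from collections import Counter

# All names below are hypothetical, for illustration only.

# Layer 1 -> layer 2: each use case is served by one or more
# consumer oriented data products (typically two).
USE_CASE_TO_CONSUMER = {
    "on_time_in_full_reporting": ["otif_by_line", "otif_by_customer"],
    "production_planning": ["line_capacity_forecast", "otif_by_line"],
}

# Layer 2 -> layer 3: each consumer oriented data product is derived
# from a handful of source oriented data products (typically three to four).
CONSUMER_TO_SOURCE = {
    "otif_by_line": ["orders", "shipments", "production_runs"],
    "otif_by_customer": ["orders", "shipments", "customers"],
    "line_capacity_forecast": [
        "production_runs", "equipment", "work_schedules", "orders",
    ],
}


def source_products_for(use_case: str) -> set:
    """Return every source oriented data product a use case depends on."""
    sources = set()
    for consumer in USE_CASE_TO_CONSUMER[use_case]:
        sources.update(CONSUMER_TO_SOURCE[consumer])
    return sources


def source_reuse() -> Counter:
    """Count how many consumer oriented data products use each source product."""
    counts = Counter()
    for sources in CONSUMER_TO_SOURCE.values():
        counts.update(sources)
    return counts


if __name__ == "__main__":
    print(sorted(source_products_for("on_time_in_full_reporting")))
    print(source_reuse().most_common())
```

In this made-up example, three consumer oriented data products are all built from the same six source oriented data products, and a product like "orders" is reused by every one of them, which is the kind of reuse the 5-6 figure above points at.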
According to Ammara, encourage people to think business need first instead of data first. Think about what business outcome you are trying to achieve and then work backwards to what data you need to address that. If we are just sharing information without intention, it can lead to misuse of data - will people really...