Data Mesh Radio Patreon - get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mozhgan's LinkedIn: https://www.linkedin.com/in/tavakolifard/
In this episode, Scott interviewed Mozhgan Tavakolifard, Data and AI Lead for the Nordics at Accenture. To be clear, she was only representing her own views on the episode.
Before we jump in: most of the conversation was about external data marketplaces rather than internal data marketplaces within an organization. It's also important to note that data marketplace technology and implementations are still in relatively early stages and are quickly evolving and maturing.
Some key takeaways/thoughts from Mozhgan's point of view:
For Mozhgan, data mesh is a perfect fit with data marketplaces: a data marketplace lets producers share data in a standardized way and lets consumers find and consume data through standardized metadata and access. Simply put, data marketplaces are the most sensible place and mechanism for sharing data in her view. They significantly lower the barrier to getting access to data and to understanding it - including how much consumers can trust it.
Data marketplaces are good for internal data sharing but even better for monetizing your data externally, according to Mozhgan. Again, the standardization and clear rules about allowable use mean a faster time from discovery to value for both data producers and consumers/purchasers. When data has clear and concise SLAs, consumers can quickly go from discovery to trusting the data, meaning they can quickly leverage it for their own use.
However, major pain points for external data marketplaces are trust and security. Data producers must create trust in their data for others to use it, but there is also a big risk in how data consumers/purchasers actually use a producer's data. Is it compliant/legal use? Is it ethical use? Will those data consumers properly protect the data they consume? If not, what is the risk to the data producer? How can we ensure proper behavior - whatever that may mean to the data producer - by the data consumer/purchaser?
Mozhgan believes blockchain/distributed ledgers might provide a good way to track compliant usage - are consumers meeting their contractual terms? Smart contracts are supposedly able to track this. However, smart contracts still do not address ethical concerns, at least not in a simple and repeatable way. The ways of doing this are still evolving. And she believes we can't really get to large scale data marketplaces without something like blockchain. Note: Scott is much more skeptical given there are few examples he is aware of where blockchain is really working for trust and security - can you really track usage in someone else's systems? What about their security capabilities to prevent a data breach? Can we actually track ethical use of data?
Another aspect Mozhgan mentioned is that data consumers can only use data they purchase in ways allowed by the contract. Sarita Bakst mentioned this when talking about externally purchased data - data producers want to maximize monetization so data purchasers have to pay for each individual use case. So data producers want to track that consumers/purchasers are actually adhering to that part of the contract. There are a number of recent examples where data sellers will have wildly different prices for the data in PDF form versus an API. The API probably actually costs less to maintain but there's a strong correlation between consuming via API and getting a lot of value from the data consumed.
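To make the per-use-case licensing idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `DataLicense` class, the `check_use` helper, and the dataset and use case names do not come from the episode or from any real marketplace product. Note that it only checks the use case a purchaser declares; it cannot observe what actually happens inside the purchaser's systems, which is exactly the gap Scott raised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLicense:
    """Hypothetical record of what a purchaser has paid to do with a dataset."""
    dataset: str
    purchaser: str
    allowed_use_cases: frozenset

    def permits(self, use_case: str) -> bool:
        # A use case is allowed only if it was explicitly purchased.
        return use_case in self.allowed_use_cases


def check_use(lic: DataLicense, use_case: str) -> str:
    """Return a human-readable decision for a declared use case."""
    if lic.permits(use_case):
        return f"ALLOWED: {lic.purchaser} may use {lic.dataset} for {use_case}"
    return f"DENIED: {use_case} is not covered by the contract for {lic.dataset}"


if __name__ == "__main__":
    # Assumed example values, not real contracts.
    lic = DataLicense(
        dataset="tree-growth-observations",
        purchaser="utility-a",
        allowed_use_cases=frozenset({"outage-prediction"}),
    )
    print(check_use(lic, "outage-prediction"))
    print(check_use(lic, "marketing-segmentation"))
```

A check like this could sit in front of an API so that each request names its use case, which may be one reason sellers find API access easier to price per use case than a PDF.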
When it comes to data consumer trust - can they actually trust the data? - Mozhgan believes we are seeing better ways of tracking data quality all the way back to the source. That independent verification is crucial. If data consumers/purchasers understand the exact quality dimensions, that typically makes the data immensely more valuable. For example, stolen credit card numbers on the dark web go for pennies because you can't really trust the source.
Mozhgan gave a really interesting example of where data marketplaces can take us. Utilities need to monitor trees and proactively trim them where possible so they don't disrupt powerlines or phone lines. But each utility typically does not have a great information set internally - often it simply lacks enough data to be good at proactive tree trimming. So utilities are trying to get to a place where they can share information with each other to improve their predictions for where to trim. However, the lack of a standard way to share data is making it quite difficult to achieve the desired results. So how can we learn to quickly share information across organizations without a long and complicated process to do things like design a standard data model? Could a marketplace help?
"Data ethics is a nightmare," even not related to data marketplaces according to Mozhgan. This is not just AI model ethics with bias and the like but there are often unethical ways of presenting the data. Then of course, there are many companies collecting and using data unethically. And we don't necessarily always want to remove all bias - it may have predictive power. But we need to focus more on the impact of our decisions on the input and output/impact side with data. And she believes we can use a lot of the guardrails we use around AI to ensure ethics in data marketplaces.
Mozhgan recognized that ethics will always be a bit messy when sharing data outside the organization. One suggestion to prevent ethics issues is to only share the insights instead of the actual data used to generate the insights. Another is to share pseudo-anonymized data. But at the end of the day, ethics falls much more on data producers than most expect. You have a duty to not sell data that can be misused!
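As a rough illustration of what sharing pseudo-anonymized data could look like, the sketch below replaces a direct identifier with a keyed hash before records leave the organization. This is an assumed approach, not one described in the episode: the function names, the `customer_id` field, and the sample records are all hypothetical. Keyed hashing only hides the identifier itself; other fields can still allow re-identification, so it is not full anonymization.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Map an identifier to a stable token using HMAC-SHA256.

    The same value and key always give the same token, so records can
    still be joined, but the token cannot be reversed without the key.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, id_field, secret_key):
    """Return copies of the records with id_field replaced by its token."""
    shared = []
    for record in records:
        copy = dict(record)  # leave the producer's original record untouched
        copy[id_field] = pseudonymize(str(record[id_field]), secret_key)
        shared.append(copy)
    return shared


if __name__ == "__main__":
    # Hypothetical sample data; the key would be kept private by the producer.
    key = b"keep-this-secret-inside-the-producer"
    records = [{"customer_id": "cust-1", "usage_kwh": 420}]
    for row in pseudonymize_records(records, "customer_id", key):
        print(row)
```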
For Mozhgan, there is too much focus on the value generated from data work rather than the actual return on investment. This happened in AI with massive hype and it's happening more in analytics recently - everyone needs to be data driven, right?! You need to create a business case and look at what the expected costs will be for data work. We don't have easy ways to predict exact value, but we can get better at that and be realistic about expected costs.
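The difference between value and return on investment can be shown with simple arithmetic. The sketch below uses the common definition ROI = (expected value - expected cost) / expected cost; the project names and figures are made up for illustration and are not from the episode.

```python
def return_on_investment(expected_value: float, expected_cost: float) -> float:
    """ROI as a fraction: net gain divided by cost."""
    if expected_cost <= 0:
        raise ValueError("expected_cost must be positive")
    return (expected_value - expected_cost) / expected_cost


if __name__ == "__main__":
    # Hypothetical business cases: (expected value, expected cost).
    projects = {
        "big-ai-initiative": (500_000, 400_000),
        "small-analytics-fix": (200_000, 80_000),
    }
    for name, (value, cost) in projects.items():
        roi = return_on_investment(value, cost)
        print(f"{name}: value={value}, cost={cost}, ROI={roi:.0%}")
```

In this made-up comparison the larger project generates more value but a much lower return (25% versus 150%), which is the kind of gap a business case is meant to surface.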
Knowledge graphs will be crucial to sharing data with other organizations and internally for data mesh.
It's crucial to see organizations as living, breathing ecosystems. Design your organization and ways of working to be able to adapt.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under "add payment"): AstraDB