#166 Capital One's Data Mesh Journey and How They Created a New Product Along This Journey - Interview w/ Salim Syed
Episode 166 • 11th December 2022 • Data Mesh Radio • Data as a Product Podcast Network
00:00:00 01:02:22

Shownotes

Sign up for Data Mesh Understanding's free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Salim's LinkedIn: https://www.linkedin.com/in/salim-syed-11981521/

Capital One Software: https://www.capitalone.com/software/

Capital One Slingshot: https://www.capitalone.com/software/solutions/

Capital One blog post on their use of Snowflake for data mesh: https://www.capitalone.com/software/blog/operationalizing-data-mesh/

In this episode, Scott interviewed Salim Syed, VP of Engineering at Capital One Software.

Some key takeaways/thoughts from Salim's point of view:

  1. !Contrarian!: "Data mesh works with central policy, central tooling, but federated ownership." Work to federate your infrastructure ownership, not just data ownership.
  2. No matter what size of domain or LoB, look to have the same responsibilities owned by each domain. They may fall under different people but the role they serve should look the same across the domains.
  3. Get your roles and responsibilities established first, then look to start tackling other challenges.
  4. Do your best to hide the data engineering complexity from your self-serve platform users. Help them do their job, not learn data engineering. Focus on creating a great experience for them in their workflows.
  5. Think about data risk from an opportunity standpoint: if you have strong risk controls, you can feel more comfortable giving more people wider access - you know what is allowed so you can potentially give them access to more data than if you didn't have tight controls.
  6. Doing federated anything right isn't about just giving people the tools/patterns/policies, you should focus on a usability layer. Should every team have to integrate and manage their data tools or should they be able to focus on doing what matters? What is their experience?
  7. Design your data platform to your personas - who's getting something done and how do they do their work? Also, always ensure your infrastructure and your governance are in sync. No one likes manually updating the data catalog or, even worse, finding out that it has drifted.
  8. Discoverable data doesn't have to mean everyone can access everything by default or that access is automatically granted. If people can find data and understand what the data is about plus then easily request access, that is enough. Also make granting access as painless and low risk as possible.
  9. Look at users' most common actions that cause major friction and focus on tackling those, whether they 'feel like part of a data platform' or not. For Capital One a few were reviewing/monitoring risk and supporting manual updates of production data.
  10. Cost controls are a crucial but often overlooked or ignored aspect of your governance - it's VERY easy to overspend in the cloud :)
  11. !Commercial Offering!: Capital One launched Capital One Software and its first product, Slingshot, after seeing how difficult it was to properly manage costs around data mesh, especially given the separation of storage and compute with Snowflake. Lines of business don't typically have cloud cost managers so they wanted to automate the controls and cost tracking/management/forecasting as much as possible.
  12. Don't make people in the business learn exactly how to tune to prevent or rectify cost inefficiencies. Again, back to experience, give them knobs and the right questions to assess their needs so they don't have to be an expert.
  13. Cloud bill surprises are so common, it's a meme. Build out tooling to help people forecast their cost and then track and alert against it. Slingshot helps Capital One teams forecast and control data costs. Don't wait until the end of the month to let someone know they went 50% over budget.
  14. ?Contrarian?: Your central team will likely be reluctant to give up some authority, especially in deciding how infrastructure is deployed. Look to retrain central teams to build self-service capabilities that create the guardrails to give them comfort but still federate infrastructure ownership to the lines of business.

Capital One is on its own data mesh journey, and what they learned and built internally inspired them to build an offering called Slingshot. When asked where he would tell others to start their own data mesh journey, Salim said that we can't solve our data issues - whether with data mesh or anything else - through technology alone. At Capital One, they started with the organizational aspects, breaking into discrete lines of business (LoB) and then creating units of data responsibility with a hierarchy. However, the data hierarchy wasn't the same in each LoB; they left it up to each LoB to determine, with a large domain having 3-4 and a small domain having 1.

According to Salim, a big reason why they have so many people focused on risk is similar to what Sarita Bakst mentioned in episode 52: you want to give people as much access to data as you can while minimizing risk. So set yourself up to give people access when it's valuable / necessary for their job. There are even instances at Capital One where you need to complete training to get access to data. And they have active risk monitoring constantly in place as well to make sure they don't miss anything. Again, doing that gives you more peace of mind to leverage more data.

When you start federating your data ownership, you can't only give people the tooling and the authority; that won't work well in Salim's view. In addition to all of that, you need to focus on a usability layer. Think about email - it's pretty easy to work with an email client, but what if you had to put in all the plumbing and add your headers and all that yourself? Just giving people tools, patterns, policies, etc. and expecting them to be able to handle it all isn't realistic. Make it easy to leverage and think about their user experience.

As an example, Salim talked about the process for creating a data product. Instead of interfacing directly with 6+ tools, there is a workflow with automated processes to make it a smooth process. As many past guests have noted, reducing friction to sharing data is a key element of driving value from data mesh. It's not just called a data platform, it's called a self-serve data platform for a reason :)


Salim shared some advice other recent guests have touched on: really focus on the job to be done, who is doing it, and ensuring your infrastructure and governance stay in sync. Focus on the job to be done, whatever that job is, instead of the tools. Again, reduce friction. Focusing on the persona of who is doing the work is also crucial - Audun Fauchald Strand and Gøran Berntsen from NAV in episode 37 talked about building a great data platform no software engineer would want to use. Use product thinking: who is using it and how do they typically do their work? And lastly, ensure your governance and infrastructure stay in sync - if there is an update to the data product, does that automatically update the data catalog? How do you prevent drift between systems or kludgy manual fixes?
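To make the catalog drift question concrete, here is a minimal Python sketch of a drift check. It is not how Capital One does it; the function names (`find_drift`, `sync_catalog`) and the idea of representing a schema as a column-to-type dictionary are assumptions made purely for illustration.

```python
from typing import Dict, List


def find_drift(product_schema: Dict[str, str],
               catalog_schema: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare a data product's schema to its catalog entry (hypothetical helper).

    Both arguments map column name -> type name.
    """
    # Columns the product exposes that the catalog does not know about.
    missing = sorted(c for c in product_schema if c not in catalog_schema)
    # Columns the catalog still lists that the product no longer has.
    stale = sorted(c for c in catalog_schema if c not in product_schema)
    # Columns present in both but with different declared types.
    changed = sorted(
        c for c in product_schema
        if c in catalog_schema and product_schema[c] != catalog_schema[c]
    )
    return {
        "missing_in_catalog": missing,
        "stale_in_catalog": stale,
        "type_changed": changed,
    }


def sync_catalog(product_schema: Dict[str, str]) -> Dict[str, str]:
    """Return a fresh catalog entry that mirrors the product schema."""
    return dict(product_schema)


if __name__ == "__main__":
    product = {"id": "int", "amount": "decimal", "region": "str"}
    catalog = {"id": "int", "amount": "float", "legacy": "str"}
    print(find_drift(product, catalog))
    print(find_drift(product, sync_catalog(product)))
```

Running a check like this on every data product update, rather than relying on someone to edit the catalog by hand, is one way to keep the two systems from drifting.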


Data discovery done right is about a few things according to Salim. Ensuring people can find relevant data easily is the baseline, but you should also get them to as much understanding as possible. Even for sensitive data, what are the quality metrics, the general data shape, etc.? Make it easy to request access as soon as you find data you want to use, which immediately triggers a request to the data owner with a business justification. The relevant policies for that data product are automatically part of the approval process, so the data owner doesn't have to remember the policies themselves. Again, reduce friction to getting the job done for all parties.
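The access request flow described above could be sketched roughly as follows. This is an illustrative Python example only; the `DataProduct` and `AccessRequest` classes, the `request_access` function, and the policy names are all hypothetical and do not reflect Capital One's actual platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataProduct:
    """Hypothetical catalog entry: who owns the data and which policies apply."""
    name: str
    owner: str
    policies: List[str]


@dataclass
class AccessRequest:
    """Hypothetical request routed to the data owner for approval."""
    requester: str
    product: str
    justification: str
    approver: str
    policies_to_check: List[str]
    status: str = "pending"


def request_access(product: DataProduct, requester: str,
                   justification: str) -> AccessRequest:
    """Create a request addressed to the owner, with policies attached."""
    cleaned = justification.strip()
    if not cleaned:
        raise ValueError("a business justification is required")
    return AccessRequest(
        requester=requester,
        product=product.name,
        justification=cleaned,
        approver=product.owner,
        # Copy the policies so the owner sees them without looking them up.
        policies_to_check=list(product.policies),
    )


if __name__ == "__main__":
    product = DataProduct(
        name="card_transactions",
        owner="owner@example.com",
        policies=["pii-training-completed", "need-to-know"],
    )
    print(request_access(product, "analyst@example.com", "Fraud model refresh"))
```

The point of the design is that the requester only supplies a justification, while the owner and the applicable policies come from the data product's own metadata.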


Salim talked about a few things they overlooked at the start: one persona and one major source of friction. As mentioned earlier, Capital One has a number of risk managers, but the early iterations of the platform didn't cater to their experience, which meant access requests were delayed and risk monitoring was tougher to do. So make sure to consider all the personas that will be using your data mesh - ignore them at the peril of your business value. The other aspect was how much manual effort was involved in patching production data, so they addressed that as well.


To be able to federate the actual infrastructure management to the domains, Salim and team knew they couldn't just hand over the tools. Again, the personas in the LoBs wouldn't have the expertise to manage the infrastructure. So they focused on exactly what Salim mentioned throughout: the experience. How could they empower the lines of business to own their data infrastructure without the domains having to manage their data infrastructure? So the team built out a platform with experience at the core, and with additional aspects like DBA best practices, guardrails, and cost management built in.


Scott Note: from here forward, there will be some discussion of Capital One's Slingshot offering. This is not an endorsement of the product at all but it is interesting and germane to the conversation (read: Salim was not selling, only telling, which is a-okay on the podcast). Cloud cost is near and dear to my heart and it's important - whether you build or buy - to not overlook cost management.


So, all these challenges of how they addressed their cloud cost management via their platform led them to believe there was a market for this type of solution per Salim. Capital One has a history of creating cloud cost tooling as they were the creators of Cloud Custodian (https://cloudcustodian.io/). Scott Note: unpredictable and/or high costs have been a major concern/pushback to data mesh since early 2021.


One general issue with on-demand / cloud computing is cost inefficiencies. There is always a lot of waste and it's often actually more cost effective to ignore than chase it down unless you know where inefficiencies lie. So Salim and team found it useful to not try to automatically clean up cost inefficiencies - that pretty much never works - but to highlight them relatively quickly and offer potential recommendations and/or help. And they lowered their own Snowflake costs quite a bit in the process.


The bigger benefit, according to Salim, was that the team put proactive questions in place for when teams were provisioning their data infrastructure. Often, it can take only a few minutes to save a large percentage of the cost, if only people know which knobs to turn. But you don't want everyone to have to be an expert - extract the information from them based on their needs and create a recommendation system. This is yet more on the experience side - don't build cloud cost experts in every LoB; make it so they can make the right decisions as often as possible, quickly and easily. You should also look to build in cost forecasting tools as part of your experience so people aren't hit with a surprise bill - the surprise cloud bill is so common, it's a meme on Twitter.
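As a rough illustration of forecasting and alerting before the end of the month, the Python sketch below projects month-end spend from a simple linear run rate and flags a projected overrun. This is not how Slingshot works; the run-rate method, the function names, and the `threshold` parameter are assumptions chosen to keep the example small.

```python
from typing import Optional


def forecast_month_end(spend_to_date: float, day_of_month: int,
                       days_in_month: int) -> float:
    """Project month-end spend assuming the daily run rate stays constant."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return spend_to_date / day_of_month * days_in_month


def budget_alert(spend_to_date: float, day_of_month: int, days_in_month: int,
                 budget: float, threshold: float = 1.0) -> Optional[str]:
    """Return a warning if projected spend exceeds budget * threshold, else None."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    projected = forecast_month_end(spend_to_date, day_of_month, days_in_month)
    if projected > budget * threshold:
        overrun_pct = (projected / budget - 1) * 100
        return (f"Projected spend {projected:.2f} is {overrun_pct:.0f}% "
                f"over the budget of {budget:.2f}")
    return None


if __name__ == "__main__":
    # Ten days into a 30-day month with 600 spent against a budget of 1200.
    print(budget_alert(600.0, 10, 30, 1200.0))
```

A real forecast would likely account for weekly seasonality and scheduled workloads, but even a naive projection like this can surface a likely overrun on day 10 instead of day 30.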


From what Salim is seeing, most companies - or more correctly, central data/platform teams - are pretty reluctant to federate ownership of the infrastructure to domains. He believes that is because of things the LoBs don't understand like cost controls and best practices but that if you allow the central team to set guardrails and best practices, they will be more willing to give up control. Remains to be seen.


Salim finished with "in our experience, data mesh works with central policy, central tooling, but federated ownership."


Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
