Data Mesh Radio Patreon - get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
In this episode, Scott interviewed Stéphanie Bergamo and Simon Maurin of Leboncoin. Stéphanie is a Lead Data Engineer and Simon is a Lead Architect at Leboncoin. From here on, S&S will refer to Stéphanie and Simon.
Some key takeaways/thoughts from Stéphanie and Simon's point of view:
Around the time Zhamak's first post on data mesh came out in mid 2019, Leboncoin was experiencing many of the pain points Zhamak laid out quite clearly in her article. Their teams were already organized in the "Spotify model" so data ownership was already distributed to many of the domains. But, they were seeing increasing time-to-market - often hitting what Simon called "very long" - for new data initiatives. They already had an organizational model and some ways of working that fit well with data mesh so they decided to give it a try.
So, per S&S, they tried using the data mesh principles for a first use case - building out their recommendation engine. It was a greenfield initiative so it was a good one to test out how well data mesh could work for incremental data needs.
In order to proceed with the pilot, S&S and the rest of the data team had to negotiate with the CTO. Once the pilot was successful, they started embedding data engineers into the teams with the most obvious needs while starting to build out the self-service platform. They already had their CI/CD platform for the operational side so they adapted it to also work with data products. And then they added additional data processing requirements, the governance, etc. to make it as self-service as possible for data producing teams.
The good news, per S&S, was immediate traction with the self-serve platform with the back-end engineers. But they were still suffering from the distance between the data and the software engineering people/capabilities. It was difficult to get the software engineers to see data engineering as a type of software engineering, and many of the data engineers also had a hard time seeing data engineering as a subset of software engineering.
This is a common complaint from many organizations: embedding data engineers into domains doesn't make everything easy. You still need to get the software engineers/developers to understand and care about data and data engineering practices, and the data engineers need to learn more about software engineering to collaborate well with the software engineers.
Data pipelines were a major blind spot for a number of the software engineers, according to S&S. Most of the software engineers who were building pipelines were not building them well and relied on a number of not-so-great practices, to put it nicely. So there was a focus on communicating why data pipelines are so crucial to the overall company and how software engineers can learn to build them better. Data mesh can help share that vision, and giving the software engineers ownership over data got them excited in many cases.
S&S are reevaluating if their current internal guild setup is really working with a data mesh approach. It is currently organized only by specialty and that means there isn't a lot of cross-pollination of information - people outside the specific guild don't have easy access to learning new best practices shared with members of that guild. Tim Tischler mentioned the idea of broad group show-and-tells / info sessions around data products that may help with these challenges if done around data practices.
This lack of broader best-practice sharing is hurting S&S and Leboncoin, especially around testing. While software engineers know how to write really good software tests, most data engineers aren't as good at writing tests, and software engineers generally aren't good at writing data-specific tests. But testing is crucial to being confident in future changes - if you don't know what will happen with a change, that's a bad spot to be in.
On driving buy-in, S&S shared that trying to force people along the path to sharing their data just didn't work well. What worked was finding the curious developers and helping them accomplish what they wanted with data, and finding actual projects that can add value, ones with specific use cases - often ones that are directly useful to that domain itself first.
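To make "data-specific tests" concrete, here is a minimal sketch in Python. It is not Leboncoin's tooling: the "listing views" dataset, its column names, and the check_rows helper are all hypothetical, chosen only to show how a data test checks the records themselves (nulls, types, duplicates) rather than the behavior of a function.

```python
from typing import Any

# Hypothetical expectations for an illustrative "listing views" dataset.
# These column names and types are assumptions, not from the episode.
REQUIRED_COLUMNS = {"listing_id": int, "user_id": int, "viewed_at": str}


def check_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable data quality violations found in rows."""
    problems: list[str] = []
    seen_keys: set[tuple] = set()
    for i, row in enumerate(rows):
        # Null and type checks for every required column.
        for column, expected_type in REQUIRED_COLUMNS.items():
            value = row.get(column)
            if value is None:
                problems.append(f"row {i}: {column} is missing or null")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: {column} has type {type(value).__name__}")
        # Uniqueness check: the same view should not be recorded twice.
        key = (row.get("listing_id"), row.get("user_id"), row.get("viewed_at"))
        if key in seen_keys:
            problems.append(f"row {i}: duplicate of an earlier row")
        seen_keys.add(key)
    return problems
```

A check like this could run against a sample of each batch before it is published, so that a change to a pipeline fails loudly instead of silently shipping bad records.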
At Leboncoin, many of their data products start off serving the producing domain and then the domain lets others know they've created potentially useful data. This is similar to what Leboncoin does on the microservices side as well with teams often consuming their own events from the enterprise service bus. So the first step for a data product is to build to explicit business needs and then see if additional business value comes from the data product.
Per S&S, another thing that's been helpful is their roadmap process - teams should tell other teams what they will need from them early. If you have a need for data, you need to communicate it early so other teams can prioritize it. There isn't an expectation of immediately producing data, which is a healthy way to collaborate.
Leboncoin has an interesting approach to sharing information. On the operational plane, as mentioned earlier, they have an enterprise service bus, and teams are supposed to share information that might not be explicitly useful to them. They are asked to consider what might be useful for other teams and to share that at the start of the development process, so it is there from the start rather than requested later. They are taking the same approach on the data side with data mesh. The data might not be in data products with strong SLAs, but other domains can at least understand what data could be formed into data products.
S&S recommend that when you start building out your federated governance, you start by following the pain. Put data engineers and back-end engineers in the same room to find out what's actually necessary to do and what should be built into the platform. If you can make the tooling enforce governance requirements/needs, that's easier for pretty much all parties.
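As an illustration of tooling that enforces governance, here is a minimal Python sketch of a check that a CI/CD pipeline could run on a data product's descriptor. The descriptor format, the field names, and the retention rule are assumptions made up for this example; the episode does not describe Leboncoin's actual rules.

```python
# Hypothetical governance rules; every field name here is illustrative only.
REQUIRED_FIELDS = ("name", "owner_team", "description", "contains_personal_data")


def validate_descriptor(descriptor: dict) -> list[str]:
    """Return governance violations for a data product descriptor."""
    errors = []
    # Every data product must declare the required metadata.
    for field in REQUIRED_FIELDS:
        if field not in descriptor or descriptor[field] in ("", None):
            errors.append(f"missing required field: {field}")
    # Assumed rule: personal data must come with a retention period.
    if descriptor.get("contains_personal_data") is True and not descriptor.get("retention_days"):
        errors.append("personal data requires retention_days")
    return errors
```

In a pipeline, a non-empty result would fail the build, so a producing team learns about a governance gap at commit time instead of in a later review.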
S&S finished the conversation with a few quick quotes: "bet on curious people", "just have people talk to each other", and "lower the cognitive costs of using the tooling" - if you can do that, you'll raise your chance of success with your data mesh implementation.
Twitter: @steph_baltus / https://twitter.com/steph_baltus
Twitter: @MaurinSimon / https://twitter.com/MaurinSimon
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under "add payment"): AstraDB