#102 Share Data by Default and Other Stories/Advice from Leboncoin's Data Mesh Journey So Far - Interview w/ Stéphanie Bergamo and Simon Maurin
Episode 102 • 17th July 2022 • Data Mesh Radio • Data as a Product Podcast Network
Duration: 01:11:23


Shownotes

Data Mesh Radio Patreon - get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo's contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here

In this episode, Scott interviewed Stéphanie Bergamo and Simon Maurin of Leboncoin. Stéphanie is a Lead Data Engineer and Simon is a Lead Architect at Leboncoin. From here on, S&S will refer to Stéphanie and Simon.

Some key takeaways/thoughts from Stéphanie and Simon's point of view:

  1. "Bet on curious people", "just have people talk to each other", and "lower the cognitive costs of using the tooling" - if you can do that, you'll raise your chance of success with your data mesh implementation.
  2. Leboncoin requires teams to share information on the enterprise service bus that might not be directly useful to the originating domain on the operational plane. They are using a similar approach with data for data mesh - sharing information that might not be useful directly to the originating domain by default.
  3. Leboncoin presses teams to get data requests to other teams early so those teams can prioritize them. There isn't an expectation of producing new data very quickly after a new request, which is probably a healthy approach to data work/collaboration.
  5. Embedding a data engineer into a domain doesn't make everything easy; it's not magic. Software engineers will still need a lot of training and help to really understand data engineering practices. Tooling and frameworks can only go so far. Be prepared for friction.
  5. Similarly, getting data engineers to realize that data engineering is just software engineering but for data - and to actually treat it as such - might be even harder.
  6. Software engineers generally don't know how to write good tests for data. Neither do data engineers. But testing is possibly more important in data than in software. We all need to get better at data testing.
  7. Start by building the self-service platform to solve the challenges of the data producers first. You may make it very easy to discover and consume data, but if the producers aren't producing any data, there's nothing to consume.
  8. If your software engineers are doing data pipelines at all before starting to work with them in a data mesh implementation, you can probably expect they aren't using best practices.
  9. It's pretty common for good/best practices to be known by only a few people inside an organization, such as with a specialty-focused guild. Look for ways to cross-pollinate information so more people are at least aware of best practices if not able to fully implement them yet.
  10. Trying to force people to share data in a data mesh fashion didn't work for Leboncoin and probably won't in most organizations. Find curious developers and help them accomplish something with data; that will drive buy-in.
  11. As part of #10, data products often start as something serving the producing domain and then evolve to serve additional use cases. They start by serving a specific business need and evolve from there.
  12. Look to build your tooling to enforce your data governance requirements/needs. Trying to put too much on the plate of software engineers probably won't go well.

Around the time Zhamak's first post on data mesh came out in mid 2019, Leboncoin was experiencing many of the pain points Zhamak laid out quite clearly in her article. Their teams were already organized in the "Spotify model" so data ownership was already distributed to many of the domains. But, they were seeing increasing time-to-market - often hitting what Simon called "very long" - for new data initiatives. They already had an organizational model and some ways of working that fit well with data mesh so they decided to give it a try.


So, per S&S, they tried using the data mesh principles for a first use case - building out their recommendation engine. It was a greenfield initiative so it was a good one to test out how well data mesh could work for incremental data needs.


In order to proceed with the pilot, S&S and the rest of the data team had to negotiate with the CTO. Once the pilot was successful, they started embedding data engineers into the teams with the most obvious needs while starting to build out the self-service platform. They already had their CI/CD platform for the operational side so they adapted it to also work with data products. And then they added additional data processing requirements, the governance, etc. to make it as self-service as possible for data producing teams.


The good news, per S&S, was immediate traction for the self-serve platform among the back-end engineers. But they were still suffering from the distance between the data and software engineering people/capabilities. It was difficult to get the software engineers to see data engineering as a type of software engineering, and many of the data engineers also had a hard time seeing data engineering as a subset of software engineering.


This is a common complaint from many organizations: just because you embed data engineers into domains doesn't mean everything becomes easy. You still need to get the software engineers/developers to understand and care about data and data engineering practices, and the data engineers need to learn more about software engineering to best collaborate with the software engineers.


Data pipelines were a major blind spot for a number of the software engineers, according to S&S. If the software engineers were doing pipelines at all, most were not doing them that well - with a number of not-so-great practices, to put it nicely. So there was a focus on communicating why data pipelines are so crucial to the overall company and how software engineers can learn to do them better. Data mesh can help facilitate sharing that vision, and giving the software engineers ownership over data got them excited in many cases.


S&S are reevaluating if their current internal guild setup is really working with a data mesh approach. It is currently organized only by specialty and that means there isn't a lot of cross-pollination of information - people outside the specific guild don't have easy access to learning new best practices shared with members of that guild. Tim Tischler mentioned the idea of broad group show-and-tells / info sessions around data products that may help with these challenges if done around data practices.


This lack of broader best practice sharing is biting S&S and Leboncoin in the behind, especially around testing. While software engineers know how to write really good software tests, most data engineers aren't as good at writing tests, and software engineers in general aren't good at writing data-specific tests. But testing is crucial for being confident in future changes - if you don't know what will happen when you make a change, that's a bad spot to be in.
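To make the idea of a "data-specific test" concrete - this is a minimal illustrative sketch, not Leboncoin's actual tooling, and the field names (`id`, `price`, `category`) are hypothetical - a pipeline step can assert basic expectations about a batch of records (required fields present, unique identifiers, values in a sane range) before publishing the data downstream:

```python
def check_listings(rows):
    """Minimal data-quality checks for a batch of (hypothetical) listing records.

    Returns a list of human-readable violations; an empty list means the
    batch passes. Checks: required fields present, unique ids, price non-negative.
    """
    violations = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Required fields must be present and non-null.
        for key in ("id", "price", "category"):
            if row.get(key) is None:
                violations.append(f"row {i}: missing {key}")
        # Identifiers must be unique within the batch.
        if row.get("id") in seen_ids:
            violations.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row.get("id"))
        # Prices must be non-negative.
        price = row.get("price")
        if price is not None and price < 0:
            violations.append(f"row {i}: negative price {price}")
    return violations


# Example: a clean batch passes, a bad batch reports each problem.
good = [{"id": 1, "price": 10.0, "category": "auto"}]
print(check_listings(good))  # []
```

Unlike a typical unit test, this kind of check runs against the data itself on every pipeline execution, not just against the code at build time - which is part of why testing matters so much in data work. Dedicated frameworks (e.g. Great Expectations or dbt tests) formalize this same pattern.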


On driving buy-in, S&S shared that trying to force people along the path to sharing their data just didn't work well. What did work was finding the curious developers and helping them accomplish what they wanted with data - and finding actual projects that can add value, ones with specific use cases, often ones that are directly useful to that domain itself first.


At Leboncoin, many of their data products start off serving the producing domain and then the domain lets others know they've created potentially useful data. This is similar to what Leboncoin does on the microservices side as well with teams often consuming their own events from the enterprise service bus. So the first step for a data product is to build to explicit business needs and then see if additional business value comes from the data product.


Per S&S, another thing that's been helpful is their roadmap process: teams should tell other teams what they will need from them early. If you need data from another team, communicate it early so they can prioritize it. There isn't an expectation of immediately producing data, which is a healthy way to collaborate.


Leboncoin has an interesting approach to sharing information. On the operational plane, as mentioned earlier, they have an enterprise service bus, and teams are supposed to share information that might not be explicitly useful to themselves - they are asked to consider what might be useful for other teams and to share that at the start of the development process, so there's no need to request it be added later; it was there from the start. They are taking the same approach on the data side with data mesh. It might not be in data products with strong SLAs, but other domains can at least understand what data could be formed into data products.


S&S recommend that when you start building out your federated governance, really start with following the pain. Put data engineers and back-end engineers in the same room to find out what's actually necessary to do and what should be built into the platform. If you can make the tooling enforce governance requirements/needs, that's easier for pretty much all parties.


S&S finished the conversation with a few quick quotes: "bet on curious people", "just have people talk to each other", and "lower the cognitive costs of using the tooling" - if you can do that, you'll raise your chance of success with your data mesh implementation.


Stéphanie Bergamo

LinkedIn: https://www.linkedin.com/in/st%C3%A9phanie-baltus/

Twitter: @steph_baltus / https://twitter.com/steph_baltus


Simon Maurin

LinkedIn: https://www.linkedin.com/in/simon-maurin-369471b8/

Twitter: @MaurinSimon / https://twitter.com/MaurinSimon


Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under "add payment"): AstraDB

Links