Inside Collibra: Busting myths around data science with Gretel De Paepe
3rd August 2022 • The Data Download • Collibra


Shownotes

The boundaries between data analysis, data science, and machine learning are not always obvious, but the three fields are closely related and interconnected. This makes it possible to start a career in one and branch out into another. Data has made it easy to connect and to access information. However, we must remain vigilant about protecting privacy.

In this episode, Gretel De Paepe, senior data scientist at Collibra, shares what she's learned in her data career. She discusses the importance of data in our lives and its value, both now and in the future. Finally, she tackles myths about artificial intelligence and machine learning.

Tune in to the episode to learn how to handle data correctly.

Here are three reasons why you should listen to this episode:

  1. Find out what inspired Gretel to pursue data science.
  2. Learn how data makes our lives better, from the perspective of both the average user and companies.
  3. Go beyond data bias and common misconceptions about artificial intelligence and machine learning.

Resources

Episode Highlights

[01:02] Machine Learning Projects at Collibra

  • Collibra offers many services to its customers, several of them powered by machine learning.
  • Data classification helps companies identify fields that contain personally identifiable information (PII).
  • Asset recommenders provide a list of recommended assets based on a user's datasets.
  • Similarity detection finds similar assets to prevent duplication and keep the database clean.
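The episode doesn't describe how Collibra's similarity detection works, so the following is only a minimal illustrative sketch under stated assumptions: it treats each asset as a set of column names and flags pairs whose Jaccard similarity (shared columns divided by all columns) meets a threshold. The catalog, the asset names, and the 0.6 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of column names two assets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0  # two empty assets are treated as identical
    return len(a & b) / len(a | b)


def find_similar_assets(
    assets: dict[str, set[str]], threshold: float = 0.6
) -> list[tuple[str, str, float]]:
    """Return pairs of assets whose column sets overlap by at least `threshold`."""
    pairs = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(assets.items(), 2):
        score = jaccard(cols_a, cols_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs


# Hypothetical catalog: asset name -> normalized column names
catalog = {
    "customers": {"id", "name", "email", "country"},
    "customers_copy": {"id", "name", "email", "country", "created_at"},
    "orders": {"order_id", "customer_id", "amount"},
}

print(find_similar_assets(catalog))  # flags customers / customers_copy as likely duplicates
```

A production system would likely compare richer signals (data profiles, descriptions, lineage) and may use learned embeddings rather than exact column-name overlap; Jaccard similarity is used here only because it is simple and easy to inspect.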

[02:42] Defining Data Science, ML and AI

  • Data analysts look at the data to provide a data-driven answer to a business question.
  • Data science deals with statistical modelling.
  • The leap from data science to machine learning (ML) is small because machine learning is one way to model data.
  • ML is simply a tool in the data science toolkit. 

[04:51] Gretel’s Data Journey

  • Gretel’s progression from data analysis to data science was a natural process.
  • When solving different challenges, you must explore other techniques and build up your portfolio.
  • She invested time and money into learning about machine learning.

[10:19] Gretel’s Natural Interest in Data Science

  • Gretel treats data analysis like a hobby.
  • She easily loses herself in a project because she’s interested in data science.

Gretel: “Usually when I start with a project, there's not much information yet. It's sort of, ‘Oh, we may wanna do something in this area. But we don't really know yet what it is.’ And so, the whole exploration phase of trying to identify what it is that we could do, what techniques we could use. And compare them, just try them out and compare them. It's a creative process.”

[14:05] How Data Gives Value to Consumers

  • Statistics are built on data.
  • Data is used throughout our daily lives and provides many benefits.

[19:24] The Myths and Unnecessary Hype around Data Science

  • Marketing for artificial intelligence should acknowledge that the intelligence is only artificial.
  • A machine learning algorithm is limited to what it was trained to do.

[23:26] Data Bias

Gretel: “If you have a bias in your data, you will have a bias in your model. So your model is indeed only as good as the data that you train it on.”

  • Big tech companies open-source their models, architectures, and patent packages, but not their data.
  • Obtaining data that’s vast and also diverse is a challenge.
  • Security has to be built in from the start to ensure the data obtained isn't biased.

[26:23] Auto-ML

  • Auto-ML only works well when the hyperparameters are already known.
  • Computer vision can now detect objects in images with accuracy equal to, or sometimes better than, that of humans.
  • Natural language processing (NLP) has also seen breakthroughs, but language is more complicated than images because it is constantly evolving.

[31:03] Data Science in the Next Five Years

  • People will see value in combining ML with privacy protection.

Gretel: “Machine learning is a little greedy beast. It needs a lot of food. It needs lots of data. It's very data hungry. A little hungry, little thing. And how do you marry that? How do you combine that with also the increasing emphasis on privacy?”

  • There's an emerging field called privacy-preserving machine learning.
  • Differential privacy ensures that results computed from a dataset can't be used, even in combination with other datasets, to reverse engineer information about the individuals in it.
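Differential privacy is commonly achieved by adding carefully calibrated random noise to query results. The sketch below applies the Laplace mechanism to a simple count, which is one standard technique and not necessarily what Gretel had in mind; the records, the `epsilon` value, and the function names are illustrative assumptions.

```python
import random
from typing import Callable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    # expovariate takes a rate; rate = 1/scale gives exponentials with mean `scale`
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(
    records: list[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Count matching records, then add Laplace noise.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical records; the true count of people over 40 is 3
people = [{"age": a} for a in (23, 35, 41, 52, 67, 29)]
rng = random.Random(42)
noisy = private_count(people, lambda r: r["age"] > 40, epsilon=0.5, rng=rng)
print(f"noisy count of people over 40: {noisy:.2f}")
```

Smaller values of `epsilon` add more noise and give stronger privacy; larger values give more accurate answers. This illustrates the trade-off between data-hungry models and privacy raised in the episode.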

Jay: “What's often really interesting about data is finding common things, patterns, clusters of information. Those patterns help to answer questions, make decisions, make predictions, and even recommendations.”

About Gretel

Gretel De Paepe has been working as a Senior Data Scientist at Collibra for three years. She has over 20 years of experience in data. She started as a data analyst, became a data scientist, and has focused on machine learning for the past six years. Gretel considers herself a data addict who loves anything to do with data.

If you want to reach out, you can contact Gretel via LinkedIn.

Enjoyed this Episode?

If you did, be sure to subscribe and share it with your friends! 

Post a review and share it! If you enjoyed tuning in, leave us a review, and share this episode with your friends and colleagues.

Have any questions? You can connect with us on LinkedIn.

Thank you for tuning in! For more updates, please visit our website. You may also tune in on Apple Podcasts or Spotify.

Mentioned in this episode:

Collibra recognized as a Leader in The Forrester Wave™

Collibra was just named a leader in The 2023 Forrester Wave: Data Governance Solutions. To learn more about the report, check out http://collibra.com/datadownload-forresterwave-dg

Forrester Wave, Q3 2023
