*DeepDive* Data Science vs. Data Engineering
21st February 2018 • Data Driven • Data Driven
Duration: 00:58:52

Shownotes

Frank and Andy talked about doing a Deep Dive show where they take a deep look into a particular data science technology, term, or methodology. And now, they deliver!

In this very first Deep Dive, Frank and Andy compare Data Science and Data Engineering: where they overlap, where they differ, and why so many C-level execs can’t seem to figure out the deltas.

Links

Sponsor: Audible.com – Get a free audio book when you sign up for a free trial!
Sponsor: Enterprise Data & Analytics

Notable Quotes

Frank’s new courses are up at WintellectNow ([01:30])
David Goggins ([03:00])
Dive! Dive! Dive! It’s a deep dive on Data Science vs. Data Engineering ([06:00])
“Clean data” means different things to different people. ([09:30])
“Shaping the data.” ([11:00])
Our conversation with Buck Woody ([12:30])
Andy’s screed on managing NULLs ([14:00])
Andy’s screed on managing dupes ([17:00])
Frank, on aggregation and schema changes… ([21:21])
Attempted NoSQL definition ([23:45])
On MySQL… ([25:00])
Maybe “No” stands for “Not only” ([26:45])
“What sorcery is this?!” ([28:30])
Kevin Hazzard’s article on Database Design if we started today ([29:15])
Andy’s opinion: We’re not using the SSD-ness of SSDs ([31:30])
“I don’t know how much simpler you can get.” – Andy ([33:00])
Denny Cherry’s company: Denny Cherry and Associates ([34:45])
“… somewhere between useless and lying…” ([35:45])
Frank on HDFS ([38:00])
ClearDB wiped out 13 years of Frank’s blog data, and we’re still bothered by that. ([40:30])
sklearn ([42:50])
Correlation is not causation. ([45:30])
How to Lie with Statistics ([45:45])
Movie/TV Reference: Star Trek TNG ([46:15])
CNTK (Microsoft Cognitive Toolkit) ([48:00])
Frank, on selling ice cream… ([49:25])
On over-fitting ([55:30])
Training the model ([56:30])
Request for feedback! ([57:30])
