Greg started his career in data science after not finding the work he wanted with his Ph.D. in philosophy. He joined a data science boot camp and then got a job teaching data science. Watch this interesting interview as he describes his experience.
YouTube: https://youtu.be/vbSbKGwCUcA
Greg Damico is a Lecturer in Data Science at the Flatiron School and has been working there since March 2019. He hails originally from Columbus, OH, but has been in Seattle since 2013. He has an extensive background in academia and has taught and studied various subjects in addition to data science at a number of schools both in the Midwest and on the West Coast. He turned to data science for good in 2016. He is passionate about data science because of its ever-growing role in our daily lives, and he is passionate about education because of the way it empowers people to change their lives.
You’re listening to Mentoring Developers, Episode 91. Let’s go.
Welcome to Mentoring Developers, the podcast for new and aspiring software developers, where we discuss your struggles, anxieties, and career choices. And now, here’s your host, Arsalan Ahmed.
In this episode of Mentoring Developers, I’ll be talking to Greg Damico. Greg is a lecturer in data science at the Flatiron School, and he’s been there for a long time. This interview was recorded in 2021. He’s originally from the Midwest, from Ohio, and he’s now settled out West in Seattle, so he has experience in both the Midwest and the West Coast. He’s taught everywhere. His passion is data science. Today we’ll be talking to him about data science, about the Flatiron School and other boot camps, and about what it’s like to be in a boot camp as a mentor, as a teacher, and also as a student. I think this is a great little interview for people who are interested in boot camps, who are interested in data science, and who want to see what it’s like to be in a boot camp learning data science, especially from the point of view of a person who’s on the other side of the aisle. I think you’re going to enjoy this. Let’s get ready and get started.
How are you doing, Greg?
I’m doing well, Arsalan. Thanks for having me on the show.
How did you decide to become a data scientist?
It turned out to be a very significant letter for me. I was in philosophy graduate school. I did finish my PhD, but I wasn’t finding the work that I wanted after that was done. It’s very difficult in academia to get the high-powered job that a lot of people are looking for. I moved to Seattle, and after I got to Seattle, I decided that I wanted to go back to school. I did have an old physics degree rusting on my shelf somewhere. I thought, “Well, I’ve got some math skills. I think I could go do this applied mathematics program at the University of Washington.” Shortly after I started that, an old friend of mine from philosophy grad school, Mike, said, “Hey, have you studied any R?” I started looking into it and I thought, “Wow, this R language looks really cool.” I don’t have a lot of programming background. I’ve done a little bit here and there. But I started looking into R. I took a course on R. I realized immediately its potential for data analysis and data science. That was my entry. I started reading a bunch of things about data science. I jumped into a boot camp for data science. Before long, I had finished the boot camp and I was able to land a job teaching data science at the Flatiron School. That was my entry into the tech world. It’s never too late to start. It’s never too late to jump into tech. I think sometimes people are scared to make that transition. Obviously, teaching a boot camp in data science, I meet a lot of people who are scared. They’re making these large career changes. They’ve been doing one thing and they’re thinking, “Well, data science seems like something I could do. Maybe my background is quite different and I’m scared to do it, but I feel like I’ve got a chance.” They really do. There’s a lot you can learn in a short time. There are lots of resources about R, lots of resources about Python. Python is actually the language of choice at Flatiron, so I work mostly in Python these days. But it’s one step at a time.
There’s always a lot to learn, but you don’t have to learn a lot to be able to make some cool things, to be able to start on some cool projects. Once you can do that, well, then you can get in touch with other people who are working on things. You can share ideas, you can share your own work, and then you’re right there in the community.
A question that a lot of our listeners would have right now is: is data science really that useful? Why should I learn it?
I think data science has arisen largely because of lots of technological improvements to do with data. We are now incredibly good at storing data, at producing data, at having accurate data recordings. You can look online and find stats about exactly how much data is produced every day. I think it’s on the order of quintillions of bytes or something like that, just huge amounts of data. Lots of companies have their own data these days to worry about. Take Amazon, for example. Part of the reason that Amazon is successful is that they have lots and lots and lots of data about their customers, about the things that they’ve bought, about other things that they’ve bought, and about what similar customers have bought: “Here’s a bunch of things that people just like you have bought. Maybe you’re interested in that, too.” If you have access to a bunch of data about your customers and you can access it quickly, then that proves its business value pretty quickly. I think of data science as having an immediate business angle, but there’s also this technological aspect to it. It’s just that as all of these technologies have gotten better, it’s natural that we have a lot of data. We have to have some understanding of how we can store that data, how we can access it, how we can share it, all that stuff. As for data science itself as a discipline, it certainly helps to have some understanding of things like databases and where data lives and how to access it and so on. But it’s also a bit of coding, it’s a bit of mathematics, it’s a bit of statistics. When we do our boot camps, we try to cover all those bases, at least a little bit.
What does it take to be a data scientist? Can anyone be a data scientist? Do you need some certain skills, certain aptitudes? What do you think?
What I want to know about data science is this. Data science is a separate discipline in its own right, and you’re talking about using data to get some meaningful results out of it. So if, for instance, you’re a university, you have data about your students: the students themselves, the courses they register for, how frequently they register and in which semester, and which courses are more popular. Suppose you wanted to say, “I want to predict which courses might exceed capacity next fall, where I may need an extra teacher because there’s so much demand, and I need to know ahead of time so I can schedule it.” That’s one thing. So your data goes into some data warehouse where it’s stored in some structure, maybe tables that are probably not normalized. I would expect them to have lots and lots of columns with everything in there, so you don’t have to do lots of joins, because that’s faster to read. Then you would be able to ask your system some intelligent questions, and it answers them. That’s one aspect I can think of. The other one is, “Hey, Mr. Data Scientist, I have these 100 terabytes of data. Go give me some insights. I don’t know what I’m looking for.” Which way does data science fall here?
I think it’s both things. I think about data science very often as trying to solve problems with data, and those problems can take lots of forms. Sometimes they’re just straightforwardly financial things, like how can our business make more money. And sometimes they’re really more investigative things, like: I’ve got a bunch of customers and I want to do some sort of customer segmentation because I want to have a targeted marketing campaign. I want some ads to go to some people who are likely to pay attention to those ads, and other ads to go to other people who are likelier to pay attention to that sort of advertising. So there are lots of different problems, but very often what’s happening is that the data scientist will build some sort of model, some sort of predictor. And you’re right, very often the data that you start with has some sort of tabular form. Maybe I’ve got a bunch of columns of data and a bunch of rows. Each row will represent one record, one observation: one customer, maybe, or one house up for sale. And each of the columns will be some feature of each of those records. So if I’ve got a bunch of houses for sale, maybe I’ve got a column that represents number of bedrooms. Or if my rows are students at a college, then maybe one of my columns is grades for a particular quarter, or something like that. So the general idea is I’ve got all these columns, and one of my columns is sort of privileged: one of my columns is the thing that I’m trying to predict, the thing that I’m trying to model. And I’m going to use all my other columns, all the information that I have there, to try to make accurate predictions about what’s in the column of interest. If I’ve got thousands of rows or millions of rows, then it’s very difficult for a human being to pick up on the patterns that might be there, but a computer is really fast. I just show a computer: here are the values that I get for these rows in these particular columns, and here are the values that I get for the column of interest. Then I say, okay, now here are some rows that you haven’t seen before, and they have these values in these columns. What do you think they’re likely to have in the column of interest? The computer builds a model and is then able to use that model to make predictions on the unseen data that is new. And then you can evaluate that model: is it good, is it accurate, and so on. That sort of thing is very often what’s at the heart of a lot of data science problems.
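To make the workflow in this answer concrete, here is a minimal sketch in Python. It is not code from the interview. The house data is synthetic, the column names (`bedrooms`, `square_feet`, `price`) are made up for illustration, and a plain least-squares linear fit stands in for "some sort of model"; a real project might well use a different one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up table: each row is one house, each column is one feature.
n_rows = 200
bedrooms = rng.integers(1, 6, size=n_rows)           # whole numbers from 1 to 5
square_feet = rng.uniform(500, 3000, size=n_rows)

# The "privileged" column we want to predict (synthetic prices plus noise).
price = 50_000 * bedrooms + 150 * square_feet + rng.normal(0, 10_000, size=n_rows)

# Feature matrix, with a constant column so the model can learn an intercept.
X = np.column_stack([np.ones(n_rows), bedrooms, square_feet])

# Show the computer the first 150 rows; hold back the last 50 as unseen data.
X_seen, X_unseen = X[:150], X[150:]
y_seen, y_unseen = price[:150], price[150:]

# "Build a model": a least-squares fit of a linear predictor.
coefficients, *_ = np.linalg.lstsq(X_seen, y_seen, rcond=None)

# Use the model to predict the column of interest for rows it has not seen.
predictions = X_unseen @ coefficients

# Evaluate: how far off are the predictions, on average?
mean_abs_error = np.mean(np.abs(predictions - y_unseen))
print(f"Mean absolute error on unseen rows: {mean_abs_error:,.0f}")
```

The point of holding back the last 50 rows is that the evaluation only counts predictions on data the model never saw while it was being fit.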
So if I have to build that model, some piece of software has to do that. As a data scientist, are you writing custom code, starting from scratch and building a model, or do you use a tool and maybe script it a little bit?
There are lots of tools. I think Python is probably the number one language right now for data science, and much of the reason for that has to do with the fact that, first of all, it’s open source. It’s an open source project with lots of people contributing, and there are lots of tools, lots of libraries, that you can just import into your own workspace that already do lots of really cool data science things. So if I want to build a random forest model, there are random forest tools that I can just import right into my own workspace, introduce them to my own particular data, and they’ll build predictions for me straight out of the box. I don’t have to create the model from scratch. Because of all these different libraries that are available, Python is a really powerful, flexible tool, and you can get models up and running with really just a few lines of code because of all the work that’s already been done.
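As an illustration of "a few lines of code," here is a hedged sketch using scikit-learn, one widely used Python library that ships a random forest. The interview does not name a specific library, so that choice is an assumption, and the customer data and column names (`age`, `visits_per_month`, `clicked_ad`) are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Made-up customer table: two feature columns and one label column.
n_customers = 500
age = rng.uniform(18, 70, size=n_customers)
visits_per_month = rng.uniform(0, 20, size=n_customers)

# Synthetic rule for the label: frequent visitors tend to click the ad.
noise = rng.normal(0, 2, size=n_customers)
clicked_ad = (visits_per_month + noise > 10).astype(int)

X = np.column_stack([age, visits_per_month])

# Keep 20% of the rows aside as unseen data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, clicked_ad, test_size=0.2, random_state=42
)

# The model itself is imported, not written from scratch.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Fraction of held-out customers whose label the model predicts correctly.
accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen customers: {accuracy:.2f}")
```

Everything specific to random forests happens inside `fit` and `score`; the surrounding lines only prepare the table and split it.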
That’s great. Yeah, Python is a great language for beginners, but also for people who are doing data science. But I wonder, why Python? I don’t know what Python is doing that is so different from, say, Ruby or Java. There must be something. Maybe there are some built-in libraries that do certain things, I’m assuming some math functions, that others don’t have.
Yeah, I think that’s right. There are certain Python libraries that we introduce to our students in the first week because they will use them all the time. For example, there’s a package called NumPy, numerical Python, and it’s basically a tool for scientific computing. It’s a tool for doing sophisticated mathematics, but it’s really fast. You can do these vectorized operations: you can add arrays together lightning quick, you can multiply them. If you want to, you can do scientific notation, you can do complex numbers, you can do trigonometry, you can do all sorts of stuff. Another tool is pandas. The name, I think, comes from something like “panel data,” and there used to be an object type called a panel that’s not really used much anymore. But anyway, it’s a really powerful tool for manipulating tables of data. The technical term in pandas is a DataFrame. A DataFrame is basically just a big table of data, and there are lots of really powerful tools for manipulating them quickly. Adding a column is just a line of code. Multiplying a column by a number is just a line of code. Adding columns together, whatever you want to do. Filtering your data, say, “I’m only interested in rows that have this value,” and so on. It’s just really fast.
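The operations mentioned in this answer can be sketched in a few lines. The arrays and the small `houses` table below are made up for illustration and do not come from the interview.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on whole arrays, no explicit loops.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
array_sum = a + b                # element-wise addition
array_product = a * b            # element-wise multiplication

# Scientific notation, trigonometry, and complex numbers also work on arrays.
small_values = np.array([1e-3, 2.5e6])
sines = np.sin(np.array([0.0, np.pi / 2]))
complex_product = np.array([1 + 2j]) * np.array([3 - 1j])

# pandas: a DataFrame is a table with named columns.
houses = pd.DataFrame(
    {
        "bedrooms": [2, 3, 4, 3],
        "bathrooms": [1, 2, 2, 2],
        "price": [300_000, 450_000, 600_000, 400_000],
    }
)

# Multiplying a column by a number is one line.
houses["price_doubled"] = houses["price"] * 2

# Adding two columns together (and storing the result as a new column) is one line.
houses["total_rooms"] = houses["bedrooms"] + houses["bathrooms"]

# Filtering: keep only the rows that have a particular value.
three_bedroom = houses[houses["bedrooms"] == 3]

print(array_sum, array_product, sep="\n")
print(three_bedroom)
```

Each DataFrame operation acts on the whole column at once, which is the same vectorized style NumPy uses for arrays.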
Really good. Yeah, I can imagine there is a reason why everybody is gravitating towards Python. You can build really good websites, MVC websites, in Python. You could start off with it as your first programming language, and it’s easy enough, but it has these advanced features like this numerical library you’re referring to. That’s good. And the good news is it’s all free. If you want to get started, it’s free to get started, and the libraries are probably also free. This is the beauty of this open source ecosystem. Okay, that’s all good, but what kind of jobs can I do? If someone is listening right now, they may be thinking, “Okay, that sounds interesting but complicated, and I don’t want to commit to something where I may not see the return.” So what’s the return here? What kind of jobs are available, and what kind of experience do they need to have?
Yeah, it’s a good question. Because data science is still relatively young, I think there are lots of job titles that might be relevant. Most obviously, things like data analyst and data scientist, but maybe also business analyst, data engineer, quantitative researcher or something like that, statistician, applied statistician, machine learning engineer. So there are lots of different titles available. I think any boot camp that’s worth its salt should at least prepare students for a data analyst role. And if you’re tackling a data analyst role, you’re probably looking at a healthy amount of data visualization and a healthy amount of interacting with databases using SQL, maybe some NoSQL databases as well if you have some unstructured data. But all that stuff, I think, is really within reach. It’s a matter of learning the fundamentals of some of these tools, which I think you can do in 15 weeks. If you’re dedicated to the study of these things, learning the fundamentals can go a long way, and you can do it pretty fast.