Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!
I'm a senior data analyst with 10+ years of experience and I'm breaking down exactly what I did, what tools I used, and what problems I solved across very different industries.
💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://datacareerjumpstart.com/newsletter
🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://datacareerjumpstart.com/training
👩‍💻 Want to land a data job in less than 90 days? 👉 https://datacareerjumpstart.com/daa
👔 Ace The Interview with Confidence 👉 https://datacareerjumpstart.com/interviewsimulator
⌚ TIMESTAMPS
00:00 – What nobody tells you about data analyst work
01:00 – Predicting refinery outcomes with math models
04:05 – When data analytics meets machine learning
07:00 – Finding needles in millions of log files
09:23 – How one analysis ended up driving marketing & sales
🔗 CONNECT WITH AVERY
🎵 TikTok
💻 Website
Mentioned in this episode:
May Cohort of the Data Analytics Accelerator — Now Open
🔗 datacareerjumpstart.com/daa
The May cohort of the Data Analytics Accelerator is officially open for enrollment. This is my comprehensive data analytics bootcamp that takes you from wherever you are to landing your first data job. Doesn't matter your background, your degree, or your experience level: we're going to help you get there.
What you get:
📊 Full curriculum covering Excel, SQL, Tableau, Python, and R
🛠️ 9 real-world projects across different industries to build your portfolio
💼 LinkedIn, resume, and interview prep so you actually stand out to recruiters
🤝 Weekly office hours, coaching, and a community of 900+ aspiring analysts who are in it with you
🎓 Lifetime access: go at your pace, come back anytime
May enrollment deal:
🔥 20% off when you enroll now
🎁 6 free months of my unreleased Data Portfolio Builder tool. This isn't publicly available yet, and every May cohort member gets early access.
The live kickoff call is with yours truly on Monday, May 11th at 7:00 PM Eastern. Make sure you're enrolled before then so you don't miss it.
👉 datacareerjumpstart.com/daa
Or just click the link in the show notes down below. See you on May 11th.
Avery Smith: I'm a senior data analyst with 10 plus years of experience. What did I do in those 10 years? What tools did I use? What problems did I solve? That is the topic of today's episode, and I'm gonna tell you everything so that way you know what to expect as a data analyst in the future.
I've had a really vast career where I've worked for one of the biggest oil and gas companies in the world, and I've also worked for a 10-person biotech startup that you've never heard of before. So let's get into it.
By the way, if you're new here, my name is Avery Smith and I try to share useful data content that will help you start your data career. If that's of interest to you, you gotta check out my newsletter. 30,000 other aspiring data analysts are already subscribed. Go to datacareerjumpstart.com/newsletter or find the link in the show notes down below.
So the first company I wanna talk about is ExxonMobil. And what was it like being a data analyst and a data scientist at ExxonMobil? Obviously this is one of the biggest companies in the world. There's like 70,000 employees and they do a lot of different things. Now, I worked in the downstream part of the business, which basically means the refiners. These are the people that are taking oil and turning it into gasoline, essentially.
And what did we do there as data analysts? Well, we tried to make a mathematical model of every single part of the refinery, and I don't think this is, you know, groundbreaking to those who are in the oil and gas business or any sort of manufacturing business. If you can create what's called a digital twin, or like a math twin, of your process, you'll be able to experiment with the math model instead of experimenting in real life. So you can be like, well, if I tweaked this temperature, or I changed this pressure, or we, you know, added this new oil, what would change? Would we make more money? Would we make less money? What would go well? What would go poorly? Instead of actually experimenting in real life, you can experiment with these simulations with your data model, and that way you don't actually have to do it in real life.
Now to create these models, there's lots of different ways that you can do them. I'm not getting into the nitty gritty of, like, modeling these types of things. But when you think model, the simplest version that you can think of in your head is linear regression. And if you're not familiar with linear regression, you definitely learned it in school. It's the simple thing of y = mx + b. That's the simplest form. So basically you have an input, an x. Based upon your input, can you predict what the output is going to be? If it, you know, is a linear relationship, you'll be able to have the slope, that's the m, and some sort of a y-intercept, and basically guess what the output, the y, is going to be based on the x. Now you can make that a lot more complicated. You could do multivariate linear regression, which is like y = m1x1 + m2x2 + m3x3. Oh, it's so confusing. But my whole point here is, like, we were doing these mathematical models, and the simplest form that you can think of is linear regression.
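
To make the y = mx + b idea concrete, here is a minimal sketch in Python. The temperature and yield numbers are made up for illustration, and `fit_line` is a hypothetical helper, not one of the models described in the episode. It fits the slope m and intercept b by ordinary least squares and then predicts a new output.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b. Returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Made-up inputs (x) and outputs (y) that happen to be exactly linear.
temperatures = [300.0, 310.0, 320.0, 330.0, 340.0]
yields = [50.0, 52.0, 54.0, 56.0, 58.0]

m, b = fit_line(temperatures, yields)

# Use the fitted line to guess the output for an input we have not tried.
predicted_yield = m * 350.0 + b
print(f"slope={m:.3f} intercept={b:.3f} prediction={predicted_yield:.3f}")
```

The multivariate version works the same way, just with one slope per input; in practice you would usually reach for a library rather than writing the math by hand.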
So I created a lot of these models as a data analyst. And I also used data analytics to try to understand our simulation results better. So we'd actually run dozens, hundreds, thousands of simulations trying, you know, different things. Well, what if this pressure went up by a little bit, or this temperature went down? To actually look at a thousand different results is really hard to do. So we used data analytics to try to understand the results a little bit better. And a lot of this was done in a Power BI dashboard, so I used a lot of Power BI dashboards right there.
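
The "run lots of simulations, then summarize them" workflow can be sketched in a few lines of Python. Everything here is an assumption for illustration: `simulated_profit` is a made-up stand-in for a process model, and the temperature and pressure ranges are arbitrary. The point is only the shape of the work: sweep the inputs, collect the results, then aggregate them so a person can read them.

```python
from itertools import product
from statistics import mean


def simulated_profit(temperature, pressure):
    """Hypothetical stand-in for a process model (not a real refinery equation)."""
    return 100.0 - 0.01 * (temperature - 350.0) ** 2 - 0.5 * (pressure - 20.0) ** 2


temperatures = range(300, 401, 10)  # 300, 310, ..., 400
pressures = range(10, 31, 2)        # 10, 12, ..., 30

# Run one "simulation" per combination of settings.
results = [
    {"temperature": t, "pressure": p, "profit": simulated_profit(t, p)}
    for t, p in product(temperatures, pressures)
]

# Summaries a dashboard might show: the best run and the average by temperature.
best_run = max(results, key=lambda r: r["profit"])
avg_profit_by_temperature = {
    t: mean(r["profit"] for r in results if r["temperature"] == t)
    for t in temperatures
}

print(len(results), "runs; best:", best_run)
```

A table like `avg_profit_by_temperature` is the kind of aggregate that is much easier to read in a dashboard than a thousand individual runs.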
And to do the modeling, we actually did a lot in Excel, believe it or not, and we did a lot in Python, and we even used a more proprietary software that you don't hear about a whole lot. It's from SAS. It's called JMP (pronounced "jump"), and we used it to do our modeling. So those are the tools that we were using at Exxon, and the problem that we were trying to solve is basically, hey, if we wanna make changes inside of our huge manufacturing system, can we actually come up with a way to test it before testing it in real life, so we can kind of know what to expect?
I think that's common for, you know, manufacturing. I think that's common for any sort of, like, time series data you might have: if you can create a model, it's useful for the company to be able to predict the future and figure out what's going to happen. A lot of the time this type of analytics is called prescriptive analytics, where you're not just trying to predict what's going to happen in the future, but trying to decide: if you make these changes, how will the system basically be affected?
The next data job I wanna talk about was when I was a data analyst at this nano biotech startup, like, think 10 people when I joined the company. This company made really cool nano sensors. So think of it as almost like a Game Boy, uh, game, like from the olden days. That's like the size of this little board. And on this board there was a bunch of different sensors this, you know, chemistry company had built. And the sensors would basically react to what was in the air, and we would track how their electricity basically, or their amperage or their current, through these different sensors would change when these different chemicals in the air hit it. So, for example, if you were holding it in the air, you know, all the lines would be kind of stagnant. But let's say you brought an orange next to it. It would basically smell the orange. And each sensor would react differently to that orange being nearby. And when you have, uh, an array of these 12 different sensors, you can basically create the equivalent of like a fingerprint, but for smells. So think of it as like a smelling device that would basically take smell prints.
My job as a data analyst there was to actually look at the time series data, 'cause we'd run these experiments where you'd have, like, basically background noise for a certain amount of time, and then you'd introduce something like an orange for maybe 30 seconds and then take the orange away. And we'd look at these time series, and we were trying to use this time series data to actually create these smell prints. And that's a very difficult thing to do. Most of the time it actually took machine learning. So once again, this is maybe a more advanced data analyst role, 'cause in most data analyst roles you're not really using machine learning.
This type of machine learning is often called classification, where you're basically trying to match a sample to a certain category based off of its data. So for example, I could bring an apple near it, right? And the sensors would react. Maybe they'd all go down, and if I brought an orange next to it, maybe all the sensors would go up. And so you can come up with some sort of an algorithm that would be like, okay, if the sensors go down, it's an apple. If they go up, it's an orange.
Now that's really oversimplifying it, because apples and oranges, those are only two of the things that exist in the universe, right? There's like so many different things that exist. We were playing for a little bit bigger stakes. You can think of when you go through a, uh, TSA line, and sometimes they, you know, swab you and they're trying to see if you have, like, any drugs or any bombs on you. That was kind of the stakes that we were playing with in some of our use cases.
So I would take this data that oftentimes, you know, was time series based. We usually had like 12 to 16 to 24 different sensors on there. And I would try to make these smell prints using classification models in machine learning.
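
As a rough illustration of classification on sensor readings, here is a small nearest-centroid sketch in Python. This is a swapped-in teaching example, not the method used at the startup: the readings are invented, there are only 4 sensor channels instead of 12 or more, and `centroid` and `classify` are hypothetical helper names. Each label's "smell print" is just the average reading per sensor, and a new reading gets the label of the closest print.

```python
from math import dist
from statistics import mean


def centroid(samples):
    """Average each sensor channel across samples to form one 'smell print'."""
    return [mean(channel) for channel in zip(*samples)]


def classify(reading, prints):
    """Return the label whose smell print is closest (Euclidean) to the reading."""
    return min(prints, key=lambda label: dist(reading, prints[label]))


# Invented training data: change in current on 4 sensors for each labeled sample.
training = {
    "apple": [[-0.9, -1.1, -0.8, -1.0], [-1.0, -0.9, -1.2, -0.9]],
    "orange": [[1.1, 0.9, 1.0, 1.2], [0.9, 1.0, 1.1, 0.8]],
    "background": [[0.0, 0.1, -0.1, 0.0], [0.1, 0.0, 0.0, -0.1]],
}

smell_prints = {label: centroid(samples) for label, samples in training.items()}

new_reading = [0.8, 1.1, 0.9, 1.0]
print(classify(new_reading, smell_prints))
```

Real sensor data is noisy and changes over time, which is why a simple rule like this usually is not enough and heavier machine learning models come in.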
Now, a lot of the time I was doing this in Python. Python's great for doing things in machine learning. There were even some simple algorithms that I created that were based in Excel, but they were pretty simple. The more complicated stuff I was doing in Python at the time. Also, just because we were doing a lot of these experiments, SQL would've been really helpful. We weren't actually using SQL as much as we should have. Looking back on it, uh, we really should have been using SQL a little bit more.
The third experience I wanna tell you about was when I was doing my own, uh, data science consultancy firm, and I got hired by a cybersecurity company to help them with a few things. So obviously we live in this digital age. Cybersecurity is really important, so there's a lot of opportunity in cybersecurity. And the interesting thing about cybersecurity is a lot of the data is, like, hidden in logs, because basically anything you do online, anything you do on the internet gets logged one way or another. Like, it's in there. They're capturing everything, but when you capture everything, you're kind of capturing nothing at the same time, because it's really hard to figure out what's the signal amongst so much noise.
And so this company in particular was basically getting a bunch of internet logs for companies in what you can consider their workspaces. So for instance, all of their Microsoft logs, all of their Google logs, if they're using Slack, their Slack logs, maybe their employee customer history. Just think of, like, anything a company might be interested in from a cybersecurity standpoint. We were just getting a bunch of the logs. Now in these logs, there's maybe little needles in the haystack. There's maybe little gems that can be pulled out. It requires a lot of analysis to try to figure out what's in there. Just imagine you're getting like a ton of hay and you have to find this little needle. And so my job was to go in there and try to see if there were any needles, anything that was, like, really worth diving into and investigating more, and also just summarizing everything that was happening.
This is how many logins you had on Google today. This is how many, you know, logouts you had on Microsoft. You know, this is how many users you had from these different states. Just, like, for these giant enterprise organizations where they have thousands of employees and a bunch of things going on, like, how do you know everything's going okay? Are you sure that, like, everyone is where they say they are? Are you sure you don't have any intruders, you know, people accessing stuff from a place that they probably shouldn't? Those types of things.
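
Here is a minimal Python sketch of that kind of log summary. The log records, field names, and the `expected_states` lookup are all hypothetical; real workspace logs from Microsoft, Google, or Slack have their own formats. It counts events per service, lists which users appeared in which state, and flags logins from a state a user is not expected to be in.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written log records for illustration only.
logs = [
    {"user": "ana", "service": "google", "event": "login", "state": "UT"},
    {"user": "ana", "service": "google", "event": "logout", "state": "UT"},
    {"user": "ben", "service": "microsoft", "event": "login", "state": "TX"},
    {"user": "ben", "service": "microsoft", "event": "logout", "state": "TX"},
    {"user": "cy", "service": "google", "event": "login", "state": "CO"},
    {"user": "ana", "service": "microsoft", "event": "login", "state": "NY"},
]

# Summary 1: how many logins and logouts per service.
events_per_service = Counter((r["service"], r["event"]) for r in logs)

# Summary 2: which distinct users showed up in each state.
users_per_state = defaultdict(set)
for r in logs:
    users_per_state[r["state"]].add(r["user"])

# Needle hunting: logins from a state outside a user's expected states.
# The expected_states table is an assumption; a real one would come from HR or IT data.
expected_states = {"ana": {"UT"}, "ben": {"TX"}, "cy": {"UT", "CO"}}
flagged = [
    r for r in logs
    if r["event"] == "login"
    and r["state"] not in expected_states.get(r["user"], set())
]

print(events_per_service)
print(dict(users_per_state))
print(flagged)
```

The counts are the "summarize everything" part, and the flagged list is the "needle in the haystack" part that someone would then investigate by hand.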
So we were basically taking these huge dumps of logs that weren't really important, that weren't really interesting, and aggregating them and trying to find the interesting things, and then also making sure that nothing nefarious was going on. To do that analysis, I was actually using all Python, but I could really choose what tool I wanted to. I just chose Python personally because I'm very comfortable in Python. I'm decently good at Python, uh, and I can do things quickly with Python. I probably couldn't have done this as easily, like, in Excel. You probably could have done similar stuff in SQL if you wanted to. One thing I really like about Python is it can do anything, maybe not extremely well, but it can do anything.
Um, so I was doing all my analysis, uh, in Python, and I was creating data visualizations in Python. They even used a lot of the insights I found, like in terms of aggregates. They basically aggregated all of their customers' data and would publish, like, a yearly or biannual report of cybersecurity incidents. And so they were kind of, like, using graphs that I was creating, with some of these KPIs or metrics that I was monitoring. That way they could kind of inform the cybersecurity, you know, field and all of their customers about, like, the trends and what we were seeing from a big-picture standpoint. And that was actually really useful, 'cause people would start to, like, read that and be like, oh, I really like this company. I wanna work with them. And that would bring in new customers. So even though I was doing that analysis for individual customers at an individual level, that analysis actually ended up being really useful for their marketing team as well, to get more sales and more customers in the pipeline.
Now, I've actually worked for way more than just these three companies. I've probably done work for about 12, including, like, the Utah Jazz, Harley-Davidson, and some other really big names like MIT. If you want to hear more about those, I'll be talking about them more in my newsletter. So you can subscribe at datacareerjumpstart.com/newsletter, and I'll be talking more about these experiences in the newsletter. But if you want me to talk about it on the podcast or YouTube as well, let me know in the comments down below, and maybe I'll do some future episodes on that if we get enough comments. As always, thanks for watching and I'll see you in the next one.