Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!
I'm a senior data analyst with 10+ years of experience and I'm breaking down exactly what I did, what tools I used, and what problems I solved across very different industries.
Join 30k+ aspiring data analysts & get my tips in your inbox weekly: https://datacareerjumpstart.com/newsletter
Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training: https://datacareerjumpstart.com/training
Want to land a data job in less than 90 days? https://datacareerjumpstart.com/daa
Ace The Interview with Confidence: https://datacareerjumpstart.com/interviewsimulator
TIMESTAMPS
00:00 - What nobody tells you about data analyst work
01:00 - Predicting refinery outcomes with math models
04:05 - When data analytics meets machine learning
07:00 - Finding needles in millions of log files
09:23 - How one analysis ended up driving marketing & sales
CONNECT WITH AVERY
YouTube Channel
LinkedIn
Instagram
TikTok
Website
Mentioned in this episode:
April Cohort - Data Analyst Bootcamp (Starts April 13th)
Ready to break into data analytics? Our April cohort kicks off with a live call on April 13th at 7pm ET where you'll meet your peers and mentors on day one. Save 20% when you enroll now, plus get LIFETIME access to our premium data job board. Join Today: https://datacareerjumpstart.com/daa
Thank you for subscribing
I'm a senior data analyst with 10-plus years of experience. What did I do in those 10 years? What tools did I use? What problems did I solve? That is the topic of today's episode, and I'm gonna tell you everything so that you know what to expect as a data analyst in the future. I've had a really varied career: I've worked for one of the biggest oil and gas companies in the world, and I've also worked for a 10-person biotech startup that you've never heard of before. So let's get into it. By the way, if you're new here, my name is Avery Smith and I try to share useful data content that will help you start your data career. If that's of interest to you, you gotta check out my newsletter. 30,000 other aspiring data analysts are already subscribed. Go to datacareerjumpstart.com/newsletter or find the link in the show notes down below.

So the first company I wanna talk about is ExxonMobil. What was it like being a data analyst and a data scientist at ExxonMobil? Obviously this is one of the biggest companies in the world. There's like 70,000 employees and they do a lot of different things. Now, I worked in the downstream part of the business, which basically means the refiners. These are the people that are taking oil and turning it into gasoline, essentially. And what did we do there as data analysts? Well, we tried to make a mathematical model of every single part of the refinery, and I don't think this is, you know, groundbreaking to those who are in the oil and gas business or any sort of manufacturing business. If you can create what's called a digital twin, or like a math twin, of your process, you'll be able to experiment with the math model instead of experimenting in real life. So you can be like, well, if I twisted this temperature, or I changed this pressure, or we, you know, added this new oil, what would change? Would we make more money? Would we make less money? What would go well?
What would go poorly? Instead of actually experimenting in real life, you can experiment with these simulations, with your data model, and that way you don't actually have to do it in real life. Now to create these models, there's lots of different ways that you can do them. I'm not getting into the nitty-gritty of modeling these types of things. But when you think model, the simplest version that you can think of in your head is linear regression. And if you're not familiar with linear regression, you definitely learned it in school. It's the simple thing of y = mx + b. That's the simplest form. So basically you have an input, an x. Based upon your input, can you predict what the output is going to be? If it's a linear relationship, you'll have the slope, that's the m, and some sort of a y-intercept, that's the b, and you can basically guess what the output, the y, is going to be based on the x. Now you can make that a lot more complicated. You could do multivariate linear regression, which is like y = m1*x1 + m2*x2 + m3*x3 + b. Oh, it's so confusing. But my whole point here is that we were doing these mathematical models, and the simplest form that you can think of is linear regression.

So I created a lot of these models as a data analyst. And I also used data analytics to try to understand our simulation results better. We'd actually run dozens, hundreds, thousands of simulations trying, you know, different things. Well, what if this pressure went up by a little bit, or this temperature went down? To actually look at a thousand different results is really hard to do. So we used data analytics to try to understand the results a little bit better. A lot of this was done in Power BI, so I built a lot of Power BI dashboards right there. And to do the modeling, we actually did a lot in Excel, believe it or not, and we did a lot in Python, and we even used a more proprietary software that you don't hear about a whole lot.
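To make the y = mx + b idea concrete, here is a minimal sketch in plain Python. The temperature and yield numbers are made up for illustration, and `fit_line` is a hypothetical helper name, not anything from the refinery work described in the episode. It fits a straight line by ordinary least squares and then asks a "what if" question of the fitted model instead of the real process.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares and return (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Made-up example data: a temperature setting (input) and a yield (output).
temps = [300.0, 310.0, 320.0, 330.0, 340.0]
yields = [50.0, 52.0, 54.0, 56.0, 58.0]

m, b = fit_line(temps, yields)

# "What if" question asked of the model rather than the real process:
# what output would we expect at a temperature we have not tried yet?
predicted = m * 325.0 + b
print(f"slope={m:.3f}, intercept={b:.3f}, predicted yield at 325={predicted:.2f}")
```

The multivariate version works the same way with more inputs; in practice you would usually reach for a library rather than write the math by hand.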
That proprietary software is from SAS. It's called JMP (pronounced "jump"), and we used it to do our modeling. So those are the tools that we were using at Exxon, and the problem that we were trying to solve is basically: hey, if we wanna make changes inside of our huge manufacturing system, can we come up with a way to test them before testing in real life, so we can kind of know what to expect? I think that's common for, you know, manufacturing. I think that's common for any sort of time series data you might have: if you can create a model, it's useful for the company to be able to predict the future and figure out what's going to happen. A lot of the time this type of analytics is called prescriptive analytics, where you're not just trying to predict what's going to happen in the future, but trying to decide, if you make these changes, how the system will be affected.

The next data job I wanna talk about was when I was a data analyst at this nano biotech startup, like think 10 people when I joined the company. This company made really cool nano sensors. So think of it as almost like a Game Boy game, like from the olden days; that's about the size of this little board. And on this board there was a bunch of different sensors this, you know, chemistry company had built. The sensors would basically react to what was in the air, and we would track how the electricity, basically the amperage or the current, through these different sensors would change when different chemicals in the air hit them. So, for example, if you were just holding it in the air, you know, all the lines would be kind of stagnant. But let's say you brought an orange next to it. It would basically smell the orange, and each sensor would react differently to that orange being nearby. And when you have an array of these 12 different sensors, you can basically create the equivalent of a fingerprint, but for smells.
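One way to picture a "fingerprint for smells" is as a list with one number per sensor: how far each sensor moved from its background reading. The sketch below is only an illustration under stated assumptions. It uses 4 sensors instead of 12, invented readings, hypothetical helper names (`smell_print`, `closest_match`), and a simple nearest-match rule, none of which is claimed to be the startup's actual method.

```python
import math


def smell_print(baseline, response):
    """Per-sensor change from the background reading to the reading with a sample nearby."""
    return [r - b for b, r in zip(baseline, response)]


def closest_match(unknown, known_prints):
    """Return the label whose stored print is nearest (Euclidean distance) to the unknown print."""
    return min(known_prints, key=lambda label: math.dist(unknown, known_prints[label]))


# Invented reference prints: the orange pushes every sensor up, the apple pushes them down.
known_prints = {
    "orange": [0.5, 0.4, 0.6, 0.5],
    "apple": [-0.5, -0.4, -0.6, -0.5],
}

baseline = [1.0, 1.0, 1.0, 1.0]   # readings with nothing nearby
reading = [1.4, 1.5, 1.5, 1.6]    # readings with an unknown sample nearby

label = closest_match(smell_print(baseline, reading), known_prints)
print(label)
```

Real sensor data is noisier and changes over time, which is why this kind of matching usually ends up needing proper machine learning models rather than a single distance check.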
So think of it as a smelling device that would basically take smell prints. My job as a data analyst there was to actually look at the time series data, 'cause we'd run these experiments where you'd have basically background noise for a certain amount of time, and then you'd introduce something like an orange for maybe 30 seconds, and then take the orange away. We'd look at these time series, and we were trying to use the data to actually create these smell prints. And that's a very difficult thing to do. Most of the time it actually took machine learning. So once again, this is maybe a more advanced data analyst role, 'cause in most data analyst roles you're not really using machine learning. This type of machine learning is often called classification, where you're basically trying to match data to a certain category based on its data. So for example, I could bring an apple near it, right? And the sensors would react. Maybe they'd all go down, and if I brought an orange next to it, maybe all the sensors would go up. And so you can come up with some sort of an algorithm that would be like, okay, if the sensors go down, it's an apple. If they go up, it's an orange. Now that's really oversimplifying it, because apples and oranges are only two of the things that exist in the universe, right? There's so many different things that exist. We were playing for a little bit bigger stakes. You can think of when you go through a TSA line and sometimes they, you know, swab you, and they're trying to see if you have any drugs or any bombs on you. That was kind of the stakes that we were playing with in some of our use cases. So I would take this data that oftentimes, you know, was time series based. We usually had like 12, 16, or 24 different sensors on there. And I would try to make these smell prints using classification models in machine learning. Now, a lot of the time I was doing this in Python.
Python's great for doing things in machine learning. There were even some simple algorithms that I created that were based in Excel, but they were pretty simple. The more complicated stuff I was doing in Python at the time. Also, just because we were doing a lot of these experiments, SQL would've been really helpful. We weren't actually using SQL as much as we should have. Looking back on it, we really should have been using SQL a little bit more.

The third experience I wanna tell you about was when I was running my own data science consultancy, and I got hired by a cybersecurity company to help them with a few things. So obviously we live in this digital age. Cybersecurity is really important, so there's a lot of opportunity in cybersecurity. And the interesting thing about cybersecurity is that a lot of the data is hidden in logs, because basically anything you do online, anything you do on the internet, gets logged one way or another. It's in there. They're capturing everything, but when you capture everything, you're kind of capturing nothing at the same time, because it's really hard to figure out what's the signal amongst so much noise. And so this company in particular was basically getting a bunch of internet logs for companies in what you can consider their workspaces. So for instance, all of their Microsoft logs, all of their Google logs, if they're using Slack, their Slack logs, maybe their employee customer history. Just think of anything a company might be interested in from a cybersecurity standpoint. We were just getting a bunch of the logs. Now in these logs, there's maybe little needles in the haystack. There's maybe little gems that can be pulled out. It requires a lot of analysis to try to figure out what's in there. Just imagine you're getting a ton of hay and you have to find this little needle.
And so my job was to go in there and try to see if there were any needles, anything that was really worth diving into and investigating more, and also just to summarize everything that was happening. This is how many logins you had on Google today. This is how many, you know, logouts you had on Microsoft. This is how many users you had from these different states. Think of these giant enterprise organizations where they have thousands of employees and a bunch of things going on. How do you know everything's going okay? Are you sure that everyone is where they say they are? Are you sure you don't have any intruders, you know, people accessing stuff from a place that they probably shouldn't? Those types of things. So we were basically taking these huge dumps of logs, most of which weren't really important or interesting, aggregating them, trying to find the interesting things, and also making sure that nothing nefarious was going on.

To do that analysis, I was actually doing everything in Python, but I could really choose whatever tool I wanted. I just chose Python personally because I'm very comfortable in Python. I'm decently good at Python, and I can do things quickly with it. I probably couldn't have done this as easily in Excel. You probably could have done similar stuff in SQL if you wanted to. One thing I really like about Python is it can do anything, maybe not extremely well, but it can do anything. So I was doing all my analysis in Python, and I was creating data visualizations in Python. They even used a lot of the insights I found, in terms of aggregates. They basically aggregated all of their customers' data and would publish a yearly or biannual report of cybersecurity incidents. And those reports included graphs that I was creating from some of these KPIs or metrics that I was monitoring.
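The kind of summarizing described above (logins per service, users per state, and flagging access from unexpected places) can be sketched with Python's standard library. Everything here is an illustrative assumption: the log records, field names, and the `expected_states` allow-list are invented, and real workspace logs are far larger and messier.

```python
from collections import Counter, defaultdict

# Hypothetical, heavily simplified log records.
logs = [
    {"service": "google", "event": "login", "user": "ana", "state": "UT"},
    {"service": "google", "event": "login", "user": "ben", "state": "TX"},
    {"service": "microsoft", "event": "logout", "user": "ana", "state": "UT"},
    {"service": "google", "event": "login", "user": "ana", "state": "FL"},
    {"service": "slack", "event": "login", "user": "cy", "state": "TX"},
]

# Summary 1: how many of each event happened on each service.
event_counts = Counter((rec["service"], rec["event"]) for rec in logs)

# Summary 2: how many distinct users were seen from each state.
users_by_state = defaultdict(set)
for rec in logs:
    users_by_state[rec["state"]].add(rec["user"])
users_per_state = {state: len(users) for state, users in users_by_state.items()}

# "Needles": logins from states outside an assumed allow-list of office locations.
expected_states = {"UT", "TX"}
needles = [
    rec for rec in logs
    if rec["event"] == "login" and rec["state"] not in expected_states
]

print(event_counts[("google", "login")], users_per_state, needles)
```

Counts like these are also the raw material for the KPI-style graphs mentioned in the episode; the same aggregations could be written as SQL `GROUP BY` queries.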
With that report, they could inform the cybersecurity field and all of their customers about the trends and what we were seeing from a big-picture standpoint. And that was actually really useful, 'cause people would start to read that and be like, oh, I really like this company. I wanna work with them. And that would bring in new customers. So even though I was doing that analysis for individual customers at an individual level, that analysis actually ended up being really useful for their marketing team as well, to get more sales and more customers in the pipeline.

Now, I've actually worked for way more than just these three companies. I've probably done work for about 12, including the Utah Jazz, Harley-Davidson, and some other really big names like MIT. If you want to hear more about those, I'll be talking about them more in my newsletter. So you can subscribe at datacareerjumpstart.com/newsletter, and I'll be talking more about these experiences there. But if you want me to talk about it on the podcast or YouTube as well, let me know in the comments down below, and maybe I'll do some future episodes on that if we get enough comments. As always, thanks for watching and I'll see you in the next one.

