97: Winning with Data Science; A Handbook for Business Leaders w/ Akshay Swaminathan
February 14, 2024 | 48:08

In this podcast episode, Avery talks with Akshay Swaminathan, co-author of the book 'Winning with Data Science: A Handbook for Business Leaders'.

They discuss the vague terms often thrown around in the data science industry and underline the importance of understanding these terms in order to make effective business decisions.

Swaminathan highlights the need to leverage data science in healthcare and real estate.

The episode covers various aspects of data analysis, including prediction, association, and description.


Connect with Akshay Swaminathan:

🤝 Connect on LinkedIn

📘 Learn About Winning with Data Science


βœ‰οΈ Discover what we wish we knew about landing the dream job

πŸ€– Data Analytics Answers At Your Finger Tips


🀝 Ace your data analyst interview with the interview simulator

πŸ“© Get my weekly email with helpful data career tips

πŸ“Š Come to my next free β€œHow to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp


Timestamps:

(12:14) - The Value of Data Science in Business
(14:57) - The Role of Data Science in Problem Solving
(20:36) - Understanding the Difference Between Data Science and Data Analytics
(25:39) - The Importance of Framing Business Questions for Data Science
(30:48) - The Power of Meeting Business Needs with Data Science: A Real-Life Example
(38:26) - The Limitations of Data Science


Connect with Avery:

📺 Subscribe on YouTube

🎙 Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode:

💙 Thank you for subscribing

YouTube Channel

🚀 April Cohort - Data Analyst Bootcamp (Starts April 13th)

Ready to break into data analytics? Our April cohort kicks off with a live call on April 13th at 7pm ET where you'll meet your peers and mentors on day one. Save 20% when you enroll now, plus get LIFETIME access to our premium data job board. Join Today → https://datacareerjumpstart.com/daa

[00:00:53] I think honestly, all these terms, you know, people throw them around.

[00:00:57] Most of them are pretty vague.

[00:00:58] Data science is such a huge umbrella term that can encapsulate things like data analytics, data visualization,

[00:01:07] data engineering, machine learning, causal inference.

[00:01:10] Welcome to the Data Career Podcast, the podcast that helps aspiring data professionals land their next data job.

[00:01:16] Here's your host, Avery Smith.

[00:01:19] Hello, podcast friends.

[00:01:21] In this episode, you're going to be learning more about this topic, which is winning with data science.

[00:01:28] A handbook for business leaders and to our audio only audience, I'm holding up a book.

[00:01:34] This book, we interviewed the author.

[00:01:36] His name is Akshay Swaminathan.

[00:01:38] He's a data scientist who works on strengthening health systems.

[00:01:41] He has more than 40 peer reviewed publications, which is a ton.

[00:01:45] And his work has been featured in the New York Times and Stat.

[00:01:49] Previously at Flatiron Health, he currently leads the data science team at Cerebral, which we talked about in the podcast,

[00:01:54] and is a Knight-Hennessy scholar at Stanford University School of Medicine.

[00:01:58] So we interview Akshay.

[00:02:00] Really interesting book, really interesting interview.

[00:02:03] This book and the interview is going to be what it says, basically data science, a handbook for business leaders.

[00:02:10] But I would say that it's a little bit more than just business leaders because I found the book extremely useful and I wouldn't really consider myself a business leader necessarily.

[00:02:20] But if you're just getting started in data science, I think this will also be useful because what they do in the book and what Akshay does in the interview is really break down to what actually is data science.

[00:02:31] What can it actually do?

[00:02:33] How do we use it in the field?

[00:02:34] You know, what are the terms that we should know and stuff like that?

[00:02:37] So they sent me a copy of this book.

[00:02:39] I really found it easy to read.

[00:02:41] It has lots of real definitions, real use cases, focused on business outcomes, which I really liked.

[00:02:46] And it has like a fun dialogue.

[00:02:48] I've never seen really a technical book do where it has like two fictional characters that are talking back and forth, which I thought was kind of fun.

[00:02:55] I'm actually going to give this book to my neighbor or at least refer it.

[00:02:59] Neighbor, I don't know if you're going to get it or if you're going to just get the referral because he's like a middle manager at a software company.

[00:03:05] And he recently came over to my house and was like, teach me everything about data science.

[00:03:09] And we sat down for an hour.

[00:03:10] He's like, can we do another one of these?

[00:03:12] Just instead of that, you can just read this book.

[00:03:14] It's quite good.

[00:03:15] Yeah.

[00:03:16] So I think you'd like it if you're in business or if you're just getting into the data science world.

[00:03:20] I think there's some good definitions.

[00:03:23] So I think you guys should be pretty excited about this episode.

[00:03:26] Before we get into it, I have to tell you guys about some fun things that are up and coming.

[00:03:31] So one of the things I want to tell you guys about is I've been working on a really cool presentation about how to land any sort of job.

[00:03:39] Not necessarily a data job, but just landing jobs, the job hunt, the job search in general.

[00:03:44] I'm pretty excited about that.

[00:03:45] And we're going to go ahead and give you guys some free tips in the meantime.

[00:03:50] So if you check the link, the description down below, we have some free email tips that we're going to be sending you.

[00:03:55] So definitely want to check those out.

[00:03:57] I'm pretty excited about that.

[00:03:59] The thing I want to tell you about is Avery GPT.

[00:04:02] If you're listening to this, it just came out.

[00:04:04] It is basically a version of me as ChatGPT.

[00:04:10] And what we did is we took all the information from the podcasts and from my LinkedIn posts and from my social media and stuff like that, and created a chat bot.

[00:04:19] And you can basically talk to me.

[00:04:20] So I do one-on-one coaching, but it's hard because I charge typically when I'm doing like consulting work, I charge close to $300 an hour.

[00:04:27] I obviously don't want to charge that to you guys.

[00:04:30] So I do a discounted rate for all my one-on-one calls, but it's just hard to reach all of you guys.

[00:04:34] So this is like a fun way you could potentially get coaching from me without having to pay for it.

[00:04:38] It is going to be free.

[00:04:40] So you'll want to check that out; the link for that will be in the description down below. It's a hundred percent free to use, so just check it out and tell us what you think.

[00:04:48] All right, enough stuff.

[00:04:49] Let's go ahead and get into this episode.

[00:04:56] Welcome back to another episode of the Data Career Podcast.

[00:04:58] I'm very excited for my guest today.

[00:05:01] We have an author, the author of Winning with Data Science.

[00:05:06] We have Akshay Swaminathan, who is one of the coauthors of this awesome book.

[00:05:11] Akshay, welcome to the podcast.

[00:05:12] Thanks for having me.

[00:05:13] Really happy to be here.

[00:05:14] Yeah, super excited about the book.

[00:05:16] Just came out, Winning with Data Science.

[00:05:18] How is the whole process of writing the book?

[00:05:21] Actually, before we get into that, if you were to describe yourself, what you do now and how you got into writing the book, what would you describe?

[00:05:28] Sure.

[00:05:28] So I work on strengthening health systems using data.

[00:05:32] What that means is using data to help everyone basically get treatment to the right person at the right time.

[00:05:40] And so I've worked in, you know, the mental health field.

[00:05:44] I'm currently leading the data science team at Cerebral, which is a national mental health startup.

[00:05:50] And my role there involves using data science to improve the lives of clinicians and patients.

[00:05:56] So that's, that's what I'm about.

[00:05:58] And the book really came out of my experiences and my coauthors experiences working on these cross-functional teams where you have data scientists working with business people,

[00:06:09] clinicians, non-technical people to accomplish a common goal to solve business problems.

[00:06:15] And what we found is that for some of these projects, we'd collaborate with business stakeholders who were really data savvy and that just elevated the entire collaboration.

[00:06:25] They knew the right questions to ask.

[00:06:27] They knew which assumptions to question.

[00:06:29] And that made us perform better as data scientists and ultimately resulted in a better, better project, better deliverable.

[00:06:37] And so we wanted to write this book.

[00:06:39] There's so many resources out there for people who want to be data scientists, but not as many resources out there for people who work with data scientists.

[00:06:48] And now, you know, as AI and data are becoming more and more integral to every business model, we need more resources to coach people who collaborate with the technical teams on how to be effective collaborators and stakeholders.

[00:07:02] So that was the motivation for writing the book.

[00:07:05] That's awesome.

[00:07:06] I really like the idea behind that, where data science has become so sexy and so big in the world.

[00:07:13] And a lot of people want to become data scientists ultimately, but not everyone can be a data scientist.

[00:07:18] If we were all data scientists, the world would probably stop spinning and we would just be geeking out on statistics and machine learning and stuff like that.

[00:07:26] We need business people.

[00:07:27] We need product people.

[00:07:29] We need managerial people who aren't going to be actually doing any sort of data science by hand necessarily, but they need to be familiar with the concepts and the ideas and the vocabulary behind this new field.

[00:07:45] That's kind of the idea behind the book, like you said.

[00:07:47] And I've gotten to read a good chunk of the book.

[00:07:50] I think you guys do a really good job of making data science sound easy.

[00:07:55] You do a good job of putting in definitions and putting in scenarios when those words would be used.

[00:08:00] And the whole book kind of follows a fun dialogue between two fictional characters and their roles and how they interface with the data science world.

[00:08:10] It was a really well done book, so congratulations once again.

[00:08:13] I want to talk a little bit more about why data science.

[00:08:17] Why do you think that data science matters right now?

[00:08:21] What does it actually matter deep down?

[00:08:22] Like, why is it so sexy?

[00:08:23] Yeah, it's a good question.

[00:08:24] And I'll give you a couple examples.

[00:08:26] So one reason data science matters is it checks your intuition, right?

[00:08:31] So if I'm a business leader, let's say I'm running a, you know, a restaurant, right?

[00:08:36] And I have some intuition as to, you know, which dishes are selling the best and which dishes my customers like the best.

[00:08:45] But we know that humans, our intuition often fails us.

[00:08:49] I don't know if you've seen those restaurant makeover shows on Food Network where you have someone like a Gordon Ramsay go into a new restaurant and the restaurant owner says, oh yeah, you know, we have the best dishes.

[00:09:00] My customers love these things and Gordon Ramsay tastes them and he's like, oh, this is terrible.

[00:09:04] You know, what are you guys doing?

[00:09:05] So data is often kind of like the Gordon Ramsay where when you, the data doesn't lie, right?

[00:09:12] Well, sometimes it can, we'll get to that later.

[00:09:14] But when you look at the cold hard facts about, okay, what are my sales for each of these dishes and you know, how many people are actually buying these dishes?

[00:09:24] You can't get any clearer than that.

[00:09:25] Right.

[00:09:26] And so data can often serve as a check on your intuitions.
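The restaurant example above can be sketched in a few lines. This is only an illustration: the order log, the dish names, and the owner's guess are all made up, and a real restaurant would pull these counts from its point-of-sale system rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical order log: one entry per dish sold (names are invented).
orders = [
    "vegan curry", "beef stew", "vegan curry", "pasta",
    "vegan curry", "beef stew", "pasta", "vegan curry",
]

# The owner's intuition about the best seller (assumed for illustration).
owner_guess = "beef stew"


def rank_dishes(order_log):
    """Return (dish, units_sold) pairs, best seller first."""
    return Counter(order_log).most_common()


ranking = rank_dishes(orders)
best_seller, units = ranking[0]

print(f"Owner's guess: {owner_guess}")
print(f"Actual best seller: {best_seller} ({units} orders)")
for dish, count in ranking:
    print(f"  {dish}: {count}")
```

Even a tally this simple is enough to confirm or contradict the owner's intuition, which is the point being made in the conversation.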

[00:09:29] And so at a larger scale, you know, maybe you're running a business that has a huge operations unit.

[00:09:35] And when you look at the data as to, you know, the, say the expenses that each of your business units is generating, maybe you'll find that, you know, there's this one operational unit.

[00:09:46] That's, you know, the costs are just through the roof.

[00:09:48] And when you dig into it, you've, you figure out that, Hey, you know, it turns out that we have this team that's, you know, we're paying, but they're not really doing much.

[00:09:56] And so what I mean to say is data can often uncover things that we don't realize and it can check our intuition.

[00:10:04] And so that's one, you know, clear use case for data.

[00:10:07] Another one I'll talk about real quick, especially now in the context of AI is that data is often a business's most valuable asset and can actually end up becoming a product or part of a product.

[00:10:18] So, you know, you must have seen, you know, everyone has seen all these companies that are using generative AI to, you know, produce content.

[00:10:26] A lot of people creating wrappers around models like GPT.

[00:10:29] What people don't realize though is that more important than any of these models is the actual data that's, that's powering the model.

[00:10:36] So the data used to pre-train and fine tune these models is much more valuable than models themselves.

[00:10:43] The reason for that is that open source model technology is going to quickly catch up, has already caught up in many types of tasks.

[00:10:52] And so what's really going to differentiate one business from another is the data that they have access to.

[00:10:58] You know, I was just talking to some companies that are in the space of using generative AI to write doctor's notes.

[00:11:05] Right.

[00:11:06] And so this is a really important use case because doctors spend a lot of time writing notes.

[00:11:10] And so if you can use AI to do that, it'll basically speed up the process a lot.

[00:11:14] And so what these companies are saying is that, you know, before they were relying on GPT, but soon they're not going to be anymore because they're collecting so much of their own data.

[00:11:24] Right.

[00:11:24] Their own data from doctor patient interactions, from doctor's notes, and they're going to use that like their own proprietary data to fine tune their models.

[00:11:35] And that is going to be much more effective than relying on any company's API.

[00:11:40] So that's just one example of how as a business, if you can build up your own proprietary database, you can use that database as its own product or to power your own products, which can be significant sources of revenue.

[00:11:56] So those are just two cases for why data science.

[00:12:00] I love it.

[00:12:00] I love it.

[00:12:01] And I think with those two things, with the ability to like actually almost see truth through numbers of like, oh yeah, actually, you know, and that person

[00:12:10] that restaurant owner might be like, oh yeah, you know, everyone loves the beef stew or whatever.

[00:12:16] Right.

[00:12:17] But it turns out if you look at the numbers, you know, more people are getting the vegan curry at the end of the day.

[00:12:21] Like we as humans are pretty flawed in our ability to capture what's actually happening around us over a long period of time.

[00:12:30] We're pretty good at knowing what's happening in front of us right now.

[00:12:33] But if you do that every hour for every day for six months, we actually kind of lose track of what actually happened and we're not very good at describing it.

[00:12:41] So if you can use that data or if you can use data to see reality through that lens, that could lead to a lot of money.

[00:12:48] Obviously, the generative AI can lead to a lot of money.

[00:12:51] So there's a lot of money to be made with data science.

[00:12:54] But then also, like more importantly, maybe like you said is if doctors are spending less time taking notes, that means they're spending more time, you know, seeing patients, helping people get healthier.

[00:13:04] So there's also more than just money to be made.

[00:13:08] I do want to ask, like, why if I'm a, like, let's say a product manager or maybe I'm in marketing or let's just say I'm actually not a data scientist.

[00:13:18] Why should I learn anything about data science?

[00:13:20] Why can't I just leave it to the data scientist?

[00:13:22] So the reason that, you know, say non-technical people need to have some foundation data science is that sooner or later, we're all going to be working with data scientists and people on the data team.

[00:13:34] So let's say you're a marketing manager, right?

[00:13:37] You're responsible for your company's marketing strategy.

[00:13:39] A big part of marketing is running ad campaigns, running these marketing campaigns and measuring the success of those marketing campaigns.

[00:13:46] And then also doing the analysis of which campaigns are performing better or worse.

[00:13:52] And based on that, how do we pivot our marketing strategy?

[00:13:55] All that involves data.

[00:13:56] If you're a marketing manager who doesn't understand how to track the right metrics when it comes to the success of your campaigns and then how to use data to inform when you pivot, how to do, you know, attribution, how to do attribution modeling to basically say, OK, this client came in thanks to

[00:14:15] campaigns A, B and C.

[00:14:17] If you don't know how to do, how to use data to do those things, you're going to lose out compared to all the marketing managers out there who are working with their data science colleagues.

[00:14:27] Maybe they're not data scientists themselves, but they're working with the data teams to make an attribution model, to identify, to set up dashboards and understand, OK, what are my highest performing campaigns?

[00:14:38] So that's the, that's the long and short of it.

[00:14:42] Basically, you're going to lose out compared to all your other peers who can be effective collaborators with their data colleagues.
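To make "attribution modeling" a little more concrete, here is a minimal sketch of one of the simplest approaches, linear (equal-credit) multi-touch attribution, where each conversion's credit is split evenly across the campaigns that client touched. This is not necessarily the method Akshay's teams use; the conversion paths are invented, and the campaign names A, B and C just follow the example in the conversation.

```python
from collections import defaultdict

# Hypothetical conversion paths: the campaigns each new client touched
# before signing up. The data is invented for illustration.
conversion_paths = [
    ["A", "B", "C"],
    ["A"],
    ["B", "C"],
    ["A", "C"],
]


def linear_attribution(paths):
    """Split one unit of credit per conversion equally across its campaigns."""
    credit = defaultdict(float)
    for path in paths:
        touched = set(path)  # count each campaign once per conversion
        for campaign in touched:
            credit[campaign] += 1.0 / len(touched)
    return dict(credit)


credit = linear_attribution(conversion_paths)
for campaign in sorted(credit, key=credit.get, reverse=True):
    print(f"Campaign {campaign}: {credit[campaign]:.2f} conversions credited")
```

Other schemes (first-touch, last-touch, time-decay) weight the touches differently; the choice is a business decision as much as a technical one, which is why the marketing manager needs to be part of that conversation.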

[00:14:50] It's almost like having a data, whatever domain you're in, if you have a data scientist, you gain these powers to clearly see reality and hopefully get insight that will lead you to make better decisions.

[00:15:01] And I think one thing that's worth saying as well is data scientists, a lot of the time, I mean, of course, there's some data scientists that are better than others and there's some that are worse than others.

[00:15:11] But a lot of data scientists might be missing domain knowledge.

[00:15:16] When I worked at ExxonMobil, I worked on the data science team, but I was a chemical engineer.

[00:15:21] And so I really understood different principles of refining and stuff like that.

[00:15:25] I knew about different elements and what went together, what didn't.

[00:15:29] And sometimes I worked with my data science colleagues and they wouldn't really know what's going on with the actual refining process.

[00:15:36] And so that background knowledge I had often helped me.

[00:15:38] But if you can be the data scientist's background knowledge and you guys can work kind of in tandem, I think that's a really powerful combination that can lead to some serious impact.

[00:15:48] Do you agree?

[00:15:49] Yeah, I agree.

[00:15:50] And I think oftentimes the most underrated piece of this whole puzzle is figuring out what to work on.

[00:15:59] Right. If you just leave a data team to their own devices, I guarantee you they're going to build something cool.

[00:16:04] They're going to build something interesting and they're going to generate some insights.

[00:16:08] But does it even matter?

[00:16:10] Do those insights actually add value to the business?

[00:16:13] Is that model that they're building ever going to be used in a meaningful way?

[00:16:17] And that's where you need a business partner and a business partner who can speak the language of data science.

[00:16:25] Right. I was part of a project not too long ago where my team developed a model to predict when patients would fall out of treatment, basically when they would churn.

[00:16:36] And that model never ended up getting used.

[00:16:38] And the reason was we didn't have an effective business partner who could tell us from the beginning, hey, you're trying to solve the wrong problem.

[00:16:46] Like this is actually not a problem for us right now.

[00:16:48] No one is going to use this model even if you build it and even if you generate some cool insights.

[00:16:53] If we had such a person, it would have saved us months of work.

[00:16:58] You know, thankfully, we were able to repurpose that work and it became useful later on.

[00:17:02] But oftentimes that's not the case.

[00:17:04] Oftentimes a data team will spend months and the company will spend hundreds of thousands, even millions of dollars to work on a project that ultimately has little to no business value because there was a lack of a strong partnership between the data team and a business team.

[00:17:23] Which is bad. You never want to be spending a million or wasting a million dollars when you could be saving it or you could be generating it.

[00:17:30] And look, they could have just saved all that money had they read the book in the first place.

[00:17:35] Right. They would have gotten all of that good knowledge.

[00:17:39] Yeah, I totally agree.

[00:17:40] Once again, when I was at Exxon, we actually had a sub data science team or almost like an adjacent data science team.

[00:17:47] I was part of the research and development data science team that would go off, like you said, and build really cool models.

[00:17:54] But there was kind of a business data science team and they didn't really even code that much.

[00:17:59] Their only job was to go talk to different parts of the businesses and come back with the use cases that they'd basically hand off to us.

[00:18:09] Or if it was easier, that they would kind of do it.

[00:18:11] But yeah, almost like this layer in between the business and the data science team.

[00:18:15] This whole team existed for communication purposes and kind of figuring out what's the right project we should be working on and what's the ROI.

[00:18:23] Yeah, I mean, this advice goes, if you're a data scientist too, this advice applies to you too, right?

[00:18:28] The best way to level up your data science performance is to gain more domain knowledge and gain a better understanding of the business, right?

[00:18:36] If you want to make yourself look good or make your boss look good, the best thing you can do is say, hey, I actually don't think we should be working on this.

[00:18:44] We really need to be working on this instead, because this is where the biggest business pain point is.

[00:18:50] If you can suggest a pivot like that, that saves your team from working on a project on a road to nowhere and instead build something that actually generates business value, everyone's going to love you.

[00:19:05] The business people are going to love you.

[00:19:06] The data people are going to love you.

[00:19:06] It's a win-win.

[00:19:07] I think that's one of the hardest things about being a data scientist is you've been trained, whether it's through a bootcamp or a master's degree or just experience or whatever, you've been trained that, yes, I care about P values.

[00:19:19] I care about how low can I get that P value?

[00:19:22] How many different variables?

[00:19:25] What's my R squared?

[00:19:28] Is this variable significant or not?

[00:19:30] Whatever.

[00:19:30] But really, at the end of the day, you have to really think in dollars.

[00:19:32] It's like, I'm not just doing data science for data science sake.

[00:19:37] I'm actually doing it for value sake.

[00:19:39] I use dollars as just the lazy way of saying value because it's easy to quantify.

[00:19:45] But that value could be quantified in CO2 emissions, tons of CO2 emissions saved or lives saved or stuff like that.

[00:19:54] But the whole point is you're not actually doing data science to build a pretty graph or to build a machine learning model.

[00:20:00] You're doing it to make dollar bills or save people.

[00:20:03] Exactly.

[00:20:04] Sometimes us data people, we have to be a little bit humble and realize that all the time that we've put into learning these sophisticated techniques might not be the right tool for the problem.

[00:20:20] A common issue is you have a hammer and you go looking for nails.

[00:20:24] We learned this cool modeling technique and we're searching for a problem to apply it to.

[00:20:30] Instead, we should be doing...

[00:20:31] You got to start with the problem.

[00:20:32] You got to start with the question.

[00:20:34] And even if the solution, even if the right method, the right hammer, the right tool is just an Excel spreadsheet, a table, a two by two table, that's okay.

[00:20:44] Oftentimes, a well-done spreadsheet can save your company millions of dollars.

[00:20:51] So it's easy to get hung up with the tools, but we should be focusing on the problems.

[00:20:57] It's easy to get hung up trying to look sexy and try to use the latest sexy model.

[00:21:02] That's actually one of the quotes I really liked from your book.

[00:21:05] I just wanted to read it.

[00:21:06] Quote from the book, a dirty little secret about data science is that the most advanced cutting edge modeling methods are often not necessary to gain most of the value from data.

[00:21:16] So yeah, sometimes it is just Excel or sometimes it's linear algebra, whatever it is.

[00:21:21] Focus less on how it sounds and how it looks, and instead focus on how much value it brings.

[00:21:27] I do want to ask you a question and I didn't prepare you for this.

[00:21:29] We can take it out if we want to.

[00:21:31] But what is your definition of data science versus data analytics?

[00:21:36] Are those terms synonymous?

[00:21:37] Are they closely related?

[00:21:39] What are your thoughts about it?

[00:21:40] I think honestly, all these terms, people throw them around.

[00:21:43] Most of them are pretty vague.

[00:21:45] Data science is such a huge umbrella term that can encapsulate things like data analytics, data visualization, data engineering, machine learning, causal inference.

[00:21:57] So I think it's not as useful to talk about how do you define these terms?

[00:22:03] What fits in this, what doesn't.

[00:22:04] Let's talk about the actual skills and deliverables.

[00:22:08] So when I think data analytics, that's a little bit more of a precise term.

[00:22:12] So there I'm thinking, okay, what are the tools?

[00:22:14] Tools are SQL.

[00:22:16] You're interacting with databases.

[00:22:18] You're pulling data from these databases based on queries that you're getting from your business counterparts.

[00:22:24] How many customers did we have in the last month that have blue hair and bought this product between 5am and 8am?

[00:22:33] Whatever convoluted queries that they're asking you to pull, you've got to be able to manipulate SQL to effectively pull that data.
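As a rough sketch of what that "blue hair" request might look like as a query, the example below builds a tiny in-memory SQLite database and counts the matching customers. The table names, column names, sample rows, and the fixed date range standing in for "the last month" are all assumptions made for illustration; a real schema would differ.

```python
import sqlite3

# In-memory database with a hypothetical schema (not from the episode).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, hair_color TEXT)")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product TEXT, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "blue"), (2, "brown"), (3, "blue")],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "widget", "2024-01-10 06:30:00"),  # blue hair, inside the window
        (1, "widget", "2024-01-12 07:15:00"),  # same customer again
        (2, "widget", "2024-01-11 06:45:00"),  # wrong hair color
        (3, "widget", "2024-01-15 09:00:00"),  # outside 5am-8am
        (3, "gadget", "2024-01-16 05:30:00"),  # wrong product
    ],
)

# Count distinct blue-haired customers who bought the product
# between 5am and 8am within the given date range.
query = """
    SELECT COUNT(DISTINCT c.id)
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.id
    WHERE c.hair_color = 'blue'
      AND p.product = ?
      AND p.purchased_at >= ? AND p.purchased_at < ?
      AND time(p.purchased_at) >= '05:00:00'
      AND time(p.purchased_at) < '08:00:00'
"""
(count,) = conn.execute(query, ("widget", "2024-01-01", "2024-02-01")).fetchone()
print(f"Matching customers: {count}")
```

Note the `COUNT(DISTINCT ...)`: the business question asks how many customers, not how many purchases, so a customer who bought twice in the window is counted once.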

[00:22:40] Maybe you're creating dashboards to visualize that data.

[00:22:42] Maybe you've set up a data ecosystem where you have a database followed by a transformation layer.

[00:22:48] And then that data is coming out of that into a business intelligence tool like a Tableau or Power BI or a Looker.

[00:22:55] So that's what I think when I think data analytics.

[00:22:58] Data science is actually such a huge umbrella term that we've got to be more specific before we dig in more there.

[00:23:04] I agree.

[00:23:05] I hate all the terms in the data world.

[00:23:08] Like they all basically, it's all so new too that it's always changing.

[00:23:12] It is really hard to put a box on any of these titles.

[00:23:15] Moving on, talking about how different businesses can use data science.

[00:23:20] So actually before we get into different businesses, I want to talk about at the end of the day to do data science.

[00:23:26] It's a very complex term, like we just talked about, but there's actually not all that many different things that we can do in the data science world.

[00:23:37] Can we quickly just run through like when you're doing data science, what are the main types of questions we're answering or analyses that we're doing or tasks that we're trying to figure out?

[00:23:48] If we had to boil it down to, obviously every problem is unique and individual, but if we had to generally summarize, what does data science answer?

[00:23:57] What would those questions be?

[00:23:58] Yeah, this is important for everyone to know.

[00:24:01] I think a simple way to think about it, there are three main types of questions that we can address using data science.

[00:24:07] So the first is prediction.

[00:24:10] The second is association or inference.

[00:24:13] And the third is description or exploration.

[00:24:16] So the first one, prediction, right?

[00:24:17] Can we predict an outcome?

[00:24:19] Right.

[00:24:19] Can we predict which clients are going to churn?

[00:24:22] Can we predict which clients are going to convert?

[00:24:24] Can we predict who's going to buy this product?

[00:24:27] Those are all prediction problems.

[00:24:29] Can we predict which patients are going to have some outcome within some time window?

[00:24:34] And so, you know, we have a number of methods at our disposal to solve those types of problems.

[00:24:40] So the second one, association or inference, right?

[00:24:42] What attributes are associated with this outcome?

[00:24:45] For example, let's say you have a cohort of clients that stays with you for more than five years, right?

[00:24:53] Hyper loyal customers versus people who churn within a month or two, right?

[00:24:59] What makes those two cohorts different?

[00:25:01] So what are the characteristics or attributes that are associated with longer time to churn versus shorter time to churn?

[00:25:08] That's an example of an association question.

[00:25:11] And then the third is description or exploration, right?

[00:25:14] Here you just want to understand some group or some phenomenon better, right?

[00:25:19] For example, sticking with this churn example, who are these people who stay with us for more than five years, say, right?

[00:25:28] What, like, who are they?

[00:25:29] What are their demographics?

[00:25:31] What are their hobbies?

[00:25:31] What are their interests?

[00:25:33] Another example of this might be, okay, what does the customer journey look like during the first month after they join your product or service, right?

[00:25:42] What activities are they doing?

[00:25:44] Like, what is their timeline?

[00:25:45] So this is more of a descriptive exercise.

[00:25:49] And we also like to call this thing a hypothesis generating exercise, right?

[00:25:53] So here the data scientists might pull a number of different visualizations and create a number of different tables.

[00:26:01] And they might say, oh, it turns out that during their first week of the service, everyone is, you know, choosing to download this app or clicking on this button that we have, or, oh, it turns out, you know, in week two, a bunch of them are, you know, filing a support ticket for this issue.

[00:26:19] And so we call this hypothesis generating because through the process of creating these exploratory visualizations, you generate hypotheses and you generate ideas for followup analyses.

[00:26:30] Right?

[00:26:30] So those are kind of the three buckets.

[00:26:32] And the reason it's important to understand these is if you're a business person who is coming to your data team and you say, Hey, we want to be able to predict some outcome, or if you can frame your business question in terms of one of these three, prediction, association, or exploration,

[00:26:47] the data team will have a much easier job understanding what it is you actually want.

[00:26:53] Right?

[00:26:54] And as a data person, if you're a data person, your job is to take the business person's request and frame it and clearly put it in one of these buckets.

[00:27:03] Because the worst thing you can do as a data person is, you know, take a business request, not really be clear on, okay, are we building a prediction model or do we just want associations here?

[00:27:14] And then you end up doing some weird combination of the two, and it's not really what anyone wants or cares about.

[00:27:20] So on both ends, it's useful to understand these three types of questions.
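
The three buckets described above can be sketched on a toy churn table. This is only an illustration, not anything from the book or the episode: the customer rows, the column meanings, and the one-feature threshold rule standing in for a prediction model are all assumptions.

```python
# Hypothetical toy data: (monthly_logins, filed_support_ticket, churned)
customers = [
    (2, True, True),
    (3, True, True),
    (1, False, True),
    (12, False, False),
    (9, True, False),
    (15, False, False),
    (8, False, False),
    (4, True, True),
]

# 1) Description: who are the customers who stayed?
stayed = [c for c in customers if not c[2]]
avg_logins_stayed = sum(c[0] for c in stayed) / len(stayed)


# 2) Association: is filing a support ticket associated with churn?
def churn_rate(rows):
    """Share of the given customers who churned."""
    return sum(r[2] for r in rows) / len(rows)


with_ticket = [c for c in customers if c[1]]
without_ticket = [c for c in customers if not c[1]]
churn_rate_gap = churn_rate(with_ticket) - churn_rate(without_ticket)


# 3) Prediction: learn a login threshold, then predict churn for a new customer.
def accuracy(threshold):
    """Accuracy of the rule 'fewer logins than threshold means churn'."""
    hits = sum((c[0] < threshold) == c[2] for c in customers)
    return hits / len(customers)


best_threshold = max(range(1, 16), key=accuracy)


def predict_churn(monthly_logins):
    """Predict churn for a customer using the learned threshold."""
    return monthly_logins < best_threshold
```

The same table serves all three questions; what changes is the output. Description returns a summary, association returns a comparison between groups, and prediction returns a rule you can apply to a customer you have not seen yet. Agreeing on which output is wanted is the framing step discussed above.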

[00:27:24] That's awesome.

[00:27:24] I really like that.

[00:27:26] You're either predicting something, you're figuring out how two things are related, or you're describing something.

[00:27:31] It really puts it into three separate buckets.

[00:27:34] Let's go and dive into how some different organizations or companies would maybe do those things.

[00:27:41] So what I actually did was I asked some of my students inside of my bootcamp, the Data Analytics Accelerator, which industries or organizations they would be curious to see data science applied in.

[00:27:55] And we are going to quiz Akshay here and ask how he thinks that data science could be applied in these organizations.

[00:28:04] So the number one request from the students was hospitals.

[00:28:08] And it sounds like you actually have some healthcare experience.

[00:28:10] This is great.

[00:28:11] How could a hospital use data science?

[00:28:13] So many examples here.

[00:28:15] I'll just throw some out and we can dig into them as we see it.

[00:28:19] So one example, and we did this at Cerebral, is giving data back to clinicians.

[00:28:25] Right?

[00:28:25] So a lot of times it's difficult for clinicians to see who are all the patients that I'm responsible for.

[00:28:32] And out of all the patients I'm responsible for, who is doing the best?

[00:28:36] Who's doing the worst?

[00:28:37] Who hasn't seen me in a while?

[00:28:40] Who needs to see me very soon?

[00:28:42] Who is in remission, right?

[00:28:43] Who has gotten a lot better?

[00:28:45] Who's gotten really worse?

[00:28:46] Who's going to have an impending crisis?

[00:28:49] And so what we did at Cerebral was create personalized data reports for each clinician that shows very simple data on who are all the patients they're responsible for, who is showing up to appointments, who's not showing up.

[00:29:03] Right?

[00:29:03] So some data on the patient, some data on the clinician's care delivery themselves.

[00:29:07] Right?

[00:29:08] A little bit of peer comparison data so that they can see, okay, where do I stand in relation to my clinician peers?

[00:29:15] And that is not standard practice in most healthcare organizations, and especially not in the mental health field.

[00:29:23] So that's one example we can talk about.
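
A per-clinician report with a peer comparison, like the one described above, could be sketched roughly as follows. The appointment log, the field names, and the choice of no-show rate as the metric are assumptions for illustration, not a description of what Cerebral actually built.

```python
from collections import defaultdict

# Hypothetical appointment log: (clinician, patient, showed_up)
appointments = [
    ("dr_a", "p1", True),
    ("dr_a", "p2", False),
    ("dr_a", "p1", True),
    ("dr_a", "p3", True),
    ("dr_b", "p4", False),
    ("dr_b", "p5", False),
    ("dr_b", "p4", True),
    ("dr_b", "p6", True),
]


def build_reports(rows):
    """Return one small report per clinician, including a peer average."""
    visits_by_clinician = defaultdict(list)
    for clinician, patient, showed_up in rows:
        visits_by_clinician[clinician].append((patient, showed_up))

    # No-show rate per clinician: missed appointments / all appointments.
    no_show_rates = {
        clinician: sum(not showed for _, showed in visits) / len(visits)
        for clinician, visits in visits_by_clinician.items()
    }
    peer_average = sum(no_show_rates.values()) / len(no_show_rates)

    return {
        clinician: {
            "patients": sorted({patient for patient, _ in visits}),
            "no_show_rate": no_show_rates[clinician],
            "peer_average_no_show_rate": peer_average,
        }
        for clinician, visits in visits_by_clinician.items()
    }


reports = build_reports(appointments)
```

This is a description exercise in the sense used earlier: nothing is predicted, the data is just summarized and handed back to the person who can act on it.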

[00:29:26] We can talk about identifying patients who are about to experience a bad outcome.

[00:29:31] Right?

[00:29:31] That's a broad class of things.

[00:29:33] A very common one here is prediction models for sepsis.

[00:29:37] Right?

[00:29:38] That's one condition that's been studied a lot.

[00:29:41] Can you build models to predict when patients are going to have sepsis?

[00:29:45] And so we can talk about that, but I think what I want to emphasize in this healthcare space, especially there is a huge chasm between the technology solutions, the models, the data science solutions that get built and the ones that get deployed.

[00:30:01] By deployed, I mean the ones that are actually ending up in the hands of clinicians, administrators, patients that actually get used.

[00:30:10] And what I mean when I say there's a huge chasm is every day, there are dozens of papers coming out about people who are building models to predict something or, you know, building data science solutions to do something.

[00:30:20] But such a tiny fraction of those actually end up getting implemented in care workflows and generating business value.

[00:30:27] So if you're in academia, you know, you can build a model, write a paper about it and, you know, the job is done.

[00:30:33] But if you're running a hospital system, if you're running a healthcare organization, you don't care about writing papers.

[00:30:39] You care about generating value.

[00:30:41] You care about improving patient outcomes.

[00:30:43] You care about decreasing provider burnout.

[00:30:46] And so how do we bridge that chasm?

[00:30:49] Right.

[00:30:49] And I think the key insight, if you're working in a hospital system or if you're in the healthcare field, you have to recognize that building and developing these models and these tech solutions requires a different skill set than actually deploying them.

[00:31:03] To build the models, you need data scientists.

[00:31:05] To deploy the models, you need implementation scientists.

[00:31:09] You need translational data science people who can work with the data people and the clinicians, because a lot of times the data people are going to build stuff that the clinicians don't want to use.

[00:31:22] Right.

[00:31:23] Or don't know how to use or don't know how to interpret.

[00:31:25] And so this happens all the time where people will build models.

[00:31:29] They'll talk to clinicians.

[00:31:30] The clinician would be like, we don't need this.

[00:31:32] You know, so it goes back to what we were talking about at the beginning.

[00:31:36] You need to start with the problem.

[00:31:37] You need to start with the people who are going to use the solution you're talking about.

[00:31:41] So, you know, an example of this that I worked on at Cerebral was where we built a model to identify patients who were having a suicidal crisis.

[00:31:50] And the way we did this was by scanning through the chat messages that patients would send us.

[00:31:55] So, you know, patients can chat in and say whatever they want.

[00:31:58] They can say, you know, Hey, I need, I need help scheduling my next appointment.

[00:32:02] I need, I didn't get my meds on time.

[00:32:04] Or they can chat in and say, I'm feeling really depressed today.

[00:32:07] I'm feeling like I want to hurt myself or something worse.

[00:32:10] And in those cases, you want to detect those as fast as possible and intervene as fast as possible.

[00:32:15] So we developed a solution in collaboration with our crisis response team.

[00:32:21] So we built a model.

[00:32:23] It's an NLP model that can recognize when a chat message indicates suicidality or homicidality or some other adverse outcome.

[00:32:32] And then it sends that flag message to the crisis response team in Slack.

[00:32:37] So if you guys, I don't know how many of your, you know, your audience will know Slack.

[00:32:41] It's a, it's a DM.

[00:32:43] It's basically an instant messaging tool that companies can use.

[00:32:46] And so our whole company was running on Slack.

[00:32:49] So you had everyone on Slack and you know, everyone who's on Slack is always checking Slack, right?

[00:32:54] So it's a very convenient way to, to manage information.

[00:32:58] So we knew from our initial conversation, right?

[00:33:01] From day one, we were talking with the crisis team leaders to build a solution that met them where they are.

[00:33:09] Right.

[00:33:10] We didn't want to build something and then say, Hey, come and use this thing that we built.

[00:33:14] We wanted to build something that was tailor made to what they were already doing.

[00:33:17] Right.

[00:33:18] So using, you know, the APIs that Slack provides, we were able to get the model to send its predictions into Slack.

[00:33:26] So the crisis specialists, all they have to do is monitor this one Slack channel

[00:33:32] that's getting these messages straight from our model.

[00:33:36] And that made it very easy for them to triage these messages, these crisis messages.

[00:33:42] And as a result, we were able to bring down response times to these messages from 10 hours to under 10 minutes.
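
The flow described above, where a model flags a chat message and the flag lands in a Slack channel, might look something like this minimal sketch. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The keyword check is only a stand-in for the NLP model, and the phrase list, function names, and alert wording are all hypothetical.

```python
import json
import urllib.request

# Stand-in for the NLP model. A real system would use a trained classifier;
# this phrase list is hypothetical and for illustration only.
CRISIS_PHRASES = ("hurt myself", "end my life")


def looks_like_crisis(message: str) -> bool:
    """Return True if the message contains one of the illustrative phrases."""
    text = message.lower()
    return any(phrase in text for phrase in CRISIS_PHRASES)


def build_alert(patient_id: str, message: str) -> dict:
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"Possible crisis message from patient {patient_id}: {message}"}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def triage(patient_id: str, message: str, send) -> bool:
    """Send an alert through `send` if the message is flagged; report whether it was."""
    if looks_like_crisis(message):
        send(build_alert(patient_id, message))
        return True
    return False
```

In production, `send` would be something like `lambda payload: post_to_slack(WEBHOOK_URL, payload)`, where `WEBHOOK_URL` is a placeholder for the channel's webhook. Passing the sender in as an argument keeps the model logic separate from the delivery channel, which is the part the crisis team cared about.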

[00:33:48] And I think this is a great example.

[00:33:50] This could have been a model that we built, highly performant, amazing AUC, amazing precision and recall.

[00:33:57] But no one ends up using it.

[00:33:58] That could have happened.

[00:34:00] That could have happened if we hadn't talked with the crisis team from the beginning.

[00:34:04] You know, maybe we built a separate app, a separate UI for the model.

[00:34:08] Maybe we, you know, surfaced the results in some random table or some random dashboard.

[00:34:12] But instead, because we had a great business stakeholder, a great domain expert who could tell us, Hey, if you really

[00:34:19] want this to get used by people, this is how we're doing it.

[00:34:23] We're doing everything in Slack.

[00:34:24] And so that gave us the idea, okay, you know, if we can get it in there, then it's actually going to have some impact.

[00:34:29] So that would be my number one message to people working in health systems is, you know, building the models.

[00:34:35] I'm sure all of you have the data science expertise to build good models.

[00:34:40] But what's really going to make the difference is do you have the people to deploy those models?

[00:35:44] That's super cool that you guys just built an AI model with basically the front end being Slack.

[00:35:50] That's really meeting the business where they are, because I don't think a data scientist would be like, oh, Slack is the

[00:35:55] optimal place to display statistical results, but it's the place where it's going to get stuff done.

[00:36:03] I love that.

[00:36:03] I think that's really cool.

[00:36:04] Those were great examples.

[00:36:05] Thank you so much for sharing.

[00:36:07] Let's do one more.

[00:36:08] Let's do real estate.

[00:36:10] So how the heck could data science be used in real estate?

[00:36:13] Yeah, real estate is an interesting one.

[00:36:14] And it's funny you chose these two examples, hospitals and real estate, because those are the two domains that we cover in the book.

[00:36:22] Kamala, one of the main characters, she's at a health insurance company.

[00:36:27] And then Steve, the other main character, he's in the real estate space.

[00:36:30] So there's a nice overlap there.

[00:36:32] So for real estate, one thing we talk about in the book is that real estate is really conducive to a number of different data modalities.

[00:36:41] So you have, say, structured data, like tabular data where you have maybe house, real estate pricing data and other structured features like the neighborhood that a property is in or attributes about the property, number of rooms, square footage, number of bathrooms, half baths, whatever, things like that.

[00:37:01] You also have free text data, natural language data.

[00:37:06] You have the descriptions of the properties on maybe your real estate website, how the description of the properties is worded.

[00:37:15] And you can also see how are people responding to those posts.

[00:37:20] And then you also have some geospatial data.

[00:37:23] Spatially, how are these properties organized?

[00:37:25] What is the density of properties in a given area?

[00:37:29] How do the properties relate to the population density?

[00:37:32] What other types of buildings are within a certain radius of a given property?

[00:37:37] Is it in a good school district?

[00:37:39] Is it near parks and outdoor spaces?

[00:37:45] So there's geospatial data.

[00:37:47] So in the book, we talk about how can one leverage these different types of data modalities to do data science.

[00:37:55] So I'll just give you one example of this.

[00:37:58] One thing you could do is say, analyze the wording, how these property descriptions are written on your real estate website.

[00:38:06] And then maybe look at, you can associate that with, say, the number of visitors that a property gets or a number of people who maybe convert on the site or something.

[00:38:16] Right. Maybe of the people who see a property description, how many decide to register for an open house?

[00:38:23] Or how many decide to, I don't know, schedule a call with a realtor or something?

[00:38:27] And so this is a pretty straightforward NLP problem where the outcome is, say, registering for an open house or something like that.

[00:38:37] And then the features are the text of the description.

[00:38:40] And so by doing that sort of analysis, you might be able to understand, OK, what are the keywords?

[00:38:46] What are the phrases that are associated with a higher conversion rate?

[00:38:50] And you can use that to improve the wording of other descriptions of other properties.
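
As a rough sketch of the text analysis described above, one could compare conversion rates for listings whose descriptions contain a given word. The listings, the view and signup counts, and the bag-of-words approach are assumptions for illustration; a real analysis would likely use a proper model and far more data.

```python
import re
from collections import defaultdict

# Hypothetical listings: (description, page_views, open_house_signups)
listings = [
    ("Sunny renovated kitchen near parks", 200, 30),
    ("Cozy starter home, needs work", 150, 6),
    ("Renovated loft with sunny balcony", 100, 18),
    ("Spacious yard, needs updating", 250, 10),
]


def conversion_by_word(rows):
    """Signups per view, pooled over all listings whose description uses each word."""
    views = defaultdict(int)
    signups = defaultdict(int)
    for description, n_views, n_signups in rows:
        # Count each word once per listing, ignoring case and punctuation.
        for word in set(re.findall(r"[a-z]+", description.lower())):
            views[word] += n_views
            signups[word] += n_signups
    return {word: signups[word] / views[word] for word in views}


rates = conversion_by_word(listings)
overall_rate = sum(s for _, _, s in listings) / sum(v for _, v, _ in listings)

# Words whose listings convert better than the overall rate.
promising_words = sorted(w for w, r in rates.items() if r > overall_rate)
```

This is an association question, not a causal one: a word that shows up in high-converting listings may simply describe nicer properties, so any rewording idea it suggests is a hypothesis to test rather than a conclusion.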

[00:38:56] So that's one example with text data.

[00:38:59] For the tabular data, I mean, there's so many examples of real estate pricing models, price forecasting models that have been built.

[00:39:07] I don't know if you know Kaggle, the famous data science contest website, but one of their seminal prediction challenges was can you predict housing prices on Zillow using some structured data sets?

[00:39:20] So if anyone is interested in trying to build a model to predict house prices, I would suggest check out Kaggle and check out what all the winning submissions did and how they engineered their features.

[00:39:32] So I think with the tabular data, there's a lot of interesting stuff that can be done with feature engineering.

[00:39:37] Right.

[00:39:38] So this is all about.

[00:39:39] So let's say you're given data like, OK, it's this many square feet.

[00:39:43] It has this many bathrooms, the backyard is this big.

[00:39:49] But then it's up to the data scientists to think about, OK, how can I engineer some clever features?

[00:39:54] Right. Maybe a clever feature is the ratio of the backyard size to the actual house size.

[00:40:01] Right. Another feature you could engineer is, I don't know, maybe like the number of bathrooms per square foot or so.

[00:40:09] Maybe that's meaningful.

[00:40:10] So I think there's a lot of creativity that data scientists can do in that domain as well.
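
The two engineered features mentioned above can be sketched in a few lines. The `Listing` fields and feature names are hypothetical, and scaling the bathroom ratio to "per 1,000 square feet" is an assumption made here only to keep the number readable.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    house_sqft: float
    backyard_sqft: float
    bathrooms: float  # assumption: a half bath counts as 0.5


def engineer_features(listing: Listing) -> dict:
    """Derive ratio features from the raw listing attributes."""
    return {
        # Backyard size relative to the house itself.
        "backyard_to_house_ratio": listing.backyard_sqft / listing.house_sqft,
        # Bathroom density, scaled per 1,000 square feet for readability.
        "bathrooms_per_1000_sqft": 1000 * listing.bathrooms / listing.house_sqft,
    }


features = engineer_features(Listing(house_sqft=2000, backyard_sqft=1000, bathrooms=2.5))
```

Whether either ratio actually helps a pricing model is an empirical question; the point is only that the raw columns are a starting point, not the full feature set.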

[00:40:16] Perfect. That was a great example.

[00:40:18] I love those.

[00:40:18] Is there a domain where you think that data science can't help?

[00:40:22] I think definitely as a data scientist, I believe that going back to the hammer problem.

[00:40:29] Right. We have these tools that we think are great, and we go looking for problems.

[00:40:38] There are definitely some problems where data science is not going to have the answer.

[00:40:42] For example, data science can often tell you the who, the what, the where, the how.

[00:40:47] But it can't always tell you the why.

[00:40:49] Right. So oftentimes the most effective combination for answering a question is a quantitative approach plus a qualitative approach.

[00:40:59] And you see this all the time with user interviews.

[00:41:01] Right. So you can track the quantitative data from your NPS survey, from customer satisfaction surveys.

[00:41:09] But oftentimes the most valuable information you're going to get is when you actually sit in front of a human being and just have a conversation.

[00:41:17] So I think it's important for people to realize that you need both.

[00:41:21] Right. I mean, now with generative AI, you can actually apply AI to help you analyze the qualitative results as well.

[00:41:30] Right. You can record transcripts of your interviews and have AI help you analyze those transcripts.

[00:41:35] But that's kind of a separate note.

[00:41:37] But if you're a business leader trying to make business decisions, I think you want to have a holistic approach where you consider quantitative and qualitative aspects.

[00:41:45] I think that's kind of what every single page of the book talks about: you need the data scientist, but you also need the business leader.

[00:41:53] You need the domain expert plus the data expert.

[00:41:58] And the combination of the two is what ends up leading to high value.

[00:42:02] Yeah, it's a good question.

[00:42:04] It's hard to know if there are fields like that out there, but there are definitely fields where data science is easier to apply.

[00:42:10] Like if I think about dance, for example, I don't know very much about dance at all.

[00:42:15] But if I just think about dance, I mean, there's definitely ways you could analyze dancers' routines geospatially and stuff like that.

[00:42:22] But that's a lot harder because you're relying either on cameras or sensors.

[00:42:27] So even if data science can help down the road, it's a lot of work to even get there in the first place.

[00:42:34] Right. Like even recording that data is more difficult than say, like web scraping something or even just having the tabular data to begin with.

[00:42:42] Anyways, it's really interesting.

[00:42:44] Yeah. I mean, I don't want to discount that right now.

[00:42:48] If you look at the sports analytics that they're doing, they're analyzing the movements of athletes during sports games.

[00:42:55] You know, basically modeling the movement of their bodies.

[00:42:59] And so the tech has caught up in my view.

[00:43:01] Like there's not too many business problems that we can't solve.

[00:43:05] Again, it all comes back to what is the business problem that you're solving?

[00:43:08] Right. If you go to a dance studio and be like, hey, I have this great new method where I can analyze the movement of your dancers and identify, you know, inefficiencies.

[00:43:16] And so I can coach your dancers and tell them how to make their movements more ergonomic or more efficient.

[00:43:23] But if that's not useful to the dance studio, who cares?

[00:43:25] Right. So it all comes back to what is useful.

[00:43:28] And for some business problems, you don't need data science.

[00:43:31] Right. Let's be real here.

[00:43:32] For some business problems, you just need some thoughtful strategy or you need some user interviews.

[00:43:39] And so as data scientists, we also have to be realistic about understanding and admitting when we're not the right people for the job.

[00:43:46] I like that. Have you seen the AWS football analytics at all?

[00:43:50] I don't know if you've seen any of that.

[00:43:52] I watched a lot of NFL.

[00:43:54] So they have like these AWS next gen stats and they're like a huge sponsor of the NFL.

[00:44:00] And I'm always thinking, man, I can't believe that they make a return on investment on these advertisements, but whatever.

[00:44:05] So they have all these next gen stats and they pull them out, maybe during timeouts or something like that.

[00:44:11] And they'll show you certain things like how fast a player was running or like distances they jumped or something like that.

[00:44:17] And one of the things that they rolled out, I hadn't seen it before that they rolled out this year was predicting whether a player is going to blitz or not.

[00:44:25] Basically surprising the quarterback, running at the quarterback when they're not usually supposed to.

[00:44:30] And they show this after it's happened a lot of the time and be like, yeah, look, he was like really likely to blitz.

[00:44:37] And it's super interesting because I don't know, like the only way I know that these stats are being used is to entertain people like me on television.

[00:44:47] But if you could somehow take that model and give it to coaches before the play is run, that would provide a lot of value at that point.

[00:44:54] Yeah, that's an interesting example.

[00:44:56] And I'm curious, are they attracting more viewers because they're showing these next gen stats?

[00:45:02] Right. And so that's an interesting question that they should think about as a business.

[00:45:06] Right. How much value are they?

[00:45:08] Is the value solely coming from attracting viewers?

[00:45:11] If so, I'm sure they're A/B testing that.

[00:45:14] And if not, if like you said, are they showing it to coaches and can they show that, hey, coaches who have access to next gen stats are getting better outcomes than coaches who don't have next gen stats?

[00:45:25] If they can show that, then they've struck a gold mine.

[00:45:28] Yeah. Then I feel like you almost get into cheating, basically, because at that point, it's like, wow, your data science team becomes even more impactful in the NFL because not only are the players competing against each other, but the data scientists are competing against each other.

[00:45:42] Who has the best like blitz detection model?

[00:45:45] I think that would be fascinating.

[00:45:47] I might change my career and go to the NFL as a data scientist if that rolls out.

[00:45:51] OK, awesome. I loved hearing all of your insight, all of your ideas, all of your stories.

[00:45:57] Thank you so much for sharing them with us.

[00:46:00] If people enjoyed listening to your thoughts,

[00:46:04] If people want to go and discover a little bit more, where should they go?

[00:46:07] Should they check out the book?

[00:46:08] Where can they go to find that?

[00:46:09] Let us know.

[00:46:11] Yeah, if you want to learn more, check out winningwithdatascience.com.

[00:46:15] You can learn more about the book, learn more about the work that my co-author and I do.

[00:46:20] And I hope you, you know, I wish you success in your data science journey, whether you're a business person or a data science person.

[00:46:27] For sure. So guys, go check it out.

[00:46:29] Winning with Data Science.

[00:46:31] I've read a good chunk of it.

[00:46:32] It's great, especially if you see the importance of merging domain and business with data.

[00:46:39] So Akshay, thank you so much for coming on the podcast and we'll talk to you soon.

[00:46:43] Thanks for having me.

[00:46:49] Do you guys feel like data science gurus now?

[00:46:52] Or at least maybe not gurus, but you understand the concept of data science a little bit better.

[00:46:57] You understand how certain businesses would use it.

[00:46:59] You understand some of the key vocabulary words.

[00:47:01] I hope that's the case.

[00:47:03] If it is, would you mind sharing with a friend?

[00:47:05] That's the number one way our podcast grows is by you sharing with your students, your coworkers, your peers.

[00:47:12] You know, whoever you're around that wants to get into data.

[00:47:15] Why not tell them about the podcast?

[00:47:16] Costs you $0 and helps us grow and continue to provide awesome free content like this.

[00:47:22] As always, thank you guys for listening and I'll see you in the next episode.