164: I built an entire data pipeline in 30 minutes using only AI (no code required)
June 12, 2025
164
33:43


Try Keboola 👉 https://www.keboola.com/mcp?utm_campaign=FY25_Q2_RoW_Marketing_Events_Webinar_Keboola_MCP_Server_Launch_June&utm_source=Youtube&utm_medium=Avery

Today, we'll create an entire data pipeline from scratch without writing a single line of code! Using the Keboola MCP server and Claude AI, we'll extract data from my FindADataJob.com RSS feed, transform it, load it into Google BigQuery, and visualize it with Streamlit. This is the future of data engineering!

Keboola MCP Integration:

https://mcp.connection.us-east4.gcp.keboola.com/sse

I Analyzed Data Analyst Jobs to Find Out What Skills You ACTUALLY Need

https://www.youtube.com/watch?v=lo3VU1srV1E&t=212s

💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter

🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training

👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa

👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS

00:00 - Introduction

00:54 - Definition of Basic Data Engineering Terms

02:26 - Keboola MCP and Its Capabilities

07:48 - Extracting Data from RSS Feed

12:43 - Transforming and Cleaning the Data

19:19 - Aggregating and Analyzing Data

23:19 - Scheduling and Automating the Pipeline

25:04 - Visualizing Data with Streamlit


🔗 CONNECT WITH AVERY

🎥 YouTube Channel: https://www.youtube.com/@averysmith

🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/

📸 Instagram: https://instagram.com/datacareerjumpstart

🎵 TikTok: https://www.tiktok.com/@verydata

💻 Website: https://www.datacareerjumpstart.com/

Mentioned in this episode:

July Cohort of DAA

Join the July Cohort of DAA and become an analyst! Be sure to check out our current deal to save BIG! See you in class!

https://www.datacareerjumpstart.com/daa

💙 Thank you for subscribing

YouTube Channel

[00:00:00] Avery Smith: We are going to walk through how to build an entire data pipeline from absolute scratch without writing a single line of code. We'll be taking data from an RSS feed, extracting the content from it, transforming the data in multiple ways, and loading it into a Google BigQuery database, and then displaying the data via Streamlit, all without writing one line of code.

[00:00:20] Avery Smith: How? Well, it's all thanks to Keboola's brand new MCP server, and Keboola happens to be the sponsor of this episode, so thanks so much for supporting my channel. AI is officially here, so it's time that we start using it to become data professionals instead of just being scared of it. And the cool thing is that AI and the Keboola MCP server allow for data pipelines to be built using natural language instead of code.

[00:00:41] Avery Smith: The crazy thing is this allows both technical and even non-technical users to work with complex data infrastructure, just through simple conversations like how you and I would talk, and I'm about to show you exactly how. But I want to quickly give some basic definitions in case you're new to data engineering and data [00:01:00] pipelines in general, because a lot of these terms might sound scary at first.

[00:01:03] Avery Smith: Trust me, they're not as bad as they seem. So if you're already a data engineering wizard, feel free to skip this part. But if you're kind of a data engineering newbie like me, this is gonna be useful. Let's go ahead and start with data pipeline. A data pipeline is like a conveyor belt for your data. It takes information from one place, it processes it, and sends it to another place,

[00:01:23] Avery Smith: so you can use it for things like reports or dashboards down the road. The next key term that we need to know is ETL. ETL stands for Extract, Transform, and Load. It's a specific type of data pipeline, but it's just a fancy way of saying: grab data from somewhere (extract), clean it up or reformat it (transform),

[00:01:42] Avery Smith: and then move it to where it actually needs to go (load). Scheduling is another key term in data pipelines. Scheduling just basically means setting up when your data tasks should run automatically, for example, every day at 2:00 AM or once an hour. It's basically the running [00:02:00] of these data tasks automatically, requiring no human input at all.
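To make those three letters concrete, here is a minimal, hypothetical ETL sketch in Python. Nothing in it comes from the episode: the feed URL, the output file, and the helper names are all made up for illustration, and the real pipeline built below uses Keboola components rather than a hand-written script like this.

```python
# Toy ETL sketch (illustrative only; not the Keboola pipeline built in this episode).
import csv
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/jobs.rss"  # hypothetical feed URL

def extract(url: str) -> str:
    """E: grab the raw RSS XML from somewhere."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def transform(raw_xml: str) -> list[dict]:
    """T: clean it up / reformat it into tidy rows."""
    root = ET.fromstring(raw_xml)
    return [
        {"title": item.findtext("title", ""), "link": item.findtext("link", "")}
        for item in root.iter("item")
    ]

def load(rows: list[dict]) -> None:
    """L: move it to where it needs to go (here, just a CSV on disk)."""
    with open("jobs.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "link"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract(FEED_URL)))
```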

[00:02:04] Avery Smith: So where do you even build data pipelines and ETL? Well, it's easiest done in a data operations platform, or a data ops platform for short. A data operations platform is a tool that helps you manage all of your data tasks. It's like a control center where you can build, run, and monitor these data pipelines so everything works smoothly,

[00:02:23] Avery Smith: start to finish, 'cause it's a lot of work setting these data engineering pipelines up. Keboola is one data ops platform that helps you build data pipelines without needing to write a ton of code. It lets you connect different data sources, process the data, and get it ready for analysis, all in one simple place.

[00:02:39] Avery Smith: And Keboola recently released this super cool thing called the Keboola MCP. Now, MCP is a fancy acronym that stands for Model Context Protocol. Let's go ahead and break that down. Model is referring to an AI model like ChatGPT-4o or Claude Opus 4. The context part is referencing, well, what context do those models have, and [00:03:00] specifically how you'll be using MCP to expand those models' contexts.

[00:03:03] Avery Smith: Protocol is how two systems talk together. So this is just basically a way to give an AI model more context, specifically your context: your data, your actions, et cetera. Think of MCP like a USB-C port for AI applications. Just as USB-C on your computer or your laptop provides a standardized way to connect to external devices like a mouse, external storage, a webcam, et cetera,

[00:03:28] Avery Smith: MCP provides an easy way to connect AI models to different data sets, different data sources, different data tools. And then the last two things you're gonna want to know are BigQuery and Streamlit. BigQuery is basically Google's cloud-based data warehouse. It's basically Google's database in the cloud. It's where you can store and analyze really large amounts of data super fast using SQL,

[00:03:48] Avery Smith: and that's what we'll be using to store our data for our data pipeline later on in this episode. And then there is Streamlit. Streamlit is an open-source Python library owned by Snowflake that makes it super easy to build interactive web [00:04:00] dashboards and web apps for your data projects. You write a few lines of Python code and boom, you've got a published dashboard that you can literally send the URL to anyone in the world and they can access it.
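If you have never seen it, "a few lines of Python" really is all a basic Streamlit app takes. This hello-world sketch is not the dashboard built later in the episode, just an illustration with made-up numbers; you would save it as app.py and run `streamlit run app.py`.

```python
# app.py - minimal Streamlit example (made-up data, not the job-board dashboard built later).
import pandas as pd
import streamlit as st

st.title("Hello, Streamlit")
df = pd.DataFrame({"skill": ["SQL", "Python", "Excel"], "mentions": [120, 95, 80]})
st.bar_chart(df.set_index("skill"))
```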

[00:04:09] Avery Smith: And that's how we'll do the data visualization portion of our data pipeline project. Okay, I think that's all the definitions you need. So now, if you've been consuming my episodes in the past, first off, thank you so much. I'm happy to have you back. And hey, please subscribe if you haven't already. Second, you'll know that I'm all about practical examples.

[00:04:26] Avery Smith: To learn data personally, I need hands-on examples to actually learn anything. So I wanna walk you through a concrete use case of actually building a data pipeline from scratch, using only normal language like we would talk with, instead of writing hundreds if not thousands of lines of code. And honestly, this is what I see the future of data engineering looking like.

[00:04:46] Avery Smith: In a previous episode, I talked about how I have my own data job board, FindADataJob.com, and I analyzed about 3,000 different data job listings to find out what data skills are actually in demand. So I'll have a link to that episode in the show notes down below so you [00:05:00] can consume it after this one.

[00:05:01] Avery Smith: But in that video, I kind of exposed myself for using Google Sheets as a database, and I'll go ahead and expose myself a little bit more. Not only do I use Google Sheets as my database, I actually do all of my transformations in there as well. Not only is it extremely janky, but it's becoming extremely slow and less and less reliable.

[00:05:20] Avery Smith: So today we'll be building a real data pipeline for my data job board, FindADataJob.com, and that's gonna be awesome 'cause it's going to replace my existing lousy, kind of non-existent data pipeline. The goal is to have a clean database of thousands of data analyst job listings with enhanced data, like what skills the job requires, what experience level, where these jobs are located, et cetera.

[00:05:40] Avery Smith: And then we'll use that clean database to create reports, charts, graphs, videos, insightful information, all about the data job market as a whole. Be on the lookout for more content like that. But this is just my use case; literally, this is just something where I could really use a tool like Keboola. You could easily follow this exact same process [00:06:00] for your use case, for all sorts of different use cases.

[00:06:02] Avery Smith: For example, retrieving financial data from something like Supabase, doing some transformations like currency exchanges, and then uploading all that data into BigQuery every day. Or another example: using an API to fetch current Pokémon card pricing data, cleaning that data set up, and then creating a historic database for a time series analysis.

[00:06:22] Avery Smith: By the way, I'm a bit of a Pokémon nerd, and I'd really like to analyze that Pokémon card price data set. So if you're a fellow Pokémon nerd too, and you wanna see me do that analysis, leave a comment down below. Okay, but back to today's use case. My data job board has an RSS feed where it stores the latest 1,000 data jobs.

[00:06:39] Avery Smith: RSS stands for Really Simple Syndication, and basically it's a web feed format used to distribute and syndicate content from a website. Here's what the RSS feed looks like raw: super ugly and hard to analyze. And by the end of this episode, we'll have a beautiful dashboard that looks something like this.

[00:06:55] Avery Smith: So let's see how we go from really messy to this clean dashboard. First, [00:07:00] you'll need a Keboola account. You can create one for free by going to keboola.com/mcp. After that, you'll need a Claude Pro account or some sort of other MCP client like ChatGPT, Cursor, or VS Code. In this walkthrough, we'll be using Claude Desktop, which is right here.

[00:07:18] Avery Smith: And what you'll do is you'll click the search and tools icon right here, go to the bottom, hit Add Integrations. You'll go to the very bottom and hit Add Integration here. This is where you'll type in an integration name like Keboola, and then type in this long link right here, which we'll have in the show notes down below.

[00:07:37] Avery Smith: Hit Add. If that works, you should now see Keboola in your integrations tab right here, like me. With that, you've now hooked up Claude to your Keboola account, basically, and then you're ready to get going. Our first step of our data pipeline is to figure out how to get the data from our RSS feed. RSS feeds are really an industry standard to store data and to share data.

[00:07:59] Avery Smith: So [00:08:00] honestly, even with my 10 years of experience in the data field, I have no clue how to get the data, how to extract the data from the RSS feed. Like, seriously, I have no clue how to access RSS data at all. I've never done it in all 10 years of my experience. Luckily, that's where Keboola MCP and Claude come into play.

[00:08:17] Avery Smith: I don't have to know how to access the data. I just need to know how to ask AI to do it for me. This is called Vibe Coding. Vibe Coding is a relatively new approach to writing code where you use AI tools to generate code based on just normal natural language descriptions, rather than coding out everything manually.

[00:08:36] Avery Smith: This allows you to build stuff much, much faster, especially if you don't know exactly how to do it. So let's go ahead and vibe code. We're gonna go ahead and ask Claude here to do something like this: create a custom Python component in Keboola that only extracts the raw data from this [00:09:00] exact RSS feed. And I'll go ahead and paste in my RSS feed, but for privacy reasons, it will be blurred.

[00:09:07] Avery Smith: Please use the ID in the RSS feed as a primary key. Please don't do anything else. And I'm adding that because Claude has a mind of its own and will actually go in and do all sorts of things for us. Let's go ahead and press enter, and we'll see Claude working, actually doing this vibe coding for us. So you can kind of watch this as it builds, and we'll see specifically what it's going to do.

[00:09:32] Avery Smith: It's saying it's going to help us. We'll hit always allow. It's getting details about custom Python components in Keboola. It's looking at the RSS feed to understand its structure. It found our data, our RSS data, right here. It's gonna do a web search to get information about the feed structure. Now it's going to create a Python component configuration that will extract the RSS feed data.

[00:09:54] Avery Smith: And based on the results I can see, the RSS feed typically contains the job listings with IDs that can be used as primary keys. [00:10:00] Let me create the Python code to parse the RSS feed. And you can see that it is creating a component root configuration inside of Keboola. This is actually creating stuff in my Keboola data operations platform that we can run, that's going to actually do the task that I gave it. It created the custom Python component in Keboola that, uh, extracts the raw data from my job feed.

[00:10:20] Avery Smith: Here's the component ID, the configuration ID, here is the feed extraction right here, the primary key, and then it created this output table. This is where the component extracts the ID, the job title, the description, the link, the publication date, the unique identifier, extracted at, and we can go ahead and now run this component.
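The full component code is not shown on screen, but conceptually an extractor like this boils down to a short script. The sketch below is my own hedged approximation using the third-party feedparser library, a placeholder feed URL (the real one is blurred in the video), and an assumed output path; it is not the exact code Claude generated, nor Keboola's documented component layout.

```python
# Hypothetical RSS extractor sketch (not the exact code Claude generated for Keboola).
import csv
import feedparser  # third-party: pip install feedparser

FEED_URL = "https://example.com/jobs/rss"   # placeholder; the real feed URL is blurred
OUTPUT_PATH = "out/tables/rss_jobs.csv"     # assumed output location for illustration

feed = feedparser.parse(FEED_URL)

with open(OUTPUT_PATH, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "title", "description", "link", "published"])
    writer.writeheader()
    for entry in feed.entries:
        writer.writerow({
            "id": entry.get("id", entry.get("link", "")),  # feed ID used as the primary key
            "title": entry.get("title", ""),
            "description": entry.get("summary", ""),
            "link": entry.get("link", ""),
            "published": entry.get("published", ""),
        })
```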

[00:10:41] Avery Smith: It says down here that the component is now ready to use. You can run it to extract the current job listings from the RSS feed, and the data will be stored in this table. Let's go ahead and look in Keboola. Here is my dashboard right here. Then I'll go to the components right here, and I can see it in my custom Python components.

[00:10:59] Avery Smith: It went ahead and [00:11:00] created this three minutes ago: RSS Feed Extractor for FindADataJob.com, created three minutes ago. So perfect, it did exactly what it said it was going to do inside of Keboola. Next, I'll go ahead and tell Claude to please run that component, and we're gonna go ahead and see if that component is actually working.

[00:11:16] Avery Smith: So it created the code to run it. Now we're going to see if it actually can run it and if we're going to hit any errors. The job is now in queue and will begin processing shortly. Let me check on the status to see how it's going. Okay, perfect. So the job just finished. It said: excellent, the job completed successfully.

[00:11:31] Avery Smith: It lasted 32 seconds. It went ahead and, uh, found all this information for us and stored it in that table right here. So we can go ahead and see that table in Keboola. Once again, from Keboola, if we go to components and then we go to our latest right here, we can click on it. We can go ahead and see that yes, it was a successful job right here, and we can see that it actually sent the table to this destination, which is actually a Google BigQuery table.

[00:11:59] Avery Smith: And [00:12:00] if I go to data sample right here, it will show me all of the data. So yes, we have an ID column, we have a description, we have extracted at, we have the link. So for example, I can click this link right here, copy it to my clipboard, hit enter, and this is one of the data jobs on my job board right here.

[00:12:17] Avery Smith: So yes, it is working. Everything's looking great. We can go back to overview and we see that the row count is a thousand rows, and that's because my RSS feed saves the last 1000 job postings from my job board. So, so far so good. That was the E or the extraction part of our data pipeline. And we successfully extracted the data from the RSS feed into my Google BigQuery database right here.

[00:12:43] Avery Smith: So the first step is done. After E, of course, comes T, and that's the transformation part of the data pipeline. Now, transformations are any sort of changes that we make to our raw data. If we look at our data right here, we can actually see it is quite messy. In fact, it doesn't [00:13:00] really have any super clean columns.

[00:13:01] Avery Smith: We have this description, which is literally going to be the job description, and that's gonna be quite long. So it's basically a copy-pasted version of this. Now, the RSS does store this in HTML; that's why you see all these weird HTML tags. That's not too big of a deal. We have the extracted at date.

[00:13:17] Avery Smith: This is when we just ran it, just barely. Uh, we have the link to the actual job description. We have the date that it was published; that's a big one, that one might come in handy. But the most important column is probably what's called the title column that you'll see right here. Now, you'll notice that this column has three dashes in it.

[00:13:35] Avery Smith: You'll see that the first one in this case is Senior Data Analyst, then it's a dash, then Financial Crime/AML, and then there's a dash here, K2 Integrity, and then there's that extra dash right there: New York City, New York, United States. So the trick here, and it's a little bit tricky, is I'll have to tell Claude, you know, kind of what's going on.

[00:13:54] Avery Smith: I think most of these only have two dashes, so three different parts. The first one is going to be the [00:14:00] title; in this particular case, it's Business Intelligence Engineer, Community Engagement. The next section is the actual company that the job is at, in this case Amazon. And the third one is the location.

[00:14:10] Avery Smith: Sometimes that location can be just "Remote"; that's the location. So it does look like we need to separate these into three separate columns. So for example, once again: Data Analyst, Growth, that is the job title. The company would be CookUnity, and the location would be New York, United States. And so, once again, we don't really have to do anything other than just really tell Claude what we want to do for this cleaning transformation, where we know that we're going to take that particular column, the title column,

[00:14:38] Avery Smith: and generate three new columns from it. And literally, we don't have to code it at all. I just have to tell Claude what I want. In this particular case, I'm going to say: please take the title column and split it into three new columns, one for the actual job title, one for the [00:15:00] company, and one for the location. The title column

[00:15:05] Avery Smith: usually has all three separated by a simple dash, but there can be extra dashes too, so be careful. And I'm going to ignore that I misspelled all of these words here. Press enter, and let's see what Claude and Keboola come up with. It's saying: I'll create a SQL transformation to split the title column into the job title, the company, and the location.

[00:15:33] Avery Smith: Let's examine the current data structure and then create the transformations. Great. Let me check the SQL dialect to make sure I understand what's going on. Perfect, I see the pattern now. The titles are formatted as job title, company, location. Some titles have quotes around them, and the location can be quite detailed: city, state, country.

[00:15:47] Avery Smith: City, state, country. Let me create a SQL transformation to properly split these. Now I needed to add the proper input and outputting mappings to this transformation. So it's actually writing SQL code right now that's going to generate [00:16:00] this transformation for me. Okay, so Claude just finished and it says, SQL Transformation created successfully.

[00:16:05] Avery Smith: It kind of explains what the transformation does, which is exactly what we said: it takes the title and separates it into those three different columns. It handles some complex cases it found, like we saw, the multiple dashes. There are some quoted titles; well, it looks like it removed the quotation marks, which I didn't even ask it to do.

[00:16:19] Avery Smith: Varying formats, which we kind of talked about earlier. It handles those titles with one, two, or three sections, okay, that's the multiple dashes. It preserves the original and keeps the original column as well. And now we have new output data inside our BigQuery, which is going to be the parsed table. And it gives us a little example right here.
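The fix Claude actually shipped is a BigQuery SQL transformation, which never appears in full on screen. As a hedged illustration of the underlying idea only, here is how the same "split on the dash, but tolerate extra dashes" heuristic could look in plain Python, assuming the last two dash-separated segments are the company and the location:

```python
# Illustrative splitting heuristic (my own sketch, not Claude's actual SQL transformation).
# Assumption: the last two " - " separated segments are company and location, and
# everything before them is the job title (which may itself contain dashes).
def split_title(raw: str) -> tuple[str, str, str]:
    parts = raw.strip().strip('"').rsplit(" - ", 2)
    while len(parts) < 3:       # pad short cases so unpacking never fails
        parts.insert(1, "")
    title, company, location = parts
    return title, company, location

print(split_title(
    "Senior Data Analyst - Financial Crime/AML - K2 Integrity - New York City, New York, United States"
))
# -> ('Senior Data Analyst - Financial Crime/AML', 'K2 Integrity', 'New York City, New York, United States')
```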

[00:16:36] Avery Smith: What we can do is actually go over into Keboola and go to our dashboard right here, go to transformations, and we can see the transformation was indeed created, which is awesome. Uh, we can see that it has the input, which is going to be the jobs that we found previously. Then the destination table is going to be this new table here, so we can go check that storage over here.

[00:16:57] Avery Smith: Okay. So inside of Keboola, we can see in the transformations [00:17:00] pane that it created the split-job-title-into-components transformation. But you'll notice that there is this red exclamation mark right here, which basically means our job failed. So I'm actually going to ask Claude and say: how did our transformation run? And I'm gonna let it tell me that it failed here.

[00:17:16] Avery Smith: Look, it already found the error. It says it failed with a syntax error; the issues with the SQL query are a missing create table statement, and the query format is incorrect for a BigQuery transformation. Let me go ahead and fix this. So this is pretty impressive, that it realized what the error is and it's gonna go try to fix it for me.

[00:17:31] Avery Smith: Once again, this is probably expected, especially when we're vibe coding. We literally have not typed one thing of code; it's not gonna be a hundred percent perfect, but the cool thing is, with a couple iterations, it'll eventually get there. After a little back and forth with Claude, it just told me that it got it to work.

[00:17:46] Avery Smith: So what happened was it was really struggling to actually do the transformation. And I think one of the reasons why is that Claude, and this is such a thing with Claude, wants to do more than you ask it to. [00:18:00] Finally, after I redid my prompt here, I said the exact same thing, but at the end I added: please keep it as simple as possible.

[00:18:07] Avery Smith: We can always improve it later. Uh, it then rewrote the entire SQL script right here. You can kind of see what the SQL script looks like, but eventually it hit an error and it fixed itself. It fixed the error and checked on the final version, and yes, we finally got it to work. That's what it's saying.

[00:18:25] Avery Smith: So it says that we created the job title, the company, and the location. Let's go ahead and see if that did work. I am seeing the green check mark here on the transformations part of Keboola, which does mean that it ran successfully. We can go down here to the table output mapping, click right here. This is the jobs parsed data table.

[00:18:43] Avery Smith: Now we can go to data sample, and okay, we are seeing the columns here of the original title, the job title, the company, and the location. So we're seeing Walmart, we are seeing acquisitions, we're seeing Grow; analytics engineer, analytics engineer, analytics engineer, BI analyst; digital [00:19:00] loop, fractal, Toronto, Canada.

[00:19:03] Avery Smith: It does look like sometimes we have the country, remote, and stuff like that, but that's at least doable for right now. It looks like our transformation worked so far. It did take a little back and forth with Claude to get this to work, but once again, I did not write one line of code to do this. Claude figured it out on its own with the help of Keboola. It actually ran everything for me.

[00:19:18] Avery Smith: It actually ran everything for me. I. Now that we have that data cleaned up, we would like to do a couple more transformations, some more tees, if you will, to make our data more useful. And what these transformations are going to be are aggregations. And what I mean by aggregations is just an opportunity to basically do the equivalent of a pivot table in Excel or count group buy and sql.

[00:19:39] Avery Smith: Basically, I wanna know how many of our jobs have unique job titles, like what's the most common job title? So for instance, I'm seeing three analytics engineers, so we know there are at least three analytics engineer positions. I wanna see what companies have posted the most jobs. I didn't see any repeats in this little example right here.

[00:19:55] Avery Smith: And then I wanna see what locations are hiring the most. So I'm gonna simply go [00:20:00] into Claude here, and I'm gonna say: create a new transformation that aggregates the job postings by unique job titles. I'd like to see how many data analyst jobs there are, for example. Now I'm going to remind Claude to please keep it simple to get started.

[00:20:22] Avery Smith: So let's go ahead and see how Claude and Keboola do together. Okay, it just finished, and I'm laughing because it did not a hundred percent keep it simple. So it says the job aggregation results: it did run, it took about 40 seconds to run, and it created this new table. So let's go ahead and check it out.

[00:20:38] Avery Smith: We're gonna go out of this data and into our transformations. We have the job title aggregation. Now, if we go to where the data is stored and then we go to data sample right here, we'll be able to see the different titles and the job count that is there. Now, it's really just ones right now, and probably the reason is that

[00:20:59] Avery Smith: a lot of [00:21:00] these job titles are going to be unique, and actually Claude realizes that. So, um, it actually said that, you know, I finished the job and here's the SQL query. It basically is selecting the job title, the count of the job title, the count of the distinct companies, where the job title is not null,

[00:21:16] Avery Smith: grouping by the job title and then ordering it by the job count, descending. So this is now where we can, uh, see the results right here. And, uh, actually, Claude did not keep it simple and it told me some fun things, like the most popular job titles: Financial Analyst was 53 jobs.

[00:21:32] Avery Smith: Senior Financial Analyst was next at 48, Data Analyst was 45, Business Analyst was 29, Senior Business Analyst was 18. And then it actually does look at data analyst breakdowns and realizes, you know, oh, there are exact matches, and then there are other related jobs. Regardless, financial and data roles dominate; data analyst is the third most popular, four different companies, good variety.
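Based on the query Claude described a moment ago (select the job title, count the jobs and distinct companies, filter out null titles, group by title, order by count descending), the aggregation amounts to a count-and-group-by. Here is a hedged reconstruction of that kind of query, issued from Python with the google-cloud-bigquery client; the project, dataset, and table names are placeholders, not the real ones in my Keboola project.

```python
# Hedged reconstruction of the aggregation described above (table name is a placeholder).
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

QUERY = """
SELECT
  job_title,
  COUNT(*)                AS job_count,
  COUNT(DISTINCT company) AS company_count
FROM `my_project.my_dataset.jobs_parsed`  -- placeholder table name
WHERE job_title IS NOT NULL
GROUP BY job_title
ORDER BY job_count DESC
"""

for row in client.query(QUERY).result():
    print(row.job_title, row.job_count, row.company_count)
```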

[00:21:52] Avery Smith: Good variety. So it didn't a hundred percent keep it simple. But we do now have this, uh, database right now of the aggregated [00:22:00] data. We're gonna create one more aggregated one as well. So we'll tell Claude. Awesome. Thanks. Always good to thank our AI agents, right? Thanks. Please now create another transformation with an aggregated table for the companies.

[00:22:18] Avery Smith: I want to know what companies are hiring the most. Press enter here, and it's working away. I'm gonna sit back, relax, and watch it do its thing. Okay, it says that it just finished the aggregation results. This is the SQL query it wrote right here. Once again, very similar to the other one, nothing super crazy, but I didn't have to write it, so I'm not gonna complain.

[00:22:37] Avery Smith: And the top hiring companies: it says Amazon was number one, CACI, not sure what that is. And wow, that is a lot lower than the other one. City of New York, six. Chewy, six. Awesome. So Amazon is the, uh, biggest one. Let's go ahead and go check out the data table here, so I can just go to storage right here and probably find it.

[00:22:56] Avery Smith: We can go to the company counts, that's probably the right table, go ahead and go to [00:23:00] data sample, and take a look: these are the company names and the job count here. So that transformation worked as well. Now, I could obviously probably do more transformations, but I think you get a clear picture of the power of what this can do, and we've already transformed and loaded all of our transformations into our storage inside of Google

[00:23:18] Avery Smith: BigQuery. Now, a key part of all data pipelines is what's called orchestration or scheduling. And the idea is, you know, I post new jobs to Find A Data Job literally every single day, multiple times throughout the day. So for instance, this job was just added three hours ago. So what we need to do is tell Keboola: hey, I want you to go check for new jobs on a scheduled basis.

[00:23:38] Avery Smith: And in Keboola, these are called flows. They're basically repeated tasks that Keboola will run on a regular basis. So we want to run that extract and those transformations on a schedule, maybe multiple times throughout the day or something like that. So what I'll do is I'll go back to Claude and I'll say: create a Keboola flow that runs the extractor for the RSS [00:24:00] jobs, and then does the cleaning transformation as well as the aggregation transformations,

[00:24:10] Avery Smith: every single hour. We'll press enter and we'll see how it does. Okay, the MCP has told me that it is now finished, so it has created the hourly job data pipeline flow. Basically it does phase one, which is the extraction; phase two, the data cleaning and parsing; and phase three, the aggregation and analysis.

[00:24:28] Avery Smith: And, uh, we can go ahead and, uh, make sure that this is running every hour in the Keboola console. So what we're gonna do is we're gonna go over here, we're gonna refresh the flows page, and we'll see that there's the hourly data pipeline right here. We can open this up, and we can go to schedules right here and just make sure that we hit create schedule.

[00:24:45] Avery Smith: That's going to be every hour at 45. And so this is when it's going to run. We'll click set up schedule, and now that is going to run automatically, which is awesome. Here's what our flow looks like: once again, the extractor, the cleaning, and then the [00:25:00] aggregation, and that is running every hour, at 45 minutes past the hour.
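"Every hour at 45 minutes past" is the kind of thing schedulers typically express as a cron string like 45 * * * *. The short sketch below only illustrates what that expression expands to, using the third-party croniter package; it is not how Keboola stores its schedules, which I have not checked.

```python
# Illustrative only: "45 * * * *" means minute 45 of every hour, every day.
# croniter is a third-party package (pip install croniter), unrelated to Keboola itself.
from datetime import datetime
from croniter import croniter

schedule = croniter("45 * * * *", datetime(2025, 6, 12, 9, 0))
for _ in range(3):
    print(schedule.get_next(datetime))
# -> 2025-06-12 09:45:00, 10:45:00, 11:45:00
```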

[00:25:04] Avery Smith: Awesome. Now that we got that scheduling automated, the next step in the data pipeline process is actually to visualize all these different insights and tables that we've created. And once again, Keboola has made this really simple for us. So all we have to tell Keboola through Claude is: can you take all the data we created

[00:25:23] Avery Smith: and build a simple Streamlit dashboard? Please keep it simple for now. And what this is going to do is it's going to create code, Python code, that we will be able to use to create a Streamlit app. You can see that it's coding over here on the right-hand side inside of Claude, and all we have to do is copy and paste this into Keboola right now, and it will deploy our Streamlit app on the cloud automatically for us, which is super fun and easy.

[00:25:53] Avery Smith: It just finished writing the code over here. It wrote a ton of code; we'll see how simple it actually kept it. And it basically explained, [00:26:00] uh, what's going on over here on the left-hand side. Let's go ahead and just press copy over here and head back over to our dashboard inside of Keboola. What you wanna go to next is components and then data apps.

[00:26:12] Avery Smith: Create a new data app right here, and we'll call it Job Board Dashboard here, and we will hit create. All we need to do is change the deployment from a Git repository to code, and we'll go ahead and paste in our code right here. We'll press save. We'll hit close. These are the Python packages that it's going to use.

[00:26:31] Avery Smith: I'm pretty sure the only thing that we need to do is tell it to use something called Keboola Streamlit, and this is basically just a custom package that helps Streamlit interact better with Keboola. We'll go ahead and press the deploy data app right here. Hit deploy data app. Now, this is going to try to spin up the Streamlit instance of this dashboard and give us a usable dashboard based off all the analysis that we did.
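To give a sense of what the pasted-in code might look like, here is a stripped-down, hedged sketch of a job-board dashboard. The real generated app reads its tables from Keboola storage (which is what the Keboola Streamlit helper package is for), whereas this toy version just loads a local CSV with assumed column names so it stays self-contained.

```python
# Simplified dashboard sketch (not the exact code Claude generated; the real app
# pulls its tables from Keboola storage rather than a local CSV).
import pandas as pd
import streamlit as st

st.title("Find A Data Job - Job Board Dashboard")

# Assumed local export of the aggregated job-title table, with columns job_title, job_count.
df = pd.read_csv("job_title_counts.csv")

col1, col2 = st.columns(2)
col1.metric("Total jobs", int(df["job_count"].sum()))
col2.metric("Unique job titles", len(df))

st.subheader("Most in-demand positions")
top10 = df.sort_values("job_count", ascending=False).head(10)
st.bar_chart(top10.set_index("job_title")["job_count"])

st.subheader("Raw data")
st.dataframe(df)
```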

[00:26:55] Avery Smith: Once again, without writing a single line of code. After about 35 seconds, the job [00:27:00] was deemed a success, and I can go ahead and press the open data app button right here. This is the moment of truth. I'm gonna press open data app, you guys, open data app. Let's see what happens.

[00:27:14] Avery Smith: Oh my gosh. It is beautiful. Look at this: there's the total jobs, there's the unique companies, the job categories, the remote jobs. There's Amazon as the top hiring company, the most in-demand positions. Once again, Claude even goes the extra mile for us and creates this pie chart of the distribution of locations. So Other and Remote are the biggest two.

[00:27:37] Avery Smith: Together they make up almost 50%. The rest of the 50% are New York, California, Texas, Virginia, Florida, and Illinois. It gives us some key insights, some hiring patterns. It even generates the data down here. This is freaking sweet. We have like a fully functioning dashboard in however long that took to actually build, but it was a lot faster than if I were to build this [00:28:00] on my own.

[00:28:00] Avery Smith: The fact that I literally could just do this with Claude, and then Keboola in the background, you know, pulling the strings and creating this entire data pipeline, setting up the data warehouses for me in Google BigQuery, is absolutely amazing. The crazy thing is, this is only a fraction of what Keboola can currently do with AI and what it will be able to do in the future.

[00:28:20] Avery Smith: It can do all sorts of data engineering tasks like automated data documentation, smart validation, column usage intelligence, and conversational data assistance. This is what AI data engineering looks like and feels like. I can totally see data engineers building bespoke data pipelines using tools like this.

[00:28:36] Avery Smith: And in the near future, I can honestly see how this type of tool would democratize data access for even non-technical stakeholders and allow them to do self-service analytics by querying the data conversationally, without even knowing very much about SQL. Like, literally, I didn't write a line of code today.

[00:28:53] Avery Smith: This is the future of data engineering. So if you wanna try this out, go ahead and go to [00:29:00] keboola.com/mcp and you can get started for free. I hope this video helped you understand data pipelines, ETL, MCPs, and data ops platforms. If something was unclear, please let me know in the comments. Also, while you're there, let me know what project you want to see me do next.

[00:29:15] Avery Smith: And finally, if you made it through all of this so far, please be sure to hit subscribe, because this video took an incredible amount of effort. Plus, you don't wanna miss the next one. Thank you so much for listening.