
Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project | Part 2

9.0K views
Sep 6, 2023
1:23:28

This is part 2 of the Zillow data analytics end-to-end data engineering project. In this project, we will learn how to build and automate a Python ETL process that extracts real estate property data from the Zillow Rapid API and loads it into an Amazon S3 bucket. That upload triggers a series of Lambda functions that transform the data, convert it to CSV format, and load it into another S3 bucket. The workflow is orchestrated with Apache Airflow, which uses an S3KeySensor to check that the transformed data has been uploaded to the S3 bucket before attempting to load it into Amazon Redshift. Once the data is in Redshift, we will connect Amazon QuickSight to the Redshift cluster to visualize the Zillow (Rapid API) data.

Apache Airflow is an open-source platform for orchestrating and scheduling workflows of tasks and data pipelines. This project is carried out entirely on the AWS cloud platform. In this video I will show you how to install Apache Airflow from scratch and schedule your ETL pipeline. I will also show you how to use a sensor in your ETL pipeline. In addition, I will show you how to set up an AWS Lambda function from scratch, and how to set up Amazon Redshift and Amazon QuickSight.

As this is a hands-on project, I highly encourage you to first watch the video in its entirety without typing along, so that you can better understand the concepts and the workflow. After that, either try to replicate the example on your own and consult the video only when you are stuck, or watch the video a second time in its entirety while typing along. Remember, the best way to learn is by doing it yourself. Get your hands dirty!

If you have any questions or comments, please leave them in the comment section below. Please don't forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.

**Books I recommend**
1. Grit: The Power of Passion and Perseverance: https://amzn.to/3EZKSgb
2. Think and Grow Rich!: The Original Version, Restored and Revised: https://amzn.to/3Q2K68s
3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: https://amzn.to/3LLpXRy
4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: https://amzn.to/48RbuOb
5. Introducing Python: Modern Computing in Simple Packages: https://amzn.to/3Q4driR
6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: https://amzn.to/3rGF73G

***************** Commands used in this video *****************
sudo apt update
sudo apt install python3-pip
sudo apt install python3.10-venv
python3 -m venv endtoendyoutube_venv
source endtoendyoutube_venv/bin/activate
pip install --upgrade awscli
sudo pip install apache-airflow
airflow standalone
pip install apache-airflow-providers-amazon

***************** USEFUL LINKS *****************
How to remotely SSH (connect) Visual Studio Code to AWS EC2: https://www.youtube.com/watch?v=sQQjMnEkGjs&t=1224s
Extract current weather data from Open Weather Map API using python on AWS EC2: https://www.youtube.com/watch?v=0_caTDCZnd0&t=13s
How to send out email alert ON RETRY and ON FAILURE in Apache airflow | Airflow Tutorial: https://www.youtube.com/watch?v=Its_66azEy0
Monitor workflow with slack alert upon DAG failure | Airflow Tutorial: https://www.youtube.com/watch?v=jVqnKge0AJQ
How to build and automate a python ETL pipeline and slack alert with airflow | Airflow Tutorial: https://www.youtube.com/watch?v=ocFzNmgYW9o
PostgreSQL Playlist: https://www.youtube.com/watch?v=oFaLUCWRnRE&list=PLACD_PaYcVF09khO58CISr08Uy6w3cAIF
Rapid API: https://rapidapi.com/hub
AWS Lambda function - Create your first Lambda Function | Lambda Function Tutorial for beginners: https://www.youtube.com/watch?v=SYKIrEO5Zvg
Github Repo: https://github.com/YemiOla/data_engineering_project_zillowrapidapi__dataanalytics
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/sensors/s3/index.html#module-airflow.providers.amazon.aws.sensors.s3
https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/python/index.html#airflow.operators.python.PythonOperator
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/transfer/s3_to_redshift.html
https://airflow.apache.org/docs/apache-airflow-providers-amazon/1.0.0/operators/s3_to_redshift.html
Part 1: https://www.youtube.com/watch?v=j_skupZ3zw0
Part 3: https://www.youtube.com/watch?v=Hfu3E0zLYDQ

DISCLAIMER: This video and description contain affiliate links. This means that when you buy through one of these links, we receive a small commission at no cost to you. This helps support us in continuing to make awesome and valuable content for you.

#dataengineering #airflow
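The Lambda transform step described above (read the JSON that landed in S3, keep some fields, write a CSV) could look roughly like the sketch below. It is not the code from the video or the GitHub repo. The column names in `SELECTED_COLUMNS`, the top-level `"results"` key, and the helper names are assumptions made for illustration, and the actual S3 read and write calls are left out so the logic runs without AWS credentials.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

# Hypothetical column selection; the real Zillow response fields may differ.
SELECTED_COLUMNS = ["bedrooms", "bathrooms", "city", "price", "zipcode"]


def object_location(event):
    """Return (bucket, key) for the first record of an S3 event notification."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_info["object"]["key"])
    return bucket, key


def json_to_csv(raw_json, columns=SELECTED_COLUMNS):
    """Convert a JSON document holding a list of records into CSV text."""
    payload = json.loads(raw_json)
    rows = payload.get("results", [])  # assumed location of the records
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the selected columns; missing fields become empty cells.
        writer.writerow({name: row.get(name, "") for name in columns})
    return buffer.getvalue()


def csv_key_for(json_key):
    """Derive the output object key by swapping a .json suffix for .csv."""
    stem = json_key[:-len(".json")] if json_key.endswith(".json") else json_key
    return stem + ".csv"
```

In a real Lambda handler, `object_location(event)` would give the bucket and key to read, and the text returned by `json_to_csv` would be written to the second bucket under `csv_key_for(key)`. That second bucket is where the S3KeySensor in the Airflow DAG would be looking for the file.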
