ETL Process Movies Data, using Python, Pandas, and PostgreSQL

September 3, 2022

Project Goal

Create an ETL pipeline from raw data to a SQL database. Extract data from several sources using Python. Clean and transform data using Pandas. Load data with PostgreSQL and verify in PgAdmin.

Step 1: Write an ETL Function to Read Three Data Files

The Wiki_movies_df DataFrame:

The Kaggle_metadata DataFrame:

The Ratings DataFrame:

File:

ETL_function_test

Step 2: Extract and Transform the Wikipedia Data

The cleaned Wikipedia data is converted to a Pandas DataFrame, and the DataFrame is displayed in the ETL_clean_wiki_movies file.

Step 3: Extract and Transform the Kaggle data

DataFrames:

File:

ETL_clean_kaggle_data

Step 4: Create the Movie Database

File:

ETL_create_database

The elapsed time to add the data to the database:

The movies table in the SQL database:

The rating table in the SQL database: