ETL Process Movies Data, using Python, Pandas, and PostgreSQL
Project Goal
Create an ETL pipeline from raw data to a SQL database. Extract data from several sources using Python. Clean and transform data using Pandas. Load data with PostgreSQL and verify in PgAdmin.
Step 1: Write an ETL Function to Read Three Data Files
The Wiki_movies_df DataFrame:
The Kaggle_metadata DataFrame:
The Ratings DataFrame:
File:
Step 2: Extract and Transform the Wikipedia Data
The cleaned Wikipedia data is converted to a Pandas DataFrame, and the DataFrame is displayed in the ETL_clean_wiki_movies file.
Step 3: Extract and Transform the Kaggle data
DataFrames:
File:
Step 4: Create the Movie Database
File:
The elapsed time to add the data to the database:
The movies table in the SQL database:
The rating table in the SQL database: