The video demonstrates an exploratory data analysis (EDA) workflow using pandas and DuckDB on a Kaggle salaries CSV dataset. It sets up a Python environment and Jupyter notebook, loads the data with pandas, inspects rows, columns, shape, and info to confirm there are no nulls, and reviews key fields such as work year, company location, experience level, salary currency, and job title using value counts. It then cleans job titles and experience levels using DuckDB and pandas, exports a cleaned CSV, and plots the top job roles and yearly trends with matplotlib/pandas.
GitHub repo:
https://github.com/kokchun/youtube_demos/tree/main/eda_pandas_duckdb
#duckdb #pandas
00:00 Intro to EDA Setup
00:43 Project Environment Setup
01:53 EDA Mindset and Goals
02:45 Load Data and Inspect
05:56 Quick Profiling with Counts
06:51 Spotting Cleaning Needs
10:22 DuckDB for Title Analysis
13:35 Clean Titles and Levels
17:54 Export Cleaned CSV
18:17 Visualize Top Job Roles
20:09 Yearly Trends Plotting
22:24 Wrap Up and Next Steps