Data Science EDA with code (Exploratory Data Analysis)

403 views
Nov 12, 2025
24:28

🔥 Master Data Science EDA (Exploratory Data Analysis) in Python! Complete hands-on tutorial with real student dataset analysis.

📊 PROJECT: Analyzing 10 Engineering Students' Performance Data
• Student records with branches, semesters, subject marks, grades
• Missing values handling
• Statistical analysis
• Pattern discovery

⏱️ DETAILED TIMESTAMPS:

📚 INTRODUCTION (00:00 - 02:51)
00:00 - Introduction to EDA
00:42 - What is EDA? (Being a detective with data)
01:21 - Why EDA? (Find patterns, spot errors, make decisions)
01:44 - Real-Life Examples (Schools, Netflix, Sports, Weather)
02:15 - Our Student Dataset Overview

💻 DATA LOADING AND SETUP (02:51 - 04:58)
02:51 - Starting with Google Colab
02:59 - Import Libraries | pandas (as pd), numpy, matplotlib, seaborn
03:26 - Creating Student Dataset | Dictionary to DataFrame conversion
04:22 - Adding Missing Values | np.nan values
04:46 - Creating DataFrame | pd.DataFrame(data)

👀 VIEWING DATA (05:06 - 07:50)
05:06 - First 5 Records | df.head()
06:36 - Last 3 Records | df.tail(3)
07:03 - Dataset Dimensions | df.shape (10 rows x 15 columns)
07:34 - Shape Attribute | df.shape[0] for rows, df.shape[1] for columns

🔍 DATA TYPES (07:52 - 08:44)
07:52 - Check Column Types | df.dtypes
• float64 = decimal numbers
• object = text data (names, branches)

🚨 MISSING VALUES (08:46 - 11:18)
08:46 - Count Missing Values | df.isnull().sum()
09:00 - Identify Missing Data (Subject 4: 5 missing, Subject 5: 5 missing)
10:05 - Understanding NaN (Not a Number, used to mark a missing value)
10:53 - Missing Value Summary | missing_values.sum()

📊 STATISTICAL ANALYSIS (11:20 - 12:57)
11:20 - Complete Statistics | df.describe() (count, mean, std, min, 25%, 50%, 75%, max)
11:44 - Understanding df.describe() (ONE command for all stats!)
12:14 - Individual Statistics:
• Mean | df['average'].mean()
• Median | df['average'].median()
• Maximum | df['average'].max()
• Minimum | df['average'].min()
• Std Deviation | df['average'].std()
• Variance | df['average'].var()

🎯 BRANCH ANALYSIS (13:00 - 16:12)
13:00 - Average by Branch | df.groupby('branch')['average'].mean()
13:22 - Top 3 Students Overall | df.nlargest(3, 'average')
13:35 - Students Per Branch | df['branch'].value_counts()
14:05 - Count Frequency | .value_counts() function
14:28 - Branch Performance | .groupby().mean().sort_values()
14:54 - Top N Students | df.nlargest(n, 'column')
15:56 - Select Columns | df[['student_id', 'name', 'branch', 'average']]

📈 GRADE DISTRIBUTION (16:14 - 17:42)
16:14 - Grade Count | df['grade'].value_counts()
16:28 - Grade Percentage | df['grade'].value_counts(normalize=True) * 100
16:42 - Filter A+ Students | df[df['grade'] == 'A+']
17:18 - Filtering Technique | df[condition]

🆘 STRUGGLING STUDENTS (17:46 - 18:52)
17:46 - Below 75% Students | df[df['average'] < 75]
18:18 - Branch Performance | df.groupby('branch')['average'].mean()
18:30 - UG vs PG Analysis | df.groupby('level')['average'].mean()
18:45 - Semester Comparison | df.groupby('semester')['average'].mean()

📋 EDA FUNCTIONS SUMMARY (18:55 - 22:58)
19:01 - Data Loading | pd.read_csv(), pd.read_excel()
19:33 - Quick Overview | df.head(), df.shape
19:45 - Missing Values | df.isnull().sum()
20:00 - Duplicates | df.duplicated().sum()
20:15 - Data Types | df.dtypes
20:27 - Statistics | df.describe()
20:40 - Outliers | sns.boxplot()
21:03 - Correlation | df.corr(), sns.heatmap()
21:21 - Distribution | plt.hist()
21:36 - Relationships | sns.scatterplot()
21:53 - Categories | df['column'].value_counts()
22:10 - Grouping | df.groupby('column').mean()
22:29 - Trends | Time series analysis
22:39 - Cleaning | df.fillna(), df.drop_duplicates()
22:52 - Export | df.to_excel(), df.to_csv()

📊 VISUALIZATION TYPES (23:04 - 24:13)
23:08 - Histogram | plt.hist() - Data distribution
23:16 - Box Plot | sns.boxplot() - Outlier detection
23:28 - Scatter Plot | sns.scatterplot() - Variable relationships
23:43 - Heat Map | sns.heatmap() - Correlation matrix
23:51 - Pair Plot | sns.pairplot() - Multiple comparisons
23:57 - Bar Chart | plt.bar() - Category comparison
24:06 - Pie Chart | plt.pie() - Percentage distribution
24:13 - Next Lecture: Graphs with Live Examples!

📚 ESSENTIAL COMMANDS:
✅ df.head() / df.tail() - View data
✅ df.shape - Dimensions
✅ df.dtypes - Data types
✅ df.isnull().sum() - Missing values
✅ df.describe() - Statistics
✅ df['col'].value_counts() - Frequencies
✅ df[condition] - Filtering
✅ df.groupby('col').mean() - Grouping
✅ df.nlargest(n, 'col') - Top values

💻 SOURCE CODE:
📂 Colab Notebook 1: https://colab.research.google.com/drive/1Fg3iXIQYlxF9kyJ6v-iGuiGMuwBokcsy?usp=sharing
📂 Colab Notebook 2: https://colab.research.google.com/drive/1rULtr7rCbIryGnf5qTwp9dk6Mndmj2Tm?usp=sharing

🎯 Perfect for: Data Science Beginners | Python Students | Engineering Students | Data Analysts

#DataScience #Python #EDA #PandasTutorial #DataAnalysis #MachineLearning #PythonProgramming #LearnPython #DataScience2024
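
The inspection, missing-value, grouping, and filtering calls named in the timestamps can be tried on a small frame. This is a minimal sketch, not the notebook from the video: the six records, the `subject_1` and `subject_2` column names, and the 85-point cutoff for `A+` are assumptions made for illustration. Only the pandas calls themselves come from the description.

```python
import numpy as np
import pandas as pd

# Hypothetical records: invented for illustration, not the video's data.
data = {
    "student_id": [1, 2, 3, 4, 5, 6],
    "name": ["Asha", "Ben", "Chen", "Dev", "Esha", "Farid"],
    "branch": ["CSE", "CSE", "ECE", "ECE", "MECH", "MECH"],
    "level": ["UG", "UG", "UG", "PG", "PG", "UG"],
    "subject_1": [90.0, 70.0, 85.0, 60.0, 95.0, 72.0],
    "subject_2": [80.0, np.nan, 75.0, 70.0, 85.0, np.nan],  # np.nan = missing
}
df = pd.DataFrame(data)

# Row-wise average of the subject columns; mean() skips NaN by default.
df["average"] = df[["subject_1", "subject_2"]].mean(axis=1)
# Assumed grading rule (hypothetical cutoff): 85 and above is "A+".
df["grade"] = np.where(df["average"] >= 85, "A+", "B")

# Quick overview: first rows, dimensions, column types.
print(df.head())
print(df.shape)
print(df.dtypes)

# Missing values per column, then the overall total.
missing_values = df.isnull().sum()
total_missing = int(missing_values.sum())

# Summary statistics for the numeric columns in one call.
print(df.describe())

# Average per branch, best branch first.
branch_avg = (
    df.groupby("branch")["average"].mean().sort_values(ascending=False)
)

# Top three students, showing only the columns of interest.
top3 = df.nlargest(3, "average")[["student_id", "name", "branch", "average"]]

# Share of each grade as a percentage.
grade_pct = df["grade"].value_counts(normalize=True) * 100

# Boolean filtering: students whose average is below 75.
struggling = df[df["average"] < 75]
print(struggling[["name", "average"]])
```

Because `mean(axis=1)` ignores missing marks, a student with one missing subject is averaged over the subjects that are present. Whether that is the right treatment depends on the grading policy, so it is worth checking `missing_values` before trusting the averages.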
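
The duplicate, cleaning, correlation, and export entries in the functions summary can be sketched in the same way. The frame below is hypothetical, filling with the column median is just one possible `fillna()` strategy, and `numeric_only=True` is an addition that assumes pandas 1.5 or later, where text columns would otherwise make `corr()` and `groupby().mean()` fail or warn.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw data with one repeated row and one missing mark.
raw = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Ben", "Chen"],
        "branch": ["CSE", "ECE", "ECE", "CSE"],
        "subject_1": [90.0, 70.0, 70.0, 80.0],
        "subject_2": [80.0, 75.0, 75.0, np.nan],
    }
)

# Count fully repeated rows, then drop them.
n_dupes = int(raw.duplicated().sum())
clean = raw.drop_duplicates().copy()

# Fill the missing mark with the column median (an assumed strategy).
clean["subject_2"] = clean["subject_2"].fillna(clean["subject_2"].median())

# Correlation and grouped means over numeric columns only.
corr = clean.corr(numeric_only=True)
branch_means = clean.groupby("branch").mean(numeric_only=True)

# Export to CSV; an in-memory buffer stands in for a file path here.
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
restored = pd.read_csv(io.StringIO(buffer.getvalue()))
print(restored)
```

Passing a file name such as `"students_clean.csv"` (a hypothetical path) to `to_csv()` writes the same content to disk.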

