Data Science EDA with code (Exploratory Data Analysis)
Master Data Science EDA (Exploratory Data Analysis) in Python! Complete hands-on tutorial with real student dataset analysis.

PROJECT: Analyzing 10 Engineering Students Performance Data
• Student records with branches, semesters, subject marks, grades
• Missing values handling
• Statistical analysis
• Pattern discovery

DETAILED TIMESTAMPS:

INTRODUCTION (00:00 - 02:51)
00:00 - Introduction to EDA
00:42 - What is EDA? (Being a detective with data)
01:21 - Why EDA? (Find patterns, spot errors, make decisions)
01:44 - Real-Life Examples (Schools, Netflix, Sports, Weather)
02:15 - Our Student Dataset Overview

DATA LOADING AND SETUP (02:51 - 04:58)
02:51 - Starting with Google Colab
02:59 - Import Libraries | import pandas as pd, numpy, matplotlib, seaborn
03:26 - Creating Student Dataset | Dictionary to DataFrame conversion
04:22 - Adding Missing Values | np.nan values
04:46 - Creating DataFrame | pd.DataFrame(data)

VIEWING DATA (05:06 - 07:50)
05:06 - First 5 Records | df.head()
06:36 - Last 3 Records | df.tail(3)
07:03 - Dataset Dimensions | df.shape (10 rows x 15 columns)
07:34 - Shape Attribute | df.shape[0] for rows, df.shape[1] for columns

DATA TYPES (07:52 - 08:44)
07:52 - Check Column Types | df.dtypes
• float64 = decimal numbers; object = text data (names, branches)

MISSING VALUES (08:46 - 11:18)
08:46 - Count Missing Values | df.isnull().sum()
09:00 - Identify Missing Data (Subject 4: 5 missing, Subject 5: 5 missing)
10:05 - Understanding NaN (Not a Number = Missing value)
10:53 - Missing Value Summary | missing_values.sum()

STATISTICAL ANALYSIS (11:20 - 12:57)
11:20 - Complete Statistics | df.describe() (count, mean, std, min, 25%, 50%, 75%, max)
11:44 - Understanding df.describe() (ONE command for all stats!)
12:14 - Individual Statistics:
• Mean | df['average'].mean()
• Median | df['average'].median()
• Maximum | df['average'].max()
• Minimum | df['average'].min()
• Std Deviation | df['average'].std()
• Variance | df['average'].var()

BRANCH ANALYSIS (13:00 - 16:12)
13:00 - Average by Branch | df.groupby('branch')['average'].mean()
13:22 - Top 3 Students Overall | df.nlargest(3, 'average')
13:35 - Students Per Branch | df['branch'].value_counts()
14:05 - Count Frequency | .value_counts() function
14:28 - Branch Performance | .groupby().mean().sort_values()
14:54 - Top N Students | df.nlargest(n, 'column')
15:56 - Select Columns | df[['student_id', 'name', 'branch', 'average']]

GRADE DISTRIBUTION (16:14 - 17:42)
16:14 - Grade Count | df['grade'].value_counts()
16:28 - Grade Percentage | df['grade'].value_counts(normalize=True) * 100
16:42 - Filter A+ Students | df[df['grade'] == 'A+']
17:18 - Filtering Technique | df[condition]

STRUGGLING STUDENTS (17:46 - 18:52)
17:46 - Below 75% Students | df[df['average'] < 75]
18:18 - Branch Performance | df.groupby('branch')['average'].mean()
18:30 - UG vs PG Analysis | df.groupby('level')['average'].mean()
18:45 - Semester Comparison | df.groupby('semester')['average'].mean()

EDA FUNCTIONS SUMMARY (18:55 - 22:58)
19:01 - Data Loading | pd.read_csv(), pd.read_excel()
19:33 - Quick Overview | df.head(), df.shape
19:45 - Missing Values | df.isnull().sum()
20:00 - Duplicates | df.duplicated().sum()
20:15 - Data Types | df.dtypes
20:27 - Statistics | df.describe()
20:40 - Outliers | sns.boxplot()
21:03 - Correlation | df.corr(), sns.heatmap()
21:21 - Distribution | plt.hist()
21:36 - Relationships | sns.scatterplot()
21:53 - Categories | df['column'].value_counts()
22:10 - Grouping | df.groupby('column').mean()
22:29 - Trends | Time series analysis
22:39 - Cleaning | df.fillna(), df.drop_duplicates()
22:52 - Export | df.to_excel(), df.to_csv()

VISUALIZATION TYPES (23:04 - 24:13)
23:08 - Histogram | plt.hist() - Data distribution
23:16 - Box Plot | sns.boxplot() - Outlier detection
23:28 - Scatter Plot | sns.scatterplot() - Variable relationships
23:43 - Heat Map | sns.heatmap() - Correlation matrix
23:51 - Pair Plot | sns.pairplot() - Multiple comparisons
23:57 - Bar Chart | plt.bar() - Category comparison
24:06 - Pie Chart | plt.pie() - Percentage distribution
24:13 - Next Lecture: Graphs with Live Examples!

ESSENTIAL COMMANDS:
• df.head() / df.tail() - View data
• df.shape - Dimensions
• df.dtypes - Data types
• df.isnull().sum() - Missing values
• df.describe() - Statistics
• df['col'].value_counts() - Frequencies
• df[condition] - Filtering
• df.groupby('col').mean() - Grouping
• df.nlargest(n, 'col') - Top values

SOURCE CODE:
Colab Notebook 1: https://colab.research.google.com/drive/1Fg3iXIQYlxF9kyJ6v-iGuiGMuwBokcsy?usp=sharing
Colab Notebook 2: https://colab.research.google.com/drive/1rULtr7rCbIryGnf5qTwp9dk6Mndmj2Tm?usp=sharing

Perfect for: Data Science Beginners | Python Students | Engineering Students | Data Analysts

LIKE if helpful! SUBSCRIBE for more! COMMENT your questions!

#DataScience #Python #EDA #PandasTutorial #DataAnalysis #MachineLearning #PythonProgramming #LearnPython #DataScience2024
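QUICK SKETCH: LOADING, VIEWING AND MISSING VALUES
The setup, viewing and missing-value steps listed in the timestamps can be tried on a tiny stand-in table. This is only a minimal sketch: the five records, names and column labels below are made up for illustration and are not the 10-student dataset from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical records: names, columns and marks are illustrative only,
# not the dataset used in the video.
data = {
    "student_id": [101, 102, 103, 104, 105],
    "name": ["Asha", "Ravi", "Meera", "Kiran", "Zoya"],
    "branch": ["CSE", "ECE", "CSE", "MECH", "ECE"],
    "subject1": [88.0, 72.0, 95.0, 64.0, 81.0],
    "subject2": [92.0, np.nan, 89.0, 70.0, np.nan],  # np.nan marks a missing mark
}
df = pd.DataFrame(data)  # dictionary to DataFrame conversion

print(df.head())     # first 5 records
print(df.tail(3))    # last 3 records
print(df.shape)      # (rows, columns); df.shape[0] is rows, df.shape[1] is columns
print(df.dtypes)     # column types

missing_values = df.isnull().sum()  # missing count per column
print(missing_values)
print(missing_values.sum())         # total number of missing cells

summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max for numeric columns
print(summary)
```

Note that the count row of df.describe() skips NaN, so a column with missing marks reports a smaller count than the number of rows.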
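QUICK SKETCH: STATISTICS, GROUPING AND FILTERING
The statistics, branch analysis, grade distribution and struggling-student steps can be sketched the same way. The six rows below are invented for illustration; only the column names 'branch', 'grade' and 'average' follow the commands in the timestamps.

```python
import pandas as pd

# Hypothetical records for illustration; not the video's dataset.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kiran", "Zoya", "Dev"],
    "branch": ["CSE", "ECE", "CSE", "MECH", "ECE", "MECH"],
    "grade": ["A+", "B", "A+", "C", "A", "B"],
    "average": [91.0, 74.0, 95.0, 62.0, 84.0, 78.0],
})

# Individual statistics on one column
mean_avg = df["average"].mean()
median_avg = df["average"].median()
std_avg = df["average"].std()   # sample standard deviation
var_avg = df["average"].var()   # sample variance

# Average per branch, best branch first
branch_avg = df.groupby("branch")["average"].mean().sort_values(ascending=False)

# Top 3 students overall, keeping only selected columns
top3 = df.nlargest(3, "average")[["name", "branch", "average"]]

# Grade counts and grade percentages
grade_counts = df["grade"].value_counts()
grade_pct = df["grade"].value_counts(normalize=True) * 100

# Filtering with df[condition]
a_plus = df[df["grade"] == "A+"]
below_75 = df[df["average"] < 75]

print(branch_avg)
print(top3)
print(grade_pct)
print(below_75)
```

Selecting the column before calling .mean(), as in df.groupby('branch')['average'].mean(), avoids the error that recent pandas versions raise when a grouped mean is attempted over text columns.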
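QUICK SKETCH: DUPLICATES, CLEANING, CORRELATION AND EXPORT
The cleaning-related entries in the EDA functions summary (duplicates, fillna, correlation, export) might look roughly like this. The rows are invented, filling with the column mean is just one possible strategy and not necessarily the one used in the video, and the CSV is written to an in-memory buffer instead of a file so the sketch has no side effects.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical records for illustration; the third row repeats the first.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Asha", "Meera"],
    "subject1": [88.0, 72.0, 88.0, 95.0],
    "subject2": [92.0, np.nan, 92.0, 89.0],
})

# Count fully repeated rows, then drop them
n_dupes = df.duplicated().sum()
clean = df.drop_duplicates().copy()

# Assumption: fill missing marks with the column mean (one option among many)
clean["subject2"] = clean["subject2"].fillna(clean["subject2"].mean())

# Correlation over numeric columns only; text columns are excluded first
corr = clean.select_dtypes("number").corr()

# Export to CSV text; pass a file path instead of the buffer to write a file
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(n_dupes)
print(clean)
print(corr)
print(csv_text)
```

Restricting the correlation to numeric columns matters because recent pandas versions raise an error when df.corr() is called on a frame that still contains text columns.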