
Train Test Split with Python Machine Learning (Scikit-Learn)

85.5K views
Aug 7, 2023
8:06

🧠 Don’t miss out! Get FREE access to my Skool community, packed with resources, tools, and support to help you with Data, Machine Learning, and AI Automations! 📈 https://www.skool.com/data-and-ai-automations-4579

In this Python Machine Learning Tutorial, we take a look at how you can split a dataset with train test split in scikit-learn. This is a great method for prepping your data before you run a model.

Code: https://ryanandmattdatascience.com/train-test-split/

🚀 Hire me for Data Work: https://ryanandmattdatascience.com/data-freelancing/
👨‍💻 Mentorships: https://ryanandmattdatascience.com/mentorship/
📧 Email: [email protected]
🌐 Website & Blog: https://ryanandmattdatascience.com/
🖥️ Discord: https://discord.com/invite/F7dxbvHUhg
📚 *Practice SQL & Python Interview Questions: https://stratascratch.com/?via=ryan
📖 *SQL and Python Courses: https://datacamp.pxf.io/XYD7Qg

🍿 WATCH NEXT
Scikit-Learn and Machine Learning Playlist: https://www.youtube.com/playlist?list=PLcQVY5V2UY4LNmObS0gqNVyNdVfXnHwu8
Feature Scaling: https://youtu.be/6eJHk8JYK2M
Random Forest Classifier: https://youtu.be/_QuGM_FW9eo
Ordinal Encoder: https://youtu.be/15uClAVV-rI

In this video, I walk you through implementing train test split in Python using sklearn, one of the most essential techniques in machine learning. Train test split divides your dataset into training and testing portions, typically using an 80-20 split. The model is fit on the training portion only, so the testing portion acts as unseen data, which is crucial for validating model performance.

We start by importing the necessary libraries, including pandas and sklearn's train_test_split function. I demonstrate using a real baseball dataset with 500 players, showing you how to load the data and prepare it for splitting. We cover how to separate features (X) from the target variable (y), and I explain why proper data preparation matters before running any machine learning algorithm.
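A minimal sketch of the feature/target separation described above. The video's baseball file is not reproduced here, so the DataFrame below is a randomly generated stand-in with 500 rows, and every column name (at_bats, hits, home_runs, salary) is a hypothetical choice for illustration, not the actual schema. With a real file, pandas' read_csv would take the place of the generated data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the baseball dataset: 500 rows of made-up
# player statistics. Column names and value ranges are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "at_bats": rng.integers(100, 600, size=500),
    "hits": rng.integers(20, 200, size=500),
    "home_runs": rng.integers(0, 45, size=500),
    "salary": rng.integers(500, 30000, size=500),
})

# Features (X): every column except the one we want to predict.
X = df.drop(columns=["salary"])

# Target (y): the single column the model should learn to predict.
y = df["salary"]

# One row per player in both; X keeps the three feature columns.
print(X.shape)  # (500, 3)
print(y.shape)  # (500,)
```

Checking the shapes at this point is a cheap way to confirm that X and y have the same number of rows before anything is split.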
I walk through the exact syntax for train_test_split, including key parameters like test_size and random_state. The random_state parameter is particularly important because it ensures reproducibility: you'll get the same split every time you run the code. I show you how to verify your split worked correctly by checking the shape of your training and testing sets, and I demonstrate using describe() to compare statistics between them.

By the end of this tutorial, you'll understand exactly how to implement train test split, why it's essential for machine learning projects, and how to validate that your data has been properly divided for model training and testing.

TIMESTAMPS
00:00 Introduction to Train Test Split
00:36 Setting Up Python Environment
01:04 Importing Train Test Split
01:07 Loading the Dataset
01:57 Understanding the Data
02:21 Creating X and Y Variables
03:19 Examining the Data Shape
03:42 Implementing Train Test Split
04:11 Understanding Random State
04:56 Setting Test Size
05:23 Verifying the Data Split
06:23 Exploring Training Data
06:52 Comparing Train vs Test Statistics

OTHER SOCIALS:
Ryan’s LinkedIn: https://www.linkedin.com/in/ryan-p-nolan/
Matt’s LinkedIn: https://www.linkedin.com/in/matt-payne-ceo/
Twitter/X: https://x.com/RyanMattDS

Who is Ryan
Ryan is a Data Scientist at a fintech company, where he focuses on fraud prevention in underwriting and risk. Before that, he worked as a Data Analyst at a tax software company. He holds a degree in Electrical Engineering from UCF.

Who is Matt
Matt is the founder of Width.ai, an AI and Machine Learning agency. Before starting his own company, he was a Machine Learning Engineer at Capital One.

*This is an affiliate program. We receive a small portion of the final sale at no extra cost to you.
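The split, the random_state behaviour, and the shape/describe() checks from the walkthrough can be sketched roughly as follows. This is not the code from the video: the data is randomly generated, the column names (hits, home_runs, salary) are hypothetical, and random_state=42 is an arbitrary value chosen for the example. Only train_test_split, test_size, random_state, shape, and describe() come from the description itself.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 500-row stand-in for the baseball data (made-up columns).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "hits": rng.integers(20, 200, size=500),
    "home_runs": rng.integers(0, 45, size=500),
})
y = pd.Series(rng.integers(500, 30000, size=500), name="salary")

# 80-20 split: test_size=0.2 holds out 20% of the rows for testing.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Verify the split: 400 training rows and 100 testing rows.
print(X_train.shape, X_test.shape)  # (400, 2) (100, 2)
print(y_train.shape, y_test.shape)  # (400,) (100,)

# Re-running with the same random_state selects the same rows.
X_train_again, _, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.index.equals(X_train_again.index))  # True

# Compare summary statistics; similar means and spreads suggest the
# test set is a reasonable sample of the same population.
print(X_train.describe())
print(X_test.describe())
```

Omitting random_state (or changing its value) gives a different shuffle on each run, which is why a fixed value is useful when results need to be compared across runs.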

