Data Preprocessing in Machine Learning using Python - SimpleImputer, OneHotEncoder, train_test_split
GitHub: https://github.com/bibhutighimire/Data-Preprocessing-in-Machine-Learning-using-Python
Google Colab: https://colab.research.google.com/drive/1e1sKsVO-26kicMXbdq6aoe_shlgsxWhx#scrollTo=fGOiYH95ojot

Data Preprocessing in Machine Learning using Python 2023
#SimpleImputer #OneHotEncoder #train_test_split #StandardScaler #LabelEncoder

Website: https://www.javatpoint.com/how-to-get-datasets-for-machine-learning

# Import libraries:
import pandas as pd
import numpy as np

# Import the dataset:
dataset = pd.read_csv('Data.csv')
print(dataset)

# Split the dataset into X (independent variables) and y (dependent variable):
X = dataset.iloc[:, :-1].values
print(X)
y = dataset.iloc[:, -1].values
print(y)

# Handle missing data by replacing NaN in columns 1 and 2 with the column mean:
from sklearn.impute import SimpleImputer
si = SimpleImputer(missing_values=np.nan, strategy='mean')
si.fit(X[:, 1:3])
X[:, 1:3] = si.transform(X[:, 1:3])
print(X)

# Encode categorical data (one-hot encode column 0, pass the other columns through):
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
print(X)

# Encode the target labels as integers:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
print(y)

# Split the data into training and test sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(X_train)
print(y_train)
print(y_test)
print(X_test)

# Feature scaling (fit the scaler on the training set only, then apply it to the test set):
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train[:, 3:5] = ss.fit_transform(X_train[:, 3:5])
X_test[:, 3:5] = ss.transform(X_test[:, 3:5])
print(X_train)
print(X_test)
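
The steps above read from Data.csv, which is not reproduced in this description. As a rough, self-contained sketch of the same pipeline, the version below swaps the CSV for a small inline DataFrame. The column names (Country, Age, Salary, Purchased) and all values are made up for illustration and are assumptions about the file's layout: one categorical column, two numeric columns with gaps, and a yes/no target in the last column. The explicit `astype(float)` casts and `sparse_threshold=0` are additions to keep the intermediate arrays dense and numeric; they are not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical stand-in for Data.csv (values invented for illustration).
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Features are every column except the last; the target is the last column.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Replace missing Age/Salary values with the mean of each column.
si = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = si.fit_transform(X[:, 1:3].astype(float))

# One-hot encode column 0; sparse_threshold=0 forces a dense result.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X = ct.fit_transform(X)

# Map the target labels to integers (alphabetical order: "No" -> 0, "Yes" -> 1).
le = LabelEncoder()
y = le.fit_transform(y)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so that no statistics from the test set leak into training.
ss = StandardScaler()
X_train[:, 3:5] = ss.fit_transform(X_train[:, 3:5].astype(float))
X_test[:, 3:5] = ss.transform(X_test[:, 3:5].astype(float))

print(X_train.shape, X_test.shape)
```

The slice `3:5` works here only because the categorical column has three distinct values, so the one-hot columns occupy positions 0 to 2 and the numeric columns land at 3 and 4. With a different number of categories the numeric columns shift, and the slice has to be adjusted to match.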