
100 Days Challenge Day 14 - Datasets and Feature Engineering


Feature Engineering

Picture credits: "How to Win a Data Science Competition" course on YouTube

Learned feature engineering methods from Krish Naik's YouTube channel and from the videos of the "How to Win a Data Science Competition" course on YouTube.


Topics covered:
  • Continuous, Discrete (Numeric and Categorical) and Date-Time Variables
  • Loading a random sample with specified features from a csv data file
  • Plotting histograms
  • Converting date and time objects to the pandas datetime data type
  • Feature Generation
  • One-Hot Encoding (for Non-Tree-Based Models)
  • Scaling and Regularization (for Non-Tree-Based Models)
  • Scikit-learn Scalers (MinMaxScaler and StandardScaler)
  • Handling Outliers Using Winsorization and Rank Transformation (Only for Linear Models, KNN and Neural Networks)
  • Log Transform and Raising to a Power < 1 (for Neural Networks)
  • Generating new features which are functions of existing features
  • Categorical and Ordinal Features
  • Label Encoding (Works best with Trees)
  • Frequency Encoding
  • Feature Generation using Categorical Features (Works best for Linear Models and KNN)
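Loading a random sample with only the needed features can be sketched with pandas' `read_csv`, using `usecols` to pick columns and a callable `skiprows` to drop rows at random. The in-memory CSV and its column names (`price`, `rooms`, `city`, `sold_at`, `notes`) are hypothetical stand-ins for a real file, and the 10% fraction is an arbitrary choice. The last lines split the loaded columns into numeric (continuous or discrete) and non-numeric (categorical, or date-time strings not yet parsed).

```python
import io
import random

import pandas as pd

# Hypothetical in-memory CSV so the sketch runs without a real data file
cities = ["Berlin", "Munich", "Hamburg"]
rows = ["price,rooms,city,sold_at,notes"]
for i in range(1000):
    rows.append(f"{100000 + i * 50},{1 + i % 5},{cities[i % 3]},2021-01-{1 + i % 28:02d} 10:00:00,x")
csv_text = "\n".join(rows)

random.seed(42)
frac = 0.1  # keep roughly 10% of the data rows

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["price", "rooms", "city", "sold_at"],  # load only the features we need
    # skiprows is called with each row index; row 0 is the header and is always kept
    skiprows=lambda i: i > 0 and random.random() > frac,
)

# Numeric columns: continuous (price) or discrete (rooms)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else: categorical strings and not-yet-parsed date-time strings
other_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(len(sample), numeric_cols, other_cols)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the file path.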
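Plotting histograms is a quick way to spot skewed features and outliers before choosing a transform. A minimal matplotlib sketch, assuming a hypothetical right-skewed feature generated from a log-normal distribution; the `Agg` backend and saving to a buffer only keep it runnable without a display (in a notebook, `plt.show()` would be used instead).

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature, e.g. house prices
prices = rng.lognormal(mean=12, sigma=0.5, size=1_000)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# ax.hist returns (bin counts, bin edges, patches)
counts, edges, _ = axes[0].hist(prices, bins=50)
axes[0].set_title("Raw values")
axes[1].hist(np.log1p(prices), bins=50)
axes[1].set_title("log1p(values)")
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Comparing the two panels shows how much a log transform compresses the long right tail.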
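Converting date and time strings with `pd.to_datetime` unlocks the `.dt` accessor, from which new features can be generated. This sketch uses two hypothetical columns, `sold_at` and `listed_at`; the chosen features (month, day of week, weekend flag, hour, and a difference between two dates) are common examples, not an exhaustive list.

```python
import pandas as pd

# Hypothetical timestamps stored as plain strings
df = pd.DataFrame({
    "sold_at": ["2021-03-01 10:15:00", "2021-03-06 18:40:00", "2021-04-12 08:30:00"],
    "listed_at": ["2021-02-20 09:00:00", "2021-03-01 12:00:00", "2021-03-15 07:45:00"],
})

# Convert the string columns to pandas' datetime64 dtype
for col in ["sold_at", "listed_at"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d %H:%M:%S")

# Features from individual timestamp components
df["month"] = df["sold_at"].dt.month
df["day_of_week"] = df["sold_at"].dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
df["hour"] = df["sold_at"].dt.hour
# Feature from the difference between two date columns, in whole days
df["days_on_market"] = (df["sold_at"] - df["listed_at"]).dt.days
print(df)
```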
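One-hot encoding turns a categorical column into one 0/1 column per category, which non-tree models can use without assuming any order between categories. A minimal sketch with a hypothetical `city` column, showing both `pd.get_dummies` and scikit-learn's `OneHotEncoder`; the latter learns categories on the training data, and `handle_unknown="ignore"` maps unseen test categories to all zeros.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"city": ["Berlin", "Munich", "Berlin", "Hamburg"]})
test = pd.DataFrame({"city": ["Munich", "Paris"]})  # "Paris" never appears in train

# Quick option in pandas: one 0/1 column per category (sorted alphabetically)
dummies = pd.get_dummies(train["city"], prefix="city", dtype=int)

# scikit-learn option: learn the categories on train, reuse them on test
enc = OneHotEncoder(handle_unknown="ignore")  # unseen categories become all zeros
enc.fit(train[["city"]])
test_encoded = enc.transform(test[["city"]]).toarray()  # default output is sparse
print(dummies)
print(test_encoded)
```

The encoder approach is usually safer in a train/test setting because both sets end up with the same columns in the same order.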
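`MinMaxScaler` maps each feature to x' = (x - min) / (max - min), while `StandardScaler` computes x' = (x - mean) / std. A minimal sketch on small hypothetical arrays; both scalers are fitted on the training data only and then reused on the test data, so test values outside the training range can fall outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_test = np.array([[2.5, 250.0], [5.0, 500.0]])

# MinMaxScaler: (x - min) / (max - min), with min and max taken from X_train
mm = MinMaxScaler().fit(X_train)
X_test_mm = mm.transform(X_test)

# StandardScaler: (x - mean) / std, with mean and std taken from X_train
ss = StandardScaler().fit(X_train)
X_train_ss = ss.transform(X_train)
print(X_test_mm)
print(X_train_ss)
```

After scaling, both columns contribute on comparable scales to distance-based models such as KNN and to regularized linear models.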
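Winsorization clips a feature to chosen lower and upper percentiles, and rank transformation replaces each value by its rank so that an outlier ends up only one step above its neighbour. A minimal sketch on hypothetical normally distributed data with one injected outlier; the 1st/99th percentile cut-offs are an assumption, and `np.clip` is a simple hand-rolled winsorization rather than a dedicated library routine.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
# Hypothetical feature: 999 ordinary values plus one extreme outlier
x = np.append(rng.normal(loc=50, scale=10, size=999), 1_000.0)

# Winsorization: clip to the 1st and 99th percentiles
lower, upper = np.percentile(x, [1, 99])
x_winsorized = np.clip(x, lower, upper)

# Rank transformation: values become ranks 1..n (ties would get averaged ranks)
x_ranked = rankdata(x)
print(x.max(), x_winsorized.max(), x_ranked.max())
```

When transforming a separate test set, the percentiles (or the ranking reference) need to be handled consistently with the training data.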
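The log transform and raising to a power below 1 both compress large values and pull skewed distributions closer to symmetric. A minimal NumPy sketch on hypothetical values; `np.log1p` computes log(1 + x) so zeros stay valid, and the exponent 0.5 (a square root) is just one choice of power < 1.

```python
import numpy as np

# Hypothetical non-negative, heavily skewed values
x = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

# log(1 + x): defined at 0 and strongly compresses large values
x_log = np.log1p(x)

# Raising to a power < 1 (0.5 here, i.e. square root): a milder compression
x_pow = np.power(x, 0.5)
print(x_log)
print(x_pow)
```

Both transforms keep the original ordering of the values, which is why they mostly matter for non-tree models.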
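New features that are functions of existing ones often encode domain knowledge a model would struggle to learn on its own, especially linear models. A minimal sketch with hypothetical real-estate columns (`price`, `area_m2`, `rooms`); the ratios chosen are illustrative only.

```python
import pandas as pd

# Hypothetical listings
df = pd.DataFrame({
    "price": [300000.0, 450000.0, 200000.0],
    "area_m2": [75.0, 100.0, 50.0],
    "rooms": [3, 4, 2],
})

# Ratio features built from existing columns
df["price_per_m2"] = df["price"] / df["area_m2"]
df["area_per_room"] = df["area_m2"] / df["rooms"]
print(df)
```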
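Label encoding maps each category to an integer code, which tree-based models can split on directly. A minimal sketch with hypothetical `city` and `size` columns, showing alphabetical codes (`LabelEncoder`), first-appearance codes (`pd.factorize`), and an explicit order for an ordinal feature (`OrdinalEncoder` with a given `categories` list).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

df = pd.DataFrame({
    "city": ["Berlin", "Munich", "Berlin", "Hamburg"],
    "size": ["small", "large", "medium", "small"],  # ordinal: small < medium < large
})

# Alphabetical codes: Berlin=0, Hamburg=1, Munich=2
df["city_label"] = LabelEncoder().fit_transform(df["city"])

# Codes in order of first appearance: Berlin=0, Munich=1, Hamburg=2
codes, uniques = pd.factorize(df["city"])
df["city_factorized"] = codes

# Ordinal feature with an explicit, meaningful order
ord_enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_ord"] = ord_enc.fit_transform(df[["size"]]).ravel().astype(int)
print(df)
```

scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for input features.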
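Frequency encoding replaces each category with how often it occurs, which can carry useful signal when frequency correlates with the target. A minimal pandas sketch with a hypothetical `city` column, using relative frequencies (counts divided by the number of rows).

```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Munich", "Berlin", "Hamburg", "Berlin"]})

# Share of rows per category: Berlin=0.6, Munich=0.2, Hamburg=0.2
freq = df["city"].value_counts(normalize=True)

# Replace each category by its frequency
df["city_freq"] = df["city"].map(freq)
print(df)
```

Categories that share the same frequency (Munich and Hamburg here) become indistinguishable, so this is usually combined with other encodings.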
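Feature generation from categorical features can be sketched as an interaction: concatenate two categorical columns into one combined category, then one-hot encode it, so a linear model or KNN gets a separate column for each combination. The `city` and `property_type` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Berlin", "Munich", "Berlin", "Hamburg"],
    "property_type": ["flat", "house", "house", "flat"],
})

# Combine two categorical features into one interaction category
df["city_x_type"] = df["city"] + "_" + df["property_type"]

# One-hot encode the combined category: one column per observed combination
interactions = pd.get_dummies(df["city_x_type"], prefix="ct", dtype=int)
print(interactions)
```

Only combinations that actually occur in the data get a column, which keeps the expansion smaller than a full cross product.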
