100 Days Challenge Day 14 - Datasets and Feature Engineering
Feature Engineering
Pic Credits: How to Win a Data Science Competition course on YouTube
Learnt feature engineering methods from Krish Naik's YouTube channel and from videos of the How to Win a Data Science Competition course on YouTube.
Topics covered:
- Continuous, discrete (numeric and categorical), and date-time variables
- Loading a random sample with specified features from a CSV data file
- Plotting histograms
- Converting date and time objects to the Pandas datetime datatype
- Feature generation
- One-hot encoding (for non-tree-based models)
- Scaling and regularization (for non-tree-based models)
- Scikit-learn scalers (MinMaxScaler and StandardScaler)
- Handling outliers using winsorization and rank transformation (only for linear models, KNN, and neural networks)
- Log transform and raising to a power < 1 (for neural networks)
- Generating new features that are functions of existing features
- Categorical and ordinal features
- Label encoding (works best with trees)
- Frequency encoding
- Feature generation using categorical features (works best for linear models and KNN)
Sources:
- Feature Engineering in Python- What are continuous numerical variables?
- Feature Engineering in Python- What are discrete numerical variables?
- Feature Engineering in python-What are date time variable?
- 9 - Feature Engineering Overview | How to Win a Data Science Competition: Learn from Top Kagglers
- 10 - Numeric Features | How to Win a Data Science Competition: Learn from Top Kagglers
- 11 - Categorical and Ordinal Features | How to Win a Data Science Competition: Learn from Top Kagglers