100 Days Challenge Day 10 - Decision Trees and Random Forests

Image credit: Towards Data Science


Learnt about Decision Trees and Random Forests from the StatQuest YouTube channel.

Topics covered include:
Decision Trees
  • Terminology (Root, Node, Leaf)
  • Gini impurity and building a tree for categorical, numeric, and ranked data
  • Feature selection and impurity threshold to prevent overfitting.
  • Filling Missing Values
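
To make the Gini impurity topic concrete, here is a minimal sketch of how the impurity of a node and of a candidate split can be computed. The function names (`gini_impurity`, `split_impurity`) and the toy labels are my own assumptions for illustration, not something taken from the videos.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def split_impurity(left, right):
    """Impurity of a split: average of the two child impurities, weighted by child size."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) + (
        len(right) / total
    ) * gini_impurity(right)


# Hypothetical toy data: class labels that end up in each child after a split.
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "yes"]

print(gini_impurity(left))          # impurity of the left child
print(split_impurity(left, right))  # weighted impurity of the whole split
```

When building a tree, the candidate split with the lowest weighted impurity would be chosen, and a node would be left as a leaf if no split improves on its own impurity.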
Random Forests
  • Disadvantages of Decision Trees (a single tree fits its training data well but tends to be inaccurate on new samples)
  • Bootstrapping, Bagging
  • Evaluation (Using Out-Of-Bag dataset)
  • Handling missing values in the original data
  • Proximity Matrix and Distance Matrix
  • Handling missing values in the data we want to classify
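
The bootstrapping, bagging, and Out-Of-Bag ideas above can be sketched roughly as follows. This is only an illustration using the standard library; `bootstrap_sample` and `majority_vote` are hypothetical helper names, and the row count and votes are made up.

```python
import random
from collections import Counter


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns (in_bag, out_of_bag): rows that were never drawn form the
    Out-Of-Bag set, which can be used to evaluate the tree built on in_bag.
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(predictions):
    """Bagging aggregation step: return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_sample(10, rng)
print("in-bag rows:    ", in_bag)
print("out-of-bag rows:", oob)

# Hypothetical votes from five trees for one sample.
print(majority_vote(["yes", "no", "yes", "yes", "no"]))
```

Each tree in the forest would get its own bootstrap sample, and a row is only scored by the trees for which it was out-of-bag.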


Sources:


StatQuest: Random Forests Part 2: Missing data and clustering
