ISLR_ch6.0 Intro_Model_Selection
Setting:
In the regression setting, the standard linear model $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon$ is commonly used to describe the relationship between a response $Y$ and a set of predictors $X_1, \dots, X_p$, and is typically fit using least squares.
In the chapters that follow, we consider some approaches for extending the linear model framework.
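Least squares chooses the coefficient estimates to minimize the residual sum of squares; written out (same notation as above, with $x_{ij}$ the value of the $j$th predictor for the $i$th observation):

$$\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2$$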
Reasons for using fitting procedures other than least squares:
Prediction Accuracy:
- Provided that the true relationship between the response and the predictors is approximately linear, the least squares estimates will have low bias.
- If n $\gg$ p, least squares estimates tend to also have low variance $\Rightarrow$ perform well on test data.
- If n is not much larger than p, the least squares fit can have high variance $\Rightarrow$ overfitting $\Rightarrow$ poor predictions on test data
- If p > n, there is no longer a unique least squares coefficient estimate: the variance is infinite, so the method cannot be used at all
By constraining or shrinking the estimated coefficients, we can often substantially reduce the variance at the cost of a negligible increase in bias.
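A minimal sketch of this bias-variance trade-off, assuming numpy and scikit-learn are available (the simulated data and the ridge penalty value are illustrative choices, not from the text): ridge regression shrinks the coefficients and typically beats plain least squares on test data when n is not much larger than p.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulated data where p is close to n, so least squares has high variance.
n_train, n_test, p = 60, 1000, 50
beta = np.zeros(p)
beta[:5] = 1.0                      # only a few predictors truly matter

X_train = rng.normal(size=(n_train, p))
y_train = X_train @ beta + rng.normal(scale=1.0, size=n_train)
X_test = rng.normal(size=(n_test, p))
y_test = X_test @ beta + rng.normal(scale=1.0, size=n_test)

# Ordinary least squares: low bias, but high variance when n is not >> p.
ols = LinearRegression().fit(X_train, y_train)

# Ridge regression: shrinks coefficients toward zero, trading a small
# increase in bias for a large reduction in variance
# (alpha chosen here purely for illustration).
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

print("OLS test MSE:  ", mean_squared_error(y_test, ols.predict(X_test)))
print("Ridge test MSE:", mean_squared_error(y_test, ridge.predict(X_test)))
```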
Model Interpretability:
- Irrelevant variables lead to unnecessary complexity in the resulting model. By removing these variables, that is, by setting the corresponding coefficient estimates to zero, we can obtain a model that is more easily interpreted.
- Least squares is extremely unlikely to yield any coefficient estimates that are exactly zero $\Rightarrow$ methods for automatic feature (variable) selection are needed
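A shrinkage method such as the lasso can produce exact zeros. A minimal sketch, assuming scikit-learn (the simulated data and the penalty value are illustrative), of how the lasso zeroes out coefficients while least squares does not:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)

# 5 relevant predictors, 15 irrelevant ones.
n, p = 100, 20
beta = np.r_[np.ones(5), np.zeros(15)]
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Least squares almost never produces exact zeros; the lasso does.
print("OLS coefficients exactly zero:  ", np.sum(ols.coef_ == 0))
print("Lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0))
print("Predictors kept by the lasso:", np.flatnonzero(lasso.coef_ != 0))
```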
Alternatives to least squares:
- Subset Selection
- Shrinkage
- Dimension Reduction
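A minimal sketch of the three classes of methods, assuming scikit-learn (forward stepwise selection stands in for best subset selection, and the specific parameter values are illustrative):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n)

# Subset selection (forward stepwise as a tractable stand-in for best subset):
# fit least squares on a chosen subset of the predictors.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3).fit(X, y)
subset_model = LinearRegression().fit(X[:, sfs.get_support()], y)

# Shrinkage: fit all p predictors but regularize the coefficients toward zero.
shrinkage_model = Lasso(alpha=0.1).fit(X, y)

# Dimension reduction: project the predictors onto a few linear combinations
# (principal components), then fit least squares on the projections.
dimred_model = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)
```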