
In Andrew Ng's Machine Learning course (Module 7 on Regularization), he mentions that if we're unsure which parameters to regularize, it's reasonable to include all parameters in the regularization term of the cost function.

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]$$

However, I’m trying to understand how this approach helps. Wouldn’t penalizing all parameters simply scale them down across the board? Wouldn’t that potentially hurt the model’s performance by reducing the magnitude of the predictions and limiting its ability to fit the data accurately?

  • Good question, and you've got a fantastic answer below. But just to be complete: usually the intercept or bias parameter is left out of the regularization, so "all parameters" isn't totally accurate.
    – Rick Hass

1 Answer


Wouldn’t penalizing all parameters simply scale them down across the board?

Depending on the size of the penalty, the answer is potentially "Yes".
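For instance, in the special case of orthonormal features (and with the penalty written as $\lambda\lVert\theta\rVert^2$ added to the squared-error loss), ridge regression shrinks every ordinary least-squares coefficient by the same factor:

$$\hat{\theta}_j^{\text{ridge}} = \frac{\hat{\theta}_j^{\text{OLS}}}{1 + \lambda}$$

Larger $\lambda$ means more shrinkage; with correlated features the shrinkage is no longer uniform across coefficients, but the qualitative effect is the same.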

Wouldn’t that potentially hurt the model’s performance by reducing the magnitude of the predictions and limiting its ability to fit the data accurately?

Yes, the fit to the training data deteriorates slightly, but in exchange the model generalizes better, meaning it will be more accurate on new observations. For a simpler example of how a biased estimator can have better accuracy, see the James-Stein estimator.
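To see the trade-off numerically, here is a minimal sketch (my own illustration, not from the course or the question): a linear model with 50 features, only 5 of which actually matter, is fit on a small synthetic training set with and without an L2 penalty on all weights. In a typical run, the unregularized fit has the lowest training error but the worst test error, while a moderate penalty trades a little training accuracy for noticeably better test accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, n_features = 60, 1000, 50
# Only the first 5 coefficients are nonzero; the other 45 features are pure noise.
true_theta = np.zeros(n_features)
true_theta[:5] = [2.0, -1.0, 0.5, 1.5, -2.0]

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = X @ true_theta + rng.normal(0, 1.0, size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: theta = (X'X + lam*I)^(-1) X'y
    # (penalizes every coefficient, as in the question)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

for lam in [0.0, 1.0, 10.0]:
    theta = ridge_fit(X_train, y_train, lam)
    print(f"lambda={lam:5.1f}  "
          f"train MSE={mse(X_train, y_train, theta):.2f}  "
          f"test MSE={mse(X_test, y_test, theta):.2f}")
```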

