In Andrew Ng's Machine Learning course (Module 7 on Regularization), he mentions that if we're unsure which parameters to regularize, it's reasonable to include all parameters in the regularization term of the cost function.
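For reference, the regularized cost function from that part of the course (for linear regression), as I understand it, is:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$$

where by convention the bias term $\theta_0$ is left out of the penalty sum.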
However, I'm trying to understand how this approach helps. Wouldn't penalizing all parameters simply scale them down across the board? Wouldn't that potentially hurt the model's performance by reducing the magnitude of the predictions and limiting its ability to fit the data accurately?