Hyperparameters
Throughout this chapter, there has been much discussion of model parameters. The job of the learning algorithm is to find the best values for these parameters: values that produce predictions as close as possible to the real observed values across the training set. However, there is another dimension to model performance that needs to be discussed: hyperparameters.
Unlike parameters, which are internal to the model and adjusted to minimize loss during training, hyperparameters are external configuration settings, chosen before training, that shape how the model learns. For example, after each batch or epoch during training, the model updates its parameters slightly to reduce the loss. The size of this update (how much the model adjusts itself to reduce error on the next pass) is controlled by a hyperparameter called the learning rate.
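To make the distinction concrete, here is a minimal sketch of gradient descent on a toy linear model y = ax + b (the data and hyperparameter values are illustrative assumptions, not taken from the text). The learning rate scales the size of every parameter update:

```python
import numpy as np

# Toy data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Parameters: learned by the model during training
a, b = 0.0, 0.0

# Hyperparameter: fixed by us before training starts
learning_rate = 0.05

for epoch in range(500):
    error = a * x + b - y
    # Gradients of the mean squared error with respect to a and b
    grad_a = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # The learning rate controls how far each update moves the parameters
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(round(a, 2), round(b, 2))  # → 2.0 1.0
```

Changing `learning_rate` changes nothing about what the model can represent, only how the training run behaves on its way to the solution.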
A high learning rate lets the model adjust quickly but risks overshooting the optimal values and getting stuck in a cycle of jumping around without ever minimizing the error function. A low learning rate gives better precision but can slow down training and increase computational cost, or it can leave the model stuck in a local minimum, where it essentially loses sight of the bigger picture of what it is trying to accomplish. Choosing the right learning rate is therefore a trade-off. Finding a good value by hand may take many adjustments, but the job can be made easier by optimization techniques that search over candidate values automatically, such as grid search and Bayesian optimization.
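As a rough illustration of such a search, the sketch below (reusing the same toy linear model; the candidate values are arbitrary assumptions) trains the model once per candidate learning rate and keeps whichever produces the lowest final loss. This is the essence of grid search:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

def train(learning_rate, epochs=100):
    """Train y = ax + b with gradient descent; return the final loss."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        error = a * x + b - y
        a -= learning_rate * 2 * np.mean(error * x)
        b -= learning_rate * 2 * np.mean(error)
    return np.mean((a * x + b - y) ** 2)

# Grid search: one full training run per candidate, keep the best
candidates = [0.001, 0.01, 0.05, 0.2]
best_lr = min(candidates, key=train)
print(best_lr)
```

On this data the largest candidate overshoots and diverges while the smallest barely moves, so the search settles on a middle value; that is exactly the trade-off described above, played out automatically.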
The learning rate is one of many hyperparameters, and each algorithm may have its own, such as batch size, number of epochs, or even the model’s structure. What you need to remember is that hyperparameters control how the model learns, not what it learns.
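Batch size and number of epochs show up the same way in a mini-batch training loop: they are fixed settings that shape the loop, while the parameters a and b are the only values the loop updates. The sketch below makes that separation explicit (the data and hyperparameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=40)
y = 2.0 * x + 1.0

# Hyperparameters: control how training proceeds
learning_rate = 0.05
batch_size = 8
epochs = 200

# Parameters: the only values training actually changes
a, b = 0.0, 0.0

for _ in range(epochs):
    # Shuffle the data and walk through it one mini-batch at a time
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = a * x[idx] + b - y[idx]
        a -= learning_rate * 2 * np.mean(error * x[idx])
        b -= learning_rate * 2 * np.mean(error)

# a and b converge toward 2 and 1
print(round(a, 2), round(b, 2))
```

Swapping in a different batch size or epoch count changes the cost and stability of training, but the model still ends up describing the same relationship in the data.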
Table 2-1 provides a comparison of parameters and hyperparameters.
Table 2-1 Comparing Parameters and Hyperparameters
| Aspect | Parameter | Hyperparameter |
|---|---|---|
| Definition | Internal values learned from data during training | External configuration settings |
| Purpose | Defines how the model makes predictions | Controls how the learning process unfolds |
| Examples | Weights (such as a and b in y = ax + b), bias terms | Learning rate, batch size, number of epochs, model depth |
| Set by | Learned automatically by the model during training | Manually defined by the user or an optimization algorithm |
| Affected during training | Yes; updated iteratively to minimize the loss function | No; remains fixed throughout the training run |
| Impact | Model’s actual behavior and output | Training efficiency, convergence speed, final model quality |
| Tuning method | Learned through gradient descent or similar algorithms | By hand; can be tuned via techniques such as Bayesian optimization |
