Hyperparameters

Throughout this chapter, there has been much discussion about model parameters. The job of the machine is to find the best values for these parameters; the parameters should result in predictions that are as close as possible to the real observed values across the training set. However, there is another dimension to model performance that needs to be discussed: hyperparameters.

Unlike parameters, which are internal to the model and are adjusted to minimize loss during training, hyperparameters are external configuration settings that influence how well the model performs. For example, after each batch or epoch during training, the model updates its parameters slightly to reduce the loss. The size of this update (how much the model tries to adjust itself to reduce error during the next epoch) is controlled by a hyperparameter called the learning rate.
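To make this concrete, here is a minimal sketch of the parameter update described above, assuming a simple linear model y = ax + b fit by gradient descent on a tiny illustrative dataset. The variable names (a, b, lr) and the data are inventions for this example, not part of any particular library.

```python
# Tiny training set: points that follow y = 2x + 1 exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = 0.0, 0.0   # parameters: internal, learned from the data
lr = 0.01         # hyperparameter: the learning rate, fixed before training

for epoch in range(5000):
    n = len(xs)
    # Gradients of the mean-squared-error loss with respect to a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    # The learning rate scales how far each update moves the parameters
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # a and b converge toward 2 and 1
```

Notice that the loop adjusts only a and b; the learning rate lr never changes during training, which is exactly what makes it a hyperparameter rather than a parameter.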

A high learning rate lets the model adjust quickly but risks overshooting the optimal values and getting stuck in a cycle of jumping around without ever minimizing the error function. A low learning rate gives you better precision but can slow down training and increase computational cost, or it can leave the model stuck in a local minimum, where it settles for a locally good solution and loses sight of the big picture of what it is trying to accomplish. Choosing the right learning rate is therefore a trade-off. Finding a good value by hand may take many adjustments, but the job can be made easier by optimization algorithms that search the space of candidate values systematically. Such methods include Bayesian optimization and grid search, among others.
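The grid-search idea can be sketched in a few lines: train the same model once per candidate learning rate and keep the value that yields the lowest loss. This is a hand-rolled illustration of the pattern (real tools such as scikit-learn's GridSearchCV automate it); the train() helper and the candidate values are assumptions for the example.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def train(lr, epochs=1000):
    """Fit y = a*x + b by gradient descent; return the final mean-squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

# Train once per candidate and keep the learning rate with the lowest loss
candidates = [0.001, 0.01, 0.05]
results = {lr: train(lr) for lr in candidates}
best_lr = min(results, key=results.get)
print(best_lr)
```

On this tiny problem the larger rate wins because it converges fastest within the epoch budget; on a harder problem a large rate could just as easily diverge, which is why the search is run rather than guessed.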

Learning rate is one of many hyperparameters, and each algorithm may have its own unique hyperparameters, such as batch size, number of epochs, or even the model’s structure. What you need to remember is that hyperparameters control how the model learns, not what it learns.
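The distinction above can be seen directly in a training loop that also exposes batch size and epoch count. The following sketch uses an illustrative mini-batch gradient-descent loop (dataset, names, and values are assumptions for the example): the hyperparameters are fixed up front and shape how the loop runs, while the parameters are the only values the loop itself changes.

```python
import random

# Illustrative dataset: points that follow y = 2x + 1 exactly
xs = [i / 20 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# Hyperparameters: chosen before training, fixed throughout the run
batch_size = 4
epochs = 200
lr = 0.1

# Parameters: the only values the loop below updates
a, b = 0.0, 0.0

data = list(zip(xs, ys))
for _ in range(epochs):
    random.shuffle(data)                       # visit batches in a new order
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        n = len(batch)
        grad_a = sum(2 * (a * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in batch) / n
        a -= lr * grad_a                       # one parameter update per batch
        b -= lr * grad_b

print(round(a, 2), round(b, 2))  # a and b approach 2 and 1
```

Changing batch_size or epochs alters how the model learns (how many updates it makes and how noisy each one is), but the relationship it ultimately learns, the parameters a and b, comes from the data.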

Table 2-1 provides a comparison of parameters and hyperparameters.

Table 2-1 Comparing Parameters and Hyperparameters

| Aspect                   | Parameter                                             | Hyperparameter                                              |
|--------------------------|-------------------------------------------------------|-------------------------------------------------------------|
| Definition               | Internal values learned from data during training     | External configuration settings                             |
| Purpose                  | Defines how the model makes predictions               | Controls how the learning process unfolds                   |
| Examples                 | Weights (such as a and b in y = ax + b), bias terms   | Learning rate, batch size, number of epochs, model depth    |
| Set by                   | Learned automatically by the model during training    | Manually defined by the user or an optimization algorithm   |
| Affected during training | Yes; updated iteratively to minimize the loss function | No; remains fixed throughout the training run              |
| Impact                   | Model’s actual behavior and output                    | Training efficiency, convergence speed, final model quality |
| Tuning method            | Learned through gradient descent or similar algorithms | By hand, or via techniques such as Bayesian optimization   |
