Implementing mathematical functions
Why not just use the Python built-in functions and those that are defined in the standard or extension Python modules? For example, why not use the math.hypot() function instead of defining our own hypot() function? The answer to this question is that we do use such functions when they are present (because they are likely to be faster and more accurate). However, there is an unlimited number of functions that we may wish to use and only a finite number of functions is defined in the Python standard and extension modules. When you need a function that is not defined in the Python standard or extension modules, you need to define the function yourself.
As an example, we consider the kind of code required for a familiar and important application that is of interest to many potential college students in the United States. In a recent year, over 1 million students took the Scholastic Aptitude Test (SAT). The test consists of two major sections: critical reading and mathematics. Scores range from 200 (lowest) to 800 (highest) on each section, so overall test scores range from 400 to 1600. Many universities consider these scores when making important decisions. For example, student athletes are required by the National Collegiate Athletic Association (NCAA), and thus by many universities, to have a combined score of at least 820 (out of 1600), and the minimum eligibility requirement for certain academic scholarships is 1500 (out of 1600). What percentage of test takers is ineligible for athletics? What percentage is eligible for the scholarships?
Two functions from statistics enable us to compute accurate answers to these questions. The standard normal (Gaussian) probability density function is characterized by the familiar bell-shaped curve and defined by the formula . The standard normal (Gaussian) cumulative distribution function Φ(z) is defined to be the area under the curve defined by ϕ(x) above the x-axis and to the left of the vertical line x = z. These functions play an important role in science, engineering, and finance because they arise as accurate models throughout the natural world and because they are essential in understanding experimental error. In particular, these functions are known to accurately describe the distribution of test scores in our example, as a function of the mean (average value of the scores) and the standard deviation (square root of the average of the squares of the differences between each score and the mean), which are published each year. Given the mean μ and the standard deviation σ of the test scores, the percentage of students with scores less than a given value z is closely approximated by the function Φ(z, μ, σ) = Φ((z—μ)/σ). Functions to calculate ϕ and Φ are not available in Python’s math module, so we develop our own implementations.
In the simplest situation, we have a closed-form mathematical formula defining our function in terms of functions that are implemented in Python’s math module. This situation is the case for ϕ—the math module includes functions to compute the exponential and the square root functions (and a constant value for π), so a function pdf() corresponding to the mathematical definition is easy to implement. For convenience, gauss.py (PROGRAM 2.1.2) uses the default arguments μ = 0 and σ = 1 and actually computes Φ(x, μ, σ) = Φ((x — μ) / σ) / σ.
No closed form
If no formula is known, we may need a more complicated algorithm to compute function values. This situation is the case for Φ—no closed-form expression exists for this function. Algorithms to compute function values sometimes follow immediately from Taylor series approximations, but developing reliably accurate implementations of mathematical functions is an art and a science that needs to be addressed carefully, taking advantage of the knowledge built up in mathematics over the past several centuries. Many different approaches have been studied for evaluating Φ. For example, a Taylor series approximation to the ratio of Φ and ϕturns out to be an effective basis for evaluating the function:
- Φ(z) 1/2 + ϕ(z) ( z + z3/3 + z5/ (3·5) + z7 / (3·5·7) +... ).
This formula readily translates to the Python code for the function cdf() in PROGRAM 2.1.2. For small (respectively large) z, the value is extremely close to 0 (respectively 1), so the code directly returns 0 (respectively 1); otherwise, it uses the Taylor series to add terms until the sum converges. Again, for convenience, PROGRAM 2.1.2 actually computes Φ(z, μ, σ) = Φ((z—μ) / σ), using the defaults μ = 0 and σ = 1.
Running gauss.py with the appropriate arguments on the command line tells us that about 17% of the test takers were ineligible for athletics in a year when the mean was 1019 and the standard deviation was 209. In the same year, about 1% percent qualified for academic scholarships.
COMPUTING WITH MATHEMATICAL FUNCTIONS OF ALL sorts plays a central role in science and engineering. In a great many applications, the functions that you need are expressed in terms of the functions in Python’s math module, as we have just seen with pdf(), or in terms of a Taylor series approximation or some other formulation that is easy to compute, as we have just seen with cdf(). Indeed, support for such computations has played a central role throughout the evolution of computing systems and programming languages.