# Machine Learning Classification


## 3.4 Evaluation: Grading the Exam

We’ve talked a bit about how we want to design our evaluation: we don’t teach to the test. So, we train on one set of questions and then evaluate on a new set of questions. How are we going to compute a grade or a score from the exam? For now—and we’ll dive into this later—we are simply going to ask, “Is the answer correct?” If the answer is true and we predicted true, then we get a point! If the answer is false and we predicted true, we don’t get a point. Cue :sadface:. Every correct answer will count as one point. Every missed answer will count as zero points. Every question will count equally for one or zero points. In the end, we want to know the percent we got correct, so we add up the points and divide by the number of questions. This type of evaluation is called accuracy; its formula is `accuracy = #correct / #questions`. It is very much like scoring a multiple-choice exam.

So, let’s write a snippet of code that captures this idea. We’ll have a very short exam with four true-false questions. We’ll imagine a student who finds themself in a bind and, in a last act of desperation, answers every question with True. Here’s the scenario:

In [ ]:

```
import numpy as np

answer_key      = np.array([True, True, False, True])
student_answers = np.array([True, True, True, True]) # desperate student!
```

We can calculate the accuracy by hand in three steps:

1. Mark each answer right or wrong.

2. Add up the points for the correct answers.

3. Calculate the percent.

In [ ]:

```
correct     = answer_key == student_answers
num_correct = correct.sum() # True == 1, add them up
print("manual accuracy:", num_correct / len(answer_key))
```

`manual accuracy: 0.75`

Behind the scenes, sklearn’s `metrics.accuracy_score` is doing an equivalent calculation:

In [ ]:

```
from sklearn import metrics

print("sklearn accuracy:",
      metrics.accuracy_score(answer_key, student_answers))
```

`sklearn accuracy: 0.75`
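
If you want the raw number of points rather than the percent, `accuracy_score` also accepts a `normalize` parameter. Here is a minimal sketch, using the same answer key and desperate student as above, that contrasts the fraction (the default) with the raw count:

```python
import numpy as np
from sklearn import metrics

answer_key      = np.array([True, True, False, True])
student_answers = np.array([True, True, True, True])

# default: fraction of questions answered correctly
frac = metrics.accuracy_score(answer_key, student_answers)

# normalize=False: the raw count of correct answers instead
count = metrics.accuracy_score(answer_key, student_answers,
                               normalize=False)

print(frac, count)   # 0.75 3
```

Dividing the count by the number of questions, `count / len(answer_key)`, recovers the fraction, which is exactly the by-hand calculation we did above.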