- 3.1 Classification Tasks
- 3.2 A Simple Classification Dataset
- 3.3 Training and Testing: Don't Teach to the Test
- 3.4 Evaluation: Grading the Exam
- 3.5 Simple Classifier #1: Nearest Neighbors, Long Distance Relationships, and Assumptions
- 3.6 Simple Classifier #2: Naive Bayes, Probability, and Broken Promises
- 3.7 Simplistic Evaluation of Classifiers
- 3.8 EOC

## 3.4 Evaluation: Grading the Exam

We’ve talked a bit about how we want to design our evaluation: we don’t teach to the test. So, we train on one set of questions and then evaluate on a new set of questions. How are we going to compute a grade or a score from the exam? For now—and we’ll dive into this later—we are simply going to ask, “Is the answer correct?” If the answer is *true* and we predicted *true*, then we get a point! If the answer is *false* and we predicted *true*, we don’t get a point. Cue :sadface:. Every correct answer will count as one point. Every missed answer will count as zero points. Every question will count equally for one or zero points. In the end, we want to know the percent we got correct, so we add up the points and divide by the number of questions. This type of evaluation is called *accuracy*, its formula being #correct / #questions. It is very much like scoring a multiple-choice exam.

So, let’s write a snippet of code that captures this idea. We’ll have a very short exam with four true-false questions. We’ll imagine a student who finds themself in a bind and, in a last act of desperation, answers every question with `True`. Here’s the scenario:

`In [6]:`

```python
answer_key      = np.array([True, True, False, True])
student_answers = np.array([True, True, True, True])  # desperate student!
```

We can calculate the accuracy by hand in three steps:

1. Mark each answer right or wrong.
2. Add up the correct answers.
3. Calculate the percent.

`In [7]:`

```python
correct = answer_key == student_answers
num_correct = correct.sum()  # True == 1, add them up
print("manual accuracy:", num_correct / len(answer_key))
```

manual accuracy: 0.75

Behind the scenes, sklearn’s `metrics.accuracy_score` is doing an equivalent calculation:

`In [8]:`

print("sklearn accuracy:", metrics.accuracy_score(answer_key, student_answers))

sklearn accuracy: 0.75
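As an aside not spelled out above, the three manual steps collapse into one expression: since the mean of a Boolean array is exactly the fraction of `True` entries, `np.mean` computes accuracy directly. A minimal sketch reusing the exam data:

```python
import numpy as np

answer_key      = np.array([True, True, False, True])
student_answers = np.array([True, True, True, True])  # desperate student!

# comparing the arrays gives Booleans; their mean is the fraction correct,
# which is exactly the accuracy
print("one-step accuracy:", np.mean(answer_key == student_answers))
```

This is the same 0.75 as before, just with the sum and the division rolled into a single `mean` call.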

So far, we’ve introduced two key components in our evaluation. First, we identified which material we study from and which material we test from. Second, we decided on a method to score the exam. We are now ready to introduce our first learning method, train it, test it, and evaluate it.