Item 17: Increase the Prominence of a Failure’s Effects

Making problems stand out can increase the effectiveness of your debugging. You can achieve this by manipulating your software, its input, or its environment. In all cases, ensure you perform the changes under revision control in a separate branch, so that you can easily revert them and they won’t end up by mistake in production code.

There are cases where your software simply refuses to behave in the way you expect it to. For example, although certain complex conditions are apparently satisfied, the record that’s supposed to appear in the database doesn’t show up. A good approach in such cases is to lobotomize the software through drastic surgery and see if it falls in line. If not, you’re probably barking up the wrong tree.

As a concrete case, consider the following (abridged) code from the Apache HTTP server, which deals with signed certificate timestamps (SCTs). You might be observing that the server fails to react to SCTs with a timestamp lying in the future.

for (i = 0; i < arr->nelts; i++) {
   cur_sct_file = elts[i];
   rv = ctutil_read_file(p, s, cur_sct_file, MAX_SCTS_SIZE,
                   &scts, &scts_size_wide);
   rv = sct_parse(cur_sct_file,
              s, (const unsigned char *)scts, scts_size, NULL,
              &fields);
   /* Skip any SCT whose timestamp lies in the future. */
   if (fields.time > apr_time_now()) {
      sct_release(&fields);
      continue;
   }
   sct_release(&fields);
   rv = ctutil_file_write_uint16(s, tmpfile,
              (apr_uint16_t)scts_size);
   if (rv != APR_SUCCESS)
      break;
   scts_written++;
}

A way to debug this is to temporarily change the conditional so that it always evaluates to true.

if (fields.time > apr_time_now() || 1) {

This change will allow you to determine whether the problem lies in the Boolean condition you short-circuited, in your test data, or in the rest of the logic that handles future SCTs.

Other tricks in this category are to add a return true or return false at the beginning of a method, or to disable the execution of some code by putting it in an if (0) block (see Item 46: “Simplify the Suspect Code”).
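Both tricks can be sketched in a few lines of C. The example below is only an illustration: record_is_valid, store_record, and the audit step are hypothetical names invented here, not code from any real project. The "DEBUG ONLY" comments make the temporary changes easy to find before the branch is discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical validation routine suspected of rejecting good records. */
bool record_is_valid(int id, int balance)
{
    return true;  /* DEBUG ONLY: bypass validation to see if it is the culprit */

    /* Original logic, unreachable while the line above is present. */
    if (id <= 0)
        return false;
    return balance >= 0;
}

/* Hypothetical routine with an optional step that we want to rule out.
 * Returns the number of steps executed, or -1 if the record is rejected. */
int store_record(const char *owner, int id, int balance)
{
    int steps_run = 0;

    assert(owner != NULL);
    if (!record_is_valid(id, balance))
        return -1;
    steps_run++;

    if (0) {  /* DEBUG ONLY: audit step disabled to take it out of the picture */
        printf("auditing record %d for %s\n", id, owner);
        steps_run++;
    }

    return steps_run;
}
```

With these changes in place, store_record("test", -5, -1) returns 1: the record that the original validation would have rejected is accepted, and the audit step never runs. If the failure persists, neither the validation nor the audit step is likely to be the cause.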

In other cases, you may be trying to debug a barely observable effect. Here the solution is to temporarily modify the code to make the effect stand out. If, in a game, a character gets a minute increase in power after some event, and that doesn’t seem to happen, make the power increase dramatically larger so that you can readily observe it. Or, when investigating the calculation of an earthquake’s effects on a building in a CAD program, magnify the displayed structure displacement by 1,000 so that you can easily see the magnitude and direction of the structure’s movement.
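One way to magnify a displayed displacement is a single scale factor applied only at drawing time, so that the computed values stay untouched. The following is a minimal sketch under assumed names: struct node, display_position, and DEBUG_DISPLAY_SCALE are hypothetical and do not come from any particular CAD program.

```c
#include <assert.h>
#include <stddef.h>

/* DEBUG ONLY: exaggeration applied to displayed displacements.
 * 1000.0 mirrors the factor mentioned in the text; 1.0 restores normal display. */
#ifndef DEBUG_DISPLAY_SCALE
#define DEBUG_DISPLAY_SCALE 1000.0
#endif

/* Hypothetical 2-D node of a structural model: rest position plus the
 * displacement computed by the analysis, all in the same length unit. */
struct node {
    double x, y;    /* rest position */
    double dx, dy;  /* computed displacement */
};

/* Compute where the node should be drawn. Only the drawn position is
 * exaggerated; the node itself is not modified. */
void display_position(const struct node *n, double *out_x, double *out_y)
{
    assert(n != NULL && out_x != NULL && out_y != NULL);
    *out_x = n->x + n->dx * DEBUG_DISPLAY_SCALE;
    *out_y = n->y + n->dy * DEBUG_DISPLAY_SCALE;
}
```

Keeping the factor in one macro means the exaggeration can be switched off by compiling with -DDEBUG_DISPLAY_SCALE=1.0, and the direction of movement is preserved because both axes are scaled equally.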

In cases where your software’s failure depends on external factors, you can increase your effectiveness by modifying the environment where your software executes in order to make it fail more quickly or more frequently (see Item 55: “Fail Fast”). If your software processes web requests, you can apply a load test or stress test tool, such as Apache JMeter, in order to force your application into the zone where you think it starts misbehaving. If your software uses threads to achieve concurrency, you can increase their number far beyond what’s reasonable for the number of cores in the computer you’re using. This may help you replicate deadlocks and race conditions. You can also force your software to compete for scarce resources by concurrently running other processes that consume memory, CPU, network, or disk resources. A particularly effective way to investigate how your software behaves when the disk fills up is to make it store its data on a puny USB flash drive.
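The thread-count idea can be tried with a small harness like the one below, written with POSIX threads. It is a sketch under stated assumptions: run_stress and the shared counter are hypothetical stand-ins for your own suspect code, and the body of worker is where that code would go. The harness starts as many threads as you ask for and returns the final counter, so a result that differs from the expected total (or a run that never finishes) points to lost updates or a deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* State shared by all worker threads (hypothetical stand-in for real data). */
struct shared {
    pthread_mutex_t lock;
    long counter;
    long increments_per_thread;
};

/* Each worker repeatedly runs the critical section under test. */
static void *worker(void *arg)
{
    struct shared *s = arg;

    for (long i = 0; i < s->increments_per_thread; i++) {
        pthread_mutex_lock(&s->lock);
        s->counter++;  /* replace with the suspect operation */
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Run `nthreads` workers concurrently and return the final counter value,
 * or -1 if the threads could not all be started. */
long run_stress(int nthreads, long increments_per_thread)
{
    struct shared s = { .counter = 0,
                        .increments_per_thread = increments_per_thread };
    pthread_t *tids;
    int started = 0;

    assert(nthreads > 0 && increments_per_thread >= 0);
    tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL)
        return -1;
    if (pthread_mutex_init(&s.lock, NULL) != 0) {
        free(tids);
        return -1;
    }

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &s) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&s.lock);
    free(tids);
    return started == nthreads ? s.counter : -1;
}
```

Calling run_stress(64, 1000) on a machine with only a few cores oversubscribes the CPU heavily. With correct locking the result is 64000; if the locking in the code under test were faulty, repeated runs would be likely, though not guaranteed, to return a smaller number.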

Finally, a testing approach that can also help you investigate rare data validation or corruption problems is fuzzing. Under this approach you either supply your program with randomly generated input, or you randomly perturb its input, and see what happens. Your objective is to systematically increase the likelihood of chancing on the data pattern that produces the failure. Having done that, you can use the problematic data to debug the application. This technique may, for example, help you find out why your application crashes when running on your customer’s production data but not when it’s running on your own test data. You can perform fuzzing operations using a tool such as zzuf.
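To make the idea of randomly perturbing input concrete, here is a minimal sketch of a byte-level mutator. It is not how zzuf or any other tool is implemented; fuzz_buffer, its per_mille parameter, and the small xorshift generator are assumptions made for this illustration. The mutations are driven by a seed so that, once a perturbed input triggers the failure, the same input can be regenerated for debugging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 generator, used so that a given seed always
 * reproduces the same sequence of mutations. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Flip one randomly chosen bit in roughly `per_mille` out of every
 * 1000 bytes of `buf`. Returns the number of bytes that were changed. */
size_t fuzz_buffer(unsigned char *buf, size_t len,
                   unsigned per_mille, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be non-zero */
    size_t changed = 0;

    assert(buf != NULL || len == 0);
    assert(per_mille <= 1000);

    for (size_t i = 0; i < len; i++) {
        if (next_random(&state) % 1000 < per_mille) {
            buf[i] ^= (unsigned char)(1u << (next_random(&state) % 8));
            changed++;
        }
    }
    return changed;
}
```

A typical loop would copy a known-good input, call fuzz_buffer on the copy with a fresh seed, feed the result to the program, and record the seed whenever the program fails. A low rate, such as 4 per 1000 bytes, tends to keep the input valid enough to get past early parsing while still reaching unusual paths.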

Things to Remember

  • Force the execution of suspect paths.

  • Increase the magnitude of some effects to make them stand out for study.

  • Apply stress to your software to force it out of its comfort zone.

  • Perform all your changes under a temporary revision control branch.
