
2.6 Attempting to Fool Signature Detectors

Signature detectors work by computing a value for a block (or chunk) of text in a message. For example, suppose a message contained this text:

We sell herbs at the lowest price you will
ever find on the net.

A signature-generating program might reduce this text to a single value, such as the following:

244372015810742154622705

This "signature" is saved in a file or database. Later, when another message arrives, its chunks are also given signatures. Each signature is then looked up, and a match indicates that the message may have been seen before. Clearly, several signatures must match before one message can be considered significantly like another.
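The store-and-look-up cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chunk size of eight words, the use of SHA-1 as the signature function, and the names `chunk_signatures`, `looks_familiar`, and `min_matches` are all hypothetical choices, not the method of any particular product.

```python
import hashlib

def chunk_signatures(text, words_per_chunk=8):
    """Return one signature per chunk of `words_per_chunk` lowercased words."""
    words = text.lower().split()
    signatures = []
    for start in range(0, len(words), words_per_chunk):
        chunk = " ".join(words[start:start + words_per_chunk])
        # Assumption: SHA-1 stands in for whatever signature function is used.
        signatures.append(hashlib.sha1(chunk.encode("utf-8")).hexdigest())
    return signatures

def looks_familiar(text, known_signatures, min_matches=2):
    """True if at least `min_matches` chunk signatures were seen before."""
    matches = sum(1 for sig in chunk_signatures(text) if sig in known_signatures)
    return matches >= min_matches

# Signatures of previously seen messages (a set stands in for the database).
known = set()
seen_message = ("We sell herbs at the lowest price you will "
                "ever find on the net. Order today and save.")
known.update(chunk_signatures(seen_message))

print(looks_familiar(seen_message, known))
print(looks_familiar("Minutes from the budget meeting are attached.", known))
```

In this sketch the chunk boundaries are fixed by word position, so any text inserted ahead of a chunk shifts every later boundary and changes the signatures that follow.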

Similar spam detection software recognizes phonemes (distinct parts of words) instead of chunks of words. Other software permutes the divided text to increase the number of signatures available for matching, and still other software performs statistical analysis on individual words and stores the resulting probabilities.
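As a rough illustration of the last approach, the sketch below estimates, for each word, how strongly its presence suggests spam. It assumes a small hand-labeled set of spam and non-spam messages; the function name `word_probabilities` and the simple ratio it computes are illustrative assumptions rather than the formula used by any specific filter.

```python
from collections import Counter

def word_probabilities(spam_messages, ham_messages):
    """Map each word to an estimated probability that a message containing it is spam.

    Both lists must be non-empty. Each word is counted once per message.
    """
    spam_counts = Counter(w for m in spam_messages for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_messages for w in set(m.lower().split()))
    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / len(spam_messages)
        ham_rate = ham_counts[word] / len(ham_messages)
        # Assumption: a plain ratio of rates, with no smoothing for rare words.
        probabilities[word] = spam_rate / (spam_rate + ham_rate)
    return probabilities

spam = ["cheap herbs lowest price", "lowest price herbs today"]
ham = ["meeting notes for today", "lunch price list"]
probs = word_probabilities(spam, ham)
print(probs["herbs"], probs["today"], probs["meeting"])
```

A real filter would need far more training data and some smoothing so that a word seen only once does not receive a probability of exactly 0 or 1.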

All of these forms of spam detection share one method: they examine the message's text. Spammers, recognizing that text analysis is being used, have responded by adding large chunks of random text to each message, so that the text, and therefore its signatures, differs from one copy to the next.

Random text can be actual words and names in random order:

dissonant deanna heron aphasia restaurateur circulate
controllable corporeal cranston giuliano helmholtz bertha
albany shank eye asphyxiate commentary gaston aide filler
chipboard prostheses perturb cryptographer atlantic bernice

Random text can also be random combinations of characters:

eyhydxre yaceyaxv gesmveu vmlpv wmgrxa drgcah mqbjneq
wbfqzkmwr fdbkqogtgzwv lsunhut wuwnp- hivrkef dhdpfhcu
ndowgkx cjxrofun yepjhxp rhbxag ncgvmv

Random text can also be a solid stream of random characters:

hyfaqjimgdalmrymmolaktivajvctikdhpfzaplgumufsvtjgu
tccqenngjwtodktenkrvefpmkiherqymsccysqfbmapkkvxuo
tauimesuijmivglyefqlgclxvyjsxfgsfadrhvnrhzacfncmssx
awlzrjilipsbuuenbbdtievlmkpycivegidatnlccffyajnbmqw

Finally, random text can also be an actual excerpt from real text, where a different excerpt is used in each message: [9]

Mother called me home that night with a shout
that told me there was trouble. "Mom," I yelled,
pounding up the back steps. "What's wrong, Mom?"

When you parse a message to detect spam, your goal is to find a way to skip such random text and to run signatures only on the portions of the message that do not change. Portions of a message that should be checked include the following:

  • Common images
  • Web references
  • Email addresses
  • Phone numbers
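A minimal sketch of this idea follows: pull out the parts of a message that a spammer cannot easily vary, and sign only those. The regular expressions are deliberately simple and are assumptions made for illustration (the phone pattern, for instance, covers only one common ten-digit format), as are the names `stable_features` and `feature_signatures`. Images are omitted; they could be handled by signing the image data in the same way.

```python
import hashlib
import re

# Illustrative patterns only; real messages need more thorough parsing.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def stable_features(text):
    """Collect web references, email addresses, and phone numbers from text."""
    features = set()
    for url in URL_RE.findall(text):
        # Crude normalization: drop trailing punctuation and ignore case.
        features.add("url:" + url.rstrip(".,;:!?)").lower())
    for email in EMAIL_RE.findall(text):
        features.add("email:" + email.lower())
    for phone in PHONE_RE.findall(text):
        features.add("phone:" + re.sub(r"\D", "", phone))
    return features

def feature_signatures(text):
    """Sign each stable feature; the random filler never reaches the hash."""
    return {hashlib.sha1(f.encode("utf-8")).hexdigest() for f in stable_features(text)}

msg_a = ("dissonant deanna heron aphasia\n"
         "Visit http://herbs.example.com/order or call 800-555-0199\n"
         "albany shank eye asphyxiate")
msg_b = ("eyhydxre yaceyaxv gesmveu vmlpv\n"
         "Visit http://herbs.example.com/order or call 800-555-0199\n"
         "ndowgkx cjxrofun yepjhxp")

print(sorted(stable_features(msg_a)))
print(feature_signatures(msg_a) == feature_signatures(msg_b))
```

The two sample messages carry different filler but the same web reference and phone number, so they produce the same set of signatures.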