
This chapter is from the book

2.8 Grokking the Site

Spam messages generally contain a method for the recipient to contact the spammer (or the business on whose behalf the spam is being sent). Some offer an email address and others a phone number, but most try to get the recipient to go to a website. To do this, spammers include a clickable web reference (URL) in the body of the spam email. But because spammers seek to keep their identities a secret, they generally try to disguise all web references. In this section we show some of the tricks they use.

First note that a web reference is usually specified as part of an HTML <a command:

<a href="http://www.example.com"> visible text </a>

The <a (followed by a space) begins the command. It is followed by one or more keywords particular to that command, and the list of keywords is terminated by the > character. The href= keyword indicates a web reference (URL). Following the > is the actual text that will appear on the screen and that must be clicked to invoke the web reference. The </a> ends the command.

The following subsections examine the pieces of this web reference one item at a time.

  • Section 2.8.1 examines the leading <a part.
  • Section 2.8.2 explains case insensitivity.
  • Section 2.8.3 examines the http: part.
  • Section 2.8.4 shows how email addresses can mask the www.example.com part.
  • Section 2.8.5 shows that IP numbers and hexadecimal representations of IP numbers are used by spammers to disguise host names.
  • Section 2.8.6 shows how the www.example.com part can be hidden with redirects.
  • Section 2.8.7 shows how the www.example.com part can be ridiculously stretched out to say aa.bb.cc.dd.ee.ff.gg.hh.ii.jj.kk.example.com.
  • Section 2.8.8 shows how the www.example.com part can be hidden behind CNAME records.
  • Section 2.8.9 shows that URLs can also be used as comments.
  • Section 2.8.10 shows how URLs can be hidden with JavaScript.Encode.

One common method of using URLs to fight spam is to record the host names from those URLs in a database. Each time a new piece of email arrives, its URLs are found and their host names are looked up in the database; if a host name is found there, the message is treated as spam. In addition to host names, a well-designed antispam database also records IP numbers.

To illustrate, consider a database that contains the host name spam.example.com and that host's IP number (192.168.33.44). When new mail arrives, the host name in the arriving mail is looked up in the database. If that new host also has the address 192.168.33.44, it too is rejected even though the new and old host names may differ.
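A minimal sketch of that lookup follows. It assumes the host names have already been extracted from the message and resolved to addresses; the class `SpamURLDatabase` and its methods are hypothetical, and two in-memory sets stand in for a real database.

```python
class SpamURLDatabase:
    """Hypothetical store of host names and IP numbers seen in spam URLs."""

    def __init__(self) -> None:
        self.hosts: set[str] = set()
        self.addresses: set[str] = set()

    def add(self, host: str, address: str) -> None:
        # Host names are case insensitive, so keep them in lowercase.
        self.hosts.add(host.lower())
        self.addresses.add(address)

    def is_spam(self, host: str, address: str) -> bool:
        # A match on either the name or the address flags the message.
        return host.lower() in self.hosts or address in self.addresses


db = SpamURLDatabase()
db.add("spam.example.com", "192.168.33.44")
print(db.is_spam("other.example.net", "192.168.33.44"))  # True: same address
print(db.is_spam("clean.example.org", "10.0.0.1"))       # False
```

Checking the address as well as the name is what catches a spammer who registers a fresh host name but keeps the same server.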

2.8.1 The HTML Keyword

The <a command is the most common carrier of a web reference, but other HTML commands can also contain web references. The <a command indicates a web reference by using an href= expression. For example:


   <a href="http://www.example.com">

But other HTML commands use different expressions to indicate the web reference. Table 2.1 lists the HTML commands that allow URLs and the expression used by each.

Table 2.1. HTML Commands That Reference URLs

Command    Expression   Description
<a         href=        Create a hyperlink (href=) or identifier (name=) in a document.
<applet    codebase=    Define an executable applet within a document.
<area      href=        Define a mouse-clickable area within a map.
<base      href=        The base for all URLs in the document.
<body      background=  Set the background to a URL.
<del       cite=        Citation reference for deleted information.
<embed     src=         Embed an object in a document.
<form      action=      URL to use on submission of a form.
<frame     src=         Define a frame within a frameset.
<iframe    src=         Embed a frame inside a document.
<img       dynsrc=      Specify a video clip to play.
<img       lowsrc=      Specify a low-resolution image to preload.
<img       src=         Specify an image to load.
<img       usemap=      Coordinates list for a map.
<ins       cite=        Citation for inserted commentary.
<input     src=         Image to select for an input choice.
<isindex   action=      Create a searchable document.
<link      href=        Create an interdocument link.
<link      src=         Specify an external style sheet to use.
<meta      url=         Reference for an HTTP refresh.
<object    classid=     Identify the class of an object.
<object    codebase=    Source of the code base for the object.
<object    data=        Source of data for the object.
<object    name=        Source for the name of the object.
<object    usemap=      Specify the image map to use with the object.
<q         cite=        Citation for the enclosed quotation.
<script    src=         Source for external language code to run.
<table     background=  Source of background image to load.
<td        background=  Source of background image to load.
<th        background=  Source of background image to load.
<tr        background=  Source of background image to load.
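Table 2.1 translates directly into a lookup set for a scanner. The sketch below uses Python's standard `html.parser` module, which lowercases tag and attribute names and tolerates unquoted attribute values. The class name `URLCollector` is hypothetical, and the <meta url= entry is left out because in a refresh the URL sits inside the value of a content= keyword rather than standing alone.

```python
from html.parser import HTMLParser

# (command, keyword) pairs from Table 2.1, minus <meta url= (see above).
URL_ATTRIBUTES = {
    ("a", "href"), ("applet", "codebase"), ("area", "href"), ("base", "href"),
    ("body", "background"), ("del", "cite"), ("embed", "src"),
    ("form", "action"), ("frame", "src"), ("iframe", "src"),
    ("img", "dynsrc"), ("img", "lowsrc"), ("img", "src"), ("img", "usemap"),
    ("ins", "cite"), ("input", "src"), ("isindex", "action"),
    ("link", "href"), ("link", "src"), ("object", "classid"),
    ("object", "codebase"), ("object", "data"), ("object", "name"),
    ("object", "usemap"), ("q", "cite"), ("script", "src"),
    ("table", "background"), ("td", "background"), ("th", "background"),
    ("tr", "background"),
}


class URLCollector(HTMLParser):
    """Hypothetical collector of every keyword value that may hold a URL."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        for name, value in attrs:
            if (tag, name) in URL_ATTRIBUTES and value:
                self.urls.append(value)


collector = URLCollector()
collector.feed('<A HREF="http://www.example.com">x</A>'
               '<img src="http://img.example.com/a.gif">')
collector.close()
print(collector.urls)  # ['http://www.example.com', 'http://img.example.com/a.gif']
```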

2.8.2 Just in Case

Before digging deeply into URLs, we first need to comment on one of the (sometimes overlooked) characteristics of HTML in general.

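First, note that all HTML commands and their keywords are case insensitive, and so are the protocol and host parts of the URLs they reference (the path that follows the host name may not be). That is, all the following are the same: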

<a href="www.example.com">
<A HREF="WWW.EXAMPLE.COM">
<a HrEf="wWw.ExAmPlE.CoM">

Note, too, that host and domain names are also case insensitive.
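Because host names are case insensitive, a scanner should compare them in a single case before looking them up. A minimal sketch, assuming the URL carries an explicit protocol (the function name `normalize_case` is hypothetical), lowercases only the protocol and host and leaves the path alone, since the path may be case sensitive on some servers:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the protocol and host of a URL; leave the path alone."""
    parts = urlsplit(url)
    # urlsplit already lowercases the scheme; netloc keeps its original case,
    # so lowercase it here (this also lowercases any user@ part).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))


print(normalize_case("HTTP://WWW.EXAMPLE.COM/Images/Bob.JPG"))
# http://www.example.com/Images/Bob.JPG
```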

2.8.3 The Protocol Specification

A protocol can be thought of as a language used by programs to communicate over a network connection. [10] Usually a request is sent from the client software to the server software, and an answer (or reply or data) is returned. The most common protocols are http (Hypertext Transfer Protocol), https (HTTP over Secure Sockets Layer, or SSL), ftp (File Transfer Protocol), and file (for viewing local files).

In a URL, the expression that identifies the protocol prefixes the host or domain:

<a href="http://www.example.com" ... >

That prefix expression (here, the http://) is followed by a host or domain specification (or an IP number) and whatever additional information is needed:

<a href="http://www.example.com/cgi-bin/search?who=bob">
<a href="http://192.168.44.55">

Note that the protocol can be excluded, in which case it is automatically set to the document's default. The protocol, host or domain, and other information are normally enclosed in quotation marks to protect HTML-capable mail programs from stumbling over illegal characters. The quotation marks may be double or single, but whichever is used, they must pair up (a double quotation mark may not be mixed with a single quotation mark):


href="http://www.example.com"
href='http://www.example.com'
href="http://www.example.com'     Won't work.

Note, however, that quotation marks can often be safely omitted, so you should not count on their presence when parsing spam email.

If a spam HTML message is in a language other than English, the quotes may be present but specified using the other language's encoding:

href=EF2Dhttp://www.example.comEF2D

Here, the EF2D is hexadecimal that represents two binary byte values (not four characters) that specify quotation marks appropriate to the language. So again it is better not to depend on quotation marks when parsing URLs.

Although the protocol, when present, is usually specified with a trailing ://, all that is actually required is the colon. [11] Thus, all three of the following produce the same URL result:

http://www.example.com
http:www.example.com
http:////////////www.example.com

Notice that the number of forward slashes is unimportant. The single required character is the colon.

Also note that there can be no space between the protocol and its colon, but the colon can be followed by arbitrary white space characters.


http :www.example.com    Space before colon won't work.
http: www.example.com    Space after the colon is OK.
http:                    A new line is OK.
www.example.com
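These lenient rules can be captured with a regular expression. This sketch assumes only the four protocols named earlier (http, https, ftp, file) and, like the text, treats the colon as the only required character; the names `PROTOCOL` and `strip_protocol` are hypothetical.

```python
import re

# A protocol name, its colon (no space allowed before it), optional white
# space (including newlines) after the colon, then any number of slashes.
PROTOCOL = re.compile(r"(https?|ftp|file):\s*/*", re.IGNORECASE)


def strip_protocol(url: str) -> str:
    """Remove a leniently written protocol prefix, if there is one."""
    match = PROTOCOL.match(url)
    return url[match.end():] if match else url


for href in ["http://www.example.com", "http:www.example.com",
             "http:////////////www.example.com", "http: www.example.com",
             "http:\nwww.example.com", "http :www.example.com"]:
    print(repr(strip_protocol(href)))
```

The first five inputs all reduce to 'www.example.com'; the last is returned unchanged because the space before the colon means it is not a protocol prefix.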

Note also that the protocol does not need to actually be present with each URL. A <base command (if present) sets a prefix that will precede all URLs that do not specify a protocol. The prefix is always terminated by a forward slash (/) even if one is omitted from the <base command.

<base href="http://www.example.com">
<img src="images/bob.jpg">

These two commands are the equivalent of the following (single) command:

<img src="http://www.example.com/images/bob.jpg">

If a <base is omitted and if the URL omits a protocol, the default is generally the http:// protocol:


<a href="www.example.com">     The protocol defaults to http://.
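A sketch of how a scanner might rebuild full URLs under these rules, using the standard `urllib.parse.urljoin` for the <base case and assuming http:// as the default otherwise (the function name `resolve_href` is hypothetical):

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_href(href: str, base: Optional[str] = None) -> str:
    """Produce a full URL from an href, following the rules described above."""
    if urlsplit(href).scheme:
        return href                    # The href already names its protocol.
    if base:
        return urljoin(base, href)     # A <base href=...> supplies the prefix.
    return "http://" + href            # Assumed default when there is no <base.


print(resolve_href("images/bob.jpg", "http://www.example.com"))
# http://www.example.com/images/bob.jpg
print(resolve_href("www.example.com"))
# http://www.example.com
```

Note that `urljoin` follows the standard URL resolution rules, so a base with a path but no trailing slash (such as http://www.example.com/dir) has its last path segment replaced; for a host-only base like the one in the example, the result matches the text.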

2.8.4 Email Addresses Mask URLs

Some HTML-compliant mail readers interpret an email address that is part of an http:// reference as a host or domain specification: URL syntax allows user information (a name and an optional password) to precede the host, separated from it by an @, and that user part is ignored when the host is located. [12] For example, the email address in the first line is interpreted (in the second line) as if the user part and the @ were omitted:

<a href="http://bob@www.example.com">
<a href="http://www.example.com">

The lesson is that whenever you are parsing a host or domain specification, you will need to start parsing over again when you encounter an @ character.
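A minimal sketch of that restart rule, assuming the authority part (the text between the protocol and the first slash) has already been isolated; the function name `host_from_authority` is hypothetical:

```python
def host_from_authority(authority: str) -> str:
    """Return the host from 'user:password@host:port', restarting after '@'."""
    # Everything up to the last '@' is user information, not the host.
    host = authority.rsplit("@", 1)[-1]
    # Drop an optional ':port' suffix (IPv6 literals are not handled here).
    return host.split(":", 1)[0]


print(host_from_authority("bob@www.example.com"))   # www.example.com
print(host_from_authority("www.example.com:8080"))  # www.example.com
```

For well-formed URLs, the standard library's `urllib.parse.urlsplit(url).hostname` applies the same last-@ rule (and lowercases the result); a hand-rolled version like this one is useful once a leniently written protocol prefix has already been removed.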

2.8.5 IP Numbers Too

Spammers are aware that the domain part of the URL does not need to be expressed in host.domain form (as www.example.com), and they use that fact to help disguise the host's name. For example, the following replaces the host.domain form with the IP number of the host www.example.com:

<a href="http://192.168.22.33">

Here 192.168.22.33 is the IP number for www.example.com. So be aware, when parsing URLs, that the host.domain part can be expressed as an IP number, too.

Also note that an IP number can be written as a single decimal number, or as a single hexadecimal number when prefixed with a literal 0x, making it even harder to recognize:


<a href="http://0xC0A81621">      IP number in hexadecimal
<a href="http://3232241185">      IP number in decimal
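To compare such hosts against a database of known addresses, a scanner can convert every form to ordinary dotted notation first. This sketch uses the standard `ipaddress` module and handles only the three forms shown here (dotted, single decimal, 0x hexadecimal); other forms some resolvers accept, such as octal components, are not handled. The function name `to_dotted_quad` is hypothetical.

```python
import ipaddress
from typing import Optional


def to_dotted_quad(host: str) -> Optional[str]:
    """Convert dotted, single-decimal, or 0x-hex IP numbers to dotted form.

    Returns None when the host is not an IP number (e.g. www.example.com).
    """
    try:
        if host.lower().startswith("0x"):
            value = int(host, 16)      # int() accepts the 0x prefix in base 16
        elif host.isdigit():
            value = int(host)          # one decimal number for all 32 bits
        else:
            return str(ipaddress.IPv4Address(host))  # ordinary dotted form
        return str(ipaddress.IPv4Address(value))
    except ValueError:                 # includes ipaddress.AddressValueError
        return None


print(to_dotted_quad("0xC0A81621"))       # 192.168.22.33
print(to_dotted_quad("3232241185"))       # 192.168.22.33
print(to_dotted_quad("www.example.com"))  # None
```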

Disguising a host's name behind a cryptic-looking hexadecimal address also lets spam email double as a vehicle for fraud. Consider, for example, the following web reference and surrounding text:

Your ATM card PIN number has expired. For security
reasons, connect to our
<a href="https://www.ABCDE-Bank.com:Secure@0xC0A8162">
secure server</a> and select
a new PIN as soon as possible.

Users who click on this URL are taken to the spammer's fraud site at 0xC0A8162, and not to the bank's site as they would expect, because everything before the @ (www.ABCDE-Bank.com and Secure) is treated as a user name and password. Even more dangerous, users see the following literal link address in the browser's link window, thereby being further fooled into thinking that the link is legitimate:

https://www.ABCDE-Bank.com:Secure@0xC0A8162

This example shows why it is crucial to start over when you encounter an @ when parsing a URL and why you need to allow for host.domain, IP address, and hexadecimal forms of addresses when parsing the URL.

2.8.6 Dealing with Redirects

A redirecting site is a host that takes a web reference that points to itself, strips away the self-referencing part, and then issues an HTTP redirect command back to the user's browser with the remaining part of the reference. The effect for the user is to view the redirected-to site and not the redirecting host's site.

http://redirect.example.com/*http://www.real.host     The full URL
http://redirect.example.com/*                         The self-referencing part
http://www.real.host                                  The redirect target

Here, redirect.example.com is a redirecting site. When a browser visits it with the full URL shown on the first line, it recognizes the self-referencing part (shown alone on the second line), strips that self-reference, and returns an HTTP redirect to the actual host (shown on the third line).

The presence of redirect sites increases the complexity of parsing an actual site from a URL in spam email. Currently it is sufficient to simply start parsing over again when you encounter an http: or an https: while parsing a URL. As spammers gain skill and experience, however, such a simple solution will become less effective.

There are many redirect sites on the Internet. One that spammers know well is rd.yahoo.com. [13] They share a common structure: the URL for the redirecting host is listed first, followed by a forward slash or backslash, then a special character, and then the URL for the actual host.

<a href="http://rd.yahoo.com/xxxx/*http://real.site">

For the rd.yahoo.com family of servers, the special character is an asterisk. For others the special character varies. The one common characteristic among all redirect servers is that the special character is not one that would normally appear as part of a URL. [14]

The redirect site can be followed by an arbitrary amount of URL information. For example:

href="http://rd.yahoo.com/bypass/winkie/food/thumb*big/dairy/
noisy/gyroscope/middle/fred/*http://real.site"

Recall that the special character must immediately follow a forward slash (or backslash). So here the * in thumb*big is not special; that is, it does not begin a URL for the real site.

Note that backslashes are the equivalent of forward slashes in the redirect portion of the URL because they are essentially ignored by the redirect server:

href="http://rd.yahoo.com\bypass/winkie/food\thumb*big/dairy/noisy/gyroscope/\middle/fred/http://www.chunky.example.com/acidrain/moose/jane/wheel\*http://real.site"

Here, the actual site (always last) is real.site. But beware: real.site is the spammer's site and subject to the spammer's rules. It is not unreasonable to expect spammer sites to employ new methods that will make the real.site appear perhaps second from the last or even third from the last:

href="http://rd.yahoo.com\bypass/winkie/food\thumb*big/dairy/noisy/gyroscope/\middle/fred/http://www.chunky.example.com/acidrain/moose/jane/wheel\*http://real.site?search=http://some.bogus.site&reference=http://another.bogus.site"

Just as it is important to begin parsing over again when you find an http: or https:, it is also necessary to stop parsing when a URL argument (the ?) appears. Clearly, spammers will go to great lengths to disguise the actual URL, and the examples we have shown here are only a hint of the techniques you will see in the future.
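The two rules just described (start over at each http: or https:, stop at the first ?) can be sketched as follows. As the text warns, spammers may change where the real site appears, so treat "the last reference wins" as a current heuristic rather than a guarantee; the names `PROTOCOL_START` and `final_redirect_target` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Every place where a new http: or https: reference begins.
PROTOCOL_START = re.compile(r"https?:", re.IGNORECASE)


def final_redirect_target(url: str) -> str:
    """Return the last http:/https: reference before any '?' argument."""
    # Stop parsing at the first URL argument.
    url = url.split("?", 1)[0]
    # Start over at each protocol; the last one found wins.
    starts = [m.start() for m in PROTOCOL_START.finditer(url)]
    return url[starts[-1]:] if starts else url


href = (r"http://rd.yahoo.com\bypass/winkie/food\thumb*big/dairy/noisy/"
        r"gyroscope/\middle/fred/http://www.chunky.example.com/acidrain/"
        r"moose/jane/wheel\*http://real.site"
        r"?search=http://some.bogus.site&reference=http://another.bogus.site")
target = final_redirect_target(href)
print(target, urlsplit(target).hostname)  # http://real.site real.site
```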

2.8.7 Wildcard DNS Records

The host name part of a URL can be padded with the same kind of random words that spammers use in the message body, making the real host difficult to detect. To illustrate, consider these two URLs:

href="http://bob.biff.bonny.bill.betty.boop.example.com"
href="http://andy.able.alex.annie.alice.boop.example.com"

If you look at only one of these, there is no way to know for certain where the randomizing ends. But looking closely at two, one might surmise that the host name is boop.example.com. But one might be wrong. The real host might actually be example.com. To find out which is right, we need to delve briefly into DNS records.

When an HTML-capable mail program needs to connect to a URL's site, it first must look up the address of that site. For both of the sample randomized host names, it would (for example) find the address 192.168.111.44. But look at what happens when we look up the two names we suspect to be the real host names for these URLs:

boop.example.com    →   192.168.1.23
example.com         →   192.168.111.44

Clearly, our presumption—that the boop.example.com host was the real host—was wrong. Instead, the real host is example.com because it has the address 192.168.111.44.

But a savvy spammer might anticipate this logic and use a host name that appears random but is actually the real host name.


boop.example.com                 →   some innocent's address
example.com                      →   some innocent's address
able.alex.andy.boop.example.com  →   192.168.111.44

Here, if you decide to use the address that you suppose is the real host, you might cause some innocent site's address to be interpreted as that of a spamming site. The correct way to record the spamming address for later use is to look up and record the full domain name in the reference, even if it appears to be random.

2.8.8 CNAME Records and URLs

When an HTML-capable mail program attempts to look up a URL's host or domain, it may receive another host name in return rather than the expected address. The DNS record that supplies that new host name is called a CNAME (canonical name) record. Here's how it works:

  • You find the URL http://example.com.
  • You look up the host name example.com.
  • You expect an address, such as 192.168.33.44, but instead you get the host name www.example.net.

When you look up a URL's host name and expect an address but instead get another host name in return, you need to do an additional lookup to find the actual address. [15]

    example.com  →   www.example.net
www.example.net  →   192.168.33.44

But CNAMEs can lead to other CNAMEs, thus creating a long thread of potential lookups, and CNAMEs can even form infinite loops:


    example.com → www.example.net
www.example.net → www.example.com
www.example.com → example.com      An infinite loop!

When you combine the risk of CNAMEs with the need to decipher long host names, the thread of lookups might get ugly indeed:


able.alex.andy.boop.example.com → www.example.net
                www.example.net → boop.example.com
               boop.example.com → 192.168.111.44
     alex.andy.boop.example.com → 192.168.22.4
          andy.boop.example.com → 10.9.4.2
               boop.example.com → 192.168.111.44
                    example.com → example.net
                    example.net → example.com
                    example.com → example.net
                    example.net → example.com
                         etc. in an infinite loop

When you write code to decipher a long host name, be sure to account for the possibility of infinite loops.
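A sketch of loop-safe CNAME following. A dictionary stands in for DNS here (each name maps to either an address or another name), and the hop limit of 16 is an arbitrary assumption; system resolvers usually follow CNAME chains themselves, so this applies to code that performs the lookups one step at a time. The names `is_address` and `follow_cnames` are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def is_address(value: str) -> bool:
    """True if value is an IP address rather than a host name."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def follow_cnames(name: str, records: Dict[str, str],
                  max_hops: int = 16) -> Optional[str]:
    """Follow name-to-name (CNAME) records until an address is reached.

    Returns None on a loop, a missing record, or a chain longer than max_hops.
    """
    seen = set()
    current = name.lower()
    for _ in range(max_hops):
        if current in seen:
            return None                  # We have been here before: a loop.
        seen.add(current)
        target = records.get(current)
        if target is None:
            return None                  # No record for this name.
        if is_address(target):
            return target                # Reached an actual address.
        current = target.lower()         # A CNAME: look up the new name.
    return None                          # Chain too long; give up.


records = {"example.com": "www.example.net", "www.example.net": "192.168.33.44"}
print(follow_cnames("example.com", records))  # 192.168.33.44

looping = {"example.com": "www.example.net",
           "www.example.net": "www.example.com",
           "www.example.com": "example.com"}
print(follow_cnames("example.com", looping))  # None
```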

2.8.9 URLs Used as Comments

Just as non-HTML words can be used to create comments (see section 2.5), so can URLs. For example, consider the following:

V<a href="bob.example.com"></a>i<a href="jane.example.com"></a>a<a href="alice.example.com"></a>g<a href="dan.example.com"></a>ra

When viewed on an HTML-aware mail reader, the preceding would appear like this:

Viagra

The use of URLs as comments is intended to make it difficult to find the actual URL in the message. When you look for the URL, it is not enough to simply look for the </a> abutting the reference, because arbitrary nonprinting HTML can also appear between the two:

V<a href="bob.example.com"><font size="+5"></font></a>i<a href="jane.example.com"><font color="white"></font></a>

Finding the URL when this technique is used requires your spam scanner to act almost as an actual HTML parser.
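A minimal sketch of that parser-based approach, using Python's standard `html.parser`. The class name `AnchorScanner` is hypothetical. It records every <a href= value, the text found outside tags (only roughly what a reader sees, since it ignores styling such as white text), and which anchors enclose no visible text, a strong hint that the URL is serving as a comment. Nonprinting tags such as <font inside an anchor are simply skipped.

```python
from html.parser import HTMLParser


class AnchorScanner(HTMLParser):
    """Hypothetical scanner: collects hrefs, text, and empty anchors."""

    def __init__(self) -> None:
        super().__init__()
        self.visible: list[str] = []        # text outside tags
        self.hrefs: list[str] = []          # every <a href=...> value
        self.empty_anchors: list[str] = []  # hrefs whose anchor text is blank
        self._href = None                   # href of the currently open <a
        self._anchor_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or None
            self._anchor_text = []
            if self._href:
                self.hrefs.append(self._href)

    def handle_data(self, data):
        self.visible.append(data)
        if self._href is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if not "".join(self._anchor_text).strip():
                self.empty_anchors.append(self._href)
            self._href = None


scanner = AnchorScanner()
scanner.feed('V<a href="bob.example.com"></a>i<a href="jane.example.com"></a>'
             'a<a href="alice.example.com"></a>g<a href="dan.example.com"></a>ra')
scanner.close()
print("".join(scanner.visible))  # Viagra
print(scanner.empty_anchors)     # all four hrefs: none encloses any text
```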

2.8.10 JavaScript.Encode URLs

Last here—but certainly not the last word in hiding the URL—is the technique of encoding a URL using JavaScript.Encode. The spammer's idea in this strategy is to wrap the URL in JavaScript so that it will be decoded by the browser. For example, consider the following obscured URL (wrapped to fit the page):

<script language="JScript.Encode">#@~^hQAAAA=3D=3D~@#@&[Km!:+ YcADbYn'E@> !(o"bHA~?"Z'r4OYa)Jz+!+ O, FF+R8*f&^kxV 4YhVr~qq9:C{!P_2&!C:'TPwI)\AAr"92"> '!,j/I}SdqHMxE WE@*@!&qwI)\A@*BbI@#@&AyIAAA==^#~@</script>

Here, the language="JScript.Encode" keyword of the <script command tells the browser to decode what follows using the JavaScript.Encode protocol. That protocol then decodes everything between the leading <script and the ending </script>. When decoded, the script becomes the following:

document.write('<IFRAME src="https://192.168.23.45/link.html" WIDTH=0 HEIGHT=0 FRAMEBORDER=0 SCROLLING=0>')

After decoding, it is clear that the encoded script contains a web reference that the spammer wants to keep secret, and we can use it to record the URL of the spamming site. Here, that site is represented by the IP address 192.168.23.45, but in other URLs encoded with JavaScript.Encode the reference may be a host name or may be further obscured by other means.

See http://www.virtualconspiracy.com/ for C language source examples of ways to decode JavaScript.Encode.
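Decoding JavaScript.Encode itself is not shown here. Once a script has been decoded, though, the URL can be pulled out of the resulting text. This sketch assumes the decoded text is available as a string and uses a simple regular expression for src= and href= keywords; the names `URL_ATTR` and `urls_in_decoded_script` are hypothetical, and a regular expression is a rough substitute for real parsing.

```python
import re
from urllib.parse import urlsplit

# src= or href= followed by an optionally quoted value.
URL_ATTR = re.compile(r"""\b(?:src|href)\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def urls_in_decoded_script(decoded: str) -> list[str]:
    """Pull URL-valued src=/href= keywords out of already-decoded script text."""
    return URL_ATTR.findall(decoded)


decoded = ("document.write('<IFRAME src=\"https://192.168.23.45/link.html\" "
           "WIDTH=0 HEIGHT=0 FRAMEBORDER=0 SCROLLING=0>')")
for url in urls_in_decoded_script(decoded):
    print(url, urlsplit(url).hostname)  # https://192.168.23.45/link.html 192.168.23.45
```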
