Computer Forensics: Tracking an Offender
Learn to collect and analyze evidence found in a compromised computer system. The goal of computer forensics is to conduct the investigation in a manner that will hold up to legal scrutiny.
In this sample chapter from Computer Forensics: Incident Response Essentials, Kruse and Heiser explain how to track an offender across the digital matrix.
In this age of pervasive connectivity, it is unrealistic to expect cyber crime incidents to be isolated to a single system. Like characters in a William Gibson novel, cyber sleuths often have to track offenders across the digital matrix. While the techniques of network forensics are still largely undeveloped, it would be a disservice to devote an entire book to computer forensics without any discussion of Internet methods that you can use to find leads to suspect computers.
When tracking cyber offenders across the Internet, you use many of the same software tools that system and network administrators use to monitor and test network connectivity. Many of these programs are included in modern operating systems, and you may already be familiar with them. Even if you are already comfortable with the tools we discuss in this chapter, you may not have considered their use during an investigation. Unfortunately, many of our most common Internet application protocols make no provisions for strongly authenticating the transmitter of a communication. Services like email and Usenet are based on simple text-based initiation protocols and basically use the honor system. This complicates investigations because you cannot necessarily trust the identification information contained within Internet messages. The better you understand the underlying protocols and processes, the better you can evaluate the validity of the names and Internet addresses associated with Internet communications.
This book is intended to be an introduction to computer investigations, not to TCP/IP. If you want to be an effective network tracker, you need a thorough understanding of the Internet protocol suite. Many books are available on this subject. W. Richard Stevens' three-volume set, TCP/IP Illustrated, published by Addison-Wesley (1993, 1995, 1996), is considered one of the definitive references. The more comprehensive and detailed your understanding of Internet technology, the greater your skill at investigating network-enabled crime.
The Internet and many private networks run a set of protocols commonly referred to as TCP/IP, which stands for Transmission Control Protocol/Internet Protocol. The label "TCP/IP" is a convenient abbreviation for a set of related network protocols, the development of which effectively started in the late 1960s and is ongoing today. More precisely referred to as "the Internet protocol suite," it is a set of communication conventions that a device must implement in order to participate on the Internet. TCP/IP is not specific to any operating system, programming language, or network hardware. It is an equal opportunity set of standards that enables Macs, Windows, Unix, routers, switches, and a variety of mainframe environments to communicate with each other. It is not specific to network topology, meaning that Ethernet, token ring, and wireless networks can also interoperate. This universal interoperability is a prerequisite to both modern computer crime and investigations.
Plenty of books and essays exhaustively discuss the Open Systems Interconnection (OSI) seven-layer Network Reference model, so we won't spend a great deal of time on it. The model is illustrated in Figure 2-1. The original seven-layer model was conceived as an abstraction that didn't apply to any currently existing technology, especially not the burgeoning suite of Internet protocols, and the exact labeling of Internet services and protocols within this model continues to be a matter of tremendous debate (especially the session and presentation layers). But it is a debate of no consequence because, after all, the Internet still functions whatever abstract labels are assigned to its protocols. The important lesson to learn from this model is that certain infrastructural services provide the foundation for the actual file sharing and distributed applications that are the reason the network exists in the first place. These services are stacked on top of each other like Lego building blocks. The model's relevance to forensic investigations is that you cannot interpret evidence without understanding its place within the hierarchy of stacked services. Let's look at a concrete example to see how this layering works.
Figure 2-1 OSI seven-layer model
You might not have realized that when you send and receive email, you are dealing with three different addresses, each within a different network layer. Every network interface has a unique hardware address burned into it at the factory. This address is called the MAC (media access control) address. (We discuss an unusual use Microsoft makes of this address in Chapter 8.) This address enables all of the devices on a LAN segment (those devices that can see each other's network traffic) to refer to each other. At the data link layer, devices recognize traffic intended for themselves on the basis of the MAC addresses incorporated within the chunks of data on the network, which are called frames. It is entirely impractical for every device on the Internet to refer to devices outside of their LAN segment by this hardware address, so when a computer joins the Internet, it has a numeric IP address assigned to it. An IP address is usually written as a series of four numbers in the range 0 to 255, separated by dots, such as 192.168.0.55.
Certain IP addresses, or ranges of addresses, are reserved for special purposes. For example, IP addresses that end with 0 typically denote a network address, such as 192.168.0.0. An IP address that ends with 255 typically denotes a broadcast address, such as 192.168.0.255. "Private addresses" in the 192.168.0.0 to 192.168.255.255 range may be used on internal networks, as may addresses in the 10.0.0.0 to 10.255.255.255 and 172.16.0.0 to 172.31.255.255 ranges. These addresses "are intended for intra-enterprise communications, without any intention to ever directly connect to other enterprises or the Internet itself." 1 When tracking offenders, if you locate an address within one of these ranges, don't pack your bags for California (the location of the Internet Assigned Numbers Authority 2); you have to determine the suspects' external IP address to locate them.
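As a minimal sketch of the check described above, Python's standard ipaddress module can report whether an address is reserved for private use and therefore cannot, by itself, locate a suspect on the Internet. The helper name needs_external_address is our own invention for illustration. Note that the module's is_private test is broader than the 192.168 range: it also covers other reserved blocks, such as 10.0.0.0/8, 172.16.0.0/12, and loopback addresses.

```python
import ipaddress


def needs_external_address(address: str) -> bool:
    """Return True when the address is reserved for private use.

    A private address says nothing about where a host sits on the
    Internet, so the investigator must look for the external address.
    (Hypothetical helper name, used here only for illustration.)
    """
    return ipaddress.ip_address(address).is_private


print(needs_external_address("192.168.0.55"))    # True
print(needs_external_address("135.243.17.191"))  # False
```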
An Internet address actually contains two parts. The network portion is unique among all the networks interconnected to the LAN segment (which often means the entire Internet), and the host section is unique among all the devices using the same network portion. The effect is that all IP addresses on the Internet are both unique and identifiable as being within a specific network. Private networks use addressing that is unique within their networks, but any two private networks can use the same "address space" as long as they are not interconnected to each other.
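To make the two parts concrete, the following sketch splits an address into its network and host portions. It assumes a 24-bit network prefix purely for illustration (the text does not specify one), and split_address is a hypothetical helper name.

```python
import ipaddress


def split_address(address: str, prefix_length: int):
    """Return (network, host_number) for an address and an assumed prefix length."""
    interface = ipaddress.ip_interface(f"{address}/{prefix_length}")
    network = interface.network
    # The host portion is whatever remains after masking off the network bits.
    host_number = int(interface.ip) & int(network.hostmask)
    return network, host_number


network, host = split_address("192.168.0.55", 24)  # /24 is an assumption
print(network)                    # 192.168.0.0/24
print(host)                       # 55
print(network.network_address)    # 192.168.0.0
print(network.broadcast_address)  # 192.168.0.255
```

Two addresses belong to the same network only when their network portions match, which is the property routers rely on.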
The uniqueness of addresses and the distinction between network and host portions of the address make it practical for routers to know where to forward traffic. Entire books have been written about routing. For our simplified purposes, routers are devices that automatically forward your data packets to another network when the destination is not your network. Routers base their decision on where to forward your packet on current conditions and their programmed instructions. They do whatever is most expedient, which means that the route between any two points can change. This is completely different from the Public Switched Telephone Network (PSTN). When you make a telephone call, the switches within the PSTN sequentially establish a circuit from end to end, and it is maintained throughout the duration of the call. On the Internet, it may often seem as if you are using a circuit, but the actual path taken by each individual packet is dependent upon the whims of the intermediate routers.
The network part of an Internet address is assigned by the Internet Assigned Numbers Authority (IANA) to each network owner, and the host part is assigned to individual hosts and devices by the network owner. The network may be run by an organization (business or government agency), or it may be run by an Internet service provider (ISP) to provide Internet access to its customers. In the latter case, the IP addresses may be used by individuals or multiple organizations. Because IP addresses are used for routing, when a device is moved to a new network, it often requires a new address.
IP addresses can be statically or dynamically assigned. Computers that are assigned a static IP address always use the same IP address until it is manually changed to a new address, which is becoming increasingly less convenient in a time of constant reorganizations and mobile computers. Dynamic addresses are automatically assigned to a computer when it registers itself on a network using a protocol called Dynamic Host Configuration Protocol (DHCP). (Windows Internet Naming Service (WINS), a Microsoft protocol that is rapidly becoming obsolete, is often deployed alongside DHCP, but it resolves NetBIOS names to addresses rather than assigning them.) For network administrators, DHCP neatly solves the tedium and confusion of manually assigning addresses to constantly moving Internet devices. Virtually all ISPs use DHCP to assign addresses to their dial-up customers, and many permanently connected home users have dynamically assigned addresses that can change whenever their cable modems are powered off and on. Use of DHCP is definitely on the increase, but unfortunately, DHCP makes detective work a little more difficult.
Reading Obfuscated IP Addresses
Those who send spam, unsolicited commercial junk mail, usually try to keep their true identities secret; otherwise, they would be overwhelmed by disgruntled Internet citizens who wish to retaliate. In addition to using a bogus return email address, they often include obfuscated URLs. Instead of having a human-readable name, or the dotted-decimal format such as 18.104.22.168, a URL may appear in 10 Digit Integer Format (base 256), so it appears like this: http://2280853951.
It's fairly easy to convert a number in this format back into the normal quad format so that you can research the ownership of a Web site.
Open Windows Calculator in scientific mode (in the Calculator window, choose View | Scientific).
Convert 2280853951 to hexadecimal format = 87F311BF.
Now convert each pair to decimal notation and add the dots:
87 = 135
F3 = 243
11 = 17
BF = 191
135.243.17.191
Dotted Quad to 10 Digit Decimal (base 256):
Dotted Quad format: A.B.C.D
10 Digit Decimal # = A(256³) + B(256²) + C(256¹) + D
Example: 185.127.185.152 = 185(256³) + 127(256²) + 185(256¹) + 152 = 3103784960 + 8323072 + 47360 + 152 = 3112155544
Or if you hate math like we do, the easiest way to convert is to let ping or traceroute do it for you. Running ping or traceroute on the 10 Digit Decimal Number will resolve its Dotted Decimal notation, showing you the dotted quad format equivalent. For example:
C:\>ping 2280853951
Pinging 135.243.17.191 with 32 bytes of data:
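If you would rather not disturb the network at all, the same arithmetic can be sketched in a few lines of Python. The function names below are our own, and the sketch assumes well-formed IPv4 input; it performs no validation.

```python
def integer_to_dotted_quad(value: int) -> str:
    """Convert a 10 Digit Decimal (base 256) number to dotted quad format."""
    # Each octet is one base-256 digit: shift it down, then mask to 8 bits.
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def dotted_quad_to_integer(address: str) -> int:
    """Convert A.B.C.D to A(256^3) + B(256^2) + C(256^1) + D."""
    a, b, c, d = (int(part) for part in address.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d


print(integer_to_dotted_quad(2280853951))         # 135.243.17.191
print(dotted_quad_to_integer("185.127.185.152"))  # 3112155544
```

Unlike ping or traceroute, this conversion sends nothing to the host in question.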
In case you were worried that we hadn't figured out what to do with media access control (MAC) addresses, don't worry, we still need them. Remember that devices on the same LAN segment are somewhat on a first-name basis. They don't refer to each other by the formal IP addresses used on the Internet. However, the MAC address is used only at the hardware layer, so when a process or application "up the stack" specifies another device on a network segment by IP address, it has to be translated into a MAC address. This is done by looking it up in the ARP table, which is automatically created by the Address Resolution Protocol. ARP is just one of a number of network services that run in the background, invisible to most users but essential to the operation of a network. Networked computers can be quite chatty, constantly comparing notes on routing tables, network conditions, and each other's presence.
There is a common belief that because MAC addresses are burned into the network interface card (NIC), they can never be changed. In fact, the MAC address can be changed by using the ifconfig command in Unix. Given that MAC addresses are sometimes used to identify the source of hostile activity, it should also come as no surprise that programs are available that can randomly change a MAC address. 3 Don't automatically assume that a piece of equipment is useless as evidence because its MAC is different from what you expected. The MAC you are seeking may have been changed through software, or the NIC may have been changed.
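When you compare a MAC address found on a piece of equipment with one recorded in a log, differences in notation can hide a match. The following is a minimal sketch, assuming addresses are written as six hexadecimal octets with colon, hyphen, or dot separators; both function names are illustrative. The second helper reads the "locally administered" bit that IEEE addressing reserves in the first octet. A set bit suggests the address was assigned by software rather than by the manufacturer, but a cleared bit proves nothing, because a changed address can copy any value.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return the MAC in lowercase colon-separated form, e.g. 00:a0:c9:14:c8:29."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def is_locally_administered(mac: str) -> bool:
    """Return True when the locally administered bit (0x02) of the first octet is set."""
    first_octet = int(normalize_mac(mac)[:2], 16)
    return bool(first_octet & 0x02)


# The same (made-up) address in two notations compares equal after normalizing.
print(normalize_mac("00-A0-C9-14-C8-29") == normalize_mac("00a0.c914.c829"))  # True
print(is_locally_administered("02:00:00:00:00:01"))  # True
```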
You probably already have used the most common tool for network debugging, ping. (By the way, the name is not an acronym; it is a reference to the underwater echolocation system called SONAR.) ping is a simple, yet extremely valuable program that uses the Internet Control Message Protocol's ECHO_REQUEST datagram. ping sends this datagram as a request to the target machine and listens for an ICMP response. You can use ping to determine whether a machine is alive and sometimes the DNS name of the machine. If you want to continuously keep checking for a "live" machine, you can use a program like What's Up Gold. With this program and others, you can input the IP address, and at preset intervals, it automatically checks to ensure that a specific service on a specific host is still reachable. Be aware that ping is a relatively noisy process: it is easily detected by the remote system. Assume that a moderately savvy Internet criminal may be monitoring all forms of connection to his or her host, so don't ping someone when you don't want that person to know about it.
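To show what ping actually puts on the wire, the sketch below assembles the bytes of an ICMP echo request (type 8, code 0) along with the standard Internet checksum. It only builds the datagram; sending it requires a raw socket and elevated privileges, which we deliberately leave out. The function names, identifier, and payload are illustrative choices, not values taken from any particular ping implementation.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Return the bytes of an ICMP ECHO_REQUEST (type 8, code 0)."""
    # The checksum is calculated with its own field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"abcdefgh")
print(len(packet))                # 16
print(internet_checksum(packet))  # 0, which is how a receiver verifies it
```

Everything in this datagram is visible to the remote host, which is part of why ping is so easy to detect.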
Domain Name Service (DNS)
We expect computers to refer to each other by numbers, not by name. Unfortunately, most humans don't do as well with numbers and prefer to use names that can be easily remembered, spoken, and typed, such as http://www.cia.gov or http://www.amazon.com. To accommodate this difference between humans and machines, the Domain Name Service (DNS) was developed. Internet DNS is effectively a huge global database, usable from any point in the Internet and capable of mapping human-readable names such as http://www.lucent.com to a corresponding numeric IP address. This process is called domain name resolution. Use of domain names and associated IP address ranges is controlled by the Internet Corporation for Assigned Names and Numbers (ICANN) through accredited registrars. 4 The owner of each domain is responsible for placing all host names and corresponding IP addresses on a name server so that outsiders can resolve their names. Most name servers also support reverse lookups, which is the process of providing the human-readable domain name that corresponds to a specific numeric IP address. Many Internet applications perform reverse lookups as a simple security measure, checking to ensure that the IP address associated with an incoming connection attempt is associated with a registered domain name, a weak but useful test.
The domain name server responsible for a particular domain may resolve any query with any IP address. The IP address may not be one within an IP address range assigned to that organization, and that doesn't matter. The owners of a particular domain, such as bubbabbq.com, may choose to host their Web site at someone else's facility. In this case, the specific machine, http://www.bubbabbq.com, won't have an IP address contiguous with the rest of bubbabbq.com. This provides a great deal of flexibility, allowing organizations to move their machines from network to network, changing Web host service providers or ISPs without having to change their human-readable domain names.
Another type of network tool that will be useful to you in tracking an offender is one that can be manually used to resolve a domain name. The classic tool for this purpose, nslookup, is available on Unix, Windows NT, and Windows 2000. You can use nslookup to perform both forward and reverse lookups, resolving the IP address associated with a specific host name or obtaining the name associated with a numeric address.
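The same forward and reverse lookups can be sketched with Python's standard library. This is a rough illustration rather than a replacement for nslookup: the function names are our own, the lookups go through whatever resolver the operating system is configured to use, and a failed lookup simply returns None. The first helper shows the special in-addr.arpa name that a reverse lookup actually queries.

```python
import ipaddress
import socket


def reverse_query_name(address: str) -> str:
    """Return the in-addr.arpa name queried by a reverse lookup."""
    return ipaddress.ip_address(address).reverse_pointer


def forward_lookup(hostname: str):
    """Resolve a host name to an IPv4 address, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def reverse_lookup(address: str):
    """Resolve an IP address to its registered name, or None if there is none."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(address)
        return name
    except OSError:
        return None


print(reverse_query_name("135.243.17.191"))  # 191.17.243.135.in-addr.arpa
```

Calling forward_lookup on a name and then reverse_lookup on the result lets you compare the two answers; a mismatch is common with hosted Web sites and is not, by itself, evidence of deception.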
In order to use a domain name, the owner must register it with the appropriate authority, a task that is usually facilitated by an ISP or one of several online services. At the time of registration (and ideally whenever it is changed), the owner of the domain is required to include name and contact information for a domain administrator. This person is expected to respond to email messages or telephone calls regarding activities associated with his or her domain. It should come as no surprise that these people are frequently not easy to contact. The whois utility can be used to obtain contact information on a specific domain from a server maintained by the appropriate Internet naming authority. Remember that whois information is furnished by the person who provided the registration information. It isn't really verified for accuracy; either through deliberate deception or an innocent mistake, it is possible to register an address and include inaccurate or totally false contact names, addresses, and phone numbers.
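Under the hood, a whois query is very simple: the client opens a TCP connection to port 43 on a whois server, sends the name followed by a carriage return and line feed, and reads text until the server closes the connection. The sketch below assumes this basic exchange; the function names are ours, and the default server, whois.iana.org, is only an assumption for illustration, because the authoritative server differs from domain to domain.

```python
import socket


def build_whois_query(name: str) -> bytes:
    """Return the bytes sent to a whois server: the name plus CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois(name: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send a whois query and return the server's raw text reply.

    The default server is an assumption for illustration; the reply often
    names a more specific server to ask next.
    """
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(build_whois_query(name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


print(build_whois_query("example.com"))  # b'example.com\r\n'
```

The reply is free-form text supplied by the registrant, so treat every field in it as a lead to verify rather than as a fact.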
After pinging a system that we're researching (from a computer that is not going to resolve to our company in case the suspect is watching his or her network), we like to perform a whois just to see what comes up, keeping in mind that the information can be bogus. You don't have to have a whois utility on your workstation because several sites enable you to perform a whois over the Web. One of the most popular is the Sam Spade Web site. 5 Another popular and reliable Web-based whois service is provided by Network Solutions. 6 After we perform a whois, we like to follow up with an inverse name server lookup to see what it provides and compare the results to the whois output. The inverse lookup can be accomplished on a Unix or Linux machine (or with software such as NetScanTools Pro for Windows) with either the nslookup or the dig -x command, and Sam Spade provides reverse lookup services also. You can use dig on an IP address like this:
dig -x 123.456.789.000
and on a domain name like this:
dig domainname.com
dig is an alternative to nslookup, but we usually run nslookup again just to compare the results to all the previous queries.
After we have obtained contact information using the tools previously described, we usually run traceroute (or tracert) to see what route the packets are taking to get to their destination. Like ping, this handy utility sends your packets to the computer you are examining, so don't use it if you don't want to tip off the suspect that you are watching. We use the results from traceroute to help confirm or question the results of whois (see the example in Figure 2-2). For example, if the site is registered to the Netherlands but traceroute takes a few hops and stops at an ISP in Philadelphia, we might suspect that something is amiss. Be aware that many corporations have their Web sites hosted by an ISP, and not necessarily an ISP in their home town, or even their home country.
Tracing route to awl.com [184.108.40.206] over a maximum of 30 hops:
Figure 2-2 Example traceroute output