Penetration Testing and Network Defense: Performing Host Reconnaissance
- Passive Host Reconnaissance
- Active Host Reconnaissance
- Port Scanning
- NMap
- Detecting a Scan
- Case Study
- Summary
Take advantage of the enemy's unreadiness, make your way by unexpected routes, and attack unguarded spots.
—Sun Tzu
The Duke of Wellington, who fought Napoleon at Waterloo, once said, "The most difficult part of warfare was seeing what was on the other side of the hill." Wellington realized that success at war meant more than combat; it also involved secrecy and reconnaissance.
Malicious hackers also value reconnaissance as the first step in an effective attack. For them, seeing what is on the "other side of the hill" is crucial to knowing what type of attack to launch. It makes no sense to launch attacks against UNIX vulnerabilities if the target runs only Microsoft servers. A little time spent investigating saves a lot of time during the penetration attack. A malicious hacker might scope out a target for months before attempting to breach its security.
Although penetration testers rarely have as much time as a malicious hacker, they do recognize the value of reconnaissance. The goal of reconnaissance is to discover the following information:
- IP addresses of hosts on a target network
- Accessible User Datagram Protocol (UDP) and Transmission Control Protocol (TCP) ports on target systems
- Operating systems on target systems
Figure 5-1 illustrates the process of unearthing this information.
Figure 5-1 Passive and Active Reconnaissance
Passive reconnaissance, as the figure shows, involves obtaining information from user group meetings, websites, the EDGAR database, USENET newsgroups, business partners, dumpster diving, and social engineering. Passive reconnaissance takes patience, but it is the most difficult for the target company to detect. Active reconnaissance, in contrast, uses technology in a manner that the target might detect, such as DNS zone transfers and lookups, ping sweeps, traceroutes, port scans, and operating system fingerprinting. After you gather the information, you create a network map that diagrams the live hosts, their open UDP and TCP ports (which hint at the applications running on the hosts), and their respective operating systems. This information forms the basis for deciding what type of attacks to launch.
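One way to keep a network map organized as findings accumulate is to hold one small record per live host. The following Python sketch is a hypothetical structure (HostRecord, record_port, and network_map are illustrative names, not part of any tool) that stores exactly the three items reconnaissance aims to discover: the IP address, the open TCP and UDP ports, and the operating system guess.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """One live host discovered during reconnaissance (hypothetical structure)."""
    ip: str
    tcp_ports: set[int] = field(default_factory=set)
    udp_ports: set[int] = field(default_factory=set)
    os_guess: str = "unknown"

    def summary(self) -> str:
        # Sorted, comma-separated port lists; "-" when nothing is known yet.
        tcp = ",".join(str(p) for p in sorted(self.tcp_ports)) or "-"
        udp = ",".join(str(p) for p in sorted(self.udp_ports)) or "-"
        return f"{self.ip}  os={self.os_guess}  tcp={tcp}  udp={udp}"


# The network map, keyed by IP address.
network_map: dict[str, HostRecord] = {}


def record_port(ip: str, port: int, proto: str) -> None:
    """Add an open port to a host, creating the host record on first sight."""
    host = network_map.setdefault(ip, HostRecord(ip))
    if proto == "tcp":
        host.tcp_ports.add(port)
    elif proto == "udp":
        host.udp_ports.add(port)
    else:
        raise ValueError(f"unknown protocol: {proto}")
```

For example, after record_port("192.0.2.10", 80, "tcp"), record_port("192.0.2.10", 53, "udp"), and setting os_guess to "Linux", the summary reads "192.0.2.10  os=Linux  tcp=80  udp=53". The addresses here come from the 192.0.2.0/24 documentation range.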
In this chapter, you learn how to discover live hosts on your target network using these various information-gathering techniques. Using port-scanning tools, you also learn how to determine the operating systems and open TCP and UDP ports on your target hosts. Finally, you learn best practices for the detection and prevention of reconnaissance techniques.
Passive Host Reconnaissance
As previously mentioned, you can use two different reconnaissance methods to discover information on the hosts in your target network:
- Passive reconnaissance
- Active reconnaissance
Passive reconnaissance gathers data from open source information. Open source means that the information is freely available to the public. Looking at open source information is entirely legal. A company can do little to protect against the release of this information, but later sections of this chapter explore some of the options available. Following are examples of open source information:
- A company website
- Electronic Data Gathering, Analysis, and Retrieval (EDGAR) filings (for publicly traded companies)
- Network News Transfer Protocol (NNTP) USENET Newsgroups
- User group meetings
- Business partners
- Dumpster diving
- Social engineering
All of these, with the exception of dumpster diving and social engineering, are discussed in this chapter. Review Chapter 4, "Performing Social Engineering," for more information about dumpster diving and social engineering.
A Company Website
If you are hired to perform a penetration test against a company's Internet presence, the first place you should look, obviously, is the company website. Begin by downloading the website for offline viewing. This allows you to spend more time analyzing each page without being detected and provides benefits later when you attempt to penetrate the website. In the process of downloading the website, you can also collect orphan pages. Orphan pages are web pages that might have been parts of the company website at one time but now have no pages linking to them. While the pages should be removed from the server, they often are not. They can contain useful information for the penetration tester.
Two programs that you can use for downloading a website for offline viewing are GNU Wget (ftp://ftp.gnu.org/pub/gnu/wget/) and Teleport Pro (http://www.tenmax.com). GNU Wget is free software released under the GNU General Public License and runs on Linux and Windows. Teleport Pro is commercial software that runs only on Windows.
Wget is a noninteractive, command-line-driven website retrieval application that creates local copies of remote websites. Figure 5-2 shows Wget retrieving the pages from http://www.hackmynetwork.com. Notice the use of the -r switch, which enables recursive mirroring of all pages on the site. You can specify the maximum recursion depth with the -l switch (the default is 5). With recursion enabled, Wget follows the hyperlinks and downloads the referenced pages, continuing up to the depth specified with -l.
Figure 5-2 Wget Web Retrieval
The goal of penetration testing is not only to see what access the auditor can gain, but also what the auditor is able to do without being detected. To minimize the possibility of detection when using Wget, you can use the following switches:
- --random-wait: Because some websites perform statistical analysis of page requests to detect spidering and web retrieval software, use the --random-wait switch to vary the delay between retrievals. Older Wget releases varied it between 0 and 2 * wait seconds; current releases use 0.5 to 1.5 * wait seconds, where wait is the value set with the --wait switch.
- --wait=seconds: Specifies the number of seconds to wait between retrievals. Use it together with the --random-wait switch.
- --cookies=on/off: Cookies enable web servers to keep track of visitors to their websites. Turning cookies off prevents the server from tracking your viewing of its website (current Wget releases use --no-cookies for this); however, you might want cookies enabled for the cookie-based exploits discussed in Chapter 7, "Performing Web-Server Attacks."
- -H (--span-hosts): Enables host spanning, which lets Wget not only collect the web pages on your target site but also recursively mirror any sites referenced by hyperlinks on those pages. Be careful with this switch, because Wget then mirrors each referenced site and any sites it references in turn, which can consume a significant amount of hard drive space.
- -D (--domains): When used with the -H switch, limits host spanning to the listed domains.
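The switches above combine into a single command line. The Python sketch below assembles that command as an argument list; build_wget_command and its parameter names are hypothetical, and it assumes a current Wget release, so it uses --no-cookies (older releases spelled it --cookies=off) and the short forms -H and -D for host spanning.

```python
import shlex
from typing import List, Optional


def build_wget_command(url: str, depth: int = 3, wait: int = 5,
                       span_domains: Optional[List[str]] = None,
                       keep_cookies: bool = False) -> List[str]:
    """Assemble a wget argument list for a slow, recursive mirror of url."""
    args = [
        "wget",
        "-r", "-l", str(depth),      # recursive mirror, limited depth
        f"--wait={wait}",            # base delay between retrievals
        "--random-wait",             # randomize that delay
    ]
    if not keep_cookies:
        args.append("--no-cookies")  # current spelling of --cookies=off
    if span_domains:
        # -H follows links to other hosts; -D restricts that to these domains.
        args += ["-H", "-D", ",".join(span_domains)]
    args.append(url)
    return args


if __name__ == "__main__":
    cmd = build_wget_command("http://www.hackmynetwork.com", depth=2)
    # Printed rather than executed; pass cmd to subprocess.run() to launch wget.
    print(shlex.join(cmd))
```

Run as a script, this prints wget -r -l 2 --wait=5 --random-wait --no-cookies http://www.hackmynetwork.com. Building a list instead of a single string avoids shell-quoting problems if you later hand the command to subprocess.run().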
Because you will probably use the same switches each time you run Wget, you can place them in Wget's startup file (.wgetrc in your home directory). Settings listed in this file apply every time you launch the program, so you do not have to type them again. The startup file uses its own command syntax (for example, wait = 5 rather than --wait=5), so be sure to read the documentation before you create the file.
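Wget reads its startup file, .wgetrc in the home directory, as a series of name = value lines, with on/off for boolean settings. The Python sketch below renders such a file from a dictionary; render_wgetrc and QUIET_MIRROR are hypothetical names, while recursive, reclevel, wait, random_wait, and cookies are Wget startup-file commands corresponding to -r, -l, --wait, --random-wait, and cookie handling. Check the Wget manual for your version before relying on it.

```python
from pathlib import Path
from typing import Dict, Union

Setting = Union[bool, int, str]

# Hypothetical defaults mirroring the command-line switches discussed earlier.
QUIET_MIRROR: Dict[str, Setting] = {
    "recursive": True,     # same as -r
    "reclevel": 3,         # same as -l 3
    "wait": 5,             # same as --wait=5
    "random_wait": True,   # same as --random-wait
    "cookies": False,      # turn cookie handling off
}


def render_wgetrc(settings: Dict[str, Setting]) -> str:
    """Render settings as 'name = value' lines, with booleans as on/off."""
    lines = ["# Wget startup file"]
    for name, value in settings.items():
        if isinstance(value, bool):  # check bool before treating as int
            value = "on" if value else "off"
        lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    target = Path.home() / ".wgetrc"
    # Print instead of writing, so an existing startup file is never overwritten.
    print(f"# would be written to {target}")
    print(render_wgetrc(QUIET_MIRROR), end="")
```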
If command-line switches are not your thing, you can use the Windows-based Teleport Pro program from Tennyson Maxwell. After you launch Teleport Pro, you are prompted with the New Project Wizard, as shown in Figure 5-3.
Figure 5-3 Teleport Pro New Project Wizard
For the purpose of offline browsing, select the Create a browsable copy of a website on my hard drive option. After you click Next, you are prompted with the screen shown in Figure 5-4.
Figure 5-4 Teleport Pro Starting Address Screen
On the Starting Address screen, enter the website you want to store offline. Note that the address is case-sensitive. You can choose how deep you want Teleport Pro to explore. The default is up to three links, which is sufficient for most retrievals. On the next screen (Project Properties), shown in Figure 5-5, you can specify what type of files you want to retrieve. Teleport Pro is limited to retrieving the files displayed in the Project Properties screen options. Typically, you would choose the Everything option, but if you are on limited bandwidth and do not care about graphics, you can choose the Just text option.
Figure 5-5 Teleport Pro Project Properties Screen
You can also enter an account name and password to access the site if it is needed, but because you probably do not know any usernames or passwords at this point (you will learn how to discover these in Chapter 7), you should leave this blank.
After you click Next, you are prompted to finish the wizard and choose where to save the project file. A project file is useful if you want to return and copy the website again.
When you are ready to begin copying the target website, you can either go to the Project menu and choose Start or click the Play button on the toolbar. When the project is finished, you see a screen like that in Figure 5-6, which shows you how many files were requested and how many were received. If the number of failed requests is high, you can change retrieval parameters with the Project Properties screen under the Project menu.
Figure 5-6 Teleport Pro Retrieval Completion Screen
After you have copied down the website, either through Teleport Pro or Wget, you can browse it offline in your preferred web browser. With respect to host reconnaissance, you should be looking for two things:
- Comments in the source code
- Contact information
Comments in the source code might reveal what platform the website is running on, which is useful later when you attempt to infiltrate the target web server. You can view the source code by opening the web pages in an HTML editor, text editor, or within the browser. In Internet Explorer, you can view source code by choosing Source under the View menu. Figure 5-7 shows sample source code of a web page.
Figure 5-7 Sample Web Page Source Code: Comments Reveal HTML Authoring Tool Used
Comments start with the <!-- HTML tag and end with -->. Figure 5-7 shows that the web page was written with Microsoft FrontPage. Because known exploits target Microsoft FrontPage, document this information for later.
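Reading every page of a large mirror by hand is tedious, so it can help to pull out just the comments. The Python sketch below uses the standard library's html.parser module, whose handle_comment callback receives the text between <!-- and -->; CommentCollector, extract_comments, and scan_mirror are hypothetical helper names, and scan_mirror assumes the mirrored pages end in .htm or .html.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: List[str] = []

    def handle_comment(self, data: str) -> None:
        # data is the text between <!-- and -->
        self.comments.append(data.strip())


def extract_comments(html: str) -> List[str]:
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments


def scan_mirror(root: str) -> Dict[str, List[str]]:
    """Map each .htm/.html file under a mirrored site to its comments."""
    results: Dict[str, List[str]] = {}
    for path in Path(root).rglob("*.htm*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        found = extract_comments(text)
        if found:
            results[str(path)] = found
    return results
```

Pointing scan_mirror at the directory Wget or Teleport Pro created gives a per-file list of comments to review for authoring tools, developer names, and similar clues.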
Figure 5-8 shows another example of useful comment information. Here, you can see that this site was developed by XYZ Web Design Company. Although at first glance this might not reveal much, it is actually useful information. Many web design companies advertise what type of platform they develop sites for, such as Microsoft or UNIX. By going to the XYZ Web Design website, you might learn that they specialize in ASP, .NET, and FrontPage. You can conclude with fair certainty that because these specializations are all used on Microsoft platforms, the target website is running on Microsoft Internet Information Server (IIS). If XYZ Web Design advertises that its specialty is Perl, CGI, PHP, and Python, the target website is most likely running on a UNIX-based platform. Although all of these technologies can also run on Windows, they are more common on the UNIX platform.
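The reasoning in the preceding paragraph amounts to a simple vote: count how many of the developer's advertised specialties point toward Microsoft and how many point toward UNIX. The Python sketch below encodes that heuristic; the keyword sets and the guess_platform name are illustrative assumptions drawn from the examples in the text, and the result is only a hint to confirm later with active techniques.

```python
from typing import List

# Illustrative keyword sets taken from the examples in the text.
MICROSOFT_HINTS = {"asp", ".net", "frontpage"}
UNIX_HINTS = {"perl", "cgi", "php", "python"}


def guess_platform(specialties: List[str]) -> str:
    """Guess the hosting platform from a web designer's advertised skills."""
    words = {s.strip().lower() for s in specialties}
    ms_votes = len(words & MICROSOFT_HINTS)
    unix_votes = len(words & UNIX_HINTS)
    if ms_votes > unix_votes:
        return "likely Microsoft IIS"
    if unix_votes > ms_votes:
        return "likely UNIX-based"
    return "inconclusive"
```

A tie, or no matching keywords at all, is reported as inconclusive rather than forcing a guess, which reflects the text's caveat that all of these technologies can run on either platform.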
Figure 5-8 Sample Web Page Source Code: A Third-Party Web Developer Is Revealed
After you look at the source code, examine any contact information published on the target site. Typically, you can find this by clicking on links labeled About Us or Contact Information. Figure 5-9 shows an example of a page with information about the company executives.
Figure 5-9 Sample Contact Information Web Page
On this web page, you see executive names along with their phone numbers and e-mail information. This can be useful for performing social engineering, as discussed in Chapter 4. The phone numbers in Figure 5-9 are also useful for war dialing, in which you dial a range of phone numbers with software such as ToneLoc or THC-Scan and attempt to establish remote access connectivity. In the figure, all phone numbers start with 503 555-1 followed by a three-digit extension. Armed with this knowledge, you can configure your war dialing software to dial all numbers from 503 555-1000 through 503 555-1999 and detect modems used for remote access.
If possible, companies should publish only a main number on their website, such as a toll-free 800 number that connects the caller to a receptionist, to minimize the risk of war dialing attacks. If employee information must be displayed, make sure policies that protect against social engineering attacks are in place and enforced.
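Deriving the dialing range from the published numbers is simple arithmetic: keep the shared prefix and enumerate every possible extension. The Python sketch below generates that list; dial_list is a hypothetical helper, and how the list is loaded into a particular war dialer depends on that tool.

```python
from typing import List


def dial_list(prefix: str, first_ext: int, last_ext: int, width: int = 3) -> List[str]:
    """Return prefix + zero-padded extension for every extension in the range."""
    if first_ext > last_ext:
        raise ValueError("first_ext must not exceed last_ext")
    return [f"{prefix}{ext:0{width}d}" for ext in range(first_ext, last_ext + 1)]


if __name__ == "__main__":
    numbers = dial_list("503 555-1", 0, 999)
    print(len(numbers), numbers[0], numbers[-1])
```

With the prefix 503 555-1 and extensions 000 through 999, the list holds 1000 numbers, from 503 555-1000 to 503 555-1999.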
EDGAR Filings
Publicly traded companies in the United States are required to file reports with the Securities and Exchange Commission (SEC). You can access these filings through the EDGAR database at http://www.sec.gov/edgar.shtml. Searches can reveal financial information and press releases. Some companies announce the technology used in their organization in press releases included in their EDGAR filings, which saves you the time of determining the operating system through other means.
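EDGAR can also be queried programmatically. The Python sketch below assumes the SEC's JSON submissions service at data.sec.gov, which postdates the EDGAR search pages described above: each company's filing history lives at a URL built from its 10-digit, zero-padded Central Index Key (CIK), and the SEC asks automated clients to identify themselves in the User-Agent header. The field names (filings, recent, form, filingDate, accessionNumber) reflect that service as the author understands it; verify them against the SEC's API documentation. Press releases are commonly attached to Form 8-K filings, hence the example filter.

```python
import json
import urllib.request
from typing import List, Tuple

SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik: int) -> str:
    """Build the submissions URL; the CIK is zero-padded to 10 digits."""
    return SUBMISSIONS_URL.format(cik=cik)


def fetch_submissions(cik: int, contact: str) -> dict:
    """Download a company's filing history; contact is your name and e-mail."""
    req = urllib.request.Request(submissions_url(cik),
                                 headers={"User-Agent": contact})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def recent_filings(data: dict, form_type: str) -> List[Tuple[str, str]]:
    """Return (filing date, accession number) pairs for one form type."""
    recent = data["filings"]["recent"]
    return [
        (date, acc)
        for form, date, acc in zip(recent["form"], recent["filingDate"],
                                   recent["accessionNumber"])
        if form == form_type
    ]
```

A typical use would be recent_filings(fetch_submissions(cik, "Your Name you@example.com"), "8-K"), followed by reading the listed filings for technology announcements.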
NNTP USENET Newsgroups
If you have ever had to troubleshoot a difficult problem, you know the value of networking with others to find a solution. One of the methods that engineers use to seek help is by posting questions on USENET newsgroups. Unfortunately, some post too much information when they are seeking help.
Example 5-1 shows an engineer named Bill asking a question about a problem he is experiencing. In his message, he reveals that he is running Red Hat Linux 6.2. No company should give up this information so freely to the public.
Example 5-1. Sample Newsgroup Posting
From: bsmith@hackmynetwork.com
Subject: Apache Problem
Newsgroups: comp.infosystems.www.servers.unix, comp.os.linux, alt.apache.configuration, comp.lang.java.programmer
Date: 2004-07-07 09:19:28 PST

I am having a problem with Apache reverse proxy not communicating with web applications using HTTP 1.1 keepalive. I am using Apache 1.3.23 on Red Hat Linux 6.2. It is compiled with mod_proxy and mod_ssl. Any ideas would be greatly appreciated.

Thank you.

-------
bsmith@hackmynetwork.com
Sr. Systems Administrator
Hackmynetwork.com
Example 5-1 also shows Bill's e-mail address: bsmith@hackmynetwork.com. This not only reveals the name of the company Bill works for (Hackmynetwork), but also might reflect his user account on the network. Unfortunately, many companies still use the same name for network logons and e-mail addresses. Although you cannot know for certain, you should document his e-mail address when doing host reconnaissance. Because he works on production servers for the target company, you might be able to gain full access to the company network if you crack his password. (You will learn more about password cracking in Chapter 9, "Cracking Passwords.")
You can browse newsgroups using software such as Microsoft Outlook Express, Netscape, Xnews, and many others. Alternatively, and perhaps more effectively, you can search newsgroups using Google Groups, which archives USENET postings. Enter the name or domain of the target company to retrieve newsgroup messages posted from or related to your target company.
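When you collect many postings, the details worth documenting (the poster's address, a candidate username, the company domain, and any product versions) can be pulled out automatically. The Python sketch below parses a saved posting with the standard email package; poster_details and the PRODUCTS watch list are hypothetical, and the candidate username is only a guess that the e-mail name matches the logon name.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from typing import Dict, List, Union

# Hypothetical watch list of product names to record along with versions.
PRODUCTS = ["Apache", "Red Hat Linux"]


def poster_details(raw_post: str) -> Dict[str, Union[str, List[str]]]:
    """Extract reconnaissance details from a saved newsgroup posting."""
    msg = message_from_string(raw_post)
    _, address = parseaddr(msg.get("From", ""))
    local, _, domain = address.partition("@")
    body = msg.get_payload()
    software: List[str] = []
    for name in PRODUCTS:
        # Product name followed by a dotted version number, e.g. "Apache 1.3.23".
        for m in re.finditer(rf"\b{re.escape(name)}\s+(\d+(?:\.\d+)*)", body):
            software.append(f"{name} {m.group(1)}")
    return {
        "address": address,
        "candidate_username": local,  # may or may not match the network logon
        "domain": domain,
        "software": software,
    }
```

Applied to the posting in Example 5-1, this yields the candidate username bsmith, the domain hackmynetwork.com, and the versions Apache 1.3.23 and Red Hat Linux 6.2.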
User Group Meetings
If searching through thousands of newsgroup messages is not your forte (or your idea of a fun afternoon), you might try attending user group meetings. Most cities hold user group meetings related to various technologies, such as Microsoft development, Cisco technologies, Linux, and even penetration testing. User group meetings provide an opportunity for people in a community to receive information and meet others who work with the same technology.
Attending user groups is a great way to practice the social engineering skills covered in Chapter 4. By arriving early or staying late after the meeting, you can network with others and discover what companies people work for and what technologies their companies use.
Of course, knowing which user groups your target employees are attending is difficult. Penetration testers should frequent user group meetings and talk to as many people as possible at each meeting. When a client requests your service, you might already know that the client runs Microsoft servers, for example, because you met an employee of the client at a Microsoft user group.
Business Partners
If perusing a target website, searching EDGAR filings and newsgroups, or attending user groups does not provide you with the information you need, you might have to check the business partners of the target for more information. Although the target might protect against giving away technical information, the partners might not.
A company website often reveals who its business partners are, but a more effective means of obtaining partner information is using Google. For instance, if you enter link:www.hackmynetwork.com in the Google search box, your search pulls up sites that link back to your target site. (Google officially retired the link: operator in 2017, so its results are no longer reliable.)
By visiting each site listed in your search results, you might uncover technologies in use by your target. Network integrators are notorious for listing their client names and the technologies they specialize in. If a network integrator that specializes in Sun Solaris solutions links back to your target website, you can reasonably infer that your target runs Sun Solaris servers.