Servlets for the World Wide Web
When the term Servlet is mentioned, it is almost always implied that the Servlet is an instance of HttpServlet. The explanation is simple. The HyperText Transfer Protocol (HTTP) is used for the vast majority of transactions on the World Wide Web: every Web page you visit is transmitted using HTTP, hence the http:// prefix. HTTP may not be the best protocol ever made, but it works and it is already widely used. Servlet support for HTTP transactions comes in the form of the javax.servlet.http.HttpServlet class.
Before showing an example of an HttpServlet, it is helpful to reiterate the basics of the HyperText Transfer Protocol. Many developers do not fully understand HTTP, yet that understanding is critical to understanding an HttpServlet. HTTP is a simple, stateless protocol. The protocol relies on a client, usually a Web browser, to make a request and a server to send a response. Connections only last long enough for one transaction. A transaction can be one or more request/response pairs. For example, a browser will send a request for an HTML page followed by a separate request for each image on that page. All of these requests and responses may be carried over the same connection. The connection is then closed at the end of the last response. The whole process is relatively simple and occurs each time a browser requests a resource from an HTTP server.
Requests, Responses, and Headers
The first part of an HTTP transaction is when an HTTP client creates and sends a request to a server. An HTTP request in its simplest form is nothing more than a line of text specifying what resource a client would like to retrieve. The line of text is broken into three parts: the type of action, or method, that the client would like to do; the resource the client would like to access; and the version of the HTTP protocol that is being used. For example:
GET /index.html HTTP/1.0
The preceding is a completely valid HTTP request. The first word, GET, is a method defined by HTTP to ask a server for a specific resource; /index.html is the resource being requested from the server; HTTP/1.0 is the version of HTTP that is being used. When any device using HTTP wants to get a resource from a server, it sends something similar to the above line. Go ahead and try this by hand against Tomcat. Open a telnet session with your local computer on port 80. From the command prompt this is usually accomplished with:
telnet 127.0.0.1 80
Something similar to Figure 2-2 should appear.
Figure 2-2. Telnet to localhost:80
The telnet program has just opened a connection to Tomcat's Web server. Tomcat understands HTTP, so type in the example HTTP statement. An HTTP request is terminated by a blank line, so press Enter a second time to send the blank line and finish the request.
GET /jspbook/index.html HTTP/1.0
The content of index.html is returned from the Web Application mapped to /jspbook (the application we started in the last chapter), as shown in Figure 2-3.
Figure 2-3. Manual HTTP Request and the Server's Response
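The same exchange can be performed from Java instead of telnet. The following is a minimal sketch, assuming Tomcat is listening on 127.0.0.1 port 80 as in the telnet example. The class name ManualHttpRequest and its two methods are hypothetical names introduced only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class ManualHttpRequest {

    // Builds the request text: one request line followed by a blank line.
    static String buildRequest(String method, String resource, String version) {
        return method + " " + resource + " " + version + "\r\n\r\n";
    }

    // Sends the request over a plain socket and returns the raw response
    // (status line, headers, and body) exactly as the server wrote it.
    static String send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            StringBuilder response = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // Read until the server closes the connection.
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }
}
```

Calling send("127.0.0.1", 80, buildRequest("GET", "/jspbook/index.html", "HTTP/1.0")) should return text much like what telnet displayed, provided the server is running at that address.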
You just sent a basic HTTP request, and Tomcat returned an HTTP response. While usually done behind the scenes, all HTTP requests resemble the preceding. There are a few more methods to accompany GET, but before discussing those, let's take a closer look at what Tomcat sent back.
The first thing Tomcat returned was a line of text:
HTTP/1.1 200 OK
This is an HTTP status line. Every HTTP response starts with a status line. The status line consists of the HTTP version, a status code, and a reason phrase. The HTTP response code 200 means everything was fine; that is why Tomcat included the requested content with the response. If there had been some sort of issue with the request, a different response code would have been used. Another HTTP response code you are likely familiar with is 404, "Not Found". If you have ever followed a broken hyperlink, this is probably the code that was returned.
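To make the three parts of a status line concrete, here is a small sketch that splits one into its version, status code, and reason phrase. StatusLine is a hypothetical class written for this illustration; it is not part of the Servlet API.

```java
// Hypothetical value class for illustration; not part of the Servlet API.
class StatusLine {
    final String version;
    final int code;
    final String reason;

    StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Parses a line such as "HTTP/1.1 200 OK".
    static StatusLine parse(String line) {
        // Split into at most three parts so that a reason phrase
        // containing spaces ("Not Found") stays in one piece.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }
}
```

For the line Tomcat returned, StatusLine.parse("HTTP/1.1 200 OK") yields the version HTTP/1.1, the code 200, and the reason phrase OK.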
HTTP Response Codes
In practice, you usually do not need to understand all of the specific HTTP response codes. JSP, Servlets, and Web servers usually take care of these codes automatically, but nothing stops you from sending specific HTTP response codes. Later on we will see examples of doing this with both Servlets and JSP. A complete list of HTTP response codes along with other HTTP information is available in the HTTP 1.1 specification, RFC 2616, http://www.ietf.org/rfc/rfc2616.txt.
Along with the HTTP response code, Tomcat also sent back a few lines of information before the contents of index.html, as shown in Figure 2-4.
Figure 2-4. Example HTTP Headers
All of these lines are HTTP headers. HTTP uses headers to send meta-information with a request or response. A header is a colon-delimited name:value pair; that is, it contains the header's name, followed by a colon, followed by the header's value. Typical response headers include content-type descriptions, content length, a time-stamp, server information, and the date the content was last changed. This information helps a client figure out what is being sent, how big it is, and whether the data are newer than a previously seen response. An HTTP request typically contains a few headers as well. Common request headers consist of the user-agent details and preferred formats, languages, and content encoding to receive. These headers help tell a server what the client is and how it would prefer to get back information. Understanding HTTP headers is important, but for now put the concept on hold until you learn a little more about Servlets. HTTP headers provide some very helpful functionality, but it is better to explain them further with some HttpServlet examples.
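The name:value structure of a header is simple enough to parse by hand. The sketch below is only an illustration: HeaderParser is a hypothetical class, and the header lines in the usage note are made-up sample values rather than the exact output of Tomcat.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class HeaderParser {

    // Splits one header line at the FIRST colon, because a value
    // (a date, for example) may itself contain colons.
    static String[] parseHeader(String line) {
        int colon = line.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a header line: " + line);
        }
        String name = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return new String[] { name, value };
    }

    // Reads header lines until the blank line that ends the header block.
    // A repeated header name overwrites the earlier value in this sketch.
    static Map<String, String> parseHeaders(String[] lines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // the blank line separates headers from the body
            }
            String[] pair = parseHeader(line);
            headers.put(pair[0], pair[1]);
        }
        return headers;
    }
}
```

Given the sample line "Date: Mon, 01 Jan 2001 12:00:00 GMT", parseHeader returns the name Date and the value Mon, 01 Jan 2001 12:00:00 GMT. Splitting on every colon would have broken the time apart.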
GET and POST
The first relatively widely used version of HTTP was HTTP 0.9. It supported only one HTTP method, or verb: GET. As part of its execution, a GET request can provide a limited amount of information in the form of a query string. However, the GET method is not intended to send large amounts of information. Many Web servers, browsers, and proxies restrict the length of complete URLs, including query strings, and some older ones allow as few as 255 characters. Excess information may be ignored or the request rejected. For this reason GET requests are best suited to sending small amounts of information that you do not mind having visible in a URL. There is another restriction on GET: the HTTP specification defines GET as a "safe" method that is also idempotent. This means that GET must only be used to execute queries in a Web application. GET must not be used to perform updates, as this breaks the HTTP specification.
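A query string is a list of name=value pairs joined by & and appended to the resource after a question mark. The following sketch builds one with the standard java.net.URLEncoder class. The class QueryStringBuilder, the resource /jspbook/search, and the parameter names are all hypothetical, chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class QueryStringBuilder {

    // URL-encodes one name or value (a space becomes '+', for example).
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters as name=value pairs separated by '&'.
    static String build(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(entry.getKey()))
                 .append('=')
                 .append(encode(entry.getValue()));
        }
        return query.toString();
    }

    // Produces the request line of a GET that carries the parameters.
    static String requestLine(String resource, Map<String, String> params) {
        String query = build(params);
        String target = query.isEmpty() ? resource : resource + "?" + query;
        return "GET " + target + " HTTP/1.0";
    }
}
```

With the parameters q = "servlets and jsp" and page = "2", in that order, the request line becomes GET /jspbook/search?q=servlets+and+jsp&page=2 HTTP/1.0. Everything sent this way is visible in the URL, which is why GET suits only small, non-sensitive values.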
To overcome these limitations, the HTTP 1.0 specification introduced the POST method. POST is similar to GET in that it may also have a query string, but the POST method can use a completely different mechanism for sending information. A POST can send an effectively unlimited amount of information over the socket connection as the body of the HTTP request. The extra information does not appear as part of a URL and is only sent once. For these reasons the POST method is usually used for sending sensitive or large amounts of information, or when uploading files. Note that POST methods do not have to be idempotent. This is very important, as it means applications have a way of updating data in a Web application. If an application needs to modify data, or add new data, and is sending a request over HTTP, then the application must not use GET but must instead use POST. Notice that POST requests may be idempotent; that is, there is nothing to stop an application from using POST instead of GET, and this is often done when a retrieval requires sending large amounts of data. However, note that GET can never be used in place of POST if the HTTP request is nonidempotent.
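The difference from GET is easiest to see in the raw request text. The sketch below assembles a form-style POST: the information travels in the body after the blank line, and a Content-Length header tells the server how many bytes to expect. PostRequestBuilder, the resource /jspbook/register, and the form fields are hypothetical and used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class PostRequestBuilder {

    // Builds the full text of a POST whose body is already URL-encoded
    // form data, such as "name=Ada&city=London".
    static String build(String resource, String formBody) {
        // Content-Length counts bytes, not characters.
        int length = formBody.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + resource + " HTTP/1.0\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"        // blank line: end of headers
                + formBody;     // the data follows, outside the URL
    }
}
```

For the body name=Ada&city=London the request carries Content-Length: 20, and the request line remains POST /jspbook/register HTTP/1.0 with nothing appended to the URL.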
In the current HTTP version, 1.1, there are seven HTTP methods of interest here: GET, PUT, POST, TRACE, DELETE, OPTIONS, and HEAD. (The specification also reserves an eighth, CONNECT, for use with proxies.) In practice only two of these methods are commonly used, the two we have already talked about: GET and POST.
The other five methods are not very helpful to a Web developer. The HEAD method requests only the headers of a response. PUT is used to place documents directly on a server, and DELETE does the exact opposite. The TRACE method is meant for debugging. It returns an exact copy of a request to a client. Lastly, the OPTIONS method is meant to ask a server what methods and other options the server supports for the requested resource.
As far as this book is concerned, the HTTP methods will not be explained further. As will soon be shown, it is not important for a Servlet developer to fully understand exactly how to construct and use all the HTTP methods manually. HttpServlet objects take care of low-level HTTP functionality and translate HTTP methods directly into invocations of Java methods.
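The idea of translating an HTTP method into a Java method call can be pictured with plain Java. The following is a deliberately simplified stand-in, not the real javax.servlet.http.HttpServlet class: the class names, the String-based signatures, and the default responses are all assumptions made for this sketch.

```java
// A simplified stand-in, NOT the real javax.servlet.http.HttpServlet.
// All names, signatures, and default responses here are assumptions.
class MethodDispatchSketch {

    // Default handlers: a subclass overrides only the methods it supports.
    String doGet(String resource) {
        return "405 Method Not Allowed";
    }

    String doPost(String resource) {
        return "405 Method Not Allowed";
    }

    // Maps the method named on the request line to a Java method call.
    String service(String method, String resource) {
        switch (method) {
            case "GET":
                return doGet(resource);
            case "POST":
                return doPost(resource);
            default:
                return "501 Not Implemented";
        }
    }
}

// A subclass that chooses to handle GET only.
class HelloSketch extends MethodDispatchSketch {
    @Override
    String doGet(String resource) {
        return "200 OK: hello from " + resource;
    }
}
```

Here new HelloSketch().service("GET", "/jspbook/hello") reaches the overridden doGet, while a POST to the same object falls through to the default handler. The developer writes only the handlers, and the dispatching is done for them.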
HTTP Response Codes
An HTTP server takes a request from a client and generates a response. Responses, like requests, consist of a response line, headers, and a body. The response line contains the HTTP version of the server, a response code, and a reason phrase. The reason phrase is some text that describes the response, and could be anything, although a recommended set of reason phrases is given in the specification. Response codes themselves are three-digit numbers that are divided into groups. Each group has a meaning as shown here:
1xx: Informational: Request received, continuing process.
2xx: Success: The action was successfully received, understood, and accepted.
3xx: Redirection: Further action must be taken in order to complete the request.
4xx: Client Error: The request contains bad syntax or cannot be fulfilled.
5xx: Server Error: The server failed to fulfill an apparently valid request.
Each status code has an associated string (the reason phrase).
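Because the group is determined by the first digit, a response code can be classified with a single integer division. The sketch below shows one way to do it. StatusClass is a hypothetical class, and the 4xx group is labeled "Client Error", the name the HTTP specification uses for it.

```java
// Hypothetical helper class; the name is illustrative only.
class StatusClass {

    // Returns the name of the group a three-digit response code belongs to.
    static String describe(int code) {
        switch (code / 100) { // integer division keeps only the first digit
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default:
                throw new IllegalArgumentException("Not an HTTP response code: " + code);
        }
    }
}
```

For example, describe(200) returns Success and describe(404) returns Client Error, while a value outside the range 100 to 599 is rejected.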
The status code you'll see most often is 200. This means that everything has succeeded and you have a valid response. The others you are likely to see are:
401: you are not authorized to make this request
404: cannot find the requested URI
405: the HTTP method you have tried to execute is not supported by this URL (e.g., you have sent a POST and the URL will only accept GET)
500: Internal Server Error. You are likely to see this if the resource you are requesting (such as a Servlet) throws an exception.
If you send a request to a Servlet and get a 500 code, then the chances are your Servlet has itself thrown an exception. To discover the root cause of this exception, you should check the application output logs. Tomcat's logs are stored in the /logs directory of the Tomcat installation.
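Java code rarely needs to spell these numbers out, because the standard java.net.HttpURLConnection class defines named constants for them. The following sketch maps the codes discussed above to a short hint. CommonStatusCodes and the wording of the hints are assumptions made for this illustration.

```java
import java.net.HttpURLConnection;

// Hypothetical helper class; the hint texts are illustrative only.
class CommonStatusCodes {

    // Returns a short troubleshooting hint for the most common codes.
    static String explain(int code) {
        switch (code) {
            case HttpURLConnection.HTTP_OK:             // 200
                return "Success: the response is valid.";
            case HttpURLConnection.HTTP_UNAUTHORIZED:   // 401
                return "Not authorized to make this request.";
            case HttpURLConnection.HTTP_NOT_FOUND:      // 404
                return "The requested URI cannot be found.";
            case HttpURLConnection.HTTP_BAD_METHOD:     // 405
                return "The HTTP method is not supported by this URL.";
            case HttpURLConnection.HTTP_INTERNAL_ERROR: // 500
                return "Server error: check the server logs for an exception.";
            default:
                return "See the HTTP specification for code " + code + ".";
        }
    }
}
```

A call such as explain(500) returns the hint about checking the server logs, matching the advice given above for a Servlet that throws an exception.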