Build Your Own Java-Based Email Programs
Socket, URI, and URL: Those concepts form the foundation on which Java's Network API rests. Because I explored the socket, URI, and URL concepts (and associated classes) in my two previous Network API articles, you might wonder what more needs to be said about the Network API. My answer: plenty.
For example, it is possible to discuss socket factories and URL protocol handlers, and even to discuss useful programs that work with sockets, URIs, and URLs. One useful program is a World Wide Web (WWW) browser that lets users view HTML pages with ease. Creating a Java-based browser is not as difficult a task as you might think because Java's class library includes two classes that work with the Network API to implement a browser for HTML (version 3.2) pages: javax.swing.JEditorPane and javax.swing.text.html.HTMLEditorKit.
Although it would be interesting to explore the creation of a Java-based WWW browser (and those classes) in this article, I will not do that because my goal is to explore electronic mail (email). Specifically, I plan to introduce you to the anatomy of an email message and then show you how to use the Network API to develop programs that send and receive email messages.
Programs typically use Simple Mail Transfer Protocol (SMTP) to send email messages and Post Office Protocol 3 (POP3) to receive email messages. Because this article provides only a brief look at those network protocols, you should read the following Request For Comments (RFC) documents (after you finish reading this article) to learn more about SMTP and POP3:
Sun provides the high-level JavaMail API for working with email. I have chosen not to discuss that API because it is my desire to show how email works at a low level. Once you complete this article, you might want to learn more about JavaMail. To visit Sun's official JavaMail API WWW page, point your browser to http://java.sun.com/products/javamail/index.html.
Version 1.4 (beta 2) of Sun's Java 2 Standard Edition (J2SE) SDK was used to build this article's programs.
Anatomy of an Email Message
Before building your own email program, you should understand the anatomy (that is, format) of an email message. That anatomy is defined by RFC 2822, "Internet Message Format."
According to RFC 2822, an email message consists of a sequence of lines, with each line consisting of ASCII characters (whose codes range from 1 through 127) and ending with a carriage-return character (ASCII code 13) followed by a newline character (ASCII code 10). Furthermore, the maximum length of each line (excluding the carriage-return and newline characters) is 998 characters. Various lines, known as header fields (or headers, for short), provide information that is important to the whole message. Other lines provide the message's content. Figure 1 illustrates the anatomy of an email message as lines and headers/content.
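The line rules above can be expressed as a small check. The following is only a sketch: the MessageLineCheck class and its method names are hypothetical and are not part of this article's programs. It also rejects a carriage return or newline inside a line because RFC 2822 allows those characters only as the pair that ends a line.

```java
// Hypothetical helper (not from the article): checks one message line
// against the RFC 2822 limits described in the text.
final class MessageLineCheck {
    // Maximum line length, excluding the trailing carriage return and newline.
    static final int MAX_LINE_LENGTH = 998;

    // Returns true if the line (without its terminator) is at most 998
    // characters long and uses only ASCII codes 1 through 127, with no
    // embedded carriage return or newline.
    static boolean isValidLine(String line) {
        if (line.length() > MAX_LINE_LENGTH) {
            return false;
        }
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c < 1 || c > 127 || c == '\r' || c == '\n') {
                return false;
            }
        }
        return true;
    }

    // Appends the carriage return (13) and newline (10) that end every line.
    static String terminate(String line) {
        return line + "\r\n";
    }

    public static void main(String[] args) {
        System.out.println(isValidLine("Subject: Accounting Details")); // true
        System.out.println(isValidLine("Subject: caf\u00e9"));          // false
    }
}
```

A sending program could run each line through such a check before writing it to a socket, and either reject or fold lines that fail.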
Figure 1 Email message anatomy.
Headers provide information concerning the email message's origin (who sent the email message), the email message's destination (who will receive the email message), the subject of the email message, and so on. Each header is organized as a name and a colon character, followed by one or more values of relevance to that header. Some header values identify mailboxes (conceptual entities that receive email messages). Each of those mailbox values is either a display name and address specification (in which the address specification is enclosed by angle brackets) or just an address specification (not enclosed by angle brackets).
The following example illustrates a mailbox address specification followed by a mailbox display name and address specification:
Doe@x.org
John Doe <Doe@x.org>
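To make the two mailbox forms concrete, here is a minimal sketch of how a program might split a mailbox value into its display name and address specification. The Mailbox class is hypothetical, and the parsing is deliberately simplified: it assumes one of the two forms shown above and ignores quoted strings, comments, and groups, all of which the full RFC 2822 grammar permits.

```java
// Hypothetical value class (not from the article) for a mailbox value.
final class Mailbox {
    final String displayName; // null when the value is a bare address specification
    final String address;     // the address specification, without angle brackets

    Mailbox(String displayName, String address) {
        this.displayName = displayName;
        this.address = address;
    }

    // Simplified parse: handles only "address" and "Display Name <address>".
    static Mailbox parse(String value) {
        String v = value.trim();
        int open = v.lastIndexOf('<');
        int close = v.lastIndexOf('>');
        if (open >= 0 && close > open) {
            String name = v.substring(0, open).trim();
            String addr = v.substring(open + 1, close).trim();
            return new Mailbox(name.isEmpty() ? null : name, addr);
        }
        return new Mailbox(null, v);
    }

    // Rebuilds the mailbox value in the same form it was parsed from.
    public String toString() {
        return displayName == null ? address : displayName + " <" + address + ">";
    }

    public static void main(String[] args) {
        System.out.println(parse("Doe@x.org").address);
        System.out.println(parse("John Doe <Doe@x.org>").displayName);
    }
}
```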
Who sent the email message? The From:, Sender:, and Reply-To: headers provide information regarding an email message's origin. From: identifies the mailbox(es) of the email message's author(s), Sender: identifies the mailbox of the agent (person or machine) responsible for sending the email message, and Reply-To: identifies the mailbox(es) to which replies should be directed.
It is possible for an email message to have multiple authors. Therefore, the From: header specifies either a single mailbox value or a comma-delimited list of mailbox values. However, From: should not list any mailbox value that does not belong to an author.
The following example illustrates a From: header consisting of a single author's mailbox value and a From: header consisting of two authors' mailbox values.
From: John Doe <Doe@x.org>
From: Sally Smith <Smith@x.org>, Doe@x.org
It is not possible for an email message to have multiple senders. Therefore, the Sender: header specifies a single mailbox value. Furthermore, if only a single author's mailbox value is specified in the From: header, and if that mailbox value's address specification is identical to the sender mailbox value's address specification, the Sender: header should not be present (because that header is redundant). Otherwise, the Sender: header should be present (according to RFC 2822).
The following example illustrates a Sender: header that specifies a single mailbox value:
Sender: Jane Smith <JSmith@x.org>
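The rule for when Sender: should appear reduces to a single comparison. The sketch below assumes the address specifications have already been extracted from the mailbox values; the SenderRule class is hypothetical. It compares addresses with an exact string match, which is a simplification, and a real program might choose to compare the domain part without regard to case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the article) for the Sender: rule.
final class SenderRule {
    // fromAddresses: address specifications taken from the From: header.
    // senderAddress: address specification of the agent sending the message.
    // Returns false only when Sender: would be redundant, that is, when
    // there is a single author whose address is identical to the sender's.
    static boolean senderHeaderNeeded(List<String> fromAddresses, String senderAddress) {
        boolean redundant = fromAddresses.size() == 1
                && fromAddresses.get(0).equals(senderAddress);
        return !redundant;
    }

    public static void main(String[] args) {
        // One author who is also the sender: Sender: should be omitted.
        System.out.println(senderHeaderNeeded(Arrays.asList("Doe@x.org"), "Doe@x.org"));
        // Two authors: Sender: should be present.
        System.out.println(senderHeaderNeeded(
                Arrays.asList("Smith@x.org", "Doe@x.org"), "JSmith@x.org"));
    }
}
```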
It's possible to direct replies to multiple mailboxes. Therefore, the Reply-To: header specifies either a single mailbox value or a comma-delimited list of mailbox values. If Reply-To: is present, an email program directs its replies to all mailbox values listed by that header. However, if that header is absent, an email program directs its replies to all mailbox values listed by the From: header.
The following example illustrates a Reply-To: header that specifies three mailbox values. Replies are sent to the mailboxes identified by those values.
Reply-To: Smith@x.org, John Doe <Doe@x.org>, JSmith@x.org
Who will receive the email message? The To: and Cc: headers provide information about an email message's destination. To: specifies the primary recipient(s) of the message, and Cc: (carbon copy) specifies the secondary recipient(s). For each header, either a single mailbox value or a comma-delimited list of mailbox values appears as part of that header.
The following example illustrates To: and Cc: headers. The To: header specifies a single mailbox value for the primary recipient and the Cc: header specifies two mailbox values for the secondary recipients.
To: Jeff Friesen <email@example.com>
Cc: Smith@x.org, Doe@x.org
When an email message is being turned into a reply, place Reply-To: mailbox values (if present) in the To: header. Otherwise, use From: mailbox values.
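That choice between Reply-To: and From: might be sketched as follows. The ReplyTarget class is hypothetical, and it assumes each header's mailbox values have already been split into a list (with null or an empty list standing for an absent header).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from the article): picks a reply's To: values.
final class ReplyTarget {
    // Returns the Reply-To: mailbox values when that header is present,
    // and the From: mailbox values otherwise.
    static List<String> replyRecipients(List<String> replyTo, List<String> from) {
        if (replyTo != null && !replyTo.isEmpty()) {
            return replyTo;
        }
        return from;
    }

    public static void main(String[] args) {
        List<String> from = Arrays.asList("John Doe <Doe@x.org>");
        List<String> replyTo = Arrays.asList("Smith@x.org", "JSmith@x.org");
        System.out.println(replyRecipients(replyTo, from));
        System.out.println(replyRecipients(Collections.<String>emptyList(), from));
    }
}
```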
RFC 2822 presents many headers besides the originator and receiver headers. For example, the Subject: header provides an optional title for an email message: the sequence of ASCII characters that follows Subject: serves as the email message's title.
The following example illustrates a Subject: header in context with other headers:
From: John Doe <Doe@x.org>
To: Jeff Friesen <firstname.lastname@example.org>
Cc: Bill Jones <email@example.com>
Subject: Accounting Details
The example shows that John Doe is authoring an email message, the email message is destined for Jeff Friesen's mailbox at firstname.lastname@example.org, Jeff Friesen is the primary recipient, Bill Jones (email@example.com) is the secondary recipient, and the subject of the email message is Accounting Details.
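A program that sends such a message has to emit each header as a name, a colon, and a value, ending the line with a carriage return and newline. The following sketch shows one way to do that; the HeaderBlock class is hypothetical, and it performs no validation or folding of long lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the article): formats headers for transmission.
final class HeaderBlock {
    // Writes each entry as "Name: value" followed by CR LF, in insertion order.
    // Keys are header names without the trailing colon.
    static String format(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the headers in the order they were added.
        Map<String, String> headers = new LinkedHashMap<String, String>();
        headers.put("From", "John Doe <Doe@x.org>");
        headers.put("To", "Jeff Friesen <firstname.lastname@example.org>");
        headers.put("Cc", "Bill Jones <email@example.com>");
        headers.put("Subject", "Accounting Details");
        System.out.print(format(headers));
    }
}
```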
The previous article introduced you to Multipurpose Internet Mail Extensions (MIME). MIME lets an email program attach a file of binary data (known as an attachment) to an email message and transmit that file's contents as part of the email message. MIME accomplishes that task by introducing a variety of headers. The most important headers for attachments are Content-Type: (which classifies the type and subtype of data that serves as an email message's content) and Content-Transfer-Encoding: (which specifies an encoding of 8-bit binary data to 7-bit ASCII data).
Among the various types and subtypes that can be specified in the Content-Type: header, MIME reserves type multipart and subtype mixed for attachments. That type/subtype combination signifies content broken up into multiple body parts, with each body part representing an attachment and having its own Content-Type: and Content-Transfer-Encoding: headers. To help an email program differentiate a body part from the next body part, MIME requires a sending email program to include a boundary parameter as part of the Content-Type: multipart/mixed header. boundary's value (between double-quote characters) is a character sequence that delimits a body part from the next body part. Before transmitting a body part, an email program transmits a carriage-return character, a newline character, two hyphen characters, and boundary's value (a character sequence known as an encapsulation boundary). Following the final body part, an email program transmits an encapsulation boundary and two hyphens (--).
The following code fragment identifies a plain-text email message with characters taken from the iso-8859-1 character set, and a plain-text attachment that associates its contents with file.txt. Content-Transfer-Encoding:'s absence implies default 7BIT ASCII.
Content-Type: multipart/mixed; boundary="***"

--***
Content-Type: text/plain; charset="iso-8859-1"

This message has an attachment.
--***
Content-Type: text/plain; name="file.txt"

Attachment text.
--***--
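The way encapsulation boundaries frame each body part can be sketched in a few lines of Java. The MultipartSketch class and its build() method are hypothetical, and the sketch makes two assumptions: every body part is 7-bit ASCII text (so no Content-Transfer-Encoding: header is written), and the boundary value does not occur inside any body part.

```java
// Hypothetical helper (not from the article): assembles multipart/mixed content.
final class MultipartSketch {
    static final String CRLF = "\r\n";

    // parts[i][0] is a body part's Content-Type: value; parts[i][1] is its content.
    static String build(String boundary, String[][] parts) {
        StringBuilder sb = new StringBuilder();
        sb.append("Content-Type: multipart/mixed; boundary=\"")
          .append(boundary).append("\"").append(CRLF);
        for (String[] part : parts) {
            // Encapsulation boundary: CR, LF, two hyphens, and the boundary value.
            sb.append(CRLF).append("--").append(boundary).append(CRLF);
            sb.append("Content-Type: ").append(part[0]).append(CRLF);
            sb.append(CRLF); // blank line separates a part's headers from its content
            sb.append(part[1]);
        }
        // After the final body part: an encapsulation boundary and two hyphens.
        sb.append(CRLF).append("--").append(boundary).append("--").append(CRLF);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] parts = {
            { "text/plain; charset=\"iso-8859-1\"", "This message has an attachment." },
            { "text/plain; name=\"file.txt\"", "Attachment text." }
        };
        System.out.print(build("***", parts));
    }
}
```

Because the carriage return and newline that precede each boundary belong to the encapsulation boundary, the sketch writes no extra line terminator after a body part's content.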