Simplified support for the development of network software is one of Java's strengths. That support manifests itself through Java's Network API, a collection of classes and interfaces located in packages java.net and javax.net. While writing my book Java 2 by Example, Second Edition (Que, 2000), I intended to include a chapter on the Network API. Unfortunately, I ran out of time and that chapter did not make it into my book. Because the thought of not including a chapter on the Network API bothered me, I decided to create a trilogy of articles that explores that API. The article that you are currently reading and its companion articles form that trilogy and serve as my book's final chapter.
My articles explore the Network API in the context of the Internet, a global collection of interconnected networks. If you are not acquainted with the term, a network is an interconnected set of computers and other devices that enables communication and resource sharing. Each networked computer is known as a host.
This article introduces you to the concept of sockets. You then have an opportunity to work with the sockets portion of the Network API. Once you finish this article, you will be capable of using sockets for low-level network communications. The second article introduces you to the concepts of URIs and URLs. You then have an opportunity to work with the Network API's URI, URL, and URL-related classes. Once you finish that article, you will be capable of using the URL class (and related classes) for high-level network communications with the Internet's World Wide Web (WWW).
Have you ever wanted to know how electronic mail (e-mail) works? The final Network API article explores e-mail. You learn the anatomy of an e-mail message, how to send an e-mail message, and how to receive an e-mail message. Once you finish that article, you will be capable of building GUI-based programs to send and receive e-mail.
Version 1.4 (Beta 2) of Sun's Java 2 Standard Edition (J2SE) SDK was used to build this article's programs.
What Is a Socket?
The Network API is typically used to enable communication between a Java program and another program across a TCP/IP-based network, such as the Internet. To enable communication, the Network API relies upon sockets. A socket is an endpoint in a communication link between two programs. One program writes a message (a sequence of bytes) to a socket, which forwards that message to the other socket, which makes that message available to the other program, as illustrated in Figure 1.
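To make the idea concrete, here is a minimal sketch (not one of this article's programs) in which "Program A" writes a one-line message to a socket and "Program B" reads it from the socket at the other end. As a simplifying assumption, both programs run as threads in one JVM and talk over the loopback address instead of across two hosts. The class and method names (`SocketDemo`, `roundTrip`) are illustrative, and because the code uses lambdas and try-with-resources it needs Java 8 or later rather than the J2SE 1.4 SDK mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketDemo {
    // Program A writes one line to its socket; Program B (a second thread standing in
    // for another program) reads that line from the socket at the other end.
    static String roundTrip(String message) {
        // Port 0 asks the OS for any free port; binding to loopback keeps traffic on this host.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            String[] received = new String[1];
            Thread programB = new Thread(() -> {
                try (Socket socket = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             socket.getInputStream(), StandardCharsets.UTF_8))) {
                    received[0] = in.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            programB.start();

            // Program A: connect to Program B's address and port, then write the message.
            try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
                 Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
                out.write(message + "\n");
                out.flush();
            }
            programB.join(); // wait until Program B has read the message
            return received[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Program B read: " + roundTrip("Hello from Program A"));
    }
}
```

Running `main` should print the message that Program B read from its socket. Only the loopback interface is involved here, so no NIC traffic occurs, but the socket calls are the same ones a two-host program would make.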
Figure 1 Two programs use sockets to communicate with each other across a TCP/IP-based network.
According to Figure 1, Program A on Host A is writing a message to a socket. The contents of that socket are accessed by Host A's network-management software, which sends the message through Host A's network interface card (NIC) to Host B. Host B's NIC gets the message and passes it to Host B's network-management software, which deposits the message in Host B's socket. Program B can then read that message from the socket.
Suppose that a third host is added to Figure 1's network. How does Host A know that the message is meant for Host B and not for the new host? Each host attached to a TCP/IP-based network is given a unique IP address, which is (usually) a 32-bit unsigned integer that makes it possible to distinguish among hosts. (An IP address is analogous to a street address.) Because people do not converse in binary, IP addresses are often written in dotted-decimal notation. An example is 198.163.227.6. As you can see, that address consists of four components: 198, 163, 227, and 6. Each component ranges from 0 through 255 (inclusive) and accounts for 8 bits of the address.
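The relationship between dotted-decimal notation and the underlying 32-bit integer can be checked with a short sketch. It uses `InetAddress.getByName`, which parses an address literal without a DNS lookup, and then packs the four 8-bit components (198, 163, 227, and 6 from the example) into one unsigned value. Java has no unsigned 32-bit type, so the result is held in a `long`; the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DottedDecimal {
    // Packs the four 8-bit components of an IPv4 literal into one unsigned 32-bit value.
    static long toUnsigned32(String dotted) {
        try {
            byte[] octets = InetAddress.getByName(dotted).getAddress(); // literal: no DNS lookup
            if (octets.length != 4) {
                throw new IllegalArgumentException("not an IPv4 address: " + dotted);
            }
            long value = 0;
            for (byte b : octets) {
                value = (value << 8) | (b & 0xFF); // & 0xFF undoes Java's signed byte
            }
            return value;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reverses the packing: extracts each 8-bit component, most significant first.
    static String toDotted(long value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }

    public static void main(String[] args) {
        long value = toUnsigned32("198.163.227.6");
        System.out.println(value + " = " + Long.toBinaryString(value));
        System.out.println(toDotted(value));
    }
}
```

Each shift by 8 bits makes room for the next component, which is why each component must fit in the range 0 through 255.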
IP addresses that occupy 32 bits are known as IPv4 (Internet Protocol version 4) addresses. Because the Internet is running out of IPv4 addresses, IPv4 is slowly being replaced with IPv6 (Internet Protocol version 6). Unlike IPv4 addresses, an IPv6 address is a 128-bit unsigned integer.
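The difference in width is visible through the Network API: `InetAddress.getAddress()` returns the raw address bytes, 4 for an IPv4 address and 16 for an IPv6 address. The sketch below assumes address literals (so no DNS lookup occurs); `2001:db8::1` comes from the range reserved for documentation, and the class name is hypothetical.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressWidth {
    // Returns the width of an address literal in bits: 32 for IPv4, 128 for IPv6.
    static int bits(String literal) {
        return parse(literal).getAddress().length * 8;
    }

    static boolean isIPv6(String literal) {
        return parse(literal) instanceof Inet6Address;
    }

    private static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal); // literals are parsed, not looked up
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"198.163.227.6", "2001:db8::1"}) {
            System.out.println(literal + ": " + bits(literal) + " bits, IPv6=" + isIPv6(literal));
        }
    }
}
```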
Suppose that a second network-aware program is added to Host B in Figure 1's network. How does Host A know that the message is meant for Program B and not for the new program? Each program communicating over a TCP/IP-based network is given a unique port and port number. A port is a message buffer that holds a socket's incoming/outgoing messages, and the port number is a 16-bit unsigned integer ranging from 0 through 65,535 (inclusive) that identifies a port and makes it possible to distinguish among network-aware programs on a given host. (A port number is analogous to an apartment number within a building at a given street address.) Port numbers below 1024 (the well-known ports) are reserved for standard services, such as POP3's port number 110. (I discuss POP3 in my third article in this series.)
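Because a port number is a 16-bit unsigned integer, its valid range is 0 through 2^16 - 1 = 65,535. The following sketch checks that range directly and shows that the Network API enforces the same limit: `InetSocketAddress.createUnresolved` throws `IllegalArgumentException` for an out-of-range port (and performs no DNS lookup). The class, constant, and method names are hypothetical.

```java
import java.net.InetSocketAddress;

public class PortNumbers {
    // A port number is a 16-bit unsigned integer, so the largest value is 2^16 - 1.
    static final int MAX_PORT = (1 << 16) - 1; // 65,535
    static final int POP3_PORT = 110;           // the POP3 example from the text

    static boolean isValidPort(int port) {
        return port >= 0 && port <= MAX_PORT;
    }

    // The Network API applies the same range check: InetSocketAddress rejects any
    // port outside 0..65535 with an IllegalArgumentException.
    static boolean acceptedByNetworkApi(int port) {
        try {
            InetSocketAddress.createUnresolved("localhost", port); // no DNS lookup
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int port : new int[] {-1, 0, POP3_PORT, MAX_PORT, MAX_PORT + 1}) {
            System.out.println(port + ": valid=" + isValidPort(port)
                    + ", accepted=" + acceptedByNetworkApi(port));
        }
    }
}
```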
Each socket combines an IP address with a port and a port number. Those entities identify that socket to other sockets. Subsequent sections explore two categories of sockets: stream and datagram.
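The IP address and port number that identify a socket are exposed by `java.net.Socket`: `getLocalAddress()`/`getLocalPort()` describe the socket itself, and `getInetAddress()`/`getPort()` describe the socket at the other end of the connection. The sketch below (hypothetical names, Java 7 or later, loopback connection in a single thread) opens one stream-socket connection and checks that each side's view of its peer matches the peer's own address and port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketEndpoints {
    // Formats a socket's own endpoint as address:port.
    static String local(Socket s) {
        return s.getLocalAddress().getHostAddress() + ":" + s.getLocalPort();
    }

    // Formats the endpoint of the socket at the other end of the connection.
    static String remote(Socket s) {
        return s.getInetAddress().getHostAddress() + ":" + s.getPort();
    }

    // Opens one loopback connection and checks that each side's view of its peer
    // matches the peer's own address and port.
    static boolean endpointsMatch() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) { // connection already queued by the OS
            System.out.println("client socket " + local(client) + " -> peer " + remote(client));
            System.out.println("server socket " + local(accepted) + " -> peer " + remote(accepted));
            return remote(client).equals(local(accepted)) && remote(accepted).equals(local(client));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("endpoints match: " + endpointsMatch());
    }
}
```

The accepting socket shares the server's port number, while the client's port is chosen by the operating system; the pair of address:port endpoints is what distinguishes this connection from any other.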
This section referred to TCP/IP without explaining that term. TCP/IP is an acronym for Transmission Control Protocol/Internet Protocol, the main network protocols (rules for formatting messages and routing those messages among hosts) found in a host's network-management software. IP routes message chunks, known as IP packets, to the correct host by using the destination IP address embedded in each IP packet. TCP establishes a connection between two hosts for sending and receiving messages that consist of multiple IP packets. On the sending end, TCP divides a message into multiple IP packets and relies on IP to deliver those packets to their destination host. On the receiving end, TCP reassembles those IP packets into the original message. A third protocol in the TCP/IP family, User Datagram Protocol (UDP), allows a message that fits into a single IP packet to be sent without first establishing a connection. TCP is a reliable but slow network protocol: it guarantees that a message will reach its destination intact and in order (retransmitting lost packets as needed), but it takes time to establish a connection. By contrast, UDP is an unreliable but fast network protocol: it does not guarantee that a message will reach its destination (or that messages will arrive in order), but it does not need to take time establishing a connection.
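UDP's connectionless style shows up directly in the Network API's `DatagramSocket` and `DatagramPacket` classes: the sender simply addresses a packet to an IP address and port and calls `send()`, with no connection setup. The sketch below is an illustration under stated assumptions (hypothetical class name, loopback only, one small message, Java 7 or later). Because UDP does not guarantee delivery, the receiver sets a timeout instead of waiting forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class DatagramDemo {
    // Sends one datagram to a receiver on the loopback interface without any connection setup.
    static String sendAndReceive(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // UDP may drop packets, so never block forever

            // The destination address and port travel with the packet itself.
            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[512];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet); // throws SocketTimeoutException if nothing arrives
            return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received: " + sendAndReceive("Hello over UDP"));
    }
}
```

Contrast this with the stream-socket examples, where a `Socket` must connect to a `ServerSocket` before any bytes can flow.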