Working with URLs

The Network API makes it possible to work with URLs at the source code level by providing class URL (located in package java.net). Each URL object encapsulates a resource's identifier with a protocol handler. As the previous tip indicates, one way to obtain a URL object is to call a URI object's toURL() method. However, that option is not always convenient. (Why should you have to create a URI object whenever you need a URL object?) Instead, you can call a URL constructor to create a URL object. You can also call URL methods to extract URL components, open an input stream to read from the resource, obtain a reference to an object that makes it possible to retrieve the resource's data in a convenient fashion, compare the URLs in two URL objects, and obtain a connection object to the resource. The connection object allows code to learn more about (and write to) the resource.

A close look at class URL reveals six constructors. The simplest constructor is URL(String url). That constructor takes a URL as a String argument, parses the URL into its components, and stores those components in a new URL object. Like the other five constructors, URL(String url) throws a java.net.MalformedURLException object if the URL cannot be parsed: for example, if the URL contains no protocol component or the URL's protocol is unknown.
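The following minimal sketch exercises three of those constructors: URL(String spec), URL(String protocol, String host, int port, String file), and URL(URL context, String spec). The class name URLConstructorsSketch and its helper methods are hypothetical, written only for illustration. Creating a URL object parses the string but does not contact the host, so no network access is needed. Be aware that the URL constructors are deprecated as of Java 20 in favor of URI.toURL(), so recent compilers emit deprecation warnings for this code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class illustrating three of URL's constructors.
class URLConstructorsSketch
{
   // URL(String spec): returns null instead of propagating the
   // MalformedURLException thrown for a missing or unknown protocol.
   static URL tryParse (String spec)
   {
      try
      {
         return new URL (spec);
      }
      catch (MalformedURLException e)
      {
         return null;
      }
   }

   // URL(String protocol, String host, int port, String file):
   // assembles a URL from separate components.
   static String fromParts (String protocol, String host, int port,
                            String file)
   {
      try
      {
         return new URL (protocol, host, port, file).toString ();
      }
      catch (MalformedURLException e)
      {
         throw new IllegalArgumentException (e);
      }
   }

   // URL(URL context, String spec): resolves a relative URL (such as
   // an href attribute in an HTML page) against a base URL.
   static String resolve (String base, String relative)
   {
      try
      {
         return new URL (new URL (base), relative).toString ();
      }
      catch (MalformedURLException e)
      {
         throw new IllegalArgumentException (e);
      }
   }

   public static void main (String [] args)
   {
      System.out.println (tryParse ("www.informit.com"));       // null: no protocol
      System.out.println (tryParse ("foo://www.informit.com")); // null: unknown protocol
      System.out.println (fromParts ("http", "www.informit.com", 8080,
                                     "/index.html"));
      System.out.println (resolve ("http://www.javajeff.com/articles/articles.html",
                                   "../welcome/welcome.html"));
   }
}
```

The last two calls print http://www.informit.com:8080/index.html and http://www.javajeff.com/welcome/welcome.html; the relative form is how a program would turn the relative links in a downloaded page into absolute URLs.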

The following code fragment demonstrates using URL(String url) to create a URL object. That object encapsulates a simple URL's components with an http protocol handler.

URL url = new URL ("http://www.informit.com");

Once you have a URL object, you can extract various components by calling the methods getAuthority(), getDefaultPort(), getFile(), getHost(), getPath(), getPort(), getProtocol(), getQuery(), getRef(), and getUserInfo(). The getDefaultPort() method returns the default port that the URL object's protocol handler uses (to locate the resource) when a port is not specified as part of the URL. The getFile() method returns a combination of the path and query components. The getProtocol() method returns the name of the protocol (such as http, mailto, ftp, and so on) that determines the type of connection to the resource. The getRef() method returns the fragment (also known as a reference or an anchor) portion of the URL. Finally, the getUserInfo() method returns the user information portion of the authority component. As with URI's component extraction methods, URL's extraction methods return null (or -1, in the case of getPort()) if their components do not exist, although getFile() and getPath() return an empty string instead of null. Similarly, getDefaultPort() returns -1 if a default port has not been assigned to the URL object's protocol handler.
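To see which components come back as null, -1, or an empty string, the following sketch extracts components from a URL that has every part and from one that has only a protocol and a host. The class name ComponentsSketch is hypothetical, and the user name, port, query, and fragment in the sample URL are invented for illustration. The sketch obtains its URL objects through URI.toURL(), the alternative to URL's constructors mentioned at the start of this section; no network access is needed.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class ComponentsSketch
{
   // Parses a URL string via URI.toURL(), wrapping the checked exception.
   static URL parse (String s)
   {
      try
      {
         return URI.create (s).toURL ();
      }
      catch (MalformedURLException e)
      {
         throw new IllegalArgumentException (e);
      }
   }

   public static void main (String [] args)
   {
      URL full = parse ("http://jeff@www.example.com:8080/docs/index.html?lang=en#intro");
      URL bare = parse ("http://www.example.com");

      for (URL u : new URL [] { full, bare })
      {
         System.out.println ("URL = " + u);
         System.out.println ("  Authority = " + u.getAuthority ());
         System.out.println ("  User Info = " + u.getUserInfo ());
         System.out.println ("  Host = " + u.getHost ());
         System.out.println ("  Port = " + u.getPort () +
                             " (default " + u.getDefaultPort () + ")");
         // Quotes make an empty path or file visible in the output.
         System.out.println ("  Path = \"" + u.getPath () + "\"");
         System.out.println ("  Query = " + u.getQuery ());
         System.out.println ("  File = \"" + u.getFile () + "\"");
         System.out.println ("  Ref = " + u.getRef ());
      }
   }
}
```

For the first URL, getFile() returns /docs/index.html?lang=en (the path and query together) and getAuthority() returns jeff@www.example.com:8080. For the second, getPath() and getFile() return empty strings, getQuery(), getRef(), and getUserInfo() return null, and getPort() returns -1 while getDefaultPort() still reports 80.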

In addition to the component-extraction methods, you can call the openStream() method to retrieve a java.io.InputStream reference. Using that reference, you can read from the resource in a byte-oriented fashion.

Listing 4 presents source code to URLDemo1. That program creates a URL object from a command-line argument, calls URL's component-extraction methods to retrieve the URL's components, calls URL's openStream() method to open a connection to the resource (via the protocol handler) and return an InputStream reference for reading bytes from that resource, reads/prints those bytes, and closes the input stream.

Listing 4: URLDemo1.java

// URLDemo1.java

import java.io.*;
import java.net.*;

class URLDemo1
{
  public static void main (String [] args) throws IOException
  {
   if (args.length != 1)
   {
     System.err.println ("usage: java URLDemo1 url");
     return;
   }

   URL url = new URL (args [0]);

   System.out.println ("Authority = " +
             url.getAuthority ());

   System.out.println ("Default port = " +
             url.getDefaultPort ());

   System.out.println ("File = " +
             url.getFile ());

   System.out.println ("Host = " +
             url.getHost ());

   System.out.println ("Path = " +
             url.getPath ());

   System.out.println ("Port = " +
             url.getPort ());

   System.out.println ("Protocol = " +
             url.getProtocol ());

   System.out.println ("Query = " +
             url.getQuery ());

   System.out.println ("Ref = " +
             url.getRef ());

   System.out.println ("User Info = " +
             url.getUserInfo ());

   System.out.print ('\n');

   InputStream is = url.openStream ();

   int ch;
   while ((ch = is.read ()) != -1)
     System.out.print ((char) ch);

   is.close ();
  }
}

URLDemo1 produces the following (slightly modified) output from java URLDemo1 http://www.javajeff.com/articles/articles.html:

Authority = www.javajeff.com
Default port = 80
File = /articles/articles.html
Host = www.javajeff.com
Path = /articles/articles.html
Port = -1
Protocol = http
Query = null
Ref = null
User Info = null

<html>
 <head>
  <title>
   Java Jeff - Articles
  </title>

  <meta http-equiv=Content-Type content="text/html; 
   charset=ISO-8859-1">
  <meta name=author content="Jeff Friesen">
  <meta name=keywords content="java, virtual machine">

  <script language=JavaScript>
   if (navigator.appName == "Netscape")
     document.write ("<br>");
  </script>
 </head>

 <body bgcolor=#000000>
  <center>
   <table border=1 cellpadding=5 cellspacing=0>
    <tr>
     <td>
      <table cellpadding=0 cellspacing=0>
       <tr>
        <td>
         <a href=informit/informit.html>
          <img alt=InformIT border=0 src=informit.gif></a>
        </td>
       </tr>
      </table>
     </td>

     <td align=middle>
      <img src=title.gif><br>

      <a href=../welcome/welcome.html>
       <img alt="Welcome to Java Jeff!" border=0 src=jupiter.jpg>
      </a><br>

      <img src=../common/clear_dot.gif vspace=5><br>

      <a href=../ads/ads.html>
       <img alt="Welcome to Java Jeff!" border=0 
        src=jupiter.jpg>
     </td>

     <td>
      <table cellpadding=0 cellspacing=0>
       <tr>
        <td>
         <a href=javaworld/javaworld.html>
          <img alt=JavaWorld border=0 src=javaworld.gif></a>
        </td>
       </tr>
      </table>
     </td>
    </tr>
   </table>
  </center>

  <br>
  <font color=#ffffff>
   <center>
    Best viewed at a resolution of 1024x768 or higher.<br>

    <img src=../common/clear_dot.gif vspace=5><br>

    <i>
     Copyright &copy; 2001-2002, Jeff Friesen. All rights 
     reserved.
    </i>

    <p>
    <a href=../index.html>
     <img alt=Back border=0 src=../common/back.gif></a>
   </center>
  </font>
 </body>
</html>

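Among other things, the output identifies 80 as the default port and http as the protocol, and shows the HTML for one of the Web pages (the resource) that make up my Web site. The Port = -1 line indicates that the URL does not specify a port, so the protocol handler uses its default port.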

URL's openStream() method always returns a reference to an object created from a concrete subclass of the abstract InputStream class. That implies that you read resource data as a byte sequence, which is appropriate when you do not know what kind of data is being read. If you know ahead of time that the data is text organized into lines, you can read it one line at a time instead of one byte at a time. (BufferedReader's readLine() method treats a newline (\n), a carriage return (\r), or a carriage return followed by a newline as the end of a line.)

The following code fragment demonstrates wrapping the InputStream subclass object in a java.io.InputStreamReader object, which decodes 8-bit bytes into 16-bit characters (using the platform's default character encoding, because no encoding is specified), wrapping the resulting object in a java.io.BufferedReader object to access BufferedReader's readLine() method, and calling readLine() to read entire lines of text from the resource.

InputStream is = url.openStream ();
BufferedReader br = new BufferedReader (new InputStreamReader (is));
String line;
while ((line = br.readLine ()) != null)
  System.out.println (line);
is.close ();
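The fragment above decodes bytes with the platform's default character encoding and closes only the underlying stream. A slightly more defensive sketch, assuming Java 8 or later (for try-with-resources and UncheckedIOException), names the encoding explicitly and closes the reader, which also closes the stream it wraps. ISO-8859-1 is used in main() only because that is the charset the javajeff.com page declares; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadLinesSketch
{
   // Reads all lines from a byte stream, decoding with the given charset.
   // readLine() accepts \n, \r, or \r\n as a line terminator.
   static List<String> readLines (InputStream in, Charset cs)
   {
      List<String> lines = new ArrayList<> ();
      try (BufferedReader br = new BufferedReader (new InputStreamReader (in, cs)))
      {
         String line;
         while ((line = br.readLine ()) != null)
            lines.add (line);
      }
      catch (IOException e)
      {
         throw new UncheckedIOException (e);
      }
      return lines;
   }

   // Opens the resource identified by url and reads it line by line.
   static List<String> readLines (URL url, Charset cs) throws IOException
   {
      return readLines (url.openStream (), cs);
   }

   public static void main (String [] args) throws IOException
   {
      for (String line : readLines (new URL (args [0]), StandardCharsets.ISO_8859_1))
         System.out.println (line);
   }
}
```

Naming the charset matters for bytes above 127: the byte 0xE9, for example, decodes to an accented e under ISO-8859-1 but is an invalid sequence under UTF-8.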

Sometimes reading data as a sequence of bytes is not convenient. For example, if the resource is a JPEG file, it is more natural to obtain an image producer and register a consumer with that producer to consume the data. It is then a simple matter to display the image once the image is completely consumed. For that to happen, it is necessary to use URL's getContent() method.

When called, getContent() returns an Object reference to an object whose methods (after casting to the proper type) can be called to retrieve the data in a more convenient fashion. Before casting, however, you should use instanceof to verify the object's type, to prevent a java.lang.ClassCastException.

For JPEG resources, getContent() returns an object whose class implements the java.awt.image.ImageProducer interface. The following code fragment demonstrates using instanceof to verify the object is an ImageProducer and making a cast. ImageProducer methods (although not shown) can subsequently be called to register a consumer and initiate the process of consuming the image.

URL url = new URL (args [0]);
Object o = url.getContent ();
if (o instanceof ImageProducer)
{
  ImageProducer ip = (ImageProducer) o;
  // ...
}
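Driving an ImageProducer and registering an ImageConsumer takes a fair amount of code. Since Java 1.4, the javax.imageio.ImageIO class has offered a simpler alternative: ImageIO.read(URL) opens the URL, decodes the data with any installed image reader (the JDK includes readers for JPEG, PNG, GIF, and BMP, among others), and returns a java.awt.image.BufferedImage, or null if no reader recognizes the data. The following sketch swaps in that technique; it is not the getContent() approach described above, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;

class ImageFromURLSketch
{
   // Returns "widthxheight" for the image at url, or null when no
   // installed ImageIO reader recognizes the resource's data.
   static String dimensions (URL url) throws IOException
   {
      BufferedImage image = ImageIO.read (url);
      if (image == null)
         return null;
      return image.getWidth () + "x" + image.getHeight ();
   }

   public static void main (String [] args) throws IOException
   {
      System.out.println (dimensions (new URL (args [0])));
   }
}
```

Because ImageIO.read() returns a fully decoded BufferedImage, no instanceof test or cast is needed; the null check replaces it.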

TIP

Call URL's equals(Object o) and sameFile(URL other) methods to determine whether two URLs are equal. The first method includes the fragment in the comparison, whereas the second method ignores the fragment. Because comparing hosts can require resolving host names, both methods may block while a name lookup completes. Consult the SDK documentation for more information on those methods.
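A small sketch of the difference follows. It deliberately uses file: URLs with no host component, so no host name has to be resolved; the class name and file paths are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

class CompareSketch
{
   // Wraps the checked MalformedURLException for brevity.
   static URL parse (String spec)
   {
      try
      {
         return new URL (spec);
      }
      catch (MalformedURLException e)
      {
         throw new IllegalArgumentException (e);
      }
   }

   public static void main (String [] args)
   {
      URL top = parse ("file:/tmp/page.html#top");
      URL bottom = parse ("file:/tmp/page.html#bottom");

      // equals() also compares the fragments, so these URLs differ.
      System.out.println ("equals = " + top.equals (bottom));     // false

      // sameFile() ignores the fragments, so both name the same file.
      System.out.println ("sameFile = " + top.sameFile (bottom)); // true
   }
}
```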

Study the getContent() method's source code, and you will discover openConnection().getContent(). Furthermore, study the openStream() method's source code and you will discover openConnection().getInputStream(). Each method first makes a call to URL's openConnection() method. That method returns a reference to an object created from a subclass of the abstract java.net.URLConnection class that describes a connection to some resource. URLConnection's methods reveal resource and connection details, and make it possible for code to write to the resource.

Listing 5's URLDemo2 source code demonstrates openConnection() and calls to some of URLConnection's methods.

Listing 5: URLDemo2.java

// URLDemo2.java

import java.io.*;
import java.net.*;
import java.util.*;

class URLDemo2
{
  public static void main (String [] args) throws IOException
  {
   if (args.length != 1)
   {
     System.err.println ("usage: java URLDemo2 url");
     return;
   }

   URL url = new URL (args [0]);

   // Return a reference to a new protocol-specific object
   // that represents a connection to a resource.

   URLConnection uc = url.openConnection ();

   // Make the connection.

   uc.connect ();

   // Print out the contents of various header fields.

   Map m = uc.getHeaderFields ();
   Iterator i = m.entrySet ().iterator ();

   while (i.hasNext ())
     System.out.println (i.next ());

   // Find out if resource input and output operations are
   // allowed.

   System.out.println ("Input allowed = " +
             uc.getDoInput ());

   System.out.println ("Output allowed = " +
             uc.getDoOutput ());
  }
}

After the call to openConnection() returns, a call is made to the connect() method to establish a resource connection. (Although openConnection() returns a reference to a connection object, openConnection() does not connect to the resource.) The call to URLConnection's getHeaderFields() method returns a reference to an object whose class implements the java.util.Map interface. Each key in that map is a header name, and each value is a java.util.List of that header's values, which is why every value in the following output appears in square brackets. What are headers? Headers are text-based name/value pairs that identify the type of resource data, the length of that data, and so forth.
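With generics (Java 5 and later), the map returned by getHeaderFields() has the type Map&lt;String, List&lt;String&gt;&gt;. The following sketch, whose class and method names are hypothetical, formats such a map one entry per line and labels the entry whose key is null, which HttpURLConnection uses for the HTTP status line rather than for a header. The main() method needs network access; format() does not.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HeaderSketch
{
   // Formats each entry as "Name: value1, value2". The entry whose key
   // is null holds the status line (for example, HTTP/1.1 200 OK).
   static List<String> format (Map<String, List<String>> headers)
   {
      List<String> lines = new ArrayList<> ();
      for (Map.Entry<String, List<String>> e : headers.entrySet ())
      {
         String values = String.join (", ", e.getValue ());
         String name = (e.getKey () == null) ? "(status line)" : e.getKey ();
         lines.add (name + ": " + values);
      }
      return lines;
   }

   public static void main (String [] args) throws IOException
   {
      URLConnection uc = new URL (args [0]).openConnection ();
      uc.connect ();
      for (String line : format (uc.getHeaderFields ()))
         System.out.println (line);
   }
}
```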

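After you compile URLDemo2, type the command line java URLDemo2 http://www.javajeff.com. When this article was written, the program produced the following output (the headers, their values, and their order depend on the server):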

Date=[Sun, 17 Feb 2002 17:49:32 GMT]
Connection=[Keep-Alive]
Content-Type=[text/html; charset=iso-8859-1]
Accept-Ranges=[bytes]
Content-Length=[7214]
null=[HTTP/1.1 200 OK]
ETag=["4470e-1c2e-3bf29d5a"]
Keep-Alive=[timeout=15, max=100]
Server=[Apache/1.3.19 (Unix) Debian/GNU]
Last-Modified=[Wed, 14 Nov 2001 16:35:38 GMT]
Input allowed = true
Output allowed = false

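The output identifies a variety of headers (including Date, Content-Length, Server, Last-Modified, and so on) and their values. The entry with the null key is not a header: it holds the HTTP status line (HTTP/1.1 200 OK). The final two lines show that this connection is set up for input only, because a URLConnection's doInput setting defaults to true and its doOutput setting defaults to false.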

Have you ever wondered how a program identifies the kind of data a resource contains? Look closely at the preceding output, and you'll come across Content-Type, a header that identifies the resource data (content) type as text/html. The text portion is known as the type, and the html portion is known as the subtype; the charset=iso-8859-1 portion after the semicolon is a parameter that names the character encoding. (For ordinary unformatted text, Content-Type would typically contain text/plain: the type is still text, but the subtype is now plain.) The Content-Type header is part of Multipurpose Internet Mail Extensions (MIME).

MIME extends the traditional Internet mail message format, which was limited to 7-bit ASCII text. By introducing various headers, MIME makes it possible to incorporate audio, video, still images, text from different character sets, and so on into messages. Along with Content-Type, MIME defines headers such as Content-Transfer-Encoding; HTTP borrows Content-Type and adds headers of its own, such as Content-Length. As you work with the URLConnection class, you will encounter the methods getContentType() and getContentLength(). Those methods return the values of the Content-Type and Content-Length headers (getContentLength() returns -1 if the length is unknown). To learn more about MIME, I encourage you to read the RFC document identified earlier in this article.
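Because getContentType() returns the whole header value as one string, code that needs only the type/subtype or the charset parameter has to split it. The following sketch does that with hypothetical helper methods mediaType() and charset(); it is deliberately simple and does not handle every detail of the header syntax, such as a quoted parameter value that contains a semicolon. Only main() needs network access.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

class ContentTypeSketch
{
   // Returns the lowercased "type/subtype" part of a Content-Type value,
   // such as text/html for "text/html; charset=iso-8859-1".
   static String mediaType (String contentType)
   {
      int semi = contentType.indexOf (';');
      String media = (semi < 0) ? contentType : contentType.substring (0, semi);
      return media.trim ().toLowerCase (Locale.ROOT);
   }

   // Returns the value of the charset parameter, or null if there is none.
   // Simplified: a quoted value containing ';' is not handled.
   static String charset (String contentType)
   {
      String [] parts = contentType.split (";");
      for (int i = 1; i < parts.length; i++)   // parts [0] is type/subtype
      {
         String param = parts [i].trim ();
         if (param.regionMatches (true, 0, "charset=", 0, 8))
            return param.substring (8).replace ("\"", "").trim ();
      }
      return null;
   }

   public static void main (String [] args) throws IOException
   {
      URLConnection uc = new URL (args [0]).openConnection ();
      String type = uc.getContentType ();   // Connects implicitly; may be null.
      if (type != null)
      {
         System.out.println ("Media type = " + mediaType (type));
         System.out.println ("Charset = " + charset (type));
      }
      System.out.println ("Length = " + uc.getContentLength ()); // -1 if unknown
   }
}
```

For the javajeff.com page shown earlier, mediaType() would yield text/html and charset() would yield iso-8859-1, the encoding to pass to an InputStreamReader.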

You've probably heard of HTML forms, which are built from the &lt;form&gt; tag and the input tags it contains. A form submits the data in its fields to a resource for subsequent processing, by using either the HTTP GET method (which appends the data to the URL as a query string) or the POST method (which sends the data in the body of the request). You can simulate an HTML form submitting data by using the URLConnection class and MIME. Here is how you accomplish that task.

Suppose that you want to POST form data to a server program. Posting requires preparing the form data. First, organize the form data into name/value pairs. Second, encode each name and each value according to the application/x-www-form-urlencoded MIME type. Third, write each pair in name=value format. Finally, if multiple name/value pairs are being sent, separate each pair from the next with an ampersand (&) character. For example, x=y&a=b represents two name/value pairs: x/y and a/b.
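Those preparation steps can be sketched as a small method. The class and method names are hypothetical; the sketch assumes the pairs arrive in a java.util.LinkedHashMap (so they keep their insertion order) and assumes UTF-8 as the encoding, which the server must also use when decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch
{
   // Builds an application/x-www-form-urlencoded body: each name and
   // value is encoded, written as name=value, and pairs are joined by &.
   static String build (Map<String, String> pairs)
   {
      StringBuilder sb = new StringBuilder ();
      try
      {
         for (Map.Entry<String, String> e : pairs.entrySet ())
         {
            if (sb.length () > 0)
               sb.append ('&');   // Separate this pair from the previous one.
            sb.append (URLEncoder.encode (e.getKey (), "UTF-8"))
              .append ('=')
              .append (URLEncoder.encode (e.getValue (), "UTF-8"));
         }
      }
      catch (UnsupportedEncodingException ex)
      {
         // Every Java platform is required to support UTF-8.
         throw new IllegalStateException (ex);
      }
      return sb.toString ();
   }

   public static void main (String [] args)
   {
      Map<String, String> pairs = new LinkedHashMap<> ();
      pairs.put ("x", "y");
      pairs.put ("a", "b");
      System.out.println (build (pairs));   // x=y&a=b
   }
}
```

Encoding each name and value before joining is what keeps data such as a&b=c from being mistaken for separators: it is sent as a%26b%3Dc.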

To assist with the encoding, Java supplies a java.net.URLEncoder class that declares static encode() methods, including encode(String s) and encode(String s, String enc). Each method returns a reference to a String object that contains the encoded contents of its s argument. For example, if encode() discovers a space character in the argument, it replaces that space with a plus sign character in the result. The one-argument method is deprecated because it encodes characters with the platform's default character encoding; the two-argument method lets you name the encoding, such as UTF-8, and throws java.io.UnsupportedEncodingException if that encoding is not supported.

The following code fragment demonstrates a call to URLEncoder's encode(String s, String enc) method, to encode a, the space, and b in the "a b" literal string. a+b is stored in a new String object, which is referenced by result.

String result = URLEncoder.encode ("a b", "UTF-8");
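The choice of encoding only makes a difference for characters outside ASCII, which is why the one-argument encode(String s), whose result depends on the platform's default encoding, is risky. The following sketch (hypothetical class and method names) encodes the same string as UTF-8 and as ISO-8859-1 to show that the percent-escapes differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodingSketch
{
   // Encodes s with the named character encoding.
   static String encode (String s, String enc)
   {
      try
      {
         return URLEncoder.encode (s, enc);
      }
      catch (UnsupportedEncodingException e)
      {
         throw new IllegalArgumentException ("unsupported encoding: " + enc, e);
      }
   }

   public static void main (String [] args)
   {
      String s = "caf\u00e9 au lait";   // U+00E9 is e with an acute accent
      System.out.println (encode (s, "UTF-8"));       // caf%C3%A9+au+lait
      System.out.println (encode (s, "ISO-8859-1"));  // caf%E9+au+lait
      System.out.println (encode ("a b", "UTF-8"));   // a+b
   }
}
```

The same accented character becomes two escaped bytes in UTF-8 but one in ISO-8859-1, so a server that decodes with a different encoding than the client used will see garbled text.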

In addition to preparing form data, you must tell the connection object that data is being posted, because HTTP connections default to getting data (the GET method). To accomplish that task, first make sure that openConnection()'s return value is an HttpURLConnection, and then cast it to that type. Then call the resulting object's setRequestMethod(String method) method with "POST" as the argument.

Another task that must be accomplished is to call URLConnection's setDoOutput(boolean doOutput) method with a true argument value. That task is necessary because the URLConnection object defaults to not supporting output. (The program can then ultimately make a call to URLConnection's getOutputStream() method, to return a reference to the resource's output stream for sending form data.)

To put the aforementioned tasks (and a few other not-mentioned tasks) into perspective, Listing 6's URLDemo3 source code demonstrates posting form data to a resource that "understands" the application/x-www-form-urlencoded content type.

Listing 6: URLDemo3.java

// URLDemo3.java

import java.io.*;
import java.net.*;

class URLDemo3
{
  public static void main (String [] args) throws IOException
  {
   // Check for at least two arguments and also for an even number 
   // of arguments.

   if (args.length < 2 || args.length % 2 != 0)
   {
     System.err.println ("usage: java URLDemo3 name value " +
               "[name value ...]");
     return;
   }

   // Create a URL object that lets the program connect to a server
   // program resource, that echoes back a form's name/value pairs.

   URL url;
   url = new URL 
   ("http://banshee.cs.uow.edu.au:2000/~nabg/echo.cgi");

   // Return a reference to a protocol-specific object that 
   // represents a connection to the http resource.

   URLConnection uc = url.openConnection ();

   // Validate the type of connection. Must be HttpURLConnection.

   if (!(uc instanceof HttpURLConnection))
   {
     System.err.println ("Wrong connection type");
     return;
   }

   // Indicate that the program must output name/value pairs to the
   // server program resource.

   uc.setDoOutput (true);

   // Indicate that only "live" information can be returned.

   uc.setUseCaches (false);

   // Set the Content-Type header to indicate the form MIME type that
   // specifies URL encoded data.

   uc.setRequestProperty ("Content-Type",
               "application/x-www-form-urlencoded");

   // Build the name/value pairs content to send to the server.

   String content = buildContent (args);

   // Set the Content-Length header to the number of characters in
   // the content (once URL encoded, that is also its byte count).

   uc.setRequestProperty ("Content-Length",
               "" + content.length ());

   // Extract appropriate type of connection.

   HttpURLConnection hc = (HttpURLConnection) uc;

   // Set the HTTP request method to POST. (Default is GET.)

   hc.setRequestMethod ("POST");

   // Output the content.

   OutputStream os = uc.getOutputStream ();
   DataOutputStream dos = new DataOutputStream (os);
   dos.writeBytes (content);
   dos.flush ();
   dos.close ();

   // Input and display the result from the server program.

   InputStream is = uc.getInputStream ();

   int ch;
   while ((ch = is.read ()) != -1)
     System.out.print ((char) ch);

   is.close ();
  }

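  static String buildContent (String [] args)
   throws UnsupportedEncodingException
  {
   StringBuffer sb = new StringBuffer ();

   for (int i = 0; i < args.length; i++)
   {
      // Encode each argument as UTF-8 for proper transmission. (The
      // one-argument encode() is deprecated because it uses the
      // platform's default character encoding.)

      String encodedItem = URLEncoder.encode (args [i], "UTF-8");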

      sb.append (encodedItem);

      if (i % 2 == 0)
        sb.append ("="); // Separate name from value.
      else
        sb.append ("&"); // Separate name/value pairs.
   }

   // Remove final & separator.

   sb.setLength (sb.length () - 1);

   return sb.toString ();
  }
}

You might be wondering why URLDemo3 does not call URLConnection's connect() method. That method is not explicitly called because other URLConnection methods, including getOutputStream() and getInputStream() (as well as methods such as getContentLength()), implicitly call connect() if the connection to the resource has not been established. Once the connection is made, however, it is illegal to call methods such as setDoOutput(boolean doOutput). Those methods throw IllegalStateException objects after connect() has been (explicitly or implicitly) called.
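That rule can be observed without a network by using a file: URL, because the JDK's file protocol handler opens the file when connect() is called. In the following sketch (hypothetical class and method names), setUseCaches() succeeds before connect(), while setDoOutput() called afterward throws IllegalStateException.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class ConnectOrderSketch
{
   // Returns true if the connection rejects setDoOutput() once connected.
   static boolean rejectsLateSetDoOutput (URL url) throws IOException
   {
      URLConnection uc = url.openConnection ();
      uc.setUseCaches (false);   // Legal: not yet connected.
      uc.connect ();             // From here on, such setters are illegal.
      try
      {
         uc.setDoOutput (true);
         return false;
      }
      catch (IllegalStateException e)
      {
         return true;
      }
      finally
      {
         uc.getInputStream ().close ();   // Release the opened resource.
      }
   }

   public static void main (String [] args) throws IOException
   {
      File f = File.createTempFile ("connect-order", ".txt");
      f.deleteOnExit ();
      System.out.println (rejectsLateSetDoOutput (f.toURI ().toURL ())); // true
   }
}
```

In URLDemo3, this is why setDoOutput(), setUseCaches(), setRequestProperty(), and setRequestMethod() all run before the first call to getOutputStream().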

After you compile URLDemo3, type the command line java URLDemo3 name1 value1 name2 value2 name3 value3. When this article was written, the echo program responded with the following output:

<html> <head>
<title>Echoing your name value pairs</title>
</head>
<body>
<ol>
<li>name1   : value1
<li>name2   : value2
<li>name3   : value3
</ol>
<hr>
Mon Feb 18 08:58:45 2002
</body>
</html>

The server program resource's output consists of HTML that echoes back name1, value1, name2, value2, name3, and value3.

TIP

If you need a string representation of a URL object's URL, call either toExternalForm() or toString(). Both methods are equivalent.
