20.8 Object Serialization

The ability to save objects in a byte stream that can be transferred across the network (perhaps for use in remote method invocations), saved to disk in a file or database, and later reconstituted to form a live object, is an essential aspect of many real-world applications.

The process of converting an object's representation into a stream of bytes is known as serialization, while reconstituting an object from a byte stream is deserialization. When talking about the classes, interfaces, and language features involved in this overall process, we generally just use the term serialization and understand that it includes deserialization as well.

A number of classes and interfaces are involved with serialization. You have already learned about the basic mechanisms for reading and writing primitive types and strings using the Data stream classes (see page 537). This section covers the object byte streams—ObjectInputStream and ObjectOutputStream—that allow you to serialize and deserialize complete objects. Various other classes and interfaces provide specific support for the serialization process. In addition, the field modifier transient provides a language-level means of marking data that should not be serialized.

20.8.1 The Object Byte Streams

The Object streams—ObjectInputStream and ObjectOutputStream—allow you to read and write object graphs in addition to the well-known types (primitives, strings, and arrays). By "object graph" we mean that when you use writeObject to write an object to an ObjectOutputStream, bytes representing the object—including all other objects that it references—are written to the stream. This process of transforming an object into a stream of bytes is called serialization. Because the serialized form is expressed in bytes, not characters, the Object streams have no Reader or Writer forms.

When bytes encoding a serialized graph of objects are read by the method readObject of ObjectInputStream—that is, deserialized—the result is a graph of objects equivalent to the input graph.

Suppose, for example, that you have a HashMap object that you wish to store into a file for future use. You could write the graph of objects that starts with the hash map this way:

FileOutputStream fileOut = new FileOutputStream("tab");
ObjectOutputStream out = new ObjectOutputStream(fileOut);
HashMap<?,?> hash = getHashMap();
out.writeObject(hash);

As you can see, this approach is quite straightforward. The single writeObject on hash writes the entire contents of the hash map, including all entries, all the objects that the entries refer to, and so on, until the entire graph of interconnected objects has been visited. A new copy of the hash map could be reconstituted from the serialized bytes:

FileInputStream fileIn = new FileInputStream("tab");
ObjectInputStream in = new ObjectInputStream(fileIn);
HashMap<?,?> newHash = (HashMap<?,?>) in.readObject();
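As a minimal sketch of the complete round trip, the following writes a hash map and reads it back, using an in-memory byte array in place of the file "tab" so that the example is self-contained. The class name RoundTripDemo and the helpers toBytes and fromBytes are hypothetical, and the try-with-resources statements assume a Java release that supports them; they ensure the streams are closed even if an exception is thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class RoundTripDemo {
    // Serializes the graph of objects that starts at obj (hypothetical helper).
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reconstitutes a graph written by toBytes (hypothetical helper).
    static Object fromBytes(byte[] data)
        throws IOException, ClassNotFoundException
    {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> hash = new HashMap<>();
        hash.put("one", 1);
        hash.put("two", 2);

        Object copy = fromBytes(toBytes(hash));
        System.out.println(copy.equals(hash)); // true: same contents
        System.out.println(copy != hash);      // true: a distinct object
    }
}
```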

Serialization preserves the integrity of the graph itself. Suppose, for example, that in a serialized hash map, a single object representing rose.jpg was stored under two different keys.

When the serialized hash map is deserialized, the two analogous entries in the new copy of the hash map will have references to a single copy of the rose.jpg object, not references to two separate copies of rose.jpg. [2]
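The preservation of shared references can be checked with a small sketch. Here a byte array stands in for the rose.jpg object, and the keys "rose" and "flower" and the class name SharedReferenceDemo are assumptions made purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class SharedReferenceDemo {
    // Returns true if both keys still refer to one object after a round trip.
    static boolean sharedAfterRoundTrip()
        throws IOException, ClassNotFoundException
    {
        byte[] rose = {1, 2, 3};     // stand-in for the rose.jpg object
        HashMap<String, byte[]> hash = new HashMap<>();
        hash.put("rose", rose);      // hypothetical key
        hash.put("flower", rose);    // second key, same object

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(hash);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            HashMap<?, ?> copy = (HashMap<?, ?>) in.readObject();
            Object first = copy.get("rose");
            Object second = copy.get("flower");
            // One shared copy in the new graph, distinct from the original.
            return first == second && first != rose;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sharedAfterRoundTrip());
    }
}
```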

Sometimes, however, sharing objects in this way is not what is desired. In that case you can use ObjectOutputStream's writeUnshared method to write the object as a new distinct object, rather than using a reference to an existing serialization of that object. Any object written into the graph by writeUnshared will only ever have one reference to it in the serialized data. The readUnshared method of ObjectInputStream reads an object that is expected to be unique. If the object is actually a reference to an existing deserialized object then an ObjectStreamException is thrown; similarly, if the deserialization process later tries to create a second reference to an object returned by readUnshared, an ObjectStreamException is thrown. These uniqueness checks only apply to the actual object passed to writeUnshared or read by readUnshared, not to any objects they refer to.
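The difference between shared and unshared writes can be sketched as follows. The class UnsharedDemo and its method sameAfterRead are hypothetical names; the sketch writes one array to the stream twice and reports whether the two objects read back are the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class UnsharedDemo {
    // Writes the same array twice, then reads two objects back and
    // reports whether they are one and the same object.
    static boolean sameAfterRead(boolean unshared)
        throws IOException, ClassNotFoundException
    {
        int[] data = {1, 2, 3};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < 2; i++) {
                if (unshared) {
                    out.writeUnshared(data); // always a new, distinct object
                } else {
                    out.writeObject(data);   // second write is a back-reference
                }
            }
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first == second;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameAfterRead(false)); // true: one shared copy
        System.out.println(sameAfterRead(true));  // false: two separate copies
    }
}
```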

20.8.2 Making Your Classes Serializable

When an ObjectOutputStream writes a serialized object, the object must implement the Serializable marker interface. This marker interface declares that the class is designed to have its objects serialized.

Being serializable can be quite simple. The default serialization process is to serialize each field of the object that is neither transient nor static. Primitive types and strings are written in the same encoding used by DataOutputStream; objects are serialized by calling writeObject. With default serialization, all serialized fields that are object references must refer to serializable object types. Default serialization also requires either that your superclass have a no-arg constructor (so that deserialization can invoke it) or that it also be Serializable (in which case declaring your class to implement Serializable is redundant but harmless). For most classes this default serialization is sufficient, and the entire work necessary to make a class serializable is to mark it as such by declaring that it implements the Serializable interface:

public class Name implements java.io.Serializable {
    private String name;
    private long id;
    private transient boolean hashSet = false;
    private transient int hash;
    private static long nextID = 0;

    public Name(String name) {
        this.name = name;
        synchronized (Name.class) {
            id = nextID++;
        }
    }

    public int hashCode() {
        if (!hashSet) {
            hash = name.hashCode();
            hashSet = true;
        }
        return hash;
    }

    // ... override equals, provide other useful methods
}

The class Name can be written to an ObjectOutputStream either directly with writeObject, or indirectly if it is referenced by an object written to such a stream. The name and id fields will be written to the stream; the fields nextID, hashSet, and hash will not be written, nextID because it is static and the others because they are declared transient. Because hash is a cached value that can easily be recalculated from name, there is no reason to consume the time and space it takes to write it to the stream.

Default deserialization reads the values written during serialization. Static fields in the class are left untouched—if the class needs to be loaded then the normal initialization of the class takes place, giving the static fields an initial value. Each transient field in the reconstituted object is set to the default value for its type. When a Name object is deserialized, the newly created object will have name and id set to the same values as those of the original object, the static field nextID will remain untouched, and the transient fields hashSet and hash will have their default values (false and 0). These defaults work because when hashSet is false the value of hash will be recalculated.
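A short sketch can confirm this behavior. It uses a trimmed version of Name (the id and nextID fields are omitted, and the transient fields are given package access so that they can be inspected); the enclosing class TransientDemo and the roundTrip helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Trimmed version of Name, for illustration only.
    static class Name implements Serializable {
        private String name;
        transient boolean hashSet = false;
        transient int hash;

        Name(String name) { this.name = name; }

        @Override public int hashCode() {
            if (!hashSet) {
                hash = name.hashCode();
                hashSet = true;
            }
            return hash;
        }
    }

    static Name roundTrip(Name original)
        throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Name) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Name original = new Name("rose");
        original.hashCode();              // fills the cache in the original
        Name copy = roundTrip(original);
        System.out.println(copy.hashSet); // false: transient default
        System.out.println(copy.hash);    // 0: transient default
        System.out.println(copy.hashCode() == original.hashCode()); // true
    }
}
```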

You will occasionally have a class that is generally serializable but has specific instances that are not serializable. For example, a container might itself be serializable but contain references to objects that are not serializable. Any attempt to serialize a non-serializable object will throw a NotSerializableException.
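For instance, an ArrayList is serializable, but a particular list may hold an element that is not. The sketch below (the class NotSerializableDemo and the method failsToSerialize are hypothetical names) reports whether writing a given object fails in this way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class NotSerializableDemo {
    // Returns true if writing obj fails with NotSerializableException.
    static boolean failsToSerialize(Object obj) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        List<Object> good = new ArrayList<>();
        good.add("a string is serializable");

        List<Object> bad = new ArrayList<>();
        bad.add(new Object());  // java.lang.Object is not Serializable

        System.out.println(failsToSerialize(good)); // false
        System.out.println(failsToSerialize(bad));  // true
    }
}
```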

20.8.3 Serialization and Deserialization Order

Each class is responsible for properly serializing its own state (that is, its fields). Objects are serialized and deserialized down the type tree, from the highest-level class that is Serializable to the most specific class. This order is rarely important when you're serializing, but it can be important when you're deserializing. Let us consider a type tree in which an HTTPInput class is a subclass of the Serializable class URLInput, whose superclass InputSource is not serializable.

When deserializing an HTTPInput object, ObjectInputStream first allocates memory for the new object and then finds the first Serializable class in the object's type hierarchy—in this case URLInput. The stream invokes the no-arg constructor of that class's superclass (the object's last non-serializable class), which in this case is InputSource. If other state from the superclass must be preserved, URLInput is responsible for serializing that state and restoring it on deserialization. If your non-serializable superclass has state, you will almost certainly need to customize the first serializable class (see the next section). If the first serializable class directly extends Object (as the earlier Name class did), customizing is easy because Object has no state to preserve or restore.

Once the first serializable class has finished with its part of its superclass's state, it will set its own state from the stream. Then ObjectInputStream will walk down the type tree, deserializing the state for each class using readObject. When ObjectInputStream reaches the bottom of the type tree, the object has been completely deserialized.
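The treatment of the non-serializable superclass can be observed in a small sketch. The fields description and url are hypothetical, chosen only to make the effect visible, and the classes are nested in a hypothetical DeserializationOrderDemo class to keep the example in one file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DeserializationOrderDemo {
    // Not Serializable: its state is never written to the stream.
    static class InputSource {
        String description;
        InputSource() { description = "set by no-arg constructor"; }
    }

    // The first Serializable class in the hierarchy.
    static class URLInput extends InputSource implements Serializable {
        String url;
        URLInput(String url, String description) {
            this.url = url;
            this.description = description;
        }
    }

    static URLInput roundTrip(URLInput input)
        throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(input);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (URLInput) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        URLInput copy =
            roundTrip(new URLInput("http://example.com/", "custom"));
        System.out.println(copy.url);         // restored from the stream
        System.out.println(copy.description); // reset by InputSource()
    }
}
```

In this sketch the superclass field comes back with the value assigned by InputSource's no-arg constructor, not the value held by the original object, which is why URLInput would need customized serialization to preserve it.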

As the stream is deserialized, other serialized objects will be found that were referenced from the object currently being deserialized. These other objects are deserialized as they are encountered. Thus, if URLInput had a reference to a HashMap, that hash map and its contents would be deserialized before the HTTPInput part of the object was deserialized.

Before any of this can happen, the relevant classes must first be loaded. This requires finding a class of the same name as the one written and checking to see that it is the same class. You'll learn about versioning issues shortly. Assuming it is the same class, the class must be loaded. If the class is not found or cannot be loaded for any reason, readObject will throw a ClassNotFoundException.

20.8.4 Customized Serialization

The default serialization methods work for many classes but not for all of them. For some classes default deserialization may be improper or inefficient. The HashMap class is an example of both problems. Default serialization would write all the data structures for the hash map, including the hash codes of the entries. This serialization is both wrong and inefficient.

It is wrong because hash codes may be different for deserialized entries. This will be true, for example, of entries using the default hashCode implementation.

It is inefficient because a hash map typically has a significant number of empty buckets. There is no point in serializing empty buckets. It would be more efficient to serialize the referenced keys and entries and rebuild a hash map from them than to serialize the entire data structure of the map.

For these reasons, java.util.HashMap provides private writeObject and readObject methods. [3] These methods are invoked by ObjectOutputStream and ObjectInputStream, respectively, when it is time to serialize or deserialize a HashMap object. These methods are invoked only on classes that provide them, and the methods are responsible only for the class's own state, including any state from non-serializable superclasses. A class's writeObject and readObject methods, if provided, should not invoke the superclass's readObject or writeObject method. Object serialization differs in this way from clone and finalize.

Let us suppose, for example, that you wanted to improve the Name class so that it didn't have to check whether the cached hash code was valid each time. You could do this by setting hash in the constructor, instead of lazily when it is asked for. But this causes a problem with serialization—since hash is transient it does not get written as part of serialization (nor should it), so when you are deserializing you need to explicitly set it. This means that you have to implement readObject to deserialize the main fields and then set hash, which implies that you have to implement writeObject so that you know how the main fields were serialized.

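import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BetterName implements Serializable {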
    private String name;
    private long id;
    private transient int hash;
    private static long nextID = 0;

    public BetterName(String name) {
        this.name = name;
        synchronized (BetterName.class) {
            id = nextID++;
        }
        hash = name.hashCode();
    }

    private void writeObject(ObjectOutputStream out)
        throws IOException
    {
        out.writeUTF(name);
        out.writeLong(id);
    }

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException
    {
        name = in.readUTF();
        id = in.readLong();
        hash = name.hashCode();
    }

    public int hashCode() {
        return hash;
    }

    // ... override equals, provide other useful methods
}

We use writeObject to write out each of the non-static, non-transient fields. It declares that it can throw IOException because the write methods it invokes can do so, and, if one does throw an exception, the serialization must be halted. When readObject gets the values from the stream, it can then set hash properly. It, too, must declare that it throws IOException because the read methods it invokes can do so, and this should stop deserialization. The readObject method must declare that it throws ClassNotFoundException because, in the general case, deserializing fields of the current object could require other classes to be loaded—though not in the example.

There is one restriction on customized serialization: You cannot directly set a final field within readObject because final fields can only be set in initializers or constructors. For example, if name were declared final, the class BetterName would not compile. You will need to design your classes with this restriction in mind when considering custom serialization. The default serialization mechanism can bypass this restriction because it uses native code. This means that default serialization works fine with classes that have final fields. For custom serialization it is possible to use reflection to set a final field (see "Final Fields" on page 420), but the security restrictions for doing this mean that it is seldom applicable. One circumstance in which it is applicable, for example, is if your classes are required to be installed as a standard extension and so have the necessary security privileges (see "Security Policies" on page 680).

The readObject and writeObject methods for BetterName show that you can use the methods of DataInput and DataOutput to transmit arbitrary data on the stream. However, the actual implementations replicate the default serialization and then add the necessary setup for hash. The read and write invocations of these methods could have been replaced with a simple invocation of methods that perform default serialization and deserialization:

private void writeObject(ObjectOutputStream out)
    throws IOException
{
    out.defaultWriteObject();
}

private void readObject(ObjectInputStream in)
    throws IOException, ClassNotFoundException
{
    in.defaultReadObject();
    hash = name.hashCode();
}

In fact, as you may have surmised, given that writeObject performs nothing but default serialization, we need not have implemented it at all.

A writeObject method can throw NotSerializableException if a particular object is not serializable. For example, in rare cases, objects of a class might be generally serializable, but a particular object might contain sensitive data.
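One way this might look is sketched below. The Account class, its sensitive flag, and the canSerialize helper are all hypothetical; the point is only that writeObject can refuse a particular instance and otherwise fall back on default serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SensitiveDemo {
    static class Account implements Serializable {
        private final String owner;
        private final boolean sensitive;  // hypothetical per-object flag

        Account(String owner, boolean sensitive) {
            this.owner = owner;
            this.sensitive = sensitive;
        }

        private void writeObject(ObjectOutputStream out)
            throws IOException
        {
            if (sensitive) {
                // Refuse to serialize this particular object.
                throw new NotSerializableException(getClass().getName());
            }
            out.defaultWriteObject();
        }
    }

    // Returns true if the account can be written to an object stream.
    static boolean canSerialize(Account account) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(account);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Account("pat", false))); // true
        System.out.println(canSerialize(new Account("sam", true)));  // false
    }
}
```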

You will occasionally find that an object cannot be initialized properly until the graph of which it is a part has been completely deserialized. You can have the ObjectInputStream invoke a method of your own devising by calling the stream's registerValidation method with a reference to an object that implements the interface ObjectInputValidation. When deserialization of the top-level object at the head of the graph is complete, your object's validateObject method will be invoked to perform any needed validation or check.
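A minimal sketch of registering a validation callback follows. The Range class and its invariant (low must not exceed high) are hypothetical; its constructor deliberately performs no check so that an invalid object can be written to simulate bad stream contents. Note that registerValidation must be called from within readObject, while deserialization is in progress.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    static class Range implements Serializable, ObjectInputValidation {
        private final int low, high;

        Range(int low, int high) {  // no check here, for demonstration
            this.low = low;
            this.high = high;
        }

        private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException
        {
            in.registerValidation(this, 0); // run after the whole graph is read
            in.defaultReadObject();
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            if (low > high) {
                throw new InvalidObjectException("low > high");
            }
        }
    }

    static Object roundTrip(Object obj)
        throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        roundTrip(new Range(1, 5));      // passes validation
        try {
            roundTrip(new Range(5, 1));  // rejected by validateObject
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```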

Normally, an object is serialized as itself on the output stream, and a copy of the same type is reconstituted during deserialization. You will find a few classes for which this is not correct. For example, if you have a class that has objects that are supposed to be unique in each virtual machine for each unique value (so that == will return true if and only if equals also would return true), you would need to resolve an object being deserialized into an equivalent one in the local virtual machine. You can control these by providing writeReplace and readResolve methods of the following forms and at an appropriate access level:

  • <access> Object writeReplace() throws ObjectStreamException
    • Returns an object that will replace the current object during serialization. Any object may be returned including the current one.
  • <access> Object readResolve() throws ObjectStreamException
    • Returns an object that will replace the current object during deserialization. Any object may be returned including the current one.

In our example, readResolve would check to find the local object that was equivalent to the one just deserialized—if it exists it will be returned, otherwise we can register the current object (for use by readResolve in the future) and return this. These methods can be of any accessibility; they will be used if they are accessible to the object type being serialized. For example, if a class has a private readResolve method, it only affects deserialization of objects that are exactly of its type. A package-accessible readResolve affects only subclasses within the same package, while public and protected readResolve methods affect objects of all subclasses.
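The readResolve approach just described might be sketched as follows. The Suit class, its registry map, and the factory method of are hypothetical; the sketch assumes that all instances are obtained through the factory, so that each distinct name has exactly one object per virtual machine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReadResolveDemo {
    static final class Suit implements Serializable {
        // Registry of the unique instance for each name.
        private static final Map<String, Suit> INSTANCES =
            new ConcurrentHashMap<>();

        private final String name;

        private Suit(String name) { this.name = name; }

        static Suit of(String name) {
            return INSTANCES.computeIfAbsent(name, Suit::new);
        }

        // Return the local equivalent if there is one; otherwise
        // register the newly deserialized object and return it.
        private Object readResolve() throws ObjectStreamException {
            return INSTANCES.computeIfAbsent(name, key -> this);
        }
    }

    static Object roundTrip(Object obj)
        throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Suit hearts = Suit.of("hearts");
        System.out.println(roundTrip(hearts) == hearts); // true
    }
}
```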

20.8.5 Object Versioning

Class implementations change over time. If a class's implementation changes between the time an object is serialized and the time it is deserialized, the ObjectInputStream can detect this change. When the object is written, the serial version UID (unique identifier), a 64-bit long value, is written with it. By default, this identifier is a secure hash of the full class name, superinterfaces, and members—the facts about the class that, if they change, signal a possible class incompatibility. Such a hash is essentially a fingerprint—it is nearly impossible for two different classes to have the same UID.

When an object is read from an ObjectInputStream, the serial version UID is also read. An attempt is then made to load the class. If no class with the same name is found or if the loaded class's UID does not match the UID in the stream, readObject throws an InvalidClassException. If the versions of all the classes in the object's type are found and all the UIDs match, the object can be deserialized.

This assumption is very conservative: Any change in the class creates an incompatible version. Many class changes are less drastic than this. Adding a cache to a class can be made compatible with earlier versions of the serialized form, as can adding optional behavior or values. Rather than relying on the default serial version UID, any serializable class should explicitly declare its own serial version UID value. Then when you make a change to a class that can be compatible with the serialized forms of earlier versions of the class, you can explicitly declare the serial version UID for the earlier class. A serial version UID is declared as follows:

private static final
    long serialVersionUID = -1307795172754062330L;

The serialVersionUID field must be a static, final field of type long. It should also be private since it is only applied to the declaring class. The value of serialVersionUID is provided by your development system. In many development systems, it is the output of a command called serialver. Other systems have different ways to provide you with this value, which is the serial version UID of the class before the first incompatible modification. (Nothing prevents you from using any number as this UID if you stamp it from the start, but it is usually a really bad idea. Your numbers will not be as carefully calculated to avoid conflict with other classes as the secure hash is.)

Now when the ObjectInputStream finds your class and compares the UID with that of the older version in the file, the UIDs will be the same even though the implementation has changed. If you invoke defaultReadObject, only those fields that were present in the original version will be set. Other fields will be left in their default state. If writeObject in the earlier version of the class wrote values to the stream without using defaultWriteObject, you must read those values. If you try to read more values than were written, you will get an EOFException, which can inform you that you are deserializing an older form that wrote less information. If possible, you should design classes with a class version number instead of relying on an exception to signal the version of the original data.

When an object is written to an ObjectOutputStream, the Class object for that object is also written. Because Class objects are specific to each virtual machine, serializing the actual Class object would not be helpful. So Class objects on a stream are replaced by ObjectStreamClass objects that contain the information necessary to find an equivalent class when the object is deserialized. This information includes the class's full name and its serial version UID. Unless you create one, you will never directly see an ObjectStreamClass object.
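Should you want to inspect this information, the static method ObjectStreamClass.lookup returns the descriptor for a serializable class. In the sketch below, the Point class and its serialVersionUID value of 42 are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class StreamClassDemo {
    static class Point implements Serializable {
        private static final long serialVersionUID = 42L; // arbitrary value
        int x, y;
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(Point.class);
        System.out.println(desc.getName());             // the class's full name
        System.out.println(desc.getSerialVersionUID()); // 42
        System.out.println(desc.getFields().length);    // 2 serializable fields

        // lookup returns null for a class that is not serializable.
        System.out.println(ObjectStreamClass.lookup(Object.class)); // null
    }
}
```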

As a class evolves it is possible that a new superclass is introduced for that class. If an older serialized form of the class is deserialized it will not contain any serialized data for that superclass. Rather than making this an error, the system will set all fields declared by the superclass to their default initialized values. To override this default behavior, the new superclass (which must implement Serializable, of course) can declare the following method:

private void readObjectNoData() throws ObjectStreamException

If, as an object is deserialized, the serialized data lists the superclass as a known superclass then the superclass's readObject method will be invoked (if it exists), otherwise the superclass's readObjectNoData method will be invoked. The readObjectNoData method can then set appropriate values in the object's superclass fields.

20.8.6 Serialized Fields

The default serialization usually works well, but for more sophisticated classes and class evolution you may need to access the original fields. For example, suppose you were representing a rectangle in a geometric system by using two opposite corners. You would have four fields: x1, y1, x2, and y2. If you later want to use a corner, plus width and height, you would have four different fields: x, y, width, and height. Assuming default serialization of the four original fields you would also have a compatibility problem: the rectangles that were already serialized would have the old fields instead of the new ones. To solve this problem you could maintain the serialized format of the original class and convert between the old and new fields as you encounter them in readObject or writeObject. You do this using serialized field types to view the serialized form as an abstraction and to access individual fields:

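import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

public class Rectangle implements Serializable {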
    private static final
        long serialVersionUID = -1307795172754062330L;
    private static final
        ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("x1", Double.TYPE),
            new ObjectStreamField("y1", Double.TYPE),
            new ObjectStreamField("x2", Double.TYPE),
            new ObjectStreamField("y2", Double.TYPE),
        };
    private transient double x, y, width, height;

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException
    {
        ObjectInputStream.GetField fields;
        fields = in.readFields();
        x = fields.get("x1", 0.0);
        y = fields.get("y1", 0.0);
        double x2 = fields.get("x2", 0.0);
        double y2 = fields.get("y2", 0.0);
        width = (x2 - x);
        height = (y2 - y);
    }

    private void writeObject(ObjectOutputStream out)
        throws IOException
    {
        ObjectOutputStream.PutField fields;
        fields = out.putFields();
        fields.put("x1", x);
        fields.put("y1", y);
        fields.put("x2", x + width);
        fields.put("y2", y + height);
        out.writeFields();
    }
}

Rectangle keeps the serialVersionUID of the original version to declare that the versions are compatible. Changing fields that would be used by default serialization is otherwise considered to be an incompatible change.

To represent each of the old fields that will be found in the serialized data, you create an ObjectStreamField object. You construct each ObjectStreamField object by passing in the name of the field it represents, and the Class object for the type of the field it represents. An overloaded constructor also takes a boolean argument that specifies whether the field refers to an unshared object—that is, one written by writeUnshared or read by readUnshared. The serialization mechanism needs to know where to find these ObjectStreamField objects, so they must be defined in the static, final array called serialPersistentFields.

The fields x, y, width, and height are marked transient because they are not serialized—during serialization these new fields must be converted into appropriate values of the original fields so that we preserve the serialized form. So writeObject uses an ObjectOutputStream.PutField object to write out the old form, using x and y as the old x1 and y1, and calculating x2 and y2 from the rectangle's width and height. Each put method takes a field name as one argument and a value for that field as the other—the type of the value determines which overloaded form of put is invoked (one for each primitive type and Object). In this way the default serialization of the original class has been emulated and the serialized format preserved.

When a Rectangle object is deserialized, the reverse process occurs. Our readObject method gets an ObjectInputStream.GetField that allows access to fields by name from the serialized object. There is a get method for returning each primitive type, and one for returning an Object reference. Each get method takes two parameters: the name of the field and a value to return if it is not defined in the serialized object. The return value's type chooses which overload of get is used: A short return value will use the get that returns a short, for example. In our example, all values are double: We get the x1 and y1 fields to use for one corner of the rectangle, and the old x2 and y2 fields to calculate width and height.

Using the above technique the new Rectangle class can deserialize old rectangle objects and a new serialized rectangle can be deserialized by the original Rectangle class, provided that both virtual machines are using compatible versions of the serialization stream protocol. The stream protocol defines the actual layout of serialized objects in the stream regardless of whether they use default serialization or the serialized field objects. This means that the serialized form of an object is not dependent on, for example, the order in which you invoke put, nor do you have to know the order in which to invoke get—you can use get or put to access fields in any order any number of times.

20.8.7 The Externalizable Interface

The Externalizable interface extends Serializable. A class that implements Externalizable takes complete control over its serialized state, assuming responsibility for all the data of its superclasses, any versioning issues, and so on. You may need this, for example, when a repository for serialized objects mandates restrictions on the form of those objects that are incompatible with the provided serialization mechanism. The Externalizable interface has two methods:

public interface Externalizable extends Serializable {
    void writeExternal(ObjectOutput out)
        throws IOException;
    void readExternal(ObjectInput in)
        throws IOException, ClassNotFoundException;
}

These methods are invoked when the object is serialized and deserialized, respectively. They are normal public methods, so the exact type of the object determines which implementation will be used. Subclasses of an externalizable class will often need to invoke their superclass's implementation before serializing or deserializing their own state—in contrast to classes that use normal serialization.

You should note that the methods of the interface are public and so can be invoked by anyone at any time. In particular, a malicious program might invoke readExternal to make an object overwrite its state from some serialized stream, possibly with invented content. If you are designing classes where such security counts, you have to take this into account either by not using Externalizable or by writing your readExternal method so that it can be invoked only once, and never at all if the object was created via one of your constructors.
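One possible way to guard readExternal is sketched below. The Label class and its initialized flag are hypothetical; the flag is set by the ordinary constructor and by the first call to readExternal, so any later call is refused. An externalizable class must also provide a public no-arg constructor, which deserialization uses to create the object before invoking readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    public static class Label implements Externalizable {
        private String text;
        private boolean initialized;  // true once the state has been set

        public Label() { }            // used only by deserialization

        public Label(String text) {
            this.text = text;
            initialized = true;
        }

        public String text() { return text; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(text);
        }

        @Override
        public synchronized void readExternal(ObjectInput in)
            throws IOException
        {
            if (initialized) {
                throw new IllegalStateException("state already set");
            }
            text = in.readUTF();
            initialized = true;
        }
    }

    static Object roundTrip(Object obj)
        throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Label copy = (Label) roundTrip(new Label("hello"));
        System.out.println(copy.text()); // hello
    }
}
```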

20.8.8 Documentation Comment Tags

As you can see from the Rectangle code, the serialized form of an object can be an important thing, separate from its runtime form. This can happen over time due to evolution, or by initial design when the runtime form is not a good serialized form. When you write serializable classes that others will reimplement, you should document the persistent form so that other programmers can properly reimplement the serialized form as well as the runtime behavior. You do this with the special javadoc tags @serial, @serialField, and @serialData.

Use @serial to document fields that use default serialization. For example, the original Rectangle class could have looked like this:

/** X-coordinate of one corner.
 *  @serial */
private double x1;
/** Y-coordinate of one corner.
 *  @serial */
private double y1;
/** X-coordinate of opposite corner.
 *  @serial */
private double x2;
/** Y-coordinate of opposite corner.
 *  @serial */
private double y2;

The @serial tag can include a description of the meaning of the field. If none is given (as above), then the description of the runtime field will be used. The javadoc tool will add all @serial information to a page, known as the serialized form page.

The @serial tag can also be applied to a class or package with the single argument include or exclude, to control whether serialization information is documented for that class or package. By default public and protected types are included, otherwise they are excluded. A class level @serial tag overrides a package level @serial tag.

The @serialField tag documents fields that are created by GetField and PutField invocations, such as those in our Rectangle example. The tag takes first the field name, then its type, and then a description. For example:

/**
 * @serialField x1 double X-coordinate of one corner.
 * @serialField y1 double Y-coordinate of one corner.
 * @serialField x2 double X-coordinate of other corner.
 * @serialField y2 double Y-coordinate of other corner.
 */
private transient double x, y, width, height;

You use the @serialData tag in the doc comment for a writeObject method to document any additional data written by the method. You can also use @serialData to document anything written by an Externalizable class's writeExternal method.
