
2.5 Memory-Mapped Files

Most operating systems can take advantage of a virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which can be much faster than traditional file operations.

2.5.1 Memory-Mapped File Performance

At the end of this section, you can find a program that computes the CRC32 checksum of a file using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 2.5 when computing the checksum of the 37MB file rt.jar in the jre/lib directory of the JDK.

Table 2.5 Timing Data for File Operations

Method                   Time
Plain input stream       110 seconds
Buffered input stream    9.9 seconds
Random access file       162 seconds
Memory-mapped file       7.2 seconds

As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile.

Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain, compared to random access, can be substantial. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.

The java.nio package makes memory mapping quite simple. Here is what you do.

First, get a channel for the file. A channel is an abstraction for a disk file that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files.

FileChannel channel = FileChannel.open(path, options);

Then, get a ByteBuffer from the channel by calling the map method of the FileChannel class. Specify the area of the file that you want to map and a mapping mode. Three modes are supported:

  • FileChannel.MapMode.READ_ONLY: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.

  • FileChannel.MapMode.READ_WRITE: The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file might not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs depends on the operating system.

  • FileChannel.MapMode.PRIVATE: The resulting buffer is writable, but any changes are private to this buffer and not propagated to the file.
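The difference between the modes can be tried out with a small sketch. The class name MapModeSketch, its two helper methods, and the temporary scratch file are assumptions made for illustration. The sketch checks that a READ_ONLY mapping rejects writes and that a write through a PRIVATE mapping does not reach the file. Note that a PRIVATE mapping, like a READ_WRITE mapping, requires a channel that was opened for both reading and writing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the book's listings
class MapModeSketch
{
   /** Returns true if a write to a READ_ONLY mapping of the file is rejected. */
   static boolean readOnlyRejectsWrites(Path path) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ))
      {
         MappedByteBuffer buffer
            = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
         try
         {
            buffer.put(0, (byte) 'X');
            return false;
         }
         catch (ReadOnlyBufferException e)
         {
            return true;
         }
      }
   }

   /** Changes byte 0 through a PRIVATE mapping, then returns byte 0 of the file. */
   static byte firstByteAfterPrivateWrite(Path path) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         MappedByteBuffer buffer
            = channel.map(FileChannel.MapMode.PRIVATE, 0, channel.size());
         buffer.put(0, (byte) 'X'); // visible only through this buffer
      }
      return Files.readAllBytes(path)[0];
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("mapmode", ".bin"); // scratch file
      path.toFile().deleteOnExit();
      Files.write(path, new byte[] { 'a', 'b', 'c' });
      System.out.println(readOnlyRejectsWrites(path));
      System.out.println((char) firstByteAfterPrivateWrite(path));
   }
}
```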

Once you have the buffer, you can read and write data using the methods of the ByteBuffer class and the Buffer superclass.

Buffers support both sequential and random data access. A buffer has a position that is advanced by get and put operations. For example, you can sequentially traverse all bytes in the buffer as

while (buffer.hasRemaining())
{
   byte b = buffer.get();
   . . .
}

Alternatively, you can use random access:

for (int i = 0; i < buffer.limit(); i++)
{
   byte b = buffer.get(i);
   . . .
}

You can also read and write arrays of bytes with the methods

get(byte[] bytes)
get(byte[] bytes, int offset, int length)

Finally, there are methods

getInt            getChar
getLong           getFloat
getShort          getDouble

to read primitive-type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call

buffer.order(ByteOrder.LITTLE_ENDIAN);

To find out the current byte order of a buffer, call

ByteOrder b = buffer.order();
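As a minimal sketch of how the byte order affects the result, the following reads the same four bytes in both orders. The class name ByteOrderSketch and the helper readInt are hypothetical, and a wrapped byte array stands in for a mapped file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of the book's listings
class ByteOrderSketch
{
   /** Interprets the first four bytes as an int in the given byte order. */
   static int readInt(byte[] bytes, ByteOrder order)
   {
      ByteBuffer buffer = ByteBuffer.wrap(bytes); // a new buffer is big-endian
      buffer.order(order);
      return buffer.getInt(0);
   }

   public static void main(String[] args)
   {
      byte[] bytes = { 0x00, 0x00, 0x01, 0x02 };
      // Big-endian: most significant byte first, giving 0x00000102
      System.out.println(readInt(bytes, ByteOrder.BIG_ENDIAN));
      // Little-endian: least significant byte first, giving 0x02010000
      System.out.println(readInt(bytes, ByteOrder.LITTLE_ENDIAN));
   }
}
```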

To write numbers to a buffer, use one of the methods

putInt            putChar
putLong           putFloat
putShort          putDouble

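At some point, these changes are written back to the file. A mapping stays valid even after its channel is closed, so closing the channel does not by itself guarantee that the changes have been written. To request that pending changes be written to the storage device right away, call the force method of the MappedByteBuffer class.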
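Here is a short sketch of writing a number through a READ_WRITE mapping. The class name MappedWriteSketch, the helper writeInt, and the scratch file are assumptions for illustration. The call to force asks the operating system to write the change to the storage device instead of waiting for it to happen at some later time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the book's listings
class MappedWriteSketch
{
   /** Writes an int at the given offset through a READ_WRITE mapping. */
   static void writeInt(Path path, int offset, int value) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         MappedByteBuffer buffer
            = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
         buffer.putInt(offset, value); // big-endian unless the order is changed
         buffer.force(); // request that the change be written to storage now
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("mapped", ".bin"); // scratch file
      path.toFile().deleteOnExit();
      Files.write(path, new byte[8]); // eight zero bytes
      writeInt(path, 4, 0xCAFEBABE);
      int stored = ByteBuffer.wrap(Files.readAllBytes(path)).getInt(4);
      System.out.println(Integer.toHexString(stored));
   }
}
```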

Listing 2.5 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That checksum is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip package contains a class CRC32 that computes the checksum of a sequence of bytes, using the following loop:

var crc = new CRC32();
while (more bytes)
   crc.update(next byte);
long checksum = crc.getValue();

The details of the CRC computation are not important. We just use it as an example of a useful file operation. (In practice, you would read and update data in larger blocks, not a byte at a time. Then the speed differences are not as dramatic.)
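A block-wise version might look like the following sketch. The class name BlockChecksumSketch and the 8192-byte block size are arbitrary choices for illustration, not taken from Listing 2.5. The sketch relies on the update(byte[], int, int) method of the CRC32 class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical demo class, not part of the book's listings
class BlockChecksumSketch
{
   /** Computes the CRC32 checksum of a file, reading one block at a time. */
   static long checksum(Path path) throws IOException
   {
      try (InputStream in = Files.newInputStream(path))
      {
         var crc = new CRC32();
         byte[] block = new byte[8192]; // block size is an arbitrary choice
         int n;
         while ((n = in.read(block)) != -1)
            crc.update(block, 0, n); // only the first n bytes were filled
         return crc.getValue();
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println(Long.toHexString(checksum(Path.of(args[0]))));
   }
}
```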

Run the program as

java memoryMap.MemoryMapTest filename

Listing 2.5 memoryMap/MemoryMapTest.java


package memoryMap;

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.file.*;
import java.util.zip.*;

/**
 * This program computes the CRC checksum of a file in four ways. <br>
 * Usage: java memoryMap.MemoryMapTest filename
 * @version 1.02 2018-05-01
 * @author Cay Horstmann
 */
public class MemoryMapTest
{
   public static long checksumInputStream(Path filename) throws IOException
   {
      try (InputStream in = Files.newInputStream(filename))
      {
         var crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   public static long checksumBufferedInputStream(Path filename) throws IOException
   {
      try (var in = new BufferedInputStream(Files.newInputStream(filename)))
      {
         var crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   public static long checksumRandomAccessFile(Path filename) throws IOException
   {
      try (var file = new RandomAccessFile(filename.toFile(), "r"))
      {
         long length = file.length();
         var crc = new CRC32();

         for (long p = 0; p < length; p++)
         {
            file.seek(p);
            int c = file.readByte();
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   public static long checksumMappedFile(Path filename) throws IOException
   {
      try (FileChannel channel = FileChannel.open(filename))
      {
         var crc = new CRC32();
         int length = (int) channel.size();
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);

         for (int p = 0; p < length; p++)
         {
            int c = buffer.get(p);
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   public static void main(String[] args) throws IOException
   {
      System.out.println("Input Stream:");
      long start = System.currentTimeMillis();
      Path filename = Paths.get(args[0]);
      long crcValue = checksumInputStream(filename);
      long end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Buffered Input Stream:");
      start = System.currentTimeMillis();
      crcValue = checksumBufferedInputStream(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Random Access File:");
      start = System.currentTimeMillis();
      crcValue = checksumRandomAccessFile(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Mapped File:");
      start = System.currentTimeMillis();
      crcValue = checksumMappedFile(filename);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");
   }
}

2.5.2 The Buffer Data Structure

When you use memory mapping, you make a single buffer that spans the entire file or the area of the file that you’re interested in. You can also use buffers to read and write more modest chunks of information.

In this section, we briefly describe the basic operations on Buffer objects. A buffer is an array of values of the same type. The Buffer class is an abstract class with concrete subclasses ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer, and ShortBuffer.

In practice, you will most commonly use ByteBuffer and CharBuffer. As shown in Figure 2.9, a buffer has

Figure 2.9 A buffer

  • A capacity that never changes

  • A position at which the next value is read or written

  • A limit beyond which reading and writing is meaningless

  • Optionally, a mark for repeating a read or write operation

These values fulfill the condition

0 ≤ mark ≤ position ≤ limit ≤ capacity

The principal purpose of a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put to add values to the buffer. When you run out of data or reach the capacity, it is time to switch to reading.

Call flip to set the limit to the current position and the position to 0. Now keep calling get while the remaining method (which returns limit – position) is positive. When you have read all values in the buffer, call clear to prepare the buffer for the next writing cycle. The clear method resets the position to 0 and the limit to the capacity.

If you want to reread the buffer, use rewind or mark/reset (see the API notes for details).
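The “write, then read” cycle can be sketched as follows. The class name BufferCycleSketch and the helper writeThenRead are hypothetical. The comments track how the position and limit change at each step.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the book's listings
class BufferCycleSketch
{
   /** Puts the values into a buffer, flips it, and sums the values read back. */
   static int writeThenRead(byte[] values, int capacity)
   {
      ByteBuffer buffer = ByteBuffer.allocate(capacity); // position 0, limit = capacity
      buffer.put(values); // position advances to values.length
      buffer.flip(); // limit = values.length, position = 0

      int sum = 0;
      while (buffer.remaining() > 0) // remaining is limit - position
         sum += buffer.get();

      buffer.clear(); // position 0, limit = capacity: ready for the next cycle
      return sum;
   }

   public static void main(String[] args)
   {
      // Only the three values that were put are read back, not all 16 slots
      System.out.println(writeThenRead(new byte[] { 1, 2, 3 }, 16));
   }
}
```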

To get a buffer, call a static method such as ByteBuffer.allocate or ByteBuffer.wrap.

Then, you can fill a buffer from a channel, or write its contents to a channel. For example,

ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
channel.read(buffer);
channel.position(newpos);
buffer.flip();
channel.write(buffer);

This can be a useful alternative to a random-access file.
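A runnable variant of this idea is sketched below. It copies one fixed-size record from one file position to another. The class name RecordCopySketch, the helper copyRecord, and the record size of 4 are assumptions for illustration. Because a single call to read or write may transfer fewer bytes than requested, the sketch loops until the buffer is full or drained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the book's listings
class RecordCopySketch
{
   static final int RECORD_SIZE = 4; // hypothetical record size

   /** Copies the record starting at position from to position to. */
   static void copyRecord(Path path, long from, long to) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path,
            StandardOpenOption.READ, StandardOpenOption.WRITE))
      {
         ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
         channel.position(from);
         // Fill the buffer; stop early if the end of the file is reached
         while (buffer.hasRemaining() && channel.read(buffer) != -1)
         {
         }
         buffer.flip(); // switch from filling the buffer to draining it
         channel.position(to);
         while (buffer.hasRemaining())
            channel.write(buffer);
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("records", ".bin"); // scratch file
      path.toFile().deleteOnExit();
      Files.writeString(path, "AAAABBBB");
      copyRecord(path, 0, RECORD_SIZE); // overwrite the second record with the first
      System.out.println(Files.readString(path));
   }
}
```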
