This chapter is from the book

13.7 File Copying—with a Twist

Let's write a short program that copies files—a popular way to get acquainted with a language's file system interface. Ah, the joy of K&R's classic getchar/putchar example [34, Chapter 1, page 15]. Of course, the system-provided programs that copy files use buffered reads and writes and many other optimizations to accelerate transfer speed, so it would be difficult to write a competitive program, but concurrency may give an edge.

The usual approach to file copying goes like this:

  1. Read data from the source file into a buffer.
  2. If nothing was read, done.
  3. Write the buffer into the target file.
  4. Repeat from step 1.

Adding appropriate error handling completes a useful (if unoriginal) program. If you select a large enough buffer and both the source and destination files reside on the same disk, the performance of the algorithm is near optimal.
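The four steps above can be sketched in D as follows. This is a minimal illustration, not the program developed in this section: the function name copySequential and the 64 KB default buffer size are assumptions made for the example, and error handling is left to the exceptions that rawRead and rawWrite throw on I/O failure.

```d
import std.stdio;

/// Copies everything from `src` to `dst` through one reusable buffer
/// and returns the number of bytes copied.
/// (`copySequential` is a hypothetical name chosen for this sketch.)
ulong copySequential(File src, File dst, size_t bufferSize = 64 * 1024) {
    auto buffer = new ubyte[bufferSize];
    ulong total = 0;
    for (;;) {
        auto chunk = src.rawRead(buffer); // step 1: read into the buffer
        if (chunk.length == 0) break;     // step 2: nothing was read, done
        dst.rawWrite(chunk);              // step 3: write what was read
        total += chunk.length;
    }                                     // step 4: repeat from step 1
    return total;
}

void main() {
    copySequential(stdin, stdout);
}
```

Note that the device being read sits idle while the write is in progress, and vice versa; that idle time is what the rest of the section tries to reclaim.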

Nowadays a variety of physical devices count as file repositories, such as hard drives, thumb drives, optical disks, connected smartphones, and remotely connected network services. These devices have various latency and speed profiles and connect to the computer via different hardware and software interfaces. Such interfaces could and should be put to work in parallel, not one at a time as the "read buffer/write buffer" algorithm above prescribes. Ideally, both the source and the target device should be kept as busy as possible, something we could effect with two threads following the producer-consumer protocol:

  1. Spawn one secondary thread that listens for messages containing memory buffers and writes them to the target file in a loop.
  2. Read data from the source file into a newly allocated buffer.
  3. If nothing was read, done.
  4. Send a message containing the read buffer to the secondary thread.
  5. Repeat from step 2.
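A rough timing model suggests how much this overlap can buy. It rests on simplifying assumptions: reading the whole file takes total time $T_r$, writing it takes $T_w$, and the cost of allocating buffers and passing messages is negligible. The symbols are introduced here for illustration only.

```latex
T_{\text{sequential}} \approx T_r + T_w, \qquad
T_{\text{concurrent}} \approx \max(T_r, T_w), \qquad
\text{speedup} = \frac{T_r + T_w}{\max(T_r, T_w)} \le 2
```

The bound of 2 is reached only when $T_r = T_w$; if one device is much slower than the other, the speedup approaches 1.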

In the new setup, one thread keeps the source busy and the other keeps the target busy. Depending on the nature of the source and target, significant acceleration could be obtained. If the device speeds are comparable and relatively slow compared to the bandwidth of the memory bus, the speed of copying could theoretically be doubled. Let's write a simple producer-consumer program that copies stdin to stdout:

import std.algorithm, std.concurrency, std.stdio;

void main() {
   enum bufferSize = 1024 * 100;
   auto tid = spawn(&fileWriter);
   // Read loop
   foreach (immutable(ubyte)[] buffer; stdin.byChunk(bufferSize)) {
      send(tid, buffer);
   }
}

void fileWriter() {
   // Write loop
   for (;;) {
      auto buffer = receiveOnly!(immutable(ubyte)[])();
      stdout.rawWrite(buffer); // the target is stdout; rawWrite emits the bytes as is
   }
}

The program above transfers data from the main thread to the secondary thread through immutable sharing: the messages passed have the type immutable(ubyte)[], that is, arrays of immutable unsigned bytes. Those buffers are acquired in the foreach loop by reading input in chunks of type immutable(ubyte)[], each of size bufferSize. At each pass through the loop, one new buffer is allocated, read into, and bound to buffer. The foreach control part does most of the hard work; all the body has to do is send off the buffer to the secondary thread. As discussed, passing data around is possible because of immutable; if you replaced immutable(ubyte)[] with ubyte[], the call to send would not compile.
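The role of immutable can be seen in isolation with a small sketch. The names lengthReporter and sendFrozen are hypothetical helpers invented for this example. The sketch assumes the data starts out in a mutable ubyte[] buffer, which is also the situation a reader faces if the installed library's reading routine hands out a reused mutable buffer rather than a fresh immutable one; in that case an immutable copy made with .idup is one way to obtain something that send accepts.

```d
import std.concurrency;

// Worker: receives one immutable buffer and reports its length to the owner.
void lengthReporter() {
    auto buffer = receiveOnly!(immutable(ubyte)[])();
    send(ownerTid, buffer.length);
}

// Hypothetical helper: freezes a mutable buffer and hands it to a new thread.
size_t sendFrozen(ubyte[] scratch) {
    auto tid = spawn(&lengthReporter);
    // send(tid, scratch); // rejected at compile time: ubyte[] aliases
    //                     // mutable thread-local data
    immutable(ubyte)[] frozen = scratch.idup; // immutable copy, safe to share
    send(tid, frozen);
    return receiveOnly!size_t();              // wait for the worker's reply
}

void main() {
    ubyte[] scratch = [1, 2, 3];
    assert(sendFrozen(scratch) == 3);
}
```

The copy costs one extra pass over the data, which is the price of not being able to prove to the compiler that nobody else will mutate the original buffer.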
