6.7 Case Study
In this case study, I will walk you through the proper design of a remote backup system. Keep in mind our trust model: the local environment is trusted; the network is not, and neither is the remote server.
Assume that there is a secure way to obtain a client-side program. While this is a leap of faith, we have to start somewhere. Perhaps the client backup program has a well-known hash, and you are able to verify it on the client end. Anyway, if you cannot obtain a secure version of your security software, you are in big trouble.
So, what is the software that you are running locally? Ideally it is a cryptographic file system like the ones described in Chapter 4. If that were the case, then you could simply ship out the encrypted versions of files and store those remotely.
There are several reasons why this is not practical. Cryptographic file systems (CFSs) require some understanding on the part of the user that he is using a CFS. Also, the installation may not be trivial. Not all users are sophisticated enough to manage this. Furthermore, the user may be running applications that do not let him control where files are stored, so there may be no way to mount those files in a CFS. Thus, in this case study, we focus on a backup system that is retrofitted to a commonly used environment, such as a Windows PC.
6.7.1 The Client Software
In my view of the ideal remote backup system, a user first starts a session, which is an interaction with the software for the purposes of backup or restore. When the user starts a session, he is prompted for a passphrase. He then selects whether the session is a backup or a restore. If it is a backup, then the system does some proactive checking to make sure that the passphrase has enough entropy. One good way to accomplish this is to show a progress bar and require the user to keep entering characters until the progress bar is full. A sensible algorithm, such as a standard password-based key derivation function, is then used to derive two 128-bit keys from the passphrase. The first is for authentication, and the second is for encryption. In practice, the user should probably use the same passphrase for all sessions; otherwise he is likely to forget it or write it down somewhere.
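The passphrase check and the key derivation can be sketched as follows. This is a minimal illustration, not a prescribed design: the character-pool entropy estimate, the 80-bit target, the use of PBKDF2-HMAC-SHA256, the iteration count, and the existence of a per-user salt are all assumptions made for the example. The text above specifies only that two 128-bit keys come from one passphrase.

```python
import hashlib
import math
from typing import Tuple


def estimate_entropy_bits(passphrase: str) -> float:
    """Crude upper-bound estimate: length times log2 of the character pool.

    Assumption for illustration only; real passphrases have less entropy
    than this suggests because people do not pick characters at random.
    """
    pool = 0
    if any(c.islower() for c in passphrase):
        pool += 26
    if any(c.isupper() for c in passphrase):
        pool += 26
    if any(c.isdigit() for c in passphrase):
        pool += 10
    if any(not c.isalnum() for c in passphrase):
        pool += 33
    return len(passphrase) * math.log2(pool) if pool else 0.0


def progress(passphrase: str, target_bits: float = 80.0) -> float:
    """Fraction of the progress bar to fill, from 0.0 (empty) to 1.0 (full)."""
    return min(1.0, estimate_entropy_bits(passphrase) / target_bits)


def derive_keys(passphrase: str, salt: bytes,
                iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Derive (authentication_key, encryption_key), 128 bits each.

    One 32-byte PBKDF2 output is split in half so that the two keys
    are distinct. The salt must be reproducible at restore time
    (hypothetically, derived from the user name or stored with the bundle).
    """
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
    return material[:16], material[16:]
```

Because the derivation is deterministic, entering the same passphrase (with the same salt) at restore time reproduces both keys, so nothing secret has to be stored on the client between sessions.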
The client software would ideally resemble a nice graphical file manager. Perhaps it could be identical in look and feel to Windows Explorer, with folders and icons for files. In fact, a very good program would simply add functionality to the existing Windows Explorer. The user presses the shift and control keys and uses the mouse to select which files to back up, or alternatively, picks from a previously saved list of files. Next, the user activates the backup by pressing a button or selecting from a menu. For security reasons, unattended backups are not allowed.
At this point, the software kicks in. First, a bundle is created. Each file is compressed and added to the bundle. In practice, this could be the same as a zip archive or a UNIX tar.gz file. Then, the authentication key is used to compute the HMAC (see Chapter 8) of the bundle, and the output is added to the bundle. Finally, the bundle is encrypted with the encryption key using a strong block cipher, such as triple DES or AES. The bundle is then tagged with the user name, the address of the user's machine, and a time and date, and is sent over to the untrusted remote backup server. The remote server then stores the bundle, indexed by the tags. One nice thing about this way of doing things is that the file system structure and the file names are hidden from the remote server and from anyone listening in on the network.
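The bundling steps just described might look like the sketch below. It assumes a gzip-compressed tar archive as the bundle format and HMAC-SHA256 as the HMAC; the function names, the tag layout, and the dictionary used for the tagged record are hypothetical. Python's standard library has no block cipher, so the encryption step is passed in as a caller-supplied function (for example, AES from a vetted cryptographic library) rather than implemented here.

```python
import hashlib
import hmac
import io
import tarfile
from typing import Callable, Dict

TAG_LEN = 32  # size in bytes of an HMAC-SHA256 output


def build_bundle(files: Dict[str, bytes], auth_key: bytes) -> bytes:
    """Compress the files into one archive and append its HMAC."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()
    tag = hmac.new(auth_key, archive, hashlib.sha256).digest()
    return archive + tag


def seal_bundle(bundle: bytes, enc_key: bytes,
                encrypt: Callable[[bytes, bytes], bytes],
                user: str, host: str, timestamp: str) -> dict:
    """Encrypt the authenticated bundle and attach the cleartext tags.

    `encrypt(key, plaintext)` is an assumed interface supplied by the
    caller; it is not a real library call.
    """
    return {
        "user": user,
        "host": host,
        "timestamp": timestamp,
        "payload": encrypt(enc_key, bundle),
    }
```

Only the three tags travel in the clear. File names and directory structure live inside the archive, so they are covered by both the HMAC and the encryption.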
If the session is a restore, then the user is prompted to pick a date. A list of all of the previous backup dates is downloaded from the server and shown, and the user selects which date he wishes to restore. The software downloads the corresponding bundle from the server and decrypts it using the key derived from the passphrase. The HMAC stored in the bundle is then checked, and if it verifies correctly, the restore proceeds. Next, a Windows Explorer view of all of the restored files is presented, anchored at a new root directory. For example, the old file system view is mounted at C:\restore\old root. The user can preview all of the files in their restored format and decide to accept or reject the restore. If it is accepted, then all of the files are restored in the actual file system. The user can also choose to restore on a per-file basis instead of taking the whole bundle.
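The check-then-preview part of a restore can be sketched as below, assuming the bundle has already been decrypted and consists of a gzip-compressed tar archive followed by a 32-byte HMAC-SHA256 tag. That layout, and the function names, are assumptions for illustration. The files are unpacked under a separate restore root so that nothing in the real file system is overwritten until the user accepts.

```python
import hashlib
import hmac
import io
import os
import tarfile
from typing import List

TAG_LEN = 32  # size in bytes of an HMAC-SHA256 output


def verify_bundle(bundle: bytes, auth_key: bytes) -> bytes:
    """Return the archive if its HMAC verifies; raise ValueError otherwise."""
    if len(bundle) < TAG_LEN:
        raise ValueError("bundle is too short to contain an HMAC")
    archive, tag = bundle[:-TAG_LEN], bundle[-TAG_LEN:]
    expected = hmac.new(auth_key, archive, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bundle failed authentication; refusing to restore")
    return archive


def extract_for_preview(archive: bytes, restore_root: str) -> List[str]:
    """Unpack regular files under restore_root and return their paths."""
    root = os.path.realpath(restore_root)
    os.makedirs(root, exist_ok=True)
    restored = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = os.path.realpath(os.path.join(root, member.name))
            # Defensive check: never write outside the preview directory.
            if os.path.commonpath([root, target]) != root:
                raise ValueError("unsafe path in bundle: " + member.name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            restored.append(target)
    return restored
```

Calling `verify_bundle` before any file is written means a bundle that was tampered with on the server or in transit is rejected before it can affect the local machine.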
One interesting feature of the scheme presented here is that there need not be any user authentication for a restore session. The servers can make all of the bundles available to the world. The strong encryption and authentication properties make them tamper evident and opaque to anyone who cannot obtain a user passphrase or break the authentication and encryption functions.
However, it is desirable to have some user authentication when a user performs a backup. Otherwise, attackers could fill up the disks on the servers with anything they wanted. Users should be strongly advised not to use their data backup passphrase to authenticate to the remote backup server.
6.7.2 Incremental Backups
In a typical backup scenario, the user selects a set of files in the file system to back up. The set is often given a name or an icon, and there is an easy mechanism for the user to execute a backup of that set of files. It is inefficient to back up all of the files in a particular set every time. A common technique for avoiding this is to perform a full backup periodically, in which all of the files are copied. Then, whenever a backup is needed in between full backups, an incremental backup is done. An incremental backup consists of copying only those files that have changed since the last backup. To accomplish this, a local database is maintained containing filenames and modification times for all files that have been backed up. When it is time for an incremental backup, the software checks this database to see which files in the file system have a more recent modification time than is shown in the database, and these files are backed up.
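The selection step for an incremental backup can be sketched as follows. The database is modeled here as a plain dictionary mapping each path to the modification time recorded at its last backup; that representation and the function name are assumptions for illustration.

```python
import os
from typing import Dict, Iterable


def changed_since_last_backup(paths: Iterable[str],
                              database: Dict[str, float]) -> Dict[str, float]:
    """Return {path: current_mtime} for files that need backing up.

    A file qualifies if it has never been backed up, or if its current
    modification time is more recent than the one in the database.
    """
    changed = {}
    for path in paths:
        mtime = os.stat(path).st_mtime
        if path not in database or mtime > database[path]:
            changed[path] = mtime
    return changed
```

After the incremental bundle has been stored successfully, the caller records the new times with `database.update(changed)`. Recording the times observed during selection, rather than reading them again afterward, avoids missing a file that is modified while the backup is in progress.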
To restore incremental backups, the system simply restores the most recent full backup and then restores each of the subsequent incremental backups, oldest first. Thus, files are restored to the view of the file system as of the last incremental backup. It is not necessary to back up the incremental backup database, as long as the order of incremental backups is maintained on the remote backup server. If the database gets trashed, then the next backup must be a full one.
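The replay order can be illustrated with a short sketch. Each backup is modeled as a hypothetical `(timestamp, kind, files)` tuple, where the timestamp sorts chronologically (for example, an ISO 8601 string) and `files` maps file names to contents. File deletions are not modeled, since the scheme described here does not track them.

```python
from typing import Dict, List, Tuple

Backup = Tuple[str, str, Dict[str, bytes]]  # (timestamp, "full" or "incremental", files)


def restore_view(backups: List[Backup]) -> Dict[str, bytes]:
    """Rebuild the file view from the latest full backup plus later incrementals."""
    ordered = sorted(backups, key=lambda b: b[0])
    full_positions = [i for i, b in enumerate(ordered) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup available to restore from")
    view: Dict[str, bytes] = {}
    # Start at the most recent full backup; later bundles overwrite earlier ones.
    for _, _, files in ordered[full_positions[-1]:]:
        view.update(files)
    return view


backups = [
    ("2003-01-01", "full", {"a.txt": b"a1", "b.txt": b"b1"}),
    ("2003-01-03", "incremental", {"a.txt": b"a3"}),
    ("2003-01-02", "incremental", {"b.txt": b"b2"}),
]
print(restore_view(backups))  # {'a.txt': b'a3', 'b.txt': b'b2'}
```

The ordering depends only on the timestamps attached to the stored bundles, which is why the local database is not needed for a restore.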
There are, however, some security considerations when doing incremental backups. Assuming that there is an adversary out to get you, there is a basic attack that could wreak havoc with your backups. If an attacker can change the modification times in the database, he can set the time ahead, and the system will not back up files, even though they have changed. In fact, in most deployed systems, an attacker could set the modification times of all files in the database to some time far into the future, and the software would probably not detect it.
The defense against this attack is straightforward. When I discussed performing backups, I described two keys, an authentication key and an encryption key, that are derived from the user passphrase and available in memory at the time of the backup. A secure remote backup system should use the authentication key to compute a MAC on the incremental backup database after every legitimate change. The MAC can be stored together with the database. The cryptographic properties of the MAC are such that nobody without the authentication key can modify either the database or the MAC without the change being detected. Of course, it is important that the MAC be verified before every incremental backup. Again, keep in mind that attacking the database only disrupts the backup process, not the restore process. The database is not used for restoring files.
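A minimal sketch of this defense follows, assuming the database is a dictionary of paths and modification times serialized as JSON and that the MAC is HMAC-SHA256. The on-disk format and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def save_database(database: Dict[str, float], auth_key: bytes, path: str) -> None:
    """Write the database together with a MAC computed over its contents."""
    body = json.dumps(database, sort_keys=True)
    mac = hmac.new(auth_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"mac": mac, "body": body}, f)


def load_database(auth_key: bytes, path: str) -> Dict[str, float]:
    """Verify the MAC, then return the database; raise ValueError on mismatch."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    body = record["body"]
    expected = hmac.new(auth_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes, in constant time.
    if not hmac.compare_digest(record["mac"].encode("utf-8"),
                               expected.encode("utf-8")):
        raise ValueError("backup database failed MAC check; do a full backup")
    return json.loads(body)
```

The backup software would call `load_database` at the start of every incremental backup and `save_database` after every legitimate update. A failed check is handled the same way as a lost database: the next backup is a full one.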