Becoming a Software Developer Part 3: Version Control for Fun and Profit
By Pete McBreen
Date: Apr 19, 2002
Article is provided courtesy of Addison Wesley.
Introduction
In this third article in my "Becoming a Software Developer" series, I'm going to step away from programming for a short while and look at an issue that affects all software developers: how to manage all of the changing versions of files as a project moves forward.
Most developers have been bitten by a small change somewhere breaking the application somewhere else. Normally it starts fairly innocently. You run your application to check that a new feature works, and discover that something else has stopped working. Being a careful developer, you try running the version you saved last night. Horror of horrors, that doesn't work either, even though you know it was working yesterday. In desperation, you try a different version on another machine; it sort of works, but other things are broken...welcome to the nightmare.
Avoid the Nightmare - Control Your Configuration
Once you start to see the extent of the problems with versions that work differently on different machines, you can be forgiven for asking yourself "Why did I ever become a software developer?" Unfortunately, once you get into the mess, getting out is really difficult. You're going to spend a lot of time with a puzzled look on your face, trying different options until eventually you discover what went wrong.
Rather than spend time learning strategies for digging yourself out, I think it's much better to take the time to learn how to avoid these problems in the first place:
Establish a baseline configuration. Make sure that if all else fails, you can always restore your development machine to a known state. Many teams do this by having a "standard image" that's installed onto all development machines. This ensures that any new machine can be quickly configured and that, if something goes wrong with an existing machine, it's easy to revert to a known state.
Whenever you install anything new, do lots of testing. Are you in the habit of downloading the latest and greatest versions of tools? Whenever you download something, do a complete check to make sure that everything else still works. If just one shared library is changed, lots of other things may break. When something does break, don't panic; that's what the baseline configuration is for.
Create a new baseline whenever you find a new configuration that works. A baseline is not much use if after reverting to the baseline you have to download a whole set of tools and install updated versions of applications from CD. Outdated baseline images are just a waste of time, since it's easy to lose a day or two while you install and configure all the applications and tools that weren't part of the baseline image.
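One way to make "do a complete check" less painful is to record what a known-good directory looks like and compare against that record after each install. The following is only a minimal sketch of that idea, not part of any imaging tool: the helper names build_manifest and compare_manifests are my own inventions for illustration, and it assumes that hashing file contents is an acceptable test for "has this changed".

```ruby
require "digest"

# Hypothetical helper: map each file under root (by relative path)
# to the SHA-256 digest of its contents.
def build_manifest(root)
  manifest = {}
  Dir.glob("**/*", File::FNM_DOTMATCH, base: root).each do |relative|
    full = File.join(root, relative)
    next unless File.file?(full) # skip directories and other non-files
    manifest[relative] = Digest::SHA256.file(full).hexdigest
  end
  manifest
end

# Hypothetical helper: report what differs between a saved baseline
# manifest and the current state of the same directory.
def compare_manifests(baseline, current)
  shared = baseline.keys & current.keys
  {
    added:   (current.keys - baseline.keys).sort,
    removed: (baseline.keys - current.keys).sort,
    changed: shared.select { |path| baseline[path] != current[path] }.sort
  }
end
```

Build a manifest of a tools or library directory while everything works, install the new tool, then build a second manifest and compare the two. Anything listed under :changed is a good first suspect when something else stops working.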
Daily Backups Are Your Friend
Absolutely nothing matches the sick feeling in the pit of your stomach when you realize that all the work you've done for the last six months has been lost due to a hardware fault that destroyed all the data on the disk. If you're lucky, you might get some relief by finding a few older versions of files on your laptop or on a long-forgotten floppy, but that's a small consolation for losing months of work.
Every day, without fail, save all of your valuable, project-specific files to permanent, offline media.
You only have to back up the valuable files. Anything you don't bother to back up is by definition not valuable, since by not backing it up you're stating that it doesn't matter if you lose it.
Keep the backups for each project separate. Keeping your projects separate ensures that you can always re-create an entire project from a single backup image. It's a real waste of time to have to restore from several different backup images in order to re-create a single project.
Periodically restore from the backups to prove that you can. Make sure that you can use your baseline image on a new machine and restore your project from the backups. Practice this every month or so. It's extremely embarrassing to discover that the backups haven't worked for the past six months.
Keep some old versions of the backups. You never know when you'll discover that the file you deleted a year ago was actually needed. The price of the storage space is cheap compared to the cost of re-creating old files.
Keep the backups in offline storage. After all, you never know when some malicious software will get through the firewall to erase or damage all of your files.
Keep the backups for really valuable projects in a fireproof safe. Sometimes a little bit of paranoia is a healthy thing.
Keep copies of backups for very valuable projects in an offsite fireproof safe. Every week or so, take a copy of the current backup and store it offsite, just in case your paranoia was justified.
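Several of the habits above (one backup per project, keeping older copies, and proving that a backup is usable) can be sketched in a few lines of Ruby. Treat this as an illustration under stated assumptions rather than a finished backup tool: the helper names backup_project, file_digests, and backup_matches? are hypothetical, the backup location is assumed to be a mounted directory that you later move offline, and "matches" here means only that every file has identical contents.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper: copy project_dir into
# backup_root/<project name>/<YYYY-MM-DD>/ and return that path.
# Dated directories keep each project separate and retain old versions.
def backup_project(project_dir, backup_root, date: Time.now)
  name = File.basename(File.expand_path(project_dir))
  target = File.join(backup_root, name, date.strftime("%Y-%m-%d"))
  FileUtils.mkdir_p(target)
  # Copying "project_dir/." copies the directory's contents, not the directory itself.
  FileUtils.cp_r(File.join(project_dir, "."), target, preserve: true)
  target
end

# Hypothetical helper: relative path => SHA-256 digest for every file under root.
def file_digests(root)
  Dir.glob("**/*", File::FNM_DOTMATCH, base: root)
     .select { |relative| File.file?(File.join(root, relative)) }
     .to_h { |relative| [relative, Digest::SHA256.file(File.join(root, relative)).hexdigest] }
end

# True when the backup holds exactly the same files, with the same contents.
def backup_matches?(project_dir, backup_dir)
  file_digests(project_dir) == file_digests(backup_dir)
end
```

Running backup_matches? immediately after backup_project catches a copy that silently failed. It is no substitute for the monthly practice restore onto a fresh machine, which also tests the baseline image.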
Effective Version Control
Your development machines have a baseline configuration and everything is being backed up daily. What more could you need? Simple: version control.
Configuration control and daily backups give you reasonable protection from hardware failure. Version control gives you a measure of protection from the mistakes you make while developing software. Version control helps in many ways:
Many developers can work on the same project without overwriting each other's files. Most text editors are fairly dumb: the last person to save wins. This means that it's really easy for two people to both work on the same file, and the next morning the unlucky one will discover that all of his or her edits from yesterday are missing. Version control software prevents this problem, either by requiring developers to "lock" the file before they change it, or by warning of incompatible edits when the changes are resaved to the version control system.
You can tag a related set of files as a coherent working version. When you get a set of classes all working together correctly, it's great to be able to label that related set of files and then continue making changes, knowing that you can always re-create that working set of files when needed.
When you make a mistake, you can easily revert to a known working version. Sometime or other you will try out a design idea and discover it doesn't really work well. Rather than lean on the undo key in your editor for 10 minutes, just check out a clean copy of the most recent version of the file.
Version control allows you to see who made what changes to a file. Some days, code that you know was working just breaks. By using the version control system to see what files have been changed recently, you can often shorten the time it takes to hunt for the mistake. Sometimes you'll be lucky and it will just be a matter of asking the person who made the last change whether he or she has any idea what might have gone wrong. Other times, you might have to go as far as looking at the differences between two versions of the same file before you can discover the mistake.
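To make the "last person to save wins" problem from the first point concrete, here is a small sketch of the warning approach: remember what the file looked like when you started editing, and refuse to save if it has changed in the meantime. This is not how RCS or CVS is implemented; the names begin_edit, save_edit, and StaleEditError are made up for illustration, and the check is not atomic, so it shows the idea rather than providing a real lock.

```ruby
require "digest"

# Raised when the file on disk no longer matches what the editor started from.
class StaleEditError < StandardError; end

# Hypothetical helper: record the file's digest at the moment editing begins.
def begin_edit(path)
  { path: path, digest: Digest::SHA256.file(path).hexdigest }
end

# Hypothetical helper: write new_content only if nobody else has changed
# the file since begin_edit was called for this session.
def save_edit(session, new_content)
  current = Digest::SHA256.file(session[:path]).hexdigest
  unless current == session[:digest]
    raise StaleEditError, "#{session[:path]} changed since you opened it"
  end
  File.write(session[:path], new_content)
end
```

If two people call begin_edit on the same file, the first save_edit succeeds and the second raises StaleEditError instead of silently discarding the first person's work. That is the situation a version control system resolves with a lock or a merge.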
To get all of these benefits, you must remember to check your work in frequently. I recommend checking in your changes every 30 to 90 minutes. This encourages me to make small, controlled changes to the code. This is very easy when using test-driven development (see Part 2 of this series), but can be harder to manage when using more traditional programming practices.
Whatever you do, don't delay checking in your changes beyond the end of your normal workday. Making very big changes means that you won't have any intermediate steps to revert to if (or rather when) you make a mistake late in the day. It can also mean a lot more hassle for the rest of your team when you eventually check in your files, since they suddenly have to deal with a large number of changes. Even worse, you might have locked someone out of a file that he or she needed to change, or your changes might be incompatible with other changes that have been made during the day.
Simple Version Control Using RCS
Although many different version control systems are out there, the ones I use the most often are RCS for my personal projects and CVS for commercial projects (normally with the WinCVS front end). Although CVS, the Concurrent Versions System, is really nice, I prefer the simpler command-line RCS tool for my solo projects. The main difference between the two is that RCS uses an exclusive checkout model, whereas CVS allows multiple people to work on a file at the same time, seamlessly merging the edits when the file is checked in. Having used both on large projects, I prefer CVS when working in a team because it allows many people to work simultaneously, but I find the extra complexity of CVS is not worthwhile for solo projects.
RCS itself is available from http://ftp.cvshome.org/rcs/; you need to download both diff27nt.zip and rcs57nt.zip. To install RCS, unzip the files into a convenient directory. (You'll probably want to make sure that this directory is part of your system PATH.) Once you've unzipped the files, enter the command rcs -V to confirm that everything is working okay (it should report RCS version 5.7).
Using RCS is very simple. To set it up, create an RCS directory inside your project directory, and then you can start to use RCS.
To check in a file, use the ci (check in) command. The first time through, RCS will ask you for a description of the file. The file is then removed from the project directory and saved away into the RCS directory.
D:\InformIT>ci CircularCounter.rb
RCS/CircularCounter.rb,v <-- CircularCounter.rb
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Sample file for the Ruby for the OO Nuby article
>> .
initial revision: 1.1
done
To check out the file so that you can just use it, use the co (check out) command. If you want to be able to edit the file, use co -l as the command. (The -l means to lock the file. RCS gives you an exclusive, writeable copy of the file that you can modify and check back in.)
D:\InformIT>co Odometer.rb
RCS/Odometer.rb,v --> Odometer.rb
revision 1.1
done

D:\InformIT>co -l Odometer.rb
RCS/Odometer.rb,v --> Odometer.rb
revision 1.1 (locked)
done
After you have updated the file, check it back into version control with the ci command, this time with a log message stating what you changed:
D:\InformIT>ci Odometer.rb
RCS/Odometer.rb,v <-- Odometer.rb
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Added some comments
>> .
done
Make your comments informative, because they'll help you understand how the file has been changed when looking through the revision history.
D:\InformIT>rlog -zLT CircularCounter.rb

RCS file: RCS/CircularCounter.rb,v
Working file: CircularCounter.rb
head: 1.5
branch:
locks: strict
    pete: 1.5
access list:
symbolic names:
    RubyNuby: 1.2
    TestDriven: 1.5
keyword substitution: kv
total revisions: 5; selected revisions: 5
description:
Sample counter class for Ruby for the OO Nuby article
----------------------------
revision 1.5 locked by: pete;
date: 2002-03-07 17:39:43-07; author: pete; state: Exp; lines: +4 -1
Deliberate mistake fixed, minimal decrement added
----------------------------
The symbolic names in the revision history above are the key to getting a coherent set of files out of a directory. They're added using this command:
rcs -Nlabel:version filename
If the version is omitted, RCS defaults to the highest version:
rcs -NRubyNuby:1.2 CircularCounter.rb
rcs -NTestDriven: CircularCounter.rb
Once these labels are in place, it's easy to grab a related set of files from RCS using the command co -rlabel, which checks out each named file at the revision that carries the specified label. (RCS will warn about any file that doesn't have that label.)
D:\InformIT>co -rTestDriven RCS\*.rb,v
RCS\CircularCounter.rb,v --> CircularCounter.rb
revision 1.5
done
RCS\Odometer.rb,v --> Odometer.rb
revision 1.2
done
RCS\OdometerTest.rb,v --> OdometerTest.rb
co: RCS\OdometerTest.rb,v: Symbolic name `TestDriven' is undefined.
The last bit of RCS that's very useful to know is the rcsdiff command, which lets you know what has changed between two versions. It compares the currently checked out file against the revision version you specify, 1.3 in this case (to make it more interesting, I first checked out the RubyNuby version):
D:\InformIT>co -rRubyNuby CircularCounter.rb
RCS/CircularCounter.rb,v --> CircularCounter.rb
revision 1.2
done

D:\InformIT>rcsdiff -r1.3 CircularCounter.rb
===========================================================
RCS file: RCS/CircularCounter.rb,v
retrieving revision 1.3
diff -r1.3 CircularCounter.rb
13c13
< if (value > @limit)
---
> if (value >= @limit)
And there you have it: a very simple, easy-to-use command-line version control system with just five commands:
ci Odometer.rb
co -rversion Odometer.rb or co -rversion -l Odometer.rb to edit the file
rcs -NRubyNuby:1.2 Odometer.rb
rcsdiff -r1.3 Odometer.rb
rlog Odometer.rb
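If you find yourself typing these five commands often, a thin wrapper can keep the flag syntax in one place. The module below is purely a convenience sketch: RcsCommands and its method names are hypothetical, not part of RCS, and it only builds the argument lists shown in this article. The run helper assumes the RCS executables are on your PATH.

```ruby
# Hypothetical wrapper that builds argument lists for the five RCS
# commands used in this article. Nothing here is part of RCS itself.
module RcsCommands
  module_function

  # ci file
  def check_in(file)
    ["ci", file]
  end

  # co [-rREV] [-l] file ; pass lock: true to get an editable, locked copy
  def check_out(file, revision: nil, lock: false)
    cmd = ["co"]
    cmd << "-r#{revision}" if revision
    cmd << "-l" if lock
    cmd << file
  end

  # rcs -Nname:REV file ; omitting the revision labels the highest one
  def label(file, name, revision: nil)
    ["rcs", "-N#{name}:#{revision}", file]
  end

  # rcsdiff -rREV file
  def diff(file, revision)
    ["rcsdiff", "-r#{revision}", file]
  end

  # rlog file
  def log(file)
    ["rlog", file]
  end

  # Run a built command without going through a shell.
  # Assumes the RCS tools are installed and on the PATH.
  def run(cmd)
    system(*cmd)
  end
end
```

For example, RcsCommands.run(RcsCommands.check_out("Odometer.rb", revision: "TestDriven", lock: true)) is the equivalent of typing co -rTestDriven -l Odometer.rb. Building the arguments separately from running them also makes the wrapper easy to test on a machine where RCS isn't installed.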