Now that we've inspected the existing format, the next step is to design the Java data structures that will hold this information inside the program after it is parsed. Sometimes you'll want to design custom Java classes that closely represent the data. If I were writing a budget analysis program, I might do that here. But because we're not planning to do anything more complex than write the data back out again as XML, the generic data structures in the Java Collections API will more than suffice.
The number of records is indefinite and grows from year to year as new budget items are added. Thus the list of records will be kept in a java.util.ArrayList. Any other implementation of java.util.List, such as a Vector or a LinkedList, would work equally well. After initial construction, I'll only access this object through the methods of the List interface, so the program will not depend on any implementation details of the list.
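As a minimal sketch of that idea, the class below names ArrayList in exactly one place and hands everything else a plain List. The class name RecordListSketch, the collect method, and the choice to hold raw lines as strings are assumptions made for illustration; they are not part of the budget program itself.

```java
import java.util.ArrayList;
import java.util.List;

class RecordListSketch {

    // Gathers the non-blank lines into a growable list.
    // This is the only place that names the concrete class; swapping in
    // LinkedList or Vector here would not affect any caller.
    static List<String> collect(String[] lines) {
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Callers see only the List interface.
        List<String> records = collect(new String[] {"first record", "", "second record"});
        for (String record : records) {
            System.out.println(record);
        }
    }
}
```

Because the return type is List rather than ArrayList, the list can grow to however many records a given year's file contains, and the implementation can be changed later without touching the code that reads it.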
The records themselves can be represented as arrays, vectors, instances of a custom class, hash tables, or maps. If the data is reasonably clean, I find it easier to use a custom class or a map. An array or vector works well when there may be extra data in some lines or perhaps missing information. In my initial experiments, the data proved to be fairly clean, so I chose to use a Map. The keys will be reasonable approximations of the field names, and they can be stored in a static array for easy extraction and iteration in a later part of the code. Again, there are no API calls that set this up for you. You have to do it yourself.
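Here is one way the static array of keys and the per-record map might fit together. This is only a sketch: the key names in KEYS are hypothetical stand-ins rather than the actual field names of the budget file, and the class and method names are invented for the example. It also assumes the line has already been split into field values.

```java
import java.util.HashMap;
import java.util.Map;

class RecordMapSketch {

    // Hypothetical keys, one per field, in the order the fields appear.
    static final String[] KEYS = {"AgencyName", "BureauName", "AccountName", "FY1982"};

    // Pairs each field value with the key at the same position.
    // Because the data is assumed to be clean, a line with too many or
    // too few fields is treated as an error rather than silently accepted.
    static Map<String, String> toRecord(String[] values) {
        if (values.length != KEYS.length) {
            throw new IllegalArgumentException(
                "Expected " + KEYS.length + " fields but found " + values.length);
        }
        Map<String, String> record = new HashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            record.put(KEYS[i], values[i]);
        }
        return record;
    }
}
```

Keeping the keys in one array means the same array can drive the loop that fills the map and, later, the loop that writes each field back out.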
When complete, you'll have a list of maps, one map for each record, as diagrammed in Figure 4.1. This is very close to the form of the input data and still requires manipulation before it's in the form needed for the output. Some manipulations are straightforward. For example, it's very easy to extract all of the data for 1982: just iterate through the list and pull out only the fields that are relevant to 1982 from each map. Other manipulations are more complex. For example, if you wanted to convert this into a hierarchical structure in which each bureau was part of its agency, you might need to use a sorted data structure or make multiple passes through the list. You might want to reorganize the data by calendar year instead of fiscal year. Or perhaps, beyond merely reorganizing, you might want to perform some calculations on the data, such as summing the total budget for each agency each year. Whatever output you want, it's just a matter of writing the code to generate it. Once the input data has been parsed, it's easy to write it out as XML.
Figure 4.1. The List of Maps Data Structure for the Budget
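To make two of those manipulations concrete, the sketch below pulls out a single year's data and sums one year's amounts by agency. The key names ("AgencyName", "AccountName", and year keys such as "FY1982") are hypothetical, as are the class and method names. It further assumes that each amount is stored as a string holding a whole number, and it skips any record that has no value for the requested year.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BudgetQueries {

    // Hypothetical key names for the identifying fields.
    static final String AGENCY = "AgencyName";
    static final String ACCOUNT = "AccountName";

    // One pass through the list: keep the identifying fields plus the
    // amount for the requested year, and drop every other year.
    static List<Map<String, String>> extractYear(
            List<Map<String, String>> records, String yearKey) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> record : records) {
            Map<String, String> slice = new HashMap<>();
            slice.put(AGENCY, record.get(AGENCY));
            slice.put(ACCOUNT, record.get(ACCOUNT));
            slice.put(yearKey, record.get(yearKey));
            result.add(slice);
        }
        return result;
    }

    // Sums the requested year's amounts for each agency.
    // A TreeMap is a sorted data structure, so the totals come back
    // ordered by agency name.
    static Map<String, Long> totalByAgency(
            List<Map<String, String>> records, String yearKey) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, String> record : records) {
            String raw = record.get(yearKey);
            if (raw == null) {
                continue; // no value for this year in this record
            }
            long amount = Long.parseLong(raw.trim());
            totals.merge(record.get(AGENCY), amount, Long::sum);
        }
        return totals;
    }
}
```

Neither method modifies the original list of maps, so the same parsed data can feed several different outputs.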