Stages of Reverse Engineering
With forward engineering, developers first analyze an application and prepare a model of its intent. Only then do they design the application, that is, choose strategies for solving the problem. Finally, they implement the application by writing the database and programming code.
As Figure 1 shows, you can organize reverse engineering on a corresponding basis. You start with an implementation of an application (primarily the database structure) and determine dependencies between fields. This yields a design model that you can then abstract and reconcile with other inputs. The ultimate result is an analysis model of the conceptual intent.
Figure 1. Forward vs. reverse engineering. You can organize them on a similar basis.
You can reverse engineer by constructing models that describe the existing software and the presumed intent. This process has three main stages:
Implementation recovery. Quickly learn about the application and prepare an initial model.
Design recovery. Undo the mechanics of the database structure and resolve foreign key references.
Analysis recovery. Remove design artifacts and eliminate any errors in the model.
In implementation recovery, you prepare an initial model (see Figure 2) that forms the basis for reverse engineering. Because the initial model will serve as a reference, it should purely reflect the implementation and have no inferences.
Figure 2. Implementation recovery. Quickly learn about an application and prepare an initial model.
The first task is to browse existing documentation and learn about an application. The resulting context clarifies the developer's intent and makes it easier to communicate with application experts. You should finish this task in a few hours. What you learn is incidental to the actual reverse engineering, but it is important because it helps you notice more as you proceed.
The next step is to enter the database structure into a modeling tool, either by typing or by automation. Some tools can read the system tables of an RDBMS and seed a model. If you use these tools, you should at least skim the database structure to get a feel for the development style. There are four steps to converting database structures into a model.
Create tentative entity types. Represent each physical data unit (COBOL record, IMS segment, CODASYL record type, or RDBMS table) as an entity type. Give each entity type the same name as its corresponding physical data unit.
Create tentative relationship types. For a CODASYL application, represent the set types as relationship types. Otherwise, defer relationship types until design.
Create tentative attributes. The data elements in the legacy system become attributes of the entity types. Indicate not-null restrictions, data types, and lengths if the information is available.
Note keys and indexes. Note primary keys, candidate keys, and foreign keys if they happen to be defined. Otherwise, note unique and secondary indexes.
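The four steps above can be sketched for one particular kind of RDBMS. What follows is a minimal illustration, not a description of any specific modeling tool. It assumes SQLite, read through Python's standard sqlite3 module, and a hypothetical two-table schema. The Attribute and EntityType classes and the recover_implementation function are names invented for this example. The sketch records only what the catalog declares and makes no inferences.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str
    not_null: bool


@dataclass
class EntityType:
    name: str  # same name as the physical table
    attributes: list = field(default_factory=list)
    primary_key: list = field(default_factory=list)
    foreign_keys: list = field(default_factory=list)  # declared ones only
    unique_indexes: list = field(default_factory=list)
    secondary_indexes: list = field(default_factory=list)


def recover_implementation(conn):
    """Seed an initial model from the SQLite catalog, with no inferences."""
    model = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()]
    for table in tables:
        # Step 1: one tentative entity type per table, same name.
        entity = EntityType(name=table)
        # Step 2 (relationship types) is deferred until design recovery.
        # Step 3: attributes. Rows are (cid, name, type, notnull, default, pk).
        pk_columns = []
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")').fetchall():
            entity.attributes.append(Attribute(name, col_type, bool(notnull)))
            if pk:
                pk_columns.append((pk, name))
        # Step 4: keys and indexes, only as far as they are declared.
        entity.primary_key = [name for _pos, name in sorted(pk_columns)]
        # Rows are (id, seq, parent_table, from_column, to_column, ...).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            entity.foreign_keys.append((row[3], row[2], row[4]))
        # Rows are (seq, name, unique, origin, partial).
        for row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
            index_name, is_unique, origin = row[1], row[2], row[3]
            if origin == "pk":
                continue  # already recorded as the primary key
            columns = [r[2] for r in conn.execute(
                f'PRAGMA index_info("{index_name}")').fetchall()]
            if is_unique:
                entity.unique_indexes.append(columns)
            else:
                entity.secondary_indexes.append(columns)
        model[table] = entity
    return model


# Hypothetical legacy schema, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        badge_no TEXT NOT NULL,
        dept_id INTEGER
    );
    CREATE UNIQUE INDEX employee_badge ON employee (badge_no);
    CREATE INDEX employee_dept ON employee (dept_id);
""")
model = recover_implementation(conn)
for entity in model.values():
    print(entity.name, entity.primary_key,
          entity.unique_indexes, entity.secondary_indexes)
```

In this hypothetical schema, employee.dept_id has no declared foreign key, so the initial model records only the secondary index on it. Deciding whether it refers to department is left to design recovery.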
During design recovery, you undo the mechanics of the database and perform only straightforward actions (see Figure 3).
Figure 3. Design recovery. Undo the mechanics of the database structure.
You should postpone conjecture and interpretation until the analysis-recovery stage. Typically, you can perform design recovery autonomously, without help from application experts. During this stage, you resolve three main issues.
Identity. Most often, unique indexes will be defined for the candidate keys of the entity types. Otherwise, look for unique combinations of data; such data can suggest, but do not prove, a candidate key. You can also infer candidate keys by considering names and conventions of style. A suspected foreign key may imply a corresponding candidate key.
Foreign keys. Determining foreign keys (references from one table to another) is usually the most difficult aspect of design recovery. Matching names and data types can suggest foreign keys. Some DBMSs, such as RDBMSs, let developers declare foreign keys and their referents, but (unfortunately) most legacy applications do not use this capability.
Queries. When queries are available, you can use them to refine your understanding of identity and foreign keys.
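The identity and foreign key checks described above can be partly mechanized. The sketch below is one possible approach under stated assumptions: a SQLite database with no declared keys, a hypothetical department/employee schema, and invented helper names (columns_of, unique_combinations, suggest_foreign_keys). It looks for column combinations whose current values are unique, then pairs columns with same-named, same-typed suspected keys in other tables and counts values that have no match. Every result is a hypothesis to review, not a proven key.

```python
import sqlite3
from itertools import combinations


def columns_of(conn, table):
    """Return (name, declared_type) pairs for a table."""
    return [(row[1], row[2].upper())
            for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall()]


def unique_combinations(conn, table, max_size=2):
    """Column sets whose current values are unique and non-null (suggestions only)."""
    names = [name for name, _type in columns_of(conn, table)]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            # Skip supersets of a combination already found to be unique.
            if any(set(smaller) <= set(combo) for smaller in found):
                continue
            cols = ", ".join(f'"{c}"' for c in combo)
            not_null = " AND ".join(f'"{c}" IS NOT NULL' for c in combo)
            distinct = conn.execute(
                f'SELECT COUNT(*) FROM '
                f'(SELECT DISTINCT {cols} FROM "{table}" WHERE {not_null})'
            ).fetchone()[0]
            if total > 0 and distinct == total:
                found.append(combo)
    return found


def suggest_foreign_keys(conn, tables, suspected_keys):
    """Match columns to suspected single-column keys by name and data type."""
    suggestions = []
    for child in tables:
        for col, col_type in columns_of(conn, child):
            for parent, key_col in suspected_keys:
                if parent == child or col != key_col:
                    continue
                if dict(columns_of(conn, parent))[key_col] != col_type:
                    continue
                # Count child values with no matching parent value.
                orphans = conn.execute(
                    f'SELECT COUNT(*) FROM "{child}" '
                    f'WHERE "{col}" IS NOT NULL AND "{col}" NOT IN '
                    f'(SELECT "{key_col}" FROM "{parent}")'
                ).fetchone()[0]
                suggestions.append((child, col, parent, key_col, orphans))
    return suggestions


# Hypothetical legacy tables with no declared keys or indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'Ann', 10), (2, 'Bob', 10), (3, 'Ann', 20);
""")
tables = ["department", "employee"]
candidates = {t: unique_combinations(conn, t) for t in tables}
single_keys = [(t, combo[0]) for t in tables
               for combo in candidates[t] if len(combo) == 1]
suggestions = suggest_foreign_keys(conn, tables, single_keys)
print(candidates)
print(suggestions)
```

With this sample data the employee table yields two suggestions, emp_id and the pair (emp_name, dept_id). The second is unique only by accident in three rows, which shows why unique data can suggest a candidate key but cannot prove one.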
The final product of design recovery still reflects the DBMS paradigm and may include optimizations and errors. In practice, the model will seldom be complete. Portions of the structure may be confusing.
The final phase is analysis recovery: interpret the model, refine it, and make it more abstract (see Figure 4). It is primarily during this phase that you should consult with available application experts. Analysis recovery consists of four main tasks.
Figure 4. Analysis recovery. Remove the artifacts of design and eliminate any errors in the model.
Clarification. Remove any remaining artifacts of design. For example, an analysis model need not include file and database access keys; they are merely design decisions and contain no essential information.
Redundancy. Normally remove derived data that optimize the database design or that were included for misguided reasons. You may need to examine data before determining that a data structure is a duplicate.
Errors. Eliminate any remaining database errors. I include this step during analysis recovery because you must thoroughly understand the database before concluding that the developer erred. In the earlier stages, an apparent error could instead have been a reasonable practice or the result of incompletely understanding the database.
Model integration. Multiple information sources can lead to multiple models. For example, it is common to have a reverse-engineered model from study of the structure and data. A forward-engineered model might be prepared from a user manual. The final analysis model must fuse any separate models.
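Examining data to test for redundancy, as the Redundancy task above suggests, can be as simple as comparing a stored value against the value you suspect it duplicates. The following sketch is an illustration only. It assumes SQLite, a hypothetical orders/order_line schema in which orders.item_count is suspected of being a stored count of line items, and an invented helper named mismatches.

```python
import sqlite3


def mismatches(conn, stored_query, derived_query):
    """Keys whose stored value differs from the value derived from other data.

    Both queries must return (key, value) rows.
    """
    stored = dict(conn.execute(stored_query).fetchall())
    derived = dict(conn.execute(derived_query).fetchall())
    return sorted(key for key in stored.keys() | derived.keys()
                  if stored.get(key) != derived.get(key))


# Hypothetical data: item_count is suspected to be derived from order_line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, item_count INTEGER);
    CREATE TABLE order_line (order_id INTEGER, line_no INTEGER);
    INSERT INTO orders VALUES (1, 2), (2, 1), (3, 5);
    INSERT INTO order_line VALUES (1, 1), (1, 2), (2, 1), (3, 1);
""")
suspect = mismatches(
    conn,
    "SELECT order_id, item_count FROM orders",
    "SELECT order_id, COUNT(*) FROM order_line GROUP BY order_id",
)
print(suspect)
```

An empty result supports, but does not prove, the hypothesis that the column is derived. A non-empty result, such as order 3 in this sample, calls for a closer look before you decide whether the column is redundant, the data contain an error, or the column means something else.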
The reverse engineering process is, of course, somewhat idealistic and not quite as neatly divided as the three stages imply. In practice, there is much iteration and backtracking. Portions of a model may proceed more rapidly than others. You will also need to backtrack to correct occasional mistakes and oversights. Nevertheless, the process provides a useful starting point, even for complex problems.
Reverse Engineering Principles
Several broad principles govern the reverse engineering process.
Don't mistake hypotheses for conclusions. Reverse engineering yields hypotheses. You must thoroughly understand the application before reaching firm conclusions.
Expect multiple interpretations. There is no single answer as in forward engineering. Alternative interpretations of the database structure and data can yield different models. The more information that is available, the less the judgments of different reverse engineers should vary.
Don't be discouraged by approximate results. It is worth a modest amount of time to extract 80 percent of an existing database's meaning. You can use the typical forward engineering techniques (such as interviewing knowledgeable users) to obtain the remaining 20 percent. Many people find this lack of perfection uncomfortable because it is a paradigm shift from forward engineering.
Expect odd constructs. Database designers, even the experts, occasionally use uncommon constructs. In some cases, you won't be able to produce a complete, accurate model of the database because that model never existed.
Watch for a consistent style. Databases are typically designed using a consistent strategy, including consistent violations of good design practice. You should be able to deduce the underlying strategy.