Designing an Application
Whether the object is to understand and modify an existing computer-based business system or to create an entirely new one, a large obstacle to successful engineering is our inability to analyze and communicate the numerous interacting activities that make up our business process. Conversational languages such as English prove too ambiguous to be effective. More formal languages are unintelligible to most functional (business) experts. What is needed instead is a technique that structures conversational language to eliminate ambiguity, facilitating effective communication and understanding.
Database modeling has progressed exponentially in recent years. There are many good tools available, creating competition, which further moves the techniques along. On the other hand, process modeling has not progressed as well. Database modeling tools cater to the current Relational Database Management Systems standard SQL (Structured Query Language). We still have only the beginnings of process models because there is no standard process language. We have Third Generation Languages (3GLs) and Fourth Generation Languages (4GLs) being used in a myriad of combinations (A 3GL is a high-level programming language that is structured, modular, and usually procedural. A 4GL is a high-level programming language that is non-procedural and declarative). We have COBOL, C, and Java mixed with PowerBuilder, Visual Basic, and various Web development tools used in the same shop and sometimes on the same project. Currently, many vendors are working to create process-modeling tools that will generate target code for many popular process program languages. Be aware that a good database design is not just a nice relational model but also one that supports the business processes with data manipulation SQL that provides for fast access coupled with integrity.
Why Create a Model?
Modeling is an effective technique for understanding and communicating that has been used for centuries. In a process model, extraneous detail is eliminated, thus reducing the apparent complexity of the system under study. The remaining detail is structured to eliminate any ambiguity while highlighting important information. Graphics (pictures, lines, arrows, graphic standards) are used to provide much of the structure, so most people consider process models to be pictorial representations. However, well-written definitions of the objects appearing in the model, as well as supporting text, are also critical to the model.
In engineering disciplines, the model is typically constructed before an actual working system is built. In most cases, modeling the target business process is a necessary first step in developing a database application. It becomes a road map that will establish the final destination. Deciding the exact functionality of the target destination is essential. It must be captured and represented in as much detail as possible. Figure 3.1 shows a sample model illustrating a database connected to multiple servers with MS SQL Server 7 databases.
A Model Is a Tool to Facilitate the Design Process
Once a model is created (see Figure 3.2 for an example of a database model) and is made available to the developers, refinement of the product can be accomplished at a more logical level, increasing the likelihood of a successful software product. Even though there are tools available to facilitate the construction of a model, no tool is able to tell you what you want to do. Formulating your requirements with a tool such as Visio, which can be used to create graphical representations of almost anything, can be helpful (see Figure 3.1). Although you will still need to do the modeling with the business experts, these tools can help with presentation and control. For more information regarding the Visio product, visit http://www.microsoft.com/office/visio/.
Figure 3.1 Model of database/server configuration.
Figure 3.2 Model of a database for a sample Web application.
With process modeling, you can look at a system of interest in depth. This allows you to analyze, understand, and communicate to others the subtle nuances of the organization. There are several reasons to begin a process-modeling project. First, it is an excellent way to document a multiple-step business process. One of the rules of thumb of business is that the longer a document is, the less it will be read. You can look over a short document with a Visio or PowerPoint diagram in a reasonable amount of time. Process modeling is also a valuable technique for gaining consensus on what is being accomplished and quickly proposing alternative approaches. It costs much less to develop a model than it does to develop a whole information system, especially if the new approach merely creates a new set of problems and inefficiencies.
Develop an Entity Relationship
Once you have a basic model of the business process functionality, you should gather information about the data requirements to support that functionality. Before computer aided software engineering (CASE) tools became available, this was a tedious process. Even CASE was sometimes too large to handle. Thankfully, tools have evolved and now there are a host of viable alternatives (for example, ERwin, Together, Rational Rose).
Entity Relationship Basics
Database design is critical. The degree to which it is completed before full-scale development can make or break the development process. The process of designing an application database is unique to each development environment. We will discuss it in general terms and then we will explore how CASE tools can facilitate the process. The specifics of the process may vary from shop to shop.
Determine the Application Entities
An entity is a person, place, object, event, or activity that is of interest to the functionality being created. It can be thought of as any noun that can represent information of importance to the organization. In logical database design, an entity/relationship model is built by identifying the entities and their attributes as well as the relationships between the different entities.
Database logical design begins as soon as you meet the users. Entities can be determined through interviews with the users. As an example, when developing a database to track the schedules at a television station, entities may include series and episodes. The attributes of an entity are the things that describe and define it; for example, a series has a descriptive name, so name is an attribute of series. The date it was originally produced may also be an attribute.
Relationships can also be revealed through user interviews. To further define a relationship, if an entity cannot exist without a parent, it is dependent (that is, it belongs to another entity). This is a relationship.
Subentities are important subsets of an entity with special attributes. To continue the example of a television station, a TV series may be of type special with attributes that only pertain to a special series. We might have a subentity called special. Special is a part of a series (that is, it could be merged with series and has the same primary key), while a dependent entity, such as an episode belonging to a series, has a primary key that contains the parent key and an additional key to identify the dependent.
Refine Each Entity and Attribute
After you have collected the business entities and attributes from interviews with the user, the project standards should be checked for naming conventions. When applicable, these objects should be checked using firm-wide abbreviations to ensure consistent naming. Next, description and validation criteria for each entity and attribute should be captured. If the project has a data dictionary, the description and validation criteria should be entered in it. In addition, the dictionary should be checked to determine whether the attribute or entity exists already.
PowerBuilder has its own catalog tables that can house descriptions and validation rules. These attributes can be populated from CASE tools such as ERwin or PowerDesigner. (For more information about ERwin, go to http://www.cai.com/products/alm/erwin.htm; for more information about PowerDesigner, go to http://www.sybase.com/products/enterprisemodeling/powerdesigner.) The primary key for each entity should be determined. The primary key of an entity uniquely identifies entity instances (rows). The primary key of a dependent entity includes parent key and descriptive column. In our television series example, the primary key for the episode entity is the series_code and episode_code concatenated. You can document past or future states in detail by adding last_update_timestamp. The primary key will promote database integrity by preventing table row duplication and the creation of database orphans (that is, rows in a dependent table with no parent).
Determine Relationships
Relationships can be determined through interviews with the user. The roles of entities in recursive relationships should be documented. A series may be a part of another series. Determine the cardinality of the relationship, which is the number of instances of one entity associated with the instances of another entity. For example, how many episodes are there in a series? This will facilitate the choice of keys or indices to access the entity.
Relationships should not be confused with entities or attributes. Attributes that designate entities are relationships. If the primary key of an entity consists of other primary keys, it may be a relationship. In our example, the TV episode entity would have a primary key consisting of the TV series identifier and the episode identifier. The TV series identifier is the primary key of the series entity.
CASE Tools "Lite"
So how do you create an entity relationship diagram in the new world? ERwin and PowerDesigner are generally considered to be the best choices for serious developers when it comes to CASE tools. They are similar and both perform the really important functions. In this book, we will use ERwin for illustrating the capabilities of these newer CASE tools. ERwin's stated focus is on quickly creating high quality, robust physical databases. ERwin is sold as a database design tool. ERwin does not create processing modeling diagrams, just entity relationship diagrams (these are known as ER diagrams; hence the name ERwin). ERwin's specialization is on the physical side. Around the time of this book's publication, ERwin 3.5 was released with support for MS SQL Server 7 and several other database management systems. When you finish modeling the database, you can have ERwin build the data definition language. This is essentially what has been missing from upper CASE tools.
Using old, heavy, and complex CASE tools such as ADW, you may have found that even after spending a couple of years designing something, you still didn't know much more about good relational database design, which is critical for client/server success. The newer, more lightweight tools, such as ERwin and PowerDesigner, show you the physical definition right from the start, not hiding it like these older CASE tools do. To build a good database model for physical implementation using MS SQL Server 7, you have to think like a relational database.
Benefits of "Lite" CASE
We can break the benefits of "lite" CASE into two categories. The first one you will notice is increased productivity. This is true for those projects that already have an existing database, and also for new projects with a blank slate for a data model.
The second benefit you will see after you create your database. The quality and robustness of the physically generated SQL DDL (Data Definition Language) is unsurpassed by even the largest and most expensive CASE tools on the market. An interesting point to note is the duration of ERwin's usefulness on a project. I have found that when tools such as ERwin are chosen, they are used throughout the entire project, without exception. Where upper CASE tools (LBMS, ADW, and so on) are used, the tool is usually abandoned somewhere during the initial development phase. I've never seen a project that adopted an upper CASE tool in the middle of development. However, on many occasions, I've seen projects that weren't using any form of CASE tool easily pick up ERwin in midstream and keep with it to the end.
Data Modeling Using ERwin
ERwin uses a traditional methodology called IDEF1X. IDEF1X is a method for designing relational databases (see http://www.idef.com/idef1x.html for more information). Not unlike other tools, with ERwin you start creating an ER diagram by placing entities (tables) on your diagram and adding relationships between them. Figure 3.3 shows the ERwin environment, which allows you to do this. MS SQL Server has its own diagramming tool, but it is immature at this stage and as such is unusable.
Figure 3.3 ERwin environment depicted with a sample application database.
Reverse Engineering
Sometimes the data we will use is contained in a legacy data model. Both ERwin and PowerDesigner include the capability to reverse engineer from any of the supported databases. Reverse engineering is the process of examining the previously existing table structure and getting your ERwin data model up to date with what's going on in the real database. In previous versions of ERwin, reverse engineering required you to have the original SQL DDL that was used to create your tables. However, quite often you can find yourself with a DDL that is out of date, or, in the worst case, you may have no DDL. ERwin 4.0 solves this problem by providing a reverse engineering option in the Tasks/Reverse Engineer menu. Figure 3.4 is the first in a series of wizard-like dialogs for the reverse engineering process.
Figure 3.4 Setting the target server to SQL Server.
If you prefer, you can still use the traditional approach in the latest version. You are basically using the Open option from the File menu to read in SQL and create a model. Here are the steps involved:
Start ERwin.
From the Tasks menu, select Reverse Engineer.
Specify the DBMS type that the SQL syntax is written for.
In the Reverse Engineer window, you now have several options. You can specify which components of SQL you want to capture. You must reverse engineer tables for obvious reasons, but you can capture foreign keys and indexes as well. You also have the ability to set case conversion options. Lastly, you can display the parse of the SQL as it is happening. When all options are set, press the Next button. Now would be a good time to calculate how much time you are saving by letting ERwin do the reverse engineering as opposed to doing it yourself! Unfortunately, you may not even have time to do thatit's pretty fast. It even runs in the background, so you can minimize and go do something else.
ERwin 4.0 has a much better method of reverse engineering than previous versions offered. With this capability, ERwin will actually connect to your DBMS and read directly from the system catalogs. This is not a one-time process, but an ongoing synchronization. Basically, you connect to the MS SQL Server 7 database you want to re-engineer. Then you choose Update Model or Alter Database from the Tasks menu bar item. The ERwin tool will read the database system catalog and build the model based on the tables, columns, indices, and so on found in the system catalog. (See Figures 3.5 and 3.6.) If a developer makes a change directly on the database, it is very easy to pick that up the next time you synchronize. This capability can only strengthen ERwin's projected longevity.
Figure 3.5 ERwin setting options for Reverse Engineer.
Figure 3.6 Choosing the file containing the DDL to be reverse engineered.
Entity Modeling
Whether you are starting from scratch or working with an existing database, there are two types of entities: independent and dependent. Independent entities are those that can be uniquely identified without depending on relationships with other entities. Conversely, dependent entities cannot be identified uniquely without depending on relationships with other entities. Both types are available within the ERwin environment shown in Figure 3.3. Sharp cornered entities are independent; entities with rounded corners are dependent. To add an entity to your diagram, select the appropriate icon from the toolbox and click anywhere on your diagram. ERwin will give your entity a default name. To change the name, you will need to use the right mouse button and click on the entity to bring up the list of available editors for that entity. Invoke the Entity-Attribute Editor.
The Entity-Attribute Editor (shown in Figure 3.7) allows you to enter not only the entity name, but also the attributes (fields, columns) for the entity. When you enter an attribute, you need to decide whether it forms part of the primary key or not, and enter it into the appropriate window. Keep in mind that what we are entering so far are only logical names for the attributes and the entity. These names can have spaces in them. The logical names will be used to generate default physical names. It might be wise to limit your logical names according to the specifications of your particular database. That way, you may never need to adjust your physical names.
Figure 3.7 Entity-Attribute Editor.
The Column Editor (shown in Figure 3.8) allows you to enter the data type and other physical properties of the particular attribute (table column) within each entity.
Relationship Modeling
One of the nicest features of ERwin is its foreign key migration capability. This is a feature you really need to understand before you start adding relationships between entities. In the IDEF1X terminology, there are two types of relationships between entities: identifying and non-identifying. Identifying relationships are used when a child entity is identified through its association with the parent entity. In other words, the foreign key column of the child table is also part of the primary key. Non-identifying relationships indicate that the child entity is not identified by its relationship to the parent entity. Identifying relationships and non-identifying relationships are represented by solid and dashed lines, respectively.
Figure 3.8 Column Editor.
Relationships in ERwin are generally referred to as foreign keys on the physical level. Normally, when you create foreign keys you first define the parent table (with a primary key) and the child table (with some columns in common). Afterwards, you add a foreign key on the child table's related columns, and make that point to the primary key of the parent table. You'll always get an error if the number of columns in the foreign key is not the same as the number of columns in the primary key of the parent table (or if the data types don't match). This is where the foreign key migration comes into play.
To create a relationship in ERwin, select the appropriate icon (Identifying or Non-Identifying) from the ERwin toolbox and click first on the parent table and then on the child table. ERwin will automatically migrate the primary key attributes of the parent table into the child table. If you draw an identifying relationship (solid line), the primary key attributes of the parent table will migrate into the primary key section of the child entity. Conversely, non-identifying relationships will migrate the primary key columns of the parent table into the non-key area of the child entity.
Take a look at Figure 3.9. It shows a dialog that is defining a non-identifying relationship. When completed, the non-identifying relationship causes the employeeId number to be automatically inserted in the non-primary key area of the employees table.
The best part about this foreign key migration is that it is dynamic. If you change the data type of a primary key column in a parent table, the change is reflected in all child tables. If you add columns to the primary key of a parent table, they are migrated down to the child tables.
Figure 3.9 ERwin relationship definition.
Referential Integrity
After a relationship has been added between tables, ERwin lets you control the referential integrity between them. To access this feature, you need to right-click on the relationship line to bring up the relationship's menu, and select RelationshipEditor. This will display the referential integrity actions (RI Actions) tab of the Relationship Editor (see Figure 3.10).
Figure 3.10 The Relationship Editor's referential integrity actions.
Once in the RI Actions table of the Relationship Editor, ERwin will display the name of the relationship. The first thing you can do is change the verb phrase and physical name of the relationship. This will be used for the name of the foreign key.
ERwin allows you to control the behavior for an insert, delete, or update on either the parent or the child. Your options for any given action are Restrict, Cascade, Set Null, Set Default, and None:
The Restrict option will cause the database to return an error if the action will violate referential integrity. For example, when a Parent Delete-Restrict setting is in effect, if an attempt is made to delete a customer who has orders, the deletion will fail.
The Cascade option will cause the database to apply any changes between the related entities. A Parent Delete-Cascade setting would cause an attempt to delete a customer with orders to also delete the associated orders. A Parent Update-Cascade setting would result in any attempt to change a customer number (the primary key of the Customer table) also changing the customer number for all orders that belong to this customer.
The Set Null option will cause the database to make the foreign key columns in the child table null during a deletion or an update. For example, a Parent Delete-Set Null option would result in Order records having a null customer number for any orders that once belonged to a deleted customer.
The Set Default option is very similar to Set Null, but instead of setting the foreign key columns to null, it resets them to their original default values.
The None option means that no action is taken to ensure referential integrity.
Not all of the options are available for each action. For example, a Child Update-Set Null setting doesn't make much sense if you think about it for a minute; why would you want to set columns in the child to null just because an update to the parent occurs? Also, not all of the options are available for foreign key declaration syntax in your particular DBMS. In fact, most of the options are not supported by the popular DBMSes. However, ERwin provides physical support for all of its options through the use of triggers. If your DBMS supports triggers, you can take advantage of most, if not all, of the options.
ERwin accomplishes this through a set of trigger templates. These templates are created with a large and feature-filled set of macros. You can customize and even create your own triggershowever, the macro language is poorly documented. This same macro language has now been extended to support stored procedures and ad hoc scripts. You can create your own stored procedure templates and apply them to tables or to the schema as a whole. This can be a powerful means for creating standard select, insert, update, and delete stored procedures to use in your DataWindows.
Supported Databases
One of ERwin's strengths is its support for a wide variety of databases. Figure 3.11 shows the Reverse EngineerSelect Target Server dialog with all of the supported databases. ERwin doesn't make a big deal out of it, but you can easily change databases on the fly. ERwin maintains a datatype map between databases, so the process of porting from one database to another is only a few clicks away.
Figure 3.11 Supported databases.
It may also be noteworthy to mention that ERwin/ERX now supports physical storage parameters for most DBMSes including MS SQL Server 7, Informix, DB2, Oracle, and Sybase to name just a few.