- Views of Data
- A Brief History of Data Architecture
- Advanced Data Management·Meta-data
- Graphics·Data Modeling
- Using Entity/Relationship and Object Models
- Data Modeling Conventions
- Entity/Relationship Model Validation
- The Requirements Analysis Deliverable·Column One
- Data and the Other Columns
Data Modeling Conventions
Data modeling is a relatively new field, and standards have not yet fully been laid out. There are actually three levels of conventions to be defined in the data modeling arena:
The first is syntactic, about the symbols to be used. Barker's technique, the UML, and IDEF1X are all examples of syntactic conventions.
Positional conventions dictate how entity types are laid out. These guide the shape of the model. Often they are not followed, resulting in models that are chaotic, confusing, and very difficult to read.
Semantic conventions describe standard ways for representing common business situations. Semantic conventions are relatively new in the industry. They were first described in 1995 book by David Hay, Data Model Patterns: Conventions of Thought, and then in the 1997 book, Analysis Patterns, by Martin Fowler. Other books on the subject have been published since then.
These three sets of conventions are, in principle, completely independent of each other. Given any of the syntactic conventions described here, you can follow any of the available positional or semantic conventions. In practice, however, promoters of each syntactic convention typically also promote at least particular positional conventions.
Different sets of syntactic conventions are presented in Appendix B of this book. The appendix is organized in terms of the appropriate audience for each. The basic elements are the same across all the techniques, but each approach is concerned with different kinds of details, as was deemed appropriate for its particular audience.
All notations have symbols for entity types (or object classes), relationships (or associations), and attributes. The Barker notation has an additional symbol to represent the case where an entity type may be related to one or another different entity types, but not both. IDEF1X has special symbols to represent foreign key implementations. The UML has room for describing the behavior of entity types, and the ability to describe business rules between relationships.
Positional—The Crow's Foot Rule
The Barker technique imposes an additional constraint upon the drawing, over and above determining what symbols to use to represent different things: Entity types on a drawing are to be arranged so that the crow's feet in the relationships point either to the top of the diagram or to its left. In addition, boxes are stretched so that all relationship lines are straight, without “elbows”.
This has the effect of providing some sort of “shape” to the drawing. A drawing with a random assortment of boxes and lines going every which way is very difficult to read. Figure 3.24, for example, shows a data model organized randomly. What is the model about? Hard to say, really. People seem to be visiting someone. A block party?
Figure 3.24. A Random Model.
Following the positional convention has the effect of placing the reference entity types (those that don't depend on any others) in the lower right, and the intersect or transaction entity types in the upper left. This makes it much easier to see the basic elements in the model, separately from what is done to them. Figure 3.25 is the model from Figure 3.24 rearranged according to this rule. Here it is easier to see that the model is basically about STUDIES of PEOPLE. OBSERVATIONS of VARIABLES are then collected on PEOPLE during these STUDIES.
Figure 3.25. An Orderly Model.
There are those with the opinion that crow's feet should point down and to the right. While your author stands firmly against this heresy, the fact of the matter is that as long as you are consistent, you will achieve the same result.
Semantic—Data Model Patterns
If you follow the crow's foot rule, you will begin to notice certain things about your models. The aesthetics of the drawing suggest similarities. Figure 3.26, for example, shows a model of PURCHASE ORDERs, each of which is to a VENDOR and composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE.8
Figure 3.26. Purchase Orders.
Figure 3.27, on the other hand, shows a model of sales orders. Sales orders, of course, are handled by a completely different department from purchase orders, so there would seem to be no relationship whatsoever between them. In this figure, a SALES ORDER must be from one CUSTOMER and composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE.
Figure 3.27. Sales Orders.
As you look at these two models, what do you see? First, the structures of SALES ORDER and PURCHASE ORDER are identical. Each must be composed of one or more LINE ITEMS, each of which is for a PRODUCT TYPE. The big difference is that a PURCHASE ORDER must be to a VENDOR, and a SALES ORDER must be from a CUSTOMER. But what are CUSTOMER and VENDOR? Each of these is fundamentally either a PERSON or an ORGANIZATION. Having said that, we can now recognize that our own enterprise is in fact an ORGANIZATION.
An ORGANIZATION or a PERSON is only a “customer” if it is a buyer in a (purchase or sales) ORDER. It is only a “vendor” if it is a seller in such an ORDER.9 Thus, the real model is of an ORDER that has two relationships to PARTY: it must be from one PARTY and to another PARTY. Each party may be the buyer in one or more ORDERS, and each PARTY may be the seller in one or more ORDERS. This is shown in Figure 3.28.
Figure 3.28. Orders.
The vendor's sales order is exactly the same order as the customer's purchase order. If we are the buyer, we call it a PURCHASE ORDER. If we are the seller, we call it a SALES ORDER. Any order always has both a buyer and a seller. Moreover, some organizations might well be both buyer and seller. The fact of the matter is that our organization is assuming one or the other of those roles.
Note that both PERSON AND ORGANIZATION share roles in an ORDER. In a large model, they will share other roles as well, so it is useful to define a super-type PARTY which is defined as either a person or organization of interest to the enterprise.
It turns out that this model for ORDER (which could also be called CONTRACT or AGREEMENT, or whatever) is common to any enterprise that does business with others. Once you recognize that this is the generic pattern for orders, you can now use it for almost any company. It may have to be elaborated on, but the underlying structure will be pretty much the same, whether the company is selling advertising for a cable television network, gasoline, pharmaceuticals, aluminum futures, or what have you.
In addition, there are other patterns available that apply to most commercial businesses. Many of these are laid out in Data Model Patterns, and a few will be presented below. These can be viewed from a high level, describing common business phenomena, or at a lower level, describing components of those higher-level models.
At the higher level, you have (among others):
People and organizations
Products, assets, or materials
At the lower level, you have (among others):
These are described more fully in the following sections.
People and Organizations
As described above, a PARTY is a person or an organization of interest to the enterprise. In Figure 3.29, you can see that not only is a PARTY either a PERSON or an ORGANIZATION, but an ORGANIZATION, in turn, must be either a COMPANY, an INTERNAL ORGANIZATION (such as a department), a GOVERNMENT AGENCY, or an OTHER ORGANIZATION. Depending on your situation, you can define sub-types further, itemizing kinds of COMPANIES, for example.
Figure 3.29. Parties.
Note that the sub-type structure is fundamental. That is, any PARTY must be either a PERSON or an ORGANIZATION and not both. ORGANIZATION is then further subcategorized. There are in fact other kinds of classification that companies often wish to identify, but these are not so fundamental. These categories could be demographic categories, for example, such as “annual income”. These can be accommodated by the entity types PARTY CATEGORY, PARTY CATEGORY SET, and PARTY CLASSIFICATION.
A PARTY CATEGORY is one of those classifications that someone is interested in, such as “Income greater than $50,000 per year”. A PARTY CLASSIFICATION is the fact that a particular PARTY falls into that category, such as “Sam has an income of greater than $50,000 per year.” Note that the PARTY CATEGORY, “Income greater than $50,000 per year”, must be part of the PARTY CATEGORY SET, “Annual Income”.
Note also that a PARTY CLASSIFICATION must also be by someone (a PARTY). This could be the Marketing Department, for example.
PARTIES are related to each other. An INTERNAL ORGANIZATION may be part of another INTERNAL ORGANIZATION; a PERSON may be married to another PERSON; a PERSON may be a member of an OTHER ORGANIZATION, such as the Teamsters Union or The Data Administration Management Association. Each of these is an example of a PARTY RELATIONSHIP from one PARTY to another. Each PARTY RELATIONSHIP, in turn, is also an example of a PARTY RELATIONSHIP TYPE, such as “organizational structure”, “marriage”, or “membership”. This is also shown in Figure 3.29.
Note that making the entity types more generic makes any systems based on them more robust. For example, new categories of PARTY can be added without changing the data structure. This is at the cost, however, of losing representation of the business rules that lie behind the data. There might be rules that say, for example, that only PEOPLE can participate in the PARTY RELATIONSHIP of PARTY RELATIONSHIP TYPE “marriage”, or that a DEPARTMENT in PARTY CATEGORY “sales office” can only report to a DEPARTMENT in PARTY CATEGORY “sales district”.
These rules must be documented separately from the model drawing.
It may seem that an appropriate attribute of party is “Address”. But unfortunately, many parties have more than one address. These include “billing address”, “delivery address”, “home address”, and so forth. To have multiple addresses as an attribute of party would thus violate First Normal Form.
What is required is a separate “address” entity type, shown in Figure 3.30 as SITE. A PARTY PLACEMENT is then defined as the fact that a PARTY is located at a particular SITE. Thus, not only can one PARTY be located in one or more SITES, but one SITE may be the location of one or more PARTIES.
Figure 3.30. Geography.
Note that by recognizing both PHYSICAL SITE and VIRTUAL SITE, the same entity type can be a place to store not only street addresses, but also telephone numbers, web addresses, and e-mail addresses.
A SITE is a particular virtual or real place with a purpose, such as a house, office, or website. A SITE must be an example of a SITE TYPE, such as “one-family home”, “office building”, “archeological dig”, “warehouse”, and so forth. A PARTY PLACEMENT—the fact that a particular PARTY is located at a particular SITE—must be an example of a PARTY PLACEMENT TYPE, such as “home address”, “billing address”, etc.
Each PHYSICAL SITE, in addition, must be located in at least one but possibly more GEOPOLITICAL AREAS. A GEOPOLITICAL AREA is a kind of GEOGRAPHIC AREA whose boundaries are defined by law or international treaty. A GEOGRAPHIC AREA is simply any bounded area on the earth. If it is not a GEOPOLITICAL AREA, a GEOGRAPHIC AREA may be either an ADMINISTRATIVE AREA (whose boundaries are defined by an enterprise's policies), a SURVEYED AREA (whose boundaries are defined by a survey, in terms of townships, sections, and the like), or a NATURAL AREA (whose boundaries are determined by a natural phenomenon, such as a lake or habitat).
So, what does the company make? What does it use? What does it otherwise manipulate? These fundamentally are products. The word “product” is troublesome, however, because while in principle it refers to anything tangible that can be bought, sold, or handled, in fact many companies' products (like those of banks) are intangible. In many ways, though, even these behave the same as tangible ones, such as computers and steam compressors.
In this model we will constrain PRODUCT to mean something discrete that is bought or sold, distinguishing it from EQUIPMENT which is used in the manufacturing process and MATERIAL such as powder or goo. The super-type we will define that encompasses all of these is ITEM. (In Data Model Patterns, this is called ASSET.)
We want to distinguish between ITEM TYPE, which is the definition of a thing, such as might be found in a catalogue or specification sheet, and ITEM, an occurrence or instance of the thing that exists in a physical place. In Figure 3.31 ITEM TYPE is shown with the sub-types PRODUCT TYPE, MATERIAL TYPE, EQUIPMENT TYPE, and OTHER ITEM TYPE, as were described above.
Figure 3.31. Items and Item Types.
ITEM, on the other hand, distinguishes between INVENTORY, whose primary attribute is “Quantity”, and DISCRETE ITEM, whose primary attribute is “Serial number”. That is, a DISCRETE ITEM can be identified, piece by piece, while INVENTORY is an undifferentiated quantity of things.
Each ITEM must be (currently) located at only one SITE. This implies that an INVENTORY is defined to be that quantity of a particular ITEM that is stored in a particular SITE, just as a DISCRETE ITEM is a single object stored in a particular SITE.
Also an ITEM or ITEM TYPE may refer to anything—finished good, intermediate, or raw material. By itself, an occurrence of ITEM or ITEM TYPE tells you nothing of its composition. For this we need ITEM TYPE STRUCTURE ELEMENT and ITEM STRUCTURE ELEMENT.
At the ITEM TYPE level, an ITEM TYPE STRUCTURE ELEMENT is the fact that a particular ITEM TYPE is a component of another ITEM TYPE. This information is usually found in engineering specifications. That is, each ITEM TYPE may be part of one or more ITEM STRUCTURE ELEMENTS, each of which must be the use (of that ITEM TYPE) in one and only one other ITEM TYPE. Looking at it from the other direction, each ITEM TYPE may be composed of one or more ITEM TYPE STRUCTURE ELEMENTS, each of which must be the use of one and only one other ITEM TYPE.
An ITEM STRUCTURE ELEMENT is similar. Instead of being concerned with ITEM TYPES, however, it is concerned with actual occurrences of ITEMS. That is, each ITEM STRUCTURE ELEMENT must be the use in a physical ITEM and the use of a physical ITEM. Actually, an ITEM may contain not only identifiable other ITEMS, but also a quantity of an unidentified ITEM TYPE, such as “water” or “natural gas”. Consequently, what the model actually says is that an ITEM STRUCTURE ELEMENT may be either the use of an ITEM or the use of an ITEM TYPE. (The arc across the two relationships denotes this “exclusive or” concept.)
Ideally, each ITEM STRUCTURE ELEMENT would be based on an ITEM TYPE STRUCTURE ELEMENT. Whether this is possible or not in your company is another question.
Note that attributes of both ITEM TYPE STRUCTURE ELEMENT and ITEM STRUCTURE ELEMENT include “Quantity per” (the amount of the component required to make one unit of the assembly), “Effective date”, and “Until date”.
The work of an enterprise is defined by its activity types. In Column Two (Chapter 4), we will model the nature of activity types. To the extent that activities and activity types are themselves things of significance to the business, however, with data describing them, they will show up here in the Column One model.
An ACTIVITY is an example of either a SERVICE or an ACTIVITY TYPE. If we are talking about something offered for sale, it is probably a SERVICE. If, instead, we are talking about something done inside the company only, it could be called an ACTIVITY TYPE or PROCEDURE. Structurally, these are identical. A SERVICE or ACTIVITY TYPE may be embodied in one or more ACTIVITIES. Each ACTIVITY must occur on a particular “Date” (and optionally, “Time”), and must be performed at a SITE. All of this is shown in Figure 3.32.
Figure 3.32. Activities.
Also shown in Figure 3.32 is the fact that an ACTIVITY may be part of a WORK ORDER. A WORK ORDER is a specific authorization for a relatively large effort to be carried out. A WORK ORDER is usually composed of one or more ACTIVITIES.
A WORK ORDER must be either a PRODUCTION WORK ORDER (to make an ITEM TYPE) or a MAINTENANCE WORK ORDER (to fix, install, or replace a DISCRETE ITEM).
In those companies where the model is concerned primarily with SERVICES that are sold to customers (or bought, for that matter), it may be simpler to make SERVICE a sub-type of ITEM TYPE. This has some odd implications, but most relationships actually apply both to SERVICE and the more tangible kinds of ITEM TYPES.
Figure 3.33 expands on the ACTIVITY idea, adding the PARTIES who participate in a WORK ORDER or in a particular ACTIVITY and adding also the consumption of labor and other resources by an ACTIVITY.
Figure 3.33. Activity Management.
Note the attributes shown in this diagram. You can see that ACTIVITY, for example, has a “Scheduled start date”, “Scheduled end date”, “Actual start date”, and “Actual end date”. These position the ACTIVITY in time. By specifying scheduled dates only, you can define a planned activity before it is actually carried out. Similarly, WORK ORDER had an “Order date” and a “Due date”, as well as a “Completion date”.
A WORK ORDER ROLE is the fact that a PARTY (a PERSON or an ORGANIZATION) has something to do with a WORK ORDER. This could be its manager, someone contributing to its execution, or even a beneficiary of it. Each WORK ORDER ROLE must be played by a PARTY, for a WORK ORDER.
Similarly, an ACTIVITY ROLE is the fact that a PARTY has something to do with an ACTIVITY. Each ACTIVITY ROLE must be played by a PARTY, for an ACTIVITY.
WORK ORDERS and ACTIVITIES consume both labor and other resources, such as materials. The fact that time is consumed for an ACTIVITY is called a TIMESHEET ENTRY. Each TIMESHEET ENTRY must be by one PERSON and charged to a single ACTIVITY. Its most interesting attribute is “Hours”: the number of hours (including fractions of hours) spent by this PERSON on this particular ACTIVITY. An attribute of the PERSON is “Charge rate”: the dollars per hour charged for this PERSON'S work. This allows us to calculate the derived attribute “(Value)” as the “Hours” from this TIMESHEET ENTRY times the “Charge rate” of the PERSON that is the source of this TIMESHEET ENTRY. (Parentheses denote the fact that the attribute is derived.)
The “(Value)” of all the TIMESHEET ENTRIES that are charged to an ACTIVITY can then be summed up to yield the derived attribute “(Labor cost)” for the ACTIVITY. The “(Labor cost)” of all ACTIVITIES that are part of a WORK ORDER can similarly be summed up to yield the “(Labor cost)” of the WORK ORDER.
A RESOURCE USAGE is the fact that an ITEM or ITEM TYPE is consumed during the course of an ACTIVITY. That is, each RESOURCE USAGE must be of either a particular ITEM (a specific DISCRETE ITEM or an INVENTORY), or of a generic ITEM TYPE such as “natural gas” or “water”.
As with the calculation of labor cost, the cost of resources used combines the “Unit cost” of an ITEM or the “Standard cost” of an ITEM TYPE with the “Quantity” used of the RESOURCE USAGE. The “(Value)” of the RESOURCE USAGE is calculated by multiplying its “Quantity” by either the “Standard cost” of the ITEM TYPE that is consumed as the RESOURCE USAGE or the “Unit cost” of the ITEM that is consumed as the RESOURCE USAGE. The attribute “(Value)” can then be summed across all the RESOURCE USAGES that are charged to an ACTIVITY, to get the “(Resource cost)” of that ACTIVITY. Similarly, the “(Resource cost)” of an ACTIVITY can be summed across all ACTIVITIES that are part of a WORK ORDER to compute the total “(Resource cost)” for that WORK ORDER.
The “(Total cost)” of either an ACTIVITY or a WORK ORDER can then be computed by adding together that entity type's “(Labor cost)” and “(Resource cost)”.
The business of any enterprise is contracts—purchase orders, sales orders, leases, and the like. As we have seen, all contracts fundamentally have the same structure. In the introduction to this section we saw that a SALES ORDER and a PURCHASE ORDER are simply examples of different kinds of ORDERS. Here we will generalize the concept even further to the idea of CONTRACT. (ORDER is simply a kind of CONTRACT.) A CONTRACT can be any agreement between two PARTIES¸ for the supply from one to the other of any product or service. This is shown in Figure 3.34. In addition to a simple SALES ORDER or PURCHASE ORDER, it could be a lease, a financial instrument, or an employment contract, among others.
Figure 3.34. Contracts.
In most cases, a CONTRACT is simply from one PARTY (the buyer) and to another PARTY (the seller). There can be multi-party CONTRACTS, but these are rare enough to leave the model of that to the reader. In addition to the primary PARTIES, however, there can be other PARTIES playing various CONTRACT ROLES. These could be “contract manager”, “vendor agent”, and so forth.
Each CONTRACT must be composed of one or more references to the things being sold—LINE ITEMS. Each LINE ITEM, in turn, must be for either a SERVICE or an ITEM TYPE. Again, as mentioned above, if most of the enterprise's ACTIVITIES are bought from contractors or sold to the public, it may be appropriate to make SERVICE or ACTIVITY TYPE simply a sub-type of ITEM TYPE. Then each LINE ITEM must be simply for one ITEM TYPE. It seems a little strange, but it works. Virtually all of the relationships defined for ITEM TYPE can also apply to SERVICE.
Note that each CONTRACT may be over (the basis for) one or more other CONTRACTS. For example, a blanket purchase order may be the basis for one or more specific purchase orders.
The CONTRACT and its LINE ITEMS describe what has been ordered. The nature of delivery against a CONTRACT depends on what that was. If a SERVICE was ordered, the contract may be fulfilled by conduct of one or more actual ACTIVITIES, as described above.
On the other hand, an ITEM TYPE bought via the LINE ITEM may be fulfilled by delivery of one or more specific ITEMS.
Note the business rule that requires that ACTIVITY be an example of the same ACTIVITY TYPE that the LINE ITEM was for. Similarly, each ITEM delivered under a LINE ITEM should be an example of the same ITEM TYPE that the LINE ITEM is for.
The above patterns describe aspects of the business in business terms. Close examination of these models, however, reveals that within each pattern there are components that are themselves significant patterns. There are many of these, but two in particular are worth discussing here.
The first is the model of a hierarchy and a network. Note in Figure 3.28 that each CONTRACT may be over one or more other CONTRACTS. This is a simple hierarchy of a sort that occurs frequently. Note, however, that each CONTRACT may be under only one CONTRACT. This imposes a constraint that may not always be appropriate.
In Figure 3.31, reproduced here as Figure 3.35, each ITEM TYPE was shown as being composed of one or more other ITEM TYPES and being part of one or more other ITEM TYPES. That is, each ITEM TYPE is composed of one or more ITEM TYPE STRUCTURE ELEMENTS, each of which must be the use of another ITEM TYPE. Each ITEM TYPE may also be part of one or more ITEM TYPE STRUCTURE ELEMENTS, each of which must be the use in another ITEM TYPE.
Figure 3.35. Product Structure.
An example of such a structure is shown in Figure 3.36. This is sometimes called a bill of materials—a list of component assemblies and parts that comprise a manufactured product. Note that the two models of bicycles and all of their components are really ITEM TYPES. Each line connecting two (such as from “Bicycle B” to “Crank Assembly T-23”) is represented by an occurrence of ITEM TYPE STRUCTURE ELEMENT. Note that an attribute of ITEM TYPE STRUCTURE ELEMENT is “Quantity per”—the “1” and “24”, which is the quantity of the component required to build one of the assembly.
Figure 3.36. A Sample Product Structure.
Each “Bicycle B”, for example, requires but 1 “Frame X-23”, “Crank Assembly T-23”, “Front Wheel Assembly K-43”, and “Rear Wheel Assembly D-64”. On the other hand, “Front Wheel Assembly K-43” and “Rear Wheel Assembly D-64” each require 24 “18” Spokes”.
Note that this model pattern can describe more than a product structure. We used it in Figure 3.29 to describe the complete set of possible relationships among parties. A department, for example may be composed of several other departments, and over time it may report to several other departments. A critical-path PERT chart is also a network, relating projected activities to each other. There are many different uses for this configuration.
In Figure 3.29, the fact is that a PARTY RELATIONSHIP, for example, must be an example of one and only one PARTY RELATIONSHIP TYPE. That is, the WORK ORDER ROLE TYPE describes the kind of relationship between two PARTIES. The set of PARTY RELATIONSHIP TYPES is defined in advance, before any actual PARTY RELATIONSHIPS exist. The idea of a “...TYPE” entity type that categorizes an entity type occurs throughout any typical complete model. Figure 3.37 shows some of the ...TYPE entity types that could appear in our model: a WORK ORDER ROLE TYPE defines the kinds of WORK ORDER ROLES that could be played in a WORK ORDER, such as “authorizer”. (See Figure 3.33.) A CONTRACT ROLE TYPE defines the kind of CONTRACT ROLE being played in a CONTRACT, such as “contract manager”, “attorney of record”, and so on. (See Figure 3.34.) An ACTIVITY ROLE TYPE defines the kind of ACTIVITY ROLE being played in an ACTIVITY—“project manager”, “chief engineer”, and the like. (See Figure 3.33.) A PARTY RELATIONSHIP TYPE is the kind of PARTY RELATIONSHIP: “spouse”, “departmental structure”, “union membership”, and so forth. (See Figure 3.29.)
Figure 3.37. Types.
Note that each thing must be an example of one and only one thing type. This is exactly the same information conveyed by the sub-type/super-type structure. In Figure 3.38, for example, the occurrences of PARTY TYPE will be “Person”, “Organization”, “Company”, “Department”, “Government Agency”, and “Other Party”.
Figure 3.38. Party Types.
The advantage of using the ...TYPE entity type instead of sub-types and super-types is that it can easily be changed. Rows can be added and deleted as necessary. The advantage of super-types and sub-types, however, is that each can have a different set of attributes. In deciding which approach to take, judgment is required.
Even though this adds redundancy, it can be useful to have both. First, more detailed PARTY TYPES can be specified in an implemented system, without having to restructure the database. Note that each PARTY TYPE may be a super-type of one or more other PARTY TYPES.
Second, it is sometimes useful to be able to refer to the …TYPE as an entity type. Another entity type could be related to one or more occurrences of this …TYPE.