Modeling XML Vocabularies with UML: Part I
The arrival of the W3C's XML Schema specification has evoked a variety of responses from software developers, system integrators, XML document analysts and authors, and designers of B2B vocabularies. Some like the richer structure and semantics that can be captured with these new schemas when compared to DTDs, while others complain about excessive complexity. Many find that the resulting schemas are difficult to share with wider audiences of users and business partners.
I set aside many of these differences of opinion and simply view XML Schema as implementation syntax for models of business vocabularies. Other forms of model representation and presentation are more effective than W3C XML Schema when specifying new vocabularies or sharing definitions with users. In particular, I favor the Unified Modeling Language (UML) as a widely adopted standard for system specification and design. My goal in this article and the following two in this series is to share some thoughts about how these two standards are complementary and to work through a simple example that makes the ideas concrete.
Although this discussion is focused on the W3C XML Schema specification, the same concepts are easily transferred to other XML schema languages. Indeed, I have already applied the same techniques to creating and reverse-engineering DTDs and SOX schemas, as well as RELAX, TREX, and their new integrated offspring RELAX NG. In general, I will use the term schema with a lowercase s when referring to this entire family of XML schema languages.
The Role of Models in XML Applications
Few people can fully comprehend all aspects of a large inter-enterprise system at one time; they must divide and conquer the problem as a set of alternative models and views. Each of these models deliberately ignores aspects of the system that are not relevant to its purpose. Building such models is fundamental to how we cope with the complexity of everyday life: we ignore unnecessary details so that we can focus on the task at hand. Different stakeholder groups have different needs with respect to abstraction and focus.
In the context of B2B system integration, all business partners must agree on the information models that define the vocabulary for task-oriented communication. The models include the data structure for XML documents that are exchanged, as well as the process models of the extended dialogues that are required to complete complex business transactions.
Historically, system analysis and design have relied on a variety of techniques, tools, and methodologies to guide and support these alternative models of system structure and behavior. When no formal methods or tools are applied, models are still created using PowerPoint, Visio, or paper and pencil to help communicate a system's purpose and function. Even when you don't write them down, you create models in your mind as a way to comprehend myriad details. An XML schema is also a vocabulary model, written in the syntax of its schema language.
A high-level process for developing XML vocabularies is shown in Figure 1 below. It includes three decision points that determine the final vocabulary definition, regardless of which schema language is used. Data-oriented and text-oriented applications may have different usage requirements. For example, a data-oriented vocabulary can be optimized for serializing objects or database query results, and its constraints should be carefully aligned with the datatypes and referential integrity constraints of its sources. These data-oriented documents may never be viewed by humans other than developers testing the application.
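Data-oriented documents are often generated mechanically from their source. As a minimal sketch, assuming a hypothetical `product` table and element names that simply mirror column names, the following Python code serializes a query result to XML with only the standard library. A nullable column becomes an optional element, which hints at how source constraints should carry over into the schema; a real vocabulary would take its structure and datatypes from the agreed model rather than from column names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Serialize each row of a query result as one element (hypothetical mapping)."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]  # column names become child element names
    root = ET.Element(root_tag)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            if value is None:
                continue  # omit NULLs, so the matching element must be optional in the schema
            ET.SubElement(item, col).text = str(value)
    return root


# Hypothetical source table: sku is the key, name is required, price may be NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("A-100", "Widget", 9.5), ("B-200", "Gadget", None)],
)
doc = rows_to_xml(conn, "SELECT sku, name, price FROM product ORDER BY sku", "products", "product")
print(ET.tostring(doc, encoding="unicode"))
```

Because the second product has no price, its `price` element is simply absent, so a schema for this vocabulary would need to declare `price` with a minimum occurrence of zero.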
A text-oriented vocabulary often has human users who need to edit the XML documents, with or without the assistance of GUI editing tools. Its structure must be easily understood by webmasters who write stylesheets that transform and present the documents' content. An XML vocabulary design that works perfectly for data interchange might cause human users to call for the lynching of its developers. Don't forget the needs of your users when creating the XML schema!
Figure 1 UML activity diagram for schema development process.
The process diagram in Figure 1 is a UML activity diagram, one of the nine diagram types defined by that standard. It was created using Rational Rose, one of the most widely used UML modeling tools. Most of this discussion, however, focuses on the UML class diagram, which specifies the static information structure of a system; in our application context, that system is an XML vocabulary.
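To make the idea of a class diagram as the source of a schema a little more concrete, here is a minimal Python sketch under stated assumptions: a UML class is reduced to a name plus typed attributes with multiplicities, and each class is mapped to a named `xs:complexType` whose attributes become child elements in an `xs:sequence`. The `UmlClass` and `UmlAttribute` types and the `Address` example are hypothetical, and this element-per-attribute mapping is only one of several reasonable choices, not necessarily the one developed later in this series.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


@dataclass
class UmlAttribute:  # hypothetical, simplified view of a UML attribute
    name: str
    xsd_type: str          # assumed to map directly to a built-in datatype, e.g. "xs:string"
    min_occurs: int = 1    # lower bound of the UML multiplicity
    max_occurs: str = "1"  # upper bound; "unbounded" stands for UML's *


@dataclass
class UmlClass:  # hypothetical, simplified view of a UML class
    name: str
    attributes: list = field(default_factory=list)


def class_to_complex_type(cls: UmlClass) -> ET.Element:
    """Map one class to a named xs:complexType containing an xs:sequence of elements."""
    ctype = ET.Element(f"{{{XS}}}complexType", name=cls.name)
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for attr in cls.attributes:
        elem = ET.SubElement(seq, f"{{{XS}}}element", name=attr.name, type=attr.xsd_type)
        # XML Schema defaults both bounds to 1, so only non-default multiplicities are written
        if attr.min_occurs != 1:
            elem.set("minOccurs", str(attr.min_occurs))
        if attr.max_occurs != "1":
            elem.set("maxOccurs", attr.max_occurs)
    return ctype


def build_schema(classes) -> ET.Element:
    """Wrap the generated complex types in a single xs:schema element."""
    schema = ET.Element(f"{{{XS}}}schema")
    for cls in classes:
        schema.append(class_to_complex_type(cls))
    return schema


address = UmlClass("Address", [
    UmlAttribute("street", "xs:string"),
    UmlAttribute("city", "xs:string"),
    UmlAttribute("phone", "xs:string", min_occurs=0, max_occurs="unbounded"),
])
print(ET.tostring(build_schema([address]), encoding="unicode"))
```

Mapping attributes to child elements keeps multiplicities expressible directly through `minOccurs` and `maxOccurs`; mapping single-valued attributes to XML attributes instead would be an equally valid design decision.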