16.3 Data Warehouse Specification
This section describes the data warehouse specification. An XML data warehouse is defined as a set of materialized views. The first subsection presents our view model for XML documents. The second presents the graphic tool that enables the data warehouse designer to specify the XML views.
16.3.1 View Model for XML Documents
Since the data warehouse is defined as a set of views, the main issue in defining the data warehouse is the view model. This section briefly presents the main characteristics of our view model. The model is presented in detail in work by X. Baril and Z. Bellahsène (Baril and Bellahsène 2000).
Our view model fulfills the following requirements:
Closure property: A view defined on XML document(s) should yield an XML document as output. This allows us to transparently use a view or a document. From the data warehouse point of view, this property implies that the unified view of sources is an XML document.
Restructuring possibilities: The view mechanism enables restructuring elements of the source document(s). We can distinguish two classes of views: (1) select views that extract existing documents from sources, and (2) composite views that create new elements or attributes. For this latter class, new elements of the result may be created from several source elements. Furthermore, aggregation functions (e.g., sum, avg, min, max, count) can be used to define new values. Sorting and grouping of elements are also supported.
DTD inference: The view result should be associated with a DTD. This DTD is inferred from the view definition and, where they exist, from the source DTDs. The inferred DTD can be used to optimize the view storage or to query the view. From the data warehouse point of view, the inferred DTD provides a global integrated schema on which user queries can be formulated.
Each view is composed of a result pattern that specifies the structure of the result. This result pattern uses variables that are defined in fragments. A fragment is a collection of patterns: Each pattern uses variables to define data to match in a source. A fragment is composed of several patterns defining the same variables on different sources and provides the union of their data.
For example, let us consider a source "senior.xml" containing information about senior researchers and a source "student.xml" containing information about Ph.D. students. To define a fragment "f1" containing the names and birthdays of senior researchers and Ph.D. students, we would define two patterns: one pattern matching names and birthdays of the senior researchers on source "senior.xml" and another one matching names and birthdays of the Ph.D. students on source "student.xml".
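The union semantics of a fragment can be illustrated with a minimal Python sketch. It is not the DAWAX implementation: the contents of the two sources, the element names "researcher" and "student", and the helper functions `match_pattern` and `evaluate_fragment` are all assumptions made for illustration, since the chapter does not show the source documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of "senior.xml" and "student.xml" (not given in the chapter).
SENIOR_XML = """<seniors>
  <researcher name="Ada" birthday="1960-05-01"/>
  <researcher name="Bob" birthday="1958-11-23"/>
</seniors>"""

STUDENT_XML = """<students>
  <student name="Eve" birthday="1995-02-14"/>
</students>"""


def match_pattern(xml_text, element_name, bindings):
    """Return one dict of variable bindings per matching element.

    `bindings` maps an attribute name in the source to the variable it binds.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(element_name):
        # Keep only elements that carry every attribute the pattern asks for.
        if all(attr in elem.attrib for attr in bindings):
            rows.append({var: elem.attrib[attr] for attr, var in bindings.items()})
    return rows


def evaluate_fragment(patterns):
    """A fragment provides the union of the bindings of all its patterns."""
    rows = []
    for xml_text, element_name, bindings in patterns:
        rows.extend(match_pattern(xml_text, element_name, bindings))
    return rows


# Fragment "f1": two patterns binding the same variables on different sources.
f1 = evaluate_fragment([
    (SENIOR_XML, "researcher", {"name": "name", "birthday": "birthday"}),
    (STUDENT_XML, "student", {"name": "name", "birthday": "birthday"}),
])
```

Both patterns bind the same variables ("name" and "birthday"), so the rows they produce can be concatenated into a single collection regardless of which source they came from.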
To define composite views, the result pattern can be based on several fragments. For this purpose, fragments are linked using join conditions. A join condition involves two variables defined in two different fragments.
Listing 16.1 shows a complete example of a view specification involving two fragments. Let us consider a view retrieving for each author their name, surname, and a list of the titles of their publications. The view is composed of a result pattern, two fragments, and a join condition. The fragment "f3" contains a pattern that matches the "author" elements, while "f4" contains a pattern that matches the "inproceedings" elements, with their "title" attribute and "authorlink" subelements. These subelements contain a "ref" attribute that references the author of the publication. The join element gives the join condition between the two fragments "f3" and "f4". The result element contains the result pattern. Each item of the view result is an "author" element, containing a "name" attribute (having the value of the "name" variable) and a "title" subelement (having the value of the "title" variable). The group-by element indicates that the result is grouped by "name" values (i.e., for an author there are possibly several "title" subelements). The part of the DTD validating this specification is presented in section 16.4.2, "View Definition," later in this chapter.
Listing 16.1 Example of View Definition
<view id="authorspublications">
  <result>
    <result.node name="author" type="element">
      <result.node name="name" type="attribute">
        <result.node name="name" type="variable"></result.node>
      </result.node>
      <result.node name="title" type="element">
        <result.node name="title" type="variable"></result.node>
      </result.node>
    </result.node>
    <groupby name="name"></groupby>
  </result>
  <fragment id="f3">
    <pattern id="p3" source="1">
      <pattern.node name="author" type="element">
        <pattern.node name="id" type="attribute" bind="id"></pattern.node>
        <pattern.node name="name" type="attribute" bind="name"></pattern.node>
        <pattern.node name="surname" type="attribute" bind="surname"></pattern.node>
      </pattern.node>
    </pattern>
  </fragment>
  <fragment id="f4">
    <pattern id="p4" source="1">
      <pattern.node name="inproceedings" type="element">
        <pattern.node name="title" type="attribute" bind="title"></pattern.node>
        <pattern.node name="authorlink" type="element">
          <pattern.node name="ref" type="attribute" bind="ref"></pattern.node>
        </pattern.node>
      </pattern.node>
    </pattern>
  </fragment>
  <join leftfragment="f3" leftvariable="id" rightfragment="f4" rightvariable="ref">
  </join>
</view>
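To make the meaning of this view concrete, the following Python sketch evaluates it by hand on a small sample document. It is only an illustration under stated assumptions, not the DAWAX evaluator: the content of source "1", the wrapping "result" root element, the function names, and the choice of an inner join (authors without publications are dropped) are all hypothetical. Only the element and attribute names "author", "inproceedings", "authorlink", "id", "name", "surname", "title", and "ref" come from the listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical content for source "1"; names follow the patterns p3 and p4.
SOURCE_1 = """<bib>
  <author id="a1" name="Alice" surname="Smith"/>
  <author id="a2" name="Bob" surname="Jones"/>
  <inproceedings title="Paper A">
    <authorlink ref="a1"/>
    <authorlink ref="a2"/>
  </inproceedings>
  <inproceedings title="Paper B">
    <authorlink ref="a2"/>
  </inproceedings>
</bib>"""


def eval_f3(root):
    """Fragment f3: bind id, name, and surname of each author element."""
    return [
        {"id": a.get("id"), "name": a.get("name"), "surname": a.get("surname")}
        for a in root.iter("author")
    ]


def eval_f4(root):
    """Fragment f4: bind title and ref, one row per authorlink subelement."""
    rows = []
    for pub in root.iter("inproceedings"):
        for link in pub.findall("authorlink"):
            rows.append({"title": pub.get("title"), "ref": link.get("ref")})
    return rows


def evaluate_view(xml_text):
    root = ET.fromstring(xml_text)
    f3, f4 = eval_f3(root), eval_f4(root)

    # Join condition: f3.id = f4.ref, then group the titles by "name".
    titles_by_name = {}
    for left in f3:
        for right in f4:
            if left["id"] == right["ref"]:
                titles_by_name.setdefault(left["name"], []).append(right["title"])

    # Result pattern: an "author" element with a "name" attribute
    # and one "title" subelement per grouped title.
    result = ET.Element("result")  # assumed wrapper element
    for name, titles in titles_by_name.items():
        author = ET.SubElement(result, "author", {"name": name})
        for title in titles:
            ET.SubElement(author, "title").text = title
    return result


view = evaluate_view(SOURCE_1)
print(ET.tostring(view, encoding="unicode"))
```

Note that the "surname" variable is bound by f3 but does not appear in the result pattern of the listing, so the sketch does not emit it either.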
16.3.2 Graphic Tool for Data Warehouse Specification
We propose a graphic tool to help the user in the specification of the data warehouse. The editor allows us to create this specification without knowledge of the exact structure of the warehouse definition. We have proposed (in Baril and Bellahsène 2001) helpers for view definitions that we plan to integrate with DAWAX. These helpers allow us to define patterns without knowledge of the source structure. They use the DTD (if available) and the dataguide to propose choices for the pattern specification.
Figure 16.2 shows the graphic editor for the data warehouse specification. The XML document defining the data warehouse is represented as a tree. New elements (sources, views, fragments, etc.) can be added by way of a contextual popup menu. The popup menu suggests possible choices for adding to or updating the current element. In the example, the popup menu for a view element suggests the addition of a fragment or a join, and the deletion of the view.
Figure 16.2. Data Warehouse Definition Editor
The fragment "f4" of the view given as an example is displayed in Figure 16.2. It contains a pattern ("p4") with an identifier and a source attribute. The root node of the pattern is displayed; because of space limitations, its child nodes are not expanded in the tree.