IDS 9.x Object-Relational Features
Before we can take advantage of IDS 9.x extensibility, we must understand its main features. Table 11 lists these features.
Table 1-1 IDS 9.x Object-Relational Features.
Primary and secondary access methods
Row type inheritance
Let's first look at the types available. These types are divided into two categories: simple types and complex types. We start with the simple types, followed by the complex types.
Informix has added two new types to handle objects that require a large amount of storage. The first type is called Character Large Object (CLOB), the other one Binary Large Object (BLOB). They are base types of the system and are stored in specialized storage called sbspaces.
Smart blobs provide two advantages over the traditional blob types (BYTE and TEXT). First, they can store a much larger amount of data (4 * 240 bytes). The second benefit is that it is possible to modify the content of a smart blob without having to extract the entire blob from the database. This can have a significant impact in the time required for data transfer.
A distinct type takes its definition from an existing type. For example, you can define a type USDollar based on the Money type. This is done with the following statement:
CREATE DISTINCT TYPE UDDollar AS money(10,2);
The use of distinct types makes database schemas more meaningful. Instead of relying on the name of a column to find out what it is for, we now have the use of specific types that provide more discrimination in their use.
Yet, a more important benefit is the addition of strong typing within the database server. Because you can now differentiate between different types of integers or decimal values, you can implement specific business processing to manipulate them. You can also decide what the interactions are between different types. This is done by implementing polymorphic user-defined routines, as we will see later.
The importance of distinct types cannot be overstated. This parallels the addition of strong typing to the C language and the use of strong typing in object-oriented languages. These advances helped reduce the programming errors in applications by ensuring that the right type of argument is used in functions or method calls. Similarly, without strong typing, it is possible to write SQL statements that will be syntactically correct and complete their execution. These same SQL statements could be wrong at the business level. Just imagine what would happen if you filled up an intercontinental plane with a specific number of liters of fuel instead of gallons.
An opaque type provides the capability to define the structure of an area of storage. Opaque types come in two flavors: fixed-size and variable-size. The variable-size opaque type can accommodate any size, as its storage can vary based on the current amount of storage required.
System architects can decide on the best way to represent their business data to provide optimal processing speed. Opaque types provide total control of the external and internal representation. If comparison operators are provided, they can also be indexed.
Implementations can take advantage of opaque types to include better ways to process business data to simplify processing and improve performance.
Informix provides three types of collection types: set, multiset, and list. A set is an unordered collection of unique values. A multiset is an unordered collection that allows for duplicate values. A list is an ordered collection of values, allowing for duplicates.
The values stored in a collection type can be of any type, including opaque types, row types, and even collection types.
Row Type and Inheritance
A row type defines a multicolumn value similar to a table definition. Row types can be named or unnamed and are used as columns in tables or as table types in a table hierarchy.
Figure 11 illustrates a table hierarchy for loans that specialize into different industries. The definition of the manufacturing table starts with the definition of a manufacturing row type that inherits from the loans row type. This definition follows the following format:
Figure 11 Table hierarchy for loans in multiple industries.
CREATE ROW TYPE manufacturing_t ( . . . ) UNDER loans_t;
Following the same logic, the manufacturing table is defined as being under the loans table:
CREATE TABLE manufacturing OF TYPE manufacturing_t UNDER loans;
Once the table hierarchy is defined, you can issue statements that operate on the entire hierarchy as one table or on a specific subtable. For example, if the loans table includes a loan_amount attribute, you can operate on the entire hierarchy to find the total loan amounts for each company branch (branch_id) as in:
SELECT branch_id, SUM(loan_amount) FROM loans GROUP BY branch_id ORDER BY branch_id;
This opens the door to interesting approaches, as we'll see in the later section on aggregates. This feature also helps in the matching of object-oriented analysis and design in the database implementation.
User-Defined Functions (UDFs)
UDFs implement the business logic you want as part of the database server. The functions can be written using Informix stored-procedure language (SPL) and C. Informix also provides the ability to write UDFs in Java using the J/Foundation option.
You could consider UDFs as object methods. The syntax differences are mainly cosmetic. Consider the following statement syntaxes:
Object.method([args]); method(Object [, args]);
The first syntax represents the usual object method execution where the method is part of the object definition. The second syntax shows a method that requires a first argument, which is the object it operates on. The end result is the same. The system can differentiate between multiple methods with the same name by looking at the type and number of arguments. The second method is the approach taken in IDS 9.x.
When a UDF is called within an SQL statement, the database server examines the function name and its argument types. It then chooses the function that fits the definition. This polymorphic dispatch provides another example of object-oriented capabilities. You can then define multiple functions with the same name and have the database server figure out which one to call depending on the context.
Looking back at Figure 11, we can imagine a set of riskfactor() functions, each one taking a different row type. The function definitions would look like the following:
CREATE FUNCTION riskfactor(manufacturing_t) RETURNING double precision . . .
These functions are relatively straightforward. They receive a specific type of record and return the risk factor based on the input values. The database server provides the framework for set processing. For example, if we want to find out the average risk factor that each branch is taking, we can submit an SQL statement such as:
SELECT branch_id, AVG(riskfactor(loans)) FROM loans GROUP BY branch_id;
This statement executes over the entire hierarchy. It returns rows from different types. This means that the appropriate riskfactor() function is called depending on the context. It is important to note that the riskfactor() is used in the processing of the different sets of values but is unaware of set processing. There is a direct benefit here because the application code does not have to include additional processing to handle the processing of sets, keeping track of partial sums, counts, and so on. The database framework is used to reduce the complexity of the business code.
IDS 9.x allows you to add new types of aggregate functions. An aggregate function has four stages of processing: initialization, iteration, combine, and final. An aggregate function must then implement these four functions. Only the iteration and combine stages are mandatory. A UDA is in fact a grouping of up to four functions that are used to implement the aggregation.
The UDAs add some interesting features compared to standard aggregate functions. First, they can receive complex types as input. This means that a row type can be passed as the value to aggregate. Second, it allows for an additional parameter that is used as an initialization value. This initialization parameter can also be a row type. This opens the door to interesting possibilities where the initialization value can impact the way the aggregation is done.
Let's go back to our loan example. In the previous section, we described the use of a set of riskfactor() functions. The result of the SQL statement was an average risk factor. This is not necessarily what management wants to see. Instead, we may want to provide the average risk taken by each branch. To do this, each risk must be weighted with the amount of the loan. We can do this by using the riskfactor() functions we already have in the context of a UDA. The new aggregate function must handle row types. Assuming that we called the UDA avgrisk(), we can find the average risk taken by each branch on its portfolio of loans with the following statement:
SELECT branch_id, avgrisk(loans) FROM loans GROUP BY branch_id;
All the processing added to the server to implement the UDA would be required in an application program and more. The end result of using this new approach is less complexity and increased performance.
Informix supports two types of indexing: B-tree and R-tree. The B-tree is the default indexing method. If you want to use the R-tree, you must indicate it on the create index. In up to version 9.30 of IDS, you must also create opaque types and support functions to be able to use the R-tree. An example of implementation can be found in Chapter 6.
A B-tree is stored as a balance tree structure. Because of the balance feature, all searches take around the same number of comparisons to find the answer. The B-tree is a standard indexing method provided by database vendors. As for previous versions, it is used to index all the built-in types, excluding BYTE and TEXT. Informix has modified its B-tree implementation to support user-defined types. This includes distinct types and opaque types. The two criteria for indexing are that the type has comparison operators (equal, less than, greater than, etc.) and that the total key length does not exceed 390 bytes.
The indexing of new types makes sense but what about distinct types? Can't we just use the support provided by the base type the distinct type is based on? The short answer is yes. There are some situations in which the comparison provided by the base type does not correspond with the business way to use the distinct type. In this case, you only have to add comparison functions for your distinct type and the indexing will be done appropriately.
Another interesting application of the B-tree relates to what we call functional indexes. It is possible to create an index on the result of a function. For example:
CREATE INDEX func_idx ON order(quarter(order_date)) USING btree;
The advantage of this approach is that you don't need to add calculated fields to your table. Even a four-byte field has an impact when we are talking about millions of rows.
The other indexing method available is the R-tree. This indexing method has a perfect fit in spatial queries. It allows the indexing of multiple dimensions where each dimension is considered in the query. This is better explained with an example. Consider a spatial query in which you want to find all the objects that are within a certain radius of a location. With an R-tree, you can answer this by asking for all the objects within a specified circle:
SELECT * FROM location WHERE within(longlat, myCircle);
The myCircle value is a new opaque type that defines a circle at a specific location. If you only had the B-tree available, you could submit a query such as:
SELECT * FROM location2 WHERE long >begin AND long < end;
Of course, you could go further and provide a similar comparison for latitude. The problem with this approach is that the indexes may not discriminate enough on the values and a table scan will be issued, causing the entire table to be read. Because the R-tree handles all the dimensions, it can provide a better discrimination on the data and can be picked where the B-tree would be ignored. If you are dealing with more complex searches involving more complex shapes, then the advantage of the R-tree is even more obvious.
Of course, you may not care much about longitude and latitude today. If you look carefully at the functionality provided by the R-tree, you should realize that it can index any numbers, not just coordinates. What about indexing age and income? This could facilitate data analysis for better customer targeting. And why stop at two dimensions? It should be safe to go to five or six if not a little higher.
Informix also provides a way to add access methods to the database server. They consist of the Virtual Table Interface (VTI) and the Virtual Index Interface (VII). As their names indicate, the VTI is used to make things look like a table in the database, and the VII is used to add indexing methods such as the R-tree and Excalibur and Verity text search engines.
VTI and VII should be considered advanced features and should be approached carefully. I liken VTI and VII to device drivers. Depending on the functionality you provide, it will be simple or complex. If you provide support for write operations, you must then worry about transaction control. However, if used appropriately, these interfaces can be very beneficial.
The use of VTI to access external files from the database server is the subject of Chapter 4.
The Spatial DataBlade
Informix added the Environmental Systems Research Institute (ESRI) spatial support to the base IDS 9.x product. It is a commercial DataBlade, not an open-source product. This DataBlade provides support for queries containing locations. It defines several new data types and functions to accommodate this capability. These location data types can be indexed using the R-tree mentioned previously to dramatically speed up retrieval of information.
If you have a need to store and retrieve geographical information, this should be the first thing you look at to solve your problem. It is a supported commercial product and it comes with the IDS 9.x server.