Understanding Sophisticated Markup Vocabularies
If you want to understand the topic maps paradigm, you must understand something about markup vocabularies in general that is not yet widely understood: the structure of an interchangeable resource is not necessarily the same as the structure of the information that is being conveyed.
Back in 1986, SGML had just been adopted by the community of nations as the one-and-only markup language for everything and everybody. But Charles Goldfarb, its inventor and guardian, knew that much work remained to be done. He saw that many kinds of multimedia information and many business niches for such information would continue to be invented indefinitely. One of the things he wanted to do was to show that SGML could be used to encode multidimensional synchronizing information: to impose simultaneous, arbitrary temporal structures on arbitrary collections of information objects and their components.
Accordingly (and not coincidentally in order to have some fun), Dr. Goldfarb turned his attention to the problem of representing music abstractly [16]. Musical works are inherently multidimensional; to begin with, musical harmony is the result of multiple simultaneous melodies. Since an interchangeable document is necessarily a one-dimensional sequence of characters, the question immediately arises, in the case of a musical document, as to whether the concurrent melodies (or instrumental and/or vocal parts) should be expressed separately or whether all the notes that are supposed to sound synchronously in all of the concurrent melodies should appear adjacent to one another in the interchange file. Either way, the structure of the interchange syntax will be inconvenient for at least some applications. Either way, at least some of the basic structure of the information will be obscured by the interchange syntax. Therefore, for the sake of reliable information interchange, there must be a separate and distinct model of the information that is being conveyed by the music language, in addition to the syntactic model that governs the structure of that information while it is represented as an interchangeable document.
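The dilemma can be made concrete with a small sketch. It is only an illustration, not any real music language: the `score` model, the part names, and the three functions are all hypothetical. The same model is flattened into a one-dimensional sequence in two different orders, and a query about simultaneity is answered from the model rather than from either sequence.

```python
# Hypothetical information model: each part maps to a list of
# (onset_beat, pitch) events. This is an assumption for illustration only.
score = {
    "soprano": [(0, "E5"), (1, "D5"), (2, "C5")],
    "bass": [(0, "C3"), (2, "G2")],
}


def serialize_partwise(score):
    """Flatten one melody after another.

    Extracting a single part is easy; seeing what sounds together is not.
    """
    return [(part, beat, pitch)
            for part in sorted(score)
            for beat, pitch in score[part]]


def serialize_timewise(score):
    """Flatten so that simultaneous notes are adjacent.

    Chords are easy to see; each melody is scattered through the sequence.
    """
    events = [(beat, part, pitch)
              for part, notes in score.items()
              for beat, pitch in notes]
    return sorted(events)


def onsets_at(score, beat):
    """Query the model directly: which notes begin on the given beat?"""
    return {part: pitch
            for part, notes in score.items()
            for b, pitch in notes
            if b == beat}
```

Both sequences carry exactly the same events, yet neither ordering is the information itself. An application that needs both the melodies and the harmonies is better served by something like `onsets_at`, which works against the model.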
There are many kinds of information whose structure, like the structure of music information, must respond to one set of requirements when the information is being interchanged and to another, often contradictory set of requirements when the information is in ready-to-use form. Many decision makers are not yet ready to hear this message, for a variety of reasons.
Historically, the overwhelming majority of markup applications have been basically batch-typesetting jobs, which start at the beginning of the document and process each data segment in more or less the same sequence in which it appears in the document. The rendering of HTML documents by Web browsers is one example. The use of the word document to denote a class of information objects appears to have the connotation that all such information objects are intended to be rendered and used in the same order in which they are interchanged.
Currently, significant investments in the marketing of XML technology are directed at business-oriented information technology professionals. Such professionals are urged to regard XML as an opportunity to represent relational databases as interchangeable documents. All such documents, regardless of their schemas, are parsable by a single standard parsing technology, without reconfiguration. It's obvious that a relational table is exportable and importable as a sequence of named or numbered rows, each of which is itself a sequence of named or numbered fields.
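A minimal sketch of such an export and import, using Python's standard `xml.etree.ElementTree` module, might look as follows. The element and attribute names (`table`, `row`, `field`, `name`) are assumptions chosen for illustration; they do not come from any particular schema.

```python
import xml.etree.ElementTree as ET


def table_to_xml(name, columns, rows):
    """Export a table as a sequence of rows, each a sequence of named fields."""
    table = ET.Element("table", {"name": name})
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for column, value in zip(columns, row):
            field = ET.SubElement(row_el, "field", {"name": column})
            field.text = str(value)
    return ET.tostring(table, encoding="unicode")


def xml_to_table(xml_text):
    """Import the document again; every value comes back as a string."""
    table = ET.fromstring(xml_text)
    rows = [
        {field.get("name"): field.text for field in row_el.findall("field")}
        for row_el in table.findall("row")
    ]
    return table.get("name"), rows


xml_text = table_to_xml("customer", ["id", "name"], [(1, "Ada"), (2, "Grace")])
table_name, rows = xml_to_table(xml_text)
```

In this case the structure of the document and the structure of the information coincide, which is why the same generic parser suffices regardless of the table's schema.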
The Document Object Model (DOM) [17] recommended by the World Wide Web Consortium (W3C) provides a convenient application programming interface (API) to the syntactic structure of information being interchanged in the form of XML documents. The DOM is extremely useful, but it has been oversold as the ne plus ultra API to interchangeable information. The DOM does provide applications with random access to every part of an interchangeable document, so it makes many applications much easier to develop than they otherwise would be. However, the DOM cannot provide direct access to the semantic components of what a document means; it can only provide direct access to the syntactic components of how a document is represented for interchange.
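The distinction can be seen in a short sketch that uses Python's standard `xml.dom.minidom` implementation of the DOM. The document below is a made-up fragment, not real topic map syntax, and the helper name `element_view` is hypothetical. It contains two elements that a human reader would probably take to describe the same subject.

```python
from xml.dom.minidom import parseString

# A made-up fragment for illustration only.
XML_TEXT = (
    "<topicMap>"
    '<topic id="t1"><name>Puccini</name></topic>'
    '<topic id="t2"><name>Giacomo Puccini</name></topic>'
    "</topicMap>"
)


def element_view(xml_text):
    """Return (id, name) for every <topic> element, in document order."""
    doc = parseString(xml_text)
    view = []
    for topic in doc.getElementsByTagName("topic"):
        name_node = topic.getElementsByTagName("name")[0]
        view.append((topic.getAttribute("id"), name_node.firstChild.data))
    return view


view = element_view(XML_TEXT)
```

The DOM reports two elements, with random access to each. Whether those two elements are about one subject or two is a question about meaning, and nothing in the DOM can answer it.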
Fortunately for the widespread acceptance of XML technology, which is a tremendous step toward global knowledge interchange, there are many popular kinds of information for which the interchange structure can quite usefully be the same as the structure of the API. Their interchange is required for many kinds of economic reasons, and they include virtually all of the billboards on the information highway. The DOM is a great all-purpose API for all of these kinds of information.
Topic maps are another matter, however. As in the case of music information, the structure of topic map information is not the same as the structure of interchangeable documents.
Topic map documents can point to other topic map documents, saying, in effect, "The referenced topic map must be merged with the current one before the current one can be understood as its author intends." If any single subject is represented by <topic> elements in both topic maps, the topic maps paradigm requires that the result of processing the two documents must be, among other things, exactly one resulting topic (represented in some application-internal form) that has the union of the characteristics (the names, occurrences, and participations in associations with other topics) of the two <topic> elements. Therefore, the only way to understand an interchangeable topic map document is to process it fully, performing such merging and redundancy-elimination tasks as the paradigm requires.
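The merging requirement can be sketched in a few lines. This is an illustration under loud assumptions, not the processing model defined by the topic maps standard. The `subject` attribute is a hypothetical stand-in for however the real syntax establishes subject identity, the `name` and `occurrence` children are simplified, associations are left out entirely, and `merge_into` is an invented helper.

```python
import xml.etree.ElementTree as ET

# Two simplified, hypothetical topic map documents. Both contain a
# <topic> element for the same subject.
MAP_A = """<topicMap>
  <topic subject="http://example.org/puccini">
    <name>Puccini</name>
    <occurrence>http://example.org/bio.html</occurrence>
  </topic>
</topicMap>"""

MAP_B = """<topicMap>
  <topic subject="http://example.org/puccini">
    <name>Giacomo Puccini</name>
    <occurrence>http://example.org/works.html</occurrence>
  </topic>
  <topic subject="http://example.org/tosca">
    <name>Tosca</name>
  </topic>
</topicMap>"""


def merge_into(store, xml_text):
    """Fold every <topic> element into the application-internal store.

    A subject seen for the first time gets a new internal topic; a subject
    already present has the element's characteristics unioned into it.
    """
    for element in ET.fromstring(xml_text).iter("topic"):
        topic = store.setdefault(
            element.get("subject"), {"names": set(), "occurrences": set()}
        )
        topic["names"].update(n.text for n in element.findall("name"))
        topic["occurrences"].update(
            o.text for o in element.findall("occurrence")
        )
    return store


store = merge_into(merge_into({}, MAP_A), MAP_B)
```

Three `<topic>` elements go in and two topics come out. The topic for the shared subject carries the names and occurrences from both documents, and no single element in either document corresponds to it.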
The element-containment structure of a topic map document, even in the absence of any requirement to merge it with another topic map document, bears no resemblance to the structure of the relationships between topics that are expressed by that document.
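A small sketch may help to show how far apart the two structures are. The syntax below is again a hypothetical simplification (the `association`, `member`, `type`, and `ref` names are assumptions, as are the two helper functions). Containment yields a flat list of siblings, while the relationships expressed by the same document form a graph.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified syntax for illustration only.
DOC = """<topicMap>
  <topic id="puccini"/>
  <topic id="tosca"/>
  <topic id="lucca"/>
  <association type="composed-by">
    <member ref="tosca"/><member ref="puccini"/>
  </association>
  <association type="born-in">
    <member ref="puccini"/><member ref="lucca"/>
  </association>
</topicMap>"""


def containment_children(root):
    """The element-containment view: tags of the root's direct children."""
    return [child.tag for child in root]


def related_topics(root):
    """The information view: which topics are related to which."""
    graph = {topic.get("id"): set() for topic in root.findall("topic")}
    for association in root.findall("association"):
        members = [m.get("ref") for m in association.findall("member")]
        for member in members:
            graph[member].update(m for m in members if m != member)
    return graph


root = ET.fromstring(DOC)
children = containment_children(root)
graph = related_topics(root)
```

In the containment view, every topic is simply a child of the root and knows nothing of the others. In the graph, `puccini` is connected to both `tosca` and `lucca`, a fact that is visible only after the associations have been processed.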
In other words, the API to topic map information is not, and can never be, the same as an interchangeable topic map that conveys that same information. From this interesting fact the question arises, "What is meant by an element type name, such as <topic>, in an interchange syntax like the interchange syntax of topic maps, in which there is no direct correspondence of the element structure to the structure of the information being interchanged?"
The answer is that the meaning of such a tag name is, like the meaning of any other tag name, exactly what the designers of the interchange syntax intended it to mean. For example, for every <topic> element, a conforming topic map application must have an application-internal representation of that topic (that is, a topic whose subject is the same as that of the <topic> element). If there is no such internally represented topic, the application must create one; if there is already such an internally represented topic, the application must add to it (union it with) all the information about that topic that is represented by the <topic> element. The meaning of the <topic> tag name is still quite clear and rigorous; the only difference is that the meaning has to do with the creation of an application-internal form of the interchanged information, a form with its own API that must be used by conforming applications.