Emerging Trends and Technologies
As the scope of enterprise portals grows, the associated information retrieval problems will worsen unless new technologies and techniques are enlisted. One of the primary values of portals is that they give us rapid access to a broad range of content and applications; however, that benefit is undermined by the very success and growth of portals. To keep ahead of the information overload problem today, we must design portals with effective search, directory, and information architectures. In the near future we will require additional tools to manage content and to allow our applications to assume more responsibility for weeding out irrelevant information.
Researchers have worked on problems in natural language processing and knowledge representation for over 40 years, and the practical, commercial benefit of those undertakings will be realized in enterprise information portals (among other applications). The most pressing problem for advanced portal users and designers is to create applications customized to particular users' needs. Several systems and technologies are especially important to this effort.
Cyc: Cyc combines a knowledge base of over 100,000 terms and over 1,000,000 facts or assertions with a reasoning engine for drawing conclusions from those facts. Cyc was developed by Cycorp (http://www.cyc.com) and has been used in organizations ranging from Lycos, which uses it to improve search engine results, to the U.S. Department of Defense, which has invested heavily in Cyc development for military applications. Large-scale, general knowledge bases may improve search and navigation in portals by supporting better models of user behavior.
MESH: Medical Subject Headings (MESH) is a controlled vocabulary thesaurus developed by the U.S. National Library of Medicine with over 21,000 descriptors, over 132,000 supplementary descriptors in a separate chemical thesaurus, and thousands of cross-references between terms. MESH is used to index articles from over 4,000 biomedical journals as well as the MEDLINE database.
WordNet: WordNet is an online lexical reference developed by researchers at Princeton University and based on psycholinguistic theories about human memory. It contains over 146,000 words and over 195,000 word senses, and it records both the senses of each word (e.g., ten meanings of the word book and five meanings of search) and synonym sets of related terms. Some search engines currently use this lexical resource to improve search results.
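One way a portal search could use word senses and synonym sets like those described for WordNet is query expansion: adding synonyms of each query term before the search runs. The sketch below is only an illustration under stated assumptions. The tiny LEXICON dictionary and the expand_query function are hypothetical and hand-built; they are not the WordNet data or any WordNet API.

```python
from typing import Dict, List

# Hypothetical miniature lexicon (NOT real WordNet data):
# each word maps to a list of senses, and each sense is a synonym set.
LEXICON: Dict[str, List[List[str]]] = {
    "book": [["book", "volume"], ["book", "reserve"]],
    "search": [["search", "hunt"], ["search", "lookup"]],
}


def expand_query(terms: List[str],
                 lexicon: Dict[str, List[List[str]]] = LEXICON) -> List[str]:
    """Return the query terms plus every synonym found in any sense.

    Order is preserved and duplicates are dropped. Words that are not
    in the lexicon pass through unchanged.
    """
    expanded: List[str] = []
    for term in terms:
        candidates = [term]
        for synset in lexicon.get(term.lower(), []):
            candidates.extend(synset)
        for word in candidates:
            if word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query(["book", "catalog"]))
```

Because this sketch expands across every sense of a word, it also pulls in terms from unintended senses (here, "reserve" for a query about printed books). A practical system would need some way to choose the intended sense before expanding.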
Cyc, MESH, and WordNet are currently used or have the potential for use in enterprise portals. These are just three examples of the general and specialized knowledge representation tools that are of growing importance to portals. Much of the work now under way in knowledge representation centers on three approaches.
Ontologies are organized representations of concepts, often for specific domains, such as pharmaceuticals, health care, and electronics. The Cyc knowledge base supports multiple ontologies. (See http://ksl-web.stanford.edu/kst/ontology-sources.html.)
Topic maps group addressable information objects (e.g., documents) around topics and capture the relationships among them. The TopicMaps.Org consortium is developing standards for using XML to build topic maps for the Web. (See http://www.topicmaps.org/.)
The Semantic Web is an effort to embody semantic information in Web resources so that both humans and automated agents can more effectively manage those resources. (See http://www.w3.org/2001/sw/.)
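To make the topic map idea above more concrete, the following minimal sketch models topics, the addressable objects attached to them, and named relationships between topics. It is an assumption-laden illustration, not the XML format being standardized: the Topic and TopicMap classes, their method names, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    name: str
    # Addresses (e.g., URLs or document IDs) of objects about this topic.
    occurrences: List[str] = field(default_factory=list)


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    # Each association is (source topic, relationship name, target topic).
    associations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_occurrence(self, topic_name: str, address: str) -> None:
        """Attach an addressable object to a topic, creating the topic if needed."""
        topic = self.topics.setdefault(topic_name, Topic(topic_name))
        topic.occurrences.append(address)

    def associate(self, source: str, relation: str, target: str) -> None:
        """Record a named relationship between two topics."""
        for name in (source, target):
            self.topics.setdefault(name, Topic(name))
        self.associations.append((source, relation, target))

    def related_resources(self, topic_name: str) -> List[str]:
        """Objects attached to the topic itself and to directly associated topics."""
        names = [topic_name]
        for source, _, target in self.associations:
            if source == topic_name and target not in names:
                names.append(target)
            elif target == topic_name and source not in names:
                names.append(source)
        return [address
                for name in names if name in self.topics
                for address in self.topics[name].occurrences]


if __name__ == "__main__":
    # Hypothetical portal content, for illustration only.
    tm = TopicMap()
    tm.add_occurrence("benefits", "docs/benefits-guide")
    tm.add_occurrence("payroll", "docs/payroll-calendar")
    tm.associate("benefits", "related-to", "payroll")
    print(tm.related_resources("benefits"))
```

Keeping the addresses separate from the topics means the same documents can be reorganized under new topics or relationships without being moved or edited, which is part of what makes such structures shareable across applications.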
With these technologies emerging in the portal arena, what can users realistically expect in the next few years? First, anticipate improved search capabilities in highly specialized domains like pharmaceuticals and medicine. Second, expect incremental improvements in general search and categorization. Third, do not expect radical breakthroughs. Technologies like the Semantic Web hold great promise, but much of the work is still in the research phase. Finally, significant amounts of manual work will remain, from developing and tuning ontologies to defining topic maps. These technologies, however, will make it easier to share knowledge bases across applications.