Introduction to Augmented Reality
Virtual reality is becoming increasingly popular, as computer graphics have progressed to a point where the images are often indistinguishable from the real world. However, the computer-generated images presented in games, movies, and other media are detached from our physical surroundings. This is both a virtue—everything becomes possible—and a limitation.
The limitation comes from the main interest we have in our daily life, which is not directed toward some virtual world, but rather toward the real world surrounding us. Smartphones and other mobile devices provide access to a vast amount of information, anytime and anywhere. However, this information is generally disconnected from the real world. Consumers with an interest in retrieving online information from and about the real world, or linking up online information with the real world, must do so individually and indirectly, which, in turn, requires constant cognitive effort.
In many ways, enhancing mobile computing so that the association with the real world happens automatically seems an attractive proposition. A few examples readily illustrate this idea’s appeal. Location-based services can provide personal navigation based on the Global Positioning System (GPS), while barcode scanners can help identify books in a library or products in a supermarket. These approaches require explicit actions by the user, however, and are rather coarse grained. Barcodes are useful for identifying books, but not for naming mountain peaks during a hiking trip; likewise, they cannot help in identifying tiny parts of a watch being repaired, let alone anatomic structures during surgery.
Augmented reality holds the promise of creating direct, automatic, and actionable links between the physical world and electronic information. It provides a simple and immediate user interface to an electronically enhanced physical world. The immense potential of augmented reality as a paradigm-shifting user interface metaphor becomes apparent when we review the few most recent milestones in human–computer interaction: the emergence of the World Wide Web, the social web, and the mobile device revolution.
The trajectory of this series of milestones is clear: First, there was an immense increase in access to online information, leading to a massive audience of information consumers. These consumers were subsequently enabled to also act as information producers and communicate with one another, and finally were given the means to manage their communications from anywhere, in any situation. Yet, the physical world, in which all this information retrieval, authoring, and communication takes place, was not readily linked to the users’ electronic activity. That is, the model was stuck in a world of abstract web pages and services without directly involving the physical world. A lot of technological advancement has occurred in the field of location-based computing and services, which is sometimes referred to as situated computing. Even so, the user interfaces to location-based services remain predominantly rooted in desktop-, app-, and web-based usage paradigms.
Augmented reality can change this situation, and, in doing so, redefine information browsing and authoring. This user interface metaphor and its enabling technologies form one of today’s most fascinating and future-oriented areas of computer science and application development. Augmented reality can overlay computer-generated information on views of the real world, amplifying human perception and cognition in remarkable new ways.
After providing a working definition of augmented reality, we will briefly review important developments in the history of the research field, and then present examples from various application areas, showcasing the power of this physical user interface metaphor.
Definition and Scope
Whereas virtual reality (VR) places a user inside a completely computer-generated environment, augmented reality (AR) aims to present information that is directly registered to the physical environment. AR goes beyond mobile computing in that it bridges the gap between the virtual world and the real world, both spatially and cognitively. With AR, the digital information appears to become part of the real world, at least in the user’s perception.
Achieving this connection is a grand goal—one that draws upon knowledge from many areas of computer science, yet can lead to misconceptions about what AR really is. For example, many people associate the visual combination of virtual and real elements with the special effects in movies such as Jurassic Park and Avatar. While the computer graphics techniques used in movies may be applicable to AR as well, movies lack one crucial aspect of AR—interactivity. To avoid such confusion, we need to set a scope for the topics discussed in this book. In other words, we need to answer a key question: What is AR?
The most widely accepted definition of AR was proposed by Azuma in his 1997 survey paper. According to Azuma, AR must have the following three characteristics:
Combines real and virtual
Interactive in real time
Registered in 3D
This definition does not require a specific output device, such as a head-mounted display (HMD), nor does it limit AR to visual media. Audio, haptics, and even olfactory or gustatory AR are included in its scope, even though they may be difficult to realize. Note that the definition does require real-time control and spatial registration, meaning precise real-time alignment of corresponding virtual and real information. This requirement implies that the user of an AR display can exercise at least some form of interactive viewpoint control, and that the computer-generated augmentations in the display remain registered to the referenced objects in the environment.
While opinions on what qualifies as real-time performance may vary depending on the individual and on the task or application, interactivity implies that the human–computer interface operates in a tightly coupled feedback loop. The user continuously navigates the AR scene and controls the AR experience. The system, in turn, picks up the user’s input by tracking the user’s viewpoint or pose. It registers the pose in the real world with the virtual content, and then presents to the user a situated visualization (a visualization that is registered to objects in the real world).
We can see that a complete AR system requires at least three components: a tracking component, a registration component, and a visualization component. A fourth component—a spatial model (i.e., a database)—stores information about the real world and about the virtual world (Figure 1.1). The real-world model is required to serve as a reference for the tracking component, which must determine the user’s location in the real world. The virtual-world model consists of the content used for the augmentation. Both parts of the spatial model must be registered in the same coordinate system.
Figure 1.1 AR uses a feedback loop between human user and computer system. The user observes the AR display and controls the viewpoint. The system tracks the user’s viewpoint, registers the pose in the real world with the virtual content, and presents situated visualizations.
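To make the interplay of the four components more concrete, the following minimal Python sketch walks through one pass of the feedback loop. It is an illustration under strong simplifying assumptions, not a description of any particular AR system: the world is two-dimensional, the user's heading is assumed to come from a compass-like sensor, and tracking relies on a single sighting of one known landmark. All names (Pose, SpatialModel, track, register, visualize, ar_frame) are hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pose:
    """User viewpoint in world coordinates (2D for brevity)."""
    x: float
    y: float
    heading: float  # radians, counter-clockwise from the world x-axis


@dataclass
class SpatialModel:
    """Real-world and virtual-world models in one shared coordinate system."""
    landmarks: dict = field(default_factory=dict)    # real-world reference
    annotations: dict = field(default_factory=dict)  # virtual content


def track(model, landmark_id, seen_at, heading):
    """Tracking: estimate the user's pose from one landmark sighting.

    seen_at is the landmark position in the view frame (x forward, y left).
    The heading is assumed to be supplied by a separate sensor.
    """
    lx, ly = model.landmarks[landmark_id]
    c, s = math.cos(heading), math.sin(heading)
    # Rotate the sighting into world axes, then step back from the landmark.
    return Pose(lx - (c * seen_at[0] - s * seen_at[1]),
                ly - (s * seen_at[0] + c * seen_at[1]),
                heading)


def register(model, pose):
    """Registration: express each annotation in the current view frame."""
    c, s = math.cos(pose.heading), math.sin(pose.heading)
    placed = {}
    for label, (wx, wy) in model.annotations.items():
        dx, dy = wx - pose.x, wy - pose.y
        # Inverse rotation: world offset -> view-frame offset.
        placed[label] = (c * dx + s * dy, -s * dx + c * dy)
    return placed


def visualize(placed):
    """Visualization: a textual stand-in for rendering the augmentations."""
    return [f"{label}: forward {fx:.1f}, left {fy:.1f}"
            for label, (fx, fy) in placed.items()]


def ar_frame(model, landmark_id, seen_at, heading):
    """One pass of the loop: track, then register, then visualize."""
    pose = track(model, landmark_id, seen_at, heading)
    return visualize(register(model, pose))


# Example: a door is a known landmark; an exit sign is virtual content.
model = SpatialModel(landmarks={"door": (2.0, 5.0)},
                     annotations={"exit sign": (3.0, 5.0)})
for line in ar_frame(model, "door", seen_at=(4.0, 0.0), heading=math.pi / 2):
    print(line)
```

Because the annotation is stored in world coordinates, calling ar_frame again with a different sighting and heading yields different view-frame coordinates for the same world position. This is the sense in which the augmentation remains registered while the user controls the viewpoint. A real system would estimate a full 3D pose from many measurements and repeat this pass for every displayed frame.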