There are three notable and proven approaches for introducing interactivity into TV content: MPEG-2 extensions, MPEG-4 native, and ATVEF VBI; the latter suffers from the hotspot problem and is not covered in this section. MPEG-2-based approaches use private sections (which are not part of the video elementary stream) to extend the MPEG-2 standard to accommodate data, effectively introducing another layer of standardization. MPEG-4-based approaches use Object Descriptors (OD) and the BInary Format for Scenes (BIFS), which are native parts of the MPEG-4 systems layer.
3.3.1 MPEG-2 Interactivity
MPEG-2-based interactivity relies on the existence of receiver middleware that contains an execution environment. Data carried in or otherwise associated with the video contains markup content or Java classes that encode some behavior. The execution environment processes and interprets this data and executes the software application it encodes. On execution, a GUI is presented to the viewer that renders the program interactive by capturing and processing user input events.
Figure 3.13 depicts a simplified MPEG-2-based architecture. Packetized video and audio transport streams can be delivered directly into a multiplexer. In contrast, HTML files and Java class files must first be converted into an MPEG-2 transport format by a DSM-CC file broadcasting system module; its output is then fed into the multiplexer, which produces a packetized MPEG-2 transport stream containing the iTV program.
Figure 3.13. Interactive MPEG-2 programs pass user events to the execution environment.
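The headend flow just described (application files wrapped by a DSM-CC module, then multiplexed with audio and video) can be sketched as follows. This is a toy model only: the packet layout, PIDs, and 8-byte file-name prefix are simplified stand-ins for the real DSM-CC object/data carousel defined in ISO/IEC 13818-6, and the round-robin multiplexer ignores timing and bit-rate constraints.

```python
# Toy headend: wrap application files in carousel-style blocks and
# interleave them with audio/video TS packets. All names and framing
# are invented for illustration; this is not compliant DSM-CC.

TS_PACKET_SIZE = 188

def make_ts_packet(pid: int, payload: bytes) -> bytes:
    """Build a minimal (non-compliant) TS packet: sync byte, PID, payload."""
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[:TS_PACKET_SIZE - 4]
    return header + body.ljust(TS_PACKET_SIZE - 4, b"\xff")

def carousel_packets(files, pid):
    """Split each application file into fixed-size blocks, tagged by name."""
    for name, data in files.items():
        for offset in range(0, len(data), 176):
            block = name.encode()[:8].ljust(8) + data[offset:offset + 176]
            yield make_ts_packet(pid, block)

def multiplex(av_packets, data_packets):
    """Naive round-robin multiplex of A/V and data packets."""
    streams = [iter(av_packets), iter(data_packets)]
    out = []
    while streams:
        for s in list(streams):
            try:
                out.append(next(s))
            except StopIteration:
                streams.remove(s)
    return out

app_files = {"menu.htm": b"<html>press OK</html>",
             "Main.cls": b"\xca\xfe\xba\xbe"}
av = [make_ts_packet(0x100, b"video"), make_ts_packet(0x101, b"audio")]
ts = multiplex(av, carousel_packets(app_files, pid=0x1FF))
```

The result is a single stream of fixed-size 188-byte packets in which A/V and application data are interleaved, which is the essential property the receiver-side demultiplexer relies on.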
Through numerous multiplexing and demultiplexing steps not described here, the iTV program reaches a final demultiplexing step, typically performed by every MPEG-2 receiver. At this stage, the packetized video and audio are extracted and delivered directly to traditional MPEG-2 video and audio decoders, which in turn produce an output that can be fed into the display. In contrast, the HTML and Java class files are extracted using a Broadcast File System (BFS) decoder and passed into an Application Execution Environment (AEE); recall Figure 3.10 for the relationship between the transport and application layer. The AEE then identifies the initial entry HTML or Java class files and executes the application. On execution, the AEE supports the application with the production of a GUI, which can be used by the viewer to generate events to be passed back to the application through the AEE.
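The receiver-side split described above can be sketched as a routing decision on the packet identifier (PID): A/V payloads go to the decoders, data payloads to a BFS decoder that reassembles the application files for the AEE. The PID values, the 8-byte name prefix, and the packet framing below are simplified assumptions, not the real DSM-CC carousel format.

```python
# Toy receiver: route 188-byte TS packets by PID, then let a toy BFS
# decoder regroup the carousel blocks into named application files.

def demultiplex(ts_packets, av_pids, data_pid):
    """Split a transport stream into A/V payloads and BFS data blocks."""
    av, app_blocks = [], []
    for pkt in ts_packets:
        assert pkt[0] == 0x47, "lost sync"
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        payload = pkt[4:]
        if pid in av_pids:
            av.append(payload)          # -> MPEG-2 video/audio decoders
        elif pid == data_pid:
            app_blocks.append(payload)  # -> BFS decoder, then AEE
    return av, app_blocks

def bfs_reassemble(app_blocks):
    """Regroup carousel blocks into named files (toy 8-byte name prefix)."""
    files = {}
    for block in app_blocks:
        name = block[:8].decode().rstrip()
        files[name] = files.get(name, b"") + block[8:].rstrip(b"\xff")
    return files

def packet(pid, payload):
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + payload.ljust(184, b"\xff")

ts = [packet(0x100, b"video frame"),
      packet(0x1FF, b"menu.htm" + b"<html>press OK</html>")]
av, blocks = demultiplex(ts, av_pids={0x100, 0x101}, data_pid=0x1FF)
files = bfs_reassemble(blocks)
```

From `files` the AEE would then locate the initial entry file (here the hypothetical `menu.htm`) and launch the application.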
3.3.2 MPEG-4 V1 Interactivity
MPEG-4 interactivity relies on BIFS, whose structure is, for the most part, inherited from the Virtual Reality Modeling Language (VRML) 2.0, although its explicit bit-stream representation is completely different. MPEG-4 adds several distinguishing mechanisms to VRML: data streaming, scene updates, and compression.
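The scene-update mechanism can be illustrated with a toy model: a VRML-like node tree to which update commands (insert, delete, replace a field) are applied in decode order. Real BIFS is a compressed binary encoding defined by MPEG-4 Systems, not Python objects; the node and command names below are invented for illustration.

```python
# Toy BIFS-style scene graph: the stream carries *updates* to the scene,
# not repeated copies of the whole scene.

class Node:
    def __init__(self, node_id, node_type, **fields):
        self.node_id, self.node_type, self.fields = node_id, node_type, fields
        self.children = []

class Scene:
    def __init__(self, root):
        self.root, self.index = root, {}
        self._register(root)

    def _register(self, node):
        self.index[node.node_id] = node
        for c in node.children:
            self._register(c)

    def apply(self, command):
        """Apply one decoded update command to the scene graph."""
        op = command["op"]
        if op == "replace_field":      # e.g. move or recolor an object
            node = self.index[command["node"]]
            node.fields[command["field"]] = command["value"]
        elif op == "insert_node":      # add a subtree under a parent
            parent = self.index[command["parent"]]
            parent.children.append(command["node"])
            self._register(command["node"])
        elif op == "delete_node":      # remove a node from the graph
            node = self.index.pop(command["node"])
            for n in self.index.values():
                if node in n.children:
                    n.children.remove(node)

scene = Scene(Node(0, "Group"))
scene.apply({"op": "insert_node", "parent": 0,
             "node": Node(1, "Transform", translation=(0, 0, 0))})
scene.apply({"op": "replace_field", "node": 1,
             "field": "translation", "value": (10, 0, 0)})
```

Because only the commands are streamed, a small update (moving one object) costs a few bytes rather than a retransmission of the entire scene.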
Figure 3.14 depicts a simplified MPEG-4 interactivity architecture. In addition to delivering the visual and audio components, the receiver assembles and renders the scene graph after the demultiplexer extracts the BIFS, OD, and Object Content Information/Intellectual Property Management and Protection (OCI/IPMP) components. The rendering of the scene produces a GUI, which can be used by the viewer to generate events that are interpreted as BIFS property changes.
Figure 3.14. Interactive MPEG-4 programs interpret user events as BIFS property changes.
As opposed to the MPEG-2-based approach, in which the GUI is generated through a standard library (of Java classes or HTML renderers), MPEG-4 has no standard GUI library. Instead, it provides standard semantics for processing BIFS property changes; each receiver manufacturer can collect and interpret events as it sees fit, as long as the end result of each event is a BIFS property change compliant with the MPEG-4 standard. Indeed, there is no distinction between scene changes introduced through user interaction and those introduced through the broadcast.
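This "single update path" idea can be sketched as follows: a user click and a broadcast-delivered update both funnel through the same property-change routine, so the scene cannot tell (and need not care) where a change came from. The node names and event fields here are invented for illustration.

```python
# Toy model: all scene changes, whatever their source, are expressed as
# property changes applied through one source-agnostic routine.

scene = {"button": {"isActive": False}, "video": {"startTime": -1.0}}

def apply_property_change(scene, node, field, value):
    """The single path for all scene changes, user- or broadcast-originated."""
    scene[node][field] = value

def on_user_click(scene, node, now):
    # How the receiver collects and maps raw input events is left to the
    # manufacturer; only the resulting property change is standardized.
    apply_property_change(scene, node, "isActive", True)
    apply_property_change(scene, "video", "startTime", now)

def on_broadcast_update(scene, node, field, value):
    apply_property_change(scene, node, field, value)

on_user_click(scene, "button", now=12.5)
on_broadcast_update(scene, "video", "startTime", 30.0)
```

The design consequence is that interactivity costs the standard nothing extra: the same update machinery that keeps the broadcast scene current also services the viewer.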
The scene produced by a creative author typically passes through numerous format transformations. On receipt of that scene, the display should show a scene rendered according to MPEG-4 rules so that it reproduces the author's design. Depending on the degree of freedom allowed by the author, however, the viewer can interact with the scene. Operations a user may be allowed to perform include the following:
Change the viewing/listening point of the scene, e.g., by navigation through a scene.
Drag objects within the scene to different positions.
Trigger a cascade of events by clicking on a specific object, e.g., starting or stopping a video.
Select the desired language when multiple language tracks are available.
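Interactions like those listed above are typically authored with VRML-style sensors and routes, which BIFS inherits. The sketch below models the third operation: a toy touch sensor fires when an object is clicked, and a route forwards the event to a video node's start time, triggering playback. The node and field names follow VRML conventions, but the wiring code itself is an invented illustration.

```python
# Toy sensor/route model: clicking an object triggers a cascade of events
# that ends in a scene-property change (here, starting a video).

class TouchSensor:
    def __init__(self):
        self.routes = []   # (target_node, target_field) pairs

    def route_to(self, node, field):
        self.routes.append((node, field))

    def click(self, timestamp):
        for node, field in self.routes:   # cascade the event
            node.set_field(field, timestamp)

class MovieTexture:
    def __init__(self):
        self.fields = {"startTime": -1.0, "speed": 1.0}

    def set_field(self, field, value):
        self.fields[field] = value

sensor, movie = TouchSensor(), MovieTexture()
sensor.route_to(movie, "startTime")  # ROUTE sensor event TO movie.startTime
sensor.click(timestamp=42.0)         # viewer clicks the sensitized object
```

The other operations (navigation, dragging, language selection) follow the same pattern: a sensor captures the input, and routes translate it into property changes on the affected nodes.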