
Multiple Disk Platforms for Streaming Media Servers

A single disk server is rarely enough, but implementing a multi-disk server can be a headache. This chapter walks you through the implementation of multi-disk servers and teaches you how to smooth over the rough spots.
This chapter is from the book

A single disk server is not appropriate for most applications for several reasons. First, its storage capacity might not be large enough to store all the required objects. Second, the bandwidth of a single disk can sustain only a small number of simultaneous displays. Third, the bandwidth requirement of an object might exceed that of a single disk drive. Hence, multi-disk architectures are introduced to resolve these limitations.

3.1 Overview

Assuming SM servers with multiple disk drives, the data blocks are assigned to the disks in order to distribute the load of a display evenly across them. Thus, data placement can affect the continuous display and performance of SM servers in conjunction with the scheduling techniques. There are two well-known approaches to assigning the blocks of an object across multiple disk drives: constrained and unconstrained. A typical example of constrained data placement is round-robin [12], [67], [70]. As suggested by its name, this technique assigns the blocks of an object to the disks in a round-robin manner, starting with an arbitrarily chosen disk. Assuming d disks in the system, if the first block of an object X is assigned to disk di, then the jth block of X is assigned to disk d(i+j−1) mod d, where 0 ≤ i < d and j ≥ 1. An example of unconstrained data placement is random [116], [172], [68], which assigns data blocks to disk drives using a random number generator.
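The round-robin placement rule can be sketched in a few lines of Python; the function name and the 1-based block index are illustrative choices, not from the text:

```python
def round_robin_disk(i, j, d):
    """Disk holding the j-th block (1-based) of an object whose first
    block was assigned to disk i, under round-robin placement."""
    return (i + j - 1) % d

# With d = 4 disks and the first block of X on disk 2, blocks 1..6
# land on disks 2, 3, 0, 1, 2, 3 -- cycling through all drives.
placement = [round_robin_disk(2, j, 4) for j in range(1, 7)]
```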

Based on scheduling and data placement techniques, we can classify continuous display techniques on multi-disk SM servers. There are four possible approaches: 1) cycle-based and round-robin, 2) deadline-driven and random, 3) cycle-based and random, and 4) deadline-driven and round-robin.

Cycle-based, Round-robin

Many studies [12], [70], [126], [175], [105] investigated the combination of cycle-based scheduling and round-robin data placement. With this approach, one block is retrieved from each disk drive for each display in every time period. Thus, assuming d disk drives in the system, data retrieval for a display cycles through all d disks in d successive time periods, following the round-robin data placement in a lock-step manner. The system load should be distributed across disk drives to prevent the formation of bottlenecks. This load can be balanced by intentionally delaying the retrieval of the first block of a requested object whenever a bottleneck forms on a disk drive (see Section 3.2 for details). Due to the harmony of round-robin data placement and periodic cycle-based data retrieval, this approach provides a deterministic service guarantee for a hiccup-free display of a SM object once its retrieval is initiated. This approach maximizes the utilization of disk bandwidth by distributing the load of a display across the disks evenly. Thus, the system throughput scales linearly as a function of the number of disk drives in the system. The drawback of this approach is that the startup latency also scales [1], because the system might delay the initiation of data retrievals of objects. Thus, this approach is suited to applications that require a high throughput and can tolerate a long startup latency, such as movie-on-demand systems.
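The intentional-delay admission policy can be illustrated with a toy model. Here displays are grouped by the disk they currently read from, all groups advance one disk per time period in lock step, and a group's size is capped by the disk bandwidth; the function and its parameters are hypothetical, not from the text:

```python
def startup_delay(group_sizes, capacity):
    """Time periods a new display must wait before its first block is
    retrieved. group_sizes[g] is the number of active displays in the
    group that reaches the new display's starting disk after g periods;
    admission is delayed until a group with spare capacity arrives.
    Returns None if every group is full (toy model, not from the text).
    """
    for delay, size in enumerate(group_sizes):
        if size < capacity:
            return delay
    return None

# Four disks, at most 3 displays per group: the first group is full,
# so the new display is delayed one period and joins the second group.
delay = startup_delay([3, 1, 3, 2], capacity=3)
```

This is the source of the latency/throughput trade-off described above: delaying admission keeps every disk below its bandwidth limit, at the cost of a startup latency that grows with the number of groups a new display must wait through.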

Deadline-driven, Random

A few studies [138], [97], [172], [68] have investigated the approach with deadline-driven scheduling and random data placement. By controlling the deadlines for block retrievals, this approach can provide a shorter startup latency than the cycle-based and round-robin approach. Hence, this approach is more appropriate to applications requiring a short startup latency, such as a digital editing system. However, this approach may suffer from the statistical variation of the number of block retrievals in a disk drive. Due to the nature of random data placement, a disk might receive more than its fair share of requests. A bottleneck on a disk drive may result in the violation of the deadlines set for the requested blocks, causing some displays to incur hiccups. This hiccup probability might be significant depending on the system load.
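The statistical variation can be quantified with a simple binomial model (our own illustration, not from the studies cited): if n block requests each pick one of d disks uniformly at random, the chance that a particular disk receives more requests than it can serve in a time period is the binomial tail probability:

```python
from math import comb

def overload_prob(n, d, capacity):
    """P(a specific disk gets more than `capacity` of the n requests)
    when each request picks one of d disks uniformly at random
    (binomial tail; an illustrative model, not from the text)."""
    p = 1.0 / d
    return sum(comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
               for k in range(capacity + 1, n + 1))

# Two requests over two disks, each disk able to serve one request per
# period: the disk is overloaded only when both requests hit it (1/4).
```

As the total load n approaches d × capacity, this tail probability, and with it the hiccup probability, grows quickly.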

Cycle-based, Random

The approach based on cycle-based scheduling and random data placement is discarded from further consideration because of the following drawbacks. First, this approach cannot provide the deterministic service guarantee of the cycle-based and round-robin approach, due to the random placement of data. Second, it cannot provide a startup latency as short as that of the deadline-driven and random methods, because of the cycle-based scheduling. Third, it results in a low utilization of disk bandwidth because requests might be distributed unevenly across the disks, and those disks that finish a cycle early remain idle until the other disks finish (based on the assumption that requests for the next blocks are issued at the end of the current cycle).
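The third drawback, disks sitting idle while waiting for the most-loaded disk to finish its cycle, can be quantified with a small simulation; the request counts and seed below are arbitrary, and the model is our own illustration:

```python
import random

def cycle_utilization(n, d, seed=0):
    """Average disk utilization in one cycle when n block requests are
    spread uniformly at random over d disks and the cycle ends only
    when the most-loaded disk finishes (illustrative model)."""
    rng = random.Random(seed)
    load = [0] * d
    for _ in range(n):
        load[rng.randrange(d)] += 1
    return sum(load) / (d * max(load))

# With a perfectly even spread utilization would be 1.0; random
# placement leaves the less-loaded disks idle for part of each cycle.
u = cycle_utilization(1000, 10)
```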

Deadline-driven, Round-robin

Similarly, we will not consider the approach combining deadline-driven scheduling with round-robin data placement. With this approach, once a bottleneck forms on a disk drive, it recurs repeatedly because of the round-robin placement of data and the sequential data retrievals of SM displays. For example, a bottleneck formed on disk d0 will most likely reappear on the adjacent disk d1, which holds the next blocks of the displays participating in the bottleneck on d0. Thus, the bottleneck travels across the disks in a round-robin manner and is resolved only when one or more participating displays terminate. In a movie-on-demand system, a bottleneck could therefore last for the entire display time of a movie, say 2 hours, in the worst case. With random data placement, by contrast, a bottleneck is resolved quickly because the system load is redistributed by the random placement of data.
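The migrating-bottleneck behavior can be made concrete with a toy simulation (our own illustration, not from the text): each display fetches one block per period, and its next block sits either on the next disk (round-robin) or on a random disk:

```python
import random

def load_history(start_disks, d, periods, placement, seed=0):
    """Per-period load vectors for displays that each fetch one block
    per period. Under 'round_robin' the next block is on the next disk,
    so a hot spot rotates intact; under 'random' the load is
    redistributed every period (toy model, not from the text)."""
    rng = random.Random(seed)
    pos = list(start_disks)
    history = []
    for _ in range(periods):
        history.append([pos.count(k) for k in range(d)])
        if placement == "round_robin":
            pos = [(p + 1) % d for p in pos]
        else:
            pos = [rng.randrange(d) for _ in pos]
    return history

# Five displays colliding on disk 0 of a 4-disk array: under
# round-robin the 5-request hot spot simply moves to disk 1, 2, 3, ...
rr = load_history([0] * 5, d=4, periods=4, placement="round_robin")
```

Running the same scenario with `placement="random"` scatters the five requests over the disks within a period or two, which is exactly why random placement recovers from a bottleneck while round-robin placement carries it forward.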
