Overcoming Transport Protocol Limitations

Now that you understand the fundamentals of how TCP operates, you are ready to examine how TCP can become a barrier to application performance in WAN environments. If you are wondering whether TCP can also be a barrier to application performance in LAN environments, the answer is unequivocally "yes." However, given that this book focuses on application acceleration and WAN optimization, which are geared toward improving performance for remote office and WAN environments, this book will not examine the TCP barriers to application performance in LAN environments.

You may also be wondering about UDP at this point. UDP, which is connectionless and provides no means for guaranteed delivery (it relies on application layer semantics), generally is not limited in terms of throughput on the network (outside of receiver/transmitter buffers). It is also not considered a good network citizen, particularly on a low-bandwidth WAN, because it consumes all available capacity that it can with no inherent fairness characteristics. Most enterprise applications, other than Voice over IP, video, TFTP, and some storage and file system protocols, do not use UDP. UDP traffic is generally best optimized in other ways, including stream splitting for video, which is discussed in Chapter 4, or through packet concatenation or header compression for VoIP. These topics, including UDP in general, are not discussed in the context of WAN optimization in this work.

The previous section examined at a high level how TCP provides connection-oriented service, provides guaranteed delivery, and adapts transmission characteristics to network conditions. TCP does have limitations, especially in WAN environments, in that it can be a significant barrier to application performance based on how it operates. This section examines ways to circumvent the performance challenges presented by TCP, including slow start and congestion avoidance, but note that this will not be an exhaustive study of every potential extension that can be applied.

Of Mice and Elephants: Short-Lived Connections and Long Fat Networks

No, this is not a John Steinbeck novel gone wrong. "Mice" and "elephants" are two of the creatures that can be found in the zoo known as networking. The term mice generally refers to very short-lived connections. Mice connections are commonly set up by an application to perform a single task, or a relatively small number of tasks, and then torn down. Mice connections are often used in support of another, longer-lived connection. An example of mice connections can be found in HTTP environments, where a connection is established to download a container page, and ancillary connections are established to fetch objects. These ancillary connections are commonly torn down immediately after the objects are fetched, so they are considered short-lived, or mice, connections.

Elephants are not connection related; rather, the term elephant refers to a network that is deemed to be "long" and "fat." "Elephant" is derived from the acronym for long fat network, LFN. An LFN is a network that is composed of a long distance connection ("long") and high bandwidth capacity ("fat").

The next two sections describe mice connections and elephant networks, and the performance challenges they create.

Mice: Short-Lived Connections

Short-lived connections suffer from performance challenges because each new connection has to undergo TCP slow start. As mentioned earlier, "slow start" is something of a misnomer: its bandwidth discovery ramps up throughput very quickly, yet it can still keep a short-lived connection from completing in a timely fashion. Slow start allows a new connection to use only a small amount of the available bandwidth at the start, and each connection carries significant setup overhead because of the latency between the two communicating nodes.

A good example of a short-lived connection is a Web browser's fetch of a 50-KB object from within an HTTP container page. Internet browsers spawn a new connection specifically to request the object, and this new connection is subject to TCP slow start. With TCP slow start, the connection is able to transmit a single segment (cwnd is equal to one segment) and must wait until an acknowledgment is received before slow start doubles the cwnd value (up to the maximum, which is the receiver's advertised window size). Due to TCP slow start, only a small amount of data can be transmitted, and each exchange of data suffers the latency penalty of the WAN.

Once the connection is in congestion avoidance (assuming it was able to discover a fair amount of bandwidth), it can send many segments per round trip instead of pausing for an acknowledgment after each small burst. If the segment size is 500 bytes, so that the initial cwnd is 500 bytes, it would take a minimum of seven roundtrip exchanges (not counting the connection setup exchanges) to transfer the 50-KB object, assuming no packet loss was encountered. This is shown in Table 6-1.

Table 6-1. Bytes per RTT with Standard TCP Slow Start

Roundtrip Time Number   cwnd (Segments)   cwnd (Bytes)   Bytes Remaining
1                       1                 500            49,500
2                       2                 1,000          48,500
3                       4                 2,000          46,500
4                       8                 4,000          42,500
5                       16                8,000          34,500
6                       32                16,000         18,500
7                       64                32,000         Finished!

In a LAN environment, this series of exchanges would not be a problem, because the latency of the network is generally around 1–2 ms, meaning that the total completion time most likely would be under 20 ms. However, in a WAN environment with 100 ms of one-way latency (200 ms round trip), this short-lived connection would have taken approximately 1.4 seconds to complete. The challenges that TCP slow start creates for short-lived connections are, in effect, what gave birth to the term "World Wide Wait."
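The arithmetic behind Table 6-1 can be reproduced with a short Python sketch. It assumes an idealized slow start in which cwnd doubles every round trip, no loss occurs, the receiver window is never the limit, and 50 KB is treated as 50,000 bytes. The function name slow_start_schedule is illustrative only.

```python
def slow_start_schedule(object_bytes, initial_cwnd_bytes):
    """Return one (rtt_number, cwnd_bytes, bytes_remaining) row per round trip.

    Idealized model: cwnd doubles after every round trip, there is no loss,
    and the receiver window never limits the sender.
    """
    rows = []
    cwnd = initial_cwnd_bytes
    remaining = object_bytes
    rtt = 0
    while remaining > 0:
        rtt += 1
        remaining = max(0, remaining - cwnd)
        rows.append((rtt, cwnd, remaining))
        cwnd *= 2
    return rows


rows = slow_start_schedule(50_000, 500)
for rtt, cwnd, remaining in rows:
    print(f"RTT {rtt}: cwnd={cwnd:>6} bytes, remaining={remaining:>6} bytes")

# Completion time for a 200-ms round trip, ignoring connection setup.
print(f"{len(rows)} round trips x 200 ms = {len(rows) * 200} ms")
```

Under these assumptions the transfer needs seven round trips, which at 200 ms each gives the 1.4 seconds quoted above.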

There are two primary means of overcoming the performance limitations of TCP slow start:

  • Circumvent it completely by using a preconfigured rate-based transmission protocol. A rate-based transmission solution will have a preconfigured or dynamically learned understanding of the network capacity to shape the transmission characteristics immediately, thereby mitigating slow start.
  • Increase the initial permitted segment count (cwnd) to a larger value at the beginning of the connection, otherwise known as large initial windows. This means of overcoming performance limitations of TCP slow start is generally the more adaptive and elegant solution, because it allows each connection to continue to gradually consume bandwidth rather than start at a predefined rate.

Using a preconfigured rate-based transmission solution requires that the sender, or an intermediary accelerator device, be preconfigured with knowledge of how much capacity is available in the network, or have the capability to dynamically learn what the capacity is. When the connection is established, the node (or accelerator) immediately begins to transmit based on the preconfigured or dynamically learned rate (that is, bandwidth capacity) of the network, thereby circumventing the problems of TCP slow start and congestion avoidance.

In environments that are largely static (bandwidth and latency are stable), rate-based transmission solutions work quite well. However, while rate-based transmission does overcome the performance challenges presented by TCP slow start and congestion avoidance, it has many other challenges and limitations that make it a less attractive solution for modern-day networks.

Today's networks are plagued with oversubscription, contention, loss, and other characteristics that can have an immediate and profound impact on available bandwidth and measured latency. For environments that are not as static, a rate-based transmission solution will need to continually adapt to the changing characteristics of the network, thereby causing excessive amounts of measurement to take place such that the accelerator can "guesstimate" the available network capacity. In a dynamic network environment, this can be a challenge for rate-based transmission solutions because network congestion, which may not be indicative of loss or changes in bandwidth, can make the sender believe that less capacity is available than what really is available.

Although rate-based transmission solutions may immediately circumvent slow start and congestion avoidance to improve performance for short- and long-lived connections, large initial windows offer a more adaptive solution that requires no prior knowledge of the network. RFC 3390, "Increasing TCP's Initial Window," specifies that the initial window be increased to minimize the number of roundtrip message exchanges that must take place before a connection is able to exit slow start and enter congestion avoidance mode, thereby allowing it to consume network bandwidth and complete the operation much more quickly. This approach does not require previous understanding or configuration of available network capacity, and allows each connection to identify available bandwidth dynamically, while minimizing the performance limitations associated with slow start.
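RFC 3390 expresses the upper bound on the initial window as a function of the sender's maximum segment size (MSS): min(4 × MSS, max(2 × MSS, 4380 bytes)). The following Python sketch evaluates that bound for two example MSS values. The function name and the choice of MSS values are illustrative assumptions, not part of the RFC.

```python
def rfc3390_initial_window(mss_bytes):
    """Upper bound on the initial window in bytes: min(4*MSS, max(2*MSS, 4380))."""
    return min(4 * mss_bytes, max(2 * mss_bytes, 4380))


for mss in (1000, 1460):
    print(f"MSS {mss}: initial window up to {rfc3390_initial_window(mss)} bytes")
```

With a 1000-byte MSS the bound works out to 4000 bytes, and with the common Ethernet-derived MSS of 1460 bytes it is 4380 bytes (three segments).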

Figure 6-9 shows how using TCP large initial windows helps circumvent bandwidth starvation for short-lived connections and allows connections to more quickly utilize available network capacity.

Figure 6-9. TCP Large Initial Windows

Referring to the previous example of downloading a 50-KB object using HTTP, if RFC 3390 were employed and the initial window were 4000 bytes, the operation would complete in four roundtrip exchanges. In a 200-ms-roundtrip WAN environment, this equates to approximately 800 ms, which is roughly 43 percent less time than the 1.4 seconds needed for the same transfer without large initial windows. (See Table 6-2.) Although this may not circumvent the entire process of slow start, as a rate-based transmission protocol would, it allows the connection to retain its adaptive characteristics and compete for bandwidth fairly on the WAN.

Table 6-2. Bytes per RTT with TCP Large Initial Windows

RTT Number   cwnd (Multiple of Initial Window)   cwnd (Bytes)   Bytes Remaining
1            1                                   4,000          46,000
2            2                                   8,000          38,000
3            4                                   16,000         22,000
4            8                                   32,000         Finished!
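To compare the two cases side by side, the following Python sketch counts round trips for the same 50,000-byte transfer with a 500-byte and a 4000-byte initial window and converts them to completion time on a 200-ms round trip. It assumes cwnd doubles every round trip with no loss; the function and variable names are illustrative.

```python
def round_trips_to_send(object_bytes, initial_cwnd_bytes):
    """Count round trips needed when cwnd doubles each round trip (no loss)."""
    cwnd = initial_cwnd_bytes
    sent = 0
    round_trips = 0
    while sent < object_bytes:
        sent += cwnd
        cwnd *= 2
        round_trips += 1
    return round_trips


RTT_MS = 200
standard_ms = round_trips_to_send(50_000, 500) * RTT_MS
large_iw_ms = round_trips_to_send(50_000, 4000) * RTT_MS
reduction = 100 * (standard_ms - large_iw_ms) / standard_ms

print(f"standard slow start:  {standard_ms} ms")
print(f"large initial window: {large_iw_ms} ms")
print(f"completion time reduced by {reduction:.0f} percent")
```

The sketch yields 1400 ms versus 800 ms, a reduction of about 43 percent in completion time.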

Comparing rate-based transmission and large initial windows, most find that the performance difference for short-lived connections is negligible, but the network overhead required for bandwidth discovery in rate-based solutions can be a limiting factor. TCP implementations that maintain semantics of slow start and congestion avoidance (including Large Initial Windows), however, are by design dynamic and adaptive with minimal restriction, thereby ensuring fairness among flows that are competing for available bandwidth.

Elephants: High Bandwidth Delay Product Networks

While the previous section examined optimizations primarily geared toward improving performance for mice connections, this section examines optimizations primarily geared toward improving performance for environments that contain elephants (LFNs). An LFN is a network that is long (distance, latency) and fat (bandwidth capacity).

Because every bit of data transmitted across a network link has some amount of travel time associated with it, it can be assumed that there can be multiple packets on a given link at any point in time. Thus, a node may be able to send ten packets before the first packet actually reaches the intended destination simply due to the distance contained in the network that is being traversed. With IP, there is no concept of send-and-wait at the packet layer. Therefore, network nodes place packets on the wire at the rate determined by the transport layer (or by the application layer in the case of connectionless transport protocols) and will continue to send based on the reaction of the transport layer to network events.

As an example, once a router transmits a packet, the router does not wait for that packet to reach its destination before the router sends another packet that is waiting in the queue. In this way, the links between network nodes (that is, routers and switches) are, when utilized, always holding some amount of data from conversations that are occurring. The network has some amount of capacity, which equates to the amount of data that can be in flight over a circuit at any given time that has not yet reached its intended destination or otherwise been acknowledged. This capacity is called the bandwidth delay product (BDP).

The BDP of a network is easily calculated. First, convert the network data rate from bits per second to bytes per second by dividing by 8. (Remember that data rates are quoted in powers of 10, so 1 Mbps is 1,000,000 bits per second, whereas buffer and window sizes are commonly quoted in powers of 2, so 1 KB is 1024 bytes.) Then, multiply the data rate (in bytes per second) by the delay of the network (in seconds). The resultant value is the amount of data that can be in flight over a given network at any point in time. The greater the distance (latency) and the greater the bandwidth of the network, the more data can be in flight across that link at any point in time. Figure 6-10 shows a comparison between a high BDP network and a low BDP network.

Figure 6-10. Comparing Bandwidth Delay Product: High vs. Low
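As a worked example of the BDP calculation, the following Python sketch multiplies a data rate by a delay and reports the result in bytes. The three links are hypothetical, and the delay is passed in milliseconds so that the arithmetic stays in integers. When the result is used to size a TCP window, the round-trip time is typically used as the delay, because data remains outstanding until it is acknowledged.

```python
def bdp_bytes(rate_bits_per_second, delay_ms):
    """Bandwidth delay product in bytes: (rate / 8) * delay in seconds."""
    return rate_bits_per_second * delay_ms // (8 * 1000)


# Hypothetical links: (data rate in bits per second, delay in milliseconds).
links = {
    "1.544 Mbps, 100 ms": (1_544_000, 100),
    "45 Mbps, 100 ms": (45_000_000, 100),
    "1 Gbps, 200 ms": (1_000_000_000, 200),
}
for name, (rate, delay) in links.items():
    print(f"{name}: {bdp_bytes(rate, delay):,} bytes in flight")
```

The 45-Mbps link with 100 ms of delay can hold 562,500 bytes in flight, far more than a 64-KB window can cover.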

The challenge with LFNs is not that they have a high BDP, but that the nodes exchanging data over the network do not have buffers or window sizes large enough to adequately utilize the available link capacity. With multiple concurrent connections, the link can certainly be utilized effectively, but a single connection has a difficult time taking advantage of (or simply cannot take advantage of) the large amount of network capacity available because of the lack of buffer space and TCP window capacity.

In situations where a single pair of nodes with inadequate buffer or window capacity is exchanging data, the sending node is easily able to exhaust the available window because of the amount of time taken to get an acknowledgment back from the recipient. Buffer exhaustion is especially profound in situations where the window size negotiated between the two nodes is small, because this can result in sporadic network utilization (burstiness) and periods of underutilization (due to buffer exhaustion). Figure 6-11 shows how window exhaustion leads to a node's inability to fully utilize available WAN capacity.

Figure 6-11. Window Exhaustion in High BDP Networks
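Window exhaustion puts a simple ceiling on a single connection: at most one window of data can be delivered per round trip, so throughput cannot exceed the window size divided by the RTT. The following Python sketch evaluates that ceiling for a 65,535-byte window at a few example round-trip times. The function name and RTT values are illustrative assumptions.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on single-connection throughput: one window per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms


window = 65_535  # largest window a 16-bit window field can advertise
for rtt_ms in (2, 50, 200):
    mbps = max_throughput_bps(window, rtt_ms) / 1_000_000
    print(f"RTT {rtt_ms:>3} ms: at most {mbps:.2f} Mbps")
```

At a 200-ms round trip the ceiling is roughly 2.6 Mbps, regardless of how much capacity the link actually offers.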

Bandwidth scalability is a term often used to describe an optimization capability, whether provided by an accelerator or applied as a configuration change, that enables a pair of communicating nodes to take better advantage of the available bandwidth capacity. This is also known as fill-the-pipe optimization, which allows nodes communicating over an LFN to achieve higher levels of throughput, thereby overcoming issues with buffer or window exhaustion. This type of optimization can be implemented rather painlessly on a pair of end nodes, but it requires configuration changes on every node where the optimization is desired, which can be difficult to carry out on a global scale. Alternatively, it can be implemented in an accelerator, which does not require any of the end nodes to undergo a change in configuration.

Two primary methods are available to enable fill-the-pipe optimization:

  • Window scaling
  • Scalable TCP implementation

The following sections describe both.

Window Scaling

Window scaling is an extension to TCP (see RFC 1323, "TCP Extensions for High Performance"). Window scaling, which is a TCP option that is negotiated during connection establishment, allows two communicating nodes to go beyond the limit of the 16-bit window field (65,535 bytes) for defining the available window size.

With window scaling enabled, each node advertises a parameter known as the window scale factor. The window scale factor dictates a binary left shift of the 16-bit advertised TCP window, thereby providing a multiplier effect on the advertised window. For instance, if the advertised TCP window is 1111 1111 1111 1111 (decimal 65,535, or approximately 64 KB) and the window scale factor is set to 2, the window value is shifted left by two bits, which appends two zero bits and yields 11 1111 1111 1111 1100 (decimal 262,140). The advertised 64-KB window size would be handled by the end nodes as a window of approximately 256 KB.

Larger scaled windows cause the end nodes to allocate more memory to TCP buffering, which means in effect that more data can be in flight and unacknowledged at any given time. Having more data in flight minimizes the opportunity for buffer exhaustion, which allows the conversing nodes to better leverage the available network capacity. The window scale TCP option (based on RFC 1323) allows for window sizes up to 1 GB (the scale factor is set to a value of 14). Figure 6-12 shows how window scaling allows nodes to better utilize available network capacity.

Figure 6-12. Using Window Scaling to Overcome High BDP Networks
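The effect of the window scale factor can be shown with a left shift in Python. The sketch below computes the effective window for a given advertised value and scale factor, and also finds the smallest scale factor that can cover a target window. The helper names and the 562,500-byte target (the BDP of a hypothetical 45-Mbps link with 100 ms of delay) are illustrative assumptions.

```python
MAX_SCALE = 14  # largest shift count permitted by the window scale option


def scaled_window(advertised_window, scale_factor):
    """Effective window: the 16-bit advertised value shifted left by the scale factor."""
    if not 0 <= advertised_window <= 0xFFFF:
        raise ValueError("advertised window must fit in 16 bits")
    if not 0 <= scale_factor <= MAX_SCALE:
        raise ValueError("scale factor must be between 0 and 14")
    return advertised_window << scale_factor


def min_scale_for(target_window_bytes):
    """Smallest scale factor whose maximum window covers the target."""
    for scale in range(MAX_SCALE + 1):
        if (0xFFFF << scale) >= target_window_bytes:
            return scale
    return MAX_SCALE


print(f"{scaled_window(0xFFFF, 2):b}")   # binary form of the scaled value
print(scaled_window(0xFFFF, 2))          # scale factor 2
print(scaled_window(0xFFFF, 14))         # largest possible window
print(min_scale_for(562_500))            # scale needed for the example BDP
```

Shifting appends zero bits, so a full 16-bit window with a scale factor of 2 becomes 262,140 bytes, and the maximum scale factor of 14 gives 1,073,725,440 bytes, or just under 1 GB.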

Scalable TCP Implementation

The second method of enabling fill-the-pipe optimization that is available, but more difficult to implement in networks containing devices of mixed operating systems, is to use a more purpose-built and scalable TCP implementation. Many researchers, universities, and technology companies have spent a significant amount of money and time to rigorously design, test, and implement these advanced TCP stacks. One common theme exists across the majority of the implementations: each is designed to overcome performance limitations of TCP in WAN environments and high-speed environments while improving bandwidth scalability. Many of these advanced TCP stacks include functionality that helps enable bandwidth scalability and overcome other challenges related to loss and latency.

Although difficult to implement in a heterogeneous network of workstations and servers, most accelerator solutions provide an advanced TCP stack natively as part of an underlying TCP proxy architecture (discussed later in this chapter, in the "Accelerator TCP Proxy Functionality" section), thereby mitigating the need to make time-consuming and questionable configuration changes to the network end nodes. Several of the more common (and popular) implementations will be discussed later in the chapter.

The next section looks at performance challenges of TCP related to packet loss.

Overcoming Packet Loss-Related Performance Challenges

Packet loss occurs in any network and, interestingly enough, occurs most commonly outside of the network and within the host buffers. TCP has been designed in such a way that it can adapt when packet loss is encountered and recover from such situations. Packet loss is not always bad. When detected by TCP (for instance, when a segment in the retransmission queue encounters an expired timer), packet loss signals that the network characteristics may have changed (for example, that the available bandwidth capacity has decreased). Packet loss could also signal that there is congestion on the network in the form of other nodes trying to consume or share the available bandwidth. This information allows TCP to react in such a way that allows other nodes to acquire their fair share of bandwidth on the network, a feature otherwise known as fairness.

Fairness is defined as a trait exhibited by networks that allow connections to evenly share available bandwidth capacity. TCP is considered to be a transport protocol that can generally ensure fairness, because it adapts to changes in network conditions, including packet loss and congestion, which are commonly encountered when flows compete for available bandwidth.

The result of packet loss in connection-oriented, guaranteed-delivery transport protocols such as TCP is a shift in the transmission characteristics to ensure that throughput is decreased to allow others to consume available capacity and also to accommodate potential changes in network capacity. This shift in transmission characteristics (in the case of TCP, decreasing the cwnd) could have a detrimental impact on the overall throughput if the cwnd available drops to a level that prevents the node from fully utilizing the network capacity.

Standard TCP implementations will drop cwnd by 50 percent when a loss of a segment is detected. In many cases, dropping cwnd by 50 percent will not have an adverse effect on throughput. In some cases, however, where the BDP of the network is relatively high, decreasing cwnd by 50 percent can have a noticeable impact on the throughput of an application.

TCP is designed to be overly conservative in that it will do its best to provide fairness across multiple contenders. This conservative nature is rooted in the fact that TCP was designed at a time when the amount of bandwidth available was far less than what exists today. The reality is that TCP needs to be able to provide fairness across concurrent connections (which it does), adaptive behavior for lossy networks (which it does), and efficient utilization of WAN resources (which it does not). Furthermore, without selective acknowledgment, the loss of a single packet can force segments within the sliding window that were already received to be retransmitted from the retransmission queue, which proves very inefficient in WAN environments. Figure 6-13 shows the impact of packet loss on the TCP congestion window, which may lead to a decrease in application throughput.

Figure 6-13. Packet Loss Causing cwnd to Decrease

When TCP detects the loss of a segment, cwnd is dropped by 50 percent, which effectively limits the amount of data the transmitting node can have outstanding in the network at any given time. Given that standard TCP congestion avoidance uses a linear increase of one segment per successful round trip, it can take quite a long time before the cwnd returns to a level that is sufficient to sustain high levels of throughput for the application that experienced the loss. In this way, TCP is overly conservative and packet loss may cause a substantial impact on application throughput, especially given that bandwidth availability is far greater than it was 20 years ago.
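A rough sense of how long this recovery takes can be had from a few lines of Python. The sketch assumes cwnd is halved on loss and then grows by exactly one segment per round trip with no further loss; the cwnd values and the 200-ms round trip are hypothetical.

```python
def standard_tcp_recovery_rtts(cwnd_segments):
    """Round trips to regain cwnd after halving, adding one segment per round trip."""
    reduced = cwnd_segments // 2
    return cwnd_segments - reduced


RTT_MS = 200
for cwnd in (16, 128, 1000):
    rtts = standard_tcp_recovery_rtts(cwnd)
    print(f"cwnd {cwnd:>4} segments: {rtts} round trips, {rtts * RTT_MS / 1000:.1f} s")
```

Under these assumptions, a connection that had reached a cwnd of 1000 segments needs 500 loss-free round trips, or 100 seconds at a 200-ms RTT, to return to its previous window.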

Overcoming the impact of packet loss through the use of TCP extensions in a node or by leveraging functionality in an accelerator can provide nearly immediate improvements in application throughput. Making such changes to each end node may prove to be an administrative nightmare, whereas deploying accelerators that provide the same capability (among many others) is relatively simple. There are three primary means of overcoming the impact of packet loss:

  • Selective acknowledgment (SACK)
  • Forward error correction (FEC)
  • Advanced congestion avoidance algorithms

By employing SACK, acknowledgments can be sent to notify the transmitter of the specific blocks of data that have been received into the receiver's socket buffer. Upon detecting loss, the transmitting node can then resend the blocks that were not acknowledged rather than send the contents of the entire window. Figure 6-14 shows how an accelerator acting as a TCP proxy (discussed later, in the section "Accelerator TCP Proxy Functionality") can provide SACK to improve efficiency in retransmission upon detecting the loss of a segment.

Figure 6-14. Selective Acknowledgment in an Accelerator Solution
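The retransmission decision that SACK makes possible can be sketched in Python. Given the first unacknowledged byte, the next byte to be sent, and the blocks the receiver has reported, the function below returns only the ranges that still need to be resent. This is a simplified illustration: it uses plain byte offsets rather than wrapping 32-bit sequence numbers, and the function name and sample values are hypothetical.

```python
def ranges_to_retransmit(first_unacked, next_to_send, sack_blocks):
    """Return the unacknowledged byte ranges that no SACK block covers.

    Ranges and SACK blocks are (start, end) pairs with an exclusive end.
    """
    missing = []
    cursor = first_unacked
    for start, end in sorted(sack_blocks):
        if start > cursor:
            missing.append((cursor, min(start, next_to_send)))
        cursor = max(cursor, end)
    if cursor < next_to_send:
        missing.append((cursor, next_to_send))
    return missing


# Ten 500-byte segments were sent (bytes 0-4999). The first two were
# acknowledged cumulatively, the third was lost, and the rest arrived.
sack_blocks = [(1500, 5000)]
resend = ranges_to_retransmit(1000, 5000, sack_blocks)
print(resend)
```

In this example only the 500 bytes of the lost segment are resent, instead of the 4000 bytes that remain unacknowledged in the window.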

FEC is used to generate parity packets that allow the receiving node to recover the data from a lost packet based on parity information contained within the parity packets. FEC is primarily useful in moderately lossy networks (generally between 0.25 and 1 percent loss). Below this loss boundary, FEC may consume excessive amounts of unnecessary bandwidth (little return on bandwidth investment because the loss rate is not substantial). Above this loss boundary, FEC is largely ineffective compared to the total amount of loss (not enough parity data being sent to adequately re-create the lost packets) and therefore consumes excessive CPU cycles on the transmitting and receiving nodes or accelerators with little to no performance benefit.
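The idea of a parity packet can be illustrated with simple XOR parity in Python. This is an assumption made for illustration: the text does not say which FEC code an accelerator uses, and production schemes are generally more elaborate. One parity packet is generated per group of equal-length data packets, and it can rebuild any single lost packet in that group.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """XOR equal-length packets together to form one parity packet."""
    return reduce(xor_bytes, packets)


def recover_lost(received_packets, parity):
    """Rebuild the single missing packet from the survivors and the parity packet."""
    return reduce(xor_bytes, received_packets, parity)


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(group)

survivors = group[:2] + group[3:]   # the third packet was lost in transit
recovered = recover_lost(survivors, parity)
print(recovered)
```

The trade-off described above is visible here: one parity packet per four data packets adds 25 percent overhead whether or not anything is lost, and it cannot help if two packets from the same group are lost.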

Advanced TCP implementations generally decrease cwnd less aggressively upon encountering packet loss and use more aggressive congestion avoidance algorithms to better utilize network capacity and more quickly return to previous levels of throughput after encountering packet loss. This yields a smaller drop in cwnd and faster increase in cwnd after encountering packet loss, which helps to circumvent performance and throughput challenges. Advanced TCP stacks are discussed in the next section.

Advanced TCP Implementations

The previous sections have outlined some of the limitations that TCP imposes upon application performance. Many universities and companies have conducted research and development to find and develop ways to improve the behavior of TCP (or completely replace it) to improve performance and make it more applicable to today's enterprise and Internet network environments. The result of this research and development generally comes in one of two forms: an extension to TCP (commonly applied as a negotiated option between two nodes that support the extension) or an alternative stack that is used as a replacement for TCP. For instance, SACK is an extension to TCP and is enabled based on negotiation during connection establishment. Advanced stacks, such as Binary Increase Congestion TCP (BIC-TCP), High-Speed TCP (HS-TCP), Scalable TCP (S-TCP), and others, are an alternative to TCP extensions and generally require that the peer nodes be running the same TCP implementation to leverage the advanced capabilities.

Accelerator devices commonly use an advanced TCP implementation, which circumvents the need to replace the TCP stack on each node in the network. When considering an implementation with an advanced TCP stack, it is important to consider three key characteristics of the implementation:

  • Bandwidth scalability: The ability to fully utilize available WAN capacity, otherwise known as fill-the-pipe. Figure 6-15 shows how fill-the-pipe optimizations such as window scaling can be employed in an accelerator to achieve better utilization of existing network capacity.
    Figure 6-15. Accelerator Enables Efficient Utilization of Existing WAN

  • TCP friendliness: The ability to share available network capacity fairly with other transmitting nodes that may not be using the same advanced TCP implementation. This is another form of fairness, in that the optimized connections should be able to share bandwidth fairly with other connections on the network. Over time, the bandwidth allocated to optimized and unoptimized connections should converge such that resources are shared across connections.

    Figure 6-16 shows the impact of using accelerators that provide TCP optimization that is not friendly to other nonoptimized connections, which can lead to bandwidth starvation and other performance challenges. Figure 6-17 shows how accelerators that provide TCP optimization that is friendly to other nonoptimized connections will compete fairly for network bandwidth and stabilize with other nonoptimized connections.

    Figure 6-16. TCP Friendliness: Accelerator with No Fairness

    Figure 6-17. TCP Friendliness: Accelerator with Fairness

  • Roundtrip time (RTT) fairness: The ability to share bandwidth fairly across connections even if the RTT between the two communicating node pairs is unequal. RTT fairness is another component of fairness at large.

    With RTT disparity, the nodes that are closer to one another can generally consume a larger portion of available WAN bandwidth capacity than the nodes that are more distant when sharing bandwidth. This is due to the way TCP congestion avoidance relies on acknowledgment messages to increment the cwnd, which increases the amount of data that can be outstanding in the network. This leads to the two nodes that are closer to one another being able to transmit data more quickly because acknowledgments are received more quickly and the congestion window is advanced more rapidly. Figure 6-18 shows an example of a network of nodes where RTT disparity is present.

    Figure 6-18. Roundtrip Time Differences and Fairness
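The effect of RTT disparity on window growth can be approximated in a few lines of Python. The sketch assumes both connections start at one segment, grow by one segment per round trip in congestion avoidance, and see no loss; the two RTT values are hypothetical.

```python
def cwnd_after(seconds, rtt_ms, start_segments=1):
    """cwnd after linear congestion avoidance: one extra segment per round trip."""
    round_trips = (seconds * 1000) // rtt_ms
    return start_segments + round_trips


for rtt_ms in (20, 200):
    print(f"RTT {rtt_ms:>3} ms: cwnd reaches {cwnd_after(10, rtt_ms)} segments after 10 s")
```

After ten seconds, the connection with the 20-ms RTT has grown its window roughly ten times as far as the connection with the 200-ms RTT, simply because it completes ten times as many round trips.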

These advanced TCP implementations commonly implement an advanced congestion avoidance algorithm that overcomes the performance challenge of using a conservative linear search such as found in TCP. With a conservative linear search (increment cwnd by one segment per successful RTT), it may take a significant amount of time before the cwnd increments to a level high enough to allow for substantial utilization of the network. Figure 6-19 shows how TCP's linear search congestion avoidance algorithm leads to the inability of a connection to quickly utilize available network capacity (lack of aggressiveness).

Figure 6-19. Linear Search Impedes Bandwidth Utilization

This section examines some of the advanced TCP implementations and characteristics of each but is not intended to be an exhaustive study of each or all of the available implementations.

High-Speed TCP

High-Speed TCP (HS-TCP) is an advanced TCP implementation that was developed primarily to address bandwidth scalability. HS-TCP uses an adaptive cwnd increase that is based on the current cwnd value of the connection. When the cwnd value is large, HS-TCP uses a larger cwnd increase when a segment is successfully acknowledged. In effect, this helps HS-TCP to more quickly find the available bandwidth, which leads to higher levels of throughput on large networks much more quickly.

HS-TCP also uses an adaptive cwnd decrease based on the current cwnd value. When the cwnd value for a connection is large, HS-TCP uses a very small decrease to the connection's cwnd value when loss of a segment is detected. In this way, HS-TCP allows a connection to remain at very high levels of throughput even in the presence of packet loss but can also lead to longer stabilization of TCP throughput when other, non-HS-TCP connections are contending for available network capacity. The aggressive cwnd handling of HS-TCP can lead to a lack of fairness when non-HS-TCP flows are competing for available network bandwidth. Over time, non-HS-TCP flows can stabilize with HS-TCP flows, but this period of time may be extended due to the aggressive behavior of HS-TCP.

HS-TCP also does not provide fairness in environments where there is RTT disparity between communicating nodes, again due to the aggressive handling of cwnd. This means that when using HS-TCP, nodes that are communicating over shorter distances will be able to starve other nodes that are communicating over longer distances due to the aggressive handling of cwnd. In this way, HS-TCP is a good fit for environments with high bandwidth where the network links are dedicated to a pair of nodes communicating using HS-TCP as a transport protocol but may not be a good fit for environments where a mix of TCP implementations or shared infrastructure is required. Figure 6-20 shows the aggressive cwnd handling characteristics displayed by HS-TCP.

Figure 6-20. High-Speed TCP
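The adaptive increase and decrease can be sketched in Python. The formulas and constants below follow the response function and default parameters as described in RFC 3649, "HighSpeed TCP for Large Congestion Windows," and should be treated as assumptions to verify against that document. Real implementations typically use a precomputed lookup table instead of evaluating the formulas on every acknowledgment.

```python
import math

LOW_WINDOW = 38        # at or below this cwnd, behave like standard TCP
HIGH_WINDOW = 83_000   # cwnd at which the decrease factor reaches HIGH_DECREASE
HIGH_DECREASE = 0.1


def decrease_factor(cwnd):
    """Fraction of cwnd removed on loss; shrinks from 0.5 as cwnd grows."""
    if cwnd <= LOW_WINDOW:
        return 0.5
    span = math.log(HIGH_WINDOW) - math.log(LOW_WINDOW)
    position = (math.log(cwnd) - math.log(LOW_WINDOW)) / span
    return (HIGH_DECREASE - 0.5) * position + 0.5


def increase_per_rtt(cwnd):
    """Segments added to cwnd per loss-free round trip; grows with cwnd."""
    if cwnd <= LOW_WINDOW:
        return 1.0
    b = decrease_factor(cwnd)
    p = 0.078 / cwnd ** 1.2   # loss rate the response function associates with cwnd
    return max(1.0, cwnd ** 2 * p * 2.0 * b / (2.0 - b))


for cwnd in (38, 1_000, 83_000):
    print(f"cwnd {cwnd:>6}: +{increase_per_rtt(cwnd):.1f} segments per RTT, "
          f"-{decrease_factor(cwnd) * 100:.0f} percent on loss")
```

At small windows the sketch behaves like standard TCP (one segment per round trip, 50 percent decrease), while at very large windows it adds tens of segments per round trip and gives up only about 10 percent of cwnd on loss.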

You can find more information on HS-TCP at http://www.icir.org/floyd/hstcp.html

Scalable TCP

Scalable TCP (S-TCP) is similar to HS-TCP in that it uses an adaptive increase to cwnd. S-TCP will increase cwnd by a value of (cwnd × 0.01) when increasing the congestion window, which means the increment is large when cwnd is large and the increment is small when cwnd is small.

Rather than use an adaptive decrease in cwnd, S-TCP will decrease cwnd by 12.5 percent (1/8) upon encountering a loss of a segment. In this way, S-TCP is more TCP friendly than HS-TCP in high-bandwidth environments. Like HS-TCP, S-TCP is not fair among flows where an RTT disparity exists due to the overly aggressive cwnd handling.
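The practical consequence of these two rules can be seen by comparing recovery times in Python. The sketch assumes the 1 percent increase is applied once per loss-free round trip and compares it with standard TCP's 50 percent decrease followed by one segment per round trip. The function names and cwnd values are illustrative.

```python
def stcp_recovery_rtts(cwnd_before_loss):
    """Round trips for S-TCP to regain cwnd: drop 12.5 percent, then grow 1 percent per round trip."""
    cwnd = cwnd_before_loss * 0.875
    round_trips = 0
    while cwnd < cwnd_before_loss:
        cwnd += cwnd * 0.01
        round_trips += 1
    return round_trips


def standard_recovery_rtts(cwnd_before_loss):
    """Round trips for standard TCP: drop 50 percent, then add one segment per round trip."""
    return cwnd_before_loss - cwnd_before_loss // 2


for cwnd in (100, 1000, 10_000):
    print(f"cwnd {cwnd:>6}: standard {standard_recovery_rtts(cwnd):>5} RTTs, "
          f"S-TCP {stcp_recovery_rtts(cwnd)} RTTs")
```

Because both the decrease and the increase are proportional to cwnd, the S-TCP recovery time in this model is the same number of round trips at every window size, whereas standard TCP's recovery time grows with cwnd.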

You can find more information on S-TCP at http://www.deneholme.net/tom/scalable/.

Binary Increase Congestion TCP

Binary Increase Congestion TCP (BIC-TCP) is an advanced TCP stack that uses a more adaptive increase than that used by HS-TCP and S-TCP. HS-TCP and S-TCP use a variable increment to cwnd directly based on the value of cwnd. BIC-TCP uses connection loss history to adjust the behavior of congestion avoidance to provide fairness.

BIC-TCP's congestion avoidance algorithm uses two search modes—linear search and binary search—as compared to the single search mode (linear or linear relative to cwnd) provided by standard TCP, HS-TCP, and S-TCP. These two search modes allow BIC-TCP to adequately maintain bandwidth scalability and fairness while also avoiding additional levels of packet loss caused by excessive cwnd aggressiveness:

  • Linear search: Uses a calculation of the difference between the current cwnd and the previous cwnd prior to the loss event to determine the rate of linear search.
  • Binary search: Used as congestion avoidance approaches the previous cwnd value prior to the loss event. This allows BIC-TCP to mitigate additional loss events caused by the connection exceeding available network capacity after a packet loss event.

The linear search provides aggressive handling to ensure a rapid return to previous levels of throughput. The binary search not only helps to avoid an additional loss event but also improves fairness in environments with RTT disparity (that is, where two nodes exchanging data are closer to each other than two other nodes that are exchanging data), because it allows TCP throughput to converge across connections more fairly and quickly.
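
The interplay of the two search modes can be illustrated with a simplified sketch. The constants S_MAX, S_MIN, and BETA and the function names are hypothetical values chosen for illustration, not figures from this chapter, and the sketch leaves out what happens once cwnd grows past the previous maximum.

```python
S_MAX = 32.0   # hypothetical cap on growth per round trip (segments)
S_MIN = 0.01   # hypothetical smallest growth step (segments)
BETA = 0.8     # hypothetical fraction of cwnd kept after a loss


def bic_on_loss(cwnd: float) -> tuple[float, float]:
    """Return (new_cwnd, w_max), remembering the cwnd where loss occurred."""
    return cwnd * BETA, cwnd


def bic_next_cwnd(cwnd: float, w_max: float) -> float:
    """Advance cwnd by one round trip toward the pre-loss value w_max."""
    if cwnd >= w_max:
        return cwnd  # probing beyond w_max is outside this sketch
    step = (w_max - cwnd) / 2.0  # distance to the midpoint
    if step > S_MAX:
        # Far from w_max: linear search, a fixed and aggressive step.
        return cwnd + S_MAX
    # Close to w_max: binary search, steps shrink as cwnd approaches w_max.
    return min(cwnd + max(step, S_MIN), w_max)


if __name__ == "__main__":
    cwnd, w_max = bic_on_loss(1000.0)
    for rtt in range(12):
        print(rtt, round(cwnd, 2))
        cwnd = bic_next_cwnd(cwnd, w_max)
```

In this model the window climbs in fixed steps while it is far from the previous maximum, then takes progressively smaller steps as it closes in, which is how the approach limits the chance of overshooting available capacity again.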

Figure 6-21 shows how BIC-TCP provides fast returns to previous throughput levels while avoiding additional packet loss events.

Figure 6-21 Binary Increase Congestion TCP

For more information on BIC-TCP, visit http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/index.htm

Accelerator TCP Proxy Functionality

Most accelerator devices provide proxy functionality for TCP. This allows enterprise organizations to deploy technology that overcomes WAN conditions without having to make significant changes to the existing clients and servers on the network. In effect, a TCP proxy allows the accelerator to terminate TCP connections locally and take ownership of providing guaranteed delivery on behalf of the communicating nodes. With a TCP proxy, the accelerator manages local TCP transmit and receive buffers and provides TCP-layer acknowledgments and window management. This also allows the accelerator to effectively shield communicating nodes from packet loss and other congestion events that occur in the WAN.

Before this type of function can be employed, accelerator devices must either automatically discover one another or have preconfigured knowledge of who the peer device is. This allows existing clients and servers to retain their existing configurations and TCP implementations, and allows the accelerator devices to use an advanced TCP stack between one another for connections between communicating nodes that are being optimized. The optimized connections between accelerator devices may also be receiving additional levels of optimization through other means such as compression (discussed later, in the section "Accelerators and Compression"), caching, read-ahead, and others as described in Chapters 4 and 5.

Accelerators that use a TCP proxy terminate TCP locally on the LAN segment they are connected to and use optimized TCP connections to peer accelerators over the WAN. By acting as an intermediary TCP proxy device, an accelerator is uniquely positioned to take ownership of managing WAN conditions and changes on behalf of the communicating nodes. For instance, if an accelerator detects the loss of a packet that has been transmitted for an optimized connection, the accelerator retransmits that segment on behalf of the original node. Provided the data remains in the socket buffers within the accelerator, this keeps WAN conditions from directly affecting the end nodes involved in the conversation that is being optimized. The accelerator provides acceleration and throughput improvements to clients and servers with legacy TCP stacks, creating near-LAN TCP behavior while managing the connections over the WAN.

Figure 6-22 shows how accelerators can act as a proxy for TCP traffic, which shields LAN-attached nodes from WAN conditions.

Figure 6-22 TCP Proxy Architecture

By employing accelerators in the network that provide TCP proxy functionality, clients and servers communicating over the WAN experience better performance and loss recovery. First, loss events occurring in the WAN are wholly contained and managed by the accelerators. Second, acknowledgments are handled locally by the accelerator, thereby allowing the clients and servers to achieve potentially very high levels of throughput.
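
The benefit of local acknowledgments follows from the usual window-over-RTT bound on TCP throughput. The sketch below is a rough illustration; the 65,535-byte window and the 100 ms and 1 ms round-trip times are example values assumed here, not measurements from this chapter, and the function name is hypothetical.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    window = 65535  # assumed window of a legacy TCP stack without scaling

    # Without a proxy, acknowledgments cross the WAN (assumed 100 ms RTT).
    wan = max_tcp_throughput_bps(window, 0.100)

    # With a proxy, the accelerator acknowledges locally (assumed 1 ms RTT).
    lan = max_tcp_throughput_bps(window, 0.001)

    print(f"WAN-limited: {wan / 1e6:.1f} Mbps")
    print(f"LAN-limited: {lan / 1e6:.1f} Mbps")
```

With these example values, the same client window supports roughly 5.2 Mbps when acknowledgments cross the WAN and roughly 524 Mbps when the accelerator acknowledges locally, which leaves the accelerator's own optimized connection as the factor that determines throughput across the WAN.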

When a connection is further optimized through advanced compression (as discussed later in this chapter), not only are clients and servers able to fill the pipe, but accelerators are able to fill the pipe with compressed data (or redundancy-eliminated data), which in many situations yields throughput many times higher than the WAN link could otherwise carry. Figure 6-23 shows how accelerator-based TCP optimization can be leveraged in conjunction with advanced compression to fill the pipe with compressed data.

Figure 6-23 TCP Optimization and Compression Combined
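
As a rough worked example of filling the pipe with compressed data, the sketch below multiplies the WAN rate by a compression ratio and caps the result at the LAN rate. The 4:1 ratio and the link speeds are hypothetical example values, and the function name is an assumption made for illustration.

```python
def effective_throughput_bps(
    wan_bps: float, compression_ratio: float, lan_bps: float
) -> float:
    """Application-level throughput when the WAN carries compressed data.

    The result cannot exceed what the LAN side can deliver.
    """
    return min(wan_bps * compression_ratio, lan_bps)


if __name__ == "__main__":
    t1 = 1_544_000              # T1 WAN link, bits per second
    fast_ethernet = 100_000_000  # LAN link, bits per second
    ratio = 4.0                 # hypothetical 4:1 compression ratio

    rate = effective_throughput_bps(t1, ratio, fast_ethernet)
    print(f"Effective throughput: {rate / 1e6:.3f} Mbps")
```

Under these assumptions, a fully used T1 carrying data compressed at 4:1 delivers about 6.2 Mbps of application data, four times the raw link rate.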

A good accelerator solution will include the TCP optimization components discussed earlier in this chapter and application acceleration capabilities discussed in Chapters 4 and 5, along with the compression techniques discussed in the next section. All of these factors combined can help improve efficiency and application performance over the WAN.
