Introduction to Peer Architectures
With a clearer grasp of what the p2p concept means, we can now take on the task of classifying peer architectures. This analysis is fairly abstract and non-specific, at least to begin with, because any subsequent discussions of specific technologies and implementations must build on a common groundwork.
As a first step, we must define the fundamental terms and the basic p2p models. Including a summary of primary characteristics in this overview proves useful when comparing the functionality of different technologies, and also when we look in Part III at the future development of p2p applications.
We also need to discuss protocols, the "glue" that holds networks together. Later chapters dissect particular protocols in considerable detail, so this discussion is not made dependent on any particular implementation. It provides a familiar context in which to place the later specifics. In addition, it can serve as a map to help identify any omissions or simplifications in the protocol design of a particular implementation, which could otherwise be obscured in the mass of technical detail.
Finally, because the focus of the book is mainly the end-user perspective, and thus on application implementations where the p2p software is entirely or mostly administered by the user, this chapter is the appropriate place to summarize some of the peer implementations that aren't covered in other chapters. The final section therefore mentions a selection of common peer networking technologies native to different operating systems. These technologies often form the underlying transport layer in host systems that application p2p implementations depend on, yet transcend with their own protocols.
Chapter 2 at a Glance
This chapter introduces the architectural models and fundamental terms used in peer-to-peer networking.
From Model to Reality starts with a summary of the conceptual models for p2p, before delving further into detailed terms.
The Protocol Types section analyzes the basics of protocols to introduce some terms and concepts used later to describe particular implementations. Network Purpose adds a perspective often overlooked, that of suitability to a particular purpose.
Architectural Models provides an overview of the main types of p2p technologies based on their architecture.
The models covered are Atomistic P2P, User-Centric P2P and Data-Centric P2P. The Leveraged P2P section discusses how distributed implementations might incorporate and blend aspects from these p2p models to improve functionality and performance.
Specific Architectures is mainly a background to Part II and gives an overview of some p2p possibilities not included elsewhere in this book.
Native Networking describes common solutions already built into operating systems, while Other Application Groups zips through some p2p and p2p-related technologies not covered elsewhere.
From Model to Reality
First of all, let's summarize into a concise structure the conceptual models of p2p that were introduced in the historical overview in Chapter 1.
Of the many possible ways to structure the information about different conceptual models for information exchange using computers, Table 2.1 takes the perspective of client-server analysis. All the main client-server models are included to give context, while the shaded section in the middle of the table encompasses the established and accepted p2p architectures discussed in this book.
As used in the table, the term "index" means collections of logical links to distributed resources or data, while "directory" refers to collections of logical links to users. Not all applications or networks make this distinction between the two terms and services, but it is a useful one. In either case, this logical addressing is usually independent of the underlying addressing scheme for the network (for instance, the Internet). The latter just functions as a transport layer for the specific p2p protocols.
Although distributed resource sharing, or what might be called the computation-centric model, is often included as a p2p technology, its exchange model actually says nothing about the presence of any p2p architecture. A distributed computation-centric implementation might indeed include p2p characteristics from any of the table's preceding three p2p models, most likely the data-centric one. Quite often, however, node communication is only with the central server that owns the data and distributes the tasks. Next-generation Web or a fully deployed .NET infrastructure, both still in the future, might also include many aspects of p2p but are unlikely as a whole to build on any one p2p architectural model.
The Table 2.1 summary also gives a chronological overview of popular trends in communication modes, even though the actual development and use of each listed technology is nowhere near as linear and simple as the popularized "paradigm shifts" would suggest. In the earliest dumb-terminal systems, for example, the mainframe servers could be interconnected in an atomistic p2p model, refuting the impression that this model came later. Yet it's still true that the PC-based p2p model did come later.
TABLE 2.1 Conceptual models for information exchange

Centralized processing (or "dumb terminal systems")
- Display on many (local) dumb clients (terminals)
- Data storage, processing, indexing, and policies on one "mainframe" server

Client-server (such as corporate NT domain networks)
- Processing and display on many smart clients (such as LAN PCs)
- Data storage, processing, indexing, and policies on one main server

Web server and browser (current paradigm)
- Limited processing and display on many clients on WAN or Internet
- Data storage and limited processing on many (distributed) servers

Peer-to-peer models (the client-server distinction blurs in all these peer models; data storage, processing, and display occur on many peers)
- Atomistic P2P: no separate servers
- User-centric P2P: directory services on one or few servers
- Data-centric P2P: indexing services on one or few servers

Computation-centric (distributed processing)
- Processing on many distributed clients
- Administration, storage, indexing, and display on one or few servers

Next-generation Web (and perhaps .NET)
- Data, processing, co-authoring, and display on many clients
- Many kinds of distributed services on many servers
The original vision of the World Wide Web by its "creator" Tim Berners-Lee and others was a predominantly data-centric p2p one: a globally hyperlinked content space where no single server had precedence over any other. This vision implied extensive co-authoring and open collaboration by all users. However, much to the disappointment of the first visionaries, the Web instead evolved to have overwhelmingly static and server-centric content, locked to the visitors. There came to be few actual content providers compared to the many users. With users constrained to the role of passive consumers, and little peer communication between them, functionality development of Web browsers focused mainly on snazzy presentation features, not on user utility or, for that matter, on many of the more basic navigational and collaborative improvements proposed from the very start.
Nevertheless, the importance and use of open p2p models has returned on a new level, user-to-user rather than machine-to-machine, making the developmental chronology implied by the previous table reasonably accurate in that sense.
Protocol Types
Protocol forms the "glue" that holds a network together by defining how nodes communicate with each other to achieve network functionality. We therefore need to examine the terms and concepts that are later used to describe how particular implementations function.
Protocols are specified at many different levels, but we can define some common characteristics concerning how a protocol is implemented. For example, we can consider the kind of modality focus of the communication:
Message-based protocol, where the focus is on sending and receiving discrete, packaged, and addressed messages. How the messages are carried between the two parties is delegated to an autonomous agent; the situation is like a conversation in which messages are exchanged by courier, carrier pigeon, or mail.
Connection-based protocol, where the focus is on establishing connections over which messages can be sent. This situation is more like a telephone conversation, where a dial-up connection allows "raw" messages (unpackaged and unaddressed) to be exchanged in real time.
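The distinction shows up directly in ordinary network sockets. The following sketch (illustrative code only, using Python's standard socket library over the loopback interface) contrasts a message-based UDP exchange, where each datagram is individually packaged and addressed, with a connection-based TCP exchange, where a connection is established first and "raw" bytes then flow over it:

```python
import socket

def udp_demo():
    # Message-based: no connection is ever established. Each sendto()
    # carries its own destination address, like a letter in the mail.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS pick one
    addr = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(b"hello", addr)          # the message itself is addressed
    data, _sender = server.recvfrom(1024)
    client.close()
    server.close()
    return data

def tcp_demo():
    # Connection-based: the address is used once, to set up the
    # connection; after that, unaddressed bytes flow in real time.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    addr = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(addr)                   # explicit connection setup
    conn, _ = server.accept()
    client.sendall(b"hello")               # no address on the data itself
    data = conn.recv(1024)
    conn.close()
    client.close()
    server.close()
    return data
```

Both transfers deliver the same five bytes; the difference lies entirely in whether the conversation is a sequence of self-addressed messages or a stream flowing over an established connection.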
Another, related aspect is the relative timing of messages:
Asynchronous conversation, in which one side need not wait for the other side's response before sending another message.
Synchronous conversation, where explicit or implicit dependencies exist between a message and its response. Additionally, there are often internal timing constraints between parts of the same message.
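As a toy illustration (all function names here are invented for the example), the timing difference reduces to whether an outgoing message depends on the reply to the previous one:

```python
def echo_service(msg):
    """A trivial peer that replies to each message it receives."""
    return "re:" + msg

def synchronous_conversation(messages):
    # Synchronous: each message explicitly waits for the previous
    # reply, and the reply influences the next request.
    transcript = []
    reply = ""
    for msg in messages:
        request = msg if not reply else f"{msg} (after {reply})"
        reply = echo_service(request)
        transcript.append(reply)
    return transcript

def asynchronous_conversation(messages):
    # Asynchronous: everything is sent without waiting; replies
    # arrive whenever the other side gets to them.
    outbox = list(messages)          # fire off all messages at once
    return [echo_service(m) for m in outbox]
```

In the synchronous case the second reply carries a dependency on the first; in the asynchronous case the replies are independent of one another.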
A third aspect is state, as applied to the network used for message transport:
Stateless network, which treats all messages the same with no reference to previous messages. The same message will be processed and interpreted the same way every time it is sent.
State-aware network, which exhibits dependencies. Processing one message can influence how a future message will be handled. Memory of past events is preserved; the same message can give different results at different times.
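A minimal sketch of the difference (the handler names are invented for this example): a stateless handler always maps the same message to the same result, while a state-aware one keeps memory that changes how later messages are interpreted:

```python
def stateless_handler(message):
    # Stateless: every message is processed in isolation, so the
    # same input always yields the same output.
    return message.upper()

class StateAwareHandler:
    """State-aware: memory of past events is preserved, so the same
    message can give different results at different times."""

    def __init__(self):
        self.seen = 0                # memory carried between messages

    def handle(self, message):
        self.seen += 1
        return f"{message.upper()} (message #{self.seen})"
```

Sending "hi" twice to the stateless handler produces identical results; the state-aware handler numbers each message, so the two results differ.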
The current Internet TCP/IP connectivity model is connection-based, asynchronous, and inherently stateless. However, many message-based Internet applications introduce various ad hoc mechanisms and protocols to track state.
Constraints of Internet Transport
Internet messages are packaged, addressed, and sent as a number of packets determined by underlying transport layers. The packets are routed by independent routers with locally determined priorities and paths. The "sockets" one sometimes sees reference to are logical abstractions of virtual endpoint connections between, for example, server and client, but have no correlation with how messages really travel between them. Figure 2.1 is a simple illustration of that concept.
FIGURE 2.1 Simple illustration of how data transfer between Internet routers is accomplished by asynchronous forwarding of individual packets along whatever path each router at that moment deems suitable
There is little concern at higher or application levels for how the data transfer is accomplished. The requirement is only that all packets are received within a "reasonable" time, or can be requested again if missing so that the message can ultimately be reconstructed. Hence TCP is characterized as a reliable transport, in that it implements handshaking to track and acknowledge received packets.
This pervasive and asynchronous nature of packet transport is not a problem unless dealing with various forms of "streaming" content, where packets must be received in a particular order. The severe timing constraints of "real-time" content mean that any missing packets can't easily be requested again. Generally speaking, implementing streaming media over the Internet requires both an acceptance of a buffering delay at the receiver and some tolerance for missing packets. Therefore, streaming connections typically rely on UDP (User Datagram Protocol) as transport, which doesn't try to guarantee packet reception, in exchange for less overhead to manage.
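A streaming receiver of this kind can be sketched as a simple jitter buffer that reorders packets by sequence number and tolerates gaps rather than requesting retransmission. The function and the (sequence number, data) packet format below are illustrative, not taken from any particular protocol:

```python
def play_stream(packets, expected_count):
    """Reassemble a 'stream' from (sequence_number, data) packets that
    may arrive out of order, tolerating any that never arrive at all,
    as a UDP-based streaming receiver must."""
    buffer = dict(packets)           # jitter buffer, keyed by sequence number
    frames = []
    for seq in range(expected_count):
        # No retransmit request: a missing packet simply becomes a gap
        # in the playback rather than a stall.
        frames.append(buffer.get(seq, "<lost>"))
    return frames
```

For example, if packets 2 and 0 arrive (out of order) and packet 1 is lost, playback proceeds with a gap instead of waiting for a retransmission.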
Internet-based p2p applications must usually accept at least the constraint of the asynchronous and stateless nature of the underlying TCP/IP packet routing. They must also build on the current IP addressing model, even if they subsequently construct other directory services with different scope and resolution.
The current Internet paradigm, especially the Web, has for some time been predominantly unidirectional in its information flow: from content server to consumer client. This aspect has hampered the development of support at all levels for arbitrary conversations.
In the strict sense, most current Internet applications don't easily support real conversations, only requests for predetermined, static content at some more or less permanent address on the network. This is reflected in, for example, Web browser design, which for the sake of efficiency caches content locally but can have problems dealing with sites that generate dynamic content.
That's not to say it's impossible, or even necessarily hard, to support flexible conversations in the existing protocols (p2p applications are a case in point), only that the majority of deployed Internet applications tend to be very rigid and limited in this respect, or ignore the conversational aspect altogether.
It's mainly for this reason that the deployment of application talking to application is still rare and limited in its scope. The vision of autonomous agents deployed to roam the Web, capable of gathering and filtering information according to rules relevant to the interests defined by the user, has long been a compelling one, yet remains largely unfulfilled. To achieve this, not only must bidirectional conversations be a natural mode in the infrastructure protocols, but the information must be accessible in a common structure or metastructure, and the various agents and servers fully interoperable.
Next-generation Web applications do promise a far greater degree of bidirectional conversation support, partly because of proposed extensions to the underlying Web transport protocols, and partly because of a whole range of services geared to distributed authoring and management of content, as exemplified by DAV (Distributed Authoring and Versioning). The other part of the equation is that newer content protocols such as eXtensible Markup Language (XML) are designed to meet the linked requirements of common structure, configurable functionality, and ease of extensibility for particular, perhaps unforeseen needs.
Until then, various protocol overlays in the form of existing p2p technologies allow users to retrofit at least some functionality that easily and transparently can support real conversations between nodes in all modes. Some of this is evident in the discussions of architecture models later in this chapter, and in Part II where practical implementations are examined in detail.
Network Purpose
Protocols and infrastructure are only one part of the story. The technology must also have an aim, a reason for its design that manifests in the implementation.
Each implementation of a p2p network has a stated purpose or intent at some level. Usually, this implicit or explicit goal was set early in the design process and so to a great extent determines both how and for what you can use it.
A given implementation might be admirable in many critical aspects, yet unsuited to the purpose you intend. Many current p2p architectures are relatively specialized. Some focus areas for current p2p implementations include
- File sharing and data sharing
- Content publishing
- Content retrieval (including search and distribution)
- Distributed storage
- Distributed network services
- Distributed processing (and presentation)
- Decentralized collaboration
- Content management/control
- General resource sharing
In addition, the prospective user must evaluate the impact of specific design decisions concerning scalability, security, reliability, storage model, and so on.
The network purpose also includes the dimension of actors, and it is common (and natural) to use the idiom of "conversation" when describing network interactions between peers. Looking at this idiom, we can more clearly see some useful ways of talking about the process of communicating over a p2p network, and some of the essential components of such an architecture.
We see three communication situations in p2p network conversations.
- Person to person (P-P)
- Person to application (P-A)
- Application to application (A-A)
Peer applications such as messaging are of P-P type, while file-sharing nodes that automatically fulfill typed-in requests by remote users are basically P-A. Both P-A and A-A will likely grow in importance as we get better at designing automata that can interface with people in "normal" conversational contexts, and with each other to manage routine transactions for particular content or services on the network.
We've seen the same trend in telephony P-A, especially recently in the maturing field of advanced voice recognition services, where human operators are rapidly vanishing and becoming only instances of last resort for fulfilling user requests. Web-based customer care centers have also seen rapid deployment as cost-saving ways to allow customers to fulfill their own requests and manage their own accounts. Online Internet booking, shopping, and banking are other P-A technologies that have seen significant investments and deployment, albeit occasionally with mixed results. Online government, so far usually in the somewhat limited sense of requesting information, retrieving forms, and filing tax returns, is yet another P-A area.
It's been stated, probably correctly, that the real transformation of the Internet will occur first when A-A conversations become commonplace. This transformation would be at least as great a shift in communication patterns as when telephony (and especially cellular phones), e-mail, or instant messaging became popular.
Extensive use of A-A would probably imply an equally extensive deployment of P-A, where people communicate with the local client interfaces to the user applications implementing the delegated user authority (see the following section).
Anyone interested in seeing where both P-A and A-A implementations of p2p might lead should look more closely at the open, overtly protocol-based implementations discussed later in Part II, and at the speculations in Part III.
Properties and Components
Looking at conversation properties in general lets us identify some essential p2p properties and components more clearly. Defining these basic terms is important for later discussions.
Identity. This property simply serves to uniquely identify any user, client, service, or resource, and is fundamental to any p2p context. The practical implementation of a network's identity namespace critically determines or limits the scope of application of the p2p network. Another critical identity issue deals with identifying and tracking individual messages.
Presence. As a property, presence defines the arbitrary coming and going of actors in a dynamic conversation. As a component, it represents the mechanism by which users, applications, services, or resources can manage and communicate information about their current state. Note that "presence" can go beyond the simple state model to convey all manner of context-specific information for a particular conversation.
Roster. One most often identifies a roster as a list of frequently contacted identities. The corresponding component provides short-list entry points into a chosen peer community but is often underestimated in terms of its potential automation utility to the user (see "agency"). In particular, applications and services can make use of a roster to intelligently share resources, filter conversations, and determine appropriate levels of trust automatically, without the user's constant attention and intervention.
Agency. With relationships to both identity and roster, agency defines the ability of an application to act as an autonomous client. This can mean initiating, managing, coordinating, and filtering conversations that the user would be interested in or has set up rules for; e-mail filtering rule sets are but a simple precursor to agency. In some cases, the agent might act with vested formal authority on behalf of the user, probably leveraged with one or another form of digital signature or certificate.
Browsing. As yet comparatively rare in p2p implementations, the ability to browse available peers, services, applications, resources, and relationships is an important but underrated feature that is only marginally supported in the current paradigm of searching the network or central user/resource database. We find its best known implementation in the form of the browser built into Microsoft Network Neighborhood.
Architecture. In this context, architecture mainly denotes how messages are managed and passed between endpoints in a conversation. One dimension for describing this term is to locate the process somewhere on the scale of client-server to atomistic peer. Another is degree of distribution of services and storage. The relevance of any description depends on the p2p context.
Protocol. Current p2p implementations rest on the packet protocol layer of TCP/IP, sometimes UDP/IP, overlaid with more sophisticated application and session protocols to create the virtual network that defines a particular implementation space. Ideally, the implementation should be fairly agnostic about this layering process and be able to transparently translate across different protocols as required.
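The components above might be collected into a simple data structure. The sketch below is purely illustrative (none of the field or method names come from an actual implementation), but it shows how identity, presence, roster, and a trivial agency rule relate:

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    """Hypothetical sketch of a peer and its conversation components."""
    identity: str                    # unique within the network's namespace
    presence: str = "offline"        # current state, announced to other peers
    roster: set = field(default_factory=set)   # frequently contacted identities

    def announce(self, state):
        # Presence component: manage and communicate current state.
        self.presence = state

    def add_contact(self, other):
        # Roster component: short-list entry points into the community.
        self.roster.add(other.identity)

    def trusts(self, identity):
        # A trivial agency rule: automatically accept conversations
        # from roster members, without the user's intervention.
        return identity in self.roster
```

Even this toy model makes the interdependencies visible: agency (the automatic trust decision) is only meaningful given a roster, and a roster is only meaningful given well-defined identities.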
This list of components is used in the later implementation chapters in Part II to provide a baseline table for a summary comparison between the different implementation architectures.
Although neither exhaustive nor the only way to examine functionality, these items are a useful way to highlight significant differences between implementations. Maintaining a focus on these primary characteristics helps a user evaluate critical features and discount non-essentials in a given implementation's feature list.
The relative importance of each characteristic is largely dependent on the intended purpose and scope of the application, but some general conclusions are still possible. One is the overall importance of identity. This might seem trivial (surely you need some form of identity to communicate), yet some implementations totally lack any concept of user identity. In such cases, the implementation works because the purpose doesn't require any defined identity. Others allow arbitrary, even multiple, user-selectable names that are unique only within a particular context.
Even with a well-defined identity, the question remains as to what exactly that identity is tied to: message, person, role, digital signature, software, computer component. Each has advantages in certain contexts and clear limitations in others, particularly in the degree of addressability, security, or portability each offers.
Another conclusion is that some characteristics, such as presence, are often undervalued in p2p designs. This view applies even to basic online status, let alone to more advanced concepts of presence that might be applicable to a wider context involving autonomous agencies acting on behalf of a user.
Autonomous agency remains very much in the realm of speculation and vision, but is an enticing goal in the face of the ever-mounting floods of unfiltered and unsorted information that confront a human user on the Internet or at work. Such prospects are dealt with in more detail in Part III.
We next examine the primary architecture models of p2p.