SyncML: Synchronizing and Managing Your Mobile Data
Mobile computing gives everybody access to their business or personal data everywhere, using devices like Personal Digital Assistants (PDAs), smart phones, mobile phones, and laptops. An online connection to a corporate datastore might not always be possible, due to lack of network coverage, for example. Sometimes even if a connection is available, using it might not necessarily be the fastest and most cost-effective way for the application to operate. In situations such as these, data synchronization is a key technology to alleviate those shortcomings.
Data synchronization maintains a consistent local "copy" of various kinds of data from a central corporate datastore or a service provider datastore on the user's device. It is therefore possible to look up or change data locally on the device without requiring an online connection to the master copy of that datastore. The simplest case is a user retrieving data for a local copy. Here, the application needs to transfer only the changes from the master datastore to the local copy, without copying the complete datastore again. Synchronization gets more complicated as soon as many different users make modifications to their local copies of the datastore. These modifications then need to be reconciled between all copies of that datastore.
Data synchronization is the technology used to keep all these distributed copies of a datastore consistent by communicating the actual changes between these copies and by resolving conflicts that may arise due to contradictory changes in different copies of the same datastore.
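The two tasks named above, communicating only the actual changes and resolving contradictory ones, can be illustrated with a minimal sketch. The names Replica, dirty, synchronize, and resolve are hypothetical and chosen only for this illustration; they are not part of SyncML. The sketch assumes that each copy remembers which records it changed since the last synchronization, and it leaves out deletions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a datastore plus the record IDs changed since the last sync."""
    records: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # IDs modified locally since last sync

    def put(self, record_id, value):
        self.records[record_id] = value
        self.dirty.add(record_id)


def synchronize(a, b, resolve):
    """Exchange only the changed records between two replicas.

    resolve(record_id, value_a, value_b) is a caller-supplied policy that
    returns the winning value when both sides changed the same record.
    Returns the sorted list of record IDs that were in conflict.
    """
    conflicts = sorted(a.dirty & b.dirty)
    for rid in a.dirty - b.dirty:      # changed only on a: copy to b
        b.records[rid] = a.records[rid]
    for rid in b.dirty - a.dirty:      # changed only on b: copy to a
        a.records[rid] = b.records[rid]
    for rid in conflicts:              # changed on both: ask the policy
        winner = resolve(rid, a.records[rid], b.records[rid])
        a.records[rid] = b.records[rid] = winner
    a.dirty.clear()
    b.dirty.clear()
    return conflicts


device, master = Replica(), Replica()
device.put("alice", "555-0100")
master.put("alice", "555-0111")       # contradictory change to the same record
master.put("bob", "555-0199")
# Illustrative policy: the master copy wins every conflict.
print(synchronize(device, master, lambda rid, dev, mas: mas))  # ['alice']
print(device.records == master.records)                        # True
```

Which side wins a conflict is a policy decision. The sketch passes the policy in as a function so that "master wins", "device wins", or "ask the user" can be substituted without changing the exchange logic.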
Today, synchronization services support Personal Information Management (PIM) data, such as addresses, calendar entries, memos, and to-do items, as well as relational databases and file systems.
The following paragraphs describe the different possible synchronization topologies: one-to-one, many-to-one, many-to-many, and two hybrid versions. These definitions are followed by explanations of the different synchronization modes: local, pass-through, and remote.
The different challenges and problems that arise while keeping data synchronized are elaborated in the following part. This chapter closes with an overview of related standards organizations, most of which have chosen to change their synchronization technology in favor of SyncML®.
The Different Topologies
Changes made to different copies of a datastore can be propagated to other copies of that datastore in different ways. The synchronization topology defines the logical flow of the changes propagating through the network of computers hosting instances of that datastore. The four major topologies are:
One-to-one
Many-to-one
Many-to-many
Hybrid of many-to-one and many-to-many
One-to-One
The one-to-one topology is the simplest case. The other topologies can be seen as an extension of this one. Here the data is only shared between one server (the square in Figure 11) and one client (the circle in Figure 11). A possible usage scenario for this topology is a datastore that is mirrored for backup purposes. All changes made to the client are also sent to the server to ensure that its copy of the data reflects the current version of the client copy. Assuming that data is only changed in the client directly (that is, no modification is made to the server copy besides synchronizing with the client), then there is no risk of any conflict in this topology. The one-to-one topology is also known as the "Dedicated Pair" topology.
Figure 11 One-to-one topology
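The backup scenario can be sketched in a few lines. The sketch assumes that the client keeps an ordered log of its changes and remembers how many of them the server has already applied. The class Client, its fields log and acked, and the use of None as a delete marker are illustrative assumptions, not SyncML definitions. Because the mirror is never modified directly, no conflict handling is needed.

```python
class Client:
    """Client side of a dedicated pair that mirrors its data to one server."""

    def __init__(self):
        self.records = {}
        self.log = []    # ordered (record_id, value) changes; value None means delete
        self.acked = 0   # number of log entries the mirror has already applied

    def put(self, record_id, value):
        self.records[record_id] = value
        self.log.append((record_id, value))

    def delete(self, record_id):
        self.records.pop(record_id, None)
        self.log.append((record_id, None))

    def push(self, mirror):
        """Send every change the mirror has not seen yet; return how many were sent."""
        pending = self.log[self.acked:]
        for record_id, value in pending:
            if value is None:
                mirror.pop(record_id, None)
            else:
                mirror[record_id] = value
        self.acked = len(self.log)
        return len(pending)


client, backup = Client(), {}         # a plain dict stands in for the server copy
client.put("memo-1", "buy milk")
client.put("memo-2", "call Bob")
print(client.push(backup))            # 2
client.delete("memo-1")
client.put("memo-2", "call Bob at 3pm")
print(client.push(backup))            # 2 (only the new changes are sent)
print(backup == client.records)       # True
```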
This kind of topology is also used between someone's PDA and personal computer, with the difference that changes are usually made on both the PDA and the personal computer. In this case, the conflicts are typically identified by the PC and directly resolved on the PC. In some cases the conflict is marked and the user is asked to resolve it.
Many-to-One
Numerous commercial systems are examples of the many-to-one topology (also known as central master or star topology). In this topology, data is propagated from a central master to the different entities containing copies of the data, as shown in Figure 12.
Figure 12 Many-to-one topology
The main advantage of the many-to-one topology is that it is relatively simple to implement compared with the many-to-many topology, which is described in the next section.
All clients exchange data with the central server only; two clients cannot exchange data directly without the intermediary central server. Because of this characteristic, conflicts can only arise at the central server, which needs to detect and resolve them. The clients themselves do not need to worry about conflicts. They just inform the central master about the local modifications and process the change requests they receive from the central master. There is no need for a client to determine where to send its changes, as there is in the many-to-many topology.
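One way a central master might detect conflicts is to keep a version number for every record and to compare it with the version on which a client based its change. The following sketch assumes this scheme together with a "first writer wins" policy. Both are illustrative choices, and the names CentralServer, submit, and fetch are hypothetical.

```python
class CentralServer:
    """Central master: the only place where conflicts are detected."""

    def __init__(self):
        self.records = {}     # record_id -> (version, value)
        self.conflicts = []   # rejected submissions, kept for later resolution

    def submit(self, client_name, record_id, base_version, value):
        """Apply a client's modification if it was based on the current version.

        Returns the new version number, or None when the change conflicts
        with a newer version already stored on the server.
        """
        current_version, _ = self.records.get(record_id, (0, None))
        if base_version != current_version:
            self.conflicts.append((client_name, record_id, value))
            return None
        self.records[record_id] = (current_version + 1, value)
        return current_version + 1

    def fetch(self, record_id):
        """Return (version, value); version 0 means the record does not exist."""
        return self.records.get(record_id, (0, None))


server = CentralServer()
print(server.submit("pda", "meeting", 0, "Mon 10:00"))    # 1
# The phone and the PC both edit version 1 independently.
print(server.submit("phone", "meeting", 1, "Mon 11:00"))  # 2
print(server.submit("pc", "meeting", 1, "Tue 09:00"))     # None (conflict)
print(server.fetch("meeting"))                            # (2, 'Mon 11:00')
```

The clients in this sketch carry no conflict logic at all. They only report the version they started from, which reflects the division of labor described above.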
This topology is common when a person has a PDA, a cellular phone, and a personal computer sharing an application such as the calendar application, and both the cellular phone and the PDA are synchronized with the personal computer (but not between themselves). This kind of interaction is also common when family members carry cellular phones and update their shared family Web calendar independently or when mobile employees in an enterprise update inventory datastores independently.
The drawback of this architecture is that the central master could become a bottleneck and a single point of failure that could immobilize the entire system. Consider an Internet service provider scenario with a central master that serves several hundred thousand accounts, all trying to synchronize with the same central datastore. Here the central master should not be a single server, but a cluster of high-performance servers that limits the latency in response time even if one of the servers fails.
Many-to-Many
In many-to-many (or peer-to-peer) topology, there is no central server. Every client is also a server, as shown in Figure 13. For simplicity in this chapter, the client/server combination on each device in the many-to-many topology is just called client.
Every client gets updates from and sends updates to every other client. After a record on one client is updated, this client is responsible for updating all the other copies of the data on all the other clients to ensure that the consistency of the distributed datastore is maintained. This can be done by directly contacting the other clients or by sending the updates to the clients nearby, which are then responsible for propagating them further.
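The second propagation style, in which an update is handed to nearby clients that pass it on, can be sketched as simple flooding. The sketch assumes that every update carries a unique identifier so that a client can recognize an update it has already applied and stop forwarding it. The class Peer and its methods are hypothetical, and conflict handling is deliberately left out.

```python
class Peer:
    """A device that acts as both client and server in a peer-to-peer network."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # peers that can be reached directly
        self.records = {}
        self.seen = set()     # update IDs already applied, to stop endless forwarding

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def local_update(self, update_id, record_id, value):
        """A change made on this device is treated like any incoming update."""
        self.receive(update_id, record_id, value)

    def receive(self, update_id, record_id, value):
        if update_id in self.seen:
            return                      # already applied and forwarded
        self.seen.add(update_id)
        self.records[record_id] = value
        for peer in self.neighbors:     # pass the update on to nearby peers
            peer.receive(update_id, record_id, value)


# Four handhelds in a chain: a - b - c - d. Only neighbors are linked directly.
a, b, c, d = (Peer(n) for n in "abcd")
a.connect(b)
b.connect(c)
c.connect(d)
a.local_update("u1", "room-4/temperature", 71)
print([p.records.get("room-4/temperature") for p in (a, b, c, d)])  # [71, 71, 71, 71]
```

In this sketch the update reaches every peer only because all links are available at the moment of the update. With intermittent links, an update can stay pending at some peers for an unknown time.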
Figure 13 Many-to-many topology
Consequently, every client must be able to detect and resolve conflicts. This requires more complex software on each client, which naturally increases the implementation cost, especially on small mobile clients, like mobile phones, in which memory is a scarce resource.
Compared to the many-to-one topology, the many-to-many topology is more robust but also clearly adds to the complexity. In this topology, it is very difficult to find out if a modification was indeed propagated to all clients at a given point in time.
One advantage of the peer-to-peer topology is that without a central server, there is no single point of failure. Every client has a copy of the data and can act as a server. The clients can continue to work and exchange data despite failures in other parts of the network. A client can retrieve updates from the closest server in the network, which gives quicker access to data otherwise stored remotely.
This topology may occur whenever there is no notion of a primary datastore involved in the system. Consider a team of emergency response workers taking readings such as measured temperatures, toxin levels, and structural stress conditions in a building or an affected area. They can synchronize these readings as they pass by each other using direct wireless or infrared links between their handheld devices.
Hybrids of Many-to-One and Many-to-Many
In an effort to combine the advantages of many-to-one and many-to-many topologies, hybrids containing characteristics of both types can be used, as shown in Figure 14.
Figure 14 Hybrids of many-to-one and many-to-many topologies
The cluster consists of a two-level structure of data copies. The top level consists of a cluster of servers. All servers contain copies of the data and replicate between each other, but for each data object only one server keeps the authoritative copy. The other servers are unaffected by the failure of one of them. Using geographically distributed servers can contribute to reducing the distance between server and clients.
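The idea that every data object has exactly one authoritative server can be sketched as follows. How the owning server is chosen is not stated in the text. The sketch assumes, purely for illustration, that the owner is derived from a checksum of the record identifier. The names Cluster, ClusterServer, owner_of, and write are hypothetical.

```python
import zlib


class ClusterServer:
    def __init__(self, name):
        self.name = name
        self.records = {}


class Cluster:
    """Top level of the hybrid: servers that replicate among themselves."""

    def __init__(self, names):
        self.servers = [ClusterServer(n) for n in names]

    def owner_of(self, record_id):
        """Pick the server holding the authoritative copy (assumed scheme: CRC32)."""
        index = zlib.crc32(record_id.encode("utf-8")) % len(self.servers)
        return self.servers[index]

    def write(self, record_id, value):
        """Apply the change at the owner first, then replicate to the other servers."""
        owner = self.owner_of(record_id)
        owner.records[record_id] = value            # authoritative copy
        for server in self.servers:
            if server is not owner:
                server.records[record_id] = value   # replicated copy
        return owner.name


cluster = Cluster(["europe", "americas", "asia"])
owner = cluster.write("order-17", "shipped")
# Every server holds a copy, but only one of them is authoritative for this record.
print(all(s.records["order-17"] == "shipped" for s in cluster.servers))  # True
print(cluster.owner_of("order-17").name == owner)                        # True
```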
In a hierarchy, the server structure could be modeled according to the organizational structure of a company. The top part of the figure shows servers, which are at the same time clients of a server one level above. In this structure, even when one section experiences a failure, the overall topology can still work properly.
In commercial implementations using a central master topology, the master server itself often consists of a cluster of servers accessing a central datastore. This setup provides high availability and reduces the disadvantages of a central master topology with regard to the single point of failure. Nevertheless, in this setup the servers are physically at the same location and a network failure could make them unreachable. That would not be the case in the cluster or hierarchy topologies described above.