SyncML Client Do-It-Yourself Style

You want to synchronize data in a local database with a SyncML server and there is no SyncML client which supports that? Perhaps you are using KDE PIM or GPE? This post explains how you can write your own SyncML client for them using the SyncEvolution framework. There are several arguments in favor of taking that route compared to directly using a library like the one provided by Funambol:

  • The existing SyncEvolution user interfaces can be reused (command line and GUI).
  • The existing autotools build system can be reused.
  • Simpler, because fewer interfaces have to be learned and used. Everything one has to know is documented in one place: API in the header files, HOWTO in this post.
  • Automated testing can be enabled in a few lines of code.
  • If wanted, the result can be distributed as part of SyncEvolution. This is less work overall than doing releases, bug tracking, user support etc. for multiple independent clients.
I Want You… To Sync!

Of course, not all of these arguments may apply. Perhaps some information is still missing. In that case I count on your feedback and questions! I intend to keep this blog post up-to-date so that there always is a good starting point. For further information please refer to the Doxygen documentation of the different classes mentioned in this post.

2011-04-19: unfortunately I did not have enough time to keep the content really up-to-date. The whole backend API was redesigned. For the rationale and information about this change, see the mailing list discussion. The file backend continues to serve as a functional example for a source derived from TrackingSyncSource. The KCalExtended (aka mKCal) backend shows how to compose a source differently, with change tracking implemented using a backend specific method.

Requirements

Enough said, let’s get down to business. You need some understanding of C++. A compile environment which is supported by autoconf/automake/libtool (i.e., a Linux/Unix/Mac OS X system or Windows with something like Cygwin) is useful; otherwise you’ll have to figure out yourself how to compile the Funambol client library and SyncEvolution. No knowledge of SyncML is required, but you need to know how to read and modify items in your database.

The code that you have to write currently has to be in C++. I have thought a bit about extending SyncEvolution with modules written in Python: it could be done by exposing the essential classes (there aren’t that many) to Python in a syncevolution module and calling a Python module by embedding the Python interpreter. By providing some more classes this might even become useful for GUI frontends, like Genesis (which currently calls the syncevolution binary and parses its output). If someone would like to have such a facility or (even better) wants to implement it, please leave a comment.

Some Definitions

Client and server:
SyncML synchronizes data between unequal partners. The server is the central hub which coordinates data synchronization between multiple, possibly heterogeneous clients.
Item:
in SyncML, each item is an opaque binary blob as far as the protocol is concerned. An item can be in one of three different states: updated/added/deleted since the last sync.
Database:
that’s the collection of items that is to be kept consistent between client and server. “local” from a client developer’s perspective is the database which the client directly has access to.
Synchronization session, or just “sync”:
after initiating the session, first the client tells the server about its changes, the server imports them and then sends any changes that it has itself. This is the normal “two-way” sync. If a sync fails, then usually client and server cannot know for sure whether their peer has received all changes. The next sync then has to be a “slow” one where the client sends all items, the server compares against the ones it has, and sends back those items that the client doesn’t have or that must be updated. The comparison on the server is based on heuristics, so unwanted duplicates are likely. In this mode deleted items are recreated because the server cannot determine whether items missing on one side were intentionally deleted or lost during a failed sync. Other modes are also possible (sending changes only one way; replacing all items on one side with all items from the other side).
Sync source:
this is a term introduced by Funambol. It stands for the class which connects a specific database to the sync infrastructure, both on the server and in clients. In SyncEvolution, sync sources are implemented by backends. Each backend can provide one or more modules, which can be compiled into the main binary (easy to install and debug) or loaded dynamically (useful when some optional modules might have unsatisfied library dependencies). Sync sources and the data format they are expected to use are selected by a source’s type property. Multiple sync sources can be active during the same sync session. However, the databases on client and server have to be different. On the server, databases are identified by a URI. Strictly speaking, this is a path relative to the sync URL of the server (./contacts is the same as contacts), but some servers treat it as a string which has to match exactly (./contacts is not the same as contacts).
Collision/conflict:
if an item was modified or deleted on the client and the server also has a change for that item, then the server must resolve this conflict. It cannot know which change was made earlier, because the clients do not send time stamps for their changes, so servers usually arbitrarily choose “server wins” or “client wins” and discard the other change. When dealing with two updated revisions of the same item, a server might also try to merge them automatically or keep both revisions. In the former case old, unwanted data might be preserved, in the latter case the user has to do the merging manually.

Data Format

Ideally the SyncML server understands the data format that you intend to exchange with it. The standard formats are:

Contacts:
vCard 2.1/3.0, with 2.1 being supported by all servers
Calendar events and tasks:
vCalendar 1.0 or (less often, but more capable) iCalendar 2.0
Notes:
plain text, first line serves as summary

When using these standard formats, the server is aware of the semantics of the data and can do conversions, like vCalendar 1.0 <-> iCalendar 2.0. It can also handle clients which support only a subset of the full standard: when receiving an update of an item from such a client, the server has a chance to preserve properties that the client was unable to store. A dumb server would replace a complete item with the incomplete one just received from a less capable client.

If you want to synchronize other kinds of data, then you need a server which accepts arbitrary binary blobs and stores them verbatim. In the Funambol server one can configure a file sync source which does that. Going that route limits synchronization to clients which all use exactly the same binary format.

Planning your Sync Source

The first step is determining the data format. If you have access to code which imports/exports items in the local database in one of the standard formats, then you are ready to go forward. If not, then you will have to define your own mapping between the database and a suitable interchange format. This can easily be the hardest part of the whole exercise and is beyond the scope of this blog post. It is useful to support both vCard 2.1 and 3.0 to meet the server’s needs. The class EvolutionContactSource shows how this conversion can be done.

The next step is deciding about change tracking. There are two different ways to do that. The easy solution relies on the TrackingSyncSource to do all the heavy lifting for you. In this case your code must be able to iterate over all items in the database. It must provide a unique ID and a “revision string” for each item. Both are arbitrary strings, but keeping them simple (printable ASCII, no white spaces, no special characters) makes debugging easier because such strings can often be used as-is without escaping problematic characters. The ID only has to be unique among all items currently in the database, so recycling an ID after deleting an older item is possible. However, this makes debugging harder and is better avoided.

The “revision string” must change if the item was changed and should not change when the item was not changed. If the revision string changes between syncs, then the item is sent as “updated” to the server even if it hasn’t really changed. This is usually not a problem, but can lead to unnecessary conflicts and wastes some bandwidth.
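
To illustrate why ID and revision string are sufficient for change tracking, here is a minimal sketch of the comparison that a helper like TrackingSyncSource has to do between two syncs. It is not the actual SyncEvolution code: RevisionMap, Changes and detectChanges() are hypothetical names chosen for this example, and the sketch assumes a C++11 compiler.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical types for this sketch, not part of the SyncEvolution API.
// Maps the unique item ID to the item's revision string.
using RevisionMap = std::map<std::string, std::string>;

struct Changes {
    std::set<std::string> added;
    std::set<std::string> updated;
    std::set<std::string> deleted;
};

// Compares the revisions remembered from the previous sync with the
// revisions found in the database right now.
Changes detectChanges(const RevisionMap &previous, const RevisionMap &current)
{
    Changes changes;
    for (const auto &entry : current) {
        auto old = previous.find(entry.first);
        if (old == previous.end()) {
            // ID was unknown during the last sync
            changes.added.insert(entry.first);
        } else if (old->second != entry.second) {
            // same ID, different revision string
            changes.updated.insert(entry.first);
        }
    }
    for (const auto &entry : previous) {
        if (current.find(entry.first) == current.end()) {
            // ID disappeared since the last sync
            changes.deleted.insert(entry.first);
        }
    }
    return changes;
}
```

The sketch also shows why recycling IDs is risky: an item that is deleted and replaced by a different item with the same ID and the same revision string would not be reported at all.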

If the database keeps a modification date and time for each item, then a textual representation of that time stamp is a good revision string. Take care that changes made after a sync really assign a different time stamp: if the time stamp has seconds as resolution, then waiting for one second at the end of a sync is a good idea. This can be done in the close() method of your sync source.
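
A time stamp based revision string could be generated roughly like this. The function name revisionFromTimestamp() is made up for this example, and the sketch assumes that the database hands out the modification time as a time_t with one second resolution.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: turns a modification time into a revision string
// which consists only of printable ASCII characters without white space.
std::string revisionFromTimestamp(std::time_t modified)
{
    char buffer[32];
    std::tm *utc = std::gmtime(&modified);
    if (!utc ||
        std::strftime(buffer, sizeof(buffer), "%Y%m%dT%H%M%SZ", utc) == 0) {
        // conversion failed; the caller has to treat this as an error
        return "";
    }
    return buffer;
}
```

With this format, two changes made within the same second get the same revision string, which is exactly the situation that the one second delay at the end of a sync is meant to avoid.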

If there is no modification time stamp, then a hash of a textual representation of the item can be used instead. The drawback of that is that even minor changes in the way the item is formatted make it look like the item was modified.
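
As a sketch of the hash approach, the following function computes a 64 bit FNV-1a hash of the item text and returns it as a hexadecimal string. FNV-1a is just one possible choice that I picked because it fits into a few lines; any stable hash works. The name revisionFromContent() is hypothetical.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: derives a revision string from the textual
// representation of an item, using the 64 bit FNV-1a hash.
std::string revisionFromContent(const std::string &item)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : item) {
        hash ^= c;
        hash *= 1099511628211ULL; // FNV prime
    }
    char buffer[17];
    std::snprintf(buffer, sizeof(buffer), "%016llx",
                  static_cast<unsigned long long>(hash));
    return buffer;
}
```

Avoid std::hash for this purpose: its result is not guaranteed to be the same across program runs or library versions, and a revision string has to stay stable between syncs.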

Instead of using TrackingSyncSource, it is also possible to derive directly from its base class, EvolutionSyncSource. In that case change tracking has to be implemented explicitly, possibly by utilizing facilities in the database API. This is not explained further here; for an example, see EvolutionContactSource.

One last point worth further consideration is whether the backend shall support more than one local database, how the user can select them with the evolutionsource property and whether the backend creates databases that don’t exist yet. Supporting more than one database is required for automatic testing against a server; creating them automatically simplifies the setup of that testing. The convention adopted by the existing backends is that databases starting with file:// are created. Other strings have to match an existing database. This is done to catch typos.
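
The convention for selecting databases can be sketched as follows. DatabaseAction and resolveDatabase() are hypothetical names for this example; how a real backend enumerates and opens its databases depends entirely on the database API.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
enum class DatabaseAction { OpenExisting, CreateIfMissing };

// Decides what to do with the value of the evolutionsource property:
// file:// URIs may be created on demand, everything else has to match
// the name of an existing database exactly.
DatabaseAction resolveDatabase(const std::string &configured,
                               const std::vector<std::string> &existing)
{
    const std::string prefix = "file://";
    if (configured.compare(0, prefix.size(), prefix) == 0) {
        return DatabaseAction::CreateIfMissing;
    }
    if (std::find(existing.begin(), existing.end(), configured) != existing.end()) {
        return DatabaseAction::OpenExisting;
    }
    // most likely a typo in the configuration
    throw std::runtime_error("no such database: " + configured);
}
```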

Error Handling

SyncEvolution keeps track of whether a sync source has failed in a property of the EvolutionSyncSource base class. If only one of several active sources fails, then the sync continues with the others. SyncEvolution forces a slow sync for failed sources by resetting all sync anchors (strings which are stored on both the client and the server) before a sync and only updating the anchors of sources which have not failed at the end of the sync. For the failed ones the server will detect the anchor mismatch during the next sync and force a slow sync.
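
The anchor handling can be modelled in a few lines. This is a deliberately simplified sketch and not how SyncEvolution or the SyncML protocol actually store anchors: SourceState, beginSync(), endSync() and needsSlowSync() are hypothetical, and the model assumes that the server always remembers the anchor of the latest session.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified model of the anchors of one sync source.
struct SourceState {
    std::string clientAnchor; // stored by the client
    std::string serverAnchor; // stored by the server
    bool failed = false;
};

using Sources = std::map<std::string, SourceState>;

// Before a sync: forget all client side anchors.
void beginSync(Sources &sources)
{
    for (auto &entry : sources) {
        entry.second.clientAnchor.clear();
        entry.second.failed = false;
    }
}

// After a sync: the server has seen the new anchor (an assumption of
// this model), the client only stores it for sources which did not fail.
void endSync(Sources &sources, const std::string &newAnchor)
{
    for (auto &entry : sources) {
        entry.second.serverAnchor = newAnchor;
        if (!entry.second.failed) {
            entry.second.clientAnchor = newAnchor;
        }
    }
}

// During the next sync a mismatch forces a slow sync for that source.
bool needsSlowSync(const SourceState &source)
{
    return source.clientAnchor.empty() ||
        source.clientAnchor != source.serverAnchor;
}
```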

There was a discussion on the Funambol developers list about resuming from a failed sync more gracefully. This is still in the planning stage and the necessary support in the client library has not been implemented yet.

Errors have to be reported back to the caller via a C++ exception. The methods EvolutionSyncSource/Client::throwError() should be used for that. The one in EvolutionSyncSource adds the name of the source to the error description and also sets its state to “failed”.
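
The behavior described for throwError() boils down to the following pattern. ExampleSource is a stand-in class written for this post, not the real EvolutionSyncSource; check the header files for the actual signatures and exception types.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a sync source, only meant to show the
// "mark as failed, then throw" pattern.
class ExampleSource {
public:
    explicit ExampleSource(const std::string &name) :
        m_name(name), m_failed(false) {}

    bool hasFailed() const { return m_failed; }

    // Records the failure and reports it with the source name prepended.
    [[noreturn]] void throwError(const std::string &what)
    {
        m_failed = true;
        throw std::runtime_error(m_name + ": " + what);
    }

    // Example for a method which reports a problem to its caller.
    void open(bool databaseAvailable)
    {
        if (!databaseAvailable) {
            throwError("opening database failed");
        }
    }

private:
    std::string m_name;
    bool m_failed;
};
```

Because the failed state is set before throwing, the caller can catch the exception, continue with the remaining sources and still know afterwards which source must not get its anchor updated.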

Getting Started

This section describes how to create your new backend and how to compile it as part of SyncEvolution using the autotools. As a first step make sure that you can compile the SyncEvolution 0.8 beta 2 (or later) source .tar.gz, using configure && make. You don’t need Evolution for that: there is a local file backend which compiles on all platforms. In the next step check that you can rebuild configure and the Makefiles by invoking ./autogen.sh. You may have to install autotools/automake/libtool packages for that to succeed.

Backends are stored in subdirectories of src/backends. Each subdirectory contains one configure-sub.in fragment that gets included in the base configure and a corresponding Makefile.am. This slightly unusual setup has the advantage that configure --help lists all options and common tests only need to be run once. Maintainer mode (= regenerating configure and rerunning it when one of the build files is edited) still works.

To create a new backend, copy the existing src/backends/file directory. It is a very minimal backend, so you can easily replace the parts that have to be implemented differently for your sync source. At a minimum, change the enable part of configure-sub.in and rename the .cpp|h files. Remember to rename the files accordingly in Makefile.am. There can be more than one backend per directory. The two original Evolution backends for contacts and calendars/task lists/notes are built like that inside backends/evolution, sharing a common configure-sub.in.

In the *Register.cpp file(s) your backend adds itself to the SyncEvolution framework, simply by being linked into the executable or (if you link dynamically) when loading the module. You decide how users select the sync source via the type. In order to work with the default configuration templates, treat the standard names (such as addressbook) as an alias for your source. The first backend which accepts a certain type “wins”, therefore also add a dedicated type that can be used to select your backend directly in case of conflicts. Your backend will always be asked before the file backend, but might come after other backends.
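
The registration mechanism relies on a common C++ idiom: a static object whose constructor adds a factory function to a global list. The following sketch shows the idiom and the “first match wins” lookup. All names in it (ExampleSource, Register, registry(), createSource(), the type my-contacts) are invented for this example; the real class is RegisterSyncSource and its interface is documented in the header files.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source class, only records which backend created it.
struct ExampleSource {
    std::string backend;
};

using CreateFunc =
    std::function<std::unique_ptr<ExampleSource>(const std::string &type)>;

// Function-local static avoids problems with initialization order.
std::vector<CreateFunc> &registry()
{
    static std::vector<CreateFunc> factories;
    return factories;
}

// Creating an instance of this class registers a backend.
struct Register {
    explicit Register(CreateFunc create) { registry().push_back(std::move(create)); }
};

// Asks all backends in order of registration, the first match wins.
std::unique_ptr<ExampleSource> createSource(const std::string &type)
{
    for (const auto &create : registry()) {
        if (auto source = create(type)) {
            return source;
        }
    }
    return nullptr;
}

// A new backend: accepts the standard alias and a dedicated type.
static Register registerMyBackend(
    [](const std::string &type) -> std::unique_ptr<ExampleSource> {
        if (type == "addressbook" || type == "my-contacts") {
            return std::unique_ptr<ExampleSource>(new ExampleSource{"my-contacts"});
        }
        return nullptr;
    });

// Stand-in for the file backend, which is asked last.
static Register registerFileBackend(
    [](const std::string &type) -> std::unique_ptr<ExampleSource> {
        if (type == "addressbook" || type == "file") {
            return std::unique_ptr<ExampleSource>(new ExampleSource{"file"});
        }
        return nullptr;
    });
```

In this sketch createSource("addressbook") returns the source of the new backend because it was registered first, while createSource("file") still reaches the file backend via its dedicated type.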

In the same file you’ll also register tests for your new class. We’ll go into that in more detail below.

Note that none of the changes above require modifications to SyncEvolution source files. This is intentional: you can start working on your own sync source without needing write access to the SyncEvolution repository. Instead of working with the source tar ball (as suggested above for simplicity reasons) you can also check out the source directly and keep track of changes in SyncEvolution and the client library that way. If you decide to keep your source separate from SyncEvolution permanently, then that’s okay, but I would rather like to have all code in one place (no copyright transfer needed!). Ask [for Subversion access], and it shall be given you…

Implementation

Have a look at the base class that you are deriving from: both src/core/TrackingSyncSource.h as well as src/core/EvolutionSyncSource.h contain comments that describe what you are expected to implement. make doc in the root directory invokes Doxygen, which turns these comments into HTML files in the html directory of the build directory. Doxygen and dot (from the graphviz package) must be installed.

I hope that those explanations are good enough so that I don’t have to add further instructions here. If not, please let me know in the comments.

Testing

You have implemented your own backend and it compiles? Great, now let’s make sure that it really works ;-)

In this section I’m assuming that you implemented one sync source, configured the compilation so that only your backend is active (check the configuration summary at the end of the configure output, use --disable-foo as necessary) and are not using a modular build (the default, i.e., no --enable-shared). When I write about invoking syncevolution, I mean the binary which is created in the src directory – be careful not to accidentally invoke one that might be installed elsewhere…

Run syncevolution --source-property type=?. The description should contain an entry for your sync source. If not, check that your registration code is linked into the main binary and invoked.

Then create a configuration. For each of the four standard sources (addressbook, calendar, todo, text) the createSource() that you have provided as part of a RegisterSyncSource instance will be called to check whether it can potentially provide such data. If it returns a sync source, then that sync source’s listDatabases() is called to verify that the user really has data of that kind. The convention is that the first database is used by sync sources by default. For testing purposes it makes sense to point the evolutionsource property towards a scratch database with no valuable data – I don’t have to tell you about backups, right? If any of these steps yields no results, the source will be disabled. If you developed a special sync source which does not match any of the standard sources, then you will have to create your own source config file. syncevolution --print-servers prints the names of existing configurations and where they are stored.

Once you have created the configuration, you can use syncevolution --sync <mode> <server> <sourcename> to sync with your backend in different sync modes. syncevolution --sync ? prints a list of valid modes.

After creating a second configuration with the same server URI and a different local database, copying items and changes to and from the server can be tested. But this manual testing quickly gets tedious. Better and more convenient coverage of your code is achieved by also integrating your backend into the automatic SyncEvolution regression testing. This CPPUnit based testing was originally developed for SyncEvolution and later moved into the Funambol C++ client library so that other SyncML clients can also use it. It covers both local tests (importing/exporting items, change tracking) and synchronization with a server.

Local tests only require one database with change tracking for multiple different servers (something that TrackingSyncSource provides automatically). For real syncs, the sync source must support two different local databases: the first database is associated with a simulated client A that synchronizes via the server with a client B accessing the second database. Both “clients” run on the same machine in the same account because that’s where the main test program, client-test, runs. On Unix systems it might be possible to use multiple accounts if a sync source only supports one database per account (like the Mac OS X Addressbook), but that idea hasn’t been implemented yet.

Testing your sync source is plugged into the client-test test runner using the same *Register.cpp file that is also used for registering the sync source itself. These files will always be compiled with -DENABLE_UNIT_TESTS and -DENABLE_INTEGRATION_TESTS when building them for client-test, therefore the tests are always available even when configuring without the corresponding --enable switches. They do not get into the way when including them in syncevolution either.

Local instances of the RegisterSyncSourceTest class that you have to provide then add the tests: you have to choose a name for each test, which is how it will be listed by client-test: “file_vcard21” appears as “Client::Source::file_vcard21” and “Client::Sync::file_vcard21”. There are predefined tests for vCard 2.1/3.0 and iCalendar 2.0 events/tasks, but you can also provide your own test data in a ClientTest::Config struct from scratch (MemoTest in EvolutionCalendarSourceRegister.cpp) or just your own test case file (sqlite/Makefile.am and SQLiteContactSourceRegister.cpp). Unit tests for your code can also go into the *Register.cpp files.

Here’s a step-by-step guide for getting started with automated testing, using ScheduleWorld as example:

  • cd src
  • make client-test
  • CLIENT_TEST_SERVER=scheduleworld \
    CLIENT_TEST_EVOLUTION_PREFIX=file:///tmp/testing/ \
    ./client-test --help

    => should list your tests and create ~/.config/syncevolution/scheduleworld_[12] configs, using local databases in /tmp/testing/. The corresponding evolutionsource is updated each time client-test is run, based on the current value of CLIENT_TEST_EVOLUTION_PREFIX. The default prefix is SyncEvolution_Test_. The name of your test and _1 resp. _2 will be appended. Your sync source must support creating these databases, otherwise you’ll have to use a different prefix and perhaps create suitable databases by hand.
  • check the configuration and enter your server account information (the syncevolution binary is your friend here: --print-servers, --print-config, --configure…)
  • CLIENT_TEST_SERVER=scheduleworld \
    CLIENT_TEST_EVOLUTION_PREFIX=file:///tmp/testing/ \
    ./client-test SyncEvolution Client::Source

    => runs all tests involving just local operations
  • CLIENT_TEST_SERVER=scheduleworld \
    CLIENT_TEST_EVOLUTION_PREFIX=file:///tmp/testing/ \
    ./client-test Client::Sync::your_test_name_goes_here::testCopy

    => runs one test that checks that one contact can be copied to and from the server using the two configurations
  • CLIENT_TEST_SERVER=scheduleworld \
    CLIENT_TEST_EVOLUTION_PREFIX=file:///tmp/testing/ \
    ./client-test Client::Sync

    => runs all tests which involve the SyncML server; tests with just one active source are run first, followed by the same tests with all enabled sources in two different orders

One not unlikely result is that items are slightly modified while they are handed from one client to the server and back to the other client. The synccompare script brings vCard/iCalendar items into a normal form before comparison (thus preventing diffs because of perfectly legitimate reformatting) and ignores changes that some servers are known to introduce. If you get diffs, then first check whether your sync source passes the local import test, Client::Source::*::testImport. Either fix that or use your own test data that doesn’t contain the problematic data. When a server is involved, the *send.client.A.log shows what was sent to it and *recv.update.client.B.log resp. *refresh.client.B.log what was received.

If the server caused the diff, then work with the server developer to improve the server and/or add more suppressions to the script. Unfortunately this is currently not possible without modifying the original file in the Funambol source code repository. As a temporary workaround, a local copy at test/synccompare.pl in SyncEvolution can be used to override the one from Funambol. The script also doesn’t handle normalization of vCard 2.1 and vCalendar 1.0 well (for instance, it doesn’t remove redundant QUOTED-PRINTABLE parameters). Patches welcome…
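
To give an idea of what “normal form” means here, the following sketch unfolds continuation lines, removes differences in line endings and sorts the properties of a single item. It is a crude C++ illustration of the concept with a made-up function name, not a port of synccompare, which is a Perl script and does considerably more.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: brings a flat vCard into a form where harmless
// reformatting (line folding, CRLF vs. LF, property order) no longer
// shows up as a difference.
std::string normalizeItem(const std::string &item)
{
    std::vector<std::string> lines;
    std::istringstream in(item);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // CRLF -> LF
        }
        if (line.empty()) {
            continue;
        }
        if ((line[0] == ' ' || line[0] == '\t') && !lines.empty()) {
            // folded line: append to previous one, without the fold character
            lines.back() += line.substr(1);
        } else {
            lines.push_back(line);
        }
    }
    // Sorting ignores the nesting of components, which is good enough
    // for comparing flat items but not for complex iCalendar data.
    std::sort(lines.begin(), lines.end());
    std::string result;
    for (const auto &normalized : lines) {
        result += normalized;
        result += '\n';
    }
    return result;
}
```

Two items are then considered equal if their normalized forms are identical strings.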

If tests fail in some other way, then have a look at the definitions of the tests (LocalTests and SyncTests in ClientTest.cpp) to find out what a failing test does. In the current directory there are detailed client log files that can help, too.

Distribution

It’s your choice whether and how you publish your work. You can generate a source tar ball with the standard make dist and a binary package with the non-standard make distbin BINSUFFIX=foo. The latter will produce syncevolution-<version>-foo.tar.gz with all the files necessary to run SyncEvolution, plus documentation. There’s also a make deb BINSUFFIX=foo. It depends on a version of CheckInstall which contains some patches that I wrote. When releasing source code make sure that your source tar ball is complete with make distcheck.

Just remember that you are linking with SyncEvolution which is licensed under “GPL v2 or later”, so you must publish your source under a compatible license if you redistribute SyncEvolution. As I said above, no copyright transfer is necessary for backends. For the core SyncEvolution or backends that I maintain I would like to keep the copyright situation unambiguous, so if you want to contribute to that, I kindly ask for an informal copyright transfer. You’ll be granted full rights to your contributions, as if you still owned the copyright, of course.

Enough of this boring legalese. Happy hacking!
