
hwechtla-tl: Migration to cyrus



Warning: this document is incomplete due to lack of time. My apologies.

This page tries to document my efforts in mass migrating email accounts to Cyrus mail. Although the 'net has many documents similar to this one, I still thought it would be worthwhile to add my own contribution.

There seems to be no document that addresses all possible kinds of migration to Cyrus, nor any general-purpose tool for performing one. Both gaps stem from the same reason: the scenarios are so diverse, and the ways of migrating the data so many, that there can be no single utility to suit all (or even most) users' needs.

However, I believe my scenarios are pretty common ones.

Migrating traditional Unix mbox mail to Cyrus

There are three fundamental approaches to this one:

(1) extract the messages in users' mbox files and copy them to Cyrus' mail store. This is the fastest one, but there are problems. First, the source mboxes and the Cyrus mail spool must obviously be accessible on the same system. Second, you have to be aware of the arrangement of the Cyrus mail spool, which is not the same on different Cyrus systems.

"Managing IMAP" from O'Reilly has a long subsection on this kind of migration. Another place where I found utilities for this was http://acs-wiki.andrew.cmu.edu/twiki/bin/view/Cyrus/MboxCyrusMigration , or, more officially, http://www.onid.orst.edu/docs/technical/cyrusmigration.shtml .

(2) extract the messages in users' mbox files and send them to the Cyrus side via IMAP. This is slower than (1), but leaves Cyrus as the black box it is supposed to be. mailutil (from the UW IMAP toolkit, http://www.washington.edu/imap/) is able to copy data between different kinds of mail accounts. (mailsync can too, but is less scriptable.)

A similar approach is the "first Cyrus migration script ever", which splits mailboxes with formail and calls deliver (or cyrdeliver) for each message. The drawback of using deliver is that it alters message headers.

(3) only use IMAP to transfer the data. This is generally even slower than (2), but there is an abundance of smart tools, like imapsync (http://www.linux-france.org/prj/imapsync/) to do the migration in small pieces. However, you would have to install some mbox-aware imapd and it should have some quirk for accessing all users' messages with a single "admin account". (Actually, it seems uw-imapd can do this; check http://www.mail-archive.com/info-cyrus@lists.andrew.cmu.edu/msg14292.html for details.)

I'm going for (2). The order of things is:

  1. find out which mailboxes to migrate and make a mapping from mailbox names to Cyrus folders
  2. create mailboxes on Cyrus' side
  3. move messages from mailboxes to Cyrus folders, while the system is online
  4. move account credentials to the Cyrus server
  5. prepare downtime for the system, and synchronise changes made after the initial migration

Step 1 requires listing the users receiving mail on the system. For me, that is every non-system user on the host. Then we create a list of (mbox, cyrusfolder) pairs for every user:

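# every user with /bin/bash as login shell is treated as a mail user
grep /bin/bash /etc/passwd | cut -d: -f1 | \
while read -r user; do
 # the inbox maps to the user's top-level Cyrus mailbox
 if test -f "/var/mail/$user"; then
  echo "/var/mail/$user:user.$user"
 fi
 # folders under ~/mail map to subfolders, with "/" turned into "."
 if test -d "/home/$user/mail"; then
  find "/home/$user/mail" -type f | \
  while read -r folder; do
   printf '%s:' "$folder"
   echo "$folder" | sed -e "s#/home/$user/mail/#user.$user.#" -e 's#/#.#g'
  done
 fi
done > mailboxes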
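
To make the naming rule above easier to check by hand, here is a minimal sketch of the same path-to-folder mapping as a standalone function. The name mbox_to_cyrus is my own invention for illustration, and it assumes the same layout as the loop above (/var/mail/USER for inboxes, /home/USER/mail/ for folders).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): map one mbox
# path to the Cyrus folder name used in the "mailboxes" list.
#   $1 = user name, $2 = path to the mbox file
mbox_to_cyrus() {
    user=$1
    path=$2
    case "$path" in
        "/var/mail/$user")
            # the inbox becomes the user's top-level mailbox
            printf 'user.%s\n' "$user" ;;
        "/home/$user/mail/"*)
            # strip the prefix, then turn "/" into the Cyrus delimiter "."
            rel=${path#"/home/$user/mail/"}
            printf 'user.%s.%s\n' "$user" "$(printf '%s' "$rel" | tr / .)" ;;
        *)
            # anything else is outside the assumed layout
            return 1 ;;
    esac
}
```

For example, "mbox_to_cyrus alice /home/alice/mail/lists/debian" prints user.alice.lists.debian. Note that a folder name which already contains "." ends up one level deeper on the Cyrus side than it was locally.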

On our server, there are a lot of mail folders right under users' home directories. I searched for those with the following script, for symlinking under mail/. Note that a file having only a single From line is probably not a folder, but a saved message:

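find /home -maxdepth 2 -type f | \
sed -e '/interrupted-mail/d' -e '/dead.letter/d' -e '/\/[0-9]*$/d' | \
while read -r file; do
 # the first line must be a "From " separator...
 (head -n 1 "$file" | grep -q '^From ') || continue
 # ...and there must be at least one more, or it is just a saved message
 (sed -e 1d "$file" | grep -q '^From ') || continue
 echo "$file"
done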
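
The folder-versus-saved-message heuristic can also be written as a small predicate, which is handy for testing it against a few sample files before scanning all of /home. This is only a sketch; is_mbox_folder is a name I made up, and the rule is the one described above (first line is a "From " line, and there are at least two such lines in total).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file looks like an mbox folder,
# i.e. it starts with a "From " line and contains at least one more.
is_mbox_folder() {
    [ -f "$1" ] || return 1
    # the first line must be an mbox "From " separator
    head -n 1 "$1" | grep -q '^From ' || return 1
    # a single separator means a lone saved message, not a folder
    [ "$(grep -c '^From ' "$1")" -ge 2 ]
}
```

Like the pipeline above, this can be fooled by a single message whose body happens to contain an unquoted line beginning with "From ", so the resulting list still deserves a look by eye.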

In step 2, we create cyradm commands from the mailboxes list. I'm using sed for this; you might want to use something else. (I've set up a cyrus account named "migration" which is to be used in the migration. We use a quota of 150M.)

#!/bin/sed -f
$a\
sam user.* migration lrid\
sam user.% anyone p
h
s/^.*:\([^:]*\)$/cm "\1"/p
g
/var\/mail/s/^.*:\([^:]*\)$/sq "\1" 150000/p
d
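
For those who find the sed script hard to read, here is a rough shell equivalent. It is a sketch rather than a drop-in replacement: gen_cyradm is a hypothetical name, and unlike the sed version (which matches "var/mail" anywhere on the line) it only sets a quota for entries whose source path starts with /var/mail/.

```shell
#!/bin/sh
# Hypothetical helper: read "mbox:cyrusfolder" pairs on stdin and print
# the cyradm commands that the sed script above would generate.
gen_cyradm() {
    while IFS= read -r line; do
        # the Cyrus folder is everything after the last colon
        folder=${line##*:}
        printf 'cm "%s"\n' "$folder"
        # only inboxes (from /var/mail) get a quota root, 150000 KB each
        case "$line" in
            /var/mail/*) printf 'sq "%s" 150000\n' "$folder" ;;
        esac
    done
    # finally, the ACL commands for the migration account and for posting
    printf 'sam user.* migration lrid\n'
    printf 'sam user.%% anyone p\n'
}
```

It would be used as "gen_cyradm < mailboxes > scriptfile", where scriptfile is whatever name you then hand to cyradm.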

The generated script can be run with (a relatively recent) cyradm like so:

cyradm -u cyrus --userrc <scriptfile> hostname

For step 3, we must create an appropriate .mailsync file.


A digression about libc-client and Debian

Both mailutil and mailsync use the shared libc-client library to access mailboxes and folders. The version that ships in Debian (built from the uw-imapd distribution) carries a patch (debian/patches/10_disallow_escaping_home.diff) which breaks libc-client totally for this kind of use. The patch supposedly makes sense within the context of uw-imap (although the FAQ warns against using restrictBox), as it improves the security of the server. But libc-client is a shared library, so this kind of hack, which, mind you, can be achieved with runtime options, should not be made by patching the source. There is not a word about it in c-client's Debian documentation, either.

Finding exactly where the problem is has cost me six or more working hours.

Not that libc-client itself is very well documented. This behaviour is only hinted at in the distribution's imaprc.txt, and the description there makes exceptions to drivers.txt and naming.txt (which users are more likely to read because of the absence of three-times-screen-height warnings). If you want to really know how libc-client's naming mechanisms work (for local files), check function mailboxfile() in src/osdep/unix/env_unix.c.

Now that I've begun ranting, I could also mention that libc-client's error message for a mailbox with a forbidden name is "no such mailbox", while trying to access a mailbox to which you don't have permissions does not generate error messages at all. "mailutil check /path/to/somebody/elses/file" says: "1 new message(s) (1 unseen), 1 total in /path/to/somebody/elses/file" (which is not true).


A digression about mailsync

mailsync seems to try to find out the hierarchy delimiter on the remote side, by using libc-client's obscure callback mechanism. (I'm not sure how libc-client is supposed to find out.) This process seems to fail under some circumstances, and mailsync's error message is: "mailsync: mailsync_main.cc:129: int main(int, char**): Assertion `0' failed." I think it would be easier to just set this in the configuration file. Here is a patch to allow that:

--- mailsync-5.2.1/src/configuration.cc 2004-06-14 14:34:04.000000000 +0300
+++ mailsync-5.2.1.new/src/configuration.cc     2005-08-24 10:47:56.403350944 +0300
@@ -188,6 +188,10 @@
           get_token(f, t);
           store->set_passwd(t->buf);
         }
+       else if (t->buf == "delim") {
+         get_token(f, t);
+         store->delim = t->buf[0];
+       }
         else
           die_with_fatal_parse_error(t, "Unknown store field");
       }

mailsync tries to convert delimiters by mapping them into DEFAULT_DELIMITER upon reading mailbox names and mapping DEFAULT_DELIMITERs back into store-specific delimiters upon forming full mailbox names. The mechanism is protected against creating a bogus delimiter upon reading, but not upon writing. That is, you can't sync mailboxes in a Cyrus store that contain "/" (DEFAULT_DELIMITER), but you can sync mailboxes in a UW imap store that contain "." into a Cyrus store -- creating a different hierarchy structure. If this bugs you, grep the source code for DEFAULT_DELIMITER and change the files c-client_callbacks.cc and store.cc to suit your needs.
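
Before syncing, it may be worth finding out whether any local folder is affected by the "." problem at all. The following sketch lists entries under a mail directory whose names contain a dot; find_dotted_folders is a hypothetical name, and the directory layout is assumed to be the /home/USER/mail/ one used elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper: list files and directories below $1 whose names
# contain ".", which Cyrus would treat as a hierarchy delimiter.
find_dotted_folders() {
    # run find relative to the mail directory so that dots in the
    # directory's own path do not produce false matches
    ( cd "$1" && find . ! -name . -name '*.*' | sed 's#^\./##' | sort )
}
```

An empty result for every user means the delimiter conversion cannot change the hierarchy. Hidden files are listed too, since their names contain a dot as well.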

mailsync is also very slow. It apparently opens a new IMAP/SSL connection for every folder it transfers, which performs horribly when there are a great number of small folders. offlineimap would probably be faster, but it can only handle local mail in Maildir format.


I used a script like this to create the mailsync configuration file:

#!/bin/sh

SERVER=your.server.name
USER=migration
PASS=XXXX

cat <<EOT
store cyrus-inboxes { server {$SERVER/user=$USER/ssl/novalidate-cert}
ref {$SERVER} pat user.% prefix user. passwd $PASS delim . }
store local-inboxes { pat /var/mail/% prefix /var/mail/ }
channel inboxes local-inboxes cyrus-inboxes { msinfo inboxes.sync }
EOT

for account in `grep /bin/bash /etc/passwd | cut -d: -f1`; do
cat <<EOT
store cyrus-$account { server {$SERVER/user=$USER/ssl/novalidate-cert}
ref {$SERVER} pat user.$account.* prefix user.$account. passwd $PASS delim . }
store local-$account { pat /home/$account/mail/* prefix /home/$account/mail/ }
channel $account local-$account cyrus-$account { msinfo $account.sync }
EOT
done

Then there is this script, which I use to read in mailsync.cfg (the output of the previous script) and invoke mailsync with the right arguments. There is some random parallelisation because mailsync has many unneeded latencies and this was the easiest way I could think of to deal with that. (This script, by the way, might not work under anything but bash. Comments welcome.)

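#!/bin/bash
# $RANDOM is a bash feature, hence the explicit interpreter
for channel in `grep channel mailsync.cfg | cut -d' ' -f2`; do
 # "inboxes" and roughly two channels out of three go to the background
 if test "$channel" = inboxes -o `expr $RANDOM % 3` != 0; then
  time mailsync -n -f mailsync.cfg $channel &
 else
  time mailsync -n -f mailsync.cfg $channel
 fi
done
# do not return before the background transfers have finished
wait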
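
One fragile spot is picking the channel names out of the configuration file: a plain grep for "channel" also matches store lines if some account name happens to contain that word. A slightly stricter sketch, assuming the one-line "channel NAME STORE1 STORE2 { ... }" layout that the generator script produces (list_channels is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: print the channel names defined in a mailsync
# configuration file, one per line.
list_channels() {
    # only lines whose first word is exactly "channel" are considered;
    # the channel name is the second word on such a line
    awk '$1 == "channel" { print $2 }' "$1"
}
```

The output of "list_channels mailsync.cfg" could then replace the grep and cut pipeline at the head of the loop.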

(to be continued)

Migrating between Cyrus servers

Here again, there are two possible approaches. Either migrate the data on the filesystem side, or use IMAP to transfer the data. Filesystem transfer is faster, but IMAP transfers can benefit from sophisticated tools and incremental migration.

...

This page documents (partially) an approach that uses rsync to transfer the data but synchronises with Cyrus, so that the technique can be used on a live system: http://tomster.org/blog/archive/2004/11/03/migrating-cyrus . The blog has a lot of other information on Cyrus, too (http://tomster.org/blog/topics/cyrus).

...


(last modified 19.06.2006 00:48)