Gnus, Isync, Dovecot, and Lucene searches
For someone who is completely reliant on email for both his professional and personal lives, I seem to have a hell of a time getting my email environment just the way I like it. I use Emacs, with the Gnus newsreader, caching IMAP email locally with Dovecot.
In the course of setting up better full-text email search within Gnus, I’ve switched to a slightly more complex email setup, and am blogging for posterity.
My email environment
My basic requirements:
- I’m on a single-user computer, so would prefer not to create a bunch of system-level users just to read my email. Don’t put my mail in /home/vmail etc., as detailed in some Dovecot setups.
- I have multiple email addresses.
- A depressing number of these email addresses are Gmail.
- I want all messages for all accounts stored under a single location in my own home directory, for ease of encryption, backup, etc.
- I want to search like a boss, and I want to do it in Chinese.
The problem
A couple of years ago I set things up largely based on this blog post, which is offlineimap-specific, but fairly easy to tailor to mbsync. This is the bare-bones easiest way to use Dovecot: Dovecot doesn’t run as a server, and only wakes up when it is called by either mbsync or Gnus. Getting this working required having Dovecot installed on the system, and nothing else: zero configuration.
This worked great for ages, but only because I didn’t care about IMAP search. I did my searching using a Notmuch search index that existed in parallel to my IMAP installation. That started to prove annoying, however, because while Notmuch is an awesome search tool, it doesn’t integrate well with Gnus by default. Getting from Notmuch search results to the “real” messages in a Gnus summary buffer requires a hack, and it all just felt wrong.
Searching in Gnus is ideally done with the nnir meta-search engine: it searches messages from the server under point, based on the search engine you’ve configured for that server, and creates a native summary buffer – it feels just like regular Gnus, regardless of how the messages were collected.
There are two problems with using nnir to search IMAP: 1) native IMAP search is dog slow, and 2) nnir’s query syntax for IMAP is oddly limited. This blog post will address problem one; problem two might have to wait.
I started looking into Dovecot-based full-text search indexes, and Lucene presented itself as the simplest solution, though I don’t need Java so it was the clucene C++ version that made the most sense.
The problem is that full text search (FTS) with Lucene seems to require a running Dovecot daemon (if I’m wrong please email eric at this domain name and tell me!). So all of a sudden we’re going from our beautiful bare-bones Dovecot setup, to requiring an actual running daemon, with configuration and everything.
So be it! But the goal is to minimize configuration: I was perfectly happy with the original no-configuration arrangement, and I just want to add FTS.
Dovecot
Dovecot setup instructions often seem to assume one email account per system user, and multiple users per machine. Many of us are in the opposite situation, however: a single-user computer, with multiple email accounts. Dovecot has the concept of virtual users, which is fairly well suited to this situation. The following is the basic /etc/dovecot/dovecot.conf file, as simple as possible:
    protocols = imap
    listen = *, ::
    log_path = /var/log/dovecot.log
    info_log_path = /var/log/dovecot-info.log
    ssl = no
    disable_plaintext_auth = no
    auth_verbose = yes
    auth_mechanisms = plain
    passdb {
      driver = passwd-file
      args = /etc/dovecot/passwd
    }
    userdb {
      driver = static
      args = uid=eric gid=users home=/home/eric/.mail/%d/%n
      default_fields = mail=maildir:/home/eric/.mail/%d/%n/mail
    }
    mail_plugins = $mail_plugins fts fts_lucene
    plugin {
      fts = lucene
      fts_lucene = whitespace_chars=@.
      fts_autoindex = yes
    }
The upshot of all this is, I’m creating only virtual users, no system users. I did not create a dovecot user, nor a dovenull user, nor a vmail user, nor anything else the HOWTOs tell you to do. I’m the only user on my system, and I can do without those. Dovecot is flexible.
The “passdb” section specifies a file where I’ve stored user information, i.e., the username and (local-use only) password for each of my email addresses. The contents of my /etc/dovecot/passwd look like:
    eric@ericabrahamsen.net:{PLAIN}passwurd
    eric@paper-republic.org:{PLAIN}prasswowrdy2
    info@paper-republic.org:{PLAIN}plasswsword
    [etc]
You might like to use a better authentication mechanism than PLAIN; see this page for options. If you use a different mechanism, you might need to change the auth_mechanisms entry in dovecot.conf.
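A related, smaller step is to stop storing the local passwords in clear text. The {PLAIN} prefix in the passwd file is a password scheme (how the password is stored on disk), which is separate from auth_mechanisms. The following is a minimal sketch, assuming doveadm is installed alongside Dovecot; the function name hash_password is just an illustrative helper, and SHA512-CRYPT is one scheme choice among several.

```shell
#!/bin/sh
# Sketch: print a salted SHA512-CRYPT hash that can be pasted into
# /etc/dovecot/passwd in place of a {PLAIN}... value.
# doveadm pw prompts for the password interactively.
hash_password() {
    doveadm pw -s SHA512-CRYPT
}

# Only run where doveadm is actually available.
if command -v doveadm >/dev/null 2>&1; then
    hash_password
else
    echo "doveadm not found; is Dovecot installed?"
fi
```

The printed value starts with {SHA512-CRYPT} and goes after the colon on that account's line. Since the scheme only controls how the password is stored, auth_mechanisms = plain can stay as it is.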
Then the ‘userdb’ section says where each of these accounts keeps its mail, and the ownership of those files. The args and default_fields stuff is opaque to me, but specifying the values this way works.
Because all of the accounts belong to my user, the uid and gid correspond to my system user. The home directories for each account are under the ~/.mail folder, in directories that look like domainname/user (specified by the “%d/%n” escapes). The home directories hold more than just the mail: they also hold Dovecot’s index files, the uidvalidity stuff, and the Lucene indexes, which were the whole point of this exercise to begin with.
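To make the “%d/%n” escapes concrete, here is a small sketch that mimics the expansion in plain shell. It is only an illustration of the resulting directory layout, not anything Dovecot itself runs; account_home and MAIL_ROOT are hypothetical names.

```shell
#!/bin/sh
# Mirrors the home=/home/eric/.mail/%d/%n setting in the userdb section.
MAIL_ROOT="/home/eric/.mail"

account_home() {
    login="$1"
    user_part="${login%%@*}"    # %n: the part before the @
    domain_part="${login#*@}"   # %d: the part after the @
    printf '%s/%s/%s\n' "$MAIL_ROOT" "$domain_part" "$user_part"
}

account_home "eric@ericabrahamsen.net"
account_home "info@paper-republic.org"
```

The maildir itself then lives in a mail subdirectory of each of those homes, per the default_fields line.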
On Arch Linux, start the server (and enable it so that it starts at boot) with:
    $ sudo systemctl start dovecot
    $ sudo systemctl enable dovecot
Adjust for your distribution.
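Once the daemon is up, a quick sanity check can save some head-scratching later. This is a sketch only, assuming the doveconf and doveadm tools that ship with Dovecot are on your PATH (they may need sudo, depending on file permissions); check_dovecot is a hypothetical helper, and the login and password are the example values from the passwd file.

```shell
#!/bin/sh
# Example values from /etc/dovecot/passwd; substitute your own.
LOGIN="eric@ericabrahamsen.net"
LOCAL_PASS="passwurd"

check_dovecot() {
    # Print only the settings that differ from Dovecot's defaults.
    doveconf -n || return 1
    # Ask the running auth service whether this login/password is accepted.
    doveadm auth test "$LOGIN" "$LOCAL_PASS"
}

# Only run where doveadm is actually available.
if command -v doveadm >/dev/null 2>&1; then
    check_dovecot
else
    echo "doveadm not found; is Dovecot installed?"
fi
```

If the auth test fails, /var/log/dovecot.log is the first place to look, since auth_verbose is switched on in the config.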
Isync
Dovecot is done, so now we move to the ~/.mbsyncrc file. Here’s the account configuration for one address:
    IMAPAccount ea
    Host imap.gmail.com
    User eric@ericabrahamsen.net
    # retrieves the remote password
    PassCmd "/usr/bin/pass email/ea"
    UseIMAPS yes
    CertificateFile /etc/ssl/certs/ca-certificates.crt

    IMAPStore ea-remote
    Account ea

    IMAPAccount ea-dovecot
    RequireSSL no
    Host localhost
    User eric@ericabrahamsen.net
    # local password I don't care much about
    Pass passwurd
    UseIMAPS no
    UseTLSV1 no

    IMAPStore ea-local
    Account ea-dovecot

    Channel ea
    Master :ea-remote:
    Slave :ea-local:
    Patterns * !"[Gmail]/All Mail"
    Create Both
We’re good to go! That’s enough to run “mbsync ea” in the terminal, and get a complete sync of messages from the server.
Note that, because the Dovecot config file activates the FTS plugin and sets fts_autoindex to “yes”, the simple act of syncing mail with the server will also create a local full text search index of mail. You don’t have to do anything else, or worry about keeping it up to date.
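If you want to see the index at work without going through Gnus, doveadm can build it by hand and run a search directly. This is a sketch under the assumption that doveadm can read your Dovecot config (it may need sudo); fts_check is a hypothetical helper, the account is the example one from above, and the search word is arbitrary.

```shell
#!/bin/sh
# Example account; substitute one of the logins from /etc/dovecot/passwd.
ACCOUNT="eric@ericabrahamsen.net"

fts_check() {
    # Index every mailbox of the account (normally fts_autoindex does this).
    doveadm index -u "$ACCOUNT" '*' || return 1
    # Run a body search through Dovecot directly, bypassing Gnus.
    doveadm search -u "$ACCOUNT" mailbox INBOX body "lucene"
}

# Only run where doveadm is actually available.
if command -v doveadm >/dev/null 2>&1; then
    fts_check
else
    echo "doveadm not found; is Dovecot installed?"
fi
```

The search prints one line per matching message, identifying the mailbox and the message UID.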
Gnus
Now we configure Gnus similarly:
    (nnimap "EA"
            (nnimap-stream network)
            (nnimap-address "localhost")
            (nnimap-authenticator login)
            (nnimap-user "eric@ericabrahamsen.net"))
You’ll probably have other server parameters in there, but that’s enough to get going. The first time you sync in Gnus, it will ask you for the local password (“passwurd”, in this case), and prompt to save it in ~/.authinfo. Because I don’t care much about this password, I leave it saved plain in that file. You could choose to GPG encrypt it.
Create one server entry for each of your addresses.
Searching
Now we search in Gnus using nnir: “G G” on a group name, or on several marked groups, or on a topic heading. Or just “G” on a server name in the Server buffer.
Actually, this is where things fall down just a little bit. Indexing is painless and searches are fast, but there are a few remaining problems:
The first is that nnir search syntax for imap searches is weird. By default it searches on only one field (which you choose with nnir-imap-default-search-key), or, with a prefix arg, allows you to select a different field to search on. If you want to search multiple fields, you have to fall back to raw imap search syntax, which is cumbersome. The whole thing is awkward, but will eventually get addressed.
A potentially bigger issue is encoding in searches. The Lucene index assumes utf-8 encoding for all your emails, and in a perfect world, that would be enough. Many emails come in different encodings, however, and/or are base-64 munged. I and others have found that Lucene doesn’t index these messages properly, so some encoded strings in message headers and bodies aren’t located by searches. Some people run filters in the indexing process so that the messages are converted to utf-8 before they’re indexed. So far I’m just ignoring this problem; I’ve been bitten by it very rarely.
The third problem is a combination of the first two: if you want to search for non-ascii strings via an IMAP server’s SEARCH command, there are two ways to enter the string. Most servers (including Dovecot, but possibly not Gmail?) let you do it by enclosing the string in double quotes (see RFC-2060), which you simply enter as part of the nnir search.
Servers that don’t support this can search for non-ascii strings using a fairly complicated system of feeding literal search strings to the server, along with the number of bytes in the string. Gnus doesn’t currently support this, though I have a patch that partially addresses it.
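For the curious, the “literal” form looks roughly like this on the wire. The sketch below only builds the command text, to show where the byte count comes from; it is not what Gnus or my patch does, imap_literal_search is a hypothetical helper, and the a1 tag and BODY key are arbitrary choices. It assumes the script itself is saved as utf-8.

```shell
#!/bin/sh
# Build an IMAP SEARCH command that sends the search term as a literal.
# A literal is announced as {N}, where N is the length in bytes (not
# characters), followed by CRLF and then the raw bytes of the term.
imap_literal_search() {
    term="$1"
    # wc -c counts bytes, which is what the {N} prefix needs.
    nbytes=$(printf '%s' "$term" | wc -c | tr -d '[:space:]')
    printf 'a1 SEARCH CHARSET UTF-8 BODY {%s}\r\n%s\r\n' "$nbytes" "$term"
}

# Two Chinese characters, three bytes each in utf-8.
imap_literal_search "中文"
```

A real client can’t just send all of this in one go: after the line ending in {N}, it has to wait for the server’s “+” continuation response before sending the bytes of the term, which is part of what makes this awkward to support.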
Obviously, searching imap via nnir isn’t quite there yet. Over the next few months, I’m hoping it will make a little progress…