Querying Mail from the Command Line

16 Apr 2018

1 Context
2 GNU Mailutils
- 2.1 Listing Headers
- 2.2 Configuration
- 2.3 Kerberos Authentication?
- 2.4 Speed
3 Python's imaplib Module
- 3.1 Retrieving Messages Faster
- 3.2 Kerberos Authentication?
4 Reference

Mutt is great for many mail-related operations, but being an interactive tool makes it clumsy for some of them. For instance, how do you list all the senders?

1 Context

And once you get hold of a list of all the senders who ever wrote to you, how do you sort them, then make them unique, à la | sort | uniq? Mutt will allow you to sort by sender, but odds are your window will never be long enough to list them all and let you copy the contents of your terminal for further processing.

Luckily, there are command-line – non-interactive – tools which can query your mailboxes. A collection of such tools comes for instance with the GNU Mailutils suite.

2 GNU Mailutils

2.1 Listing Headers

Mailutils comes with several commands to perform various operations on a mailbox. There is of course the traditional mail command for reading and sending mail. You can count messages with messages, move mail across mailboxes with movemail, read specific messages with readmsg and filter them with sieve. I could go on paraphrasing the Debian description of the mailutils package but I think I'll just stop with frm and from – they are the ones I want to further discuss here.

They both just list some headers, so that you can grep or awk the output to taste. Run without any arguments, they do the same thing. But although their purposes overlap considerably, the options they take can make them behave fairly differently. The most noticeable difference is that from is the only one that lets you filter by sender, whereas only frm can filter by attributes such as new, unread, old or read.

Despite their names and short descriptions, they can print more than just the sender. They display the subject, they can show the recipients – in fact, frm comes with the --field switch so you can specify the field to display. The bit of documentation dedicated to this command doesn't elaborate on which fields work, but the ones mutt shows among the headers when you hit h appear to work. However, when I say that the --field option lets you specify the field to print out, I mean that literally in the singular, as there doesn't seem to be a way to show multiple fields. In particular, these commands won't work as you'd expect:

frm -f To Message-ID # Exits without doing anything
frm -f 'To Message-ID' # Prints empty lines
frm -f To,Message-ID # Prints empty lines
frm -f To -f Message-ID # Only prints message IDs

2.2 Configuration

Mailutils commands are configured using a common style in files matching their command names. For instance, frm is configured with the ~/.frm file and from with the ~/.from file. Their purpose being similar and their configuration identical for basic usage, I couldn't resist running ln -s ~/.frm ~/.from. A bare-bones configuration file could look like:

mailbox {
  mailbox-pattern imaps://fred@example.com/INBOX;
}

2.3 Kerberos Authentication?

I commonly work with IMAP servers, and this configuration works out of the box: running frm without any arguments at all is enough to be prompted for a password and get the message headers listed. Mailutils is advertised in various places in its documentation as supporting Kerberos. Unfortunately, there was no way I could convince either frm or from to authenticate me using this method. Recompiling them and making sure GSSAPI was a compile-time option didn't help, in spite of frm --show-config-options being adamant that GSSAPI support was enabled. And frustratingly, frm --debug shows it is aware the server is AUTH=GSSAPI capable, but it doesn't act on it. Looking at the source code, it's all down to folder.c calling the mu_url_get_secret() function, which gives the impression it's made to retrieve a password that was previously supplied by the user in another Mailutils file – but not a Kerberos ticket.

2.4 Speed

Running frm against a 1k-message mailbox causes it to start listing messages little by little after a few seconds, taking a few minutes to go through all of them – fair enough. However, attempting to have it retrieve my main, 100k-message inbox led to a dead end. It seemed the command was so overwhelmed with the task that it couldn't even bring itself to print anything out for over 15 minutes. After this, it trudged through listing only about 600 senders before getting stuck again for longer than I could be bothered to wait. And this is where I started considering a different approach altogether to the business of non-interactive mail header retrieval.

3 Python's imaplib Module

3.1 Retrieving Messages Faster

Rather than looking for an alternative to Mailutils, I thought I'd give the standard Python imaplib module a go. After all, writing Python is a relatively high-level task, so much so that it's often less work than running commands from a shell.

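import getpass
import imaplib

# Placeholder credentials: substitute your own account.
username = 'fred'
password = getpass.getpass()

imap = imaplib.IMAP4_SSL('imap.example.com')
imap.login(username, password)
imap.select('INBOX', readonly=True)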

The first step consists of connecting (securely) to the server, supplying a username and password and selecting a mailbox to work with. INBOX is the default mailbox argument of select(), but no mailbox is selected until you call it, and passing the readonly flag seemed like a good idea.
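
The same steps can be wrapped in a function that also checks the status returned by select(), which the snippet above ignores. This is only a sketch: open_inbox and its imap_class parameter are names I made up for illustration (the latter is there so the function can be tried against a stand-in class rather than a real server).

```python
import imaplib


def open_inbox(host, username, password, imap_class=imaplib.IMAP4_SSL):
    """Connect over TLS, log in and select INBOX read-only.

    imap_class is a hypothetical hook, not part of imaplib: it only exists
    so the function can be exercised with a stand-in class.
    """
    imap = imap_class(host)
    # login() raises an exception by itself when the credentials are refused.
    imap.login(username, password)
    # readonly=True opens the mailbox without allowing changes to it.
    status, data = imap.select('INBOX', readonly=True)
    if status != 'OK':
        raise RuntimeError('cannot select INBOX: %r' % (data,))
    return imap
```

With a real server, this would be called as imap = open_inbox('imap.example.com', username, password).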

messages = imap.search(None, 'ALL')[1][0].split()
last = int(messages[-1])

The search() function returns a tuple made of:

The search success status.
A list made of a single, potentially very long string of space-separated message numbers. It does look a little strange to get a list of one single string rather than just the string itself, but this is what I've seen with several IMAP servers.

Hence the [1][0] index and the split() call to turn the result into a proper Python list of message numbers. In my use case here, I'm interested in all the messages in the mailbox, which is why I set the criterion to ALL.
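
The unpacking can be made a bit more defensive with a small helper. This is a sketch under my own assumptions: message_numbers is a made-up name, and treating a missing result as an empty mailbox is my choice rather than anything imaplib prescribes.

```python
def message_numbers(response):
    """Turn the (status, data) pair returned by search() into a list of ints."""
    status, data = response
    if status != 'OK':
        raise RuntimeError('search failed: %r' % (data,))
    # data is a one-element list holding space-separated numbers (bytes under
    # Python 3); with no match the element may be empty or None.
    return [int(number) for number in (data[0] or b'').split()]
```

It would be used as numbers = message_numbers(imap.search(None, 'ALL')), after which numbers[-1] is the highest message number, provided the list isn't empty.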

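chunk = 1000
# Stop at last + 1, as range() excludes its stop value; otherwise a trailing
# one-message chunk would be skipped.
for start in range(1, last + 1, chunk):
    end = start + chunk - 1
    if end > last:
        end = last

    for message in imap.fetch('%d:%d' % (start, end),
                              'BODY[HEADER.FIELDS (from subject)]')[1]:
        if isinstance(message, tuple):
            # Python 3: the payload is bytes, so decode it before printing.
            print(message[1].decode('utf-8', 'replace').strip())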
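
The chunk bounds computed by this loop can also be produced by a small generator, which makes the arithmetic easy to check without a server. Again just a sketch: chunk_ranges is a name I made up, and it takes care to let range() run up to last + 1 so the very last message is always covered.

```python
def chunk_ranges(last, size=1000):
    """Yield IMAP message sets such as '1:1000' covering 1..last inclusive."""
    # range() excludes its stop value, hence last + 1: otherwise a final
    # one-message chunk (last == 1001, say) would be skipped.
    for start in range(1, last + 1, size):
        yield '%d:%d' % (start, min(start + size - 1, last))
```

Each string it yields can be passed as is as the first argument of fetch().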

So as to behave a bit more helpfully than frm, I'm fetching messages in small chunks which I print as I go along. The message set expected as first argument by fetch() can be a comma-separated list of message numbers or, more usefully still, a colon-separated range. Note that the upper bound of the range cannot be (much) more than the maximum message number, hence the end > last condition setting end = last if true. The fetch() function returns a tuple made of:

The fetch success status.
A rather strange list of alternating:
- Tuples which hold:
  - The message number and the names of the message parts you requested.
  - The message parts you requested.
- The rather moot string of a closing parenthesis.
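
To make that shape concrete, here is a sketch that walks such a list and keeps only the requested parts. The header_blocks name and the sample data are made up for illustration; real servers may format the first element of each tuple differently.

```python
def header_blocks(fetch_data):
    """Yield the requested message parts from the data list of fetch()."""
    for item in fetch_data:
        # Tuples carry (description, payload); the bare b')' closers are skipped.
        if isinstance(item, tuple):
            yield item[1].decode('utf-8', 'replace').strip()


# Canned data shaped like a fetch() result, for illustration only.
sample = [
    (b'1 (BODY[HEADER.FIELDS (from subject)] {49}',
     b'From: Fred <fred@example.com>\r\nSubject: Hello\r\n\r\n'),
    b')',
    (b'2 (BODY[HEADER.FIELDS (from subject)] {47}',
     b'From: Ann <ann@example.com>\r\nSubject: Lunch\r\n\r\n'),
    b')',
]

for block in header_blocks(sample):
    print(block)
```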

Hence the [1] index and the isinstance(message, tuple) test. Let's spend a minute looking at how to write message part names. They are called message data items, in RFC 3501 parlance, and can be either:

Atoms – one or more non-special characters. A string, in essence.
Parenthesised lists – space-separated sequences of items.
Macros – short-hands for specific parenthesised lists.

The FETCH Command section of RFC 3501 describes the various data items that can be fetched, and you'll find that the senders and subjects are header fields of the body. So the data item defined as BODY[<section>]<<partial>> comes into play. The <section> will be HEADER.FIELDS followed by a parenthesised list of field names, all inside the square brackets. The <<partial>> is an optional byte range, which I simply leave out. Again, Mutt can easily show you sample header fields when you hit h, but I can understand you'd rather read the RFC 2822 Field definitions section instead if you're desperate for some light bedtime reading.
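
If you want to fetch other fields, the data item string can be assembled by a helper like the one below. It's a sketch: header_fields_item and its peek flag are my own invention, though BODY.PEEK itself is defined by RFC 3501 as the variant of BODY that doesn't mark messages as seen.

```python
def header_fields_item(fields, peek=False):
    """Build a BODY[HEADER.FIELDS (...)] data item for the given field names."""
    # BODY.PEEK is the RFC 3501 variant that never sets the \Seen flag.
    name = 'BODY.PEEK' if peek else 'BODY'
    return '%s[HEADER.FIELDS (%s)]' % (name, ' '.join(fields))
```

For example, header_fields_item(['to', 'message-id']) gives the second argument to pass to fetch() in order to get the recipients and message IDs in one go.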

This little script lists the senders and subjects of my 100k messages in about 11 minutes, an order of magnitude faster than frm and from.
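
Which brings me back to the original | sort | uniq question. A sketch of how the fetched header blocks could be boiled down to a sorted list of unique sender addresses, using the standard email package; unique_senders is a made-up name, and lower-casing the addresses is my own choice to merge duplicates that only differ by case.

```python
from email.parser import BytesHeaderParser
from email.utils import parseaddr


def unique_senders(raw_headers):
    """Return the sorted, de-duplicated sender addresses of raw header blocks."""
    parser = BytesHeaderParser()
    senders = set()
    for raw in raw_headers:
        # parseaddr() splits 'Fred <fred@example.com>' into name and address.
        value = str(parser.parsebytes(raw).get('From', ''))
        name, address = parseaddr(value)
        if address:
            senders.add(address.lower())
    return sorted(senders)
```

It expects the raw bytes found in the tuples returned by fetch(), that is the message[1] values of the script above.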

3.2 Kerberos Authentication?

I'd already be sold on the Python imaplib module for its ferocious speed. However, it doesn't support Kerberos authentication out of the box. Yves Fischer suggests a Python imaplib kerberos mixin which undoubtedly does the business.

4 Reference

GNU Mailutils is the homepage of frm, from and those other commands.
imaplib — IMAP4 protocol client describes the Python interface, carefully avoiding going into too many useful details about the IMAP protocol itself.
Python — imaplib IMAP example with Gmail wonderfully complements the Python imaplib module page.
imaplib – IMAP4 client library – Python Module of the Week gives another very good hands-on guide to the Python imaplib module.
FETCH Command is the official reference to what you can fetch about a message with the IMAP protocol.
Field definitions is the official reference to the header fields of Internet messages, as defined by RFC 2822.
Manual IMAP offers a more bite-sized and pragmatic description of the IMAP protocol.
Python imaplib kerberos mixin explains how to authenticate with Kerberos using the Python imaplib module.