Jérôme Belleman
Home  •  Tools  •  Posts  •  Talks  •  Travels  •  Graphics  •  About Me

Writing MySQL Storage Engines

27 Dec 2009

InnoDB and MyISAM are the two most common storage engines available by default for MySQL. I was stunned to discover how easy it is to write custom ones too.

1 Context

I had to write a query system for extracting information from plain text files holding some personnel data. I struck me as obvious to use the almighty Structured Query Language – SQL – we all know and love, and for which there are already not only parsers, but complete systems such as relational databases.

So I set off to see if there was a way to bend MySQL to load data from plain text files, until I realised that this could mean writing a new storage engine. While it felt overly challenging at first, I soon realised it was as easy as exciting to do.

2 Writing a Storage Engine

The nice thing about storage engines is that, as of MySQL 5.1, they can be written as plug-ins. The most approachable way to tackle this is by studying the example provided in the storage/example directory from the MySQL source distribution. Copy the whole directory to a new one and rename all occurrences of its name into a name of yours:

#!/usr/bin/tcsh
foreach i (*)
    cat $i | sed s/EXAMPLE/E/g | sed s/example/e/g > $i
end

Let's see how a SELECT query is handled on a table of 2 columns: 1 TINYINT (1 byte) and 1 INT (4 bytes). There's actually a cascade a function calls but the one we're interested in is rnd_next(), which is called for each row to display in the result and whose job it is to write said row (according to a format) into the buffer it's passed as parameter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int ha_pse::rnd_next(uchar *buf)
{
    static int loop = 0;
  
    DBUG_ENTER("ha_pse::rnd_next");
  
    buf[0] = 0;
    buf[1] = 42;
    *(int *) (&buf[2]) = 12648430;
  
    if (loop) {
        loop = 0;
        DBUG_RETURN(HA_ERR_END_OF_FILE);
    } else {
        loop = 1;
        DBUG_RETURN(0);
    }
}

The format described above is made of a nullable column bitmap, followed by a sequence of values typed accordingly. The bitmap (line 7) is a sequence of at least one byte with a bit for each nullable column. And if you have from 9 to 16 nullable columns, you'll go with 2 columns, and so on. I assume that the bit points to the nullable-column but the documentation is rather unclear about that. The following typed values (lines 8 and 9) are coded according to their data type definitions. The TINYINT is to take up 1 byte: it's to be a char taking as much space in the output buffer buf. Likewise, the INT is to take up 4 bytes: it's to be an int taking as much space in buf. Interestingly, you can see in this very example that there seems to be no problem regarding the memory address parity of the integer: here, we've had the integer begin on an odd-numbered index (buf[2]) but that seems to cause no problem despite the fact Intel architectures expect to lay about their integers on even-numbered addresses.

Lines 11 to 17 control the flow: you have to be careful here, because each fill-up of buf requires to return 0 (line 16), not and end-of-file. Returning the end-of-file will turn up upon the next call to rnd_next() (line 13).

3 New Storage Engine Compilation

There's one extra command to run on top of the normal configure-make-make install build procedure which is necessary to take your new storage engine plug-in into account:

autoreconf --force --install --symlink

4 New Storage Engine Registration

Log in with sufficient privileges (something like mysql --user=root will do) and issue the following queries to register the new storage engine. Create a fresh table using it (see the storage/example directory from the MySQL source code distribution). We'll also run a sample SELECT query on the table to check that the engine we wrote works properly:

mysql> INSTALL PLUGIN e SONAME 'ha_e.so';
mysql> CREATE TABLE test.foo (i TINYINT NOT NULL,
                              j INT NOT NULL)
                             ENGINE = e;
mysql> SELECT * FROM test.foo;
+----+----------+
| i  | j        |
+----+----------+
| 42 | 12648430 |
+----+----------+
1 row in set (0.00 sec)

5 Debugging

Fred Fish seems to be the debugging superstar in the MySQL world. The DBUG_PRINT macro is particularly useful and can be used as simply as this way:

DBUG_PRINT("hullo", ("hullo");

Also see DBUG, coming with the MySQL source distribution, in the dbug directory. You can for instance use this macro out of the box in the EXAMPLE storage engine derived C++ file. You're then supposed to run mysqld(_safe) with the --debug switch. The output of this macro will show up in the /tmp/mysqld.trace file. (Note that there's no need to run the mysql client with the --debug switch).

6 References