Writing MySQL Storage Engines
InnoDB and MyISAM are the two most common storage engines available by default for MySQL. I was stunned to discover how easy it is to write custom ones too.
1 Context
I had to write a query system for extracting information from plain text files holding some personnel data. I struck me as obvious to use the almighty Structured Query Language – SQL – we all know and love, and for which there are already not only parsers, but complete systems such as relational databases.
So I set off to see if there was a way to bend MySQL to load data from plain text files, until I realised that this could mean writing a new storage engine. While it felt overly challenging at first, I soon realised it was as easy as exciting to do.
2 Writing a Storage Engine
The nice thing about storage engines is that, as of MySQL 5.1, they can be written as plug-ins. The most approachable way to tackle this is by studying the example provided in the storage/example
directory from the MySQL source distribution. Copy the whole directory to a new one and rename all occurrences of its name into a name of yours:
#!/usr/bin/tcsh
foreach i (*)
cat $i | sed s/EXAMPLE/E/g | sed s/example/e/g > $i
end
Let's see how a SELECT
query is handled on a table of 2 columns: 1 TINYINT
(1 byte) and 1 INT
(4 bytes). There's actually a cascade a function calls but the one we're interested in is rnd_next()
, which is called for each row to display in the result and whose job it is to write said row (according to a format) into the buffer it's passed as parameter:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
The format described above is made of a nullable column bitmap, followed by a sequence of values typed accordingly. The bitmap (line 7) is a sequence of at least one byte with a bit for each nullable column. And if you have from 9 to 16 nullable columns, you'll go with 2 columns, and so on. I assume that the bit points to the nullable-column but the documentation is rather unclear about that. The following typed values (lines 8 and 9) are coded according to their data type definitions. The TINYINT
is to take up 1 byte: it's to be a char
taking as much space in the output buffer buf
. Likewise, the INT
is to take up 4 bytes: it's to be an int
taking as much space in buf
. Interestingly, you can see in this very example that there seems to be no problem regarding the memory address parity of the integer: here, we've had the integer begin on an odd-numbered index (buf[2]
) but that seems to cause no problem despite the fact Intel architectures expect to lay about their integers on even-numbered addresses.
Lines 11 to 17 control the flow: you have to be careful here, because each fill-up of buf
requires to return 0 (line 16), not and end-of-file. Returning the end-of-file will turn up upon the next call to rnd_next()
(line 13).
3 New Storage Engine Compilation
There's one extra command to run on top of the normal configure
-make
-make install
build procedure which is necessary to take your new storage engine plug-in into account:
autoreconf --force --install --symlink
4 New Storage Engine Registration
Log in with sufficient privileges (something like mysql --user=root
will do) and issue the following queries to register the new storage engine. Create a fresh table using it (see the storage/example
directory from the MySQL source code distribution). We'll also run a sample SELECT
query on the table to check that the engine we wrote works properly:
mysql> INSTALL PLUGIN e SONAME 'ha_e.so';
mysql> CREATE TABLE test.foo (i TINYINT NOT NULL,
j INT NOT NULL)
ENGINE = e;
mysql> SELECT * FROM test.foo;
+----+----------+
| i | j |
+----+----------+
| 42 | 12648430 |
+----+----------+
1 row in set (0.00 sec)
5 Debugging
Fred Fish seems to be the debugging superstar in the MySQL world. The DBUG_PRINT
macro is particularly useful and can be used as simply as this way:
DBUG_PRINT("hullo", ("hullo");
Also see DBUG, coming with the MySQL source distribution, in the dbug
directory. You can for instance use this macro out of the box in the EXAMPLE
storage engine derived C++ file. You're then supposed to run mysqld
(_safe
) with the --debug
switch. The output of this macro will show up in the /tmp/mysqld.trace
file. (Note that there's no need to run the mysql
client with the --debug
switch).
6 References
- Writing a Custom Storage Engine
- Data Types
- mysql_debug(), which uses the Fred Fish debug library.