Jérôme Belleman

Databases vs. Plain Text Files

14 Jun 2015

Are databases the answer to everything? When is it more convenient to use plain text files? And in which scenarios are plain text files simply safer?

There was a time when I wanted everything to be a database. I would find whatever excuse I needed to back my programs with DBs. It didn't really matter whether they were hosted on DB servers, as is typically the case with Oracle, MySQL or PostgreSQL, or stored in simple local files as with SQLite. Nor did it matter whether they were relational DBs or NoSQL DBs. So long as they were DBs.

And then I became lazier, started to use plain text files more and more and found that, simple though this might sound, this method offers some unique opportunities.

1 When Not to Use DBs

DBs are probably a good idea when dealing with massive amounts of data which need indexing and efficient storage. Very often, though, you don't deal with such large amounts of data, but you could do with the comfort of having human-readable data serialisation.

This allows you to read and edit your data files with any tool you like, at any stage of software development or production. It makes it really easy to edit/fix data manually, even doing so over large numbers of records thanks to on-the-fly programmable editors such as Vim.

Another interesting application of plain-text storage back-ends is backups and version control. I once wrote a task manager which stored tasks in JSON files. During development, I chose to commit each change into a git repository to guard against data corruption: this allowed me to diff changes and rewind however I liked.

What I didn't expect, however, is that it turned out to be so convenient and so reassuring that I eventually chose to automatically commit all task changes into a git repository. This even gave me, for free, features to distribute and sync task data between different devices, using little more than git push and git pull.
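
The idea of committing every data change can be sketched in a few lines of Python. This is only an illustration, not the task manager's actual code: the function names save_tasks and commit_tasks are hypothetical, and the sketch assumes that git is on the PATH and that repo_dir is an already initialised repository. Writing the JSON with sorted keys and a fixed indent keeps the diffs small and stable.

```python
import json
import subprocess
from pathlib import Path


def save_tasks(tasks, path):
    """Write tasks as indented JSON with sorted keys, so that diffs stay readable."""
    text = json.dumps(tasks, indent=2, sort_keys=True, ensure_ascii=False)
    Path(path).write_text(text + "\n", encoding="utf-8")


def commit_tasks(repo_dir, filename, message):
    """Stage the file and commit it, but only if it actually changed.

    Assumes git is installed and repo_dir is an initialised repository.
    """
    def git(*args, check=True):
        return subprocess.run(["git", "-C", str(repo_dir), *args], check=check)

    git("add", filename)
    # 'git diff --cached --quiet' exits with a non-zero status when staged changes exist.
    if git("diff", "--cached", "--quiet", check=False).returncode != 0:
        git("commit", "--quiet", "-m", message)
```

A program would call save_tasks(tasks, repo_dir / "tasks.json") after each modification, followed by commit_tasks(repo_dir, "tasks.json", "Update tasks"). Syncing between devices then comes down to running git pull before loading and git push after committing.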

2 Human-Readable Format

There are a number of human-readable formats available. I quite like using YAML and JSON, although I occasionally find it useful to write one of my own.

2.1 YAML

YAML is blissful to read and write. You can work with it e.g. with PyYAML. However, YAML gives special meaning to certain characters, which it uses for directives, repeated nodes and other features. Most of the time I found myself happier with only lists and dictionaries and a YAML parser that doesn't try to be more sophisticated than needed:

import yaml

# BaseLoader only builds strings, lists and dictionaries: no type guessing, no custom tags.
with open('foo.yaml') as fhl:
    cfg = yaml.load(fhl, Loader=yaml.BaseLoader)
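
To illustrate what "less sophisticated" means in practice, here is a small sketch comparing BaseLoader with PyYAML's safe_load on the same input. The document content and the variable names are made up for the example. BaseLoader leaves every scalar as a string, whereas safe_load converts values which look like numbers or booleans.

```python
import yaml

# A made-up document for illustration.
document = """\
retries: 3
verbose: yes
hosts:
  - a
  - b
"""

# BaseLoader: every scalar stays a string.
as_strings = yaml.load(document, Loader=yaml.BaseLoader)
print(as_strings)  # {'retries': '3', 'verbose': 'yes', 'hosts': ['a', 'b']}

# safe_load: scalars are resolved to ints, booleans and so on.
typed = yaml.safe_load(document)
print(typed)  # {'retries': 3, 'verbose': True, 'hosts': ['a', 'b']}
```

With BaseLoader, any conversion is left to the program, which is more typing but avoids surprises such as a bare yes silently turning into True.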

2.2 JSON

There is not much to say here, except that JSON is perhaps not as human-readable or human-writable as YAML, but possibly safer to use when dealing with funny characters. You can easily browse a JSON file with Vim by folding it, after reformatting it with =ap (“indent a paragraph”) to make sure it's got the expected structure. The following options make = pipe the text through jq and fold on indentation:

equalprg=jq\ . foldmethod=indent nofoldenable

Working with JSON in Python is a doddle with the bundled json module. There are other JSON modules for Python which might be faster than the bundled one.
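
As a minimal sketch with the bundled json module, the following round-trips a made-up task record containing accented characters. The record and the variable names are only for illustration. By default, json.dumps escapes non-ASCII characters, which is one way JSON stays safe with funny characters; passing ensure_ascii=False keeps them readable instead.

```python
import json

# A made-up task record for illustration.
task = {"title": "Café: buy crème fraîche", "done": False, "tags": ["home", "food"]}

# Serialise in a readable form, keeping accented characters as they are.
text = json.dumps(task, indent=2, ensure_ascii=False)

# Parsing the text gives back an equal dictionary.
restored = json.loads(text)

# The default settings escape non-ASCII characters as \uXXXX sequences.
escaped = json.dumps(task["title"])
print(text)
print(escaped)
```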
