OpenTSDB: a Time Series Database
My first introduction to NoSQL was with time-series databases. It began with OpenTSDB, a simple yet versatile solution I used to monitor the CERN Batch System.
1 Getting Started
The getting-started guide has you cover the following steps:
- Get gnuplot and a JDK (a JRE alone won't be enough – we need to compile stuff).
- Get and set up HBase, which involves editing the conf/hbase-site.xml file, setting the root directory and the network interfaces sensibly before starting it up.
- Check out, compile and start OpenTSDB.
You should be able to load the Web interface on http://127.0.0.1:4242.
2 Collecting Data
Before you can do anything useful with OpenTSDB, you need to create your first metrics. This is done with the tsdb executable, which is hiding in the build/ directory. To actually start collecting data, you need to write strings to the OpenTSDB server along the lines of:
put proc.loadavg.5m 1288946927 0.62 host=foo
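As a rough illustration of what writing such strings can look like from a script, here is a minimal Python sketch. It assumes the server accepts the line above as newline-terminated text on the same port as the Web interface (127.0.0.1:4242). The helper names format_put and send_put are made up for this example and are not part of OpenTSDB.

```python
import socket
import time


def format_put(metric, value, tags, timestamp=None):
    """Build one line such as: put proc.loadavg.5m 1288946927 0.62 host=foo"""
    if not tags:
        raise ValueError("at least one tag is needed")
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the epoch
    # Sort the tags so that the same input always gives the same line.
    tag_str = " ".join(f"{name}={val}" for name, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}"


def send_put(line, host="127.0.0.1", port=4242, timeout=5.0):
    """Send one line to the server (assumed: plain text, newline-terminated)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((line + "\n").encode("ascii"))
```

Calling send_put(format_put("proc.loadavg.5m", 0.62, {"host": "foo"})) would then push one sample stamped with the current time, provided the metric has been created beforehand.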
3 Tags
There are a number of pitfalls to be wary of when writing strings to the OpenTSDB server to collect data. I once used the tsdb import command to try tagging samples with different types of values:
| Sample | Valid | Comment |
| --- | --- | --- |
| foo.bar 1333369407 42 baz= | ✗ | An equals sign with no value at all is invalid. |
| foo.bar 1333369407 42 baz="" | ✗ | An empty string is invalid. In fact, values don't have to and effectively can't be enclosed in (single or double) quotes. |
| foo.bar 1333369407 42 | ✗ | A sample needs at least one tag. |
| foo.bar 1333369407 42=fo*o | ✗ | Probably can't do this since * is used in queries to match all tags. Escaping it with a backslash doesn't help. |
| foo.bar 1333369407 42 baz=boo | ✓ | |
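To catch such mistakes before handing a file to tsdb import, the observations above can be turned into a small pre-check. This is only a sketch in Python: check_sample is a hypothetical helper, and the allowed character set is a conservative assumption on my part, not something taken from the OpenTSDB sources.

```python
import re

# Assumption: a conservative character set for tag names and values.
_TOKEN = re.compile(r"[A-Za-z0-9._/-]+")


def check_sample(line):
    """Return the problems found in one 'metric timestamp value tag=value ...' line."""
    fields = line.split()
    if len(fields) < 3:
        return ["expected: metric timestamp value tag=value [...]"]
    problems = []
    metric, timestamp, value, *tags = fields
    if not timestamp.isdigit():
        problems.append(f"timestamp is not an integer: {timestamp!r}")
    try:
        float(value)
    except ValueError:
        problems.append(f"value is not a number: {value!r}")
    if not tags:
        problems.append("a sample needs at least one tag")
    for tag in tags:
        name, sep, tag_value = tag.partition("=")
        if not sep or not name or not tag_value:
            problems.append(f"tag needs a name and a non-empty value: {tag!r}")
        elif not _TOKEN.fullmatch(name) or not _TOKEN.fullmatch(tag_value):
            # Covers quotes and * among others.
            problems.append(f"unsupported character in tag: {tag!r}")
    return problems
```

An empty list means the line passed these checks. It does not guarantee that OpenTSDB will accept it, since the metric still has to exist, for instance.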
4 OpenTSDB Packaging
4.1 OpenTSDB RPM
In order to build an RPM for OpenTSDB, you need to:
- Get the source from the upstream repository:
  git clone git://github.com/stumbleupon/opentsdb.git
  ... as explained in the getting-started guide. Don't run ./bootstrap just yet, we'll leave this to rpmbuild.
- Make a compressed tarball of the cloned repository.
- You may need to create a patch to be applied in %prep to remove the AC_PROG_MKDIR_P from the configure.ac file. It doesn't appear to be needed and may not be available on all systems.
- Write a SPEC file skeleton:
  rpmdev-newspec opentsdb
- Edit the SPEC file. Don't forget to refer to the aforementioned patch. You may want to set BuildRequires to something like java-1.6.0-sun-devel and Requires to hadoop-hbase. As mentioned above, %build will have to call ./bootstrap before it can %configure and make. In addition to make install, %install should copy src/create_table.sh because it will be useful later on.
- Run rpmbuild to bring up the list of files which would be installed, so you can paste it straight into the SPEC file.
So this RPM requires hadoop-hbase. A convenient way of providing it is to rely on the Cloudera distribution. You'll find there are a few things that need changing in the files installed by the hadoop-hbase RPM, though. You may want to edit the configuration file which is supplied by this RPM. Also, the stop-hbase.sh script repeatedly does a kill -0 on the Java process, which isn't much use (signal 0 only checks that the process exists).
4.2 OpenTSDB Init Script
I once wrote an init script which takes care of (re)starting/stopping HBase as well as OpenTSDB. It is important to make sure that the relevant Java processes are gone when you stop the services. You may have to wait for a while before this is the case, which is why doing service opentsdbd restart may not be advisable.
When starting OpenTSDB, reading /var/log/opentsdb.log will be useful. Many Connection refused messages and exceptions will be printed out, but that doesn't mean it's not going to work in the end. What you have to look for is a TSDMain: Ready to serve message, at which point OpenTSDB really becomes available.
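Waiting for that message can be automated. The following Python sketch polls the log file until a line containing the marker shows up or a timeout expires. wait_for_marker and its parameters are hypothetical names chosen for this example. The start_offset argument is there so that a Ready to serve line left over from a previous run can be skipped, by passing the size the log had before the service was started.

```python
import time


def wait_for_marker(log_path, marker="Ready to serve", timeout=120.0,
                    interval=1.0, start_offset=0):
    """Poll log_path until marker appears after start_offset; return True if it did."""
    needle = marker.encode()
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(log_path, "rb") as log:
                log.seek(start_offset)
                # Re-reading the tail on every poll is crude but fine for a sketch.
                if needle in log.read():
                    return True
        except FileNotFoundError:
            pass  # the log may not exist yet right after start-up
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An init script wrapper could call wait_for_marker("/var/log/opentsdb.log") after starting the daemon and report a failure if it returns False.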
5 OpenTSDB Commands
OpenTSDB comes with the tsdb command-line tool, which is very useful for carrying out various operations:
- tsdb uid lets you play with UIDs. You can for instance list existing metrics:
  tsdb uid grep metrics '.*'
  You can rename them too:
  tsdb uid rename metrics foo bar
  Note that it's not possible to remove a metric. What is typically done is to rename it, prefixing it with an underscore.
- Although tsdb query and tsdb scan look like they perform the same task, they behave differently. Roughly, tsdb query acts similarly to the Web interface and aggregates metrics, while tsdb scan directly displays, and can change, the raw data. Note that while tsdb scan requires an aggregate FUNC argument (probably only because tsdb query sensibly does), it should be, and seems to be, ignored since no aggregation is actually performed.
6 Migrating Data
Apparently, you can't just copy HBase data files in hbase.rootdir from one instance to another, and add_table.rb may not be much use either. The best way I've found to perform a migration is to do it at the OpenTSDB level, with tsdb scan (not tsdb query, which aggregates) on the source instance and tsdb import on the destination instance.
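The two halves of such a migration could be scripted roughly as follows. This Python sketch rests on several assumptions to check against your OpenTSDB version: that tsdb scan has an --import option which prints samples in the format tsdb import reads, that the start date can be given as a Unix timestamp, and that tsdb is on the PATH. The function names are invented for the example.

```python
import subprocess


def scan_command(metric, start, end=None, aggregator="sum", tsdb="tsdb"):
    """Argument list for dumping one metric on the source instance.

    Assumption: 'scan --import' emits lines that 'tsdb import' can read.
    The aggregator is required by the command line but not applied by scan.
    """
    dates = [start] if end is None else [start, end]
    return [tsdb, "scan", "--import", *dates, aggregator, metric]


def import_command(dump_path, tsdb="tsdb"):
    """Argument list for loading a dump file on the destination instance."""
    return [tsdb, "import", dump_path]


def dump_metric(metric, start, dump_path, **kwargs):
    """Run the scan and store its output in dump_path."""
    with open(dump_path, "w") as out:
        subprocess.run(scan_command(metric, start, **kwargs), stdout=out, check=True)
```

The dump file then has to be copied to the destination host, where subprocess.run(import_command(path), check=True) would load it. The metrics must already exist there.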
7 Gnuplot Settings
There is a largely undocumented way of changing gnuplot settings in OpenTSDB. It involves modifying /usr/share/opentsdb/mygnuplot.sh to have gnuplot read from a custom .gnuplot file (typically in /usr/share/opentsdb too):
exec nice gnuplot /usr/share/opentsdb/.gnuplot "$@" >"$stdout" 2>"$stderr"
The .gnuplot file can then include commands like set bmargin 7. Unfortunately, it's necessary to restart OpenTSDB after the change.