OpenTSDB: a Time Series Database

12 Jan 2013

1 Getting Started
2 Collecting Data
3 Tags
4 OpenTSDB Packaging
- 4.1 OpenTSDB RPM
- 4.2 OpenTSDB Init Script
5 OpenTSDB Commands
6 Migrating Data
7 Gnuplot Settings
8 References

My first introduction to NoSQL was with time-series databases. It began with OpenTSDB, a simple yet versatile solution I used to monitor the CERN Batch System.

1 Getting Started

The getting-started guide walks you through the following steps:

- Get gnuplot and a JDK (a JRE alone won't be enough, since we need to compile things).
- Get and set up HBase, which involves editing the conf/hbase-site.xml file to set the root directory and the network interfaces sensibly before starting it up.
- Check out, compile and start OpenTSDB.

You should then be able to load the web interface at http://127.0.0.1:4242.

2 Collecting Data

Before you can do anything useful with OpenTSDB, you need to create your first metrics. This is done with the tsdb executable, which lives in the build/ directory.
To actually start collecting data, you need to send lines of text to the OpenTSDB server, such as:
```
put proc.loadavg.5m 1288946927 0.62 host=foo
```
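
As a minimal sketch of both steps, assuming a TSD listening on 127.0.0.1:4242, tsdb being run from the build/ directory, and nc being installed: metrics are registered with tsdb mkmetric, then data points are sent as put lines over a plain TCP connection. The helper names format_put and send_put are hypothetical, as are the TSD_HOST and TSD_PORT variables.

```shell
# Hypothetical defaults; adjust to wherever the TSD is listening.
TSD_HOST=${TSD_HOST:-127.0.0.1}
TSD_PORT=${TSD_PORT:-4242}

# Build one "put" line: metric, UNIX timestamp, value, then one or more tags.
format_put() {
    metric=$1 timestamp=$2 value=$3
    shift 3
    echo "put $metric $timestamp $value $*"
}

# Send one data point to the TSD over TCP (assumes nc is available).
send_put() {
    format_put "$@" | nc -w 5 "$TSD_HOST" "$TSD_PORT"
}

# Typical use, once per metric and then once per sample:
#   ./build/tsdb mkmetric proc.loadavg.5m
#   send_put proc.loadavg.5m "$(date +%s)" 0.62 host=foo
```

Keeping the formatting separate from the sending makes it easy to check the lines before anything reaches the server.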

3 Tags

There are a number of things to be wary of when sending data to the OpenTSDB server. I once used the tsdb import command to try tagging samples with different kinds of values:

`foo.bar 1333369407 42 baz=`	✗	An equal sign with no value at all is invalid.
`foo.bar 1333369407 42 baz=""`	✗	An empty string is invalid. In fact, values don't need to be, and effectively can't be, enclosed in (single or double) quotes.
`foo.bar 1333369407 42`	✗	A sample needs at least one tag.
`foo.bar 1333369407 42 baz=fo*o`	✗	Probably not possible, since `*` is used in queries to match all values of a tag. Escaping it with a backslash doesn't help.
`foo.bar 1333369407 42 baz=boo`	✓
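
The cases above can be caught before anything is sent with a rough pre-flight check. This is only a sketch: the function name valid_tag is hypothetical, and the accepted character set (letters, digits, `-`, `_`, `.` and `/`) is an assumption about what OpenTSDB allows rather than something established by the experiments above.

```shell
# Return 0 if "$1" looks like a usable name=value tag, 1 otherwise.
# Assumption: names and values may only contain letters, digits, - _ . /
valid_tag() {
    case "$1" in
        *=*) ;;            # must contain an equal sign
        *) return 1 ;;
    esac
    key=${1%%=*}           # text before the first "="
    value=${1#*=}          # text after the first "="
    for part in "$key" "$value"; do
        case "$part" in
            ''|*[!A-Za-z0-9._/-]*) return 1 ;;   # empty, or has a disallowed character
        esac
    done
    return 0
}

# Typical use:
#   valid_tag "baz=boo" && echo "ok"
```

An empty value, a quoted value and a value containing `*` are all rejected by the same character test, so no special cases are needed.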

4 OpenTSDB Packaging

4.1 OpenTSDB RPM

In order to build an RPM for OpenTSDB, you need to:

Get the source from the upstream repository:
```
git clone git://github.com/stumbleupon/opentsdb.git
```
... as explained in the getting-started guide. Don't run ./bootstrap just yet; we'll leave this to rpmbuild.
Make a compressed tarball of the cloned repository.
You may need to create a patch, to be applied in %prep, that removes the AC_PROG_MKDIR_P macro from the configure.ac file. The macro doesn't appear to be needed and may not be available on all systems.
Write a SPEC file skeleton:
```
rpmdev-newspec opentsdb
```
Edit the SPEC file. Don't forget to refer to the aforementioned patch. You may want to set BuildRequires to something like java-1.6.0-sun-devel and Requires to hadoop-hbase. As mentioned above, %build will have to call ./bootstrap before it can %configure and make. In addition to make install, %install should copy src/create_table.sh because it will be useful later on.

Run rpmbuild to obtain the list of files that would be installed, so you can paste it straight into the %files section of the SPEC file.
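
The tarball and build steps might look like the sketch below. The function name make_tarball is hypothetical, and the paths in the usage comments assume the default ~/rpmbuild layout, which may differ on your system.

```shell
# Pack a cloned repository into a gzipped tarball for rpmbuild.
#   $1: path to the clone, $2: absolute path of the tarball to create
make_tarball() {
    src=$1 out=$2
    # Archive relative to the parent directory so that the tarball has a
    # single top-level directory named after the clone (e.g. opentsdb/).
    tar -C "$(dirname "$src")" -czf "$out" "$(basename "$src")"
}

# Typical use (paths are assumptions):
#   make_tarball "$PWD/opentsdb" ~/rpmbuild/SOURCES/opentsdb.tar.gz
#   rpmbuild -ba ~/rpmbuild/SPECS/opentsdb.spec
```

With a tarball built this way the top-level directory is opentsdb rather than name-version, so %prep would need something like %setup -q -n opentsdb.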

So this RPM requires hadoop-hbase. A convenient way of providing it is to rely on the Cloudera distribution. You'll find there are a few things that need changing in the files installed by the hadoop-hbase RPM, though. You may want to edit the configuration file which will be supplied by this RPM. Also, the stop-hbase.sh script repeatedly does a kill -0 on the Java process, which isn't much use since signal 0 only checks that the process exists and never terminates it.

4.2 OpenTSDB Init Script

I once wrote an init script which takes care of (re)starting/stopping HBase as well as OpenTSDB. It is important to make sure that the relevant Java processes are gone when you stop the services. This can take a while, which is why running service opentsdbd restart may not be advisable.
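
One way to make the stop action wait is sketched below. It is not the init script mentioned above; the function name wait_for_exit and the PID file path in the usage comment are hypothetical. It polls with kill -0, which only tests that the process exists, but gives up after a timeout so that the caller can decide what to do next. It assumes it is run as root or as the owner of the process.

```shell
# Wait until process $1 is gone, for at most $2 seconds (default 30).
# Returns 0 if the process exited, 1 if it is still there after the timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-30}
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do    # signal 0: existence check only
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Typical use in a stop action (the PID file path is an assumption):
#   pid=$(cat /var/run/opentsdb.pid)
#   kill "$pid"
#   wait_for_exit "$pid" 60 || kill -9 "$pid"
```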

When starting OpenTSDB, reading /var/log/opentsdb.log will be useful. Many Connection refused messages and exceptions will be printed out, but that doesn't mean it won't work in the end. What you have to look for is a TSDMain: Ready to serve message, at which point OpenTSDB really becomes available.
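
Watching for that message can be automated along these lines. The function name wait_for_ready is hypothetical, and the sketch assumes the log file is truncated or rotated on each start, since a Ready to serve line left over from a previous run would otherwise match immediately.

```shell
# Poll log file $1 once per second, up to $2 times (default 60), until the
# "Ready to serve" message shows up. Returns 0 when found, 1 on timeout.
wait_for_ready() {
    log=$1
    tries=${2:-60}
    while [ "$tries" -gt 0 ]; do
        if grep -q 'Ready to serve' "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Typical use after starting the daemon:
#   wait_for_ready /var/log/opentsdb.log 120 || echo "OpenTSDB did not come up" >&2
```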

5 OpenTSDB Commands

OpenTSDB comes with the tsdb command-line tool, which is very useful for carrying out various operations:

tsdb uid lets you play with UIDs. You can for instance list existing metrics:
```
tsdb uid grep metrics '.*'
```
You can rename them too:
```
tsdb uid rename metrics foo bar
```
Note that it's not possible to remove a metric. What is typically done is to rename it, prefixing it with an underscore.
Although tsdb query and tsdb scan look like they perform the same task, they behave differently. Roughly, tsdb query acts like the web interface and aggregates metrics, while tsdb scan directly displays, and can change, the raw data. Note that tsdb scan requires an aggregate FUNC argument (probably only because tsdb query sensibly does), but it should be, and seems to be, ignored since no aggregation is actually performed.
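
The commands above can be wrapped in small helpers, sketched below. The function names are hypothetical, the metric and tag are the ones from the tagging examples, and the sketch assumes that both tsdb query and tsdb scan take a start time, an aggregator, a metric and then tags, with UNIX timestamps accepted as start times.

```shell
# Rename a metric out of the way, since it cannot be removed.
retire_metric() {
    tsdb uid rename metrics "$1" "_$1"
}

# Aggregated view, similar to what the web interface plots.
#   $1: start time, $2: metric, remaining arguments: tags
query_sum() {
    start=$1 metric=$2
    shift 2
    tsdb query "$start" sum "$metric" "$@"
}

# Raw data points; an aggregator is still required but appears to be ignored.
scan_raw() {
    start=$1 metric=$2
    shift 2
    tsdb scan "$start" sum "$metric" "$@"
}

# Typical use:
#   query_sum 1333369407 foo.bar baz=boo
#   scan_raw 1333369407 foo.bar baz=boo
#   retire_metric foo.bar
```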

6 Migrating Data

Apparently, you can't just copy HBase data files in hbase.rootdir from one instance to another, and add_table.rb may not be of much use either. The best way I've found to perform a migration is to do it at the OpenTSDB level, with tsdb scan (not tsdb query, which aggregates) on the source instance and tsdb import on the destination instance.
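
A sketch of that migration follows. The function names are hypothetical, and it assumes that tsdb scan supports an --import option which prints data points in the format tsdb import reads. The metrics probably also have to exist on the destination first (tsdb mkmetric), unless it is configured to create them automatically.

```shell
# On the source instance: dump the raw data points of one metric to a file.
#   $1: start time, $2: metric, $3: output file
export_metric() {
    tsdb scan --import "$1" sum "$2" > "$3"
}

# On the destination instance: load a dump produced by export_metric.
import_dump() {
    tsdb import "$1"
}

# Typical use:
#   export_metric 1333369407 foo.bar /tmp/foo.bar.dump    # source
#   import_dump /tmp/foo.bar.dump                         # destination
```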

7 Gnuplot Settings

There is a largely undocumented way of changing gnuplot settings in OpenTSDB. It involves modifying /usr/share/opentsdb/mygnuplot.sh so that gnuplot reads a custom .gnuplot file (typically in /usr/share/opentsdb too) first:

exec nice gnuplot /usr/share/opentsdb/.gnuplot "$@" >"$stdout" 2>"$stderr"

The .gnuplot file can then include commands like set bmargin 7. Unfortunately, it's necessary to restart OpenTSDB after the change.
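
Creating the settings file can be scripted as in the sketch below. The function name write_gnuplot_settings is hypothetical; the only setting written is the one mentioned above.

```shell
# Write a custom gnuplot settings file to the path given as $1.
write_gnuplot_settings() {
    cat > "$1" <<'EOF'
# Extra gnuplot settings applied to every OpenTSDB graph.
set bmargin 7
EOF
}

# Typical use, followed by a restart of OpenTSDB:
#   write_gnuplot_settings /usr/share/opentsdb/.gnuplot
```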