Bucket in OpenTSDB

OpenTSDB introduces the notion of a time bucket, which is basically a way of grouping all data points that fall within a specific time window. OpenTSDB uses hourly time buckets to partition data, and each cell value is either an integer or a float. One could easily extend the same design to other domains that store more complex data types, for example a complex Avro type.

Even better, buckets of other sizes could be designed, such as daily, weekly, or monthly buckets. The sky is the limit here.
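To make the idea of other bucket sizes concrete, here is a minimal sketch, not taken from OpenTSDB. The class and method names (TimeBuckets, bucketStart, monthStart) are hypothetical, and timestamps are assumed to be epoch seconds in UTC. Fixed-width buckets such as hours and days can reuse the same modulo trick, while months vary in length and need calendar arithmetic instead.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of OpenTSDB.
final class TimeBuckets {
    static final long HOUR = 3600L;      // seconds in an hour
    static final long DAY = 24 * HOUR;   // seconds in a day
    static final long WEEK = 7 * DAY;    // seconds in a week

    // Start of the fixed-width bucket that contains the timestamp.
    // floorMod keeps the result correct for timestamps before the epoch.
    static long bucketStart(long timestampSeconds, long spanSeconds) {
        return timestampSeconds - Math.floorMod(timestampSeconds, spanSeconds);
    }

    // Months are not a fixed number of seconds, so truncate with the
    // calendar instead of a modulo (UTC is an assumption here).
    static long monthStart(long timestampSeconds) {
        ZonedDateTime t = Instant.ofEpochSecond(timestampSeconds).atZone(ZoneOffset.UTC);
        return t.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS).toEpochSecond();
    }

    public static void main(String[] args) {
        long ts = 1358213025L; // 2013-01-15T01:23:45Z
        System.out.println("hour bucket:  " + bucketStart(ts, HOUR));
        System.out.println("day bucket:   " + bucketStart(ts, DAY));
        System.out.println("week bucket:  " + bucketStart(ts, WEEK));
        System.out.println("month bucket: " + monthStart(ts));
    }
}
```

One caveat on weekly buckets: because 1970-01-01 was a Thursday, a plain modulo on epoch seconds yields weeks that start on Thursdays. If weeks should start on Monday or Sunday, an offset or calendar-based truncation would be needed.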

In OpenTSDB, the timestamp is broken into two parts, one encoded in the row key and the other in the qualifier.

The first part is the hourly base time, encoded in the row key; the other is the delta in seconds from the start of the hourly bucket, encoded in the qualifier.
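The split can be illustrated with a small sketch. This is not OpenTSDB code: the class SplitTimestamp and its methods are hypothetical, and MAX_TIMESPAN is assumed to be 3600 seconds to match the hourly bucket. OpenTSDB's real qualifier also packs format flags next to the delta, which is left out here to keep the arithmetic visible.

```java
// Hypothetical illustration of the base time / delta split.
final class SplitTimestamp {
    // Assumed bucket width in seconds (one hour).
    static final int MAX_TIMESPAN = 3600;

    // Part stored in the row key: the start of the hour bucket.
    static long baseTime(long timestampSeconds) {
        return timestampSeconds - (timestampSeconds % MAX_TIMESPAN);
    }

    // Part stored in the qualifier: seconds elapsed since the bucket start.
    static int delta(long timestampSeconds) {
        return (int) (timestampSeconds % MAX_TIMESPAN);
    }

    // Reading a data point back: row key part plus qualifier part.
    static long rejoin(long baseTime, int delta) {
        return baseTime + delta;
    }

    public static void main(String[] args) {
        long ts = 1357003425L; // 2013-01-01T01:23:45Z
        long base = baseTime(ts);
        int d = delta(ts);
        System.out.println("base time = " + base + ", delta = " + d);
        System.out.println("rejoined  = " + rejoin(base, d));
    }
}
```

Since the delta is always smaller than 3600, it fits comfortably in a short qualifier, and every data point of the same series within the same hour ends up in the same row.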

The following code snippet from OpenTSDB shows how the base_time (hour bucket) is computed.
final long base_time = timestamp - (timestamp % Const.MAX_TIMESPAN);

private long updateBaseTime(final long timestamp) {
    // We force the starting timestamp to be on a MAX_TIMESPAN boundary
    // so that all TSDs create rows with the same base time.  Otherwise
    // we'd need to coordinate TSDs to avoid creating rows that cover
    // overlapping time periods.
    final long base_time = timestamp - (timestamp % Const.MAX_TIMESPAN);
    // Clone the row key since we're going to change it.  We must clone it
    // because the HBase client may still hold a reference to it in its
    // internal datastructures.
    row = Arrays.copyOf(row, row.length);
    Bytes.setInt(row, (int) base_time, tsdb.metrics.width());
    tsdb.scheduleForCompaction(row, (int) base_time);
    return base_time;
}
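The Bytes.setInt call in the method above writes the base time into the row key right after the metric ID. The standalone sketch below mimics that step with java.nio.ByteBuffer so it can be run without OpenTSDB. The class RowKeyBaseTime, the 3-byte metric width, and the big-endian 4-byte encoding are assumptions made for illustration, not values read from the snippet.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the row key update; not OpenTSDB code.
final class RowKeyBaseTime {
    // Assumed width of the metric ID at the start of the row key.
    static final int METRIC_WIDTH = 3;

    // Returns a copy of the row key with the base time written as a
    // 4-byte big-endian int after the metric ID. The copy mirrors the
    // original's concern that another component may still hold a
    // reference to the old array.
    static byte[] withBaseTime(byte[] row, long baseTime) {
        byte[] copy = Arrays.copyOf(row, row.length);
        ByteBuffer.wrap(copy).putInt(METRIC_WIDTH, (int) baseTime);
        return copy;
    }

    // Reads the base time back as an unsigned 32-bit value.
    static long readBaseTime(byte[] row) {
        return ByteBuffer.wrap(row).getInt(METRIC_WIDTH) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] row = new byte[METRIC_WIDTH + 4]; // metric ID + base time, no tags
        byte[] updated = withBaseTime(row, 1357002000L);
        System.out.println("base time in key: " + readBaseTime(updated));
        System.out.println("original changed: " + (readBaseTime(row) != 0L));
    }
}
```

Placing the base time directly after the metric ID means that all rows of one metric sort by time, so a scan over a time range touches a contiguous run of row keys.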

Wei Shung Chung