Description
I am setting up OpenTSDB 2.4 with rollups. Our use case is largely near-real-time alerting (requiring sub-minute latency), but occasionally we want to query several months back over fairly high-volume data. Rollups should give us a significant performance improvement here, so thank you for the work in that area.
We are creating a daily batch job to produce hourly rollups. That means rollup data may not be available until up to one day (plus job runtime) after the raw data arrives. Because of this delay, any query served from the rollup table would be missing the latest data, even though that data is already available in the raw table.
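For context, the batch job we have in mind does roughly the following: bucket each raw point by the start of its hour and keep sum, count, min, and max per bucket. This is a minimal sketch under our own assumptions (points as `(unix_seconds, value)` pairs, hypothetical `hourly_rollups` and `Rollup` names); it is not OpenTSDB's rollup API or storage format.

```python
from collections import defaultdict
from dataclasses import dataclass

HOUR = 3600  # assumed rollup interval in seconds


@dataclass
class Rollup:
    """Hypothetical per-bucket aggregate; not an OpenTSDB type."""
    sum: float = 0.0
    count: int = 0
    min: float = float("inf")
    max: float = float("-inf")


def hourly_rollups(points):
    """Aggregate (unix_seconds, value) pairs into hourly buckets.

    Returns a dict keyed by each bucket's start timestamp.
    """
    buckets = defaultdict(Rollup)
    for ts, value in points:
        bucket = buckets[ts - ts % HOUR]  # floor to the start of the hour
        bucket.sum += value
        bucket.count += 1
        bucket.min = min(bucket.min, value)
        bucket.max = max(bucket.max, value)
    return dict(buckets)
```

For example, `hourly_rollups([(0, 1.0), (1800, 3.0), (3600, 5.0)])` yields one bucket at `0` (sum 4.0 over 2 points) and one at `3600`. Since the job only runs once a day, the newest bucket it writes can lag the raw table by up to a day.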
If we used stream processing to produce rollups, we would hit similar issues, albeit with lower latency. We likely wouldn't want the stream processor to send a rollup update for every new raw data point, so there would be a delay while each rolled-up data point "matures" before the stream processor emits it. That delay would cause the same problem.
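To illustrate the "maturing" delay: a stream processor would typically hold each hourly bucket open until it has seen data past the end of the hour plus some grace period for late points, and only then emit it. The sketch below assumes a hypothetical `MaturingRollup` class, a watermark equal to the highest timestamp seen so far, and a 300-second grace period; it drops points that arrive after their bucket was emitted, which a real pipeline might handle differently.

```python
class MaturingRollup:
    """Hypothetical streaming hourly rollup that emits a bucket only once it has 'matured'."""

    def __init__(self, interval=3600, grace=300):
        self.interval = interval  # bucket width in seconds
        self.grace = grace        # assumed allowed lateness in seconds
        self.open = {}            # bucket start -> [sum, count]
        self.watermark = float("-inf")  # highest event timestamp seen so far

    def _closes_at(self, start):
        return start + self.interval + self.grace

    def add(self, ts, value):
        """Add one raw point; return matured buckets as (start, sum, count) tuples."""
        start = ts - ts % self.interval
        if self._closes_at(start) <= self.watermark:
            return []  # bucket already emitted; this sketch simply drops late points
        acc = self.open.setdefault(start, [0.0, 0])
        acc[0] += value
        acc[1] += 1
        self.watermark = max(self.watermark, ts)
        matured = sorted(s for s in self.open if self._closes_at(s) <= self.watermark)
        return [(s, *self.open.pop(s)) for s in matured]
```

With these settings the hour starting at `0` is only emitted once a point at or after `3900` arrives, so the newest rollup always trails the raw data by at least the grace period.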
We wondered whether it would be reasonable to execute the query with knowledge of the latest timestamp available in the rollup table. With that knowledge, a rollup query could be split: populate most of the data points from the rollup table, fall back to the raw table for the most recent data points, and finally join both sets of data points together.
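A rough sketch of the split we have in mind, assuming the query knows a `rollup_watermark` (the timestamp before which rollups are complete) and is given two hypothetical fetch callables, `fetch_rollup` and `fetch_raw`. Neither is an existing OpenTSDB function, and the sketch uses a sum aggregator and assumes `start` is aligned to the interval. The split point is floored to an interval boundary so no bucket is assembled half from rollups and half from raw data.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[int, float]


def downsample_sum(points: Iterable[Point], interval: int) -> Dict[int, float]:
    """Sum raw points into interval-aligned buckets, mimicking what a rollup row holds."""
    out: Dict[int, float] = {}
    for ts, v in points:
        bucket = ts - ts % interval
        out[bucket] = out.get(bucket, 0.0) + v
    return out


def split_query(
    start: int,
    end: int,
    rollup_watermark: int,
    interval: int,
    fetch_rollup: Callable[[int, int], Dict[int, float]],  # hypothetical: rollup buckets in [s, e)
    fetch_raw: Callable[[int, int], Iterable[Point]],       # hypothetical: raw points in [s, e)
) -> List[Point]:
    """Serve [start, end) from rollups up to the watermark and from raw data after it."""
    # Floor the watermark to a bucket boundary, then clamp it into the query range.
    split = rollup_watermark - rollup_watermark % interval
    split = max(start, min(end, split))
    result: Dict[int, float] = {}
    if split > start:
        result.update(fetch_rollup(start, split))  # older, already-rolled-up buckets
    if end > split:
        result.update(downsample_sum(fetch_raw(split, end), interval))  # recent raw data
    return sorted(result.items())
```

For example, with hourly rollups complete up to `7200` and raw points at `7200`, `7300`, and `10800`, a query over `[0, 14400)` returns the two rollup buckets followed by two buckets computed on the fly from raw data.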
Does this seem like a reasonable feature, or is it too complicated, meaning we would be better off using stream processing to reduce latency and simply ignoring the latest data point?