This proposal describes the addition of a new Thanos command (component) into cmd/thanos called query-frontend.
This component will literally import a certain version of the Cortex frontend package.
We will go through the rationale and potential alternatives.
Cortex Frontend was introduced by Tom in August 2019. It was designed to be deployed in front of the Prometheus Query API in order to ensure:
Since the Cortex backend is very similar in nature to Thanos, with exactly the same PromQL API and long-term capabilities, the caching work done for Cortex fits Thanos as well. Given our good collaboration in the past, it feels natural to reuse Cortex’s code. We even started a discussion about moving it to a separate repo, but there was no motivation for this, since there is no issue with using the Cortex one and Cortex is happy to take generalized contributions.
In the end, we were recommending the Cortex query frontend for production use on top of Thanos, and this works considerably well, with some problems in edge cases and for downsampled data, as mentioned here.
However, we recently realized that asking users to suddenly install a Cortex component on top of a Thanos system is extremely confusing:
All of this was causing confusion and questions like this.
In the end, we decided together with the Thanos and Cortex maintainers that, ultimately, it would be awesome to create a new Thanos service called query-frontend.
The idea is to create a thanos query-frontend component that allows specifying the following options:
- --query-range.split-interval, time.Duration
- --query-range.max-retries-per-request, int, default = 5
- --query-range.disable-step-align, bool
- --query-range.response-cache-ttl, time.Duration
- --query-range.response-cache-max-freshness, time.Duration, default = 1m
- --query-range.response-cache-config(-file), pathorcontent + CacheConfig

We plan to have in-mem, fifo and memcached support for now. The cache config will be exactly the same as the one used for Store Gateway.
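To make the option list above more concrete, here is a minimal sketch of how such flags could be parsed. It uses Go's standard library flag package purely for illustration, and the actual command may register its flags differently. The Config struct, its field names, the parseArgs helper, the help strings, and the zero defaults for options without a stated default are all assumptions of this sketch, not part of the proposal.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Config holds the proposed options. The struct and field names are
// hypothetical and exist only for this sketch.
type Config struct {
	SplitInterval    time.Duration
	MaxRetries       int
	DisableStepAlign bool
	CacheTTL         time.Duration
	MaxFreshness     time.Duration
	CacheConfig      string // inline cache config content
	CacheConfigFile  string // path to a cache config file
}

// parseArgs registers the flags from the proposal on a fresh FlagSet and
// parses args. Only the defaults 5 and 1m come from the proposal; the other
// defaults are assumed zero values.
func parseArgs(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("query-frontend", flag.ContinueOnError)
	fs.DurationVar(&cfg.SplitInterval, "query-range.split-interval", 0, "interval used to split range queries")
	fs.IntVar(&cfg.MaxRetries, "query-range.max-retries-per-request", 5, "maximum retries per request")
	fs.BoolVar(&cfg.DisableStepAlign, "query-range.disable-step-align", false, "disable step alignment")
	fs.DurationVar(&cfg.CacheTTL, "query-range.response-cache-ttl", 0, "response cache TTL")
	fs.DurationVar(&cfg.MaxFreshness, "query-range.response-cache-max-freshness", time.Minute, "response cache max freshness")
	fs.StringVar(&cfg.CacheConfig, "query-range.response-cache-config", "", "cache config content")
	fs.StringVar(&cfg.CacheConfigFile, "query-range.response-cache-config-file", "", "cache config file path")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs([]string{
		"--query-range.split-interval=24h",
		"--query-range.response-cache-ttl=1h",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Keeping all options in one struct makes it easy to pass them to whatever frontend package ends up being imported, but that wiring is out of scope for this sketch.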
This command will be a placeholder for any query planning or queueing logic that we might want to add at some point. It will not be part of any gRPC API.
To make this happen, we will propose a small refactor in the Cortex code to avoid unnecessary package dependencies.
Unfortunately, we already tried this path without success. The reasons are mentioned in Motivation.
Allowing caching directly in the Querier would definitely simplify deployment. However, this approach is not really scalable.
Furthermore, the frontend will eventually be responsible for more than just caching. It is meant to do query planning, such as splitting or even advanced query parallelization (query sharding). This might mean future improvements in query scheduling, queuing and retrying, so at some point we would need the ability to scale the query part and the caching/query planner completely separately.
Last but not least, splitting queries allows requests to be performed in parallel. Only if used in a single binary can we achieve load balancing of those requests.
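The splitting and parallel execution mentioned above can be sketched roughly as follows. This is a simplified illustration, not the Cortex implementation: the subRange type and the splitByInterval and runParallel helpers are hypothetical names, timestamps are assumed to be non-negative milliseconds, and step alignment of the sub-ranges is ignored.

```go
package main

import (
	"fmt"
	"sync"
)

// subRange is a hypothetical inclusive time range in milliseconds.
type subRange struct {
	Start, End int64
}

// splitByInterval cuts [start, end] into sub-ranges that never cross a
// multiple of interval. It assumes non-negative timestamps. A non-positive
// interval disables splitting.
func splitByInterval(start, end, interval int64) []subRange {
	if interval <= 0 {
		return []subRange{{Start: start, End: end}}
	}
	var out []subRange
	for cur := start; cur <= end; {
		// First interval boundary strictly after cur.
		next := (cur/interval + 1) * interval
		subEnd := next - 1
		if subEnd > end {
			subEnd = end
		}
		out = append(out, subRange{Start: cur, End: subEnd})
		cur = next
	}
	return out
}

// runParallel executes query once per sub-range in its own goroutine and
// returns the results in the same order as the input ranges.
func runParallel(ranges []subRange, query func(subRange) int64) []int64 {
	results := make([]int64, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r subRange) {
			defer wg.Done()
			results[i] = query(r) // each goroutine writes its own slot
		}(i, r)
	}
	wg.Wait()
	return results
}

func main() {
	ranges := splitByInterval(50, 250, 100)
	// Stand-in for a downstream query: returns the sub-range length.
	lengths := runParallel(ranges, func(r subRange) int64 { return r.End - r.Start + 1 })
	fmt.Println(ranges, lengths)
}
```

In a real deployment each sub-query would be an HTTP request to a Querier, so the sub-requests can be spread across several Querier replicas instead of running as goroutines against an in-process function.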
NOTE: We can still consider simple response caching inside the Querier if users request it.
I think this does not need to be explained. Response caching has proven to be non-trivial. It’s really amazing that we have the opportunity to work towards something that works, together with experts in the field like @tomwilkie and others from the Loki and Cortex teams.
Overall, reusing is caring.
- thanos query-frontend subcommand.
- Improvements to Cortex query frontend, so Thanos query-frontend as described here