Thanos supports several tracing backends.
All clients are configured using
--tracing.config-file to reference a configuration file, or
--tracing.config to pass the YAML config directly.
You can either pass the YAML file defined below via
--tracing.config-file or pass the YAML content directly using
--tracing.config. We recommend the latter, as it gives an explicit, static view of the configuration for each component. It also saves you the trouble of creating and managing an additional file.
Don’t be afraid of multiline flags!
In Kubernetes it is as easy as this (using the Thanos sidecar as an example):
- args:
  - sidecar
  - |
    --objstore.config=type: GCS
    config:
      bucket: <bucket>
  - --prometheus.url=http://localhost:9090
  - |
    --tracing.config=type: STACKDRIVER
    config:
      service_name: ""
      project_id: <project>
      sample_factor: 16
  - --tsdb.path=/prometheus-data
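A multiline flag works because the flag name and the whole YAML document travel as a single argument. As a minimal sketch of the same idea outside Kubernetes, the Python snippet below assembles such an argument list. The helper name sidecar_args and the project ID my-project are placeholders for illustration, not part of Thanos.

```python
from typing import List


def sidecar_args(project_id: str) -> List[str]:
    """Build an argument list for `thanos sidecar` with an inline tracing config.

    `project_id` is a placeholder; substitute your own project.
    """
    # The YAML document is kept as one multi-line string.
    tracing_config = "\n".join([
        "type: STACKDRIVER",
        "config:",
        '  service_name: ""',
        f"  project_id: {project_id}",
        "  sample_factor: 16",
    ])
    return [
        "thanos",
        "sidecar",
        "--prometheus.url=http://localhost:9090",
        # Flag name and the full YAML content form ONE argv element.
        f"--tracing.config={tracing_config}",
        "--tsdb.path=/prometheus-data",
    ]


args = sidecar_args("my-project")
```

Passing a list like this to a process launcher keeps the newlines intact, which is what the block scalar (|) achieves in the Kubernetes manifest.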
At that point, anyone can use your provider by spec.
See this issue to check our progress on moving to the OpenTelemetry Go client library.
Once tracing is enabled and sampling per backend is configured, Thanos generates traces for all gRPC and HTTP APIs thanks to generic “middlewares”. Some APIs that are more interesting to observe, such as
query_range, have additional low-level spans with focused metadata showing the latency of important functionality. For example, a Jaeger view of an HTTP query_range API call
contains both the HTTP request and spans around the gRPC requests, since the Querier calls gRPC services to fetch series data.
Each Thanos component generates spans related to its work and sends them to a central place, e.g. Jaeger or an OpenTelemetry collector. That place is then responsible for tying all spans to a single trace, showing the request's execution path.
A single trace is tied to a single, unique request to the system and is composed of many spans from different components. A trace is identified by its
Trace ID, which is a unique hash, e.g.
131da78f02aa3525. Other systems may refer to this as a
request id or
operation id. To use trace data, you need to find the trace IDs of the requests you are interested in, e.g. a request with an interesting error, one with unusually long latency, or a debug call you just made.
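To make the relationship between spans and traces concrete, here is a toy sketch of how a collector might group incoming spans by trace ID. It is not how Jaeger or any other backend is implemented; the Span fields, component names and operation names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per span
    parent_id: Optional[str]  # None for the root span
    component: str            # which component emitted the span
    operation: str            # hypothetical operation label


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Tie spans from different components together by their trace ID."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


# Spans arrive from different components, possibly interleaved.
spans = [
    Span("131da78f02aa3525", "a1", None, "query", "HTTP query_range"),
    Span("ffffffffffffffff", "c3", None, "query", "HTTP query"),
    Span("131da78f02aa3525", "b2", "a1", "sidecar", "gRPC Series"),
]

traces = group_by_trace(spans)
components = [s.component for s in traces["131da78f02aa3525"]]
```

Looking up one trace ID returns every span recorded for that request, regardless of which component produced it.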
When using tracing with Thanos, you can obtain a trace ID in multiple ways:
The X-Thanos-Trace-Id response header, which carries the trace ID of the request as its value.
Every request against any Thanos component’s API that carries the
X-Thanos-Force-Tracing header will be sampled, provided a tracing backend is configured.
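The two headers can be combined to debug a single call: force sampling on the request, then read the trace ID from the response. The Python sketch below shows this under a few assumptions: the header value "1" is an arbitrary placeholder (the text above does not specify a required value), and the function names as well as any URL you pass in are hypothetical.

```python
import urllib.request
from email.message import Message
from typing import Optional

FORCE_TRACING_HEADER = "X-Thanos-Force-Tracing"
TRACE_ID_HEADER = "X-Thanos-Trace-Id"


def build_traced_request(url: str) -> urllib.request.Request:
    """Create a GET request that asks Thanos to sample this call."""
    req = urllib.request.Request(url)
    # Assumption: "1" is a placeholder value, not a documented requirement.
    req.add_header(FORCE_TRACING_HEADER, "1")
    return req


def extract_trace_id(headers: Message) -> Optional[str]:
    """Return the trace ID from response headers, or None if it is absent."""
    # Header lookup on Message objects is case-insensitive.
    return headers.get(TRACE_ID_HEADER)


def query_with_forced_trace(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send the request and return the trace ID reported by the component."""
    with urllib.request.urlopen(build_traced_request(url), timeout=timeout) as resp:
        return extract_trace_id(resp.headers)
```

Calling query_with_forced_trace with the URL of a component's HTTP API returns an ID you can then look up in your tracing backend; it returns None when the header is missing, for example when no tracing backend is configured.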
Currently supported tracing backends:
Client for https://github.com/jaegertracing/jaeger tracing.
type: JAEGER
config:
  service_name: ""
  disabled: false
  rpc_metrics: false
  tags: ""
  sampler_type: ""
  sampler_param: 0
  sampler_manager_host_port: ""
  sampler_max_operations: 0
  sampler_refresh_interval: 0s
  reporter_max_queue_size: 0
  reporter_flush_interval: 0s
  reporter_log_spans: false
  endpoint: ""
  user: ""
  password: ""
  agent_host: ""
  agent_port: 0
Client for https://cloud.google.com/trace/ tracing.
type: STACKDRIVER
config:
  service_name: ""
  project_id: ""
  sample_factor: 0
Client for https://www.elastic.co/products/apm tracing.
type: ELASTIC_APM
config:
  service_name: ""
  service_version: ""
  service_environment: ""
  sample_rate: 0
Client for Lightstep.
In order to configure Thanos to interact with Lightstep you need to provide at least an access token in the configuration file. The
collector key is optional and used when you have on-premise satellites.
type: LIGHTSTEP
config:
  access_token: ""
  collector:
    scheme: ""
    host: ""
    port: 0
    plaintext: false
    custom_ca_cert_file: ""
  tags: ""