Ticket: https://github.com/improbable-eng/thanos/issues/484
It is becoming clear that we need to remove the gossip protocol as the main means of communication between Thanos Querier and other components. Static configuration seems good enough for our simple use cases. To give users more flexibility (similar to the gossip auto-join logic), we already wanted to introduce a File SD that allows changing `StoreAPI`s on the fly.
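As a rough illustration of how a File SD could let the Querier pick up `StoreAPI` changes on the fly, the sketch below parses a document in the Prometheus `file_sd` JSON shape (a list of objects with `targets` and optional `labels`) and computes which addresses were added or removed between two versions of the file. The choice of that file format, the function names `parseStoreAddrs` and `diffAddrs`, and the example addresses are all assumptions made for illustration; this is not the Thanos implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// targetGroup mirrors one entry of a Prometheus-style file SD document.
// Using this format for StoreAPI discovery is an assumption of this sketch.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// parseStoreAddrs decodes file SD JSON and returns the sorted, de-duplicated
// StoreAPI addresses it lists.
func parseStoreAddrs(data []byte) ([]string, error) {
	var groups []targetGroup
	if err := json.Unmarshal(data, &groups); err != nil {
		return nil, fmt.Errorf("parsing file SD document: %w", err)
	}
	seen := make(map[string]struct{})
	addrs := []string{}
	for _, g := range groups {
		for _, t := range g.Targets {
			if _, dup := seen[t]; dup {
				continue
			}
			seen[t] = struct{}{}
			addrs = append(addrs, t)
		}
	}
	sort.Strings(addrs)
	return addrs, nil
}

// diffAddrs reports which addresses appeared and disappeared between two
// snapshots. Both inputs are assumed to be de-duplicated, as returned by
// parseStoreAddrs.
func diffAddrs(prev, next []string) (added, removed []string) {
	prevSet := make(map[string]struct{}, len(prev))
	for _, a := range prev {
		prevSet[a] = struct{}{}
	}
	for _, a := range next {
		if _, ok := prevSet[a]; ok {
			delete(prevSet, a) // still present, so neither added nor removed
		} else {
			added = append(added, a)
		}
	}
	for a := range prevSet {
		removed = append(removed, a)
	}
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Two versions of a hypothetical SD file; the addresses are examples only.
	before := []byte(`[{"targets": ["sidecar-0:10901", "store-gw:10901"]}]`)
	after := []byte(`[{"targets": ["sidecar-0:10901", "sidecar-1:10901"]}]`)

	prev, err := parseStoreAddrs(before)
	if err != nil {
		panic(err)
	}
	next, err := parseStoreAddrs(after)
	if err != nil {
		panic(err)
	}
	added, removed := diffAddrs(prev, next)
	fmt.Println("added:", added, "removed:", removed)
}
```

A real implementation would also need to watch the file for changes and re-run the parse step; that part is left out here to keep the sketch short.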
The gossip protocol (with the membership implementation) was built into Thanos from the very beginning. The main advantages over other solutions for connecting components were:
After a couple of months of maintaining the Thanos project and various discussions with different users, we realized that those advantages are no longer outstanding and are not worth keeping, given the issues gossip causes. There are numerous reasons why we should deprecate gossip:
- The `cluster.advertise-address` option, which was sometimes able to magically deduce the private IP address (and sometimes not!), led to lots of questions and issues. Something that was made for a quick ramp-up into Thanos ("just click the button and it auto-joins everything") created lots of confusion and made it harder to experiment.
- [...] `StoreAPI`s. It is clear that all-to-all communication is neither necessary nor educative.
- [...] `StoreAPI`s (static `--store` flag), we needed to implement health checks and metadata propagation anyway. In fact, `StoreAPI.Info` was there all along.
- Gossip operates at the `peer` level, and there is no way to abstract multiple peers behind a load balancer. This hides easy solutions from our eyes, e.g. how to make Store Gateway HA. Without gossip, you can just use a Kubernetes HA Service or any other load balancer. To support Store Gateway HA with gossip, we would end up implementing LB logic in Thanos Querier (as proposed here).

We will break backward compatibility, as gossip was the main way to connect components. Although this is not very nice, it will ensure stability, a better user experience, and more flexibility in the future. Because we are still in the 0.1.0 RC process, we will not break any guarantee.
To help with migration, we might want to add a sidecar that uses File SD but supports gossip clustering.
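The health check and metadata propagation that static `--store` configuration relies on can be pictured with a small sketch: the Querier walks its configured addresses, asks each one for its metadata, and keeps only the stores that answer in time. Everything named here is hypothetical. `storeInfo` and `infoFunc` are simplified stand-ins for the real gRPC `StoreAPI.Info` call, and `refresh`, `healthyAddrs`, and `fakeInfo` exist only for this illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"time"
)

// storeInfo is a hypothetical, simplified stand-in for the metadata a store
// would return from an Info call.
type storeInfo struct {
	Labels map[string]string
}

// infoFunc is a hypothetical abstraction over the Info call, so the sketch
// runs without a gRPC connection.
type infoFunc func(ctx context.Context, addr string) (storeInfo, error)

// refresh calls info on every statically configured address and returns the
// metadata of the stores that answered without error, keyed by address.
// The per-call timeout is passed to info via the context.
func refresh(ctx context.Context, addrs []string, info infoFunc, timeout time.Duration) map[string]storeInfo {
	healthy := make(map[string]storeInfo, len(addrs))
	for _, addr := range addrs {
		callCtx, cancel := context.WithTimeout(ctx, timeout)
		meta, err := info(callCtx, addr)
		cancel()
		if err != nil {
			continue // unhealthy or unreachable: leave it out of this round
		}
		healthy[addr] = meta
	}
	return healthy
}

// healthyAddrs runs one refresh round and returns the healthy addresses in
// sorted order.
func healthyAddrs(addrs []string, info infoFunc) []string {
	stores := refresh(context.Background(), addrs, info, 2*time.Second)
	out := make([]string, 0, len(stores))
	for addr := range stores {
		out = append(out, addr)
	}
	sort.Strings(out)
	return out
}

// fakeInfo simulates a deployment in which one store is unreachable.
func fakeInfo(_ context.Context, addr string) (storeInfo, error) {
	if addr == "store-gw:10901" {
		return storeInfo{}, errors.New("connection refused")
	}
	return storeInfo{Labels: map[string]string{"replica": addr}}, nil
}

func main() {
	// Example addresses standing in for repeated --store flags.
	static := []string{"sidecar-0:10901", "store-gw:10901", "sidecar-1:10901"}
	fmt.Println(healthyAddrs(static, fakeInfo))
}
```

Because each address is checked independently, an address may just as well point at a load balancer in front of several Store Gateway replicas, which is the HA setup that per-peer gossip makes difficult.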