The Wayback Machine - https://web.archive.org/web/20241202005554/https://github.com/rabbitmq/rabbitmq-server/issues/3386
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent hash exchange bindings are changed and break when a node restarts #3386

Closed
luos opened this issue Sep 8, 2021 · 24 comments · Fixed by #5121
Closed

Consistent hash exchange bindings are changed and break when a node restarts #3386

luos opened this issue Sep 8, 2021 · 24 comments · Fixed by #5121
Assignees
Milestone

Comments

@luos
Copy link
Contributor

luos commented Sep 8, 2021

Hi,

When a single node is restarted out of a cluster the consistent hash exchange will add existing bindings again into the ring.

Reproduction:

  1. Create 3 node cluster
  2. Create a consistent hash exchange
  3. Bind a queue to it with any weight (for example 1)
  4. Stop a node (with stop_app or anything)

Output:

root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>}},
                14}]
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'^C
root@rmq1:/# rabbitmqctl start_app^C
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>}},
                16}]
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>},
                  16 => {resource,<<"/">>,queue,<<"test-qq">>},
                  17 => {resource,<<"/">>,queue,<<"test-qq">>}},
                18}]
root@rmq1:/# rabbitmqctl stop_app && rabbitmqctl start_app
Stopping rabbit application on node [email protected] ...
Starting node [email protected] ...
root@rmq1:/# rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
[{chx_hash_ring,{resource,<<"/">>,exchange,<<"test-exchange">>},
                #{13 => {resource,<<"/">>,queue,<<"test-qq">>},
                  14 => {resource,<<"/">>,queue,<<"test-qq">>},
                  15 => {resource,<<"/">>,queue,<<"test-qq">>},
                  16 => {resource,<<"/">>,queue,<<"test-qq">>},
                  17 => {resource,<<"/">>,queue,<<"test-qq">>},
                  18 => {resource,<<"/">>,queue,<<"test-qq">>},
                  19 => {resource,<<"/">>,queue,<<"test-qq">>}},
                20}]

After this cleaning up the bindings and recreating them breaks the exchange:

  1. Remove all bindings from the queue / exchange
  2. Add a binding to a queue
  3. Try to publish messages
rmq1.monitored.host | 2021-09-08 12:52:49.746842+00:00 [warn] <0.32348.1> Bucket 6 not found
rmq1.monitored.host | 2021-09-08 12:52:53.163576+00:00 [warn] <0.32375.1> Bucket 6 not found
@michaelklishin
Copy link
Member

I only see two options:

  • Make hash ring additions idempotent
  • Assume that if the table can be sync, we can use its existing data (this may require using disk nodes, which I don't know how to make upgradeable)

Binding recovery will happen on node boot regardless of what we do. Besides bindings themselves, exchange state is a really rare edge case that's not really handled in RabbitMQ.

Ideas and PRs would be appreciated, our team doesn't have the capacity to redesign this plugin any time soon.

@luos
Copy link
Contributor Author

luos commented Sep 8, 2021

I think it would be fine to not apply the binding if we already have weight number of entries - though it makes it harder that you can bind the queue multiple times.

Maybe as a quick fix just resolving the counter(?) which goes wrong with the buckets would be fine so at least there is a queue where the message goes even if it's not consistent.

@michaelklishin
Copy link
Member

We can ignore the fact that you can bind a queue multiple times as that's clearly something the binding recovery process and such a stateful exchange type cannot handle (we don't know what other bindings are going to be recovered down the line).

For this exchange type, multiple bindings likely is a result of a mistake. We can assume that if you have enough ring entries ("weight") for the recovered binding, we can do nothing.

@FalconerTC
Copy link

FalconerTC commented Sep 23, 2021

Very interested in a resolution to this issue! I'm seeing the same issue, where node restarts cause the exchange to become unusable, though I'm not seeing the binding count increase.

@FalconerTC
Copy link

FalconerTC commented Nov 16, 2021

Turns out I didn't really resolve this in #3594... When bindings are copied over when a new node comes up, they aren't sorted anymore so remove_binding doesn't properly adjust

An example after node removal:

I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
 {chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
                #{0 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  1 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  2 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  3 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  4 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-kt7v5">>},
                  5 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-pzn4k">>},
                  6 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-jkw6b">>}},
                7},

Which means the ring breaks when one of those bindings is removed

I have no name!@rabbitmq-6:/$ rabbitmqctl eval 'ets:tab2list(rabbit_exchange_type_consistent_hash_ring_state).'
 {chx_hash_ring,{resource,<<"/">>,exchange,<<"order-group-action">>},
                #{3 =>
                      {resource,<<"/">>,queue,
                                <<"order-group-action-panda-668d997c5-jkw6b">>}},
                4},

This also can happen by doing the following

  1. Create binding1 with queue1
  2. Create binding2 with queue2
  3. Create binding3 with queue1 (with a different weight)
  4. Delete binding1

I imagine this can be fixed with some sorting in remove_binding but as per

For this exchange type, multiple bindings likely is a result of a mistake

I wonder if it would be better to just reject new bindings with the same name? Any suggestion @michaelklishin ?

@luos
Copy link
Contributor Author

luos commented Nov 18, 2021

I think that maybe rebinding should just ensure that there are as many buckets as the new binding and drop the old binding. What do you think?

I'd say rejecting a binding may be unexpected for some applications, though most probably for 99% of users it does not matter (and now it's broken anyway).

@de1ayer
Copy link

de1ayer commented Feb 8, 2022

Hi all! I've experienced same issue last week ("Bucket not found" in log, no routing out of exchange) and wondering around for fix. #3594 is merged to 3.8.24 (we have 3.8.3 versioned cluster at the moment) so I may update to get partly fix.

What about adding new nodes which is the main way to scale cluster? Is it unsafe for 'x-consistent-hash' exchanges ?

@caoheyang
Copy link

caoheyang commented Feb 16, 2022

Hi all! Is there have any updates?

Heyang

Regards

@luos
Copy link
Contributor Author

luos commented Feb 16, 2022

Hi,

From my side, I'd prefer if the plugin rejects rebinding, that would simplify it, but it definitely needs some more thinking and implementation. :)
I think node restarts then deleting the binding still break it. I have not tested but possibility it's OK when you do not delete bindings, keeping in mind, restarting will probably reshuffle the exchange.

@luos
Copy link
Contributor Author

luos commented Feb 23, 2022

Hi,

I made some tests and here are my findings:

  • Restarting a node does not really break the exchange, however it adds more bindings as previously. This means that the messages will be reshuffled.
  • After a restart, deleting a binding will leave the exchange in an inconsistent state, losing messages.

I tested this on 3.9.13.

@nunoguerreirorosa
Copy link

nunoguerreirorosa commented Mar 1, 2022

Consistent hash exchange is the only option as far as I am aware in RabbitMQ to be able to maintain the order of dependent messages while scaling up the number of consumers. This plugin was also added in Amazon MQ but apparently with this particular bug it ends up rendering this plugin unusable. I would say it is quite a critical bug for anyone thinking on using this particular plugin, since a node restart within a cluster is quite a standard operation. Is there any way this can be prioritized in the roadmap @michaelklishin ?

@stgolem
Copy link

stgolem commented Apr 20, 2022

@FalconerTC is there any chance on fixing this issue? Or are there any known workarounds for this aside from full rebinding? We're stuck with this issue in our application also.

@michaelklishin
Copy link
Member

@nunoguerreirorosa this is open source software, so you if want something "prioritized", dive in and provide a set of steps to reasonably reliably reproduce, then feel free to look into the root cause. Pressing others to "prioritize" something in a piece of software you get for free is not how open source software evolves.

Consistent hashing exchange, unlike every other "built-in" (or tier 1) exchange type, is stateful. Booting nodes would recover ring state while applications can already begin modifying the same state by binding or unbinding. Besides delaying client connection listener startup
until after all plugins are enabled and some time passes, there is no way to avoid this that
would not result in refused client operations.

@luos ring state recovery does not involve creating any bindings. The plugin uses a list of
durable bindings to recompute the (transient) state and then adapts it as bindings are
added or removed.

@stgolem we don't understand what the issue is or how to reproduce it. You are welcome to
spend some time on that.

@michaelklishin
Copy link
Member

I now see some investigations above 👍

Making binding addition idempotent sounds like the right thing to do (removal, of course, already is).

This is a behavior change so we either have to try to squeeze it into 3.10 or anger the SemVer lovers and ship it in a patch release (even though it would count as a bug fix to many).

@michaelklishin
Copy link
Member

Other team members suggest that we cannot delay 3.10 over this, so it would have to go into a patch release at some point.

@luos
Copy link
Contributor Author

luos commented Apr 20, 2022

I do not think that's a problem. It is a bug and now it is quite broken, I'd say it is not recommended to use in a clustered environment, so if it gets fixed at some point, it will be a bug fix.

@stgolem
Copy link

stgolem commented Apr 20, 2022

Thanks for your time. We have exactly same issue as others with this plugin in cluster environment and node restarts. We will gently wait for this issue to resolve in some next release. As a user i classify this as a bug.

@SteamUpdate
Copy link

Greetings! I have a case how to reproduce "bucket not found" error.

  • RMQ Version: 3.9.15
  • Erlang Version: 24.3.3

RMQ runs as a cluster of three nodes. One exchange and four queues has been created. 1 & 3 queues are located on node 2, 2 & 4 - on node 1. Four bindings with routing key "1" has been created also.

Check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test3'
Ring index: 3, queue: 'test4'

Send a message with routing key 101 in the exchange. The message has been routed to queue 'test1'.

Next, stop node 1 with rabbitmqctl stop_app command and check the ring status:

rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'

Next, start node 1 with rabbitmqctl start_app command and check the ring status:

rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 0, queue: 'test1'
Ring index: 1, queue: 'test3'
Ring index: 2, queue: 'test1'
Ring index: 3, queue: 'test3'
Ring index: 4, queue: 'test2'
Ring index: 5, queue: 'test4'

Next, stop node 2 and check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'

Next, start node 2 and check the ring status:

$ rabbitmq-diagnostics consistent_hash_exchange_ring_state test-hash
Inspecting consistent hashing ring state for exchange test-hash in virtual host '/'...
Ring index: 1, queue: 'test2'
Ring index: 2, queue: 'test4'
Ring index: 3, queue: 'test2'
Ring index: 4, queue: 'test4'
Ring index: 5, queue: 'test1'
Ring index: 6, queue: 'test3'

Finally, send a message with routing key 101 in the exchange.

Result: the message published, but not routed. Logs: 2022-04-20 13:20:04.078869+00:00 [warn] <0.24750.1> Bucket 0 not found

cc: @luos @michaelklishin

@michaelklishin michaelklishin added this to the 3.10.1 milestone Apr 21, 2022
@michaelklishin michaelklishin removed this from the 3.10.1 milestone May 9, 2022
@tronyx
Copy link

tronyx commented Jun 8, 2022

Is this planned to be fixed in both 3.9.X and 3.10.X? Are there any recommendations on a quicker way to resolve this after container recreation other than simply removing and recreating the exchange?

@michaelklishin michaelklishin self-assigned this Jun 15, 2022
@michaelklishin
Copy link
Member

michaelklishin commented Jun 16, 2022

I have been looking at how we can make binding addition and deletion idempotent for this exchange type. I see only one simple way: to keep track of what (destination, type, routing key) have been added and if they are known, make the function do nothing.

This has one limitation. Code that repeatedly binds the same exchange and queue with different routing keys/weights is no longer going to work:

# in pseudocode
bind(chx, q1, weight: 1)
bind(chx, q1, weight: 3)

will lead to a single binding with "weight = 1". I find this acceptable but there may be some code that's going to break as a result, or at least the routing distribution will be affected.
Quite frankly, I don't think the weights idea is worth the complexity anyway, and this may
be our first step towards getting rid of it entirely for 3.11 or 4.0.

Trying to make hash ring updates idempotent is going to be too complex for me to agree to.

@HoloRin
Copy link
Contributor

HoloRin commented Jun 17, 2022

If it is the case that the effect of weights is indeterminate in the face of node restarts, then adopting a first wins approach does not seem unreasonable to me.

@michaelklishin
Copy link
Member

The hash ring state record changes would require a schema database migration, which seems like an overkill here. Will try to make the function idempotent using the bindings/routes tables instead which are very stable.

@michaelklishin
Copy link
Member

First attempt at relying on binding data to make hash ring state management idempotent failed.
Because RabbitMQ will create bindings anyway, it would only create confusion.

A longer term solution would be to make this the first plugin that adopts Khepri, our Raft-based next generation schema data store.

In the short term I'll look into using the state of the ring itself and/or the good old distributed locking module in Erlang.

ansd added a commit that referenced this issue Jun 30, 2022
First binding wins.
Duplicate bindings, i.e. bindings with the same source exchange and
same destination queue / exchange but possibly different routing key
(weight) are ignored from now on by the consistent hash exchange.

This applies only to bindings being added.
For bindings being deleted, any duplicate binding (independent of its
routing key) will delete all buckets for the given source and
destination. (This is to ensure that buckets for a given source and
destination can be deleted for upgrades prior to this change. This
was also the behaviour prior to this commit, so nothing changes in that
regard.)

Note that duplicate bindings continue to be created in RabbitMQ.
(They are only ignored by the consistent hash exchange.)

Adding a binding will perform linear search in the bucket map.
This is already stated in the README:
"These two operations use linear algorithms to update the ring."

The linear search when adding a binding could be optimised by
adding another Mnesia table field which will require a new migration and
feature flag. Hence, such an optimization is left out in this commit.

Fixes #3386.
ansd added a commit that referenced this issue Jun 30, 2022
First binding wins.
Duplicate bindings, i.e. bindings with the same source exchange and
same destination queue / exchange but possibly different routing key
(weight) are ignored from now on by the consistent hash exchange.

This applies only to bindings being added.
For bindings being deleted, any duplicate binding (independent of its
routing key) will delete all buckets for the given source and
destination. (This is to ensure that buckets for a given source and
destination can be deleted for when upgrading from a version prior
to this commit. This was also the behaviour prior to this commit,
so nothing changes in that regard.)

Note that duplicate bindings continue to be created in RabbitMQ.
(They are only ignored by the consistent hash exchange.)

Adding a binding will perform linear search in the bucket map.
This is already stated in the README:
"These two operations use linear algorithms to update the ring."

The linear search when adding a binding could be optimised by
adding another Mnesia table field which will require a new migration and
feature flag. Hence, such an optimization is left out in this commit.

Fixes #3386.
ansd added a commit that referenced this issue Jun 30, 2022
First binding wins.
Duplicate bindings, i.e. bindings with the same source exchange and
same destination queue / exchange but possibly different routing key
(weight) are ignored from now on by the consistent hash exchange.

This applies only to bindings being added.
For bindings being deleted, any duplicate binding (independent of its
routing key) will delete all buckets for the given source and
destination. (This is to ensure that buckets for a given source and
destination can be deleted for when upgrading from a version prior
to this commit. This was also the behaviour prior to this commit,
so nothing changes in that regard.)

Note that duplicate bindings continue to be created in RabbitMQ.
(They are only ignored by the consistent hash exchange.)

Adding a binding will perform linear search in the bucket map.
This is already stated in the README:
"These two operations use linear algorithms to update the ring."

The linear search when adding a binding could be optimised by
adding another Mnesia table field which will require a new migration and
feature flag. Hence, such an optimization is left out in this commit.

Fixes #3386.
@ansd
Copy link
Member

ansd commented Jun 30, 2022

Thanks @SteamUpdate for the excellent reproduction steps in #3386 (comment).

#5121 makes adding bindings idempotent and will therefore fix this issue.

As @michaelklishin already mentioned this comes with a slight breaking change for applications that previously relied upon adding duplicate bindings (i.e. bindings with the same source exchange and same destination queue but possibly different routing key).

mergify bot pushed a commit that referenced this issue Jun 30, 2022
First binding wins.
Duplicate bindings, i.e. bindings with the same source exchange and
same destination queue / exchange but possibly different routing key
(weight) are ignored from now on by the consistent hash exchange.

This applies only to bindings being added.
For bindings being deleted, any duplicate binding (independent of its
routing key) will delete all buckets for the given source and
destination. (This is to ensure that buckets for a given source and
destination can be deleted for when upgrading from a version prior
to this commit. This was also the behaviour prior to this commit,
so nothing changes in that regard.)

Note that duplicate bindings continue to be created in RabbitMQ.
(They are only ignored by the consistent hash exchange.)

Adding a binding will perform linear search in the bucket map.
This is already stated in the README:
"These two operations use linear algorithms to update the ring."

The linear search when adding a binding could be optimised by
adding another Mnesia table field which will require a new migration and
feature flag. Hence, such an optimization is left out in this commit.

Fixes #3386.

(cherry picked from commit 878f369)
mergify bot pushed a commit that referenced this issue Jun 30, 2022
First binding wins.
Duplicate bindings, i.e. bindings with the same source exchange and
same destination queue / exchange but possibly different routing key
(weight) are ignored from now on by the consistent hash exchange.

This applies only to bindings being added.
For bindings being deleted, any duplicate binding (independent of its
routing key) will delete all buckets for the given source and
destination. (This is to ensure that buckets for a given source and
destination can be deleted for when upgrading from a version prior
to this commit. This was also the behaviour prior to this commit,
so nothing changes in that regard.)

Note that duplicate bindings continue to be created in RabbitMQ.
(They are only ignored by the consistent hash exchange.)

Adding a binding will perform linear search in the bucket map.
This is already stated in the README:
"These two operations use linear algorithms to update the ring."

The linear search when adding a binding could be optimised by
adding another Mnesia table field which will require a new migration and
feature flag. Hence, such an optimization is left out in this commit.

Fixes #3386.

(cherry picked from commit 878f369)
(cherry picked from commit ab4f8a9)
@michaelklishin michaelklishin added this to the 3.10.6 milestone Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet