Handle parallel computation timeouts when forming a cluster #539

Closed
@michaelklishin

Describe the bug

This piece of code

ra/src/ra.erl, line 396 in ae8cbf2:

ra_lib:partition_parallel(

does not correctly handle ra_lib:partition_parallel/2 timeout errors. As a result, a RabbitMQ quorum queue, for example, can end up with metadata that does not match the state of its Ra cluster.

The ra_lib:partition_parallel/2 return value should change to something like

{ok, Started, NotStarted} | {error, timeout}

instead of throwing an error.
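
To illustrate the proposed return shape, here is a minimal sketch of a parallel partition helper that reports a timeout as a value instead of raising. It is not the ra_lib implementation: the module name partition_sketch, the explicit Timeout argument, and the choice to count a crashed worker as "not started" are all assumptions made for the example.

```erlang
-module(partition_sketch).
-export([partition_parallel/3]).

%% Hypothetical sketch, NOT the actual ra_lib code.
%% Runs F(E) for every E in Es in its own process and partitions Es by
%% the boolean result. If not all results arrive within Timeout
%% milliseconds, returns {error, timeout} instead of raising.
-spec partition_parallel(fun((term()) -> boolean()), [term()],
                         non_neg_integer()) ->
    {ok, [term()], [term()]} | {error, timeout}.
partition_parallel(F, Es, Timeout) ->
    Parent = self(),
    %% A unique tag so stale replies cannot be confused with other messages.
    Ref = make_ref(),
    Running = [{spawn_monitor(fun() -> Parent ! {Ref, self(), F(E)} end), E}
               || E <- Es],
    %% One overall deadline shared by all workers.
    Deadline = erlang:monotonic_time(millisecond) + Timeout,
    collect(Running, Ref, Deadline, [], []).

collect([], _Ref, _Deadline, Started, NotStarted) ->
    {ok, lists:reverse(Started), lists:reverse(NotStarted)};
collect([{{Pid, MRef}, E} | Rest] = Running, Ref, Deadline,
        Started, NotStarted) ->
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Ref, Pid, true} ->
            erlang:demonitor(MRef, [flush]),
            collect(Rest, Ref, Deadline, [E | Started], NotStarted);
        {Ref, Pid, false} ->
            erlang:demonitor(MRef, [flush]),
            collect(Rest, Ref, Deadline, Started, [E | NotStarted]);
        {'DOWN', MRef, process, Pid, _Reason} ->
            %% Assumption: a worker that exits without reporting a boolean
            %% is counted as not started.
            collect(Rest, Ref, Deadline, Started, [E | NotStarted])
    after Remaining ->
        %% Stop waiting and kill the workers that are still outstanding.
        %% Replies they already sent may stay in the mailbox, but they are
        %% tagged with Ref and so cannot match anything else.
        lists:foreach(fun({{P, M}, _}) ->
                              erlang:demonitor(M, [flush]),
                              exit(P, kill)
                      end, Running),
        {error, timeout}
    end.
```

With this sketch, partition_sketch:partition_parallel(fun(X) -> X rem 2 =:= 0 end, [1, 2, 3, 4], 1000) returns {ok, [2, 4], [1, 3]}, while a function that sleeps past the timeout yields {error, timeout}. The caller can then pattern match on the result rather than catching an exit.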

Reproduction steps

Unfortunately, none. The timeout is difficult to hit.

Expected behavior

Ra cluster members should be rolled back when this error occurs.
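
A rough sketch of what that rollback could look like on the calling side, assuming the start step returns {ok, Started, NotStarted} | {error, timeout} as proposed in this issue. The module rollback_sketch and the StartAll and StopFun arguments are hypothetical placeholders, not Ra functions; they stand in for whatever starts and stops members in the real code.

```erlang
-module(rollback_sketch).
-export([form_cluster/3]).

%% Hypothetical sketch of the expected behavior, not Ra's actual code.
%% StartAll(Members) is assumed to return
%%   {ok, Started, NotStarted} | {error, timeout}.
%% StopFun(Member) is assumed to stop (roll back) a single member.
-spec form_cluster(fun(([term()]) -> {ok, [term()], [term()]} |
                                     {error, timeout}),
                   fun((term()) -> term()),
                   [term()]) ->
    {ok, [term()], [term()]} | {error, timeout}.
form_cluster(StartAll, StopFun, Members) ->
    case StartAll(Members) of
        {ok, Started, NotStarted} ->
            {ok, Started, NotStarted};
        {error, timeout} ->
            %% After a timeout it is unknown which members came up, so
            %% attempt to stop every one of them. Failures to stop a
            %% member are ignored here; real code would likely log them.
            lists:foreach(fun(M) ->
                                  try StopFun(M)
                                  catch _:_ -> ok
                                  end
                          end, Members),
            {error, timeout}
    end.
```

Because the timeout surfaces as a return value, the caller (for example, the code that records quorum queue metadata) can decide not to persist anything when the cluster was not formed.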

Additional context

rabbitmq/rabbitmq-server#13828.
