Description
Summary
We are currently running a rendezvous server that helps nodes bootstrap into a set of peers under a topic.
The rendezvous server handled up to about 1,000 connections with no issue, but we have now seen it panic from within the libp2p-rendezvous crate.
The issue started showing up after the number of connections increased to about 3k-5k. We removed the pending-connection and established-connection limits and then started to see this behavior.
`thread 'tokio-runtime-worker' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-rendezvous-0.14.0/src/server.rs:184:38:`
This is where the panic occurs:
`.expect("Send response");`

I'm unsure if the issue is related to the number of connections, but that's the only thing that has changed.
The doc comments suggest that `self.inner.send_response` will bubble up an error:
"If the [ResponseChannel] is already closed due to a timeout or the connection being closed, the response is returned as an `Err` for further handling."
https://github.com/libp2p/rust-libp2p/blob/master/protocols/request-response/src/lib.rs#L484
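For reference, here is a minimal sketch of what honoring that contract looks like from the caller's side. `try_reply` is a hypothetical helper (not part of the crate), written against the request-response `Behaviour` API as I understand it:

```rust
use libp2p::request_response::{self, Codec, ResponseChannel};

/// Hypothetical helper illustrating the documented contract: if the
/// ResponseChannel is already closed (timeout or disconnect), the response
/// comes back as Err and the caller decides whether that is fatal.
fn try_reply<C>(
    behaviour: &mut request_response::Behaviour<C>,
    channel: ResponseChannel<C::Response>,
    response: C::Response,
) -> bool
where
    C: Codec + Send + Clone + 'static,
{
    match behaviour.send_response(channel, response) {
        // The response was queued for delivery to the peer.
        Ok(()) => true,
        // The peer timed out or disconnected; the unsent response is handed
        // back to us. Calling .expect("Send response") on this instead, as
        // server.rs:184 effectively does, turns it into a panic.
        Err(_unsent) => false,
    }
}
```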

Questions:
- Is it correct behavior for `self.inner.send_response` to panic if a connection timed out or closed?
- How are we supposed to handle this?
- Any ideas as to why this could be happening?
Additional note:
- I would expect the system to be able to handle many more connections than this, given that I have seen others share data where nodes handle 10k-40k connections.
Expected behavior
I would expect the response to fail for that specific connection and the rendezvous server to continue handling all the other connections rather than crashing.
Actual behavior
It panics frequently, after a couple of minutes of handling connections.
Relevant log output
`thread 'tokio-runtime-worker' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-rendezvous-0.14.0/src/server.rs:184:38:`
Possible Solution
Perhaps the error should just be handled in a way that doesn't panic the entire swarm?
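A minimal sketch of that idea at the panic site, assuming the surrounding names (`self.inner`, `ch`, `response`) from server.rs and that the crate's `tracing`-based logging is available; the real fix might prefer surfacing an event instead of just logging:

```rust
// Sketch only: handle the Err instead of .expect()-ing it, so a single
// closed or timed-out ResponseChannel no longer brings down the whole task.
if let Err(_unsent) = self.inner.send_response(ch, response) {
    tracing::debug!("failed to send rendezvous response: channel closed (peer timed out or disconnected)");
}
```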
Version
`libp2p = "0.53.2"`
Would you like to work on fixing this bug?
Maybe