Description
Hello everyone,
When I set engine_threads to any value > 1, I get an error when connecting from a client to a server.
With engine_threads=2, the error occurs only occasionally, i.e., not for every port combination. With engine_threads=16, it fails more often, i.e., on every port combination I tried.
The error message on the client is:
```
ubuntu@ip-172-31-32-21:~$ ${MSG_GEN} --local_ip 172.31.32.121 --remote_ip 172.31.32.120 --msg_size 64 --msg_window 32
I20241103 18:50:38.682282 1 main.cc:332] Starting in client mode, request size 64
Checking for file descriptor...
Got a file descriptor!
ERROR: Failed to dequeue response from control queue.
F20241103 18:50:49.975369 1 main.cc:346] Check failed: ret == 0 Failed to connect to remote host. machnet_connect() error: Unknown error -1
*** Check failure stack trace: ***
@ 0x7fa3d8ce3f03 google::LogMessage::Fail()
@ 0x7fa3d8ce793c google::LogMessage::SendToLog()
@ 0x7fa3d8ce39e7 google::LogMessage::Flush()
@ 0x7fa3d8ce509f google::LogMessageFatal::~LogMessageFatal()
@ 0x562d0c932a28 main
@ 0x7fa3d8866d90 (unknown)
```
I have a server running on another EC2 instance, started with this command:
```
ubuntu@ip-172-31-32-20:~$ ${MSG_GEN} --local_ip 172.31.32.120 --msg_size 64
```
On the other hand, with engine_threads=1 the connection succeeds:
```
ubuntu@ip-172-31-32-21:~$ ${MSG_GEN} --local_ip 172.31.32.121 --remote_ip 172.31.32.120 --msg_size 64 --msg_window 32
I20241103 18:06:00.837787 1 main.cc:332] Starting in client mode, request size 64
Checking for file descriptor...
Got a file descriptor!
I20241103 18:06:03.949545 1 main.cc:350] [CONNECTED] [172.31.32.121:1024 <-> 172.31.32.120:888]
I20241103 18:06:03.972815 7 main.cc:294] Client Loop: Starting.
TX/RX (msg/sec, Gbps): (0.0K/0.0K, 0.000/0.000). RTT (p50/99/99.9 us): 144/144/144
TX/RX (msg/sec, Gbps): (220.0K/220.0K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/195
TX/RX (msg/sec, Gbps): (220.0K/220.0K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/194
TX/RX (msg/sec, Gbps): (217.4K/217.4K, 0.111/0.111). RTT (p50/99/99.9 us): 143/179/543
TX/RX (msg/sec, Gbps): (220.0K/220.0K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/193
TX/RX (msg/sec, Gbps): (220.2K/220.2K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/190
TX/RX (msg/sec, Gbps): (220.1K/220.1K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/191
TX/RX (msg/sec, Gbps): (220.1K/220.1K, 0.113/0.113). RTT (p50/99/99.9 us): 143/177/189
```
For reference, MSG_GEN is defined as:
```
MSG_GEN="docker run -v /var/run/machnet:/var/run/machnet ghcr.io/microsoft/machnet/machnet:latest release_build/src/apps/msg_gen/msg_gen"
```
Setup: two EC2 instances of type c5n.18xlarge running kernel 6.5.0-1014-aws on Ubuntu 23.10.
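To put a number on "occasionally" (engine_threads=2) versus "every port combination I tried" (engine_threads=16), the client could be run repeatedly and the outcomes counted. The sketch below is a hypothetical helper, `run_trials`, that is not part of Machnet or msg_gen. It assumes that a failed connection makes msg_gen exit non-zero (as the glog `Check failed` abort above suggests) and that a connected client keeps running until the per-attempt `timeout` limit fires.

```sh
# Hypothetical helper (not part of Machnet): run a command N times and
# count how many attempts stay up versus fail.
#
# Usage: run_trials N SECS CMD [ARGS...]
#   N    number of attempts
#   SECS per-attempt time limit passed to timeout(1)
#
# Assumption: a connected msg_gen client keeps running, so an attempt that
# exits 0 or is still alive when the limit expires (timeout exits with 124)
# counts as "ok". Any other non-zero exit, such as the abort from a failed
# CHECK, counts as "fail".
run_trials() {
  local n="$1" secs="$2"
  shift 2
  local ok=0 fail=0 i=0 rc
  while [ "$i" -lt "$n" ]; do
    # Output is discarded; only the exit status is inspected.
    timeout "$secs" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ] || [ "$rc" -eq 124 ]; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok fail=$fail"
}
```

For example, after restarting Machnet with the engine_threads value under test, `run_trials 10 20 ${MSG_GEN} --local_ip 172.31.32.121 --remote_ip 172.31.32.120 --msg_size 64 --msg_window 32` prints the number of connected and failed attempts. Because msg_gen runs as PID 1 inside the container, it may ignore the SIGTERM that `timeout` sends through `docker run`; adding `docker run`'s `--init` and `--rm` flags to MSG_GEN is one way to make sure each attempt's container actually exits.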