@vvcarvalho-csw
No description provided.

Rui Graça and others added 2 commits November 12, 2025 16:08
Wait for unacknowledged data on all TCP endpoints

Before calling close(), wait for all data to be sent and acknowledged.

Setting the linger option on TCP sockets does not change the TIME_WAIT duration.
Linger only blocks the close() call, for at most the configured time.
For example, with linger set to 5, the close() call can block for up to 5 seconds.
Afterwards, the socket can still remain in the TIME_WAIT state for 60 to 120 seconds.
The only way to avoid the TIME_WAIT state entirely is to set linger to 0, which forces a RST to be sent.
The drawback of this approach is that close() sends the RST immediately, and any buffered data
that is still waiting to be sent is dropped.
To avoid this, vsomeip-lib needs to wait until all data is sent before calling close().
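
A minimal sketch of the wait-before-close idea, not the code from this PR. The names `wait_until_drained` and `tcp_send_queue_bytes` are hypothetical. The helper polls a caller-supplied probe until no data is pending or a timeout expires. The Linux probe assumes the `SIOCOUTQ` ioctl, which to our understanding reports the bytes in the TCP send queue that are not yet acknowledged; whether vsomeip relies on this mechanism is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

#ifdef __linux__
#include <linux/sockios.h> // SIOCOUTQ
#include <sys/ioctl.h>
#endif

// Hypothetical helper (not vsomeip API): polls `pending_bytes` until it
// reports 0 or `timeout` elapses. Returns true if the queue drained in time.
inline bool wait_until_drained(
        const std::function<long()>& pending_bytes,
        std::chrono::milliseconds timeout,
        std::chrono::milliseconds poll_interval = std::chrono::milliseconds(10)) {
    assert(poll_interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        const long pending = pending_bytes();
        if (pending == 0) {
            return true; // nothing left to send or acknowledge
        }
        if (pending < 0) {
            return false; // the probe itself failed
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // give up; the caller decides whether to close anyway
        }
        std::this_thread::sleep_for(poll_interval);
    }
}

#ifdef __linux__
// Linux-only probe (assumption, see above): bytes still queued on `fd`,
// or -1 if the ioctl fails.
inline long tcp_send_queue_bytes(int fd) {
    int pending = 0;
    return ioctl(fd, SIOCOUTQ, &pending) == 0 ? pending : -1;
}
#endif
```

A caller would invoke something like `wait_until_drained([fd] { return tcp_send_queue_bytes(fd); }, std::chrono::seconds(1))` right before close(), so that a linger of 0 no longer discards buffered data.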
SOME/IP messages have a 16-byte header, but a significant amount of the
codebase (including rmi::on_message) uses an 8-byte definition.
This is a purposefully small, minimal fix.
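
To illustrate why the header size matters, here is a hedged sketch of a bounds-checked parser. It is not the actual fix: `someip_header`, `parse_header` and `SOMEIP_FULL_HEADER_SIZE` are hypothetical names. The field layout follows the SOME/IP header (Message ID, Length, Request ID, then four single-byte fields). A buffer that only passes an 8-byte check would be read out of bounds as soon as the Request ID or the trailing bytes are accessed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Message ID (4) + Length (4) + Request ID (4) + protocol version,
// interface version, message type and return code (1 byte each).
constexpr std::size_t SOMEIP_FULL_HEADER_SIZE = 16;

struct someip_header { // hypothetical, for illustration only
    std::uint16_t service;
    std::uint16_t method;
    std::uint32_t length; // counts the bytes after the Length field
    std::uint16_t client;
    std::uint16_t session;
    std::uint8_t protocol_version;
    std::uint8_t interface_version;
    std::uint8_t message_type;
    std::uint8_t return_code;
};

inline std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]); // big endian
}

inline std::uint32_t read_u32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16)
         | (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Returns the parsed header, or std::nullopt if the buffer cannot hold one.
inline std::optional<someip_header> parse_header(const std::uint8_t* data,
                                                 std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size < SOMEIP_FULL_HEADER_SIZE) {
        return std::nullopt; // an 8-byte check would wrongly accept this
    }
    someip_header h{};
    h.service = read_u16(data);
    h.method = read_u16(data + 2);
    h.length = read_u32(data + 4);
    h.client = read_u16(data + 8);
    h.session = read_u16(data + 10);
    h.protocol_version = data[12];
    h.interface_version = data[13];
    h.message_type = data[14];
    h.return_code = data[15];
    if (h.length < 8) {
        return std::nullopt; // Length must cover at least the last 8 header bytes
    }
    return h;
}
```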
fabiofcmonteiro and others added 3 commits November 12, 2025 16:36
Returns when receiving messages on invalid port
vsomeip sends a StopSubscribe to a service, but still receives an external notification for that service ~15ms later.
As a result, the InstanceID resolves to 0xFFFF, and we log the error message:
Received message on invalid port: [beef.FFFF.0001.0000.0000] from: 192.168.2.57:15000
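
The early return can be sketched as follows. This is an illustration under assumptions, not the vsomeip code: `port_table`, `on_message` and `INVALID_INSTANCE` are hypothetical names, and the lookup by service and port is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <utility>

using service_t = std::uint16_t;
using instance_t = std::uint16_t;
using port_t = std::uint16_t;

// Sentinel returned when no instance is known for a port (hypothetical name;
// 0xFFFF is the value shown in the log above).
constexpr instance_t INVALID_INSTANCE = 0xFFFF;

class port_table { // hypothetical lookup structure
public:
    void add(service_t service, port_t port, instance_t instance) {
        assert(instance != INVALID_INSTANCE);
        instances_[std::make_pair(service, port)] = instance;
    }
    void remove(service_t service, port_t port) {
        instances_.erase(std::make_pair(service, port));
    }
    instance_t find(service_t service, port_t port) const {
        const auto it = instances_.find(std::make_pair(service, port));
        return it == instances_.end() ? INVALID_INSTANCE : it->second;
    }

private:
    std::map<std::pair<service_t, port_t>, instance_t> instances_;
};

// Returns true if the message would be dispatched, false if it is dropped.
inline bool on_message(const port_table& table, service_t service, port_t port) {
    const instance_t instance = table.find(service, port);
    if (instance == INVALID_INSTANCE) {
        std::cerr << "Received message on invalid port: " << port << '\n';
        return false; // return instead of processing with a bogus instance
    }
    return true;
}
```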
Modify the multicast group test to detect issues related to UDP packets dispatching.
SO_REUSEPORT was added to work around a TCP issue related to restarts.
During the review process it was also added to UDP, but that introduced a
regression: UDP packets were no longer being received by the correct socket.
Indeed, there are no written rules defining how Linux distributes UDP
packets when multiple sockets are bound to the same endpoint.
VSOMEIP can create up to three sockets on the same endpoint. It was
thought that, with the SO_REUSEPORT option enabled, Linux routed
all traffic to the one bound to INADDR_ANY. That is not correct. Therefore,
the fix that redirected messages received by the socket bound to
INADDR_ANY has been removed from this PR, to avoid any regression. What
remains is a check that prints a warning if this socket receives a packet
addressed to another socket.
This change is therefore limited to extending the multicast group test in order to
detect the problem of unexpected dispatching of UDP packets.
Update the error handling for local communication to synchronize IO operations errors and state.
In some scenarios, a race condition between the error handling of the
local client endpoint's read and write callbacks could prevent an application from ever
fully closing the socket, leaving it stuck in a re-connection loop.
This PR updates the error handling of both IO operations for local TCP and UDS communication.
The callbacks for a single connection are invoked sequentially and only the first one is processed.
Moreover, the re-connect decision is forwarded to the routing manager instead of forcing low-level
retries.
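
A rough sketch of the "only the first error is processed" rule, under stated assumptions: `local_connection` and its callback type are hypothetical and do not mirror the vsomeip classes. The atomic flag is a defensive choice for this sketch; with strictly sequential callbacks a plain flag would also do.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

class local_connection { // hypothetical, for illustration only
public:
    using broken_handler_t = std::function<void(int client_id)>;

    local_connection(int client_id, broken_handler_t on_broken)
        : client_id_(client_id), on_broken_(std::move(on_broken)) {
        assert(on_broken_); // an upper layer must be listening
    }

    void on_read_error() { handle_error(); }
    void on_write_error() { handle_error(); }
    bool is_closed() const { return closed_; }

private:
    void handle_error() {
        // Whichever callback arrives first wins; the other one is ignored.
        if (error_handled_.exchange(true)) {
            return;
        }
        closed_ = true;         // stands in for closing the socket exactly once
        on_broken_(client_id_); // the upper layer decides whether to reconnect
    }

    int client_id_;
    broken_handler_t on_broken_;
    std::atomic<bool> error_handled_{false};
    bool closed_{false};
};
```

Leaving the reconnect decision to the handler (the routing manager in the description above) keeps the endpoint from retrying on its own while the upper layer is tearing the connection down.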
joaofesilva and others added 5 commits November 12, 2025 16:55
Prevent race condition on insert_subscription and improve logging
Consider the following case:

- Server offers a service, before registering all the events
- Two applications subscribe to said service at the same time, including the unregistered events

In this scenario there is a race condition on insert_subscription, where
two placeholders are created for the same subscription:

routing_manager_base::insert_subscription(a524): [cafe.0001.0001.0001] received subscription for unknown
(unrequested / unoffered) event. Creating placeholder event holding subscription until event is requested/offered.
routing_manager_base::insert_subscription(a52d): [cafe.0001.0001.0001] received subscription for unknown
(unrequested / unoffered) event. Creating placeholder event holding subscription until event is requested/offered.

However, the second placeholder is not created successfully, and the subscription
is not inserted into the eventgroup.
This PR fixes the race by protecting the call to create_placeholder_event_and_subscribe.
It also improves logging.
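
The protection can be pictured with the sketch below, which assumes a simple mutex around the lookup and the creation. `subscription_table` is a hypothetical stand-in; the real routing_manager_base data structures and locking differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <tuple>

class subscription_table { // hypothetical, for illustration only
public:
    // service, instance, eventgroup, event
    using event_key_t =
        std::tuple<std::uint16_t, std::uint16_t, std::uint16_t, std::uint16_t>;

    // Returns true if this call created the placeholder, false if it reused
    // an existing one. Either way the subscriber ends up registered.
    bool insert_subscription(const event_key_t& key, std::uint16_t client) {
        std::lock_guard<std::mutex> lock(mutex_);
        // Lookup and creation happen under the same lock, so two concurrent
        // subscribers cannot both conclude that the placeholder is missing.
        auto [it, created] = placeholders_.try_emplace(key);
        it->second.insert(client);
        assert(!it->second.empty());
        return created;
    }

    std::size_t placeholder_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return placeholders_.size();
    }

    std::size_t subscriber_count(const event_key_t& key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = placeholders_.find(key);
        return it == placeholders_.end() ? 0 : it->second.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<event_key_t, std::set<std::uint16_t>> placeholders_;
};
```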
It is possible for an application to see several clients with the same
routing info/guest address. This is a transient state that happens when
an application restarts, acquires a new client-id, but binds to the same
ip/port.
There is a long history to the problem:

- prior to 062d843, it could happen, but it was temporary, as
  applications would drop the client after receiving RIE_DELETE_CLIENT
  (that of course has other issues...)
- after 062d843, it can happen, but is no longer temporary
- 9e5fff5 tries to tackle it by dropping routing, but using the same
  client connection
- 176aaef partially reverts 9e5fff5

This commit resolves the issue by dropping old clients as soon as a new
client with the same routing info/guest address is received.
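
As a sketch of that rule, assuming a plain map from client id to guest address: `guest_table` and `add_guest` are hypothetical names, and the real routing info handling involves more state than shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using client_t = std::uint16_t;
using guest_address_t = std::pair<std::string, std::uint16_t>; // ip, port

class guest_table { // hypothetical, for illustration only
public:
    // Registers `client` at `address` and returns the ids of older clients
    // that were dropped because they claimed the same address.
    std::vector<client_t> add_guest(client_t client,
                                    const guest_address_t& address) {
        assert(!address.first.empty());
        std::vector<client_t> dropped;
        for (auto it = guests_.begin(); it != guests_.end();) {
            if (it->first != client && it->second == address) {
                dropped.push_back(it->first);
                it = guests_.erase(it); // stale entry from before the restart
            } else {
                ++it;
            }
        }
        guests_[client] = address;
        return dropped;
    }

    bool knows(client_t client) const { return guests_.count(client) != 0; }

private:
    std::map<client_t, guest_address_t> guests_;
};
```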
fake_tcp_socket_handle::is_connected can extend the lifetime of the
underlying socket objects and end up being the one that triggers their destruction.
This deadlocks, because destruction invokes socket_manager, and this
function is itself invoked by socket_manager. Fix it with a trick.
Only close server endpoints at end of suspend process
Server endpoints were being closed at the beginning of the suspend process, preventing necessary messages
from being sent during suspension and causing them to remain pending until resume starts.
One affected case is Stop Offer messages, which were verified as not being sent during suspend. This explains the
high number of "sdi::check_stopped_services_on_suspend stop offer not sent for service" logs in 3.5.11.
This change implements the correct flow for the suspend process: close connections only after all cleanup
operations are complete.
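
The ordering can be sketched as below. This is a simplified illustration, not the suspend implementation: `suspend_sequence` is a hypothetical class, and the cleanup steps are placeholders for work such as sending Stop Offer messages.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

class suspend_sequence { // hypothetical, for illustration only
public:
    explicit suspend_sequence(std::function<void()> close_endpoints)
        : close_endpoints_(std::move(close_endpoints)) {
        assert(close_endpoints_);
    }

    // Cleanup steps still need open endpoints, e.g. to send Stop Offers.
    void add_cleanup(std::function<void()> step) {
        cleanup_steps_.push_back(std::move(step));
    }

    void run() {
        for (const auto& step : cleanup_steps_) {
            step();
        }
        close_endpoints_(); // last, so no cleanup message is left pending
    }

private:
    std::vector<std::function<void()>> cleanup_steps_;
    std::function<void()> close_endpoints_;
};
```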
Update vsomeip-lib to v3.5.11

Only close server endpoints at end of suspend process
tests: fix fake_socket_tests deadlock
routing: drop old clients with same routing info
prevent race on insert_subscription
Update error handling for local endpoints
extend multigroup test to detect issue with UDP dispatching
Return on received message on invalid port/instance
rmi: fix buffer overflow
Wait for unacknowledged data before closing
Client should start Wait Phase when offer TTL expires
fix/doc offered_services_info
application start-stop-start broken with local TCP
Fix reuse address issues in tests
manage suspend event
log rmc state transitions as a string
doc and fix security_test
Test cached event feature
rm: improve subscription logging
Use universal time (availability_handler_test)
add logs when unsetting payload
misc: add predicate to condition variables
Add boardnet initial event test
Fix subscribe notify test diff clients
Fix restart routing test
cei, ltcei, rmc: Trivial changes on logging readability
Check ACL and process messages if the address is specified
Adds all sanitizers and valgrinds to the check job
Fix subscribe notify test one tests
emi: remove deferrence of multicast operations
remove sessionID from clients_ map
cei: trivial changes, removes duplicated code
Update cache on change
Add basic watchdog for tracking stop offers
routing: remove dead code
endpoints: set SO_LINGER to 0, force TCP RSTs
@fcmonteiro fcmonteiro merged commit f58ba57 into COVESA:master Nov 14, 2025
2 checks passed
5 participants