vSomeIP 3.5.9 Release #968
Merged
Implement a clean shutdown of offer_test_service. The application was interrupted using SIGTERM, and Valgrind sometimes reported memory leaks. With a clean shutdown, these leak reports no longer occur.
Fix a race condition in the test suspend_resume_test_initial when a notification is received after the service becomes unavailable.
The test failed because a notification was received while the service was unavailable:
2025-08-11 14:44:47.829423 suspend_resume_test_service [info] rmi::set_routing_state
Set routing to suspend mode, diagnosis mode is inactive.
2025-08-11 14:44:47.829768 suspend_resume_test_client [debug] on_message: Received event.
2025-08-11 14:44:47.829864 suspend_resume_test_client [debug] on_availability: Test service is NOT available.
2025-08-11 14:44:47.829890 suspend_resume_test_client [debug] on_message: Received event.
2025-08-11 14:44:47.829894 suspend_resume_test_client [debug] [TEST-cli] HasReceived Changed, triggering cv
This last trace means that the variable has_received_ has been set to true.
After resume, the test expected to receive a notification, and it did, but the notification was ignored because
has_received_ had not been reset to false:
2025-08-11 14:44:52.829535 suspend_resume_test_service [info] rmi::set_routing_state
Set routing to resume mode, diagnosis mode was inactive.
2025-08-11 14:44:52.840545 suspend_resume_test_client [debug] [TEST-cli] On availability will trigger cv
2025-08-11 14:44:52.840596 suspend_resume_test_client [debug] [TEST-cli] Service Available after susp/resume: r=0
2025-08-11 14:44:54.836073 suspend_resume_test_client [debug] on_message: Received event.
The test also had a defect: it requested two notifications by issuing unsubscribe/subscribe commands
without waiting for confirmation of these operations. The results of these two operations interfered with the
expected sequence.
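A minimal sketch of the has_received_ handling, assuming a mutex/condition-variable pair like the one behind the "triggering cv" traces. The class notification_waiter and its methods are hypothetical, not the actual test code; the point is that the flag is reset at the start of each phase, so a notification received before suspend cannot satisfy the wait for the one expected after resume.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper modelled on the has_received_ flag of the test client.
class notification_waiter {
public:
    // Called from the message handler when an event arrives.
    void on_event() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            has_received_ = true;
        }
        cv_.notify_one();
    }

    // Reset the flag at the start of a test phase (e.g. right before resume),
    // so that stale notifications from the previous phase are not counted.
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        has_received_ = false;
    }

    // Wait until a notification arrives after the last reset().
    bool wait_for_event(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return has_received_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool has_received_ = false;
};
```

Calling reset() right before triggering resume is the step the original test was missing; without it, wait_for_event() returns immediately on the stale flag.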
Add a check of the test's return code, and documentation, for debounce_filter_test.
Add a check of the test's return code, and documentation, for cyclic_event_tests.
Add a check of the test's return code, and documentation, for debounce_callback_test.
Add a check of the test's return code, and documentation, for debounce_frequency_test.
Document the offer_test_local test. Add VSOMEIP_APPLICATION_NAME to the test executables.
Reduce the number of cycles from 100 to 5 and increase the test timeout to 180 seconds. This test failed in the Valgrind runs, which execute in a slower environment; in the worst case, it would take 6*100 seconds (600 s).
get_local_port unnecessarily used the getsockname syscall, and did so especially often when ACL is used. While at it, remove set_local_port (shifting its logic to on_bind_error instead) and clean up dead code.
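One way to avoid the repeated getsockname calls is to query the port once after a successful bind and cache it. The sketch below is a hypothetical plain-POSIX illustration (the class bound_socket is not vsomeip code); the actual fix may instead record the port at bind time.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdint>
#include <optional>

// Hypothetical wrapper: queries the bound port via getsockname() once and
// caches it, instead of issuing the syscall on every get_local_port() call.
class bound_socket {
public:
    explicit bound_socket(int fd) : fd_(fd) {}

    // Returns the local port in host byte order, or 0 on failure.
    std::uint16_t get_local_port() {
        if (!cached_port_) {
            sockaddr_in addr{};
            socklen_t len = sizeof(addr);
            if (::getsockname(fd_, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
                return 0;  // do not cache failures
            }
            cached_port_ = ntohs(addr.sin_port);
        }
        return *cached_port_;
    }

    // Must be called if the socket is re-bound (e.g. after a bind error).
    void invalidate() { cached_port_.reset(); }

private:
    int fd_;
    std::optional<std::uint16_t> cached_port_;
};
```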
Add unit tests for usei to reproduce timing-related issues. I tried to adapt a PoC usei network test, but the network tests are black-box tests, and we cannot create routing_manager_stub or endpoint_manager_impl. Changing the interface was considered, but it would not pass review. Another option was to use debug symbols to find the private methods, but that would be excessive. For these reasons, unit tests were developed instead.
All Android builds also define linux, as plenty of portability guides state (which makes sense, as Android uses a Linux kernel). This eliminates most uses of #ifdef ANDROID.
The test failed due to a delay of 6 ms, whereas the maximum expected delay was 5 ms. Debug logs have been activated, and high_resolution_clock has been replaced by steady_clock. An issue was found in the code generating the messages: it introduces a time drift.
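The commit does not show how the drift was fixed; a common technique, sketched here under that assumption, is to schedule each message against an absolute steady_clock deadline. steady_clock is monotonic, whereas high_resolution_clock may be an alias of system_clock and can jump. send_periodically is a hypothetical helper.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Sends `count` messages with a fixed period and returns the send timestamps.
// Sleeping until an absolute deadline (next += period) avoids the drift that
// accumulates with sleep_for(period), where the time spent producing each
// message is added to every interval.
std::vector<std::chrono::steady_clock::time_point>
send_periodically(int count, std::chrono::milliseconds period) {
    std::vector<std::chrono::steady_clock::time_point> stamps;
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) {
        stamps.push_back(std::chrono::steady_clock::now());  // "send" here
        next += period;                                      // absolute deadline
        std::this_thread::sleep_until(next);
    }
    return stamps;
}
```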
Came across missing details
The fake_socket tests already exceed the 1 MB limit, which is hilariously small anyway. Increase it by a few orders of magnitude.
4a0295b introduced a data race, caught by the CI: lusei restarts (which is stop + init + start); init writes local_, which is the address of the acceptor socket (in the case of lusei, something like /var/run/someip/vsomeip-1234); however, even though stop closes the acceptor socket, an accept_cbk can still execute, and due to 4a0295b it also reads local_ for logging. The accept_cbk running in the background is concerning and needs to be solved; this commit only mitigates the race. Both lusei and ltsei have the same race; there are simply more tests using lusei.
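A minimal sketch of one possible mitigation, assuming local_ is a string guarded by a mutex that both init() and the logging path in accept_cbk take. local_server_endpoint and local_for_logging are hypothetical names. As noted above, this only removes the data race; the late accept_cbk itself still runs.

```cpp
#include <mutex>
#include <string>

// Hypothetical endpoint fragment: local_ is written by init() (on restart)
// and read by an accept callback that may still run in the background.
// Guarding both accesses with the same mutex removes the data race.
class local_server_endpoint {
public:
    void init(const std::string& path) {
        std::lock_guard<std::mutex> lock(local_mutex_);
        local_ = path;
    }

    // Called from accept_cbk for logging; returns a copy taken under the lock.
    std::string local_for_logging() const {
        std::lock_guard<std::mutex> lock(local_mutex_);
        return local_;
    }

private:
    mutable std::mutex local_mutex_;
    std::string local_;
};
```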
Add fair-sched to Valgrind. Run Valgrind with --fair-sched=yes. This allows the threads created by the application to run with the same priority as the main application thread. Without it, Valgrind stalled application thread execution, and the application that started first would get almost no CPU time. See: https://valgrind.org/docs/manual/manual-core.html and https://valgrind.org/docs/manual/manual-core.html#manual-core.pthreads_perf_sched
Enable compilation of usei_tests/ut_basic_tests with Boost 1.87+. Boost 1.87 retires asio::io_context::work; it is replaced by executor_work_guard.
Force remote subscriptions to be removed on host error. In some ECU modes, an emergency shutdown is executed instead of a graceful shutdown. As a result, vsomeipd does not suspend, and consequently the vsomeip suspend command is not propagated to clients. Because some clients never receive the suspend command and the unit does not reboot, they do not clean up the remote subscriptions map they manage. This container is used to detect whether a client is a new subscriber to an event (initial events are only sent to new subscribers). This PR forces the cleanup in scenarios where no graceful shutdown is performed, by clearing the container when the client detects and handles a lost connection towards the host.
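The container logic can be sketched as follows, assuming it is keyed by (service, instance, eventgroup); remote_subscriptions and on_host_connection_lost are hypothetical names. The first add() for a key reports a new subscriber (and thus initial events), and clearing on a lost host connection restores that behavior after an emergency shutdown.

```cpp
#include <cstdint>
#include <set>
#include <tuple>

using service_t = std::uint16_t;
using instance_t = std::uint16_t;
using eventgroup_t = std::uint16_t;

// Hypothetical model of the client-side remote subscriptions container.
class remote_subscriptions {
public:
    // Returns true if this is a new subscriber for the eventgroup,
    // i.e. initial events should be delivered.
    bool add(service_t s, instance_t i, eventgroup_t eg) {
        return entries_.insert(std::make_tuple(s, i, eg)).second;
    }

    // Called when the client handles a lost connection towards the host,
    // covering shutdowns where no suspend command was ever received.
    void on_host_connection_lost() { entries_.clear(); }

private:
    std::set<std::tuple<service_t, instance_t, eventgroup_t>> entries_;
};
```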
Previously, the ctest result of offer_stop_offer_test_client was not evaluated. To fix this, keep track of the return codes of all started programs. Remove unnecessary code.
Due to e8b6519 (netlink removal), libvsomeip receives EADDRNOTAVAIL on bind in ltcei::connect before the network interface is ready. This by itself is expected and not a problem, but due to broken error handling, libvsomeip continues with the connect logic and sends ASSIGN_CLIENT. Sending ASSIGN_CLIENT naturally fails (the connection is not open) and causes a cascade of errors, which unfortunately only resolve on the ASSIGN_CLIENT_ACK timeout; that takes 3 s and heals the situation. Failures occur precisely because of these extra 3 s. While at it, promote ASSIGN_CLIENT/REGISTER_APPLICATION timeouts to errors, as these should never happen.
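The corrected control flow can be sketched as follows: a bind error ends the attempt early instead of falling through to the connect logic (and ASSIGN_CLIENT). try_connect, connect_result and the callback parameters are hypothetical stand-ins for the ltcei internals.

```cpp
#include <cerrno>

// Hypothetical outcome of a connect attempt.
enum class connect_result { connected, retry_later };

// bind_local/connect_remote return 0 on success or an errno value.
// If bind fails (e.g. EADDRNOTAVAIL because the interface is not up yet),
// stop here and schedule a retry instead of continuing with connect.
template <typename BindFn, typename ConnectFn>
connect_result try_connect(BindFn bind_local, ConnectFn connect_remote) {
    if (bind_local() != 0) {
        return connect_result::retry_later;  // do not fall through
    }
    return connect_remote() == 0 ? connect_result::connected
                                 : connect_result::retry_later;
}
```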
Abort when an incorrect state is detected in critical syscall/libc calls such as recvfrom/sendto/epoll_wait. In these situations, the process using libvsomeip is mishandling file descriptors (e.g. a double close with an invalid fd value), which leaves libvsomeip in an incorrect and unrecoverable state. The abort causes a core dump, and platforms can decide on recovery strategies.
strerror is not thread-safe in older glibc versions, and in general it is not guaranteed to be thread-safe. The message text is not important anyway, as any developer should recognize the term "errno"; therefore, remove all of the calls.
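A sketch of strerror-free error logging that prints only the numeric errno; format_syscall_error is a hypothetical helper, and the exact log format vsomeip uses is not reproduced here.

```cpp
#include <cerrno>
#include <string>

// Formats a syscall failure using only the numeric errno value,
// avoiding strerror() and its thread-safety caveats.
std::string format_syscall_error(const char* call, int err) {
    return std::string(call) + " failed, errno " + std::to_string(err);
}
```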
If an application closes the epoll fd that libvsomeip uses internally, the libvsomeip IO threads loop forever on EBADF. Therefore, react to EBADF returned by epoll_wait.
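A minimal sketch of an IO loop step that reacts to EBADF instead of spinning, written against the plain Linux epoll API; poll_once is a hypothetical helper, and treating every error except EINTR as fatal is an assumption of this sketch.

```cpp
#include <sys/epoll.h>
#include <cerrno>

// One IO loop iteration. Returns false when the loop should terminate.
bool poll_once(int epoll_fd, int timeout_ms) {
    epoll_event events[16];
    int n = ::epoll_wait(epoll_fd, events, 16, timeout_ms);
    if (n < 0) {
        // EINTR: interrupted by a signal, simply retry.
        // EBADF (and other errors): the epoll fd is unusable, so retrying
        // would spin forever; stop the loop instead.
        return errno == EINTR;
    }
    // ... dispatch the n ready events here ...
    return true;
}
```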
The test is fragile, as there can be more client-to-routing reconnects than expected. Adjust the check and add a comment.
This test was created to ensure that all clients subscribed to a remote service receive all notifications correctly when another client on the same device, subscribed to the same service, unsubscribes from it. Before this was fixed, the routing manager would leave the multicast group on which the service sent its notifications while clients were still subscribed to it, causing those clients to miss notifications sent by the service.
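The invariant this test guards can be sketched as per-group reference counting: the group is joined on the first local subscription and left only when the last one goes away. multicast_membership is a hypothetical class, not the routing manager's implementation.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Counts local subscribers per multicast group.
class multicast_membership {
public:
    // Returns true if the caller must actually join the group (first subscriber).
    bool subscribe(const std::string& group) {
        return ++subscribers_[group] == 1;
    }

    // Returns true if the caller must actually leave the group (last subscriber).
    bool unsubscribe(const std::string& group) {
        auto it = subscribers_.find(group);
        if (it == subscribers_.end()) {
            return false;  // not subscribed: nothing to leave
        }
        if (--it->second == 0) {
            subscribers_.erase(it);
            return true;
        }
        return false;  // other clients are still subscribed: stay joined
    }

private:
    std::map<std::string, std::uint32_t> subscribers_;
};
```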
Fix a remaining race condition in udp_server_endpoint_impl during restart. During a restart, a problem remained with buffers being shared between the stopped and the restarted streaming. The solution is to allocate buffers dynamically. The provided unit test reproduces the problem and also detected an issue with an unprotected shared_ptr.
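A minimal sketch of per-session buffer allocation, assuming each start() hands the pending receive handler a shared_ptr to a freshly allocated buffer. udp_receiver is hypothetical, and it does not model the locking needed when current_ is accessed from several threads.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Each start() allocates a fresh receive buffer owned by a shared_ptr.
// A handler still running from the stopped session keeps its own buffer
// alive, so it never shares memory with the restarted session.
class udp_receiver {
public:
    using buffer_t = std::vector<unsigned char>;

    std::shared_ptr<buffer_t> start(std::size_t size) {
        auto buffer = std::make_shared<buffer_t>(size);
        current_ = buffer;  // the new session's buffer
        return buffer;      // captured by the pending receive handler
    }

    void stop() { current_.reset(); }

    std::shared_ptr<buffer_t> current() const { return current_; }

private:
    std::shared_ptr<buffer_t> current_;
};
```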
Removed Services/multicast, as the multicast address and port must be assigned to the eventgroup: different eventgroups of the same service can use different multicast addresses/ports. Removed Eventgroups/is_multicast, which made an eventgroup use the multicast address/port defined in the service node; this is superseded by the eventgroup-specific multicast address/port definition. Removed servicegroups, a leftover from old versions of vsomeip. Removed routing-client-ports, a leftover from old versions of vsomeip. Added information on environment variables. Adapted the configuration test to these changes, as the deprecated JSON no longer makes sense to use.
Simplify the logger implementation and make it (mostly) lock-free. Refactor the logger so that logging to both DLT and the console is lock-free. This should completely remove contention and significantly improve performance in a multithreaded environment. However, one atomic load per logged message remains, due to the out-of-band (and repeated) initialization of the logger; fixing this would enable further simplifications and optimizations. Only file logging still requires locking; there is no way around this without relying on POSIX specifics. However, this PR should also improve performance there, as the log file is no longer opened, flushed, and closed for every single message. This also incorporates the more general optimization changes that were supposed to be merged earlier but somehow got lost.
Change tcp_tw_reuse to 2, the Linux default value used on most ECUs. tcp_tw_reuse is a Linux kernel parameter that enables the reuse of TCP sockets in the TIME_WAIT state for new outgoing connections. It accepts the following values:
0: disable TIME_WAIT socket reuse
1: enable TIME_WAIT socket reuse
2: enable TIME_WAIT socket reuse for loopback only (Linux default)
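A small sketch that reads the current value from procfs and maps it to the meanings listed above; the helper names are hypothetical, and reading the value only works on Linux.

```cpp
#include <fstream>
#include <string>

// Maps a tcp_tw_reuse value to its meaning.
std::string describe_tcp_tw_reuse(int value) {
    switch (value) {
        case 0: return "disable TIME_WAIT socket reuse";
        case 1: return "enable TIME_WAIT socket reuse";
        case 2: return "enable TIME_WAIT socket reuse for loopback only";
        default: return "unknown";
    }
}

// Reads the current value from procfs; returns -1 if it is unavailable.
int read_tcp_tw_reuse() {
    std::ifstream in("/proc/sys/net/ipv4/tcp_tw_reuse");
    int value = -1;
    if (!(in >> value)) {
        return -1;
    }
    return value;
}
```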
Fix invalid use of std::move. The mutex and the condition variables were moved from and then used again. This could explain the BLOCKING CALL that led to a failure.
Use only console output; do not use DLT or dlt-daemon, as there is a DISABLE_DLT option anyway. While at it, enable trace level for all tests.
These logs should enable better tracing of connection lifecycles. On the client side, the log is expanded to mention the remote address; on the server side, a new log mentions the remote connector.
Avoid a busy loop if a failure occurs during a multicast leave operation. The leave failed because MACsec reconfigured the network interface during normal execution. endpoint_manager_impl does not repeat the leave, but it repeats the join in a busy loop. Performing the leave operation even if it was already done avoids the error "address already in use" when the join is done again. That error cannot be ignored, because it also happens in the restart case, where it must be handled. This alone does not fully fix the busy loop, because the join keeps returning an error until the network interface is up again, so a delay has been added. One issue remains: endpoint_manager_impl does not repeat the leave; only the join is repeated. Handling this correctly would require recording the join state as unknown, then managing this new state and repeating the leave operation. That is too risky for something that should not happen.
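The resulting retry behavior can be sketched as follows, assuming join and leave are exposed as callables returning success. join_with_retry is a hypothetical helper, and the actual delay value used by vsomeip is not stated here.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Retries a multicast join with a fixed delay between attempts instead of
// retrying in a busy loop. Before each join, the leave is performed again
// (its result ignored), so that a stale membership cannot make the join fail
// with "address already in use". Returns true once the join succeeds.
bool join_with_retry(const std::function<bool()>& leave,
                     const std::function<bool()>& join,
                     int max_attempts,
                     std::chrono::milliseconds delay) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        (void)leave();  // failure just means we were not joined
        if (join()) {
            return true;
        }
        std::this_thread::sleep_for(delay);  // interface may still be down
    }
    return false;
}
```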
It turns out that boost::system::errc::broken_pipe != boost::asio::error::broken_pipe. This was very, very painful to find out. It is also true in general: no system::errc will ever be equal to any asio::error, as they are inherently different error categories. Remove dead code and fix tests. Remove all uses of system::errc, because I never, ever want to deal with this again.
It makes no sense to start the sender on rmc::init, because that causes libvsomeip to connect to vsomeipd on application::init, before any IO thread is executing events.
If close is not called, the socket options can be overwritten by Boost, which can lead to an increased TIME_WAIT value. With this change, the fake_socket required some changes in the receive_ handler cleanup, to avoid a lock order inversion.
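The underlying idea, sketched on a raw file descriptor rather than a Boost.Asio socket: an RAII owner whose destructor always closes the descriptor. socket_handle is a hypothetical class.

```cpp
#include <unistd.h>
#include <utility>

// Minimal RAII owner of a socket fd: the destructor always closes it.
class socket_handle {
public:
    explicit socket_handle(int fd = -1) noexcept : fd_(fd) {}
    ~socket_handle() { reset(); }

    socket_handle(const socket_handle&) = delete;
    socket_handle& operator=(const socket_handle&) = delete;

    socket_handle(socket_handle&& other) noexcept
        : fd_(std::exchange(other.fd_, -1)) {}
    socket_handle& operator=(socket_handle&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    int get() const noexcept { return fd_; }

    // Closes the descriptor if one is owned.
    void reset() noexcept {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
};
```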
Remove duplicate shutdown/closure of sockets. Both wait_until_sent and restart already call shutdown_and_close_socket, so there is no need to also call it before calling either of those two functions. This PR addresses the following scenario, where two extra shutdowns are called:
cei::shutdown_and_close_socket_unlocked: socket shutdown error (107): Transport endpoint is not connected endpoint > 0xffff980b8a80 socket state > 3
cei::shutdown_and_close_socket_unlocked: not recreating socket endpoint > 0xffff980b8a80 socket state > 0
cei::shutdown_and_close_socket_unlocked: socket was not open endpoint > 0xffff980b8a80 socket state > 1
cei::shutdown_and_close_socket_unlocked: socket has been reset endpoint > 0xffff980b8a80 socket state > 0
cei::shutdown_and_close_socket_unlocked: socket was not open endpoint > 0xffff980b8a80 socket state > 1
cei::shutdown_and_close_socket_unlocked: not recreating socket endpoint > 0xffff980b8a80 socket state > 0
Update the local_endpoint ClientID on add_guest. The scenario was: an endpoint was established between two applications, A and B. An STR occurred. The endpoint still existed but was disconnected. It tried to reconnect and succeeded. During this reconnection, application B re-registered to the routing host and changed its ClientID. Afterwards, app A received ON_AVAILABLE from app B with the new ClientID. App A tried to connect to this new ClientID, which has the same address, and failed with "Cannot assign requested address". This fix updates the local_endpoint map to keep its ClientID in sync with the guests map when add_guest is called.
Work around a compiler bug in GCC < 10 after recent logger optimizations. GCC < 10 does not accept a struct with default member initializers as the template argument of std::atomic. This was fixed in later versions; however, as the affected struct is never used without explicit initialization, simply remove the initializers.
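A sketch of the workaround, with a hypothetical logger_config struct standing in for the affected one: the default member initializers are dropped, and the atomic is always initialized explicitly.

```cpp
#include <atomic>

// Form rejected by GCC < 10 according to the note above:
// struct logger_config { int level = 0; bool console = true; };

// Workaround: no default member initializers; always initialize explicitly.
struct logger_config {
    int level;
    bool console;
};

std::atomic<logger_config> g_config{logger_config{0, true}};
```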
A silly one, likely present for years, but it only happens if libdlt is used.
For versions of libdlt that support privacy-aware logging, ensure that all normal log messages are marked as public. It was determined that vsomeip never logs sensitive data through the normal logger. Trace data containing message payloads (and thus potentially privacy-relevant information) goes through a different code path and remains marked as private for now.
Add the SO_REUSEPORT option to mitigate the "Address already in use" issue. After resume, the TCP server that was closed during suspend received an unexpected error: Address already in use. The reason is not clear; the DLT log shows that stop was called. The workaround is to use the SO_REUSEPORT option. The same option is added to the UDP unicast server. The UDP multicast server already has error handling and does not require it. This PR also aims to improve logging, particularly by tracking init, start and stop operations and by monitoring active connections.
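A minimal sketch of setting the option on a plain POSIX socket before bind(); enable_reuse_port is a hypothetical helper (vsomeip itself configures its sockets through Boost.Asio).

```cpp
#include <sys/socket.h>

// Enables SO_REUSEPORT (Linux >= 3.9) on a socket. Must be called before
// bind(). Returns true on success.
bool enable_reuse_port(int fd) {
    int on = 1;
    return ::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == 0;
}
```

On Linux, SO_REUSEPORT only lets a new socket bind to the port if every socket bound to it set the option before bind() (and they share the same effective UID), so it has to be enabled on the socket that is closed during suspend as well.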
Update vsomeip-lib to v3.5.9
- Update local_endpoint ClientID on add_guest
- Make compile with GCC < 10 again
- tce: Remove duplicate shutdown closure of sockets
- Ensure socket closure on dtor
- rmc: remove sender start on rmc::init
- misc: fix use of non-asio error codes
- multicast, add repeat delay
- Expand logs to trace connections easier
- Minor change, typo when declaring endianness structure
- zuul: remove dlt
- flaky, subscribe_notify_test_one_event_two_eventgroups_tcp
- Change tcp_tw_reuse to 2
- Avoid locking in and improve performance of the logger
- Remove leftover configurations
- fix race condition with on_message_received_unlocked
- Create multicast group test
- tests: fix flaky test allow_reconnects
- NTF integration pipeline
- misc: react on EBADF for epoll_wait
- misc: remove use of strerror
- Add support for VSOMEIP_ABORT_ON_CRIT_SYSCALL_ERROR
- ltcei: fix connect logic
- fix and document event_test
- fix offer_stop_offer_test scripts
- Force remote subscriptions to be removed on host error
- Enable usei_tests/ut_basic_tests with boost 1.87+
- Post results to artifactory for batch pipelines
- add fair-sched to valgrind
- Adds remote information to client endpoint warning logs
- Adds vsomeip version to the cmake log
- lsei: fix race
- zuul: increase test output size
- lsei: minor log improvement
- flaky test, debounce_filter_test
- build: cleanup ifdef ANDROID
- usei, unit tests
- endpoint: fix get_local_port, remove set_local_port
- reduce time spent in test_restart_client_in_loop
- zuul: suppress sonarqube on catch-all exceptions
- document offer_test_local memcheck test
- doc/fix debounce_frequency_test
- fix/document debounce_callback_test scripts
- fix/document cyclic_event_tests scripts
- fix/document debounce_filter_test scripts
- fix test suspend_resume_test_initial
- fix shutdown of offer_test_service
- Adds ets and verification job to check (non-voting)
- add SO_REUSEPORT option + logging
- Mark all normal log messages as public
- misc: fix client-side-logging crash
fcmonteiro approved these changes (Oct 27, 2025)
duartenfonseca approved these changes (Oct 27, 2025)