vSomeIP 3.5.9 Release #968

vvcarvalho-csw · 2025-10-23T12:57:02Z


Implement a clean shutdown of offer_test_service. Previously, the application was interrupted with SIGTERM, and Valgrind sometimes reported memory leaks. With a clean shutdown, these reports no longer occur.

Fix race condition in test suspend_resume_test_initial when a notification is received after the service becomes unavailable.

The test failed because a notification was received while the service was unavailable:

2025-08-11 14:44:47.829423 suspend_resume_test_service [info] rmi::set_routing_state Set routing to suspend mode, diagnosis mode is inactive.
2025-08-11 14:44:47.829768 suspend_resume_test_client [debug] on_message: Received event.
2025-08-11 14:44:47.829864 suspend_resume_test_client [debug] on_availability: Test service is NOT available.
2025-08-11 14:44:47.829890 suspend_resume_test_client [debug] on_message: Received event.
2025-08-11 14:44:47.829894 suspend_resume_test_client [debug] [TEST-cli] HasReceived Changed, triggering cv

This last trace means that the variable has_received_ has been set to true. After resume, the test expected to receive a notification, which it did. However, the notification was ignored because has_received_ had not been reset to false:

2025-08-11 14:44:52.829535 suspend_resume_test_service [info] rmi::set_routing_state Set routing to resume mode, diagnosis mode was inactive.
2025-08-11 14:44:52.840545 suspend_resume_test_client [debug] [TEST-cli] On availability will trigger cv
2025-08-11 14:44:52.840596 suspend_resume_test_client [debug] [TEST-cli] Service Available after susp/resume: r=0
2025-08-11 14:44:54.836073 suspend_resume_test_client [debug] on_message: Received event.

The underlying defect was in the test: it requested two notifications by issuing unsubscribe/subscribe commands without waiting for confirmation of these operations, and the results of these two operations interfered with the expected sequence.
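A minimal sketch of the pattern behind this fix, assuming a hypothetical `notification_waiter` helper (not the test's actual code): the flag guarded by the condition variable is cleared before waiting for the next expected notification, so a stale notification received during suspend cannot satisfy, or mask, the wait after resume.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not taken from suspend_resume_test_client.
class notification_waiter {
public:
    // Called from the message handler whenever an event arrives.
    void on_notification() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            has_received_ = true;
        }
        cv_.notify_all();
    }

    // Clear the flag before starting a new phase of the test (e.g. right
    // before resume), so earlier notifications are not counted again.
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        has_received_ = false;
    }

    // Wait until a notification arrives after the last reset(), or time out.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return has_received_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool has_received_ = false;
};
```

In the failing run, the equivalent of `reset()` was missing between suspend and resume.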

Add a return-code check and documentation for debounce_filter_test.

Add a return-code check and documentation for cyclic_event_tests.

Add a return-code check and documentation for debounce_callback_test.

Add a return-code check and documentation for debounce_frequency_test.

Document the offer_test_local test. Add VSOMEIP_APPLICATION_NAME to the test executables.

Reduce the number of cycles from 100 to 5 and increase the test timeout to 180 seconds. This test fails in the Valgrind tests, which run in a slower environment; in the worst case, it would take 6*100 seconds.
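Assuming the factor 6 in the message is the worst-case duration of one cycle in seconds (an interpretation; the message does not say so explicitly), the change works out as follows:

```latex
\begin{aligned}
T_{\text{before}} &= 6\,\text{s} \times 100 = 600\,\text{s} \\
T_{\text{after}}  &= 6\,\text{s} \times 5 = 30\,\text{s} < 180\,\text{s}\ (\text{new timeout})
\end{aligned}
```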

get_local_port unnecessarily used the getsockname syscall, and did so especially often when ACL is used. While at it, remove set_local_port (shift its logic to on_bind_error instead) and clean up dead code.
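A minimal sketch of the caching idea, assuming the port is stored once it is known instead of querying the socket on every call. `port_cache` and the `query` callback (standing in for a getsockname call) are hypothetical names, not vsomeip's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical illustration: the expensive lookup (e.g. getsockname) runs
// at most once per bound socket; later calls return the cached value.
class port_cache {
public:
    explicit port_cache(std::function<std::uint16_t()> query)
        : query_(std::move(query)) {}

    std::uint16_t get_local_port() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!port_) {
            port_ = query_();  // only the first call pays for the lookup
        }
        return *port_;
    }

    // Forget the cached value, e.g. after the socket is closed or re-bound.
    void invalidate() {
        std::lock_guard<std::mutex> lock(mutex_);
        port_.reset();
    }

private:
    std::function<std::uint16_t()> query_;
    std::optional<std::uint16_t> port_;
    std::mutex mutex_;
};
```

The sketch leaves invalidation to the caller; where exactly the real endpoint refreshes its port is not described in the commit.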

Add unit tests of usei to reproduce time-related issues. I tried to adapt a POC usei network test, but the network tests are black-box tests, and we cannot create routing_manager_stub or endpoint_manager_impl. Changing the interface was considered, but it would not pass review. Another option was to use debug symbols to find the private methods, but that would be too much effort. For these reasons, unit tests were developed.

All Android builds also define linux; plenty of portability guides state the same (which makes sense, as Android uses a Linux kernel). This eliminates most uses of ifdef ANDROID.

The test failed due to a delay of 6 ms, whereas the maximum expected delay was 5 ms. Debug logs have been activated, and high_resolution_clock has been replaced by steady_clock. An issue was found in the code generating the messages: it has a time drift.
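The message does not show how the drift itself was fixed. As a general illustration only: a periodic sender that sleeps for a fixed period after each send accumulates the processing time of every cycle, whereas deadlines computed from a fixed start point on `std::chrono::steady_clock` do not. `next_deadlines` and `send_periodically` are hypothetical helpers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

using clock_type = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

// Deadlines anchored at `start`: deadline k is start + k * period, so a
// late cycle does not shift any of the following ones.
std::vector<clock_type::time_point> next_deadlines(clock_type::time_point start,
                                                   clock_type::duration period,
                                                   std::size_t count) {
    std::vector<clock_type::time_point> deadlines;
    deadlines.reserve(count);
    for (std::size_t k = 1; k <= count; ++k) {
        deadlines.push_back(start + period * static_cast<clock_type::rep>(k));
    }
    return deadlines;
}

// Sends `count` messages, one per period, without accumulating drift.
template <typename Send>
void send_periodically(clock_type::duration period, std::size_t count, Send send) {
    const auto start = clock_type::now();
    for (const auto& deadline : next_deadlines(start, period, count)) {
        std::this_thread::sleep_until(deadline);  // absolute, not relative, wait
        send();
    }
}
```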

Came across missing details

The fake_socket tests already exceed the 1MB limit, which is anyhow hilariously small. Increase it by a few orders of magnitude.

4a0295b introduces a data race, caught by the CI, where:
- lusei restarts (which is stop+init+start)
- init writes local_, which is the address of the acceptor socket (in the case of lusei, something like /var/run/someip/vsomeip-1234)
- however, even though stop closes the acceptor socket, an accept_cbk can still execute, which, due to 4a0295b, also reads local_ for logging

The accept_cbk running in the background is concerning and needs to be solved; this commit only mitigates the race. And of course, both lusei and ltsei have the same race; there are just more tests using lusei.
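The commit states that the race is mitigated but not how. A common mitigation, shown here as a hypothetical sketch (`acceptor_info` is not vsomeip's class), is to guard the shared address with a mutex so that init() and a late accept callback never access it concurrently. As the message notes, this does not address the root cause, an accept callback running after stop().

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the endpoint state shared by init() and accept_cbk.
class acceptor_info {
public:
    // Called from init() on restart (stop + init + start).
    void set_local(std::string address) {
        std::lock_guard<std::mutex> lock(mutex_);
        local_ = std::move(address);
    }

    // Called from the accept callback for logging; returns a copy so the
    // caller never holds a reference into the shared string.
    std::string get_local() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return local_;
    }

private:
    mutable std::mutex mutex_;
    std::string local_;
};
```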

Add fair-sched to Valgrind. Run Valgrind with --fair-sched=yes, which lets created threads run with the same priority as the main application thread. Without it, Valgrind stalled application thread execution, and the app that started first got almost no CPU time. See: https://valgrind.org/docs/manual/manual-core.html and https://valgrind.org/docs/manual/manual-core.html#manual-core.pthreads_perf_sched

Enable compilation of usei_tests/ut_basic_tests with Boost 1.87+. Boost 1.87 retires asio::io_context::work, which was replaced by executor_work_guard.

Force remote subscriptions to be removed on host error. In some ECU modes, an emergency shutdown is executed instead of a graceful shutdown. This leads to vsomeipd not suspending, and consequently the vsomeip suspend command is not propagated to clients. Since such clients never receive the suspend command and the unit does not reboot, some of them never clean up the remote subscriptions map they manage. This container is used to detect whether a client is a new subscriber to an event (initial events are only sent to new subscribers). This PR forces the clean-up for scenarios where no graceful shutdown is performed, by clearing the container when the client detects and handles a lost connection to the host.
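A hypothetical sketch of the bookkeeping described above (`remote_subscriptions` and its methods are illustrative names, not vsomeip's): the container decides whether a subscriber is new, and therefore whether initial events are sent, and it is also cleared when the connection to the host is lost.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using event_id = std::uint16_t;
using client_id = std::uint16_t;

class remote_subscriptions {
public:
    // Returns true if `client` is a new subscriber of `event`, i.e. initial
    // events should be sent to it.
    bool subscribe(event_id event, client_id client) {
        return subscribers_[event].insert(client).second;
    }

    // Clear all entries when the connection to the host is lost, in addition
    // to the clean-up done on suspend, so stale entries cannot suppress
    // initial events after the host comes back.
    void on_host_connection_lost() { subscribers_.clear(); }

private:
    std::map<event_id, std::set<client_id>> subscribers_;
};
```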

As it was, the result of offer_stop_offer_test_client was not evaluated when the test ran under ctest. To fix this, keep track of the return codes of all started programs. Also remove unnecessary code.

Due to e8b6519 (netlink removal), libvsomeip receives an EADDRNOTAVAIL on bind in ltcei::connect before the network interface is ready. This by itself is expected and not a problem, but due to broken error handling, libvsomeip continues with the connect logic and does ASSIGN_CLIENT. Sending ASSIGN_CLIENT of course fails (the connection is not open) and causes a cascade of errors, which unfortunately only resolve on the ASSIGN_CLIENT_ACK timeout, which takes 3 s and heals the situation. There are failures precisely because of these extra 3 s. While at it, promote the ASSIGN_CLIENT/REGISTER_APPLICATION timeouts to errors, as these should never happen.
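A simplified, hypothetical model of the corrected control flow (`connector`, `bind_fn` and `send_assign_client` are illustrative, not ltcei's members): when bind fails, for example with EADDRNOTAVAIL because the interface is not up yet, connect stops and reports the error instead of continuing to ASSIGN_CLIENT on a connection that was never opened.

```cpp
#include <cassert>
#include <functional>
#include <system_error>
#include <utility>

class connector {
public:
    explicit connector(std::function<std::error_code()> bind_fn)
        : bind_fn_(std::move(bind_fn)) {}

    // Returns the bind error (if any); ASSIGN_CLIENT is only sent once the
    // connection is actually established.
    std::error_code connect() {
        if (const std::error_code ec = bind_fn_()) {
            // e.g. std::errc::address_not_available (EADDRNOTAVAIL):
            // expected before the network interface is ready, so report it
            // and let the caller schedule a reconnect.
            return ec;
        }
        connected_ = true;
        send_assign_client();
        return {};
    }

    int assign_client_sent() const { return assign_client_sent_; }

private:
    void send_assign_client() {
        if (connected_) {
            ++assign_client_sent_;  // stand-in for sending the command
        }
    }

    std::function<std::error_code()> bind_fn_;
    bool connected_ = false;
    int assign_client_sent_ = 0;
};
```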

Abort when an incorrect state is detected for critical syscall/libc calls such as recvfrom/sendto/epoll_wait. In these situations, the process using libvsomeip is handling file descriptors incorrectly (e.g. a double close with an invalid fd value), leading to a libvsomeip state that is incorrect and not recoverable. The abort causes a core dump, and platforms can decide on recovery strategies.

strerror is not thread-safe in older glibc, and in general it is not guaranteed to be thread-safe anyway. Its output is not important, as any developer should recognize the term "errno"; therefore, remove all of the calls.

If an application closes the epollfd that libvsomeip uses internally, the libvsomeip IO threads loop forever on EBADF. Therefore, react to EBADF from epoll_wait.
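A minimal Linux-only sketch of reacting to EBADF instead of retrying forever. The `poll_once` helper and its return values are hypothetical, and the message does not say what libvsomeip does after detecting the error: here EINTR is retried, while EBADF (and any other error) is reported so the IO loop can stop.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/epoll.h>
#include <unistd.h>

enum class poll_result { events, timeout, fatal };

// Waits once on `epfd`. EINTR is retried; EBADF (the epoll fd was closed,
// e.g. by the application) or any other error is reported as fatal so the
// IO thread can leave its loop instead of spinning.
poll_result poll_once(int epfd, int timeout_ms) {
    epoll_event events[8];
    for (;;) {
        const int n = ::epoll_wait(epfd, events, 8, timeout_ms);
        if (n > 0) return poll_result::events;
        if (n == 0) return poll_result::timeout;
        if (errno == EINTR) continue;  // interrupted by a signal: try again
        return poll_result::fatal;     // EBADF, EINVAL, ...: not recoverable here
    }
}
```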

The test is fragile, as there can be more client -> routing reconnects than expected. Adjust the check and add a comment.

This test was created to ensure that all clients subscribed to a remote service receive all notifications correctly when another client on the same device, subscribed to the same service, unsubscribes. Before the fix, the routing manager would leave the multicast group on which the service sent its notifications while clients were still subscribed to it, causing those clients to miss notifications sent by the service.

Fix the remaining race condition in udp_server_endpoint_impl during restart. During a restart, a problem remained with buffers being shared between the stopped and the newly started streaming. The solution is to allocate buffers dynamically. The provided unit test reproduces the problem and also detects an issue with a shared_ptr that was not protected.
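A hypothetical sketch of the idea (`receiver` and `receive_buffer` are illustrative names, not udp_server_endpoint_impl's members): each start allocates a fresh buffer held by a `std::shared_ptr`, and a pending receive keeps its own copy of that pointer, so a handler still completing from the stopped stream cannot write into the buffer used after the restart. The pointer itself is guarded by a mutex, since start() replaces it while handlers may read it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using receive_buffer = std::vector<std::uint8_t>;

class receiver {
public:
    // Each (re)start gets its own buffer instead of reusing a shared member.
    void start(std::size_t size) {
        auto fresh = std::make_shared<receive_buffer>(size);
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_ = std::move(fresh);
    }

    // What an asynchronous receive would capture in its completion handler.
    std::shared_ptr<receive_buffer> buffer_for_receive() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffer_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<receive_buffer> buffer_;
};
```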

- Removed Services/multicast, as the multicast address and port must be assigned to the eventgroup: different eventgroups of the same service can use different multicast addresses/ports.
- Removed Eventgroups/is_multicast, with which the eventgroup uses the multicast address/port defined in the service node. This is superseded by the eventgroup-specific multicast address/port definition.
- Removed servicegroups, a leftover from old versions of vsomeip.
- Removed routing-client-ports, a leftover from old versions of vsomeip.
- Added information on environment variables.
- Adapted the configuration test to these changes, as the deprecated JSON no longer makes sense to use.

Simplify the logger implementation and make it (mostly) lock-free. Refactor the logger implementation so that it is lock-free when logging to both DLT and the console. This should completely remove contention and significantly improve performance in a multithreaded environment. However, one atomic load per logged message remains, due to the out-of-band (and repeated) initialization of the logger. Fixing it would enable additional simplifications and optimizations. Only file logging still requires locking; there is no way around this without relying on POSIX specifics. However, this PR should also improve performance here, as the log file is no longer opened, flushed, and closed for every single message. This also incorporates the more general optimization changes that were supposed to be merged but somehow got lost in time.

Change tcp_tw_reuse to 2 to use the default Linux value found on most ECUs.

tcp_tw_reuse is a Linux kernel parameter that enables the reuse of TCP sockets in the TIME_WAIT state for new outgoing connections. It takes the following values:
- 0: disable TIME_WAIT socket reuse
- 1: enable TIME_WAIT socket reuse
- 2: enable TIME_WAIT socket reuse for loopback only (Linux default)

Fix invalid use of std::move. The mutex and the condition variables were moved and then used again. This could explain the BLOCKING CALL that led to a failure.
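`std::mutex` and `std::condition_variable` are not movable themselves, so the moved objects were presumably held through smart pointers or a similar wrapper (an assumption; the commit does not show the code). The hypothetical sketch below (`sync_state`, `signal_done`, `wait_done`) shows the valid pattern, passing a copy of the `std::shared_ptr`, and why moving it is a problem: the moved-from pointer is left empty, so any later lock through it would dereference null.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical synchronization state shared between a test and its worker.
struct sync_state {
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
};

// Takes its own copy of the shared_ptr; the caller's pointer stays valid.
void signal_done(std::shared_ptr<sync_state> state) {
    {
        std::lock_guard<std::mutex> lock(state->mutex);
        state->done = true;
    }
    state->cv.notify_one();
}

bool wait_done(const std::shared_ptr<sync_state>& state) {
    std::unique_lock<std::mutex> lock(state->mutex);
    return state->cv.wait_for(lock, std::chrono::milliseconds(100),
                              [&state] { return state->done; });
}
```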

Use only console output; do not use DLT or dlt-daemon, as there is a DISABLE_DLT option anyway. While at it, enable trace level for all tests.

These logs should enable better tracing of connection lifecycles. On the client side the log is expanded to mention the remote address, on the server side a new log is added to mention the remote connector.

Avoid a busy loop if a failure occurs during a multicast leave operation. The leave failed because MACsec reconfigured the network interface during normal execution. endpoint_manager_impl does not repeat the leave, but it repeats the join in a busy loop. Performing the leave operation even if it was already done avoids the error "address already in use" when the join is done again. That error cannot be ignored, because it also happens in the restart case, where it must be handled.

This is not enough to fully fix the busy loop, because the join keeps returning an error until the network interface is up again, so a delay has been added. One issue remains: endpoint_manager_impl does not repeat the leave, only the join. Handling this correctly would require recording the join state as unknown, then managing this new state and repeating the leave operation. That is too risky for something that should not happen.
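A hypothetical sketch of the retry policy described above (`rejoin_multicast`, `leave`, `join` and `max_attempts` are illustrative, not endpoint_manager_impl's interface): the leave is performed unconditionally before each join attempt, so a stale membership cannot cause "address already in use", and a failed join waits before the next attempt instead of retrying in a busy loop.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Returns true once the join succeeds, or false after `max_attempts`.
bool rejoin_multicast(const std::function<void()>& leave,
                      const std::function<bool()>& join,
                      std::chrono::milliseconds retry_delay,
                      int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        leave();  // harmless if already left; clears a possibly stale membership
        if (join()) {
            return true;
        }
        // The interface may still be down (e.g. reconfigured by MACsec):
        // back off instead of spinning.
        std::this_thread::sleep_for(retry_delay);
    }
    return false;
}
```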

Turns out that boost::system::errc::broken_pipe != boost::asio::error::broken_pipe. This was very, very painful to find out. It is also generally true: no system::errc will ever be equal to any asio::error, as they are inherently different error categories. Remove dead code, fix tests. Remove all uses of system::errc, because I never, ever want to deal with this again.
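A similar category mismatch can be reproduced with the C++ standard library, used here as an analogue rather than the Boost types from the commit: two `std::error_code` objects compare equal only if both category and value match, so an EPIPE reported in the system category differs from the code built from `std::errc::broken_pipe` (generic category), even though the numeric values are identical. `is_broken_pipe` is a hypothetical helper.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// True if `ec` means "broken pipe". Comparing an error_code with a
// std::errc value goes through an error_condition, i.e. through the
// category's equivalence mapping, not a plain category+value match.
bool is_broken_pipe(const std::error_code& ec) {
    return ec == std::errc::broken_pipe;
}
```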

It makes no sense to start the sender on rmc::init, because that causes libvsomeip to connect to vsomeipd on application::init, before any io thread is executing events.

If close is not called, the socket options can be overwritten by Boost, which can lead to an increased TIME_WAIT value. With this change, fake_socket required some changes in the receive_ handler cleanup to avoid lock-order inversion.
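A minimal POSIX sketch of the RAII rule this commit enforces (`socket_holder` is a hypothetical class, not vsomeip's endpoint): the destructor closes the descriptor if the owner has not already done so, so no socket outlives its owner by accident.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <utility>

class socket_holder {
public:
    explicit socket_holder(int fd) : fd_(fd) {}
    socket_holder(const socket_holder&) = delete;
    socket_holder& operator=(const socket_holder&) = delete;
    socket_holder(socket_holder&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}

    // Explicit close; safe to call more than once.
    void close() {
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }

    // Ensure closure even if close() was never called.
    ~socket_holder() { close(); }

    int fd() const { return fd_; }

private:
    int fd_;
};
```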

Remove duplicate shutdown closure of sockets. Both wait_until_sent and restart already call shutdown_and_close_socket, so there is no need to also call it before calling either of them. This PR addresses the following scenario, in which two extra shutdowns are called:

cei::shutdown_and_close_socket_unlocked: socket shutdown error (107): Transport endpoint is not connected endpoint > 0xffff980b8a80 socket state > 3
cei::shutdown_and_close_socket_unlocked: not recreating socket endpoint > 0xffff980b8a80 socket state > 0
cei::shutdown_and_close_socket_unlocked: socket was not open endpoint > 0xffff980b8a80 socket state > 1
cei::shutdown_and_close_socket_unlocked: socket has been reset endpoint > 0xffff980b8a80 socket state > 0
cei::shutdown_and_close_socket_unlocked: socket was not open endpoint > 0xffff980b8a80 socket state > 1
cei::shutdown_and_close_socket_unlocked: not recreating socket endpoint > 0xffff980b8a80 socket state > 0

Update local_endpoint ClientID on add_guest. The scenario was as follows: an endpoint was established between two applications, A and B. An STR occurred. The endpoint still existed but was disconnected. It tried to reconnect and succeeded. During this reconnection, application B re-registered to the routing host and changed its ClientID. Afterwards, app A received ON_AVAILABLE from app B with a new ClientID. App A tried to connect to this new ClientID, which has the same address, and failed with "Cannot assign requested address". This fix updates the local_endpoint map to synchronize the ClientID with what is updated in the guests map (when add_guest is called).

Work around compiler bug in GCC < 10 after recent logger optimizations. GCC < 10 does not accept a struct with member initializers as a template argument for std::atomic. This was fixed in later versions; however, as the affected struct is never used without explicit initialization, just remove the initializers.

Silly one, likely there for years, but only happens if libdlt is used

For versions of libdlt that support privacy-aware logging, ensure that all normal log messages are marked as public. It was determined that vsomeip never logs sensitive data through the normal logger. Trace data containing message payloads (and thus potentially privacy-relevant information) goes through a different code path and remains marked as private for now.

Add the SO_REUSEPORT option to mitigate the "Address already in use" issue. After resume, the TCP server that was closed during suspend received an unexpected error: Address already in use. The reason is not clear, as the DLT log shows that stop was called. The workaround is to use the SO_REUSEPORT option. The same option is added to the UDP unicast server. The UDP multicast server already has error handling and does not require it. This PR also aims to improve logging, particularly by tracking init, start and stop operations, and by monitoring active connections.

Update vsomeip-lib to v3.5.9

- Update local_endpoint ClientID on add_guest
- Make compile with GCC < 10 again
- tce: Remove duplicate shutdown closure of sockets
- Ensure socket closure on dtor
- rmc: remove sender start on rmc::init
- misc: fix use of non-asio error codes
- multicast, add repeat delay
- Expand logs to trace connections easier
- Minor change, typo when declaring endianness structure
- zuul: remove dlt
- flaky, subscribe_notify_test_one_event_two_eventgroups_tcp
- Change tcp_tw_reuse to 2
- Avoid locking in and improve performance of the logger
- Remove leftover configurations
- fix race condition with on_message_received_unlocked
- Create multicast group test
- tests: fix flaky test allow_reconnects
- NTF integration pipeline
- misc: react on EBADF for epoll_wait
- misc: remove use of strerror
- Add support for VSOMEIP_ABORT_ON_CRIT_SYSCALL_ERROR
- ltcei: fix connect logic
- fix and document event_test
- fix offer_stop_offer_test scripts
- Force remote subscriptions to be removed on host error
- Enable usei_tests/ut_basic_tests with boost 1.87+
- Post results to artifactory for batch pipelines
- add fair-sched to valgrind
- Adds remote information to client endpoint warning logs
- Adds vsomeip version to the cmake log
- lsei: fix race
- zuul: increase test output size
- lsei: minor log improvement
- flaky test, debounce_filter_test
- build: cleanup ifdef ANDROID
- usei, unit tests
- endpoint: fix get_local_port, remove set_local_port
- reduce time spent in test_restart_client_in_loop
- zuul: suppress sonarqube on catch-all exceptions
- document offer_test_local memcheck test
- doc/fix debounce_frequency_test
- fix/document debounce_callback_test scripts
- fix/document cyclic_event_tests scripts
- fix/document debounce_filter_test scripts
- fix test suspend_resume_test_initial
- fix shutdown of offer_test_service
- Adds ets and verification job to check (non-voting)
- add SO_REUSEPORT option + logging
- Mark all normal log messages as public
- misc: fix client-side-logging crash

vvcarvalho-csw closed this Oct 24, 2025

vvcarvalho-csw force-pushed the master branch from 4eaaadc to e89240c October 24, 2025 14:07

donatellob and others added 28 commits October 24, 2025 15:08

fix shutdown of offer_test_service

9ea9193

Implement a clean shutdown of offer_test_service. Previously, the application was interrupted with SIGTERM, and Valgrind sometimes reported memory leaks. With a clean shutdown, these reports no longer occur.

fix/document debounce_filter_test scripts

6b5e7e2

Add a return-code check and documentation for debounce_filter_test.

fix/document cyclic_event_tests scripts

cdb991b

Add a return-code check and documentation for cyclic_event_tests.

fix/document debounce_callback_test scripts

80e7c4e

Add a return-code check and documentation for debounce_callback_test.

doc/fix debounce_frequency_test

1420259

Add a return-code check and documentation for debounce_frequency_test.

document offer_test_local memcheck test

e3d2ab8

Document the offer_test_local test. Add VSOMEIP_APPLICATION_NAME to the test executables.

reduce time spent in test_restart_client_in_loop

37ad127

Reduce the number of cycles from 100 to 5 and increase the test timeout to 180 seconds. This test fails in the Valgrind tests, which run in a slower environment; in the worst case, it would take 6*100 seconds.

endpoint: fix get_local_port, remove set_local_port

23bb69f

get_local_port unnecessarily used the getsockname syscall, and did so especially often when ACL is used. While at it, remove set_local_port (shift its logic to on_bind_error instead) and clean up dead code.

build: cleanup ifdef ANDROID

bc66e12

All Android builds also define linux; plenty of portability guides state the same (which makes sense, as Android uses a Linux kernel). This eliminates most uses of ifdef ANDROID.

lsei: minor log improvement

c5fcfc0

Came across missing details

increase test output size

0a5e8da

The fake_socket tests already exceed the 1MB limit, which is anyhow hilariously small. Increase it by a few orders of magnitude.

Adds vsomeip version to the cmake log

687370f

Adds remote information to client endpoint warning logs

5b60055

Enable usei_tests/ut_basic_tests with boost 1.87+

fa04537

Enable compilation of usei_tests/ut_basic_tests with Boost 1.87+. Boost 1.87 retires asio::io_context::work, which was replaced by executor_work_guard.

fix offer_stop_offer_test scripts

b8ba927

As it was, the result of offer_stop_offer_test_client was not evaluated when the test ran under ctest. To fix this, keep track of the return codes of all started programs. Also remove unnecessary code.

fix and document event_test

615b199

misc: remove use of strerror

777c168

strerror is not thread-safe in older glibc, and in general it is not guaranteed to be thread-safe anyway. Its output is not important, as any developer should recognize the term "errno"; therefore, remove all of the calls.

misc: react on EBADF for epoll_wait

6f8b867

If an application closes the epollfd that libvsomeip uses internally, the libvsomeip IO threads loop forever on EBADF. Therefore, react to EBADF from epoll_wait.

tests: fix flaky test allow_reconnects

ab4d268

The test is fragile, as there can be more client -> routing reconnects than expected. Adjust the check and add a comment.

Victor Carvalho and others added 19 commits October 24, 2025 15:08

flaky, subscribe_notify_test_one_event_two_eventgroups_tcp

98a530c

Fix invalid use of std::move. The mutex and the condition variables were moved and then used again. This could explain the BLOCKING CALL that led to a failure.

remove dlt

c0efcc1

Use only console output; do not use DLT or dlt-daemon, as there is a DISABLE_DLT option anyway. While at it, enable trace level for all tests.

Minor change, typo when declaring endianness structure

e9429c5

Expand logs to trace connections easier

0768dd3

These logs should enable better tracing of connection lifecycles. On the client side the log is expanded to mention the remote address, on the server side a new log is added to mention the remote connector.

rmc: remove sender start on rmc::init

b8e6b34

It makes no sense to start the sender on rmc::init, because that causes libvsomeip to connect to vsomeipd on application::init, before any io thread is executing events.

Ensure socket closure on dtor

5f4a33b

If close is not called, the socket options can be overwritten by Boost, which can lead to an increased TIME_WAIT value. With this change, fake_socket required some changes in the receive_ handler cleanup to avoid lock-order inversion.

misc: fix client-side-logging crash

88dc74b

Silly one, likely there for years, but only happens if libdlt is used

vvcarvalho-csw reopened this Oct 24, 2025

fcmonteiro approved these changes Oct 27, 2025


duartenfonseca approved these changes Oct 27, 2025


fcmonteiro merged commit bb35c97 into COVESA:master Oct 28, 2025
4 checks passed
