vSomeIP 3.5.8 Release #956

vvcarvalho-csw · 2025-09-16T13:05:28Z

Commit 211cc6a should address these topics:

Fixes issues with suspend resume tests by introducing sleeps that helps when processing data in systems with less resources. Before these changes, there were multiple issues with timing when threads were being created and SOME/IP packets were being received. The first sleep is done in client side in the start. This is done since in systems with less resources, the runner_ thread was created after the condition variable (cv) in start function was notified. What this lead to was that the cv in run would never be notified since the wait would start after it was notified in start. The other change is done in service when it receives and unsubscribe. Now the service sleeps for longer, since there was an issue where it would no process the subscribes/unsubscribes from client in a correct order, leading to the test failing. Now with the 3 seconds sleeps the correct order is ensured and the test has not failed with this change.

std::filesystem is supported since C++17, no need for boost. On my PC (WSL+Debian), boost::filesystem has a linking problem. However, the main motivation is to modernise vsomeip.

Add configuration for TCP keepalive parameters, also for along TCP_USER_TIMEOUT.

Covers the ON_UNAVAILABLE that applications see when underlying connections break

Fix availability handler client deadlock. Sporadically, the availability tests failed by timeout since the client application didn't terminate properly and continued on loop trying to connect to host. The client shutdown thread was blocked waiting for the dispatch thread which was trying to acquire the client inter-process mutex in the availability callback, which was held by the client in the stop method. Why sporadic? It depends on (non-deterministic) locking order of client mutex between the dispatch thread and the stop method.

Adds a network test that checks whether the epsilon change function is being correctly used to debounce outgoing events.

Reverts 544749b

With TCP communication, netlink is used by a host vsomeip application to determine when the "local" interface is ready. Instead of using netlink (which is Linux-specific), use IP_FREEBIND (also Linux-specific) in order to bind the local TCP ports regardless of network state. It is much simpler, and bypasses the problem of waiting for the local interface. Same is done in rmc, see e8b6519

If an application (or libvsomeip itself) accidentally closes a descriptor, following syscalls receive EBADF. There are plenty of libraries (e.g., libsystemd, libudev, ...) that will immediately SIGABRT on such a case, to signal application misbehavior It is somewhat difficult to do the same in libvsomeip because of the use of boost asio, but follow the same principle, and add close(2) and other wrappers that react to EBADF.

Ensures that all accesses to targets_ are synchronised by mutex_. There is a segfault when usei::send_queued_unlocked performs the following operation: const auto its_entry = _it->second.queue_.front(); The back trace is \#1 __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x402e2cbc6794f03) \#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x402e2cbc6794efb) \#3 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (__r=..., this=<synthetic pointer>) \#4 std::__shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=<synthetic pointer>) \#5 std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >::shared_ptr (this=<synthetic pointer>) \#6 std::pair<std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned int>::pair (this=<synthetic pointer>) \#7 vsomeip_v3::udp_server_endpoint_impl::send_queued_unlocked (this=this@entry=0x7f84002180, _it=...) \#8 vsomeip_v3::udp_server_endpoint_impl::send_queued (this=0x7f84002180, _it=...) \#9 vsomeip_v3::server_endpoint_impl<boost::asio::ip::udp>::send_cbk (this=0x7f84002180, This means that it, which is of type target_data_iterator_type, pointed to corrupted data. It had been retrieved from server_endpoint_impl::send_cbk, which obtained it as follows std::lock_guard<std::mutex> its_lock(mutex_); auto it = targets_.find(_key); if (it == targets_.end()) return; If all accesses to targets_ are synchronised using mutex_, it should not be possible to corrupt the it pointer. This is what is done in this PR.

Remove leftovers from application signal handling logic, see 2709230. While at it, improve a couple of logs.

Check if a service is available before sending a request to it. This would prevent us from trying to process a request which might not reach its destination, and allows us to immediately report an error to the application. Process offer service before on_availability callback being invoked, with previous order, clients could send a request while the service is still not added to the control map. Npdu test is updated to enforce the clients to request services before sending any request type message, without the request, the host will not send the ADD_SERVICE routing info command and consequently on the client side the service will constantly be assessed as unavailable.

Created a new CMake flag, ANDROID_CI_BUILD, for building with NDK. This changes are only to be used on the CI, and with this, a check job will verify if every PR builds for ANDROID. This will not influence the regular ANDROID build.

Check that connection wasn't stopped before to repeat the receive. After refactoring udp_server_endpoint_impl, there is a lifecycle_index which is used to check whether the result of an asynchronous operation must be ignored. The access to this variable cannot be synchronized because the call to on_message_received_unlocked must be lock-free. But after calling on_message_received_unlocked, we must check again the value of lifecycle_index before to repeat the asynchronous operation. And this second check was missing. Without this check, the asynchronous operation can be repeated, even if a stop was requested in between.

Fixed a minor threading issue in the routing handler implementation. Problem reported while executing the test offer_test_local in CI.

The function that automatically restored joins at startup was defective. The function was faulty because it was performing invalid checks. The need behind this function is unknown. It could be a performance optimization. There were two options: delete it because it seems redundant, or fix it. The second option has been chosen. This function checked whether the group was joined successfully before. But after a stop, this information is no longer valid. This check has been removed. It also checked whether the socket has been created previously. However, the socket may have been created by set_multicast_option, in which case we still need to join the groups. This check has been removed as well. We can remove both checks because redundant join operations are ignored by set_multicast_option.

Problem scenario: Client successfully connects for Routing HOST; Before the client starts the Registration process it receives EOF (unknown reasons); Client retries to reconnect with HOST, however host denies because its connection with this client already exists; Client gets stuck in this loop. The main issue is that, because the TCP keep-alive is disabled, for clients in the same VM/SOC, the HOST never discovers that it lost the connection. With keep-alive enabled, this process is recoverable, HOWEVER, it takes more time, equal to what are the tcp_keekaplive settings. Solution: The proposed is to simply replace the previous connection, as it is a faster recovery. Also adds test to replicate the issue. Adds to the socket_manager the ability to allow the disconnection of only one side of the endpoint.

Summary Infer routing information (client-id, guest address) for a local server connection (e.g., ltsei or lusei) from the first received message Details It is necessary to fundamentally solve the following race condition: client -> server connection breaks; implementation also breaks server -> client as a reaction client does REQUEST_SERVICE_ID routing sends ROUTING_INFO+CONFIG_ID to both client+server assume server receives ROUTING_INFO first assume server -> client ECONNRESET is received now (and not earlier) server erases ROUTING_INFO due to 5. assume client receives ROUTING_INFO second, sends SUBSCRIBE to server server does not respond due to lack of routing info Fundamentally, this race (and similar) happens because: a. routing information comes through routing -> server connection b. routing information is cleaned based on client <-> server connection The race can only be solved with strict order of events (difficult for events such as connection breaks), or by sending the information in the same channel. This commit does the later, using as a starting point work already done in fe4e94c. While at it: add test coverage + "command-peeking" capability (based on Kai's work) add logging of own client-id in emb use a single registration thread, fixes a re-registration race remove pending sub logic, no longer necessary with routing inference enable connection break tests, fixed with this commit format internal.hpp.in, was forgotten in 7e60ee2.

Tombstone fix: In android boost::property_tree::ptree_error does not inherit from std::exception. Change the catch to "..." as we are also ignoring it. Log fix: The start function is called on every receive_cbk, so the log can not be there. This PR moves log from the connection to the server_endpoint accept_cbk. New log example: ltsei::accept_cbk: server(169.254.87.2:31000) accepted connection with client(169.254.87.2:31495), endpoint > 0x55555563de60 cmake fix: update cmake_minimum_required to be the same every where.

Summary Remove superfluous semicolon that causes compilation failure with -pedantic and certain toolchains.

Changes: - Fix tombstone on android application load and spamming log - routing: infer routing info - Mitigate to add already existing connection - Set test results artifact expiration - udp-server: fix automatic rejoin operations at startup - routing manager implementation, tiny threading issue - fix regression introduced by usei refactoring - zuul: make clang-format a voting check - Android CI Build - Check availability on send - application_impl: minor cleanup - control access to targets_ - misc: react to EBADF - rms: drop use of netlink - Revert "NO-TICKET: remove dependency to boost filesystem" - Debounce Epsilon Tests - Solve availability test deadlock - rms: add missing ON_UNAVAILABLE log - override host kernel TCP keepalive params - remove dependency to boost filesystem - suspend resume tests fix - Fix E2E protection checker counter increment (#889) - Fix queue size recheck condition - app: fix thread names on ASSIGN_CLIENT_ACK_ID - flaky test someip_application_init_test.multithread_init - Document offer_stop_offer_test - Remove superfluous semicolon - fix github actions (#940)

vvcarvalho-csw force-pushed the master branch from b46b5be to 10f78e4 Compare September 16, 2025 13:37

vvcarvalho-csw closed this Sep 30, 2025

vvcarvalho-csw force-pushed the master branch from 5828f90 to 15ceb66 Compare September 30, 2025 08:51

andrefesilva and others added 18 commits September 30, 2025 09:52

remove dependency to boost filesystem

544749b

std::filesystem is supported since C++17, no need for boost. On my PC (WSL+Debian), boost::filesystem has a linking problem. However, the main motivation is to modernise vsomeip.

override host kernel TCP keepalive params

34dc9d1

Add configuration for TCP keepalive parameters, also for along TCP_USER_TIMEOUT.

rms: add missing ON_UNAVAILABLE log

c5bb0d4

Covers the ON_UNAVAILABLE that applications see when underlying connections break

Debounce Epsilon Tests

3224c8f

Adds a network test that checks whether the epsilon change function is being correctly used to debounce outgoing events.

Revert "remove dependency to boost filesystem"

e8a1c94

Reverts 544749b

application_impl: minor cleanup

834750c

Remove leftovers from application signal handling logic, see 2709230. While at it, improve a couple of logs.

Android CI Build

a986179

Created a new CMake flag, ANDROID_CI_BUILD, for building with NDK. This changes are only to be used on the CI, and with this, a check job will verify if every PR builds for ANDROID. This will not influence the regular ANDROID build.

routing manager implementation, tiny threading issue

598abb1

Fixed a minor threading issue in the routing handler implementation. Problem reported while executing the test offer_test_local in CI.

vvcarvalho-csw reopened this Sep 30, 2025

vvcarvalho-csw force-pushed the master branch from ad3facb to 9a4a245 Compare October 8, 2025 07:24

Victor Carvalho and others added 3 commits October 9, 2025 16:08

Remove superfluous semicolon (#1314)

7eb855c

Summary Remove superfluous semicolon that causes compilation failure with -pedantic and certain toolchains.

vvcarvalho-csw force-pushed the master branch from 9a4a245 to 8506280 Compare October 9, 2025 15:33

fcmonteiro merged commit e89240c into COVESA:master Oct 13, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

vSomeIP 3.5.8 Release #956

vSomeIP 3.5.8 Release #956

Uh oh!

vvcarvalho-csw commented Sep 16, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vSomeIP 3.5.8 Release #956

vSomeIP 3.5.8 Release #956

Uh oh!

Conversation

vvcarvalho-csw commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vvcarvalho-csw commented Sep 16, 2025 •

edited

Loading