-
Notifications
You must be signed in to change notification settings - Fork 766
vSomeIP 3.5.8 Release #956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
b46b5be to
10f78e4
Compare
5828f90 to
15ceb66
Compare
Fixes issues with suspend resume tests by introducing sleeps that helps when processing data in systems with less resources. Before these changes, there were multiple issues with timing when threads were being created and SOME/IP packets were being received. The first sleep is done in client side in the start. This is done since in systems with less resources, the runner_ thread was created after the condition variable (cv) in start function was notified. What this lead to was that the cv in run would never be notified since the wait would start after it was notified in start. The other change is done in service when it receives and unsubscribe. Now the service sleeps for longer, since there was an issue where it would no process the subscribes/unsubscribes from client in a correct order, leading to the test failing. Now with the 3 seconds sleeps the correct order is ensured and the test has not failed with this change.
std::filesystem is supported since C++17, no need for boost. On my PC (WSL+Debian), boost::filesystem has a linking problem. However, the main motivation is to modernise vsomeip.
Add configuration for TCP keepalive parameters, also for along TCP_USER_TIMEOUT.
Covers the ON_UNAVAILABLE that applications see when underlying connections break
Fix availability handler client deadlock. Sporadically, the availability tests failed by timeout since the client application didn't terminate properly and continued on loop trying to connect to host. The client shutdown thread was blocked waiting for the dispatch thread which was trying to acquire the client inter-process mutex in the availability callback, which was held by the client in the stop method. Why sporadic? It depends on (non-deterministic) locking order of client mutex between the dispatch thread and the stop method.
Adds a network test that checks whether the epsilon change function is being correctly used to debounce outgoing events.
With TCP communication, netlink is used by a host vsomeip application to determine when the "local" interface is ready. Instead of using netlink (which is Linux-specific), use IP_FREEBIND (also Linux-specific) in order to bind the local TCP ports regardless of network state. It is much simpler, and bypasses the problem of waiting for the local interface. Same is done in rmc, see e8b6519
If an application (or libvsomeip itself) accidentally closes a descriptor, following syscalls receive EBADF. There are plenty of libraries (e.g., libsystemd, libudev, ...) that will immediately SIGABRT on such a case, to signal application misbehavior It is somewhat difficult to do the same in libvsomeip because of the use of boost asio, but follow the same principle, and add close(2) and other wrappers that react to EBADF.
Ensures that all accesses to targets_ are synchronised by mutex_. There is a segfault when usei::send_queued_unlocked performs the following operation: const auto its_entry = _it->second.queue_.front(); The back trace is \#1 __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x402e2cbc6794f03) \#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x402e2cbc6794efb) \#3 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (__r=..., this=<synthetic pointer>) \#4 std::__shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=<synthetic pointer>) \#5 std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >::shared_ptr (this=<synthetic pointer>) \#6 std::pair<std::shared_ptr<std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned int>::pair (this=<synthetic pointer>) \#7 vsomeip_v3::udp_server_endpoint_impl::send_queued_unlocked (this=this@entry=0x7f84002180, _it=...) \#8 vsomeip_v3::udp_server_endpoint_impl::send_queued (this=0x7f84002180, _it=...) \#9 vsomeip_v3::server_endpoint_impl<boost::asio::ip::udp>::send_cbk (this=0x7f84002180, This means that it, which is of type target_data_iterator_type, pointed to corrupted data. It had been retrieved from server_endpoint_impl::send_cbk, which obtained it as follows std::lock_guard<std::mutex> its_lock(mutex_); auto it = targets_.find(_key); if (it == targets_.end()) return; If all accesses to targets_ are synchronised using mutex_, it should not be possible to corrupt the it pointer. This is what is done in this PR.
Remove leftovers from application signal handling logic, see 2709230. While at it, improve a couple of logs.
Check if a service is available before sending a request to it. This would prevent us from trying to process a request which might not reach its destination, and allows us to immediately report an error to the application. Process offer service before on_availability callback being invoked, with previous order, clients could send a request while the service is still not added to the control map. Npdu test is updated to enforce the clients to request services before sending any request type message, without the request, the host will not send the ADD_SERVICE routing info command and consequently on the client side the service will constantly be assessed as unavailable.
Created a new CMake flag, ANDROID_CI_BUILD, for building with NDK. This changes are only to be used on the CI, and with this, a check job will verify if every PR builds for ANDROID. This will not influence the regular ANDROID build.
Check that connection wasn't stopped before to repeat the receive. After refactoring udp_server_endpoint_impl, there is a lifecycle_index which is used to check whether the result of an asynchronous operation must be ignored. The access to this variable cannot be synchronized because the call to on_message_received_unlocked must be lock-free. But after calling on_message_received_unlocked, we must check again the value of lifecycle_index before to repeat the asynchronous operation. And this second check was missing. Without this check, the asynchronous operation can be repeated, even if a stop was requested in between.
Fixed a minor threading issue in the routing handler implementation. Problem reported while executing the test offer_test_local in CI.
The function that automatically restored joins at startup was defective. The function was faulty because it was performing invalid checks. The need behind this function is unknown. It could be a performance optimization. There were two options: delete it because it seems redundant, or fix it. The second option has been chosen. This function checked whether the group was joined successfully before. But after a stop, this information is no longer valid. This check has been removed. It also checked whether the socket has been created previously. However, the socket may have been created by set_multicast_option, in which case we still need to join the groups. This check has been removed as well. We can remove both checks because redundant join operations are ignored by set_multicast_option.
Problem scenario: Client successfully connects for Routing HOST; Before the client starts the Registration process it receives EOF (unknown reasons); Client retries to reconnect with HOST, however host denies because its connection with this client already exists; Client gets stuck in this loop. The main issue is that, because the TCP keep-alive is disabled, for clients in the same VM/SOC, the HOST never discovers that it lost the connection. With keep-alive enabled, this process is recoverable, HOWEVER, it takes more time, equal to what are the tcp_keekaplive settings. Solution: The proposed is to simply replace the previous connection, as it is a faster recovery. Also adds test to replicate the issue. Adds to the socket_manager the ability to allow the disconnection of only one side of the endpoint.
Summary Infer routing information (client-id, guest address) for a local server connection (e.g., ltsei or lusei) from the first received message Details It is necessary to fundamentally solve the following race condition: client -> server connection breaks; implementation also breaks server -> client as a reaction client does REQUEST_SERVICE_ID routing sends ROUTING_INFO+CONFIG_ID to both client+server assume server receives ROUTING_INFO first assume server -> client ECONNRESET is received now (and not earlier) server erases ROUTING_INFO due to 5. assume client receives ROUTING_INFO second, sends SUBSCRIBE to server server does not respond due to lack of routing info Fundamentally, this race (and similar) happens because: a. routing information comes through routing -> server connection b. routing information is cleaned based on client <-> server connection The race can only be solved with strict order of events (difficult for events such as connection breaks), or by sending the information in the same channel. This commit does the later, using as a starting point work already done in fe4e94c. While at it: add test coverage + "command-peeking" capability (based on Kai's work) add logging of own client-id in emb use a single registration thread, fixes a re-registration race remove pending sub logic, no longer necessary with routing inference enable connection break tests, fixed with this commit format internal.hpp.in, was forgotten in 7e60ee2.
Tombstone fix: In android boost::property_tree::ptree_error does not inherit from std::exception. Change the catch to "..." as we are also ignoring it. Log fix: The start function is called on every receive_cbk, so the log can not be there. This PR moves log from the connection to the server_endpoint accept_cbk. New log example: ltsei::accept_cbk: server(169.254.87.2:31000) accepted connection with client(169.254.87.2:31495), endpoint > 0x55555563de60 cmake fix: update cmake_minimum_required to be the same every where.
Summary Remove superfluous semicolon that causes compilation failure with -pedantic and certain toolchains.
Changes: - Fix tombstone on android application load and spamming log - routing: infer routing info - Mitigate to add already existing connection - Set test results artifact expiration - udp-server: fix automatic rejoin operations at startup - routing manager implementation, tiny threading issue - fix regression introduced by usei refactoring - zuul: make clang-format a voting check - Android CI Build - Check availability on send - application_impl: minor cleanup - control access to targets_ - misc: react to EBADF - rms: drop use of netlink - Revert "NO-TICKET: remove dependency to boost filesystem" - Debounce Epsilon Tests - Solve availability test deadlock - rms: add missing ON_UNAVAILABLE log - override host kernel TCP keepalive params - remove dependency to boost filesystem - suspend resume tests fix - Fix E2E protection checker counter increment (#889) - Fix queue size recheck condition - app: fix thread names on ASSIGN_CLIENT_ACK_ID - flaky test someip_application_init_test.multithread_init - Document offer_stop_offer_test - Remove superfluous semicolon - fix github actions (#940)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Commit 211cc6a should address these topics: