@duartenfonseca
Collaborator

No description provided.

Jorge Saraiva and others added 7 commits July 7, 2025 12:40
Summary
Removes access member sec rule from received responses
Details
The rule vsomeip_sec_policy_is_client_allowed_to_access_member was
considered redundant and is therefore removed.
This PR also resolves the regression introduced with 0622382.
Details
When an endpoint is no longer reachable (after a suspend, for example),
how long the TCP connection takes to time out depends on its state:

- with no unacknowledged data, it depends on TCP keepalive, and thus
  on the underlying kernel configuration (interval, number of probes, ...)
- with unacknowledged data, it depends on the TCP retransmission
  configuration: the number of retries and the exponential timeouts

It is known that devices sometimes have very aggressive
TCP keepalive parameters (3 probes, 1 sec).
The TCP retransmission parameters, however, are usually left at their defaults, and it can
take many minutes until a TCP connection times out due to
retransmission exhaustion.
This can lead to Android <-> Node0
connections timing out quickly, after ~3s, when Android suspends. However,
connections which happen to have a lot of data flowing through (and thus
unacknowledged data) would often never time out. This is problematic, as:
a) once Android resumes, those "stale" connections need a TCP
retransmission to recover, which can take up to 2 minutes
b) in the meantime, VSIP was accumulating gigantic buffers, risking an OOM
In some cases, it even seems deterministic: the connection between VSIP and
an app would never time out after Android suspends, due to the sheer volume
of data that goes through.
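To put a rough number on how long retransmission exhaustion can take, the sketch below sums the exponential backoff timers. It assumes stock Linux values that are not stated in this PR: tcp_retries2 = 15, a minimum retransmission timeout of 200 ms and a maximum of 120 s. The helper name is hypothetical, and real timeouts also depend on the measured round-trip time.

```cpp
#include <algorithm>

// Hypothetical helper, not part of vsomeip: estimates how long the kernel
// keeps retransmitting before it gives up on a connection.
// Assumption: the retransmission timeout starts at rto_min_s and doubles
// after every attempt until it is capped at rto_max_s.
double retransmission_give_up_seconds(int retries,
                                      double rto_min_s = 0.2,
                                      double rto_max_s = 120.0) {
    double total_s = 0.0;
    double rto_s = rto_min_s;
    // One timer for the original transmission plus one per retransmission.
    for (int attempt = 0; attempt <= retries; ++attempt) {
        total_s += rto_s;
        rto_s = std::min(rto_s * 2.0, rto_max_s);
    }
    return total_s;
}
```

Under these assumptions, retransmission_give_up_seconds(15) comes out at about 924.6 seconds, a little over 15 minutes. The 120 s cap on a single timer is also consistent with the "up to 2 minutes" recovery delay described above.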
To cope with this mess, use TCP_USER_TIMEOUT, which causes a
connection to time out when data remains unacknowledged for more
than a certain amount of time.
There are many advantages to TCP_USER_TIMEOUT:

- it is extremely granular: timeouts can be set in the order of
  milliseconds
  (which might be interesting for vsomeip, as all communication is local)
- it overrides both the TCP keepalive and TCP retransmission mechanisms, so
  that it becomes easier to reason about when a connection will time out
- it ties in quite nicely with TCP keepalive, which ensures that there is
  periodically some data to acknowledge
  (without TCP keepalive, a connection would otherwise remain stale)

While at it, disable keepalive for same-SoC/VM connections.
Enables Valgrind (massif and memcheck) and removes Valgrind Helgrind
Details
Activate the flag for NOT skipping Valgrind, to also track subprocesses
Removal of Valgrind Helgrind (docs and jobs)
Summary
Valgrind memcheck test corrections

Details
Valgrind memcheck test corrections
Short summary of the suppressions used:
offer_thread_bind_tls
Valgrind can't follow TLS memory until the thread is completely destroyed and
therefore can't confirm whether the thread cleans it up after exit, so it marks
it as "possibly lost".
The thread is joined in the object (offer_test_service) destructor.
dispatcher_threads_tls, stop_thread_tls, io_threads_tls
When app_->start() is called, threads are created internally.
Valgrind can't track the cleanup of TLS allocations at thread shutdown, so it marks
them as "possibly lost".
All other suppressions (load_policy_data)
These errors come from the use of the function configuration_impl::load_policy_data.
There is no leak directly within the configuration_element class or its members. The structure
uses value semantics (std::string and boost::property_tree::ptree).
False positive, likely triggered by external resources or Boost usage.
Summary
Some tests report status PASSED even when some assertions fail
Details
In some tests that use .sh starter files, the check for the exit codes of the launched
binaries is missing.
This leads to a false positive: the test is reported as PASSED even if some of the assertions fail.
Summary
Create a test to verify if the availability handler is working correctly.
Details
Previously, an issue was detected where, after a handler was re-registered, it would not
report the service as available again when the service stopped offering and then started offering again.
Test Case:

Server: offer a service
Client: register an availability handler for it → UNKNOWN
Client: request the service
Client: wait for AVAILABLE
Client: re-register the availability handler → AVAILABLE
Server: stop the service → UNAVAILABLE
Server: offer again → a new AVAILABLE should appear
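The expected sequence of notifications can be written down as a small self-contained model. This is only a sketch of the behaviour the test checks, not the vsomeip API and not the test code itself: availability_model, its methods and run_reregister_scenario are hypothetical names, and the model assumes that registering a handler immediately reports the current state and that the state stays UNKNOWN until the client has requested the service.

```cpp
#include <functional>
#include <utility>
#include <vector>

enum class availability { UNKNOWN, AVAILABLE, UNAVAILABLE };

// Hypothetical toy model of the client-side view of one service.
class availability_model {
public:
    using handler_t = std::function<void(availability)>;

    void offer() { offered_ = true; notify(); }
    void stop_offer() { offered_ = false; notify(); }
    void request() { requested_ = true; notify(); }

    // Replaces any previous handler and reports the current state at once.
    void register_handler(handler_t handler) {
        handler_ = std::move(handler);
        handler_(state());
    }

private:
    availability state() const {
        if (!requested_) return availability::UNKNOWN;
        return offered_ ? availability::AVAILABLE : availability::UNAVAILABLE;
    }
    // State changes are only reported once the service has been requested.
    void notify() {
        if (handler_ && requested_) handler_(state());
    }

    bool offered_ = false;
    bool requested_ = false;
    handler_t handler_;
};

// Replays the steps of the test case and returns what the handler observed.
std::vector<availability> run_reregister_scenario() {
    std::vector<availability> seen;
    auto record = [&seen](availability a) { seen.push_back(a); };

    availability_model model;
    model.offer();                   // Server: offer a service
    model.register_handler(record);  // Client: register handler
    model.request();                 // Client: request the service
    model.register_handler(record);  // Client: re-register the handler
    model.stop_offer();              // Server: stop the service
    model.offer();                   // Server: offer again
    return seen;
}
```

In this model the re-registered handler must still receive the final AVAILABLE; the regression described above corresponds to that last notification going missing.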
Details
The out-of-memory condition that triggers the yellowscreen happens when clients
repeatedly attempt to reconnect.
The root cause is a delayed TCP ACK for the registration message, which
leads clients to continuously retry the connection.
In this test, a service application (registration_check_service.cpp) attempts to register
with the routing manager.
During this process, a Python script (someip_sequence_checker.py) is activated to
intercept outgoing TCP packets using NetfilterQueue and Scapy, monitoring for a
specific sequence of SOME/IP protocol messages during the registration phase.
@duartenfonseca duartenfonseca merged commit ad4b688 into master Jul 7, 2025
2 checks passed