-
Notifications
You must be signed in to change notification settings - Fork 30
Addition of network abstraction layer to separate socket implementation code from network communication logic #508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…create_clock_server, because there are no plans using other network stacks rather than UDP.
…e. && Change all read() write(), and shutdown() to use netdrv
…work driver is initialized, and send default values when not initialized.
edwardalee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't made it all the way though, but it's looking pretty good! One issue is that while I think "network abstraction" is a good term to use for the general architecture, I find the resulting names unnecessarily long. I suggest dropping the "abstraction" in most cases. E.g.:
fed->fed_net_abstraction -> fed->net
write_to_net_abstraction -> write_to_net
write_to_net_abstraction_fail_on_error -> write_to_net_fail_on_error
read_from_net_abstraction_fail_on_error -> read_from_net_fail_on_error
Hello Prof. Lee, thank you for your comment. I shortened most of the names. net_abstractions_for_inbound_p2p_connections -> net_for_inbound_p2p_connections |
network/impl/src/socket_common.c
Outdated
|
|
||
| #include "util.h" | ||
| #include "socket_common.h" | ||
| #include "util.h" // LF_MUTEX_UNLOCK(), logging.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #include "util.h" // LF_MUTEX_UNLOCK(), logging.h | |
| #include <logging.h> | |
| #include "util.h" // LF_MUTEX_UNLOCK() |
core/federated/RTI/rti_remote.c
Outdated
| } | ||
|
|
||
| void send_physical_clock(unsigned char message_type, federate_info_t* fed, socket_type_t socket_type) { | ||
| void send_physical_clock(unsigned char message_type, federate_info_t* fed, bool use_UDP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think socket_type == UDP is more readable than true. I recommend changing this back to using the socket_type_t enum unless there is a pressing reason to change this to bool.
core/federated/clock-sync.c
Outdated
| } | ||
|
|
||
| int handle_T1_clock_sync_message(unsigned char* buffer, int socket, instant_t t2) { | ||
| int handle_T1_clock_sync_message(unsigned char* buffer, void* socket_or_net, instant_t t2, bool use_udp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument name here is inconsistent with the one above: use_UDP vs. use_udp.
|
Non-deterministically failing tests:
All fixed. |
hokeun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Just left one minor comment.
@edwardalee Hello, Professor. Do you mind taking a look once again? I finally fixed all CI tests failing problems. |
|
Will do later this week |
edwardalee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay.
This is really a significant piece of work. This time, I focused on net_abstraction.h and I have a number of questions. Some of those I've entered in the code. Overall, this looks like a good first step, but the abstraction still looks pretty "leaky" to me. There are a lot of specifics that leak into the abstraction, like arguments that are there only because the RTI and the federates handle error conditions differently. It seems like this abstraction should not be making any distinction between RTI, federate, or any other user of the abstraction. Let the caller handle errors however they like.
Most importantly, net_abstraction.h needs some documentation on how to use it.
Writing such documentation might reveal some weaknesses in the abstraction design.
What you currently do is this, for example:
// Create a network abstraction.
net_abstraction_t net = initialize_net();
// Set the received host name and port to the network abstraction.
set_server_port(net, uport);
set_server_hostname(net, hostname);
// Create the client network abstraction.
create_client(net);
if (connect_to_net(net) < 0) {
lf_print_error_and_exit("Failed to connect to federate.");
}
(there is nearly identical code in connect_to_RTI)
What this is really doing is this:
// Allocate a network abstraction.
net_abstraction_t net = initialize_net();
// Set the received host name and port to the network abstraction.
((socket_priv_t*)net)->server_port = uport;
((socket_priv_t*)net)->server_hostname = hostname;
// Create the client network abstraction.
create_client(net);
if (connect_to_net(net) < 0) {
lf_print_error_and_exit("Failed to connect to federate.");
}
It is confusing to me to initialize, then initialize some more, then create, then connect.
Also, create_client is always followed by connect_to_net, so why are these two different functions? Maybe it be better to do something like this:
socket_connection_parameters_t params;
params.type = "TCP";
params.port = uport;
params.server_hostname = hostname;
net_abstraction_t net = connect_to_net((net_params_t*)¶ms);
if (net == NULL) {
lf_print_error_and_exit("Failed to connect to federate.");
}
The base type net_params_t would have whatever fields are needed by all uses, such as the type.
Better yet, this could be abstracted for TCP usage, so the user just does this:
net_abstraction_t net = connect_via_TCP(hostname, uport);
if (net == NULL) {
lf_print_error_and_exit("Failed to connect to federate.");
}
This function would then do what the previous code does.
The function connect_via_TCP would not be part of your API, hence not declared in net_abstraction.h.
More questions:
Is get_socket_priv_t really necessary? All it does is a cast, and it seems to obscure the code. It doesn't even avoid the segfault if the argument is NULL.
It's not clear what check_net_closed will do if called on a network interface that has not been opened. Maybe a better function would be is_net_open or is_connection_open?
Regarding MSG_TYPE_ADDRESS_ADVERTISEMENT and MSG_TYPE_ADDRESS_QUERY, perhaps in a future PR these could also be abstracted. The protocol could be augmented (or just changed) to support an advertisement that could be a simple string message, to be interpreted appropriately by the client. This could in fact allow for adaptable connections. E.g., the advertisement could be a string like: "TCP:my_ip_address:my_port"
For now, I don't think get_my_port or get_server_port or get_ip_addr or get_server_hostname should be part of this abstraction. The users of these functions already know what struct to cast to. Same for all the corresponding set functions.
If you want functions, make them static functions in rti_remote.c.
| // If use UDP, apply a jitter attenuator to the estimated clock error to prevent | ||
| // large jumps in the underlying clock. | ||
| // Note that estimated_clock_error is calculated using lf_time_physical() which includes | ||
| // the clock sync adjustment. | ||
| // Use of TCP socket means we are in the startup phase, so | ||
| // rather than adjust the clock offset, we simply set it to the | ||
| // estimated error. | ||
| adjustment = (socket_type == UDP) ? estimated_clock_error / _LF_CLOCK_SYNC_ATTENUATION : estimated_clock_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not change the original logic in any way. Maybe revert to the original?
| @@ -1 +1 @@ | |||
| master | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to point back to master when that merge happens.
| * Create a network abstraction server. This is such as a server socket which accepts connections. | ||
| * However this is only the creation of the server network abstraction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Create a network abstraction server. This is such as a server socket which accepts connections. | |
| * However this is only the creation of the server network abstraction. | |
| * Create a network abstraction server. For example, this might be a server socket that accepts connections. |
|
|
||
| #include "socket_common.h" | ||
|
|
||
| typedef void* net_abstraction_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear whether an instance of net_abstraction_t is a single connection or a collection of connections of a particular type. Perhaps clarify this with a comment, like this:
/**
* Pointer to whatever data structure is used to maintain the state of a network connection or service.
* For example, this could point to an integer socket identifier if TCP is being used for network connections.
*/
| * However this is only the creation of the server network abstraction. | ||
| * | ||
| * @param net_abs Server's network abstraction. | ||
| * @param increment_port_on_retry Whether to increment the port on retry if binding fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument suggests a "leaky abstraction". The argument only applies to socket connections and is only used by the RTI on its rather funky mechanism where it increments the port if the DEFAULT_PORT is not available. This functionality could be moved to rti_remote.c, which has this:
if (create_server(rti_remote->rti_net, true)) {
lf_print_error_system_failure("RTI failed to create TCP server: %s.", strerror(errno));
return -1;
};
If the retry functionality were put here and removed from deep inside socket_common.c, then this API could be cleaned up by removing this argument.
| * The implementation should include three steps. | ||
| * 1. Initialize the network abstraction of the connected federate. | ||
| * 2. Wait for the incoming connection request. This should block until the connection is successfully accepted. | ||
| * 3. Save the information in the connected network abstraction, such as the address of the connected peer, for future | ||
| * querying address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These steps are unclear. The first step implies there is already a connected federate.
Also, why is this even referencing a federate or RTI? Shouldn't this abstraction be just about establishing a connection between two entities, without regard to what they are?
Also, this comment should state that create_server must be called first.
Also, I think the rti_chan argument should be removed and its functionality put where it is actually needed, which is in federate.c where accept_net is called. Shouldn't this handle checking the RTI and retrying? This may require another function like is_connected(net_abstraction_t) in your net_abstraction.
Co-authored-by: Edward A. Lee <[email protected]>
lf-lang/lingua-franca#2455
Summary
This PR adds a network interface layer to support interoperability in terms of network protocols and security using the
net_abstraction.The original code is very tied up with sockets, making it hard to add different network features.
However, I clarify that this PR does not remove all socket related functions out of the main flow from the RTI and federate, for two reasons.
MSG_TYPE_ADDRESS_ADVERTISEMENTandMSG_TYPE_ADDRESS_QUERYuses the port and IP address in the protocol itself.We can completely remove socket-related code using
#ifdef COMM_TYPE_TCP; however, I highlight that this PR is more concentrated on supporting end-to-end pluggable and interchangeable network security based on TCP. Thus, in this PR, I have not added the#ifdefguards, and have not changed the protocol forMSG_TYPE_ADDRESS_ADVERTISEMENTandMSG_TYPE_ADDRESS_QUERY.Features
Plugin API for
networkThis PR creates a separately-compiled library for the
networkinstead of compiling it as a part of the core runtime.I followed the prior work on
low_level_platform.handplatform.h.All source files related to networking are moved to
network/impl/src, and all headers are undernetwork/api. There are separateCMakelists.txtfor each.Add
COMM_TYPEtarget property.The
comm-typekeyword is available in theCtarget as follows.Currently only supports
TCP, and plan to supportSSTfor security.Refactoring on clock-synchronization.
Cloxk synchronization still uses UDP. So, I left all clock-sync functions to directly use UDP sockets, and refactored these functions.
rti_remote.csend_physical_clock()andhandle_physical_clock_sync_message(): Uses boolean flaguse_UDPwhen UDP socket used.clock-sync.chandle_T4_clock_sync_message(): Uses boolean flaguse_UDPwhen UDP socket used.handle_T1_clock_sync_message(): Usevoid* socket_or_net_abstractionas parameter to support both socket and network abstraction.handle_address_ad()andhandle_address_query()There are no changes in the protocol.
As explained in the summary,
MSG_TYPE_ADDRESS_ADVERTISEMENTandMSG_TYPE_ADDRESS_QUERYuse port numbers and IP addresses in the protocol itself. To explain further, for physical connections or decentralized coordination, thefederateAhas to know the peerfederateB's port number and IP address to directly connect to it. This is done byfederateAsending aMSG_TYPE_ADDRESS_ADVERTISEMENTto theRTIit's port, and theRTIsends aMSG_TYPE_ADDRESS_QUERY_REPLYmessage to the peerfederateB, including the port number and IP address. Thus, the port number and IP address itself cannot be encapsulated under the network abstraction layer, as it is included in the protocol.Therefore, there are some
get()andset()calls to the network interface.FedA- >RTI:MSG_TYPE_ADDRESS_ADVERTISEMENT:FedAcallsget_my_port()to send its own port to theRTI.FedB->RTI:MSG_TYPE_ADDRESS_QUERY: No changes.RTI->FedB:MSG_TYPE_ADDRESS_QUERY_REPLY:RTIcallsget_server_port()andget_ip_addr()to encode it to the message to send toFedB.FedB: Sets the received port and IP address byset_server_port()andset_server_host_name()to directly connect toFedA.Minor logic change: Move
getpeername()logic toaccept_net_abstraction()The RTI should know the connected federate(FedA)'s IP address, to pass the IP address to the other federate(FedB), as explained above.
Before, this was done in
rti_remote.c'sreceive_and_check_fed_id_message(). I moved this tolf_socket_support.c'saccept_net_abstraction(), because it looks more appropriate to set the connected peer's information inside this function.One inefficiency that happens is that in decentralized coordination, the server federate does not need to save the connected peer federate's IP address. However, I think this will barely affect the performance.
Add default UDP port number as 15061.
The UDP port was usually set to the
RTI's port + 1, inrti_remote.c'screate_server()function call. However, I did not want to expose the port number in thecreate_server()interface, so I took the parameterportout fromcreate_server().Minor changes
start_rti_server()function does not get theportas a parameter. Theportwill be saved inrti_remote_t;suser_specified_port, with a default value when not set up.