-
Notifications
You must be signed in to change notification settings - Fork 154
Feat[STATS]: Partition replication time #909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f005255
to
7cc82d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3028 of commit 7cc82d2 has completed with FAILURE
@waldgange Can you put in the description what exact metric are you reporting? |
Back to Anton for more research |
ccbf1cf
to
95a3533
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3052 of commit 95a3533 has completed with FAILURE
95a3533
to
5d68561
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3054 of commit 5d68561 has completed with FAILURE
Signed-off-by: Anton Pryakhin <[email protected]>
5d68561
to
74265e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build 3055 of commit 74265e7 has completed with FAILURE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few thoughts:
- Need to double check if
e_PARTITION_REPLICATION
metric for the weak consistency justifies the price of extra Receipts traffic. - Rather than introducing extra
StorageMessageAttributes
field, (suppose the answer to the 1. is yes), Replicas can just send Receipt unconditionally. Primary can add PUTs tod_unreceipted
unconditionally (with a new flag, maybe). Primary then can skip callingonReplicatedBatch
for weak consistency messages. - Backward compatibility. Looks like (but need to double-check and test) extra Receipts do not hurt (older versions)
- There is
mqbs::ReplicatedStorage::isStrongConsistency
which may be useful.
Wrt 1. the consensus is to NOT use the new metric for weak consistency yet. That should reduce the scope of this PR
This PR introduces a new partition metric
e_PARTITION_REPLICATION_TIME_NS
. It represents the time in nanoseconds it takes to store a message record at primary and replicate it to a majority of nodes in the cluster. We report average and maximum time during the report interval only from the primary node. The replication time is measured for both strong and eventual consistency modes. To make it possible we started collecting receipts from replicas regardless of the level of consistency. Also with this PR we stop reporting ACK time from primaries. Now we only do it from "first-hop" nodes