-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add shard write-load to cluster info #131496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add shard write-load to cluster info #131496
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
@@ -527,8 +538,6 @@ public ClusterInfo getClusterInfo() { | |||
estimatedHeapUsages.put(nodeId, new EstimatedHeapUsage(nodeId, maxHeapSize.getBytes(), estimatedHeapUsage)); | |||
} | |||
}); | |||
final Map<String, NodeUsageStatsForThreadPools> nodeThreadPoolUsageStats = new HashMap<>(); | |||
nodeThreadPoolUsageStatsPerNode.forEach((nodeId, nodeWriteLoad) -> { nodeThreadPoolUsageStats.put(nodeId, nodeWriteLoad); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appeared to be just a copy, which already happens in the ClusterInfo
constructor, I assume remnants of something that was since refactored away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all my comments are straightforward to address, so I'll go ahead and approve now 👍
@@ -621,6 +621,7 @@ public void testUnassignedAllocationPredictsDiskUsage() { | |||
ImmutableOpenMap.of(), | |||
ImmutableOpenMap.of(), | |||
ImmutableOpenMap.of(), | |||
ImmutableOpenMap.of(), | |||
ImmutableOpenMap.of() | |||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I missed a handful of these in the builder conversion 😏 Oops..
InternalClusterInfoService.buildShardLevelInfo(stats, shardSizes, shardDataSetSizes, routingToPath, new HashMap<>()); | ||
InternalClusterInfoService.buildShardLevelInfo( | ||
stats, | ||
shardWriteLoads, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below there are checks on the results. Are there some checks that can be added for shardWriteLoads
results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in c5ceb92
@@ -59,9 +59,10 @@ public class ClusterInfo implements ChunkedToXContent, Writeable { | |||
final Map<NodeAndPath, ReservedSpace> reservedSpace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class comment further up ^ could use an update at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in cfe5759
@@ -159,7 +159,8 @@ public ClusterInfo getClusterInfo() { | |||
dataPath, | |||
Map.of(), | |||
estimatedHeapUsages, | |||
nodeThreadPoolUsageStats | |||
nodeThreadPoolUsageStats, | |||
allocation.clusterInfo().getShardWriteLoads() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be saved as a private variable initialized in the ClusterInfoSimulator constructor? Similar to estimatedHeapUsages and nodeThreadPoolUsageStats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we won't use it at this level in the ClusterInfoSimulator
and I don't think we'll modify it as part of the simulation either so I think it's fine to pass it through.
public Map<ShardId, Double> getShardWriteLoads() { | ||
return shardWriteLoads; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the #equals
, #hashCode
methods below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we add equals and hashCode just for testing, but I updated them in cd3169f
); | ||
|
||
try { | ||
// Force a ClusterInfo refresh to run collection of the node thread pool usage stats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-paste failed you here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7886659
} | ||
} | ||
// And that at least one is greater than zero | ||
assertThat(maximumLoadRecorded, greaterThan(0.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this assert be per index? Since all the indices received writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we insert between 1 and 500 documents in indices with between 1 and 3 shards, so it's possible some shards will not be written to. This is just a sanity check anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow. Some shards won't receive writes, I agree. But all indices will have writes (to at least one shard), and thus maximumLoadRecorded would have a value per index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, you're right. For some reason I read that as per shard. Indeed it could be tightened up to be per index. I'll put up a small PR to do that.
server/src/internalClusterTest/java/org/elasticsearch/index/shard/IndexShardIT.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Dianna Hohensee <[email protected]>
We already use
TransportIndicesStatsAction
to retrieve size on disk for the disk threshold decider.This PR just adds the
Indexing
stats to that request when the write load decider is enabled, and adds the returned shard write loads to theClusterInfo
.We should be able to feed these into the simulator to properly account for shard movement.
Relates: ES-12419 & ES-12420