Commit 932fd59

[FLINK-34558][state] Support tracking state size (#25837)
1 parent d14c2d5 commit 932fd59

57 files changed

+2665
-1063
lines changed


docs/content.zh/docs/ops/metrics.md

Lines changed: 167 additions & 0 deletions
@@ -1368,6 +1368,157 @@ Note that for failed checkpoints, metrics are updated on a best efforts basis an
13681368
</tbody>
13691369
</table>
13701370

1371+
### State Size
1372+
1373+
<table class="table table-bordered">
1374+
<thead>
1375+
<tr>
1376+
<th class="text-left" style="width: 18%">Scope</th>
1377+
<th class="text-left" style="width: 26%">Metrics</th>
1378+
<th class="text-left" style="width: 48%">Description</th>
1379+
<th class="text-left" style="width: 8%">Type</th>
1380+
</tr>
1381+
</thead>
1382+
<tbody>
1383+
<tr>
1384+
<th rowspan="27"><strong>Task/Operator</strong></th>
1385+
<td>valueStateGetKeySize</td>
1386+
<td>The key size of get operation for value state</td>
1387+
<td>Histogram</td>
1388+
</tr>
1389+
<tr>
1390+
<td>valueStateGetValueSize</td>
1391+
<td>The value size of get operation for value state</td>
1392+
<td>Histogram</td>
1393+
</tr>
1394+
<tr>
1395+
<td>valueStateUpdateKeySize</td>
1396+
<td>The key size of update operation for value state</td>
1397+
<td>Histogram</td>
1398+
</tr>
1399+
<tr>
1400+
<td>valueStateUpdateValueSize</td>
1401+
<td>The value size of update operation for value state</td>
1402+
<td>Histogram</td>
1403+
</tr>
1404+
<tr>
1405+
<td>reducingStateGetKeySize</td>
1406+
<td>The key size of get operation for reducing state</td>
1407+
<td>Histogram</td>
1408+
</tr>
1409+
<tr>
1410+
<td>reducingStateGetValueSize</td>
1411+
<td>The value size of get operation for reducing state</td>
1412+
<td>Histogram</td>
1413+
</tr>
1414+
<tr>
1415+
<td>reducingStateAddKeySize</td>
1416+
<td>The key size of add operation for reducing state</td>
1417+
<td>Histogram</td>
1418+
</tr>
1419+
<tr>
1420+
<td>reducingStateAddValueSize</td>
1421+
<td>The value size of add operation for reducing state</td>
1422+
<td>Histogram</td>
1423+
</tr>
1424+
<tr>
1425+
<td>aggregatingStateGetKeySize</td>
1426+
<td>The key size of get operation for aggregating state</td>
1427+
<td>Histogram</td>
1428+
</tr>
1429+
<tr>
1430+
<td>aggregatingStateAddKeySize</td>
1431+
<td>The key size of add operation for aggregating state</td>
1432+
<td>Histogram</td>
1433+
</tr>
1434+
<tr>
1435+
<td>listStateGetKeySize</td>
1436+
<td>The key size of get operation for list state</td>
1437+
<td>Histogram</td>
1438+
</tr>
1439+
<tr>
1440+
<td>listStateGetValueSize</td>
1441+
<td>The value size of get operation for list state</td>
1442+
<td>Histogram</td>
1443+
</tr>
1444+
<tr>
1445+
<td>listStateAddKeySize</td>
1446+
<td>The key size of add operation for list state</td>
1447+
<td>Histogram</td>
1448+
</tr>
1449+
<tr>
1450+
<td>listStateAddValueSize</td>
1451+
<td>The value size of add operation for list state</td>
1452+
<td>Histogram</td>
1453+
</tr>
1454+
<tr>
1455+
<td>listStateAddAllKeySize</td>
1456+
<td>The key size of addAll operation for list state</td>
1457+
<td>Histogram</td>
1458+
</tr>
1459+
<tr>
1460+
<td>listStateAddAllValueSize</td>
1461+
<td>The value size of addAll operation for list state</td>
1462+
<td>Histogram</td>
1463+
</tr>
1464+
<tr>
1465+
<td>listStateUpdateKeySize</td>
1466+
<td>The key size of update operation for list state</td>
1467+
<td>Histogram</td>
1468+
</tr>
1469+
<tr>
1470+
<td>listStateUpdateValueSize</td>
1471+
<td>The value size of update operation for list state</td>
1472+
<td>Histogram</td>
1473+
</tr>
1474+
<tr>
1475+
<td>mapStateGetKeySize</td>
1476+
<td>The key size of get operation for map state</td>
1477+
<td>Histogram</td>
1478+
</tr>
1479+
<tr>
1480+
<td>mapStateGetValueSize</td>
1481+
<td>The value size of get operation for map state</td>
1482+
<td>Histogram</td>
1483+
</tr>
1484+
<tr>
1485+
<td>mapStatePutKeySize</td>
1486+
<td>The key size of put operation for map state</td>
1487+
<td>Histogram</td>
1488+
</tr>
1489+
<tr>
1490+
<td>mapStatePutValueSize</td>
1491+
<td>The value size of put operation for map state</td>
1492+
<td>Histogram</td>
1493+
</tr>
1494+
<tr>
1495+
<td>mapStateIteratorKeySize</td>
1496+
<td>The key size of iterator#next operation for map state</td>
1497+
<td>Histogram</td>
1498+
</tr>
1499+
<tr>
1500+
<td>mapStateIteratorValueSize</td>
1501+
<td>The value size of iterator#next operation for map state</td>
1502+
<td>Histogram</td>
1503+
</tr>
1504+
<tr>
1505+
<td>mapStateRemoveKeySize</td>
1506+
<td>The key size of remove operation for map state</td>
1507+
<td>Histogram</td>
1508+
</tr>
1509+
<tr>
1510+
<td>mapStateContainsKeySize</td>
1511+
<td>The key size of contains operation for map state</td>
1512+
<td>Histogram</td>
1513+
</tr>
1514+
<tr>
1515+
<td>mapStateIsEmptyKeySize</td>
1516+
<td>The key size of isEmpty operation for map state</td>
1517+
<td>Histogram</td>
1518+
</tr>
1519+
</tbody>
1520+
</table>
1521+
13711522
### RocksDB
13721523
Certain RocksDB native metrics are available but disabled by default; you can find the full documentation [here]({{< ref "docs/deployment/config" >}}#rocksdb-native-metrics).
13731524

@@ -2208,6 +2359,22 @@ A larger value of this configuration will require more memory, but will provide
22082359
<span class="label label-danger">Warning</span> Enabling state-access-latency metrics may impact the performance.
22092360
It is recommended to only use them for debugging purposes.
22102361

2362+
## State key/value size tracking
2363+
2364+
Flink can also track the key/value size of keyed state for the standard Flink state backends and for custom state backends that extend `AbstractStateBackend`. This feature is disabled by default.
2365+
To enable this feature, set `state.size-track.keyed-state-enabled` to `true` in the [Flink configuration]({{< ref "docs/deployment/config" >}}#state-backends-size-tracking-options).
2366+
2367+
Once key/value size tracking is enabled, Flink samples the state size once every `N` accesses, where `N` is defined by `state.size-track.sample-interval`.
2368+
This option defaults to 100. A smaller value gives more accurate results but has a higher performance impact, because sizes are sampled more frequently.
2369+
2370+
Because these key/value size metrics are histograms, `state.size-track.history-size` controls the maximum number of recorded values kept in the history; it defaults to 128.
2371+
A larger value requires more memory but provides a more accurate result.
2372+
2373+
<span class="label label-danger">Warning</span> Enabling state-size metrics may impact the performance.
2374+
It is recommended to only use them for debugging purposes.
2375+
If state.ttl is enabled, the size of the value will include the size of the TTL-related timestamp.
2376+
The value size of `AggregatingState` is not tracked, because `AggregatingState` returns a result computed by a user-defined `AggregateFunction`, whereas currently only the size of the data actually stored in state can be tracked.
2377+
22112378
## REST API integration
22122379

22132380
Metrics can be queried through the [Monitoring REST API]({{< ref "docs/ops/rest_api" >}}).
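
The sampling described under "State key/value size tracking" can be pictured with a simple access counter. The following is a minimal sketch, not Flink's implementation: the class `SampledSizeTracker` and its methods are hypothetical, and which access within each group of `N` gets measured is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this class is not part of Flink. */
final class SampledSizeTracker {
    // Plays the role of state.size-track.sample-interval (default 100).
    private final int sampleInterval;
    private final List<Long> sampledSizes = new ArrayList<>();
    private long accessCount = 0;

    SampledSizeTracker(int sampleInterval) {
        if (sampleInterval <= 0) {
            throw new IllegalArgumentException("sampleInterval must be positive");
        }
        this.sampleInterval = sampleInterval;
    }

    /** Call once per state access; returns true when this access was sampled. */
    boolean onAccess(long sizeInBytes) {
        accessCount++;
        // Assumption: the N-th, 2N-th, ... access is the one that gets measured.
        if (accessCount % sampleInterval != 0) {
            return false;
        }
        sampledSizes.add(sizeInBytes);
        return true;
    }

    /** Sizes recorded so far, in sampling order. */
    List<Long> sampledSizes() {
        return sampledSizes;
    }
}
```

With an interval of 100, a run of 250 accesses records two sizes; an interval of 10 would record 25 for the same run, which is the accuracy versus overhead trade-off the text describes.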

docs/content/docs/ops/metrics.md

Lines changed: 167 additions & 0 deletions
@@ -1358,6 +1358,157 @@ Note that for failed checkpoints, metrics are updated on a best efforts basis an
13581358
</tbody>
13591359
</table>
13601360

1361+
### State Size
1362+
1363+
<table class="table table-bordered">
1364+
<thead>
1365+
<tr>
1366+
<th class="text-left" style="width: 18%">Scope</th>
1367+
<th class="text-left" style="width: 26%">Metrics</th>
1368+
<th class="text-left" style="width: 48%">Description</th>
1369+
<th class="text-left" style="width: 8%">Type</th>
1370+
</tr>
1371+
</thead>
1372+
<tbody>
1373+
<tr>
1374+
<th rowspan="27"><strong>Task/Operator</strong></th>
1375+
<td>valueStateGetKeySize</td>
1376+
<td>The key size of get operation for value state</td>
1377+
<td>Histogram</td>
1378+
</tr>
1379+
<tr>
1380+
<td>valueStateGetValueSize</td>
1381+
<td>The value size of get operation for value state</td>
1382+
<td>Histogram</td>
1383+
</tr>
1384+
<tr>
1385+
<td>valueStateUpdateKeySize</td>
1386+
<td>The key size of update operation for value state</td>
1387+
<td>Histogram</td>
1388+
</tr>
1389+
<tr>
1390+
<td>valueStateUpdateValueSize</td>
1391+
<td>The value size of update operation for value state</td>
1392+
<td>Histogram</td>
1393+
</tr>
1394+
<tr>
1395+
<td>reducingStateGetKeySize</td>
1396+
<td>The key size of get operation for reducing state</td>
1397+
<td>Histogram</td>
1398+
</tr>
1399+
<tr>
1400+
<td>reducingStateGetValueSize</td>
1401+
<td>The value size of get operation for reducing state</td>
1402+
<td>Histogram</td>
1403+
</tr>
1404+
<tr>
1405+
<td>reducingStateAddKeySize</td>
1406+
<td>The key size of add operation for reducing state</td>
1407+
<td>Histogram</td>
1408+
</tr>
1409+
<tr>
1410+
<td>reducingStateAddValueSize</td>
1411+
<td>The value size of add operation for reducing state</td>
1412+
<td>Histogram</td>
1413+
</tr>
1414+
<tr>
1415+
<td>aggregatingStateGetKeySize</td>
1416+
<td>The key size of get operation for aggregating state</td>
1417+
<td>Histogram</td>
1418+
</tr>
1419+
<tr>
1420+
<td>aggregatingStateAddKeySize</td>
1421+
<td>The key size of add operation for aggregating state</td>
1422+
<td>Histogram</td>
1423+
</tr>
1424+
<tr>
1425+
<td>listStateGetKeySize</td>
1426+
<td>The key size of get operation for list state</td>
1427+
<td>Histogram</td>
1428+
</tr>
1429+
<tr>
1430+
<td>listStateGetValueSize</td>
1431+
<td>The value size of get operation for list state</td>
1432+
<td>Histogram</td>
1433+
</tr>
1434+
<tr>
1435+
<td>listStateAddKeySize</td>
1436+
<td>The key size of add operation for list state</td>
1437+
<td>Histogram</td>
1438+
</tr>
1439+
<tr>
1440+
<td>listStateAddValueSize</td>
1441+
<td>The value size of add operation for list state</td>
1442+
<td>Histogram</td>
1443+
</tr>
1444+
<tr>
1445+
<td>listStateAddAllKeySize</td>
1446+
<td>The key size of addAll operation for list state</td>
1447+
<td>Histogram</td>
1448+
</tr>
1449+
<tr>
1450+
<td>listStateAddAllValueSize</td>
1451+
<td>The value size of addAll operation for list state</td>
1452+
<td>Histogram</td>
1453+
</tr>
1454+
<tr>
1455+
<td>listStateUpdateKeySize</td>
1456+
<td>The key size of update operation for list state</td>
1457+
<td>Histogram</td>
1458+
</tr>
1459+
<tr>
1460+
<td>listStateUpdateValueSize</td>
1461+
<td>The value size of update operation for list state</td>
1462+
<td>Histogram</td>
1463+
</tr>
1464+
<tr>
1465+
<td>mapStateGetKeySize</td>
1466+
<td>The key size of get operation for map state</td>
1467+
<td>Histogram</td>
1468+
</tr>
1469+
<tr>
1470+
<td>mapStateGetValueSize</td>
1471+
<td>The value size of get operation for map state</td>
1472+
<td>Histogram</td>
1473+
</tr>
1474+
<tr>
1475+
<td>mapStatePutKeySize</td>
1476+
<td>The key size of put operation for map state</td>
1477+
<td>Histogram</td>
1478+
</tr>
1479+
<tr>
1480+
<td>mapStatePutValueSize</td>
1481+
<td>The value size of put operation for map state</td>
1482+
<td>Histogram</td>
1483+
</tr>
1484+
<tr>
1485+
<td>mapStateIteratorKeySize</td>
1486+
<td>The key size of iterator#next operation for map state</td>
1487+
<td>Histogram</td>
1488+
</tr>
1489+
<tr>
1490+
<td>mapStateIteratorValueSize</td>
1491+
<td>The value size of iterator#next operation for map state</td>
1492+
<td>Histogram</td>
1493+
</tr>
1494+
<tr>
1495+
<td>mapStateRemoveKeySize</td>
1496+
<td>The key size of remove operation for map state</td>
1497+
<td>Histogram</td>
1498+
</tr>
1499+
<tr>
1500+
<td>mapStateContainsKeySize</td>
1501+
<td>The key size of contains operation for map state</td>
1502+
<td>Histogram</td>
1503+
</tr>
1504+
<tr>
1505+
<td>mapStateIsEmptyKeySize</td>
1506+
<td>The key size of isEmpty operation for map state</td>
1507+
<td>Histogram</td>
1508+
</tr>
1509+
</tbody>
1510+
</table>
1511+
13611512
### RocksDB
13621513
Certain RocksDB native metrics are available but disabled by default; you can find the full documentation [here]({{< ref "docs/deployment/config" >}}#rocksdb-native-metrics).
13631514

@@ -2208,6 +2359,22 @@ A larger value of this configuration will require more memory, but will provide
22082359
<span class="label label-danger">Warning</span> Enabling state-access-latency metrics may impact the performance.
22092360
It is recommended to only use them for debugging purposes.
22102361

2362+
## State key/value size tracking
2363+
2364+
Flink can also track the key/value size of keyed state for the standard Flink state backends and for custom state backends that extend `AbstractStateBackend`. This feature is disabled by default.
2365+
To enable this feature, set `state.size-track.keyed-state-enabled` to `true` in the [Flink configuration]({{< ref "docs/deployment/config" >}}#state-backends-size-tracking-options).
2366+
2367+
Once key/value size tracking is enabled, Flink samples the state size once every `N` accesses, where `N` is defined by `state.size-track.sample-interval`.
2368+
This option defaults to 100. A smaller value gives more accurate results but has a higher performance impact, because sizes are sampled more frequently.
2369+
2370+
Because these key/value size metrics are histograms, `state.size-track.history-size` controls the maximum number of recorded values kept in the history; it defaults to 128.
2371+
A larger value requires more memory but provides a more accurate result.
2372+
2373+
<span class="label label-danger">Warning</span> Enabling state-size metrics may impact the performance.
2374+
It is recommended to only use them for debugging purposes.
2375+
If state.ttl is enabled, the size of the value will include the size of the TTL-related timestamp.
2376+
The value size of `AggregatingState` is not tracked, because `AggregatingState` returns a result computed by a user-defined `AggregateFunction`, whereas currently only the size of the data actually stored in state can be tracked.
2377+
22112378
## REST API integration
22122379

22132380
Metrics can be queried through the [Monitoring REST API]({{< ref "docs/ops/rest_api" >}}).
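
The role of `state.size-track.history-size` can be illustrated with a bounded window over the sampled values. This is a rough sketch under stated assumptions rather than the histogram Flink actually uses: `BoundedSizeHistory` is a hypothetical class, and the eviction policy (drop the oldest value once the window is full) is assumed for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration only; this class is not part of Flink. */
final class BoundedSizeHistory {
    // Plays the role of state.size-track.history-size (default 128).
    private final int historySize;
    private final Deque<Long> window = new ArrayDeque<>();

    BoundedSizeHistory(int historySize) {
        if (historySize <= 0) {
            throw new IllegalArgumentException("historySize must be positive");
        }
        this.historySize = historySize;
    }

    /** Records one sampled size, evicting the oldest value when the window is full. */
    void update(long sizeInBytes) {
        if (window.size() == historySize) {
            window.removeFirst();
        }
        window.addLast(sizeInBytes);
    }

    /** Number of values currently held; never exceeds historySize. */
    int count() {
        return window.size();
    }

    /** Largest value in the window, or 0 if the window is empty. */
    long max() {
        return window.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    /** Mean of the values in the window, or 0.0 if the window is empty. */
    double mean() {
        return window.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }
}
```

Statistics are computed only over the values still in the window, so a larger window costs more memory but reflects more of the recent samples.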
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
<table class="configuration table table-bordered">
2+
<thead>
3+
<tr>
4+
<th class="text-left" style="width: 20%">Key</th>
5+
<th class="text-left" style="width: 15%">Default</th>
6+
<th class="text-left" style="width: 10%">Type</th>
7+
<th class="text-left" style="width: 55%">Description</th>
8+
</tr>
9+
</thead>
10+
<tbody>
11+
<tr>
12+
<td><h5>state.size-track.history-size</h5></td>
13+
<td style="word-wrap: break-word;">128</td>
14+
<td>Integer</td>
15+
<td>Defines the number of measured sizes to maintain for each state access operation.</td>
16+
</tr>
17+
<tr>
18+
<td><h5>state.size-track.keyed-state-enabled</h5></td>
19+
<td style="word-wrap: break-word;">false</td>
20+
<td>Boolean</td>
21+
<td>Whether to track the size of keyed state operations, e.g. value state put/get/clear. Please note that if state.ttl is enabled, the size of the value will include the size of the TTL-related timestamp.</td>
22+
</tr>
23+
<tr>
24+
<td><h5>state.size-track.sample-interval</h5></td>
25+
<td style="word-wrap: break-word;">100</td>
26+
<td>Integer</td>
27+
<td>The sample interval for size tracking once 'state.size-track.keyed-state-enabled' is enabled. The default value is 100, which means the size is tracked once every 100 access requests.</td>
28+
</tr>
29+
<tr>
30+
<td><h5>state.size-track.state-name-as-variable</h5></td>
31+
<td style="word-wrap: break-word;">true</td>
32+
<td>Boolean</td>
33+
<td>Whether to expose the state name as a variable when tracking size.</td>
34+
</tr>
35+
</tbody>
36+
</table>
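
To make the interaction between the four options concrete, the sketch below merges user-supplied settings over the defaults from the table above. `SizeTrackOptionsSketch` is a hypothetical helper written for illustration; it is not a Flink class and does not use Flink's configuration API. Only the option keys and default values come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not a Flink class: merges user settings over the documented defaults. */
final class SizeTrackOptionsSketch {

    /** Keys and default values copied from the configuration table. */
    static Map<String, String> defaults() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("state.size-track.history-size", "128");
        options.put("state.size-track.keyed-state-enabled", "false");
        options.put("state.size-track.sample-interval", "100");
        options.put("state.size-track.state-name-as-variable", "true");
        return options;
    }

    /** Returns the effective settings; rejects keys that are not in the table. */
    static Map<String, String> effective(Map<String, String> overrides) {
        Map<String, String> options = defaults();
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            if (!options.containsKey(entry.getKey())) {
                throw new IllegalArgumentException("Unknown option: " + entry.getKey());
            }
            options.put(entry.getKey(), entry.getValue());
        }
        return options;
    }
}
```

Enabling tracking only requires overriding `state.size-track.keyed-state-enabled`; the other three options keep their defaults unless set explicitly.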
