Conversation
vfdev-5 left a comment
Thanks for the PR @rwtarpit !
I checked the code and am thinking about how to make it a bit better.
We also need to figure out precisely what kinds of inputs we can compute a top-k metric on: binary, multi-class, multi-label? All of them, or only a subset...
ignite/metrics/top_k.py
Outdated
```python
masked.scatter_(-1, top_indices, 1.0)
return (masked, y)

@reinit__is_reduced
```
I think we do not need to decorate the method with `reinit__is_reduced`, as the base metric will handle that.
ignite/metrics/top_k.py
Outdated
```python
for attr in self._base_metric._state_dict_all_req_keys:
    setattr(self._base_metric, attr, self._states[k].get(attr, getattr(self._base_metric, attr)))
```
Can we use `Metric.state_dict` and `Metric.load_state_dict` for that?
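A minimal sketch of that single-instance, state-swap pattern, using a toy class whose `state_dict`/`load_state_dict` names only mimic ignite's `Metric` API (this is illustrative, not ignite code):

```python
# Toy stand-in for an ignite Metric; not the real class.
class ToyCounter:
    def __init__(self):
        self.total = 0

    def update(self, n):
        self.total += n

    def state_dict(self):
        return {"total": self.total}

    def load_state_dict(self, sd):
        self.total = sd["total"]

# One metric instance, one saved state per k, swapped in before each update:
metric = ToyCounter()
states = {k: metric.state_dict() for k in (1, 5)}

for k in (1, 5):
    metric.load_state_dict(states[k])  # restore the per-k state
    metric.update(k)                   # update as if this were the k-th copy
    states[k] = metric.state_dict()    # save the state back

print(states)  # {1: {'total': 1}, 5: {'total': 5}}
```

The per-k states stay independent even though only one metric object exists, which is the behaviour the wrapper needs.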
I'll take a look at this.
Yes, this would need some research before finalizing the metric.
@vfdev-5 TopK wouldn't support the binary case:

```python
y_pred = torch.tensor([0.9, 0.3, 0.7])  # (N,)
y = torch.tensor([1, 0, 1])
# only one element per sample
```

Multiclass case (k=2):

```python
y_pred = torch.tensor([[0.1, 0.6, 0.2, 0.4],   # top2: classes 1, 3
                       [0.5, 0.2, 0.8, 0.3],   # top2: classes 2, 0
                       [0.3, 0.1, 0.2, 0.9]])  # top2: classes 3, 0
y = torch.tensor([1, 2, 1])  # true classes

# accuracy@2 = 2/3 = 0.66
# samples: recall@2 = (1/1 + 1/1 + 0/1)/3 = 0.66 = accuracy@2
# micro:   recall@2 = sum(TP)/sum(actual) = 2/3 = 0.66 = accuracy@2
# samples: precision@2 = (1/2 + 1/2 + 0/2)/3 = 0.333
# micro:   precision@2 = sum(TP)/(N*k) = 2/6 = 0.333
# macro recall@2:
#   class 1: actual = samples 1 and 3, predicted = sample 1; recall@class1 = 1/2 = 0.5
#   class 2: actual = sample 2, predicted = sample 2; recall@class2 = 1/1 = 1.0
#   macro recall@2 = (0.5 + 1.0)/2 = 0.75
```

Multilabel case: similarly.

Can you confirm this too? I can then proceed with the implementation.
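To double-check the hand-computed numbers above, here is a small plain-Python sketch (no torch; the `topk_indices` helper is made up for illustration) that reproduces accuracy@2 and the samples-averaged precision@2/recall@2 for the multiclass example:

```python
def topk_indices(scores, k):
    # indices of the k highest scores for one sample
    return set(sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k])

y_pred = [[0.1, 0.6, 0.2, 0.4],
          [0.5, 0.2, 0.8, 0.3],
          [0.3, 0.1, 0.2, 0.9]]
y = [1, 2, 1]
k = 2

# 1 if the true class is among the top-k predictions, else 0
hits = [int(t in topk_indices(p, k)) for p, t in zip(y_pred, y)]
accuracy_at_2 = sum(hits) / len(y)                     # 2/3
recall_samples = sum(h / 1 for h in hits) / len(y)     # (1/1 + 1/1 + 0/1)/3 = 2/3
precision_samples = sum(h / k for h in hits) / len(y)  # (1/2 + 1/2 + 0/2)/3 = 1/3
print(accuracy_at_2, recall_samples, precision_samples)
```

This matches the values in the comment: accuracy@2 = recall@2 = 0.66 and precision@2 = 0.333.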
I have updated the draft PR with idea of keeping the
Advantages:
Disadvantages:
NOTE: I also checked how torchmetrics implements TopK: they put the top-k logic in each metric, with `top_k` passed as an argument. Currently I have kept it to precision/recall and added a small test case to check the sanity of the idea. This is a rough sketch of the idea, and we still need to think about adding other metrics and handling edge cases/exceptions before moving forward and finalising it.
| """ | ||
| self._check_shape(output) | ||
| self._check_type(output) | ||
| if not getattr(self, "_skip_checks", False): |
```diff
-if not getattr(self, "_skip_checks", False):
+if not self._skip_checks:
```
I think using `getattr` implies the attribute might not exist when this method runs, so it is better to initialize `self._skip_checks = False` up front.
```python
import torch
from typing import Sequence

from ignite.metrics import Metric
```
```diff
 from ignite.metrics import Metric
+from ignite.metrics.metric import reinit__is_reduced
```
```python
self._transform = transform
self._base_metric = base_metric
self._ks = sorted(top_k) if isinstance(top_k, list) else [top_k]
```
```diff
 self._ks = sorted(top_k) if isinstance(top_k, list) else [top_k]
+identity = lambda x: x
+if base_metric.output_transform is not identity:
+    import warnings
+    warnings.warn(
+        "base_metric's output_transform will never be called inside TopK. "
+        "Pass output_transform to TopK directly instead.",
+        UserWarning,
+    )
```
TopK feeds the data directly to the base metric's `update` method, completely bypassing the base metric's own `output_transform`. If a user passes a custom transform to the base metric, it will silently never run, so I think it's better to add a warning.
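A toy illustration of the problem (the classes here are stand-ins, not ignite's real ones): the wrapper calls the base metric's `update` directly, so a transform attached to the base metric is never invoked.

```python
class ToyMetric:
    def __init__(self, output_transform=lambda x: x):
        self.output_transform = output_transform
        self.seen = []

    def update(self, output):
        # In a normal engine loop the transform would run before update;
        # the wrapper below never applies it.
        self.seen.append(output)

class ToyTopK:
    def __init__(self, base):
        self.base = base

    def update(self, output):
        self.base.update(output)  # bypasses base.output_transform entirely

calls = []  # records every invocation of the base metric's transform
base = ToyMetric(output_transform=lambda x: calls.append(x) or x)
ToyTopK(base).update((1, 2))
print(calls)  # [] -- the base metric's transform was never invoked
```

The base metric still received the raw output, which is exactly the silent failure the warning is meant to surface.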
```python
self._ks = sorted(top_k) if isinstance(top_k, list) else [top_k]
super().__init__(output_transform=output_transform, device=device, skip_unrolling=skip_unrolling)

def reset(self):
```
```diff
+@reinit__is_reduced
 def reset(self):
```
It's better to add this decorator because it resets the flag when a new epoch starts.
@vfdev-5 I just checked your previous comment that the base metric handles the `reinit__is_reduced` flag automatically. If `self._base_metric.update()` handles the flag reset for the wrapper, then we can safely ignore the decorator suggestions.
```python
self._base_metric.reset()
self._states = {k: self._base_metric.state_dict() for k in self._ks}

def update(self, output):
```
```diff
+@reinit__is_reduced
 def update(self, output):
```
I guess we need the decorator here too
```python
self._base_metric._skip_checks = False

def compute(self) -> list:
```
Distributed sync is missing here: the state tensors for each k inside `self._states` are never reduced across GPUs before `compute()` is called. I think we should add a docstring noting that distributed evaluation is not yet supported for TopK, to avoid silent failures.
```python
def _precision_recall_topk_transform(output: Sequence[torch.Tensor], k: int):
    """top_k transform for precision and recall"""
    y_pred, y = output[0], output[1]
    _, top_indices = torch.topk(y_pred, k=k, dim=-1)
```
```diff
-_, top_indices = torch.topk(y_pred, k=k, dim=-1)
+actual_k = min(k, y_pred.shape[-1])
+_, top_indices = torch.topk(y_pred, k=actual_k, dim=-1)
```
I think we need to guard against k exceeding the total number of items. If a user asks for k=10 but the test batch only has 5 items, `torch.topk` raises a RuntimeError and crashes; taking the min prevents this.
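A plain-Python sketch of the clamping behaviour (illustrative only; `safe_topk_indices` is a made-up helper, the real code uses `torch.topk`):

```python
def safe_topk_indices(scores, k):
    # Clamp k so requesting more items than exist cannot raise,
    # mirroring actual_k = min(k, y_pred.shape[-1]) in the suggestion.
    actual_k = min(k, len(scores))
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return order[:actual_k]

print(safe_topk_indices([0.9, 0.3, 0.8], k=5))  # [0, 2, 1] -- no crash
print(safe_topk_indices([0.9, 0.3, 0.8], k=2))  # [0, 2]
```

With the clamp, k=5 on a 3-item batch just returns all three indices instead of failing.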
```python
assert len(result) == 3
assert pytest.approx(result[0], abs=1e-4) == 1.0
assert pytest.approx(result[1], abs=1e-4) == 1.0
assert pytest.approx(result[2], abs=1e-4) == 2 / 3
```
```diff
 assert pytest.approx(result[2], abs=1e-4) == 2 / 3
+
+
+def test_top_k_k_exceeds_num_items():
+    """torch.topk crashes if k > number of items; verify the guard works."""
+    import torch
+
+    from ignite.metrics import Precision
+    from ignite.metrics.top_k import TopK
+
+    y_pred = torch.tensor([[0.9, 0.3, 0.8]])  # 3 items
+    y_true = torch.tensor([[1, 0, 1]])
+    metric = TopK(Precision(average="samples", is_multilabel=True), top_k=[5])  # k=5 > 3 items
+    metric.update((y_pred, y_true))  # should NOT crash
+    result = metric.compute()
+    assert len(result) == 1
```
Let's add this unit test to verify that the `min(k, total_items)` guard works and prevents a RuntimeError!
```python
def __init__(
    self,
    base_metric: Metric,
    top_k: int | list[int],
    output_transform=lambda x: x,
    device: str | torch.device = torch.device("cpu"),
    skip_unrolling: bool = False,
):
```
@TahaZahid05 seems like we have a similar idea/problem as SubgroupMetric from your PR. Can you check this implementation and give feedback, thanks!
@vfdev-5 I have made a general comment on the PR.
@rwtarpit you can check `ignite/metrics/metrics_lambda.py`, lines 148 to 170 at 6bfbdc4.
@vfdev-5 Thanks for tagging. Part of my approach has already been discussed in #3568 from what I can see. I'll list my approach again for explanation purposes. The main difference is in state management: this PR uses a single instance and swaps its state per k, whereas I keep one deep copy per k:

```python
self._metrics = {k: copy.deepcopy(base_metric) for k in keys}
```

This simplifies the wrapper. Following is how the update looks under my implementation:

```python
for k in self._ks:
    k_output = self._transform(output, k)
    self._metrics[k].update(k_output)
```

and compute:

```python
return {k: self._metrics[k].compute() for k in self._ks}
```

This is better in my opinion, as you don't need to add the `_skip_checks` flag. @rwtarpit mentioned in #3568 that they considered the use of `deepcopy`.
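A runnable sketch of that deep-copy-per-k design, using toy classes rather than the actual PR code (all names here are illustrative):

```python
import copy

class ToyHits:
    """Toy base metric: sums whatever the transform hands it."""
    def __init__(self):
        self.hits = 0

    def update(self, output):
        self.hits += sum(output)

    def compute(self):
        return self.hits

class ToyTopK:
    def __init__(self, base_metric, ks, transform):
        # One independent deep copy per k, so no state-dict swapping is needed.
        self._ks = ks
        self._transform = transform
        self._metrics = {k: copy.deepcopy(base_metric) for k in ks}

    def update(self, output):
        for k in self._ks:
            self._metrics[k].update(self._transform(output, k))

    def compute(self):
        return {k: self._metrics[k].compute() for k in self._ks}

# Toy transform: pretend each of the top-k predictions is a hit.
topk = ToyTopK(ToyHits(), ks=[1, 2], transform=lambda out, k: [1] * min(k, len(out)))
topk.update([0.9, 0.1, 0.4])
print(topk.compute())  # {1: 1, 2: 2}
```

Each per-k copy accumulates its own state, so `update` and `compute` stay a plain loop with no flag juggling.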
Thanks for the explanation @TahaZahid05, I was a bit reluctant about this approach too, due to touching the stable API of metrics.
@rwtarpit I could not understand the usefulness of the registry approach. If you can pitch the idea again with a clear example, it would be helpful.
So I tried to interleave the many issues we were facing with TopK:

```python
# TopK's update
class TopK:
    # top-k with branching logic
    def update(base_metric, top_k, ...):
        if isinstance(base_metric, PrecisionRecall):
            ...
        if isinstance(base_metric, Accuracy):
            ...

    # with transform-registry logic:
    def update(base_metric, top_k, ...):
        transform = None
        for metric_type, k_transform in self._output_transform_registry.items():
            if isinstance(base_metric, metric_type):
                transform = k_transform
        ...

# registering metrics to TopK
TopK.register(PrecisionRecall, _precision_recall_transform)
TopK.register(Accuracy, _accuracy_transform)
...
```
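The pseudocode above can be made concrete as a self-contained sketch with toy metric classes (all names here are illustrative; `register` is assumed to be a classmethod storing type-to-transform pairs):

```python
class ToyPrecision: ...
class ToyAccuracy: ...

class TopK:
    _output_transform_registry = {}

    @classmethod
    def register(cls, metric_type, k_transform):
        # Map a metric type to its top-k output transform.
        cls._output_transform_registry[metric_type] = k_transform

    def __init__(self, base_metric, k):
        self.base_metric, self.k = base_metric, k

    def pick_transform(self):
        # Registry lookup replaces per-metric if/elif branching.
        for metric_type, k_transform in self._output_transform_registry.items():
            if isinstance(self.base_metric, metric_type):
                return k_transform
        raise TypeError(f"no top-k transform registered for {type(self.base_metric)}")

# Toy transforms that just tag which path was chosen.
def _precision_recall_transform(output, k): return ("pr", k)
def _accuracy_transform(output, k): return ("acc", k)

TopK.register(ToyPrecision, _precision_recall_transform)
TopK.register(ToyAccuracy, _accuracy_transform)

print(TopK(ToyAccuracy(), k=2).pick_transform()(None, 2))  # ('acc', 2)
```

Adding support for a new metric then only requires one `TopK.register(...)` call, with no edits inside `update`.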
Fixes #3568

Description:

As discussed in the issue thread, this is a draft PR implementing a base structure for the `TopK` wrapper class.

- The idea is to skip redundant checks on the input data by not running `_check_shape` and `_check_type` k times.
- The wrapper only instantiates a single object of the base class and maintains states for each k using a dict.
- For each batch, we check the data shape and type only once before updating the metric for each k individually. For this we use a `_skip_checks` flag that can only be set from `TopK` itself, which ensures the metric functions as usual when called without the wrapper.
- The `TopK` class provides `_wrap_prepare_output`, which handles the top-k logic and transforms the data accordingly. The metric can then be calculated by the base metric's `update` and `compute` as usual, by means of `state_dict` and `_state_dict_all_req_keys`.

I have currently only implemented the wrapper for recall and precision, to get an idea of how it would look when extended to other metrics in the future.

Check list: