Skip to content

Optimize heapq.nlargest/nsmallest by switching to sorting when k is large relative to n #137095

@MengAiDev

Description

@MengAiDev

Feature or enhancement

Proposal:

While benchmarking sorting operations, I observed that heapq.nlargest() shows suboptimal performance when the requested number of elements (k) is large relative to the input size (n). Here is the benchmark result, running on github workflow: https://github.com/MengAiDev/Example/actions/runs/16510518321/job/46691401567

Result: ============================================================
Testing size: 100
  k=1    | heapq: 0.000001s | sorted: 0.000002s | ratio: 1.80x
  k=5    | heapq: 0.000005s | sorted: 0.000003s | ratio: 0.55x
  k=10   | heapq: 0.000006s | sorted: 0.000003s | ratio: 0.40x
  k=50   | heapq: 0.000018s | sorted: 0.000003s | ratio: 0.15x
  k=100  | heapq: 0.000003s | sorted: 0.000003s | ratio: 0.95x
------------------------------------------------------------
Testing size: 500
  k=1    | heapq: 0.000006s | sorted: 0.000017s | ratio: 3.02x
  k=5    | heapq: 0.000011s | sorted: 0.000017s | ratio: 1.58x
  k=10   | heapq: 0.000016s | sorted: 0.000018s | ratio: 1.10x
  k=50   | heapq: 0.000039s | sorted: 0.000017s | ratio: 0.43x
  k=100  | heapq: 0.000062s | sorted: 0.000018s | ratio: 0.29x
  k=500  | heapq: 0.000018s | sorted: 0.000018s | ratio: 0.98x
------------------------------------------------------------
Testing size: 1,000
  k=1    | heapq: 0.000011s | sorted: 0.000037s | ratio: 3.28x
  k=5    | heapq: 0.000020s | sorted: 0.000037s | ratio: 1.82x
  k=10   | heapq: 0.000025s | sorted: 0.000037s | ratio: 1.51x
  k=50   | heapq: 0.000058s | sorted: 0.000038s | ratio: 0.65x
  k=100  | heapq: 0.000084s | sorted: 0.000038s | ratio: 0.45x
  k=500  | heapq: 0.000201s | sorted: 0.000039s | ratio: 0.19x
  k=1000 | heapq: 0.000040s | sorted: 0.000039s | ratio: 0.99x
------------------------------------------------------------
Testing size: 5,000
  k=1    | heapq: 0.000053s | sorted: 0.000471s | ratio: 8.86x
  k=5    | heapq: 0.000083s | sorted: 0.000484s | ratio: 5.84x
  k=10   | heapq: 0.000095s | sorted: 0.000476s | ratio: 5.01x
  k=50   | heapq: 0.000143s | sorted: 0.000490s | ratio: 3.42x
  k=100  | heapq: 0.000198s | sorted: 0.000477s | ratio: 2.41x
  k=500  | heapq: 0.000514s | sorted: 0.000475s | ratio: 0.93x
  k=1000 | heapq: 0.000795s | sorted: 0.000478s | ratio: 0.60x
------------------------------------------------------------
Testing size: 10,000
  k=1    | heapq: 0.000114s | sorted: 0.001133s | ratio: 9.92x
  k=5    | heapq: 0.000163s | sorted: 0.001138s | ratio: 6.98x
  k=10   | heapq: 0.000172s | sorted: 0.001128s | ratio: 6.56x
  k=50   | heapq: 0.000230s | sorted: 0.001135s | ratio: 4.94x
  k=100  | heapq: 0.000291s | sorted: 0.001137s | ratio: 3.91x
  k=500  | heapq: 0.000706s | sorted: 0.001133s | ratio: 1.60x
  k=1000 | heapq: 0.001174s | sorted: 0.001139s | ratio: 0.97x
------------------------------------------------------------
Testing size: 50,000
  k=1    | heapq: 0.000491s | sorted: 0.007304s | ratio: 14.87x
  k=5    | heapq: 0.000769s | sorted: 0.007316s | ratio: 9.52x
  k=10   | heapq: 0.000816s | sorted: 0.007315s | ratio: 8.96x
  k=50   | heapq: 0.000890s | sorted: 0.007325s | ratio: 8.23x
  k=100  | heapq: 0.000967s | sorted: 0.007422s | ratio: 7.68x
  k=500  | heapq: 0.001614s | sorted: 0.007352s | ratio: 4.56x
  k=1000 | heapq: 0.002307s | sorted: 0.007328s | ratio: 3.18x
------------------------------------------------------------
Testing size: 100,000
  k=1    | heapq: 0.001103s | sorted: 0.016051s | ratio: 14.55x
  k=5    | heapq: 0.001516s | sorted: 0.015989s | ratio: 10.55x
  k=10   | heapq: 0.001621s | sorted: 0.016131s | ratio: 9.95x
  k=50   | heapq: 0.001750s | sorted: 0.016152s | ratio: 9.23x
  k=100  | heapq: 0.001787s | sorted: 0.016125s | ratio: 9.02x
  k=500  | heapq: 0.002659s | sorted: 0.015858s | ratio: 5.96x
  k=1000 | heapq: 0.003355s | sorted: 0.015837s | ratio: 4.72x
------------------------------------------------------------
Testing size: 500,000
  k=1    | heapq: 0.005509s | sorted: 0.102558s | ratio: 18.62x
  k=5    | heapq: 0.007727s | sorted: 0.104972s | ratio: 13.59x
  k=10   | heapq: 0.008150s | sorted: 0.101526s | ratio: 12.46x
  k=50   | heapq: 0.008997s | sorted: 0.102024s | ratio: 11.34x
  k=100  | heapq: 0.008422s | sorted: 0.101662s | ratio: 12.07x
  k=500  | heapq: 0.009400s | sorted: 0.101575s | ratio: 10.81x
  k=1000 | heapq: 0.010465s | sorted: 0.102266s | ratio: 9.77x
------------------------------------------------------------
Testing size: 1,000,000
  k=1    | heapq: 0.011033s | sorted: 0.244558s | ratio: 22.17x
  k=5    | heapq: 0.015223s | sorted: 0.244423s | ratio: 16.06x
  k=10   | heapq: 0.016136s | sorted: 0.242489s | ratio: 15.03x
  k=50   | heapq: 0.017396s | sorted: 0.242830s | ratio: 13.96x
  k=100  | heapq: 0.016742s | sorted: 0.244876s | ratio: 14.63x
  k=500  | heapq: 0.018133s | sorted: 0.243711s | ratio: 13.44x
  k=1000 | heapq: 0.019433s | sorted: 0.243561s | ratio: 12.53x
------------------------------------------------------------
============================================================

I think we can add a if statement to check the n and the k, sometimes to use sorted instead of original nlargest.

Now:
When k approaches n (k > n/2), sorted() is 1.5-5 times faster than heapq.nlargest

When k > n/10, sorting on large datasets is usually faster

When k=n, heap operations are more than 20% slower than sorting

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    extension-modulesC modules in the Modules dirpendingThe issue will be closed if no feedback is providedperformancePerformance or resource usagestdlibPython modules in the Lib dirtype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions