With yolov8, 11, etc., the pure inference time of context.enqueue in multi-batch inference grows as a multiple of the single-batch time, with no speedup? #1634

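For yolov8/11/12, I set kBatchSize in config.h, then serialize the model and run inference. No matter what value I set, the inference time does not go down; it grows in proportion to the batch size.
I record the model's inference time with CUDA events around `context.enqueue(batchsize, buffers, stream, nullptr);`, so pre- and post-processing time is excluded. The time measured here does not shrink as the batch gets larger; it scales up with it. For example, with kBatchSize=4 this call takes 30 ms, and with kBatchSize=16 it takes about 120 ms. GPU memory usage does increase, though, so I am spending more memory without getting any speedup.
Why does this happen?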
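To make the two measurements above easier to compare, here is a minimal sketch that converts a per-call latency into per-image latency and throughput. The helper names `perImageMs`, `imagesPerSecond`, and `printSummary` are hypothetical and not part of the repository; the only inputs are the timings quoted in this issue (30 ms at kBatchSize=4, roughly 120 ms at kBatchSize=16).

```cpp
#include <cstdio>

// Hypothetical helper: average latency per image, in milliseconds,
// for one enqueue call that processes `batchSize` images.
double perImageMs(double batchLatencyMs, int batchSize) {
    return batchLatencyMs / batchSize;
}

// Hypothetical helper: throughput in images per second for the same call.
double imagesPerSecond(double batchLatencyMs, int batchSize) {
    return batchSize * 1000.0 / batchLatencyMs;
}

// Print a one-line summary for a measured (batch size, latency) pair.
void printSummary(int batchSize, double batchLatencyMs) {
    std::printf("batch=%d latency=%.1f ms per-image=%.2f ms throughput=%.1f img/s\n",
                batchSize, batchLatencyMs,
                perImageMs(batchLatencyMs, batchSize),
                imagesPerSecond(batchLatencyMs, batchSize));
}
```

Calling `printSummary(4, 30.0)` and `printSummary(16, 120.0)` gives the same per-image latency for both settings, which is the "no speedup" observation stated in terms of throughput. Since the 120 ms figure is approximate, the two values may differ slightly in practice. The arithmetic only restates the measurements and does not identify the cause.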

![Image](https://github.com/user-attachments/assets/ea6bded7-33c2-491c-a0e6-db093f16db94)

![Image](https://github.com/user-attachments/assets/ad74bbb4-fc94-4199-81c6-14cd5fc9732e)
