Description
I'm currently figuring out which library to use for reading metadata from media files, and this library is definitely one of the fastest around. The only problem I have is the huge allocations it makes:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
[Host] : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
DefaultJob : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
| Method | Folder | Mean | Error | StdDev | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| ParseWithMetadataExtractor | REDACTED | 563.2 ms | 23.79 ms | 15.74 ms | 536.6 ms | 580.6 ms | 44000.0000 | 22000.0000 | 10000.0000 | 230725.6 KB |
| ParseWithMediaInfo | REDACTED | 771.5 ms | 29.93 ms | 19.80 ms | 745.8 ms | 801.8 ms | - | - | - | 13.7 KB |
The folder I tested this on contained 144 files (136 videos and 8 images, 1 GB in total), and I saw allocations of around 220 MB.
| Method | Folder | Mean | Error | StdDev | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| ParseWithMetadataExtractor | REDACTED | 140.9 ms | 9.84 ms | 6.51 ms | 128.5 ms | 147.1 ms | 18000.0000 | 14000.0000 | 8000.0000 | 65805.38 KB |
| ParseWithMediaInfo | REDACTED | 217.7 ms | 15.54 ms | 10.28 ms | 203.3 ms | 231.9 ms | - | - | - | 26.01 KB |
The second benchmark was run on a folder containing 276 files (images only), and again the allocations are far higher than seems reasonable.
According to the Dynamic Program Analysis built into Rider, most of the allocations happen because the library reads the entire contents of a section into a byte array and often processes that array again later, for example for PNG and JPEG.
Possible improvements could be made by using Span<T> or ReadOnlySpan<T> and calling .Slice for the chunks, which returns a view over the same memory with no extra allocations.
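To illustrate the slicing idea, here is a minimal sketch that assumes a PNG-style chunk layout (4-byte big-endian length, 4-byte type, payload, 4-byte CRC) already sitting in a buffer. `ChunkSliceSketch` and `ReadChunk` are hypothetical names made up for this example and are not part of the library's API.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not the library's API.
public static class ChunkSliceSketch
{
    // Reads one PNG-style chunk starting at `offset`.
    // The returned span is a view over `buffer`: no payload bytes are copied.
    public static ReadOnlySpan<byte> ReadChunk(
        ReadOnlySpan<byte> buffer, int offset, out uint type, out int nextOffset)
    {
        // 4-byte big-endian payload length.
        int length = checked((int)BinaryPrimitives.ReadUInt32BigEndian(buffer.Slice(offset, 4)));

        // 4-byte chunk type, kept as a uint to avoid allocating a string.
        type = BinaryPrimitives.ReadUInt32BigEndian(buffer.Slice(offset + 4, 4));

        // Slice throws if the declared length runs past the end of the buffer.
        ReadOnlySpan<byte> payload = buffer.Slice(offset + 8, length);

        // Skip the 8-byte header, the payload and the 4-byte CRC.
        nextOffset = offset + 8 + length + 4;
        return payload;
    }
}
```

A caller would loop with `nextOffset` until it reaches the end of the buffer and hand each payload span to the relevant parser. The buffer itself still has to come from somewhere (for example a pooled array), so this only removes the per-chunk copies, not the initial read.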
Then there is also the concept of binary overlays. They differ from typical binary importing in that you do not read everything from the file into memory up front and then parse it. Instead, you keep an open stream and parse only the bare minimum needed to know the file layout. With the layout you can then expose getters that call Lazy functions or similar, which jump to the specific position in the file stream and parse the section on demand instead of up front. This method is extremely useful because the program only reads and parses what is actually needed, so allocations are kept to a minimum. The biggest problem is that it requires a fairly large amount of refactoring and API changes.
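A rough sketch of what such an overlay could look like, again assuming a PNG-style chunk layout and a seekable stream positioned at the first chunk (after any file signature). `ChunkOverlay`, `Contains` and `ReadPayload` are hypothetical names for this example, not the library's API, and for simplicity the sketch keeps only the last chunk of each type.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

// Hypothetical overlay for illustration only; not the library's API.
public sealed class ChunkOverlay
{
    private readonly Stream _stream;

    // Chunk type -> (payload offset, payload length). Simplification:
    // a repeated chunk type overwrites the earlier entry.
    private readonly Dictionary<uint, (long Offset, int Length)> _layout =
        new Dictionary<uint, (long Offset, int Length)>();

    public ChunkOverlay(Stream seekableStream)
    {
        _stream = seekableStream;
        Span<byte> header = stackalloc byte[8];

        // Layout pass: read only the 8-byte headers, seek over payload + CRC.
        while (_stream.Position + 8 <= _stream.Length)
        {
            Fill(header);
            int length = checked((int)BinaryPrimitives.ReadUInt32BigEndian(header));
            uint type = BinaryPrimitives.ReadUInt32BigEndian(header.Slice(4));
            _layout[type] = (_stream.Position, length);
            _stream.Seek(length + 4L, SeekOrigin.Current);
        }
    }

    public bool Contains(uint type) => _layout.ContainsKey(type);

    // On-demand read: seeks to the recorded offset and reads just this payload.
    public byte[] ReadPayload(uint type)
    {
        var (offset, length) = _layout[type];
        var payload = new byte[length];
        _stream.Position = offset;
        Fill(payload);
        return payload;
    }

    // Reads until `destination` is full or throws if the stream ends early.
    private void Fill(Span<byte> destination)
    {
        while (!destination.IsEmpty)
        {
            int read = _stream.Read(destination);
            if (read == 0) throw new EndOfStreamException();
            destination = destination.Slice(read);
        }
    }
}
```

Getters on a directory type could wrap `ReadPayload` in a `Lazy<T>` so that a section is read and parsed at most once, and only if someone asks for it. The trade-off is that the stream has to stay open for the lifetime of the overlay, which is part of why this would need API changes.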