Huge allocations #283

@erri120

I'm currently in the process of figuring out which library to use for getting metadata from media files, and this library is definitely one of the fastest around. The only problem I have is the huge allocations it makes:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  DefaultJob : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT

| Method | Folder | Mean | Error | StdDev | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| ParseWithMetadataExtractor | REDACTED | 563.2 ms | 23.79 ms | 15.74 ms | 536.6 ms | 580.6 ms | 44000.0000 | 22000.0000 | 10000.0000 | 230725.6 KB |
| ParseWithMediaInfo | REDACTED | 771.5 ms | 29.93 ms | 19.80 ms | 745.8 ms | 801.8 ms | - | - | - | 13.7 KB |

The folder I tested this on contained 144 files (136 videos and 8 images, 1 GB in total), and the benchmark shows allocations of around 220 MB.

| Method | Folder | Mean | Error | StdDev | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| ParseWithMetadataExtractor | REDACTED | 140.9 ms | 9.84 ms | 6.51 ms | 128.5 ms | 147.1 ms | 18000.0000 | 14000.0000 | 8000.0000 | 65805.38 KB |
| ParseWithMediaInfo | REDACTED | 217.7 ms | 15.54 ms | 10.28 ms | 203.3 ms | 231.9 ms | - | - | - | 26.01 KB |

The next benchmark was run on a folder containing 276 files (images only), and again the allocations are far higher than they need to be (about 65,800 KB versus 26 KB for MediaInfo).

Using the Dynamic Program Analysis built into Rider, I found that most allocations happen because the library reads the entire contents of a section into a byte array and often processes it later on, as it does for PNG and JPEG.

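Possible improvements could be made by using Span<T> or ReadOnlySpan<T> and .Slice for the chunks. Slicing returns a view over the same memory (a Span<T> from a Span<T>, a ReadOnlySpan<T> from a ReadOnlySpan<T>) with no extra allocations.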
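To make the slicing idea concrete, here is a minimal sketch of what I mean. It assumes a PNG-style chunk layout (4-byte big-endian length, 4-byte type, payload, 4-byte CRC). `ChunkReader` and `TryFindChunk` are hypothetical names made up for this example, not part of the library's API, and the CRC is not verified.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not MetadataExtractor's API.
public static class ChunkReader
{
    // Walks PNG-style chunks (length, type, payload, CRC) inside an
    // in-memory buffer and returns the payload of the first chunk whose
    // type matches. Every Slice call is a view over the same memory, so no
    // byte arrays are allocated while scanning.
    public static bool TryFindChunk(
        ReadOnlySpan<byte> data,
        ReadOnlySpan<byte> type,
        out ReadOnlySpan<byte> payload)
    {
        payload = default;

        // 12 bytes = 4 (length) + 4 (type) + 4 (CRC), the smallest possible chunk.
        while (data.Length >= 12)
        {
            uint length = BinaryPrimitives.ReadUInt32BigEndian(data);
            if (length > (uint)(data.Length - 12))
                return false; // truncated or corrupt chunk

            ReadOnlySpan<byte> chunkType = data.Slice(4, 4);
            if (chunkType.SequenceEqual(type))
            {
                payload = data.Slice(8, (int)length);
                return true;
            }

            // Skip header, payload and CRC to reach the next chunk.
            data = data.Slice(12 + (int)length);
        }

        return false;
    }
}
```

A caller would pass the file buffer and the ASCII bytes of the chunk type (for example `Encoding.ASCII.GetBytes("tEXt")`), then parse the returned span in place instead of copying it into a new array.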

Then there is also the concept of binary overlays. These differ from typical binary importing in that you do not read everything from the file into memory up front and then parse it. Instead, you keep an open stream and parse only the bare minimum needed to know the file layout. With the layout, you can then expose getters that call Lazy functions or similar, which jump to the specific position in the file stream and parse the section on demand. This approach is extremely useful because the program only reads and parses what is actually needed, so allocations are kept to a minimum. The biggest problem is that it requires a fair amount of refactoring and API changes.
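A rough sketch of the overlay idea, again assuming a PNG-style chunk layout and a seekable stream. `SectionInfo` and `ChunkOverlay` are hypothetical types invented for this example. The constructor reads only the 8-byte chunk headers to record where each payload lives, and a payload is allocated only when `ReadSection` is called. It does no validation of lengths against the stream size and is not thread-safe.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical types for illustration only; not MetadataExtractor's API.
public readonly struct SectionInfo
{
    public SectionInfo(string type, long offset, int length)
    {
        Type = type;
        Offset = offset;
        Length = length;
    }

    public string Type { get; }
    public long Offset { get; } // position of the payload within the stream
    public int Length { get; }  // payload length in bytes
}

public sealed class ChunkOverlay : IDisposable
{
    private readonly Stream _stream;

    public IReadOnlyList<SectionInfo> Sections { get; }

    public ChunkOverlay(Stream seekableStream)
    {
        _stream = seekableStream;
        Sections = ScanLayout(seekableStream);
    }

    // Reads only the chunk headers and seeks past each payload.
    private static List<SectionInfo> ScanLayout(Stream stream)
    {
        var sections = new List<SectionInfo>();
        Span<byte> header = stackalloc byte[8];

        while (ReadExactly(stream, header))
        {
            int length = checked((int)BinaryPrimitives.ReadUInt32BigEndian(header));
            string type = Encoding.ASCII.GetString(header.Slice(4, 4));
            sections.Add(new SectionInfo(type, stream.Position, length));
            stream.Seek(length + 4L, SeekOrigin.Current); // skip payload and CRC
        }

        return sections;
    }

    // Jumps to the recorded offset and reads one payload on demand.
    public byte[] ReadSection(SectionInfo section)
    {
        var buffer = new byte[section.Length];
        _stream.Seek(section.Offset, SeekOrigin.Begin);
        if (!ReadExactly(_stream, buffer))
            throw new EndOfStreamException();
        return buffer;
    }

    // Fills the whole buffer, or returns false if the stream ends first.
    private static bool ReadExactly(Stream stream, Span<byte> buffer)
    {
        while (buffer.Length > 0)
        {
            int read = stream.Read(buffer);
            if (read == 0) return false;
            buffer = buffer.Slice(read);
        }
        return true;
    }

    public void Dispose() => _stream.Dispose();
}
```

Getters on a real directory type could wrap `ReadSection` in a `Lazy<T>` so that each section is parsed at most once, and only if someone asks for it.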
