cfb_add and write performance issues #2

@rossj

Hi there,

I'm working on a program which converts .pst files to .msg files, primarily in Node but also in the browser, and it uses this library in a very write-heavy way to save the .msg files. Through testing and profiling, I've noticed a couple of write-related performance issues that I wanted to share.

With some modifications, I've been able to reduce the output generation time of my primary "large" test case (4300 .msg files from 1 .pst) by a factor of 8, from about 16 minutes to 2 minutes (running on Node).

The first issue, which may just be a matter of documentation, is that calling cfb_add repeatedly to add all streams to a new doc is very slow, because each call runs cfb_gc and cfb_rebuild. We switched from cfb_add to pushing directly to cfb.FileIndex and cfb.FullPaths (and then calling cfb_rebuild once at the end), which reduced the output time from 16 minutes to 3.5 minutes.
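
To illustrate why batching helps, here is a rough toy model of the two patterns. It is only a sketch: `newContainer`, `rebuild`, `addAndRebuild` and `addAll` are hypothetical stand-ins, not the library's functions, and the "rebuild" here is just a re-sort of two parallel arrays rather than whatever cfb_gc and cfb_rebuild really do.

```javascript
// Toy model only: NOT the cfb library's real data structures or functions.
// It mimics a container whose index must be "rebuilt" (re-sorted) after changes.

function newContainer() {
  return { FullPaths: [], FileIndex: [], rebuilds: 0 };
}

function comparePaths(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Stand-in for an expensive rebuild: re-sort both parallel arrays by path.
function rebuild(c) {
  const order = c.FullPaths.map((_, i) => i)
    .sort((a, b) => comparePaths(c.FullPaths[a], c.FullPaths[b]));
  c.FullPaths = order.map((i) => c.FullPaths[i]);
  c.FileIndex = order.map((i) => c.FileIndex[i]);
  c.rebuilds++;
}

// Slow pattern: every add triggers a full rebuild.
function addAndRebuild(c, path, content) {
  c.FullPaths.push(path);
  c.FileIndex.push({ name: path, content });
  rebuild(c);
}

// Fast pattern: push all entries first, then rebuild once at the end.
function addAll(c, entries) {
  for (const [path, content] of entries) {
    c.FullPaths.push(path);
    c.FileIndex.push({ name: path, content });
  }
  rebuild(c);
}

const entries = [];
for (let i = 0; i < 1000; i++) {
  entries.push([`/stream${i}`, Buffer.from([i & 0xff])]);
}

const slow = newContainer();
for (const [path, content] of entries) addAndRebuild(slow, path, content);

const fast = newContainer();
addAll(fast, entries);

console.log(slow.rebuilds, fast.rebuilds); // 1000 1
```

Both containers end up with the same contents, but the first one pays for a full rebuild on each of the 1000 adds, which is roughly the pattern we were hitting with repeated cfb_add calls.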

The second issue is that the _write and WriteShift functions do not use Buffer capabilities when they are available. By using Buffer.alloc() for the initial creation (which guarantees zero-filled memory), Buffer.copy for content streams, Buffer.write for hex / utf16le strings, and Buffer's various write int / uint methods, we were able to further reduce the output time from 3.5 minutes to 2 minutes.
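
For reference, the Buffer calls I mean look roughly like this. This is a minimal standalone sketch using only Node's Buffer API, not the actual patch; the values written and the `out`, `pos` and `content` names are arbitrary placeholders.

```javascript
// Standalone sketch of the Buffer calls mentioned above (not the real patch).

// Buffer.alloc returns zero-filled memory, so no manual clearing is needed.
const out = Buffer.alloc(32);
let pos = 0;

// The integer writers return the offset just past the bytes they wrote.
pos = out.writeUInt32LE(0xe11ab1a1, pos); // pos is now 4
pos = out.writeUInt16LE(0x003e, pos);     // pos is now 6

// Buffer#write returns the number of bytes written for the given encoding.
pos += out.write("d0cf11e0", pos, "hex");  // 4 bytes
pos += out.write("Root", pos, "utf16le");  // 4 chars * 2 bytes = 8 bytes

// Buffer#copy copies a content stream into place and returns the bytes copied.
const content = Buffer.from([1, 2, 3, 4]);
pos += content.copy(out, pos);             // 4 bytes

console.log(pos); // 22
console.log(out.toString("hex"));
```

The remaining 10 bytes stay zero because of Buffer.alloc, which is what lets us skip a separate fill step.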

If you wish, I would be happy to share my changes, or to work on a pull request which uses Buffer functions when available. My current changes don't do any feature detection and instead rely on Buffer being available (even in the browser we use feross/buffer), so they would need some more work to maintain functionality in non-Buffer environments.
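
As a starting point for discussion, the feature detection could look something like the sketch below. It assumes a plain Uint8Array fallback when Buffer is missing; `hasBuffer`, `newBlob` and `writeU32LE` are hypothetical names for illustration and are not part of the library or of my current changes.

```javascript
// Hypothetical sketch: use Buffer when present, otherwise fall back to Uint8Array.
const hasBuffer =
  typeof Buffer !== "undefined" && typeof Buffer.alloc === "function";

// Both branches return zero-filled storage of the requested size.
function newBlob(size) {
  return hasBuffer ? Buffer.alloc(size) : new Uint8Array(size);
}

// Write an unsigned 32-bit little-endian integer; returns the next offset.
function writeU32LE(blob, value, offset) {
  if (hasBuffer && Buffer.isBuffer(blob)) {
    return blob.writeUInt32LE(value >>> 0, offset);
  }
  // Manual fallback for plain typed arrays.
  blob[offset] = value & 0xff;
  blob[offset + 1] = (value >>> 8) & 0xff;
  blob[offset + 2] = (value >>> 16) & 0xff;
  blob[offset + 3] = (value >>> 24) & 0xff;
  return offset + 4;
}

// Exercise both paths and compare the resulting bytes.
const viaDetected = newBlob(4);
const viaFallback = new Uint8Array(4);
const next1 = writeU32LE(viaDetected, 0xfffffffe, 0);
const next2 = writeU32LE(viaFallback, 0xfffffffe, 0);

console.log(Array.from(viaDetected), Array.from(viaFallback), next1, next2);
```

The detection happens once at load time, so the per-write cost is a single branch.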

Thanks
