Summaries require names of format name/tag #126

@nomadbl

After updating to ProtoBuf 1.0.0 (#124), I found that summaries are not logged correctly to TensorBoard.
Some of them do get logged, but others don't. I suspect that some summaries are fine, but once one of them is logged incorrectly, the file or TensorBoard stops registering the ones that follow.

I prepared a minimal reproducing example by updating the Flux example to the new Flux API (the existing example uses a deprecated API):
nomadbl@fc9ba3e

During logging I observe the following error message:

[2023-07-01T23:12:43Z WARN  rustboard_core::run] Read error in ./content/log/events.out.tfevents.1.68825314069885e9.lior-HP-Pavilion-Laptop-15-cs3xxx: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x85987b32), want: MaskedCrc(0x00000000) }))

After some googling, I can only speculate that this has something to do with multiprocessing, with the file being written by multiple instances of the logger in different threads.
So far I have tried (without success) to fix it under that assumption by specifying that the logger should lock the file:
src/TBLogger.jl, 119:
file = open(fpath, "w"; lock=true)

Any other ideas or insights are welcome. I'll try to isolate the issue using the reproducing code mentioned above.
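
One way to narrow this down might be to check the event file directly, independent of TensorBoard, and see which record first fails its checksum. The sketch below is only an illustration and is not part of TensorBoardLogger.jl: the names `masked_crc32c` and `first_bad_record` are hypothetical, and the record layout it assumes (little-endian UInt64 length, masked CRC-32C of the length bytes, payload, masked CRC-32C of the payload) is taken from the TFRecord format as I understand it.

```julia
using CRC32c  # Julia standard library

# Masked CRC-32C as used by TFRecord framing (assumed here):
# rotate the CRC right by 15 bits, then add a constant.
function masked_crc32c(bytes::Vector{UInt8})
    crc = crc32c(bytes)
    return ((crc >> 15) | (crc << 17)) + 0xa282ead8  # UInt32 arithmetic wraps
end

# Hypothetical helper: walk the records of an event file and report the
# first one that is truncated or fails a checksum. Returns `nothing` if
# every record passes both checks.
function first_bad_record(path::AbstractString)
    open(path, "r") do io
        index = 0
        while !eof(io)
            index += 1
            # Header: 8 length bytes followed by a 4-byte masked CRC of them.
            header = read(io, 12)
            length(header) == 12 ||
                return (index = index, reason = :truncated_header)
            len_bytes = header[1:8]
            stored_len_crc = ltoh(reinterpret(UInt32, header[9:12])[1])
            masked_crc32c(len_bytes) == stored_len_crc ||
                return (index = index, reason = :bad_length_crc)

            # Payload followed by a 4-byte masked CRC of the payload.
            len = Int(ltoh(reinterpret(UInt64, len_bytes)[1]))
            body = read(io, len + 4)
            length(body) == len + 4 ||
                return (index = index, reason = :truncated_payload)
            payload = body[1:len]
            stored_data_crc = ltoh(reinterpret(UInt32, body[len+1:len+4])[1])
            masked_crc32c(payload) == stored_data_crc ||
                return (index = index, reason = :bad_data_crc)
        end
        return nothing
    end
end
```

Calling `first_bad_record` on the event file from the log directory would return either `nothing` or a named tuple with the 1-based index of the first failing record and a reason such as `:bad_length_crc`. Knowing the index could help tell whether the corruption starts at a particular summary or at an arbitrary point in the file.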
