Skip to content

Conversation

@pkazmierczak
Copy link
Contributor

@pkazmierczak pkazmierczak commented Dec 31, 2025

While investigating NMD-1105 (possible memory leak in the event stream), I noticed that when we copy allocations, we first call the Copy method which returns us a deep copy of the object, and then we remove the job information from it to save space. Afterwards, we sanitize the allocation (i.e., we remove the SignedIdentities data) and during that operation we perform another copy.

This changeset doesn't necessarily resolve NMD-1105, but should definitely lower the memory footprint of eventFromChange.

@pkazmierczak pkazmierczak self-assigned this Dec 31, 2025
@pkazmierczak pkazmierczak added theme/events Issues related to events backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.10.x+ent backport to 1.10.x+ent release line backport/1.11.x backport to 1.11.x release line labels Dec 31, 2025
@pkazmierczak pkazmierczak changed the title state: avoid unnecessary allocation and node object copies state: avoid unnecessary allocation object copies Dec 31, 2025
@pkazmierczak pkazmierczak marked this pull request as ready for review December 31, 2025 15:50
@pkazmierczak pkazmierczak requested review from a team as code owners December 31, 2025 15:50
@pkazmierczak pkazmierczak marked this pull request as draft January 2, 2026 09:42
@pkazmierczak
Copy link
Contributor Author

Did a bit more digging on this because I thought perhaps we could get rid of copying node and allocation objects altogether, but sadly we need copies to sanitize the sensitive fields. We can actually avoid copying allocations because the state will accept an alloc with no SignedIdentities, but it won't accept a node without a SecretID. (not copying allocations is of course risky too, if we ever upsert an alloc with the same ID more than once we will lose the SignedIdentities field)

Possible solution for reducing the memory footprint of eventFromChange is making event paylod fields configurable. If we could skip some of the maps.Clone calls, that'd be a more significant improvement.

jrasell
jrasell previously approved these changes Jan 5, 2026
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

return structs.Event{}, false
}
alloc := after.Copy()
alloc := after.CopySkipJobAndSanitize()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than introducing a new method and the flags on the existing methods, could we just move the call to Sanitize() here? That's what the node events do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right but sanitize copies the whole alloc. We'd ideally skip the Job field. I suppose we could alternatively pass a parameter to Sanitize?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. There's an existing Allocation.CopySkipJob() method. I suspect we can use that safely in the Sanitize method rather than Allocation.Copy()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we? Allocation.Sanitize() is called in alloc and node RPC endpoints, to return the full allocation object. Wouldn't we want the Job field present there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopySkipJob() doesn't nil-out the job, it just doesn't do the deep copy. So in the alloc and node RPC endpoints, we can call Sanitize() and it's safe to keep the pointer to the original copy of the job (because we're not mutating the job). And then here we can call Sanitize and then nil-out the job field (which doesn't touch the original job, just removes the pointer to it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed in 0e1c678

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

followed up with c96c250 which removes no longer needed sanitize param to alloc.copyImpl and cf7137d which nils-out the job field for the event payload.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (don't forget the changelog)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.10.x+ent backport to 1.10.x+ent release line backport/1.11.x backport to 1.11.x release line theme/events Issues related to events

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants