
memory leak in 1.4.0 #135


Open

WojciechMula opened this issue Dec 23, 2020 · 6 comments

@WojciechMula
Owner

Hi, I'm sorry for opening such an old issue, but I'm currently experiencing the same problem.
I'm using version 1.4.0 now and getting small, steady memory leaks (found after debugging with tracemalloc) on:

import ahocorasick

A = ahocorasick.Automaton()
MyList = [...]  # list of (y, z) pairs
for (y, z) in MyList:
    A.add_word(y, (y, z))

Is there a chance this bug has returned?

Thanks,
Eden.

Originally posted by @EdenAzulay in #81 (comment)
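A minimal sketch of how such a leak can be measured with tracemalloc; the word list and counts below are placeholders, not the original reproduction:

import tracemalloc

import ahocorasick

tracemalloc.start()
before = tracemalloc.take_snapshot()

A = ahocorasick.Automaton()
for i in range(100_000):
    y = f"word{i}"
    A.add_word(y, (y, i))  # value is an arbitrary payload
del A  # the automaton is released; anything still allocated is suspect

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)

Note that tracemalloc only sees allocations made through Python's memory APIs; raw malloc calls inside a C extension will not show up there, so watching the process RSS is a useful cross-check.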

@AlonSh

AlonSh commented Mar 4, 2021

I think I'm also experiencing the same memory leak on add_word.
Would love to see any updates :)

Edit - I was experiencing a different memory leak.
Mine originated from using a multiprocessing Pool and passing the ahocorasick automaton between workers; I think there's some issue with serialization causing old objects not to be cleaned up.

@WojciechMula
Owner Author

@AlonSh could you please provide a minimal example?

@AlonSh

AlonSh commented Mar 11, 2021

Yeah:
- create some automaton
- create a multiprocessing Pool
- and do:

pool.apply_async(
    run_automaton,
    (automaton, text),
    callback=callback_success,
    error_callback=_my_error_callback,
)

and you'll see your memory exploding after a number of calls.
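Putting those steps together, a self-contained sketch of the reproduction: run_automaton and the callbacks are user code in the original report, so their bodies below are assumptions, as are the word list and call counts.

import multiprocessing

import ahocorasick

def build_automaton():
    A = ahocorasick.Automaton()
    for i in range(10_000):
        word = f"word{i}"
        A.add_word(word, (i, word))
    A.make_automaton()
    return A

def run_automaton(automaton, text):
    # Each apply_async pickles the automaton to ship it to a worker;
    # that serialization round-trip is where the leak was reported.
    return list(automaton.iter(text))

def callback_success(result):
    pass

def _my_error_callback(exc):
    print(exc)

if __name__ == "__main__":
    automaton = build_automaton()
    text = "word1 word2 word3 " * 100
    with multiprocessing.Pool(processes=4) as pool:
        for _ in range(1_000):
            pool.apply_async(
                run_automaton,
                (automaton, text),
                callback=callback_success,
                error_callback=_my_error_callback,
            )
        pool.close()
        pool.join()
    # Watch the RSS of the parent process grow as the calls accumulate.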

@WojciechMula
Owner Author

Great! Thank you.

@pombredanne
Collaborator

I am pushing tests to run on the CI on many Linuxes ... but while I can make it fail locally on Ubuntu 16, the tests seem to pass on more recent Linux versions. I wonder if this depends on a certain version of the compiler? Otherwise, this is a head-scratcher.
@AlonSh FWIW, I recycle processes after 1000 calls in my pools to cope with leaks. Not perfect, but a workaround at least. See for instance
https://github.com/nexB/scancode-toolkit/blob/e080f8354bed5813df9b619efe575ce9931a5a5b/src/scancode/cli.py#L1209
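With the standard library's Pool, that recycling can be expressed with the maxtasksperchild option; a sketch, assuming the leak accumulates per task inside the workers:

import multiprocessing

def handle(item):
    # Stand-in for real work that leaks a little memory per call.
    return item.upper()

if __name__ == "__main__":
    # Each worker process is torn down and respawned after 1000 tasks,
    # so any memory it leaked is reclaimed when the process exits.
    with multiprocessing.Pool(processes=4, maxtasksperchild=1000) as pool:
        print(pool.map(handle, ["leak", "workaround"]))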

pombredanne added a commit that referenced this issue Mar 6, 2022
This test for #81 and #135 fails on Ubuntu 16.x but not on 18 and up.
It is also passing on the manylinux containers from PyPA.

Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne mentioned this issue Jan 14, 2023
@Azzonith

Hello guys,

Is there any update on this issue?
I tested library version 2.0.0 today, and memory consumption added up every time the automaton was used in ProcessPoolExecutor futures. We had to stop the service after RAM consumption crossed 140 GB.
I attempted to build images FROM ubuntu:20.04, python:3.8, and python:3.10. The issue is reproduced every time.
The latest usable lib version for us remains 1.1.8.
Please let me know if there is any troubleshooting info I could provide for the research.
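One way to gather such data: a sketch that assumes the leak is triggered by the pickling that ProcessPoolExecutor performs when submitting the automaton (sizes and iteration counts are placeholders).

import pickle
import tracemalloc

import ahocorasick

A = ahocorasick.Automaton()
for i in range(10_000):
    A.add_word(f"word{i}", i)
A.make_automaton()

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(100):
    # Simulate what submitting to a worker does: a pickle round-trip.
    pickle.loads(pickle.dumps(A))
after = tracemalloc.take_snapshot()

for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
# tracemalloc only tracks Python-level allocations; pair this with
# process RSS measurements to catch leaks in the C extension itself.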
