[IMP] snippets: move all work from parent to mp workers #137
Conversation
Force-pushed from 3cb6112 to 2c980fa
ok, I stopped the madness. If anyone wants to review ... :-)
Force-pushed from 2c980fa to bd55b91
Force-pushed from bd55b91 to 3a69c90
This looks mostly good.
Force-pushed from 3a69c90 to 9ba22c9
Force-pushed from 9ba22c9 to a697852
Some suggestions about the row processing method. Untested!
Regarding the test, I'm OK with it, but you could add a comment to clarify the extra steps in the test.
Force-pushed from b6888a6 to 117114b
All comments resolved.
cc: @KangOl
Force-pushed from 117114b to b8eebed
𓃠 ?
🐈 ?
Force-pushed from 0ee95a6 to 3a916ac
@nseinlet concerning our earlier coffee-machine meeting: wdyt about the fixup just pushed?
Force-pushed from 3a916ac to 9405018
Rebased, conflicts resolved.
Force-pushed from 9405018 to fe1f381
Today I improved the comments about backwards compatibility, rebased, and squashed the fixups. Do I get a review? And/or a 🐈⬛?
src/util/snippets.py (Outdated)
def __call__(self, query):
    # backwards compatibility: caller passes rows in the "query" argument and expects us to return converted rows
    if not (self.dbname and self.update_query and isinstance(query, str)):
        return self._convert_row(query)
We can make this simpler: if dbname is set, we know it is the new implementation.
Suggested change:

def __call__(self, row_or_query):
    # backwards compatibility: caller passes rows in the "query" argument and expects us to return converted rows
    if not self.dbname:
        return self._convert_row(row_or_query)
I find this merged implementation unnecessary. The two interfaces cannot be used transparently by the same client code because we need the extra parameters in the new version. In this case I think good old OOP may be cleaner. For example, rename this class to ConvertorNG and code it for the query variant. Then make Convertor a subclass that keeps the old behavior.
class ConvertorNG:
    def __init__(self, converters, callback, dbname, update_query):
        ...

    def __call__(self, query):
        # filter out rows that do not have an id, instead of filtering by None
        ...

    def _convert_row(self, row):
        # do not return None if there is no id; use the original impl
        ...


class Convertor(ConvertorNG):
    """Deprecated, kept for compatibility; use ConvertorNG instead."""

    def __init__(self, converters, callback):
        super().__init__(converters, callback, None, None)

    def __call__(self, row):
        return self._convert_row(row)
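
For illustration, a minimal usage sketch of the two interfaces as outlined above. The converter, callback, table, and query/row values are hypothetical; only the calling convention matters here.

# hypothetical stubs; the real converter/callback signatures may differ
converters = {"body": lambda html: html}
callback = print

# old interface: the caller fetches the rows itself and invokes the convertor per row
legacy = Convertor(converters, callback)
for row in [(1, "<p>a</p>"), (2, "<p>b</p>")]:
    legacy(row)

# new interface: each worker receives a query text and does the DB work itself
ng = ConvertorNG(converters, callback, dbname="mydb", update_query="UPDATE my_table SET body = %s WHERE id = %s")
ng("SELECT id, body FROM my_table WHERE id BETWEEN 1 AND 1000")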
I took the suggestion to make it simpler. Sub-classing may be perfectly clean, but it is also a lot of additional loc. Do we want that?
> I took the suggestion to make it simpler. Sub-classing may be perfectly clean, but it is also a lot of additional loc. Do we want that?
TL;DR: I'm OK with both variants.
It's not about LOC, it's about clarity in the implementation. If somebody out there is using the old interface, we don't break them; we will always use the new one, including in this PR. IMO it isn't worth having a double approach internally if we would never use the other alternative. I'm still OK with this variant of the code, but having to read and understand the "compatibility code" the next time we need to debug is unnecessary if we can just move it out of the way.
IMO, having a dual-use parameter query_or_row only makes sense if we will use both variants. Since we are not, I'd suggest a new class, keeping the old one for compatibility. Note that we already do that in many other places, just as function aliases; here we are modifying a callable class, hence the subclass suggestion.
That being said, do not change it just for me. Let's ping @KangOl for a final review :)
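
For reference, a quick sketch of the function-alias deprecation pattern mentioned above; the names are hypothetical and not actual util functions. The subclass suggestion is the same idea applied to a callable class.

import warnings


def do_convert_ng(cr, table, columns):
    """Hypothetical new implementation working on query texts."""
    ...


def do_convert(cr, table, columns):
    """Deprecated alias kept for backward compatibility; use do_convert_ng()."""
    warnings.warn(
        "do_convert() is deprecated, use do_convert_ng() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return do_convert_ng(cr, table, columns)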
Force-pushed from 5532679 to aca366e
In `convert_html_columns()`, we select 100MiB worth of DB tuples and pass them to a ProcessPoolExecutor together with a converter callable. So far, the converter returns all tuples, changed or unchanged, together with the information whether it has changed something. All of this is returned through IPC to the parent process. In the parent process, however, the caller only acts on the changed tuples; the rest is ignored. In any scenario I've seen, only a small proportion of the input tuples is actually changed, meaning that a large proportion is returned through IPC unnecessarily.

What makes it worse is that processing of the converted results in the parent process is often slower than the conversion itself, leading to two effects:
1) The results of all workers sit in the parent process's memory, possibly leading to MemoryError (upg-2021031)
2) The parallel processing is serialized on the feedback, defeating a large part of the intended performance gains

To improve this, this commit
- moves all work into the workers: not just the conversion filter, but also the DB query as well as the DB updates
- by doing so, reduces the amount of data passed via IPC to just the query texts
- by doing so, distributes the data held in memory across all worker processes
- reduces the chunk size by one order of magnitude, which means
  - a lot less memory used at a time
  - a much better distribution of "to-be-changed" rows when these rows are clustered in the table

All in all, in my test case, this
- reduces the maximum process size in memory to 300MiB for all processes, compared to formerly >2GiB (and MemoryError) in the parent process
- reduces the runtime from 17 minutes to less than 2 minutes
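
For illustration only, not the code of this PR: a minimal sketch of the pattern the commit message describes, where the parent process only prepares query texts and each worker opens its own DB connection, fetches and converts its chunk, and writes the updates back, so only a small summary travels over IPC. The connection string, table, chunking scheme, and convert_row() are all hypothetical, and psycopg2/PostgreSQL is an assumption.

import concurrent.futures as cf

import psycopg2  # assumption: the target DB is PostgreSQL

DSN = "dbname=mydb"  # hypothetical connection string
CHUNK = 1000         # hypothetical chunk size


def convert_row(html):
    # hypothetical converter: returns (changed, new_html)
    new_html = html.replace("old-snippet", "new-snippet")
    return new_html != html, new_html


def process_chunk(query):
    """Run one chunk entirely inside the worker: SELECT, convert, UPDATE."""
    changed = 0
    with psycopg2.connect(DSN) as conn, conn.cursor() as cr:
        cr.execute(query)  # e.g. SELECT id, body FROM my_table ... LIMIT/OFFSET
        for row_id, html in cr.fetchall():
            was_changed, new_html = convert_row(html)
            if was_changed:
                cr.execute(
                    "UPDATE my_table SET body = %s WHERE id = %s",  # hypothetical table
                    (new_html, row_id),
                )
                changed += 1
    # only a small summary goes back through IPC, not the rows themselves
    return changed


if __name__ == "__main__":
    # the parent only prepares query texts, one per chunk
    queries = [
        f"SELECT id, body FROM my_table ORDER BY id LIMIT {CHUNK} OFFSET {offset}"
        for offset in range(0, 10 * CHUNK, CHUNK)
    ]
    with cf.ProcessPoolExecutor() as executor:
        total = sum(executor.map(process_chunk, queries))
    print(f"{total} rows updated")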
Force-pushed from aca366e to 34f4ec9