[FIX] spreadsheet: batch process spreadsheet_revision.commands
#284
base: master
Conversation
Good work! :)
src/util/spreadsheet/misc.py
Outdated
def iter_commands(cr, like_all=(), like_any=()):
    if not (bool(like_all) ^ bool(like_any)):
        raise ValueError("Please specify `like_all` or `like_any`, not both")
    cr.execute(
    ncr = pg.named_cursor(cr, itersize=BATCH_SIZE)
Using a context manager you do not need to close it explicitly.
-    ncr = pg.named_cursor(cr, itersize=BATCH_SIZE)
+    with pg.named_cursor(cr, itersize=BATCH_SIZE) as ncr:
That said, this is just in the name of a more pythonic implementation. IOW: imo you can keep your current version, if you like it better.
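A minimal sketch of the two styles being compared, assuming `pg.named_cursor` is the util helper used in this PR (a psycopg2 server-side cursor that also works as a context manager, as the suggestion above implies); the import path, query and BATCH_SIZE value are illustrative and not taken from the PR:

from odoo.upgrade.util import pg  # import path assumed for this sketch

BATCH_SIZE = 100  # illustrative value

def explicit_close(cr):
    # manual variant: the named cursor must be closed by hand
    ncr = pg.named_cursor(cr, itersize=BATCH_SIZE)
    try:
        ncr.execute("SELECT id, commands FROM spreadsheet_revision")
        for revision_id, commands in ncr:
            ...
    finally:
        ncr.close()

def context_manager(cr):
    # context-manager variant: the cursor is closed automatically on exit
    with pg.named_cursor(cr, itersize=BATCH_SIZE) as ncr:
        ncr.execute("SELECT id, commands FROM spreadsheet_revision")
        for revision_id, commands in ncr:
            ...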
Force-pushed from 3752d09 to 327a6f6
src/util/spreadsheet/misc.py
Outdated
SELECT id,
       commands
  FROM spreadsheet_revision
 WHERE commands LIKE {}(%s::text[])
Keep the formatting.
SELECT id,
       commands
  FROM spreadsheet_revision
 WHERE commands LIKE {}(%s::text[])
src/util/spreadsheet/misc.py
Outdated
if "commands" not in data_loaded: | ||
continue | ||
data_old = json.dumps(data_loaded, sort_keys=True) | ||
with pg.named_cursor(cr, itersize=1) as ncr: |
You can either leave the default `itersize`, or optimize the value to something that would work depending on the data. Another alternative is to use `fetchmany` directly.
-    with pg.named_cursor(cr, itersize=1) as ncr:
+    with pg.named_cursor(cr) as ncr:
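A hedged sketch of the `fetchmany` alternative mentioned above, applied to the same server-side cursor; the batch size, query and function name are illustrative and not taken from the PR:

FETCH_SIZE = 100  # illustrative batch size

def iter_revisions(cr):
    # rely on an explicit fetchmany() instead of itersize-driven iteration
    with pg.named_cursor(cr) as ncr:
        ncr.execute("SELECT id, commands FROM spreadsheet_revision")
        while True:
            rows = ncr.fetchmany(FETCH_SIZE)
            if not rows:
                break
            for revision_id, commands in rows:
                yield revision_id, commands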
Some dbs have `spreadsheet_revision` records with over 10 million characters in `commands`. If the number of records is high, this leads to memory errors. We distribute them into buckets of at most `memory_cap` size and use a named cursor to process them bucket by bucket. Commands larger than `memory_cap` each get a bucket of their own.
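A plain-Python illustration of the bucketing rule described above (the PR itself assigns buckets in SQL); `rows` is assumed to be a list of `(id, length_of_commands)` pairs:

def distribute(rows, memory_cap):
    # greedily fill buckets up to memory_cap; an oversized record ends up alone
    buckets, current, current_size = [], [], 0
    for rev_id, length in sorted(rows, key=lambda r: r[1]):
        if current and current_size + length > memory_cap:
            buckets.append(current)
            current, current_size = [], 0
        current.append(rev_id)
        current_size += length
    if current:
        buckets.append(current)
    return buckets

# with memory_cap=10, sizes 4, 5 and 12 yield two buckets: ["a", "b"] and ["c"]
print(distribute([("a", 4), ("b", 5), ("c", 12)], memory_cap=10))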
Force-pushed from 327a6f6 to 508732d
start_bucket AS (
    SELECT 1 AS bucket
),
ordered_rows AS (
    SELECT id,
           LENGTH(commands) AS length,
           ROW_NUMBER() OVER (ORDER BY LENGTH(commands), id) AS rownum
      FROM spreadsheet_revision
     WHERE commands LIKE {}(%s::text[])
),
assign AS (
    SELECT o.id AS id,
           o.length,
           o.rownum,
           sb.bucket AS bucket,
           o.length AS sum
      FROM ordered_rows o, start_bucket sb
     WHERE o.rownum = 1

     UNION ALL

    SELECT o.id AS id,
           o.length,
           o.rownum,
           CASE
               WHEN a.sum + o.length > {memory_cap} THEN a.bucket + 1
               ELSE a.bucket
           END AS bucket,
           CASE
               WHEN a.sum + o.length > {memory_cap} THEN o.length
               ELSE a.sum + o.length
           END AS sum
      FROM assign a
      JOIN ordered_rows o
        ON o.rownum = a.rownum + 1
)
SELECT count(*), ARRAY_AGG(id)
  FROM assign
 GROUP BY bucket
 ORDER BY bucket;
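A hedged sketch of how the per-bucket id arrays produced by the query above might be consumed, using the same `pg.named_cursor` helper; `bucket_query` stands for the statement above with its placeholders resolved, and `process_commands` is a hypothetical callback:

def process_in_buckets(cr, bucket_query, patterns):
    with pg.named_cursor(cr) as ncr:
        ncr.execute(bucket_query, [patterns])
        for _count, ids in ncr:  # one row per bucket: count(*), ARRAY_AGG(id)
            cr.execute(
                "SELECT id, commands FROM spreadsheet_revision WHERE id = ANY(%s)",
                [ids],
            )
            for revision_id, commands in cr.fetchall():
                process_commands(revision_id, commands)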
I think this can be simplified. You only need, for each `id`, the number of the bucket it would land in. In case one record's length is longer than a whole bucket, you want to keep that record alone in its bucket, to avoid some smaller records making the group take even more space.
You can return the ids and commands already grouped per bucket. That way you can iterate over the named cursor one by one.
Example query: (/!\ warning: untested /!\)
buckets AS (
    SELECT id,
           commands,
           div(SUM(length(commands)) OVER (ORDER BY id), %(bucket_size)s) AS num,
           length(commands) > %(bucket_size)s AS alone
      FROM spreadsheet_revision
     WHERE commands LIKE {}(%(filters)s::text[])
     ORDER BY id
)
SELECT ARRAY_AGG(id ORDER BY id),
       ARRAY_AGG(commands ORDER BY id)
  FROM buckets
 GROUP BY num, alone
then
for ids, datas in ncr:  # you can iterate one by one here
    for id, data in zip(ids, datas):
        ...
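An untested, self-contained sketch of how the suggestion above could be wired up, mirroring the reviewer's example query with the `{}` placeholder resolved to `ANY`; `pg.named_cursor` is the util helper used in the PR, the remaining names are illustrative:

BUCKET_QUERY = """
    WITH buckets AS (
        SELECT id,
               commands,
               div(SUM(length(commands)) OVER (ORDER BY id), %(bucket_size)s) AS num,
               length(commands) > %(bucket_size)s AS alone
          FROM spreadsheet_revision
         WHERE commands LIKE ANY(%(filters)s::text[])
    )
    SELECT ARRAY_AGG(id ORDER BY id),
           ARRAY_AGG(commands ORDER BY id)
      FROM buckets
     GROUP BY num, alone
"""

def iter_commands_bucketed(cr, patterns, bucket_size):
    with pg.named_cursor(cr) as ncr:
        ncr.execute(BUCKET_QUERY, {"filters": list(patterns), "bucket_size": bucket_size})
        for ids, datas in ncr:  # each fetched row is one whole bucket
            for revision_id, data in zip(ids, datas):
                yield revision_id, data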
[FIX] spreadsheet: batch process spreadsheet_revision.commands

Some dbs have `spreadsheet_revision` records with over 10 million characters in `commands`. If the number of records is high, this leads to memory errors here. We distribute them into buckets of at most `memory_cap` size and use a named cursor to process them bucket by bucket. Commands larger than `memory_cap` each get a bucket of their own.

Fixes upg-2899961