Skip to content

[FIX] spreadsheet: batch process spreadsheet_revision.commands #284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vval-odoo
Copy link

@vval-odoo vval-odoo commented Jun 11, 2025

[FIX] spreadsheet: batch process spreadsheet_revision.commands

Some dbs have spreadsheet_revision records with over 10 millions characters in commands. If the number of record is high, this leads to memory errors here. We distribute them in buckets of memory_cap maximum size, and use a named cursor to process them in buckets. Commands larger than memory_cap fit in one bucket.

select count(*),SUM(CHAR_LENGTH(commands)) as characters from spreadsheet_revision where commands like any (array['%"DELETE_SHEET"%', '%"MOVE_COLUMNS_ROWS"%', '%"REMOVE_COLUMNS_ROWS"%', '%"ADD_COLUMNS_ROWS"%', '%"RENAME_SHEET"%'])
+-------+------------+
| count | characters |
|-------+------------|
| 2317  | 3419017646 |
+-------+------------+
select id,CHAR_LENGTH(commands) from spreadsheet_revision where commands like any (array['%"DELETE_SHEET"%', '%"MOVE_COLUMNS_ROWS"%', '%"REMOVE_COLUMNS_ROWS"%', '%"ADD_COLUMNS_ROWS"%', '%"RENAME_SHEET"%']) order by CHAR_LENGTH(commands) desc limit 10
+------+-------------+
| id   | char_length |
|------+-------------|
| 3871 | 13197157    |
| 4788 | 13197157    |
| 3290 | 13197157    |
| 3557 | 13197157    |
| 4179 | 13197157    |
| 4481 | 13197157    |
| 2757 | 13197157    |
| 3022 | 13197157    |
| 2492 | 13197157    |
| 5097 | 13197157    |
+------+-------------+

Fixes upg-2899961

@robodoo
Copy link
Contributor

robodoo commented Jun 11, 2025

Pull request status dashboard

@vval-odoo vval-odoo requested review from a team, jjmaksoud and aj-fuentes June 11, 2025 08:49
Copy link
Contributor

@Pirols Pirols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! :)



def iter_commands(cr, like_all=(), like_any=()):
if not (bool(like_all) ^ bool(like_any)):
raise ValueError("Please specify `like_all` or `like_any`, not both")
cr.execute(
ncr = pg.named_cursor(cr, itersize=BATCH_SIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a context manager you do not need to close it explicitely1.

Suggested change
ncr = pg.named_cursor(cr, itersize=BATCH_SIZE)
with pg.named_cursor(cr, itersize=BATCH_SIZE) as ncr:

That said, this is just in the name of a more pythonic implementation. IOW: imo you can keep your current version, if you like it better.

Footnotes

  1. https://github.com/odoo/odoo//blob/bdc28e53b7c426640e30b900064442f19c946d9e/odoo/sql_db.py#L221-L238

@vval-odoo vval-odoo force-pushed the master-batch-process-big-commands-vval branch from 3752d09 to 327a6f6 Compare June 11, 2025 11:16
@KangOl
Copy link
Contributor

KangOl commented Jun 11, 2025

upgradeci retry with always only *spreadsheet*

Comment on lines 10 to 13
SELECT id,
commands
FROM spreadsheet_revision
WHERE commands LIKE {}(%s::text[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the formatting.

Suggested change
SELECT id,
commands
FROM spreadsheet_revision
WHERE commands LIKE {}(%s::text[])
SELECT id,
commands
FROM spreadsheet_revision
WHERE commands LIKE {}(%s::text[])

if "commands" not in data_loaded:
continue
data_old = json.dumps(data_loaded, sort_keys=True)
with pg.named_cursor(cr, itersize=1) as ncr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can either leave the default itersize, or optimize the value to something that would work depending on the data. Another alternative is to use fetchmany directly.

Suggested change
with pg.named_cursor(cr, itersize=1) as ncr:
with pg.named_cursor(cr) as ncr:

Some dbs have `spreadsheet_revision` records with over 10 millions characters
in `commands`. If the number of record is high, this leads to memory errors.
We distribute them in buckets of `memory_cap` maximum size, and use a named
cursor to process them in buckets. Commands larger than `memory_cap` fit in one
bucket.
@vval-odoo vval-odoo force-pushed the master-batch-process-big-commands-vval branch from 327a6f6 to 508732d Compare June 13, 2025 12:20
@vval-odoo vval-odoo requested a review from aj-fuentes June 20, 2025 07:56
Comment on lines +13 to +52
start_bucket AS (
SELECT 1 AS bucket
),
ordered_rows AS (
SELECT id,
LENGTH(commands) AS length,
ROW_NUMBER() OVER (ORDER BY LENGTH(commands), id) AS rownum
FROM spreadsheet_revision
WHERE commands LIKE {}(%s::text[])
),
assign AS (
SELECT o.id AS id,
o.length,
o.rownum,
sb.bucket AS bucket,
o.length AS sum
FROM ordered_rows o, start_bucket sb
WHERE o.rownum = 1

UNION ALL

SELECT o.id AS id,
o.length,
o.rownum,
CASE
WHEN a.sum + o.length > {memory_cap} THEN a.bucket + 1
ELSE a.bucket
END AS bucket,
CASE
WHEN a.sum + o.length > {memory_cap} THEN o.length
ELSE a.sum + o.length
END AS sum
FROM assign a
JOIN ordered_rows o
ON o.rownum = a.rownum + 1
)
SELECT count(*),ARRAY_AGG(id)
FROM assign
GROUP BY bucket
ORDER BY bucket;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified. You only need for each id the number of the bucket where it would be. In case one length is longer than one bucket you want to keep the record alone in it's bucket to avoid some smaller records making the group take even more space.
You can return the ids and commands already per bucket. That way you can iterate over the named cursor one by one.

Example query: (/!\warning: untested/!)

  buckets AS (
      SELECT id,
             mod(SUM(length(commands) OVER (ORDER BY id), %(bucket_size)s) AS num,
             length(commands) > %(bucket_size)s AS alone
        FROM spreadsheet_revision
       WHERE commands LIKE {}(%(filters)s::text[])
    ORDER BY id
  )
  SELECT ARRAY_AGG(id ORDER BY id)
         ARRAY_AGG(commands ORDER BY id)
    FROM buckets
GROUP BY num, alone

then

for ids, datas in ncr:  # you can iterate one by one here
    for id, data in zip(ids, datas):
        ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants