[Healthcheck] Size of the elasticsuite_tracker_log_customer_link table and "anonymization" configuration #3559

Open
rbayet opened this issue Mar 14, 2025 · 2 comments

@rbayet
Collaborator

rbayet commented Mar 14, 2025

Is your feature request related to a problem? Please describe.
The table elasticsuite_tracker_log_customer_link keeps track of the links between a visitor's session.uid (session id), session.vid (visitor id), and the customer id if the visitor is logged in.

MariaDB [magento2]> show columns from elasticsuite_tracker_log_customer_link;
+--------------+------------------+------+-----+---------+-------+
| Field        | Type             | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| customer_id  | int(10) unsigned | NO   | PRI | NULL    |       |
| session_id   | varchar(255)     | NO   | PRI | NULL    |       |
| visitor_id   | varchar(255)     | NO   | PRI | NULL    |       |
| delete_after | datetime         | YES  |     | NULL    |       |
+--------------+------------------+------+-----+---------+-------+
4 rows in set (0.001 sec)

By default, the data is supposed to be removed after 365 days ("Stores > Configuration > Elasticsuite > Tracking > Tracking Anonymization > Anonymization Delay") but only if "anonymization" is enabled ("Stores > Configuration > Elasticsuite > Tracking > Tracking Anonymization > Anonymize customer data after a delay.").
And by default, this setting is set to "No".

So on sites with a huge traffic load, this table can keep growing indefinitely.
And Elasticsuite (both Open Source and Premium versions) does not use this relation table (anymore?).

Describe the solution you'd like
Introduce a new healthcheck test that looks at the settings and the size (number of rows) of the table.
[1] But the number of rows we are interested in is the number of rows with a "delete_after" in the past.
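
As an illustration of the row count described in [1], here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for the real MariaDB table, so the column types and the `count_expired_links` helper are assumptions made for the example, not part of Elasticsuite. Against MariaDB, the equivalent filter would compare `delete_after` to `NOW()`.

```python
import sqlite3
from datetime import datetime, timedelta

TABLE = "elasticsuite_tracker_log_customer_link"
FMT = "%Y-%m-%d %H:%M:%S"


def count_expired_links(conn, now):
    """Count rows whose delete_after is strictly in the past (hypothetical helper)."""
    cur = conn.execute(
        f"SELECT COUNT(*) FROM {TABLE} "
        "WHERE delete_after IS NOT NULL AND delete_after < ?",
        (now.strftime(FMT),),
    )
    return cur.fetchone()[0]


# In-memory stand-in mirroring the four columns shown in the issue.
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} ("
    "customer_id INTEGER NOT NULL, session_id TEXT NOT NULL, "
    "visitor_id TEXT NOT NULL, delete_after TEXT, "
    "PRIMARY KEY (customer_id, session_id, visitor_id))"
)

now = datetime(2025, 3, 14, 12, 0, 0)
sample = [
    (1, "s1", "v1", now - timedelta(days=10)),  # already expired
    (2, "s2", "v2", now + timedelta(days=10)),  # not expired yet
    (3, "s3", "v3", None),                      # no expiry date set
]
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?)",
    [(c, s, v, d.strftime(FMT) if d else None) for c, s, v, d in sample],
)

expired = count_expired_links(conn, now)
print(expired)
```

Only the first sample row is counted, since rows with a future or empty `delete_after` are excluded by the filter.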

  • If "Stores > Configuration > Elasticsuite > Tracking > Tracking Anonymization > Anonymize customer data after a delay." is set to Yes, the test passes with the message
    • "Elasticsuite is configured to periodically cleanup the tracker customer link table (elasticsuite_tracker_log_customer_link)"
  • If "Stores > Configuration > Elasticsuite > Tracking > Tracking Anonymization > Anonymize customer data after a delay." is set to No, the test fails
    • with a severity of Notice if the number of rows [1] is below 100k rows
      • with message "Elasticsuite is not configured to periodically cleanup the tracker customer link table (elasticsuite_tracker_log_customer_link) and it already contains %rows. You should consider enabling the automatic cleanup by switching "Anonymize customer data after a delay." to "Yes"."
    • with a severity of Warning if the number of rows is greater than 100k rows but lower than 1M rows
      • with the same message as above
    • with a severity of Critical if the number of rows exceeds 1M rows but is lower than 10M rows
      • with message "Elasticsuite is not configured to periodically cleanup the tracker customer link table (elasticsuite_tracker_log_customer_link) and it already contains %rows. You should consider enabling the automatic cleanup by switching "Anonymize customer data after a delay." to "Yes". The first cleanup occurring at midnight might take some time"
    • with a severity of Critical if the number of rows exceeds 10M rows
      • with message "Elasticsuite is not configured to periodically cleanup the tracker customer link table (elasticsuite_tracker_log_customer_link) and it already contains a massive number of rows (%rows). You should consider enabling the automatic cleanup by switching "Anonymize customer data after a delay." to "Yes". But considering the number of rows, we recommend that you manually truncate the table before enabling the automatic cleanup to avoid a database hangup at the first midnight occurring cleanup."
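
The decision rules above could be sketched as follows. This is only an illustration in Python, not the module's actual implementation: the `CheckResult` type and `check_customer_link_table` function are hypothetical names, and the handling of counts that fall exactly on a threshold (100k, 1M, 10M) is an assumption, since the proposal does not specify it.

```python
from typing import NamedTuple, Optional


class CheckResult(NamedTuple):
    """Hypothetical result type for the proposed healthcheck."""
    passed: bool
    severity: Optional[str]      # None when the test passes
    recommend_truncate: bool     # True only for the "massive" (> 10M) case


def check_customer_link_table(anonymization_enabled: bool,
                              expired_rows: int) -> CheckResult:
    """Map the anonymization setting and the expired row count [1] to a result."""
    if anonymization_enabled:
        # Cleanup is scheduled, so the test passes whatever the table size.
        return CheckResult(True, None, False)
    if expired_rows < 100_000:
        return CheckResult(False, "notice", False)
    # Assumption: counts exactly at 100k or 1M fall into the warning tier.
    if expired_rows <= 1_000_000:
        return CheckResult(False, "warning", False)
    # Above 1M the severity is critical; above 10M a manual truncate
    # is recommended before enabling the automatic cleanup.
    return CheckResult(False, "critical", expired_rows > 10_000_000)


print(check_customer_link_table(False, 5_000_000))
```

Keeping the truncate recommendation as a separate flag reflects that both upper tiers share the Critical severity and differ only in the message shown.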
@rbayet
Collaborator Author

rbayet commented Mar 14, 2025

Assigned to you @romainruaud for validating severity/actions to recommend

@romainruaud
Collaborator

LGTM !

@vahonc you can implement

romainruaud assigned vahonc and unassigned romainruaud on Mar 14, 2025