Description
get_high_potential_ids_from_filter_results takes a score_threshold and only updates the records for posts with a sufficiently high score.
I wonder why we don't simply update all the rows, including those with low scores, so that the SQLite DB has a complete representation. After all, updating these records should be cheap and quick.
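A minimal sketch of the suggested change, assuming a table posts(id INTEGER PRIMARY KEY, relevance_score REAL) and a dict mapping post id to score. The names update_scores_and_select and select_high_potential are hypothetical and are not the project's actual functions. Every score is written to the DB, and the threshold only decides which ids are returned, so trying a different threshold later becomes a plain query instead of a rerun.

```python
import sqlite3


def update_scores_and_select(conn: sqlite3.Connection, scores: dict[int, float],
                             score_threshold: float) -> list[int]:
    """Persist every score, then return the ids at or above the threshold.

    Hypothetical helper; assumes posts(id INTEGER PRIMARY KEY, relevance_score REAL).
    """
    # Write all scores, including the low ones, so the DB is complete.
    conn.executemany(
        "UPDATE posts SET relevance_score = ? WHERE id = ?",
        [(score, post_id) for post_id, score in scores.items()],
    )
    conn.commit()
    # The threshold is applied only when selecting, not when storing.
    return sorted(pid for pid, score in scores.items() if score >= score_threshold)


def select_high_potential(conn: sqlite3.Connection, score_threshold: float) -> list[int]:
    """Re-apply any threshold to the stored scores without re-scoring anything."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE relevance_score >= ? ORDER BY id",
        (score_threshold,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the threshold out of the write path means lowering it later only changes which rows are selected; nothing has to be re-scraped or re-scored.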
I suppose I could lower the threshold and rerun the whole pipeline, but I don't think there is an easy way to trigger reprocessing of the batch files. I would have to re-scrape the posts and re-process them with OpenAI, which seems a bit silly, so again it seems simpler to just update all SQLite records.
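If the batch output files are still on disk, the scores of low-scoring posts might be recoverable without re-scraping or calling OpenAI again. A rough sketch, assuming the saved files follow the OpenAI Batch API output layout (one JSON object per line with custom_id, response and error fields). Two further assumptions are hypothetical and need checking against the project: that custom_id holds the post id, and that the model's reply in choices[0].message.content is JSON such as {"score": 7}. The function name scores_from_batch_lines is also made up for illustration.

```python
import json
from collections.abc import Iterable


def scores_from_batch_lines(lines: Iterable[str]) -> dict[str, float]:
    """Recover post scores from the lines of a saved batch output .jsonl file.

    Hypothetical assumptions: custom_id is the post id, and the reply content
    is a JSON object with a "score" key.
    """
    scores: dict[str, float] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue  # skip failed requests instead of guessing a score
        content = response["body"]["choices"][0]["message"]["content"]
        scores[record["custom_id"]] = float(json.loads(content)["score"])
    return scores
```

Usage would be along the lines of opening the file and passing the file object directly, since iterating over it yields one line at a time; the resulting dict could then be written to SQLite with one UPDATE per post.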
Not sure if my query is correct, but it seems only about 11% of my posts (1309 of 11897) have scores:
sqlite> select count(relevance_score) from posts;
1309
sqlite> select count(*) from posts;
11897
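The query does measure what it intends to: in SQLite, COUNT(column) counts only non-NULL values, while COUNT(*) counts every row. A small sketch that computes the same ratio in one statement, assuming unscored posts have relevance_score set to NULL rather than a default such as 0 (score_coverage is a hypothetical name):

```python
import sqlite3


def score_coverage(conn: sqlite3.Connection) -> tuple[int, int, float]:
    """Return (scored, total, fraction) for the posts table.

    COUNT(relevance_score) skips NULLs; COUNT(*) counts all rows.
    """
    scored, total = conn.execute(
        "SELECT COUNT(relevance_score), COUNT(*) FROM posts"
    ).fetchone()
    # Guard against an empty table to avoid dividing by zero.
    return scored, total, (scored / total if total else 0.0)
```

With the numbers above, 1309 / 11897 ≈ 0.11, so roughly 11% of posts have a score.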