
Why do we only set scores on "high-scoring" posts? #15

@Dieterbe


get_high_potential_ids_from_filter_results takes a score_threshold and only updates the records of posts with a sufficiently high score.
Why not simply update all the rows, including the "low scores", so the SQLite DB has a complete representation? Updating these records should be cheap and quick anyway.
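
A rough sketch of what I mean, using Python's built-in sqlite3 module. This is not the project's code: the posts table schema (id, relevance_score), the scores dict mapping post IDs to scores, the hypothetical write_scores helper, and the ">=" comparison against score_threshold are all my assumptions. Passing score_threshold=None writes every score, which is the behaviour I'm proposing.

```python
import sqlite3


def write_scores(conn, scores, score_threshold=None):
    """Hypothetical helper: write relevance scores into the posts table.

    `scores` maps post id -> score (assumed shape). If `score_threshold` is
    set, only scores >= threshold are written (roughly the current behaviour);
    if it is None, every score is written. Returns the number of rows written.
    """
    rows = [
        (score, post_id)
        for post_id, score in scores.items()
        if score_threshold is None or score >= score_threshold
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("UPDATE posts SET relevance_score = ? WHERE id = ?", rows)
    return len(rows)


if __name__ == "__main__":
    # Assumed schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, relevance_score REAL)")
    conn.executemany("INSERT INTO posts (id) VALUES (?)", [(i,) for i in range(1, 6)])
    scores = {1: 0.9, 2: 0.2, 3: 0.75, 4: 0.1, 5: 0.4}

    write_scores(conn, scores, score_threshold=0.7)
    print(conn.execute("SELECT count(relevance_score) FROM posts").fetchone()[0])  # 2
    write_scores(conn, scores)  # no threshold: every post gets its score
    print(conn.execute("SELECT count(relevance_score) FROM posts").fetchone()[0])  # 5
```

With the threshold only posts 1 and 3 end up with a score; without it all five do, and the extra cost is just more rows in the same executemany call.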

I suppose I could lower the threshold and rerun the whole pipeline, but I don't think there is an easy way to trigger reprocessing of the batch files. I would have to re-scrape the posts and re-process them with OpenAI, which seems a bit silly, so again it seems simpler to just update all SQLite records.

I'm not sure my query is correct, but it seems only about 11% of my posts (1309 of 11897) have scores:

sqlite> select count(relevance_score) from posts;
1309
sqlite> select count(*) from posts;
11897
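
As far as I know, count(relevance_score) counts only non-NULL values while count(*) counts every row, so the query above should be right as long as unscored posts have relevance_score left as NULL (an assumption about the schema). Here is the same check from Python, as a small sketch with a hypothetical score_coverage helper:

```python
import sqlite3


def score_coverage(conn):
    """Hypothetical helper: return (scored, total, fraction) for the posts table.

    count(relevance_score) skips NULLs, so `scored` is the number of posts with
    a score, assuming unscored posts store NULL (a score of 0 still counts).
    """
    scored, total = conn.execute(
        "SELECT count(relevance_score), count(*) FROM posts"
    ).fetchone()
    fraction = scored / total if total else 0.0  # avoid dividing by zero on an empty table
    return scored, total, fraction
```

On my DB this would report 1309 scored out of 11897, a fraction of about 0.110.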
