Snippet shown is sometimes a poor representation of the article content #996

@Jaifroid

This may be related to #988, but it specifically concerns the extracted snippets. These are supposed to show the most relevant text, with the search terms highlighted. However, in many cases the selected text looks quite random and no search terms are highlighted. For example, when searching for "Kasos Massacre" in Full English Wikipedia, the top result is correct, but the snippet shown seems completely random:

Image

It would have been much better just to show the lede of the article in this case:

The Kasos massacre was the massacre of Greek civilians during the Greek War of Independence by Ottoman forces after the Greek Christian population rebelled against the Ottoman Empire.

I realize this is low-level, but there may be something wrong with the setup or the retrieval parameters, given that the vectors to the relevant piece of full text appear to be so wrong in many cases. It would almost be better to extract the lede and show that instead whenever none of the search terms actually appear in the snippet, though that would be a kludgy workaround.
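To make the workaround concrete, here is a minimal sketch of the fallback logic in Python. It is only an illustration: the function names (`query_terms`, `snippet_contains_term`, `choose_snippet`) are hypothetical and are not part of any existing API in this project, and it assumes the snippet, the lede, and the query are already available as plain strings.

```python
import re


def query_terms(query: str) -> list[str]:
    """Split a search query into lowercase word terms."""
    return re.findall(r"\w+", query.lower())


def snippet_contains_term(snippet: str, query: str) -> bool:
    """Return True if at least one query term occurs as a whole word in the snippet."""
    snippet_words = set(re.findall(r"\w+", snippet.lower()))
    return any(term in snippet_words for term in query_terms(query))


def choose_snippet(snippet: str, lede: str, query: str) -> str:
    """Keep the extracted snippet if it mentions a search term, else fall back to the lede."""
    if snippet_contains_term(snippet, query):
        return snippet
    return lede


if __name__ == "__main__":
    lede = (
        "The Kasos massacre was the massacre of Greek civilians during the "
        "Greek War of Independence by Ottoman forces."
    )
    # A snippet with no search term in it is replaced by the lede.
    print(choose_snippet("Some unrelated passage of text.", lede, "Kasos Massacre"))
```

The comparison here is a plain case-insensitive whole-word match, so it would not recognise stemmed or inflected forms that the search backend may have matched. A real implementation would probably need to reuse the backend's own term normalisation.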
