
MSCI 541 : BM25

3.5K views
Jul 8, 2021
25:56

As presented in this video, BM25 can return negative scores for very frequent terms (those with n_i > N/2, i.e. terms that appear in more than half of the documents), or for a document that matches only such terms. One solution is to add 1 inside the log when computing IDF: log( (N - n_i + 0.5)/(n_i + 0.5) + 1 ), as is done in Lucene: https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/ The argument of the log is then always greater than 1, so the IDF is always positive. You can see other approaches and formulations of BM25 here: https://cs.uwaterloo.ca/~jimmylin/publications/Kamphuis_etal_ECIR2020_preprint.pdf
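
To see the effect of the +1 on some numbers, here is a minimal Python sketch comparing the two IDF formulas. It assumes the IDF used in the video is the same expression without the +1; the function names and the collection size N = 1000 are made up for illustration and do not come from the video or from Lucene.

```python
import math


def idf_classic(N, n_i):
    """IDF without the +1 (assumed to be the form used in the video).

    N is the number of documents in the collection,
    n_i is the number of documents that contain term i.
    """
    return math.log((N - n_i + 0.5) / (n_i + 0.5))


def idf_plus_one(N, n_i):
    """IDF with 1 added inside the log, as given in the description."""
    return math.log((N - n_i + 0.5) / (n_i + 0.5) + 1)


if __name__ == "__main__":
    N = 1000  # hypothetical collection size
    for n_i in (10, 500, 900):
        print(n_i, round(idf_classic(N, n_i), 3), round(idf_plus_one(N, n_i), 3))
```

For the term that appears in 900 of the 1000 documents, the first formula gives a negative IDF and the second a positive one. Since the rest of the BM25 term weight is non-negative, the sign of the IDF decides the sign of that term's contribution to the score.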

