08 Apr, 2026
1 commit
-
Previously, both `b` and `k1` were set to `0.0`. The original intention was to avoid two common issues in e-commerce search relevance: 1. Over-penalizing longer product titles In product search, a shorter title should not automatically rank higher just because BM25 favors shorter fields. For example, for a query like “遥控车”, a product whose title is simply “遥控车” is not necessarily a better candidate than a product with a slightly longer but more descriptive title. In practice, extremely short titles may even indicate lower-quality catalog data. 2. Over-rewarding repeated occurrences of the same term For longer queries such as “遥控喷雾翻滚多功能车玩具车”, the default BM25 behavior may give too much weight to a term that appears multiple times (for example “遥控”), even when other important query terms such as “喷雾” or “翻滚” are missing. This can cause products with repeated partial matches to outrank products that actually cover more of the user intent. Setting both parameters to zero was an intentional way to suppress length normalization and term-frequency amplification. However, after introducing a `combined_fields` query, this configuration becomes too aggressive. Since `combined_fields` scores multiple fields as a unified relevance signal, completely disabling both effects may also remove useful ranking information, especially when we still want documents matching more query terms across fields to be distinguishable from weaker matches. This update therefore relaxes the previous setting and reintroduces a controlled amount of BM25 normalization/scoring behavior. The goal is to keep the original intent — avoiding short-title bias and excessive repeated-term gain — while allowing the combined query to better preserve meaningful relevance differences across candidates. Expected effect: - reduce the bias toward unnaturally short product titles - limit score inflation caused by repeated occurrences of the same term - improve ranking stability for `combined_fields` queries - better reward candidates that cover more of the overall query intent, instead of those that only repeat a subset of terms
07 Apr, 2026
2 commits
-
2. issues文档
04 Apr, 2026
2 commits
-
Exact Match High Relevant Low Relevant Irrelevant to Fully Relevant Mostly Relevant Weakly Relevant Irrelevant
03 Apr, 2026
2 commits