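1. The main metrics currently used to evaluate the retrieval system are:
   `NDCG@20`, `NDCG@50`, `ERR@10`, `Strong_Precision@10`, `Strong_Precision@20`
Judging from `_err_at_k`, the calculation logic seems correct.
The problem is that ERR often moves in the opposite direction from the other metrics. Please analyze again whether it is suitable as one of the primary metrics, and what problems it currently has.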
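For reference, the standard ERR definition is sketched below with a small hypothetical example. It assumes the usual gain mapping and a 0 to 3 grade scale; whether `_err_at_k` uses the same mapping and scale is not shown here, and the precision line is only a stand-in for `Strong_Precision`, whose exact definition is also an assumption.

```latex
% Standard ERR under the cascade model:
% g_i = relevance grade at rank i, g_max = highest grade on the scale
\begin{align*}
R_i &= \frac{2^{g_i} - 1}{2^{g_{\max}}} \\
\mathrm{ERR@}k &= \sum_{r=1}^{k} \frac{1}{r}\, R_r \prod_{i=1}^{r-1} \left(1 - R_i\right)
\end{align*}

% Hypothetical example with g_max = 3: ranking A = (3, 0, 0), ranking B = (2, 2, 2)
% R(3) = 7/8, R(2) = 3/8, R(0) = 0
\begin{align*}
\mathrm{ERR@}3(A) &= \tfrac{7}{8} = 0.875 \\
\mathrm{ERR@}3(B) &= \tfrac{3}{8}
  + \tfrac{1}{2}\cdot\tfrac{3}{8}\cdot\tfrac{5}{8}
  + \tfrac{1}{3}\cdot\tfrac{3}{8}\cdot\left(\tfrac{5}{8}\right)^{2}
  = 0.375 + 0.1171875 + 0.048828125 \approx 0.541 \\
\mathrm{P@}3_{\,g \ge 2}(A) &= \tfrac{1}{3}, \qquad \mathrm{P@}3_{\,g \ge 2}(B) = 1
\end{align*}
```

Under these assumptions B wins on the precision-style metric while A wins on ERR. ERR models a user who stops at the first highly relevant result, so once an early $R_r$ is large, later positions contribute little. A ranking change that trades one excellent top hit for several good ones can therefore raise `Strong_Precision@k` and lower `ERR@10` at the same time, which would be consistent with the opposite trends described above. It may also be worth checking how `_err_at_k` maps labels to $g_i$ and $g_{\max}$, since the top grade dominates the stopping probability.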

2. The current BM25 parameters are:
"b": 0.1, 
"k1": 0.3
The corresponding baseline is /data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T055948Z_00b6a8aa3d.md (Primary_Metric_Score: 0.604555).

This is a clear improvement (+0.001957) over the previous setting with both b and k1 set to 0 (/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260407T150946Z_00b6a8aa3d.md, Primary_Metric_Score: 0.602598).

Background for changing these two parameters from 0 to 0.1/0.3:
This change adjusts the BM25 parameters used by the combined query.

Previously, both `b` and `k1` were set to `0.0`. The original intention was to avoid two common issues in e-commerce search relevance:

1. Over-penalizing longer product titles  
   In product search, a shorter title should not automatically rank higher just because BM25 favors shorter fields. For example, for a query like “遥控车”, a product whose title is simply “遥控车” is not necessarily a better candidate than a product with a slightly longer but more descriptive title. In practice, extremely short titles may even indicate lower-quality catalog data.

2. Over-rewarding repeated occurrences of the same term  
   For longer queries such as “遥控喷雾翻滚多功能车玩具车”, the default BM25 behavior may give too much weight to a term that appears multiple times (for example “遥控”), even when other important query terms such as “喷雾” or “翻滚” are missing. This can cause products with repeated partial matches to outrank products that actually cover more of the user intent.

Setting both parameters to zero was an intentional way to suppress length normalization and term-frequency amplification. However, after introducing a `combined_fields` query, this configuration becomes too aggressive. Since `combined_fields` scores multiple fields as a unified relevance signal, completely disabling both effects may also remove useful ranking information, especially when we still want documents matching more query terms across fields to be distinguishable from weaker matches.
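The two effects described above correspond to the two parameters of the standard BM25 term score, sketched below. This assumes the textbook formula: Lucene's implementation may omit the constant $(k_1 + 1)$ factor, which scales every term equally, and with `combined_fields` (BM25F) $f$ and $|d|$ are taken over the combined fields, so treat this as an approximation of the actual query.

```latex
% Standard BM25 contribution of query term t to document d
% f = frequency of t in d, |d| = field length, avgdl = average field length
\begin{align*}
\mathrm{score}(t, d) &= \mathrm{IDF}(t)\cdot
  \frac{f\,(k_1 + 1)}{f + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)} \\[4pt]
% k_1 = 0: the fraction is f / f = 1 for every f >= 1, so b has no effect
k_1 = 0 &\;\Rightarrow\; \mathrm{score}(t, d) = \mathrm{IDF}(t) \\[4pt]
% b = 0, k_1 > 0: tf saturation (capped at k_1 + 1) stays, length normalization is off
b = 0 &\;\Rightarrow\; \mathrm{score}(t, d) = \mathrm{IDF}(t)\cdot\frac{f\,(k_1 + 1)}{f + k_1}
\end{align*}
```

One consequence worth noting: with `k1 = 0` the value of `b` no longer matters, so the former `0/0` setting scored each matched term by its IDF alone, regardless of repetition or title length. Any nonzero `k1` reintroduces term-frequency saturation, and `b` then controls how strongly longer fields are penalized.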

This update therefore relaxes the previous setting and reintroduces a controlled amount of BM25 normalization/scoring behavior. The goal is to keep the original intent — avoiding short-title bias and excessive repeated-term gain — while allowing the combined query to better preserve meaningful relevance differences across candidates.

Expected effect:
- reduce the bias toward unnaturally short product titles
- limit score inflation caused by repeated occurrences of the same term
- improve ranking stability for `combined_fields` queries
- better reward candidates that cover more of the overall query intent, instead of those that only repeat a subset of terms


Since the experiment was effective, please continue with further experiments.

Please run the following four rounds of experiments, compare the results, and tune the BM25 parameters:
{ "b": 0.10, "k1": 0.30 }
{ "b": 0.20, "k1": 0.60 }
{ "b": 0.50, "k1": 1.0 } 
{ "b": 0.10, "k1": 0.75 }
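To see what these four settings mean numerically, a few ratios can be computed from BM25's term-frequency component `tf / (tf + k1 * (1 - b + b * dl / avgdl))`. This is a sketch under stated assumptions: the standard formula, a single field, a "long title" meaning twice the average length, and a hypothetical helper name `bm25_ratios`. With `combined_fields` (BM25F) the lengths are combined across fields, so the numbers are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: bm25_ratios B K1
# Prints "B K1 repeat_gain cap long_title_factor" using the standard BM25
# tf component tf / (tf + k1 * (1 - b + b * dl / avgdl)).
# The constant (k1 + 1) factor is left out because it cancels in every ratio.
#   repeat_gain:       score(tf=2) / score(tf=1), at dl = avgdl
#   cap:               score(tf -> infinity) / score(tf=1), at dl = avgdl (= 1 + k1)
#   long_title_factor: score(dl = 2*avgdl) / score(dl = avgdl), at tf = 1
bm25_ratios() {
  awk -v b="$1" -v k1="$2" 'BEGIN {
    repeat_gain = 2 * (1 + k1) / (2 + k1)
    cap = 1 + k1
    long_factor = (1 + k1) / (1 + k1 * (1 + b))
    printf "%.2f %.2f %.4f %.4f %.4f\n", b, k1, repeat_gain, cap, long_factor
  }'
}

for pair in "0.10 0.30" "0.20 0.60" "0.50 1.0" "0.10 0.75"; do
  bm25_ratios $pair   # unquoted on purpose: splits into B and K1
done
```

Under those assumptions, moving from `0.10/0.30` to `0.50/1.0` raises the gain from a second occurrence of a term from about 1.13x to 1.33x, and lowers the score of a title twice the average length from about 0.98x to 0.80x of an average-length title. In other words, it relaxes both protections the original `0/0` setting was aiming for.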

Reference: how to change the index-level setting (BM25 `similarity.default`)

`settings.similarity` in `mappings/search_products.json` only takes effect when a **new index is created**. For an **existing index**, close the index first, then `PUT _settings`, and finally reopen it.

**When to use**: adjusting the default BM25 `b` and `k1` (for example, to align with the repository mapping: `b: 0.1`, `k1: 0.3`).

```bash
# Replace as needed: index name, username/password, ES address
INDEX="search_products_tenant_163"
AUTH='saas:4hOaLaf41y2VuI8y'
ES="http://localhost:9200"

# 1) Close the index (write requests fail while it is closed; plan a maintenance window)
curl -s -u "$AUTH" -X POST "$ES/${INDEX}/_close"

# 2) Update the settings (example only; copy as-is if it matches the default in mappings)
curl -s -u "$AUTH" -X PUT "$ES/${INDEX}/_settings" \
  -H 'Content-Type: application/json' \
  -d '{
  "index": {
    "similarity": {
      "default": {
        "type": "BM25",
        "b": 0.1,
        "k1": 0.3
      }
    }
  }
}'

# 3) Reopen the index
curl -s -u "$AUTH" -X POST "$ES/${INDEX}/_open"
```

**Verify that the change took effect**:

```bash
curl -s -u "$AUTH" -X GET "$ES/${INDEX}/_settings?filter_path=**.similarity&pretty"
```

Expect to see `type`, `b`, and `k1` under `similarity.default` in the response (the API may return numeric values as strings; this is normal).
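To make that check scriptable, a hypothetical `check_bm25` helper can compare the response against the expected values. This is a minimal sketch that uses `grep` and `awk` rather than a JSON parser, so it assumes a single index in the response and the `filter_path=**.similarity` shape of the check above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_bm25 B K1 < settings-response
# Exits 0 if the _settings response contains the expected b and k1.
# Uses grep/awk instead of a JSON parser, so it assumes one index per response.
check_bm25() {
  local want_b="$1" want_k1="$2" json got_b got_k1
  json="$(cat)"
  # Accept both "b":"0.5" (string) and "b":0.5 (number), with or without spaces.
  got_b="$(printf '%s\n' "$json" | grep -oE '"b" *: *"?[0-9.]+' | head -n 1 | grep -oE '[0-9.]+$')"
  got_k1="$(printf '%s\n' "$json" | grep -oE '"k1" *: *"?[0-9.]+' | head -n 1 | grep -oE '[0-9.]+$')"
  # Compare numerically so that "0.5" and 0.50 count as equal; fail if a value is missing.
  awk -v gb="$got_b" -v wb="$want_b" -v gk="$got_k1" -v wk="$want_k1" \
    'BEGIN { exit !(gb != "" && gk != "" && gb + 0 == wb + 0 && gk + 0 == wk + 0) }'
}
```

Usage, with the same `AUTH`, `ES`, and `INDEX` variables as before: `curl -s -u "$AUTH" -X GET "$ES/${INDEX}/_settings?filter_path=**.similarity" | check_bm25 0.5 1.0 && echo "BM25 settings OK"`.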

**Multi-tenant batch**: list the indices first, then repeat the close → settings → open steps above for each `search_products_tenant_*` index.

```bash
curl -s -u "$AUTH" -X GET "$ES/_cat/indices/search_products_tenant_*?h=index&v"
```
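The batch step can be sketched as a loop over the `_cat/indices` output. This is a hypothetical `sync_bm25_all` helper, assuming the `AUTH` and `ES` variables from the earlier snippet; it reopens each index even if the settings update fails, and `DRY_RUN=1` only prints what it would do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sync_bm25_all B K1 < index-list
# Reads index names (one per line) from stdin and runs close -> PUT _settings -> open
# on each search_products_tenant_* index. Set DRY_RUN=1 to only print the plan.
sync_bm25_all() {
  local b="$1" k1="$2" idx body failed=0
  body="$(printf '{"index":{"similarity":{"default":{"type":"BM25","b":%s,"k1":%s}}}}' "$b" "$k1")"
  while read -r idx; do
    case "$idx" in
      search_products_tenant_*) ;;   # tenant product index: process it
      *) continue ;;                 # skip the "index" header (from &v) and blank lines
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would set b=$b k1=$k1 on $idx"
      continue
    fi
    if ! curl -sf -u "$AUTH" -X POST "$ES/${idx}/_close" >/dev/null; then
      echo "close failed: $idx" >&2; failed=1; continue
    fi
    if ! curl -sf -u "$AUTH" -X PUT "$ES/${idx}/_settings" \
         -H 'Content-Type: application/json' -d "$body" >/dev/null; then
      echo "settings update failed: $idx" >&2; failed=1
    fi
    # Reopen even if the update failed so that no index is left closed.
    if ! curl -sf -u "$AUTH" -X POST "$ES/${idx}/_open" >/dev/null; then
      echo "reopen failed: $idx" >&2; failed=1
    fi
  done
  return "$failed"
}
```

A dry run first, e.g. `curl -s -u "$AUTH" "$ES/_cat/indices/search_products_tenant_*?h=index" | DRY_RUN=1 sync_bm25_all 0.5 1.0`, shows which indices would be touched; dropping `DRY_RUN=1` applies the change. `curl -f` makes HTTP errors return a non-zero exit code, which is what the failure checks rely on.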


Experiment results:
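All four rounds are complete, and the index settings of `search_products_tenant_163` have been left at the best configuration, `b=0.5, k1=1.0`. I have just verified that the setting is in effect.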

Comparison:

- Baseline (current production setting) `b=0.10, k1=0.30`: `0.604555`
  Report: [batch_20260408T055948Z_00b6a8aa3d.md](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T055948Z_00b6a8aa3d.md)
- Rerun of `b=0.10, k1=0.30`: `0.604298`
  Report: [batch_20260408T064011Z_00b6a8aa3d.json](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T064011Z_00b6a8aa3d.json)
- Experiment `b=0.20, k1=0.60`: `0.604274`
  Report: [batch_20260408T064123Z_00b6a8aa3d.json](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T064123Z_00b6a8aa3d.json)
- Experiment `b=0.50, k1=1.0`: `0.606041`
  Report: [batch_20260408T064216Z_00b6a8aa3d.json](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T064216Z_00b6a8aa3d.json)
- Experiment `b=0.10, k1=0.75`: `0.603438`
  Report: [batch_20260408T064304Z_00b6a8aa3d.json](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260408T064304Z_00b6a8aa3d.json)
- Old baseline `b=0.0, k1=0.0`: `0.602598`
  Report: [batch_20260407T150946Z_00b6a8aa3d.md](/data/saas-search/artifacts/search_evaluation/batch_reports/batch_20260407T150946Z_00b6a8aa3d.md)

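Conclusion: of these four settings, `b=0.50, k1=1.0` performs best. It improves on your current baseline `0.604555` by `+0.001486` and on `0/0` by `+0.003443`. `0.20/0.60` is essentially on par with `0.10/0.30` (`0.604274` vs. `0.604298` for the rerun), and `0.10/0.75` is actually worse. This suggests that, in this round, raising `k1` alone helps less than relaxing `b` and `k1` together.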
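The two runs of `b=0.10, k1=0.30` give a rough, single-sample sense of run-to-run variation, which helps read the other differences. All values are taken from the reports listed above.

```latex
% Differences in Primary_Metric_Score
\begin{align*}
0.604555 - 0.604298 &= +0.000257 && \text{(same } b=0.10,\ k_1=0.30\text{, baseline vs. rerun)} \\
0.604274 - 0.604298 &= -0.000024 && (b=0.20,\ k_1=0.60 \text{ vs. rerun}) \\
0.603438 - 0.604298 &= -0.000860 && (b=0.10,\ k_1=0.75 \text{ vs. rerun}) \\
0.606041 - 0.604298 &= +0.001743 && (b=0.50,\ k_1=1.0 \text{ vs. rerun}) \\
0.606041 - 0.604555 &= +0.001486 && (b=0.50,\ k_1=1.0 \text{ vs. baseline})
\end{align*}
```

On this one sample, the `0.20/0.60` difference sits well inside the rerun gap, while the `0.50/1.0` gain is roughly six to seven times larger. A few more reruns per setting would make that comparison firmer.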

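One more note: I only changed the index-level setting of the existing index `search_products_tenant_163`, not the default mapping in the repository; [search_products.json](/data/saas-search/mappings/search_products.json) still has `0.1/0.3`. If you like, my next step can be to change the mapping default to `0.5/1.0` as well and batch-sync the other `search_products_tenant_*` indices.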