ai-saas / saas-search

17 Apr, 2026

1 commit

2059d959 feat(eval): land the unified multi-dataset evaluation scheme, expand to 771 queries, and start LLM labeling

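[Implementation]
- Config layer: register core_queries (the original 53 queries) and clothing_top771 (771 queries) in config/config.yaml
  Key changes: config/schema.py (line 410) adds the EvaluationDataset model;
            config/loader.py (line 304) provides get_dataset/list_datasets and stays compatible with the old config;
            new scripts/evaluation/eval_framework/datasets.py serves as a helper module for the dataset registry
- Storage and framework: all artifacts are isolated by dataset_id; the labeling cache is shared across datasets
  Key changes: store.py (line 1) adds a dataset_id field to build_runs/batch_runs;
            framework.py (line 1) build/batch_evaluate accept dataset_id and persist a snapshot
- CLI and tuning: every subcommand gains a --dataset-id parameter
  Key changes: cli.py (line 1), tune_fusion.py (line 1), and the launch scripts
- Web and frontend: the evaluation dataset can be switched dynamically, and History is filtered by dataset
  Key changes: web_app.py (line 1) adds /api/datasets, and /api/history accepts dataset_id;
            static/index.html and eval_web.js (line 1) add a dropdown selector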
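
A minimal sketch of how a dataset registry with per-dataset artifact isolation could look. Only the names `EvaluationDataset`, `get_dataset`, `list_datasets`, and the two dataset ids come from the commit message; the field names, the `DatasetRegistry` class, and the `artifact_dir` helper are hypothetical and may not match the real code in `config/schema.py` or `datasets.py`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EvaluationDataset:
    """Hypothetical shape; the real model in config/schema.py may differ."""
    dataset_id: str
    query_count: int


class DatasetRegistry:
    """Looks up registered evaluation datasets by id (illustrative only)."""

    def __init__(self, datasets: List[EvaluationDataset]) -> None:
        self._by_id: Dict[str, EvaluationDataset] = {d.dataset_id: d for d in datasets}

    def list_datasets(self) -> List[str]:
        # Sorted so that CLI and web dropdown output is stable.
        return sorted(self._by_id)

    def get_dataset(self, dataset_id: str) -> EvaluationDataset:
        try:
            return self._by_id[dataset_id]
        except KeyError:
            raise KeyError(f"unknown dataset_id: {dataset_id!r}") from None


def artifact_dir(root: str, dataset_id: str) -> str:
    # Assumed layout: one subdirectory per dataset keeps artifacts isolated.
    return f"{root}/{dataset_id}"


registry = DatasetRegistry([
    EvaluationDataset("core_queries", 53),
    EvaluationDataset("clothing_top771", 771),
])
```

Keying everything on `dataset_id` is what lets build runs, batch runs, and history filtering stay separate while a shared labeling cache can still be looked up independently of the dataset.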

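[Verification and tests]
- Added tests/test_search_evaluation_datasets.py; pytest passes (2 passed)
- Static checks pass (pyflakes/mypy on core modules)
- eval-web was restarted with the new model and passed its health check (it later became unstable due to resource usage; labeling is not affected)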

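[LLM labeling run status]
- Target dataset: clothing_top771 (771 queries)
- Started the reranker manually (because search.rerank.enabled=false) and confirmed /health is OK
- Ran rebuild --dataset-id clothing_top771; it is currently in the batch-labeling stage for the first query, "白色oversized T-shirt" (white oversized T-shirt) (llm_batch=24/40)
- Logs: logs/eval.log (main progress), logs/verbose/eval_verbose.log (detailed LLM I/O)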

2026-04-17 17:52:26 +0800

16 Apr, 2026

2 commits

6826fd31 eval framework labeled-set expansion: data preparation

tangwang
2026-04-16 23:15:58 +0800
dba57642 Bayesian tuning plan

tangwang
2026-04-16 17:28:13 +0800

15 Apr, 2026

1 commit

317c5d2c feat(search): introduce exact vector rescore to backfill exact vector scores for the topN, fixing documents in the rerank stage that lack text/image knn scores

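 Background and problem
- The existing coarse ranking and reranking depend on the `knn_query` and `image_knn_query` scores, but both come from ANN recall. Not every document entering rerank_window (160) is hit by both the text and the image vector recall, so some documents get a score of 0, which hurts the stability of the fusion formula.
- Simply enlarging the ANN k cannot guarantee that documents brought in by lexical recall also carry both vector scores; issuing a second query, or pulling the vectors back and computing locally, both add overhead and are complex to implement.

 Solution
Use the ES rescore mechanism: within the `window_size` of the first search, run an exact vector script_score for each document and attach the scores to `matched_queries` as named queries, so the subsequent coarse/rerank stages can prefer them.

**Design decisions**:
- **Backfill scores only, never reorder**: rescore uses `score_mode: total` with `rescore_query_weight: 0.0`, so the original `_score` stays unchanged. This avoids interfering with the existing ranking logic and carries the least risk.
- **Naming of exact scores**: `exact_text_knn_query` and `exact_image_knn_query`, so the client can identify them and fall back.
- **Configurable**: the `exact_knn_rescore_enabled` switch turns the feature on or off, and `exact_knn_rescore_window` sets the window size (default 160).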

 Implementation details

 1. Config extension (`config/config.yaml`, `config/loader.py`)
```yaml
exact_knn_rescore_enabled: true
exact_knn_rescore_window: 160
```
New config options, injected into `RerankConfig`.

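 2. Searcher builds the rescore query (`search/searcher.py`)
- In `_build_es_search_request`, when `enable_rerank=True` and the config switch is on, build the rescore object:
  - `window_size` = `exact_knn_rescore_window`
  - `query` is a `bool` query wrapping two `script_score` subqueries that compute the dot-product similarity of the text and image vectors respectively: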
    ```painless
    // exact_text_knn_query
    (dotProduct(params.query_vector, 'title_embedding') + 1.0) / 2.0
    // exact_image_knn_query
    (dotProduct(params.image_query_vector, 'image_embedding.vector') + 1.0) / 2.0
    ```
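  - Each `script_score` sets `_name` to the corresponding named query.
- Note: the script scores in the current implementation are **not yet multiplied by knn_text_boost / knn_image_boost**; aligning them with the scale of the original ANN scores is a follow-up item.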
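
The rescore object described above might be assembled roughly as follows. This is a sketch under stated assumptions, not the code in `search/searcher.py`: the function name `build_exact_knn_rescore`, the helper `_named_script_score`, and the `match_all` inner query are hypothetical; the script sources, the two `_name` values, `score_mode`, and `rescore_query_weight` are taken from the commit message.

```python
from typing import Any, Dict, List, Optional, Sequence

TEXT_SCRIPT = "(dotProduct(params.query_vector, 'title_embedding') + 1.0) / 2.0"
IMAGE_SCRIPT = "(dotProduct(params.image_query_vector, 'image_embedding.vector') + 1.0) / 2.0"


def _named_script_score(name: str, source: str, params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical helper: one script_score clause tagged with a named-query label.
    return {
        "script_score": {
            "query": {"match_all": {}},  # assumption: score every doc in the window
            "script": {"source": source, "params": params},
            "_name": name,
        }
    }


def build_exact_knn_rescore(
    text_vector: Optional[Sequence[float]],
    image_vector: Optional[Sequence[float]],
    window_size: int = 160,
) -> Optional[Dict[str, Any]]:
    """Return an ES `rescore` object, or None when no query vector is available."""
    should: List[Dict[str, Any]] = []
    if text_vector is not None:
        should.append(_named_script_score(
            "exact_text_knn_query", TEXT_SCRIPT, {"query_vector": list(text_vector)}))
    if image_vector is not None:
        should.append(_named_script_score(
            "exact_image_knn_query", IMAGE_SCRIPT, {"image_query_vector": list(image_vector)}))
    if not should:
        return None
    return {
        "window_size": window_size,
        "query": {
            "rescore_query": {"bool": {"should": should}},
            "query_weight": 1.0,
            # Weight 0.0 with score_mode "total" leaves the original _score untouched.
            "rescore_query_weight": 0.0,
            "score_mode": "total",
        },
    }
```

With `score_mode: total` the final score is `query_weight * original + rescore_query_weight * rescore`, so a rescore weight of 0.0 leaves the first-pass ordering as it was while the named clauses are still evaluated.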

 3. RerankClient prefers exact scores (`search/rerank_client.py`)
- In `_extract_coarse_signals`, read the `exact_text_knn_query` and `exact_image_knn_query` scores from the document's `matched_queries`.
- If a score is present and valid, use it as `text_knn_score` / `image_knn_score` and mark `text_knn_source='exact_text_knn_query'`.
- If it is absent, fall back to the original `knn_query` / `image_knn_query` (ANN scores).
- The original ANN scores are also kept in `approx_text_knn_score` / `approx_image_knn_score` for debugging comparisons.
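
The exact-first, ANN-fallback rule above can be sketched as below. It assumes `matched_queries` arrives as a mapping from named query to score; the function names, the validity check, and the `image_knn_source` key (added by symmetry with `text_knn_source`) are assumptions rather than the actual `_extract_coarse_signals` code.

```python
import math
from typing import Any, Dict, Mapping, Optional, Tuple


def _is_valid_score(value: Any) -> bool:
    # Assumed notion of "valid": a finite real number (bools excluded).
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def pick_knn_score(
    matched: Mapping[str, Any], exact_name: str, approx_name: str
) -> Tuple[float, Optional[str]]:
    """Prefer the exact rescore score; fall back to the ANN score; else 0.0."""
    exact = matched.get(exact_name)
    if _is_valid_score(exact):
        return float(exact), exact_name
    approx = matched.get(approx_name)
    if _is_valid_score(approx):
        return float(approx), approx_name
    return 0.0, None


def extract_coarse_signals(matched: Mapping[str, Any]) -> Dict[str, Any]:
    text_score, text_source = pick_knn_score(matched, "exact_text_knn_query", "knn_query")
    image_score, image_source = pick_knn_score(matched, "exact_image_knn_query", "image_knn_query")
    return {
        "text_knn_score": text_score,
        "text_knn_source": text_source,
        "image_knn_score": image_score,
        "image_knn_source": image_source,  # assumed key, mirrors text_knn_source
        # Raw ANN scores kept alongside for debugging comparisons.
        "approx_text_knn_score": matched.get("knn_query"),
        "approx_image_knn_score": matched.get("image_knn_query"),
    }
```

Recording the source next to each score makes it possible to measure exact-score coverage from debug output without re-running the query.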

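 4. Richer debug info
- `debug_info.per_result[*].ranking_funnel.coarse_rank.signals` outputs the exact scores, the fallback scores, and the source markers, making it easy to observe coverage and value distribution in production.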

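 Verification results
- Unit tests `tests/test_rerank_client.py` and `tests/test_search_rerank_window.py` pass, verifying exact-score priority, config parsing, and the ES request body structure.
- Sampling real production queries (6 queries, top160) shows:
  - **exact coverage reaches 100%** (both text and image scores present), fixing the partial gaps of the original ANN scores.
  - However, exact scores and original ANN scores differ in magnitude (the median ANN/exact ratio is about 4.1x), because the exact script does not multiply by the boost factors.
- Current ranking impact: coarse-rank top10 overlap drops as low as 1/10, and the maximum rank drift exceeds 100.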
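
For reference, the two ranking-impact metrics quoted above can be computed along these lines. The definitions are assumptions (overlap as the size of the intersection of the two top-k sets, drift as the largest absolute position change of a document present in both lists); the commit does not say how its numbers were produced.

```python
from typing import Dict, Sequence


def topk_overlap(before: Sequence[str], after: Sequence[str], k: int = 10) -> int:
    """Number of documents shared by the two top-k lists."""
    return len(set(before[:k]) & set(after[:k]))


def max_rank_drift(before: Sequence[str], after: Sequence[str]) -> int:
    """Largest absolute position change among documents present in both lists."""
    position_after: Dict[str, int] = {doc: i for i, doc in enumerate(after)}
    drifts = [
        abs(i - position_after[doc])
        for i, doc in enumerate(before)
        if doc in position_after
    ]
    return max(drifts, default=0)
```

Running both on the coarse ranking before and after the score alignment planned below would show whether the lists converge.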

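 Follow-up plan
1. Align the exact and ANN score scales: multiply by `knn_text_boost` / `knn_image_boost` in the script_score, and apply an extra 1.4 multiplier for long queries.
2. Re-evaluate top10 overlap and drift; if they converge, the whole coarse fusion formula can be moved into the ES rescore stage.
3. The current version keeps the safe "backfill scores only, never reorder" policy and already fixes the core missing-score problem.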

 Files touched
- `config/config.yaml`
- `config/loader.py`
- `search/searcher.py`
- `search/rerank_client.py`
- `tests/test_rerank_client.py`
- `tests/test_search_rerank_window.py`

2026-04-15 08:02:21 +0800

14 Apr, 2026

1 commit

0ba0e0fc 1. rerank funnel config tuning

2. Add service_enabled_by_config():
if reranker, reranker-fine, or translator is disabled in config, `run.sh all` does not start that service

2026-04-14 12:56:25 +0800

09 Apr, 2026

1 commit

2703b6ea refactor(indexer): split analysis_kinds into enrichment_scopes + category_taxonomy_profile

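- The original analysis_kinds mixed "enrichment type" (content/taxonomy) with "category-specific configuration", which made it hard to extend taxonomy analysis to other categories (such as 3C electronics or home goods)
- New enrichment_scopes parameter: supports generic (general enrichment, producing qanchors/enriched_tags/enriched_attributes) and category_taxonomy (category enrichment, producing enriched_taxonomy_attributes)
- New category_taxonomy_profile parameter: selects which profile category enrichment uses (apparel is currently built in); each profile carries its own prompt, output column definitions, parsing rules, and cache version
- analysis_kinds is kept as a compatibility alias so existing callers do not break
- Internal taxonomy analysis is refactored into a profile registry pattern: the new _get_taxonomy_schema(profile_name) function dynamically returns the matching AnalysisSchema for a profile
- Cache keys are now isolated by "analysis type + profile + schema fingerprint + input field hash", so entries for different categories or prompt versions are invalidated automatically
- Updated the API docs and the microservice interface docs to spell out the new parameters' semantics with usage examples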
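
One way to build a cache key from the four components named above is sketched here. The function names, the choice of SHA-256, the 16-character truncation, and the `:` separator are all illustrative assumptions; the real key format in `indexer/product_enrich.py` is not shown in the commit.

```python
import hashlib
import json
from typing import Any, Dict, List


def _short_hash(payload: Any) -> str:
    # Canonical JSON (sorted keys) so equal inputs always hash identically.
    raw = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def schema_fingerprint(prompt: str, output_columns: List[str], cache_version: str) -> str:
    """Changes whenever the prompt, output columns, or cache version change."""
    return _short_hash({"prompt": prompt, "columns": output_columns, "version": cache_version})


def cache_key(scope: str, profile: str, fingerprint: str, input_fields: Dict[str, Any]) -> str:
    """analysis type + profile + schema fingerprint + input field hash."""
    return ":".join([scope, profile, fingerprint, _short_hash(input_fields)])
```

Because the fingerprint is part of the key, editing a profile's prompt yields new keys and old entries are simply never hit again, which is the automatic invalidation the commit describes.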

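Technical details:
- Entry point: the enrich-content endpoint in api/routes/indexer.py parses the new parameters and passes them down
- Core logic: enrich_products_batch in indexer/product_enrich.py gains a profile parameter; _process_batch_for_schema fetches the schema dynamically based on scope and profile
- Compatibility layer: if the request also provides analysis_kinds, it is mapped to enrichment_scopes (content→generic, taxonomy→category_taxonomy), and category_taxonomy_profile defaults to "apparel"
- Test coverage: new tests for enrichment_scopes combinations, profile switching, and compatibility mode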
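
The compatibility mapping could look roughly like this. The mapping table and the `"apparel"` default come from the commit message; the function name, the precedence (explicit `enrichment_scopes` wins over `analysis_kinds`), and the behaviour when neither is supplied are assumptions made for the sake of a runnable example.

```python
from typing import List, Optional, Sequence, Tuple

KIND_TO_SCOPE = {"content": "generic", "taxonomy": "category_taxonomy"}
DEFAULT_PROFILE = "apparel"


def resolve_enrichment_request(
    enrichment_scopes: Optional[Sequence[str]] = None,
    analysis_kinds: Optional[Sequence[str]] = None,
    category_taxonomy_profile: Optional[str] = None,
) -> Tuple[List[str], str]:
    """Normalize new and legacy parameters into (scopes, profile)."""
    if enrichment_scopes:
        # Assumption: the new parameter takes precedence when both are given.
        scopes = list(enrichment_scopes)
    elif analysis_kinds:
        unknown = [k for k in analysis_kinds if k not in KIND_TO_SCOPE]
        if unknown:
            raise ValueError(f"unknown analysis_kinds: {unknown}")
        scopes = [KIND_TO_SCOPE[k] for k in analysis_kinds]
    else:
        # Assumption: run every scope when the caller specifies nothing.
        scopes = ["generic", "category_taxonomy"]
    return scopes, category_taxonomy_profile or DEFAULT_PROFILE
```

Normalizing at the API boundary means the code below it only ever sees scopes and a profile, so the legacy alias can later be removed in one place.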

2026-04-09 13:53:36 +0800

08 Apr, 2026

1 commit

1fdab52d This change adjusts the BM25 parameters used by the combined query.

Previously, both `b` and `k1` were set to `0.0`. The original intention
was to avoid two common issues in e-commerce search relevance:

1. Over-penalizing longer product titles
   In product search, a shorter title should not automatically rank
higher just because BM25 favors shorter fields. For example, for a query
like “遥控车” (remote-control car), a product whose title is simply “遥控车” is not
necessarily a better candidate than a product with a slightly longer but
more descriptive title. In practice, extremely short titles may even
indicate lower-quality catalog data.

2. Over-rewarding repeated occurrences of the same term
   For longer queries such as “遥控喷雾翻滚多功能车玩具车” (roughly, remote-control
spray tumbling multi-function toy car), the default
BM25 behavior may give too much weight to a term that appears multiple
times (for example “遥控”, remote-control), even when other important query terms such
as “喷雾” (spray) or “翻滚” (tumbling) are missing. This can cause products with repeated
partial matches to outrank products that actually cover more of the user
intent.

Setting both parameters to zero was an intentional way to suppress
length normalization and term-frequency amplification. However, after
introducing a `combined_fields` query, this configuration becomes too
aggressive. Since `combined_fields` scores multiple fields as a unified
relevance signal, completely disabling both effects may also remove
useful ranking information, especially when we still want documents
matching more query terms across fields to be distinguishable from
weaker matches.
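
To make the effect of the two parameters concrete, here is the textbook per-term BM25 score as a small function. It is an illustration only: Lucene's implementation differs by a constant factor, `combined_fields` applies the formula to combined field statistics, and the default arguments shown are Elasticsearch's standard `k1=1.2`, `b=0.75`, not the values chosen in this commit (which the message does not state).

```python
def bm25_term_score(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    idf: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Textbook BM25 contribution of one query term to one document."""
    if tf <= 0:
        return 0.0
    # b controls length normalization: b = 0 ignores document length entirely.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls term-frequency saturation: k1 = 0 makes the score binary.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)


# With k1 = 0 and b = 0, every document containing the term scores exactly idf,
# no matter how often the term occurs or how long the field is.
flat_short = bm25_term_score(tf=1, doc_len=5, avg_doc_len=10, idf=2.0, k1=0.0, b=0.0)
flat_long = bm25_term_score(tf=5, doc_len=50, avg_doc_len=10, idf=2.0, k1=0.0, b=0.0)
```

Here `flat_short` and `flat_long` are both equal to the idf, which is the "too aggressive" behaviour: term frequency and length stop contributing anything, and only the set of matched terms matters.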

This update therefore relaxes the previous setting and reintroduces a
controlled amount of BM25 normalization/scoring behavior. The goal is to
keep the original intent — avoiding short-title bias and excessive
repeated-term gain — while allowing the combined query to better
preserve meaningful relevance differences across candidates.

Expected effect:
- reduce the bias toward unnaturally short product titles
- limit score inflation caused by repeated occurrences of the same term
- improve ranking stability for `combined_fields` queries
- better reward candidates that cover more of the overall query intent,
  instead of those that only repeat a subset of terms

2026-04-08 14:39:54 +0800

07 Apr, 2026

2 commits

6e3e6770 suggest docs maintenance

tangwang
2026-04-07 22:14:59 +0800
e50924ed 1. tags -> enriched_tags
```
2. issues docs
```
tangwang
2026-04-07 11:45:15 +0800

04 Apr, 2026

2 commits

441f049d Evaluation system improvements, plus a relevance label rename:
```
Exact Match
High Relevant
Low Relevant
Irrelevant

to

Fully Relevant
Mostly Relevant
Weakly Relevant
Irrelevant
```
tangwang
2026-04-04 22:14:42 +0800
f5da42e6 Labeling prompt tuning

tangwang
2026-04-04 19:02:43 +0800

03 Apr, 2026

2 commits

ccbdf870 enriched_attributes.value field now participates in search

tangwang
2026-04-03 21:11:50 +0800
639bee0a issues cleanup (evaluation framework & LTR log preparation & first look at FM fitting results)

tangwang
2026-04-03 08:17:41 +0800