ai-saas / saas-search

27 Mar, 2026

4 commits

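8c8b9d84 ES fetches coarse_rank.input_window hits -> coarse ranking fuses text/knn scores and trims to ...

coarse_rank.output_window -> SKU selection and title suffix are applied ->
fine ranking calls a lightweight reranker and trims to fine_rank.output_window -> the final rerank calls the existing
reranker and adds fine_score to the final fusion. The reranker client/provider
now also picks a service_url per service_profile, so the fine and final stages
can share one service codebase and only run as separate instances.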
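
The cascade in this commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `SERVICE_URLS`, `resolve_service_url`, `top_n`, `cascade`, the hit fields, and the way `fine_score` enters the final fusion (a plain product) are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]

# Hypothetical profile -> endpoint table; real values would come from config.
SERVICE_URLS: Dict[str, str] = {
    "fine": "http://127.0.0.1:8001/rerank",
    "final": "http://127.0.0.1:8002/rerank",
}


def resolve_service_url(service_profile: str) -> str:
    """Same client code for every stage; only the target instance differs."""
    return SERVICE_URLS[service_profile]


def top_n(hits: List[Hit], key: Callable[[Hit], float], n: int) -> List[Hit]:
    """Sort descending by key and keep the first n hits."""
    return sorted(hits, key=key, reverse=True)[:n]


def cascade(hits: List[Hit], coarse_out: int, fine_out: int,
            fine_scorer: Callable[[Hit], float],
            final_scorer: Callable[[Hit], float]) -> List[Hit]:
    # Coarse rank: fuse text and knn scores (plain sum as a placeholder), then trim.
    coarse = top_n(hits, lambda h: h["text"] + h["knn"], coarse_out)
    # SKU selection and title-suffix building would run here on `coarse`.
    for h in coarse:
        h["fine_score"] = fine_scorer(h)
    fine = top_n(coarse, lambda h: h["fine_score"], fine_out)
    # Final rerank: fine_score enters the final fusion (a product, by assumption).
    for h in fine:
        h["final_score"] = final_scorer(h) * h["fine_score"]
    return sorted(fine, key=lambda h: h["final_score"], reverse=True)
```

In a real deployment the two scorers would be HTTP calls to `resolve_service_url("fine")` and `resolve_service_url("final")`; they are passed in as plain callables here so the sketch runs without a service.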

2026-03-27 17:56:04 +0800

ed13851c Configure recall parameters for the two knn recalls (image and text)

tangwang
2026-03-27 11:58:00 +0800

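24edc208 Update the _extract_combined_knn_score code and its configuration, ...

Rerank fusion: knn previously had bias and exponential settings. Text and image embedding similarities now need to be fused, and the fusion method is dis_max, so the configuration needs:
1) a weight for each similarity, plus a tie_breaker
2) the weight of the vector signal as a whole (bias and exponential)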
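
A minimal sketch of this two-level configuration, assuming the usual dis_max definition (best score plus tie_breaker times the sum of the others) and assuming the overall knn factor has the shape `(score + bias) ** exponent`. The function and parameter names are hypothetical, and the actual formula in `_extract_combined_knn_score` may differ.

```python
from typing import Sequence


def dis_max(scores: Sequence[float], tie_breaker: float) -> float:
    """Best score plus tie_breaker times the sum of the remaining scores."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def knn_factor(text_sim: float, image_sim: float, *,
               text_weight: float = 1.0, image_weight: float = 1.0,
               tie_breaker: float = 0.0,
               bias: float = 0.0, exponent: float = 1.0) -> float:
    # Level 1: per-modality weights, fused with dis_max.
    combined = dis_max([text_weight * text_sim, image_weight * image_sim],
                       tie_breaker)
    # Level 2: weight of the vector signal as a whole (assumed form).
    # Clamp at zero so a fractional exponent never sees a negative base.
    return max(combined + bias, 0.0) ** exponent
```

With `tie_breaker=0` only the stronger modality counts; with `tie_breaker=1` the two weighted similarities are simply added.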

2026-03-27 08:33:16 +0800

dc403578 Multimodal search

tangwang
2026-03-27 08:11:35 +0800

26 Mar, 2026

1 commit

93be98cb Clean up outdated docs

tangwang
2026-03-26 22:18:31 +0800

25 Mar, 2026

1 commit

87cacb1b Optimize the fusion formula; add an intent-match factor

tangwang
2026-03-25 10:58:56 +0800

24 Mar, 2026

5 commits

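74fdf9bd 1. ...

Added a filter/demotion dictionary: when a standalone token in the query matches a configured trigger word, products carrying certain tokens are filtered out (for example, fitted/修身 filters out products tagged 宽松, loose, relaxed, baggy, slouchy, and so on).
2. The reranker query now uses the translated query.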
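
The trigger-word filter from item 1 might look like the sketch below. The dictionary contents mirror the example in the message, but `FILTER_RULES`, `blocked_tokens`, `filter_products`, the product shape, and the whitespace tokenizer are assumptions; the real system presumably uses its own tokenizer and may demote instead of filter.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical dictionary: trigger word -> product tokens that disqualify a hit.
_LOOSE_TOKENS = {"宽松", "loose", "relaxed", "baggy", "slouchy"}
FILTER_RULES: Dict[str, Set[str]] = {
    "fitted": _LOOSE_TOKENS,
    "修身": _LOOSE_TOKENS,
}


def blocked_tokens(query_tokens: Iterable[str]) -> Set[str]:
    """Collect blocked product tokens for every trigger word in the query."""
    blocked: Set[str] = set()
    for token in query_tokens:
        blocked |= FILTER_RULES.get(token, set())
    return blocked


def filter_products(query: str, products: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Whole-token matching only: "unfitted" must not trigger the "fitted" rule.
    query_tokens = set(query.lower().split())
    blocked = blocked_tokens(query_tokens)
    return [p for p in products if not (blocked & set(p["tokens"]))]
```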

2026-03-24 22:54:38 +0800

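2efad04b Intent matching performance optimization: ...

The previous implementation was far too slow to be acceptable, so this round simplifies the strategy.

The style_sku_prepare_hits stage takes too long. Based on the requirements below, work out how to optimize it and propose a performance plan.
1.
Does _select_by_embedding have a cache? The set of option_value values is finite, so anything already computed should not be computed again. This covers more than the embedding similarity: the whole match result for an option_value (whether it is contained or not, and its similarity) should be reused. For example, if some SKU already had an attribute value "卡其色" (khaki) and its text match has been computed, text matching does not need to run again. If its vector similarity has been computed, there is no need to fetch the vector or compute the similarity again.
2. Matching can be simplified where appropriate:
Simplified matching flow:
1) Take the first text match; if there is one, the match succeeds immediately. Multiple matches do not need to be considered.
2) If nothing matches textually, fall back to embedding-based selection.

Matching rules:
option_name matching: check whether the normalized option_name is one of the generic terms for the intent dimension (for example 颜色, color, colour). When none matches, the current code appears to fall back to considering every dimension, which makes matching and comparison too expensive. Remove that logic; in this case no suffix is added and no SKU is selected.
option_value matching: if a query term that was hit during intent detection is contained in the attribute value, it counts as a match. If the attribute value is contained in the query (including its translated text), it also counts as a match. This raises match coverage.

3.
This stage only needs to produce the SKU selection result (the id of the selected SKU, or an empty value meaning no match was found, in which case no title suffix is appended to the rerank input). There is no need to replace image_url or promote the SKU yet; do both at final fill time when a selected SKU exists.
Please think about a design that improves performance without adding complexity; moderate refactoring to reduce the resulting line count is fine.
@search/sku_intent_selector.py @query/style_intent.py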
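
One way to read requirements 1 and 2 is a per-request matcher that memoizes both the text-match result and the similarity for each normalized option_value. The sketch below is an assumption-laden illustration: the class name, the SKU shape (`id`, `option_value`), and the injected `similarity` callable are hypothetical and do not come from `search/sku_intent_selector.py`.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class OptionValueMatcher:
    """Caches text-match and similarity results per normalized option_value."""

    def __init__(self, query_texts: Sequence[str], intent_terms: Sequence[str],
                 similarity: Callable[[str], float]) -> None:
        self.query_texts = [t.lower() for t in query_texts if t]
        self.intent_terms = [t.lower() for t in intent_terms if t]
        self._similarity = similarity
        self._text_cache: Dict[str, bool] = {}
        self._sim_cache: Dict[str, float] = {}

    def text_match(self, value: str) -> bool:
        key = value.strip().lower()
        if key not in self._text_cache:
            # Match if an intent term is inside the value, or the value is
            # inside the query (original or translated text).
            self._text_cache[key] = bool(key) and (
                any(term in key for term in self.intent_terms)
                or any(key in text for text in self.query_texts)
            )
        return self._text_cache[key]

    def similarity(self, value: str) -> float:
        key = value.strip().lower()
        if key not in self._sim_cache:
            self._sim_cache[key] = self._similarity(key)
        return self._sim_cache[key]

    def select(self, skus: List[Dict[str, Any]]) -> Optional[str]:
        # 1) The first text match wins; later matches are not considered.
        for sku in skus:
            if self.text_match(sku["option_value"]):
                return sku["id"]
        # 2) Otherwise fall back to the most similar option_value.
        best = max(skus, key=lambda s: self.similarity(s["option_value"]),
                   default=None)
        return best["id"] if best is not None else None
```

`select` only returns an id (or `None`), which matches requirement 3: image replacement and SKU promotion can be deferred to the final fill step.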

2026-03-24 15:58:18 +0800

814e352b Make the multiplicative formula configurable

tangwang
2026-03-24 12:44:11 +0800

581dafae Debug tool: show the intermediate scoring steps for each result ...

The backend now exposes a structured debug_info that is much closer to
the real ranking pipeline:

- query_analysis now includes index_languages, query_tokens, query-vector
  summary, translation/enrichment plan, and translation debug.
- query_build now explains the ES recall plan: base-language clause,
  translated clauses, filters vs post-filters, KNN settings,
  function-score config, and related inputs.
- es_request distinguishes the logical DSL from the actual body sent to
  ES, including rerank prefetch _source.
- es_response now includes the initial ES ranking window stats used for
  score interpretation.
- rerank now includes execution state, templates, rendered rerank query
  text, window/top_n, service/meta, and the fusion formula.
- pagination now shows rerank-window fetch vs requested page plus
  page-fill details.

For each result in debug_info.per_result, ranking debug is now much
richer:

- initial rank and final rank
- raw ES score
- es_score_normalized = raw score / initial ES window max
- es_score_norm = min-max normalization over the initial ES window
- explicit normalization notes explaining that fusion does not directly
  consume an ES-normalized score
- rerank input details: doc template, title suffix, template field values,
  doc preview/length
- fusion breakdown: rerank_factor, text_factor, knn_factor, constants, raw
  inputs, final fused score
- text subcomponents: source/translation/weighted/primary/support/fallback
  evidence via matched_queries
- richer style-intent SKU debug, including selected SKU summary and intent
  texts
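
The two normalized ES scores listed above differ only in their reference points. A small sketch, assuming the window is a plain list of raw scores; the function name and the handling of a zero maximum or zero range are assumptions, not taken from the backend:

```python
from typing import Dict, List


def es_score_debug(raw_scores: List[float]) -> List[Dict[str, float]]:
    """Per-hit debug scores over the initial ES ranking window."""
    if not raw_scores:
        return []
    window_max = max(raw_scores)
    window_min = min(raw_scores)
    span = window_max - window_min
    return [
        {
            "es_score": score,
            # raw score / initial ES window max
            "es_score_normalized": score / window_max if window_max else 0.0,
            # min-max normalization over the initial ES window
            "es_score_norm": (score - window_min) / span if span else 0.0,
        }
        for score in raw_scores
    ]
```

For a window of `[8.0, 6.0, 4.0]` the middle hit gets `es_score_normalized = 0.75` but `es_score_norm = 0.5`, which is why the debug output reports both.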

2026-03-24 11:30:35 +0800

8ae95af0 1. Stage Timings: add start and end timestamps to each stage's duration. ...

2. Some important stages were missing, such as the style-intent SKU
pre-selection (StyleSkuSelector.prepare_hits); this stage is now added.

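
Recording a start and end timestamp next to each stage's duration can be done with a small context manager. This is a generic sketch; `StageTimer` and its field names are hypothetical, not the project's timing API.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class StageTimer:
    """Collects one record per stage: wall-clock start/end plus duration."""

    def __init__(self) -> None:
        self.stages: List[Dict[str, Any]] = []

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start_ts = time.time()          # wall-clock timestamp for display
        start = time.perf_counter()     # monotonic clock for the duration
        try:
            yield
        finally:
            self.stages.append({
                "stage": name,
                "start_ts": start_ts,
                "end_ts": time.time(),
                "duration_ms": (time.perf_counter() - start) * 1000.0,
            })


timer = StageTimer()
with timer.stage("style_sku_prepare_hits"):
    pass  # the stage's real work would run here
```

Wrapping every stage in `timer.stage(...)` also makes a missing stage easy to spot, since it simply has no record.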
tangwang
2026-03-24 09:05:47 +0800

23 Mar, 2026

3 commits

cda1cd62 Intent analysis and application baseline

tangwang
2026-03-23 22:35:20 +0800

35da3813 The optimization logic for mixed Chinese-English queries does not fit the new combined_fields+best_fields+phrase query approach and adds considerable complexity; remove it

tangwang
2026-03-23 17:12:01 +0800

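e756b18e Refactored the text recall builder; each base_query / base_query_trans_* ...

clause is now a named bool query with the following structure:

must: combined_fields

should: weighted best_fields and phrase clauses

The main changes are in
search/es_query_builder.py, but this adjustment follows the existing language routing design and introduces no one-off branches. The weights of the extra
should clauses are now configuration-driven through
config/schema.py, config/loader.py, search/searcher.py, and
config/config.yaml, which keeps the structure centrally managed.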
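
The clause shape described above could be built roughly as follows. This sketch uses standard Elasticsearch query DSL (`bool`, `combined_fields`, `multi_match` with `type`, and `_name`); the function name, the boost parameters, and the field names are hypothetical and not taken from `search/es_query_builder.py`.

```python
from typing import Any, Dict, List


def build_text_clause(name: str, query: str, fields: List[str], *,
                      best_fields_boost: float = 1.0,
                      phrase_boost: float = 1.0) -> Dict[str, Any]:
    """Named bool query: combined_fields in must, weighted extras in should."""
    return {
        "bool": {
            "_name": name,  # lets the score be read back via matched_queries
            "must": [
                {"combined_fields": {"query": query, "fields": fields}},
            ],
            "should": [
                {"multi_match": {"query": query, "fields": fields,
                                 "type": "best_fields",
                                 "boost": best_fields_boost}},
                {"multi_match": {"query": query, "fields": fields,
                                 "type": "phrase",
                                 "boost": phrase_boost}},
            ],
        }
    }


clause = build_text_clause("base_query", "red dress",
                           ["title.en", "brief.en"],
                           best_fields_boost=0.5, phrase_boost=2.0)
```

The `must` clause decides whether a document is recalled at all, while the `should` clauses only add score, so tuning their boosts from config does not change the recall set.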

2026-03-23 14:45:06 +0800

22 Mar, 2026

2 commits

0536222c Query parser optimization

tangwang
2026-03-22 18:30:05 +0800

ef5baa86 Mixed-language handling

tangwang
2026-03-22 14:16:39 +0800

20 Mar, 2026

3 commits

a7cc9078 SKU sorting

tangwang
2026-03-20 17:02:19 +0800

deccd68a Added the SKU pre-selection step in search/searcher.py right before ...

ResultFormatter.format_search_results() runs.

What changed:

- For each final paginated SPU hit, the searcher now scans
  skus[].option1_value against the query text set built from the original
  query, normalized query, rewritten query, and translations.
- If no option1_value matches textually, it falls back to embedding
  similarity and picks the SKU with the highest inner product against the
  query embedding.
- The matched SKU is promoted to the front of the SPU’s skus list.
- The SPU-level image_url is replaced with that matched SKU’s image_src.

I left api/result_formatter.py unchanged because it already preserves
the SKU order and reads image_url from _source; updating the page hits
in searcher makes the formatter return the desired result automatically.
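
The steps above can be sketched as a single function over one hit's `_source`. The field names (`skus`, `option1_value`, `image_src`, `image_url`) come from the message; the function name, the containment-based text match, and the `value_vecs` lookup table are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def preselect_sku(source: Dict[str, Any], query_texts: Sequence[str],
                  query_vec: Sequence[float],
                  value_vecs: Dict[str, Sequence[float]]) -> Optional[Dict[str, Any]]:
    skus: List[Dict[str, Any]] = source.get("skus") or []
    texts = [t.lower() for t in query_texts if t]
    chosen_idx: Optional[int] = None

    # 1) Text match: option1_value contained in any query text (assumed rule).
    for i, sku in enumerate(skus):
        value = (sku.get("option1_value") or "").strip().lower()
        if value and any(value in t for t in texts):
            chosen_idx = i
            break

    # 2) Fallback: highest inner product against the query embedding.
    if chosen_idx is None:
        scored = [(dot(query_vec, value_vecs[s["option1_value"]]), i)
                  for i, s in enumerate(skus)
                  if s.get("option1_value") in value_vecs]
        if not scored:
            return None
        chosen_idx = max(scored, key=lambda pair: pair[0])[1]

    # 3) Promote the SKU and swap in its image at the SPU level.
    chosen = skus.pop(chosen_idx)
    skus.insert(0, chosen)
    if chosen.get("image_src"):
        source["image_url"] = chosen["image_src"]
    return chosen
```

Because the function mutates `_source` in place, a formatter that preserves SKU order and reads `image_url` from `_source` needs no changes, which is the point made in the message.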

Verification:

- ReadLints on the edited files: no errors
- Passed targeted tests:
  pytest tests/test_search_rerank_window.py -k "translated_query or
  no_direct_option_match"

2026-03-20 16:31:46 +0800

b754fd41 Image vectorization supports a priority parameter

tangwang
2026-03-20 11:59:57 +0800

18 Mar, 2026

3 commits

c90f80ed Relevance optimization

tangwang
2026-03-18 16:44:27 +0800

a8261ece Retrieval quality optimization

tangwang
2026-03-18 10:55:57 +0800

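a47416ec Switched the fusion logic to a multiplicative formula and completed the path that returns ES named-clause scores. ...

The core change is in rerank_client.py (line 99): fuse_scores_and_resort now uses a smoothed multiplicative formula of
rerank * knn * text, preferring the base_query and knn_query scores from hit["matched_queries"],
and writes _text_score / _knn_score
back into the debug fields. So that KNN also has a name, I added name:
"knn_query" to the top-level knn; see es_query_builder.py (line 273). At search time, include_named_queries_score is enabled within the rerank
window, and track_scores is added when an explicit sort is used;
see searcher.py (line 400) and es_client.py (line 224).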
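
A rough sketch of what a smoothed multiplicative fusion over named-clause scores can look like. The commit does not spell out the formula, so the additive biases, the exponents, and the function names below are assumptions; only `matched_queries`, `base_query`, `knn_query`, `_text_score`, and `_knn_score` come from the message.

```python
from typing import Any, Dict, List, Sequence


def named_score(hit: Dict[str, Any], name: str, default: float = 0.0) -> float:
    # With include_named_queries_score, matched_queries maps name -> score;
    # without it, Elasticsearch returns a plain list of names.
    matched = hit.get("matched_queries")
    if isinstance(matched, dict):
        return float(matched.get(name, default))
    return default


def fuse_and_resort_sketch(hits: List[Dict[str, Any]],
                           rerank_scores: Sequence[float], *,
                           rerank_bias: float = 1.0, text_bias: float = 1.0,
                           knn_bias: float = 1.0, rerank_exp: float = 1.0,
                           text_exp: float = 1.0,
                           knn_exp: float = 1.0) -> List[Dict[str, Any]]:
    for hit, rerank in zip(hits, rerank_scores):
        text = named_score(hit, "base_query")
        knn = named_score(hit, "knn_query")
        hit["_text_score"] = text   # written back for debugging
        hit["_knn_score"] = knn
        # The biases keep a missing signal from zeroing the whole product.
        hit["_fused_score"] = ((rerank + rerank_bias) ** rerank_exp
                               * (text + text_bias) ** text_exp
                               * (knn + knn_bias) ** knn_exp)
    hits.sort(key=lambda h: h["_fused_score"], reverse=True)
    return hits
```

The biases are what make the product "smooth" in this reading: a hit recalled only by text still gets a non-zero fused score instead of being multiplied by a knn score of zero.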

2026-03-18 10:24:05 +0800

13 Mar, 2026

3 commits

af827ce9 rerank

tangwang
2026-03-13 23:21:51 +0800

33f8f578 tidy

tangwang
2026-03-13 22:59:54 +0800

985752f5 1. Frontend debugging features ...

2. Handling for translation rate limits (qwen-mt rate limiting)

tangwang
2026-03-13 16:15:06 +0800

12 Mar, 2026

4 commits

d31c7f65 Add a cloud-service reranker

tangwang
2026-03-12 12:53:08 +0800

a99e62ba Record per-stage timings

tangwang
2026-03-12 11:42:49 +0800

c51d254f Performance testing

tangwang
2026-03-12 10:28:43 +0800

5f7d7f09 性能测试报告.md (performance test report)

tangwang
2026-03-12 08:44:55 +0800

11 Mar, 2026

2 commits

28e57bb1 Logging system optimization

tangwang
2026-03-11 23:04:17 +0800

7fbca0d7 Startup script optimization

tangwang
2026-03-11 19:23:57 +0800

10 Mar, 2026

4 commits

bcada818 last

tangwang
2026-03-10 16:17:18 +0800

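bd96cead 1. Dynamic multilingual fields and unified strategy configuration ...

- Configuration now uses a "field base name + dynamic language suffix" scheme and no longer depends on the old `indexes`.
[config.yaml](/data/saas-search/config/config.yaml#L17)
- `search_fields` / `text_query_strategy` now go through strict validation and parsing.
[config_loader.py](/data/saas-search/config/config_loader.py#L254)

2. Query language plan and translation wait strategy
- `QueryParser` now produces
  `query_text_by_lang`, `search_langs`, and `source_in_index_languages`.
[query_parser.py](/data/saas-search/query/query_parser.py#L41)
- Both translation paths you asked for are in place:
  - Source language not in the shop's `index_languages`: `translate_multi_async` + wait on the
    future
  - Source language in `index_languages`: `translate_multi(...,
    async_mode=True)`, using the cache where possible
[query_parser.py](/data/saas-search/query/query_parser.py#L284)

3. Unified ES text query strategy (no AST branch)
- Primary recall builds `field.{lang}` dynamically from `search_langs`; translated languages become lower-weight
  `should` clauses.
[es_query_builder.py](/data/saas-search/search/es_query_builder.py#L454)
- The boolean AST path has been removed; only the unified text strategy remains.
[es_query_builder.py](/data/saas-search/search/es_query_builder.py#L185)

4. LanguageDetector improvements
- Upgraded from "Latin letters default to English" to script-first detection plus
  scoring for Latin-script languages (dictionary/diacritics/suffixes).
[language_detector.py](/data/saas-search/query/language_detector.py#L68)

5. Boolean capability cleanup (supplement)
- Removed the deprecated module:
[boolean_parser.py](/data/saas-search/search/boolean_parser.py)
- `search/__init__` no longer exports anything related.
[search/__init__.py](/data/saas-search/search/__init__.py)

6. Retiring the obsolete `indexes` (supplement)
- The compatibility function now generates from dynamic fields and no longer depends on `config.indexes`.
[utils.py](/data/saas-search/config/utils.py#L24)
- The Admin config endpoint now returns the dynamic field configuration and no longer exposes `num_indexes`.
[admin.py](/data/saas-search/api/routes/admin.py#L52)

7. suggest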
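
Items 1 and 3 above (field base name plus dynamic language suffix, with translated languages as lower-weight clauses) can be illustrated with the sketch below. `query_text_by_lang` and `search_langs` are names from the message; the function names, the base-field boosts, and `translation_boost` are hypothetical, and the real builder in `es_query_builder.py` is likely structured differently.

```python
from typing import Any, Dict, List


def expand_fields(base_fields: Dict[str, float], lang: str) -> List[str]:
    """Turn base names such as 'title' into 'title.<lang>^<boost>'."""
    return [f"{name}.{lang}^{boost:g}" for name, boost in base_fields.items()]


def build_language_clauses(query_text_by_lang: Dict[str, str],
                           search_langs: List[str], source_lang: str,
                           base_fields: Dict[str, float],
                           translation_boost: float = 0.4) -> Dict[str, Any]:
    clauses: List[Dict[str, Any]] = []
    for lang in search_langs:
        text = query_text_by_lang.get(lang)
        if not text:
            continue  # translation missing or not ready: skip this language
        clauses.append({
            "multi_match": {
                "query": text,
                "fields": expand_fields(base_fields, lang),
                # Source-language clause at full weight, translations lower.
                "boost": 1.0 if lang == source_lang else translation_boost,
            }
        })
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


query = build_language_clauses(
    {"zh": "红色连衣裙", "en": "red dress"},
    search_langs=["zh", "en"], source_lang="zh",
    base_fields={"title": 3.0, "brief": 1.5},
)
```

Adding a language to a shop then only changes `search_langs`; no per-language field list has to be maintained in config.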

2026-03-10 16:06:31 +0800

24e92141 delete enable_multilang_search

tangwang
2026-03-10 13:12:56 +0800

26b910bd refactor service init and tighten multi-tenant search contracts

tangwang
2026-03-10 13:09:24 +0800

09 Mar, 2026

2 commits

07cf5a93 START_EMBEDDING=1 START_TRANSLATOR=1 START_RERANKER=1 START_TEI=1 ...
```
CNCLIP_DEVICE=cuda TEI_USE_GPU=1 ./scripts/service_ctl.sh start
```
Search backend + indexer + test frontend + 4 microservices all run end to end.

tangwang
2026-03-09 23:29:07 +0800

ed948666 tidy

tangwang
2026-03-09 17:04:00 +0800

07 Mar, 2026

2 commits

42e3aea6 tidy

tangwang
2026-03-07 19:44:25 +0800

d1d356f8 Script optimization

tangwang
2026-03-07 11:48:59 +0800

05 Feb, 2026

1 commit

ff32d894 rerank

tangwang
2026-02-05 16:13:46 +0800