ai-saas / saas-search

27 Mar, 2026

2 commits

daa2690b Funnel parameter tuning & presentation improvements

tangwang
2026-03-27 23:00:16 +0800

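8c8b9d84 ES fetches coarse_rank.input_window hits -> coarse ranking fuses text/knn scores and trims to
coarse_rank.output_window -> SKU selection and title suffix ->
fine ranking calls a lightweight reranker and trims to fine_rank.output_window -> final ranking calls the existing
reranker, with fine_score added to the final fusion. The reranker client/provider
also now selects a different service_url per service_profile, so fine and final ranking
can share the same service code and simply run as separate instances.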

2026-03-27 17:56:04 +0800
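The funnel in this commit can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: the `Hit` dataclass, `run_funnel`, the additive text/knn coarse fusion, and multiplying the final reranker score by `fine_score` are all hypothetical stand-ins. The commit only says that coarse ranking fuses text/knn scores and that fine_score enters the final fusion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    doc_id: str
    text_score: float
    knn_score: float
    fine_score: float = 0.0
    final_score: float = 0.0


def run_funnel(
    hits: List[Hit],
    coarse_output_window: int,
    fine_output_window: int,
    fine_rerank: Callable[[List[Hit]], List[float]],
    final_rerank: Callable[[List[Hit]], List[float]],
) -> List[Hit]:
    # `hits` is assumed to already be the coarse_rank.input_window hits from ES.
    # Coarse rank: hypothetical additive text/knn fusion, then trim.
    coarse = sorted(hits, key=lambda h: h.text_score + h.knn_score, reverse=True)
    coarse = coarse[:coarse_output_window]

    # SKU selection and the title suffix would run here (omitted).

    # Fine rank: a lightweight reranker scores the survivors, then trim.
    for hit, score in zip(coarse, fine_rerank(coarse)):
        hit.fine_score = score
    fine = sorted(coarse, key=lambda h: h.fine_score, reverse=True)
    fine = fine[:fine_output_window]

    # Final rank: the existing reranker; fine_score is folded into the
    # final fusion (hypothetical product of the two scores).
    for hit, score in zip(fine, final_rerank(fine)):
        hit.final_score = score * hit.fine_score
    return sorted(fine, key=lambda h: h.final_score, reverse=True)
```

Passing `fine_rerank` and `final_rerank` as plain callables mirrors the point of the commit: both stages can use one reranker service implementation and differ only in which instance (service_url) they call.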

25 Mar, 2026

1 commit

b712a831 Intent recognition strategy and performance optimization

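@config/dictionaries/style_intent_color.csv
@config/dictionaries/style_intent_size.csv @query/style_intent.py
@search/sku_intent_selector.py
1. Both CSV dictionaries have three columns:
- English keywords
- Chinese keywords
- Canonical attribute term
Each column may hold several comma-separated values. The new third column is matched against product attributes and holds the canonical English name.
2.
When detecting intent, Chinese keywords are matched against the Chinese translation of the query, falling back to the original
query if no Chinese translation exists; English keywords work the same way.
3. SKU selection matches against each SKU's attributes.
The matching rules are greatly simplified and optimized for performance:
1) Text matching only checks whether the normalized attribute value contains a term from the dictionary's third column ("canonical attribute term"); if it does, the match succeeds.
Stopping at the first successful match is enough. If nothing matches, vector matching is no longer attempted either.
Vector matching, bidirectional matching, and the other complex logic are dropped for now.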

2026-03-25 09:33:16 +0800
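A minimal sketch of the three-column dictionary and the first-match rule, assuming the CSV has no header row and that comma-separated values inside one column are quoted. `IntentTerm`, `load_dictionary`, `detect_terms`, and `select_sku` are hypothetical names, not the code in query/style_intent.py or search/sku_intent_selector.py. Matching Chinese keywords by substring and English keywords on word boundaries is also an assumption.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class IntentTerm:
    en: List[str]         # column 1: English keywords
    zh: List[str]         # column 2: Chinese keywords
    canonical: List[str]  # column 3: canonical attribute terms


def _split(cell: str) -> List[str]:
    # Each column may hold several comma-separated values.
    return [t.strip().lower() for t in cell.split(",") if t.strip()]


def load_dictionary(text: str) -> List[IntentTerm]:
    # Assumes no header row; quoted cells keep their inner commas.
    return [
        IntentTerm(_split(row[0]), _split(row[1]), _split(row[2]))
        for row in csv.reader(io.StringIO(text))
        if len(row) >= 3
    ]


def _pad_words(text: str) -> str:
    # Word-boundary helper for English matching: " navy blue jacket "
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def detect_terms(dictionary: Sequence[IntentTerm], original: str,
                 zh_query: Optional[str] = None,
                 en_query: Optional[str] = None) -> List[IntentTerm]:
    # Fall back to the original query when a translation is missing.
    zh_text = (zh_query or original).lower()
    en_text = _pad_words(en_query or original)
    return [
        term for term in dictionary
        if any(w in zh_text for w in term.zh)
        or any(_pad_words(w) in en_text for w in term.en)
    ]


def select_sku(option_values: Sequence[str],
               matched: Sequence[IntentTerm]) -> Optional[int]:
    canonical = [c for term in matched for c in term.canonical]
    for i, value in enumerate(option_values):
        normalized = value.strip().lower()
        if any(c in normalized for c in canonical):
            return i  # first text match wins
    return None  # no vector fallback
```

Returning on the first hit and skipping any vector step keeps the cost per SKU to a few substring checks, which is the point of the simplification.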

24 Mar, 2026

4 commits

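74fdf9bd 1. Added a filter/down-weighting dictionary: when an independent token in the query matches a configured trigger term, products containing certain tokens are filtered out (e.g. for fitted/修身, filter out products tagged 宽松 (loose), loose, relaxed, baggy, slouchy, etc.)
2. The reranker query now uses the translated query.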

2026-03-24 22:54:38 +0800
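The filter rule in item 1 can be sketched as follows, assuming both the query and each product are already split into tokens. `RULES`, `blocked_tokens`, and `filter_products` are hypothetical names, and the rule table only reuses the example terms from the commit message.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

LOOSE_FIT = {"宽松", "loose", "relaxed", "baggy", "slouchy"}

# Hypothetical rule table: trigger token -> tokens that exclude a product.
RULES: Dict[str, Set[str]] = {"fitted": LOOSE_FIT, "修身": LOOSE_FIT}


def blocked_tokens(query_tokens: Iterable[str],
                   rules: Dict[str, Set[str]]) -> Set[str]:
    # Collect the excluded tokens for every trigger present in the query.
    blocked: Set[str] = set()
    for token in query_tokens:
        blocked |= rules.get(token.lower(), set())
    return blocked


def filter_products(query_tokens: Iterable[str],
                    products: Sequence[Tuple[str, Sequence[str]]],
                    rules: Dict[str, Set[str]]) -> List[str]:
    """Return ids of products whose tokens avoid every blocked token."""
    blocked = blocked_tokens(query_tokens, rules)
    return [
        product_id
        for product_id, tokens in products
        if not blocked & {t.lower() for t in tokens}
    ]
```

Triggers are compared as whole tokens rather than substrings, following the "independent token" wording. A down-weighting variant would multiply the product's score by a penalty instead of dropping it.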

2efad04b Performance optimization for intent matching:

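The previous implementation's performance was completely unacceptable, so the strategy was simplified.

The style_sku_prepare_hits stage takes too long. Based on the requirements, work out how to optimize it and propose a performance optimization plan.
1.
Does _select_by_embedding have a cache? The set of option_value values is finite, so anything already computed should not be computed again. This covers not only the embedding similarity but the whole option_value match result: whether it is contained or not, and the similarity score. For example, if an earlier SKU had the attribute value "卡其色" (khaki) and its text match has already been computed, there is no need to redo text matching; if its vector similarity has already been computed, there is no need to fetch the vector and compute the similarity again.
2. Matching can be optimized:
Simplified matching flow:
1) Find the first text match; if one exists, the match succeeds immediately. Multiple matches need not be considered.
2) If nothing matches, fall back to embedding-based selection.

Matching rules:
option_name matching: simply check whether the normalized option_name is one of the generalized names for the intent dimension (e.g. 颜色, color, colour). When none matches, the current code apparently considers every dimension, which makes matching and comparison far too expensive; remove that logic. In that case, add no suffix and select no SKU.
option_value matching: if a query term that was hit during intent detection is contained in the attribute value, it counts as a match. If the attribute value is contained in the query (including translated text), it also counts as a match. This raises match coverage.

3.
This stage only needs to produce the SKU selection result (the selected SKU's id, or an empty value meaning no match was found, in which case no title suffix is appended to the rerank input). Don't replace image_url or pin the SKU to the top yet; do both at final fill time when a SKU has been selected.
Think through the design so that performance improves without adding complexity; refactor as appropriate to reduce the line count after the change.
@search/sku_intent_selector.py @query/style_intent.py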

2026-03-24 15:58:18 +0800
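One way to get the memoization from item 1 together with the option_name and option_value rules from item 2 is a small per-query matcher that caches the text-match verdict per normalized option_value. This is a sketch under assumptions: `OptionValueMatcher`, `select_sku_id`, `COLOR_NAMES`, and the SKU dict keys `sku_id`/`option1_value` are hypothetical, the embedding fallback is omitted (the same cache could hold similarity scores), and only the selected SKU id is returned, deferring the image swap and pinning as item 3 asks.

```python
from typing import Dict, Iterable, Optional, Sequence

# Hypothetical generalized names for the color dimension.
COLOR_NAMES = frozenset({"color", "colour", "颜色"})


def _norm(text: str) -> str:
    return text.strip().lower()


class OptionValueMatcher:
    """Per-query matcher that memoizes the text-match result of each option_value."""

    def __init__(self, intent_terms: Iterable[str], query_texts: Iterable[str]):
        self.terms = [_norm(t) for t in intent_terms if t.strip()]
        self.query_texts = [_norm(q) for q in query_texts if q.strip()]
        self.cache: Dict[str, bool] = {}
        self.evaluated = 0  # distinct values actually evaluated

    def matches(self, value: str) -> bool:
        key = _norm(value)
        if key not in self.cache:
            self.evaluated += 1
            # Either a hit intent term is contained in the value, or the
            # value is contained in a query text (including translations).
            self.cache[key] = bool(key) and (
                any(t in key for t in self.terms)
                or any(key in q for q in self.query_texts)
            )
        return self.cache[key]


def select_sku_id(option_name: str, skus: Sequence[dict],
                  matcher: OptionValueMatcher,
                  dimension_names: frozenset = COLOR_NAMES) -> Optional[str]:
    # Only an option_name that is a known alias of the intent dimension is
    # considered; otherwise: no title suffix and no SKU selection.
    if _norm(option_name) not in dimension_names:
        return None
    for sku in skus:
        if matcher.matches(sku.get("option1_value", "")):
            return sku["sku_id"]  # first text match wins
    return None
```

Because the matcher lives for one query, a value such as "卡其色" that appears in many SPUs is evaluated once and then served from the cache.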

814e352b Make the multiplicative formula configurable

tangwang
2026-03-24 12:44:11 +0800

581dafae Debug tool: show the intermediate scoring steps for each result

The backend now exposes a structured debug_info that is much closer to
the real ranking pipeline:

- query_analysis now includes index_languages, query_tokens, a query-vector summary, the translation/enrichment plan, and translation debug.
- query_build now explains the ES recall plan: base-language clause, translated clauses, filters vs post-filters, KNN settings, function-score config, and related inputs.
- es_request distinguishes the logical DSL from the actual body sent to ES, including the rerank prefetch _source.
- es_response now includes the initial ES ranking-window stats used for score interpretation.
- rerank now includes execution state, templates, the rendered rerank query text, window/top_n, service/meta, and the fusion formula.
- pagination now shows the rerank-window fetch vs the requested page, plus page-fill details.

For each result in debug_info.per_result, ranking debug is now much
richer:

- initial rank and final rank
- raw ES score
- es_score_normalized = raw score / max score in the initial ES window
- es_score_norm = min-max normalization over the initial ES window
- explicit normalization notes explaining that fusion does not directly consume an ES-normalized score
- rerank input details: doc template, title suffix, template field values, doc preview/length
- fusion breakdown: rerank_factor, text_factor, knn_factor, constants, raw inputs, final fused score
- text subcomponents: source/translation/weighted/primary/support/fallback evidence via matched_queries
- richer style-intent SKU debug, including the selected SKU summary and intent texts

2026-03-24 11:30:35 +0800
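The two per-result ES normalizations are easy to confuse, so here is a minimal sketch of each, assuming `scores` holds the raw ES scores of the initial window and is non-empty. The function names follow the debug field names; returning 1.0 for a flat window is a hypothetical choice, since the commit does not say how that case is handled. As the debug notes say, neither value feeds the fusion directly.

```python
from typing import List


def es_score_normalized(scores: List[float]) -> List[float]:
    """Raw ES score divided by the max score in the initial ES window."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def es_score_norm(scores: List[float]) -> List[float]:
    """Min-max normalization over the initial ES window."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)  # assumption for a flat window
    return [(s - lo) / (hi - lo) for s in scores]
```

The max-ratio keeps the window's lowest hit above zero, while min-max always maps the lowest hit to 0.0, so the two can disagree sharply for the tail of the window.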

23 Mar, 2026

1 commit

cda1cd62 Intent analysis & application baseline

tangwang
2026-03-23 22:35:20 +0800

22 Mar, 2026

1 commit

ef5baa86 Mixed-language handling

tangwang
2026-03-22 14:16:39 +0800

20 Mar, 2026

2 commits

a7cc9078 SKU sorting

tangwang
2026-03-20 17:02:19 +0800

deccd68a Added the SKU pre-selection step in search/searcher.py right before ResultFormatter.format_search_results() runs.

What changed:

- For each final paginated SPU hit, the searcher now scans skus[].option1_value against the query text set built from the original query, normalized query, rewritten query, and translations.
- If no option1_value matches textually, it falls back to embedding similarity and picks the SKU with the highest inner product against the query embedding.
- The matched SKU is promoted to the front of the SPU's skus list.
- The SPU-level image_url is replaced with the matched SKU's image_src.
I left api/result_formatter.py unchanged because it already preserves
the SKU order and reads image_url from _source; updating the page hits
in searcher makes the formatter return the desired result automatically.

Verification:

- ReadLints on the edited files: no errors
- Passed targeted tests:
  pytest tests/test_search_rerank_window.py -k "translated_query or no_direct_option_match"

2026-03-20 16:31:46 +0800
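The pre-selection step can be sketched on a single hit `_source` dict as below. Assumptions, all hypothetical: `preselect_sku` and `_dot` are illustrative names; a text match means the option1_value is contained in one of the query texts; and `sku_vecs` are precomputed embeddings aligned with `skus`, since the commit does not say where the SKU-side vectors come from.

```python
from typing import List, Optional, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def preselect_sku(source: dict, query_texts: Sequence[str],
                  query_vec: Optional[Sequence[float]] = None,
                  sku_vecs: Optional[List[Sequence[float]]] = None) -> dict:
    """Promote the best-matching SKU and swap in its image (mutates source)."""
    skus = source.get("skus") or []
    if not skus:
        return source
    texts = [t.lower() for t in query_texts if t]

    # 1) Text match: option1_value contained in any query text (assumed direction).
    idx = next(
        (i for i, sku in enumerate(skus)
         if sku.get("option1_value")
         and any(sku["option1_value"].lower() in t for t in texts)),
        None,
    )
    # 2) Fallback: highest inner product against the query embedding.
    if idx is None and query_vec is not None and sku_vecs:
        candidates = range(min(len(skus), len(sku_vecs)))
        idx = max(candidates, key=lambda i: _dot(sku_vecs[i], query_vec))
    if idx is None:
        return source

    # Promote the chosen SKU and replace the SPU-level image_url.
    chosen = skus[idx]
    source["skus"] = [chosen] + skus[:idx] + skus[idx + 1:]
    if chosen.get("image_src"):
        source["image_url"] = chosen["image_src"]
    return source
```

Editing `_source` in place is what lets the formatter stay unchanged, as the commit notes: it already keeps the SKU order and reads image_url from `_source`.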

18 Mar, 2026

1 commit

a47416ec Changed the fusion logic to a multiplicative formula and completed the return path for ES named-clause scores.

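The core change is in rerank_client.py (line 99): fuse_scores_and_resort now computes a smoothed multiplicative formula,
rerank * knn * text, preferring the base_query and knn_query scores from hit["matched_queries"],
and writes _text_score / _knn_score back into the debug fields. To give KNN a name as well, I added name:
"knn_query" to the top-level knn; see es_query_builder.py (line 273). At search time, include_named_queries_score
is enabled within the rerank window, and track_scores is added when an explicit sort is used;
see searcher.py (line 400) and es_client.py (line 224).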

2026-03-18 10:24:05 +0800
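A sketch of the multiplicative fusion, assuming each factor is smoothed as (score + bias) ** exponent. That smoothing form and every constant in `PARAMS` are hypothetical, since the commit does not give them, and `fuse_and_resort`/`named_score` are illustrative names rather than the real fuse_scores_and_resort. The dict-vs-list handling reflects the Elasticsearch `include_named_queries_score` behavior, where matched_queries becomes a name-to-score object; the 0.0 fallback is an assumption.

```python
from typing import Dict, List

# Hypothetical smoothing constants/exponents; the commit does not give values.
PARAMS: Dict[str, float] = {
    "rerank_bias": 0.1, "rerank_exp": 1.0,
    "text_bias": 0.1, "text_exp": 0.5,
    "knn_bias": 0.1, "knn_exp": 0.5,
}


def named_score(hit: dict, name: str) -> float:
    # With include_named_queries_score=true, ES returns matched_queries as a
    # {name: score} object; otherwise it is a list of names without scores.
    matched = hit.get("matched_queries")
    if isinstance(matched, dict):
        return float(matched.get(name, 0.0))
    return 0.0  # hypothetical fallback


def fuse_and_resort(hits: List[dict], rerank_scores: List[float],
                    params: Dict[str, float] = PARAMS) -> List[dict]:
    if len(hits) != len(rerank_scores):
        raise ValueError("one rerank score per hit is required")
    for hit, rerank in zip(hits, rerank_scores):
        text = named_score(hit, "base_query")
        knn = named_score(hit, "knn_query")
        # Debug fields written back onto the hit.
        hit["_text_score"] = text
        hit["_knn_score"] = knn
        hit["_fused_score"] = (
            (rerank + params["rerank_bias"]) ** params["rerank_exp"]
            * (text + params["text_bias"]) ** params["text_exp"]
            * (knn + params["knn_bias"]) ** params["knn_exp"]
        )
    return sorted(hits, key=lambda h: h["_fused_score"], reverse=True)
```

The bias terms keep a zero component (for example, a hit with no KNN match) from zeroing the whole product, and the exponents control how strongly each signal can move the final order.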

13 Mar, 2026

1 commit

77ab67ad Update test cases

tangwang
2026-03-13 12:39:40 +0800

12 Mar, 2026

2 commits

c51d254f Performance testing

tangwang
2026-03-12 10:28:43 +0800

5f7d7f09 性能测试报告.md (performance test report)

tangwang
2026-03-12 08:44:55 +0800