24 Mar, 2026
2 commits
-
The backend now exposes a structured debug_info that is much closer to the real ranking pipeline:
- query_analysis now includes index_languages, query_tokens, a query-vector summary, the translation/enrichment plan, and translation debug.
- query_build now explains the ES recall plan: base-language clause, translated clauses, filters vs post-filters, KNN settings, function-score config, and related inputs.
- es_request distinguishes the logical DSL from the actual body sent to ES, including rerank prefetch _source.
- es_response now includes the initial ES ranking-window stats used for score interpretation.
- rerank now includes execution state, templates, rendered rerank query text, window/top_n, service/meta, and the fusion formula.
- pagination now shows the rerank-window fetch vs the requested page, plus page-fill details.

For each result in debug_info.per_result, ranking debug is now much richer:
- initial rank and final rank
- raw ES score
- es_score_normalized = raw score / initial ES window max
- es_score_norm = min-max normalization over the initial ES window
- explicit normalization notes explaining that fusion does not directly consume an ES-normalized score
- rerank input details: doc template, title suffix, template field values, doc preview/length
- fusion breakdown: rerank_factor, text_factor, knn_factor, constants, raw inputs, final fused score
- text subcomponents: source/translation/weighted/primary/support/fallback evidence via matched_queries
- richer style-intent SKU debug, including selected SKU summary and intent texts
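As a quick illustration of the two per-result score normalizations listed above, here is a hedged sketch; the helper is illustrative, not the actual backend code, though the field semantics match the debug output described:

```python
def normalize_es_scores(window_scores):
    """Given the raw ES scores of the initial ranking window, return
    per-document (es_score_normalized, es_score_norm) pairs:
      es_score_normalized = raw score / window max
      es_score_norm       = min-max normalization over the window."""
    max_s = max(window_scores)
    min_s = min(window_scores)
    span = max_s - min_s
    out = []
    for s in window_scores:
        ratio = s / max_s if max_s else 0.0            # raw / window max
        minmax = (s - min_s) / span if span else 0.0   # min-max over window
        out.append((ratio, minmax))
    return out
```

Note that, per the normalization notes in the debug output, fusion does not directly consume either of these values; they exist for score interpretation.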
-
2. Some important stages were missing from the debug output, e.g. the "style-intent SKU pre-filter (StyleSkuSelector.prepare_hits)" stage; added that stage.
23 Mar, 2026
8 commits
-
combined_fields+best_field+phrase_boost
-
Each clause became a named bool query with the following structure: must: combined_fields; should: weighted best_fields and phrase clauses. The main changes are in search/es_query_builder.py, but the adjustment reuses the existing language-routing design and introduces no one-off branches. The extra should-clause weights are now configuration-driven via config/schema.py, config/loader.py, search/searcher.py, and config/config.yaml, keeping the structure centrally managed.
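A minimal sketch of the clause shape described above, assuming hypothetical field lists, boosts, and a clause-naming convention (this is not the actual es_query_builder code):

```python
def build_named_lang_clause(lang, query, fields, best_fields_boost, phrase_boost):
    """Build a named bool query: combined_fields as must, weighted
    best_fields and phrase multi_match clauses as should."""
    return {
        "bool": {
            "_name": f"text_{lang}",  # hypothetical clause name
            "must": [
                {"combined_fields": {"query": query, "fields": fields}},
            ],
            "should": [
                {"multi_match": {"query": query, "fields": fields,
                                 "type": "best_fields",
                                 "boost": best_fields_boost}},
                {"multi_match": {"query": query, "fields": fields,
                                 "type": "phrase",
                                 "boost": phrase_boost}},
            ],
        }
    }
```

Because the clause is named via `_name`, its contribution shows up in matched_queries, which is what the per-result text-subcomponent evidence relies on.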
22 Mar, 2026
4 commits
21 Mar, 2026
3 commits
20 Mar, 2026
13 commits
-
Problem
----------
With facebook/nllb-200-distilled-600M (CTranslate2 backend), passing ISO 639-1 or FLORES short tags through the API (e.g. ca, da, nl, sv, no, tr) triggered "Unsupported NLLB source/target language". The model and tokenizer actually support these languages; the root cause is that resolve_nllb_language_code relied solely on the dozen or so NLLB_LANGUAGE_CODES mappings in translation/languages.py, so many legitimate short codes were unregistered and validation falsely reported them as unsupported.

Changes
----------
1. Added translation/nllb_flores_short_map.py
   - NLLB_FLORES_SHORT_TO_CODE: short tags aligned with the HF model card's language list -> the NLLB forced-BOS/src_lang form (<ISO639-3>_<ISO15924>, e.g. cat_Latn).
   - NLLB_TOKENIZER_LANGUAGE_CODES: the full set of 202 language tokens extracted from tokenizer.json, used for normalized resolution when callers pass forms like deu_Latn directly.
   - Additional conventions: ISO 639-1 "no" maps to nob_Latn (Norwegian Bokmål); nb/nn map to nob_Latn / nno_Latn respectively; "ar" explicitly points to arb_Arab (consistent with NLLB).
2. Adjusted translation/languages.py
   - build_nllb_language_catalog: merge order is the full FLORES table -> NLLB_LANGUAGE_CODES (keeping a few explicit overrides such as zh->zho_Hans) -> caller overrides.
   - resolve_nllb_language_code: after the catalog and aliases, added case-insensitive matching against NLLB_TOKENIZER_LANGUAGE_CODES (e.g. eng_latn -> eng_Latn), covering the "caller already passed a complete NLLB code" case.
3. tests/test_translation_local_backends.py
   - Added test_nllb_resolves_flores_short_tags_and_iso_no, covering the short codes users care about plus deu_Latn pass-through resolution.

Rationale
----------
NLLB interface semantics follow the Hugging Face NllbTokenizer: language identifiers are FLORES-200-style three-letter language codes plus an underscore and a four-letter script subtag (ISO 15924). The business side commonly uses ISO 639-1 (de, sv) or the model card's short list (ca, nl), so the service must map these uniformly to tokenizer special tokens. This implementation generates static tables from the model card's language field plus the tokenizer vocabulary as the single source of truth, avoiding extra runtime library dependencies; the original NLLB_LANGUAGE_CODES is kept as a thin override layer for compatibility with existing config and tests.

Refs: https://huggingface.co/facebook/nllb-200-distilled-600M
Made-with: Cursor
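The NLLB resolution order described above can be sketched with a tiny hypothetical subset of the real tables (the actual nllb_flores_short_map.py holds the aligned short-tag map and the full 202-token set):

```python
# Hypothetical subset for illustration only.
NLLB_FLORES_SHORT_TO_CODE = {
    "ca": "cat_Latn", "da": "dan_Latn", "nl": "nld_Latn",
    "sv": "swe_Latn", "tr": "tur_Latn",
    "no": "nob_Latn", "nb": "nob_Latn", "nn": "nno_Latn",
    "ar": "arb_Arab",
}
NLLB_TOKENIZER_LANGUAGE_CODES = {"eng_Latn", "deu_Latn", "cat_Latn", "nob_Latn"}
_TOKENIZER_BY_LOWER = {c.lower(): c for c in NLLB_TOKENIZER_LANGUAGE_CODES}


def resolve_nllb_language_code(tag):
    """Resolve a short tag or full NLLB code to the tokenizer's form."""
    short = NLLB_FLORES_SHORT_TO_CODE.get(tag.lower())
    if short:
        return short
    # Case-insensitive match against the tokenizer token set, covering
    # callers that already pass a complete NLLB code (eng_latn -> eng_Latn).
    full = _TOKENIZER_BY_LOWER.get(tag.lower())
    if full:
        return full
    raise ValueError(f"Unsupported NLLB source/target language: {tag}")
```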
The SKU promotion happens in the searcher, before ResultFormatter.format_search_results() runs.

What changed: for each final paginated SPU hit, the searcher now scans skus[].option1_value against the query text set built from the original query, normalized query, rewritten query, and translations. If no option1_value matches textually, it falls back to embedding similarity and picks the SKU with the highest inner product against the query embedding. The matched SKU is promoted to the front of the SPU's skus list, and the SPU-level image_url is replaced with that matched SKU's image_src.

I left api/result_formatter.py unchanged because it already preserves the SKU order and reads image_url from _source; updating the page hits in the searcher makes the formatter return the desired result automatically.

Verification:
- ReadLints on the edited files: no errors
- Passed targeted tests: pytest tests/test_search_rerank_window.py -k "translated_query or no_direct_option_match"
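A hedged sketch of the promotion logic described above; the helper name, hit shape, and embed_sku callable are illustrative, not the real searcher code:

```python
def promote_matched_sku(hit, query_texts, query_vec, embed_sku):
    """Move the best-matching SKU to the front of the SPU's skus list
    and swap the SPU-level image_url for that SKU's image_src.

    query_texts: lowercase set built from the original / normalized /
    rewritten / translated queries.  embed_sku: callable returning the
    SKU embedding (hypothetical hook for the fallback path)."""
    skus = hit["_source"].get("skus", [])
    if not skus:
        return hit
    # 1) Textual match on option1_value against the query text set.
    match = next((s for s in skus
                  if s.get("option1_value", "").lower() in query_texts), None)
    # 2) Fallback: highest inner product against the query embedding.
    if match is None:
        match = max(skus, key=lambda s: sum(a * b for a, b in
                                            zip(query_vec, embed_sku(s))))
    skus.remove(match)
    skus.insert(0, match)  # promote to the front
    hit["_source"]["image_url"] = match.get(
        "image_src", hit["_source"].get("image_url"))
    return hit
```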
-
## Background
With multilingual indexes, user queries often mix Chinese and English; the parsing stage needs to explicitly flag script types, and the BM25 clauses need to cover the corresponding language fields at the same time.

## Approach
### 1. Query analysis (query_parser.ParsedQuery)
- Added `contains_chinese`: the query text contains CJK (reuses _contains_cjk).
- Added `contains_english`: the token list contains a "pure English, len>=3" token (fullmatch on letters plus an optional hyphen).
- Written into to_dict and the request-context intermediate results, for debugging and API exposure.

### 2. ES text recall (es_query_builder._build_advanced_text_query)
- For each search_lang clause: if the query contains English and the clause language is not en (and the tenant's index_languages include en), merge in the en column fields; if it contains Chinese and the clause language is not zh (and zh is included), merge in the zh column fields.
- The merged-in fields' boost is multiplied by `mixed_script_merged_field_boost_scale` (default 0.8, adjustable via the ESQueryBuilder constructor arguments).
- The fallback_original_query_* branches apply the same logic.

### 3. Implementation cleanup
- Introduced `MatchFieldSpec = (field_path, boost)`: `_build_match_field_specs` is the single source of weights; `_merge_supplemental_lang_field_specs` / `_expand_match_field_specs_for_mixed_script` merge and scale on tuples; finally `_format_match_field_specs` formats into ES `path^boost`, instead of building strings first and re-parsing them.

## Tests
- tests/test_query_parser_mixed_language.py: script flags and token rules.
- tests/test_es_query_builder.py: merged fields, the 0.8 scaling, the index_languages restriction.

Made-with: Cursor
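The tuple-based merging and scaling in section 3 above could look roughly like this; function and constant names beyond those mentioned in the change are illustrative, and the real builder handles more cases:

```python
MIXED_SCRIPT_SCALE = 0.8  # mixed_script_merged_field_boost_scale default


def expand_specs_for_mixed_script(base_specs, supplemental_specs,
                                  scale=MIXED_SCRIPT_SCALE):
    """Merge supplemental-language (path, boost) tuples into the base
    specs, scaling the merged-in boosts; existing paths win."""
    merged = list(base_specs)
    existing = {path for path, _ in merged}
    for path, boost in supplemental_specs:
        if path not in existing:
            merged.append((path, boost * scale))
    return merged


def format_match_field_specs(specs):
    """Format (path, boost) tuples into ES 'path^boost' strings last,
    avoiding the build-string-then-reparse pattern."""
    return [f"{path}^{boost:g}" for path, boost in specs]
```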
19 Mar, 2026
10 commits
-
- Text and image embedding are now split into separate services/processes, while still keeping a single replica as requested. The split lives in [embeddings/server.py](/data/saas-search/embeddings/server.py#L112), [config/services_config.py](/data/saas-search/config/services_config.py#L68), [providers/embedding.py](/data/saas-search/providers/embedding.py#L27), and the start scripts [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36), [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh), [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh). - Independent admission control is in place now: text and image have separate inflight limits, and image can be kept much stricter than text. The request handling, reject path, `/health`, and `/ready` are in [embeddings/server.py](/data/saas-search/embeddings/server.py#L613), [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028). - I checked the Redis embedding cache. It did exist, but there was a real flaw: cache keys did not distinguish `normalize=true` from `normalize=false`. I fixed that in [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6), and both text and image now use the same normalize-aware keying. I also added service-side BF16 cache hits that short-circuit before the model lane, so repeated requests no longer get throttled behind image inference. **What This Means** - Image pressure no longer blocks text, because they are on different ports/processes. - Repeated text/image requests now return from Redis without consuming model capacity. - Over-capacity requests are rejected quickly instead of sitting blocked. - I did not add a load balancer or multi-replica HA, per your GPU constraint. 
I also did not build Grafana/Prometheus dashboards in this pass, but `/health` now exposes the metrics needed to wire them. **Validation** - Tests passed: `.venv/bin/python -m pytest -q tests/test_embedding_pipeline.py tests/test_embedding_service_limits.py` -> `10 passed` - Stress test tool updates are in [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155) - Fresh benchmark on split text service `6105`: 535 requests / 3s, 100% success, `174.56 rps`, avg `88.48 ms` - Fresh benchmark on split image service `6108`: 1213 requests / 3s, 100% success, `403.32 rps`, avg `9.64 ms` - Live health after the run showed cache hits and non-zero cache-hit latency accounting: - text `avg_latency_ms=4.251` - image `avg_latency_ms=1.462`
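A minimal sketch of the normalize-aware cache keying fixed above; the real embeddings/cache_keys.py layout may differ, but the point is that the normalize flag is part of the key so normalized and raw vectors never collide:

```python
import hashlib


def embedding_cache_key(kind, model, payload, normalize):
    """Build a Redis cache key for an embedding request.

    kind: 'text' or 'image'; payload: the text string or raw image bytes.
    The normalize flag is encoded into the key (illustrative layout)."""
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    digest = hashlib.sha256(data).hexdigest()
    return f"emb:{kind}:{model}:norm={int(bool(normalize))}:{digest}"
```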
-
The instability is very likely real overload, but `lsof -i :6005 | wc -l = 75` alone does not prove it. What does matter is the live shape of the service: it is a single `uvicorn` worker on port `6005`, and the code had one shared process handling both text and image requests, with image work serialized behind a single lock. Under bursty image traffic, requests could pile up and sit blocked with almost no useful tracing, which matches the "only blocking observed" symptom.

The service now adds persistent log files, request IDs, per-request request/response/failure logs, text microbatch dispatch logs, health stats with active/rejected counts, and explicit overload admission control. New knobs are `TEXT_MAX_INFLIGHT`, `IMAGE_MAX_INFLIGHT`, and `EMBEDDING_OVERLOAD_STATUS_CODE`. Startup output now shows those limits and log paths in [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L80). I also added focused tests in [tests/test_embedding_service_limits.py](/data/saas-search/tests/test_embedding_service_limits.py#L1). What this means operationally: - Text and image are still in one process, so this is not the final architecture. - But image spikes will now be rejected quickly once the image lane is full, instead of sitting around and consuming the worker pool. - Logs will now show each request, each rejection, each microbatch dispatch, backend time, response time, and request ID. Verification: - Passed: `.venv/bin/python -m pytest -q tests/test_embedding_service_limits.py` - I also ran a wider test command, but 3 failures came from pre-existing drift in [tests/test_embedding_pipeline.py](/data/saas-search/tests/test_embedding_pipeline.py#L95), where the tests still monkeypatch `embeddings.text_encoder.redis.Redis` even though [embeddings/text_encoder.py](/data/saas-search/embeddings/text_encoder.py#L1) no longer imports `redis` that way. 
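The reject-fast admission control could be sketched like this, assuming an asyncio service; the knob names mirror the new environment variables, but the `Lane` class itself is illustrative, not the actual server code:

```python
import asyncio

TEXT_MAX_INFLIGHT = 32            # TEXT_MAX_INFLIGHT (illustrative default)
IMAGE_MAX_INFLIGHT = 4            # IMAGE_MAX_INFLIGHT (illustrative default)
EMBEDDING_OVERLOAD_STATUS_CODE = 429


class OverloadError(Exception):
    status_code = EMBEDDING_OVERLOAD_STATUS_CODE


class Lane:
    """Per-modality admission control: separate inflight limits for
    text and image, rejecting immediately when the lane is full."""

    def __init__(self, max_inflight):
        self._sem = asyncio.Semaphore(max_inflight)
        self.rejected = 0

    async def run(self, coro_fn):
        # Reject fast instead of queueing behind the model lane.
        if self._sem.locked():
            self.rejected += 1
            raise OverloadError()
        async with self._sem:
            return await coro_fn()
```

With separate `Lane(TEXT_MAX_INFLIGHT)` and `Lane(IMAGE_MAX_INFLIGHT)` instances, an image spike fills only the image lane, and the rejected counts feed the health stats.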
Switched CLIP_AS_SERVICE's default model to ViT-L-14 and consolidated this configuration into a single changeable entry point. The default now lives in CLIP_AS_SERVICE_MODEL_NAME in embeddings/config.py (line 29), currently CN-CLIP/ViT-L-14; scripts/start_cnclip_service.sh (line 37) automatically reads this configuration instead of hard-coding the default model in the script, while still supporting temporary overrides via CNCLIP_MODEL_NAME and --model-name. scripts/start_embedding_service.sh (line 29) and embeddings/server.py (line 425) now also print model info, making it easier to check which configuration is actually connected. Docs were updated too, mainly docs/CNCLIP_SERVICE说明文档.md (line 62) and embeddings/README.md (line 58): they now describe a "config is authoritative, overridable" mechanism rather than a hard-coded model name; the related summary docs and internal notes were likewise changed to configuration-driven wording.
-
Inference is now batched over segments, rather than chunked by the original number of inputs. That is, if 100 input texts become 150 segments after sentence splitting, then with batch_size=64 inference runs as three batches of 64 + 64 + 22, after which results are merged back per the original split plan and restored to 100 outputs. This change is in local_seq2seq.py (line 241) and local_ctranslate2.py (line 391).

Logging also gained the two layers of key information you asked for:
- Segmentation summary log: "Translation segmentation summary" prints the input count, non-empty count, number of inputs that were split, total segment count, current batch_size, and per-input segment-count statistics; see local_seq2seq.py (line 216) and local_ctranslate2.py (line 366).
- Per-inference-batch log: "Translation inference batch" prints the batch index, total batch count, segment count in the batch, length statistics, and a preview of the first item. CTranslate2 additionally prints "Translation model batch detail" with token lengths and max_decoding_length; see local_ctranslate2.py (line 294).

I also added tests covering "batching after splitting" and "logs contain the segmentation summary and per-batch inference logs", in test_translation_local_backends.py (line 358).
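The segment-level batching can be sketched as follows; the function is illustrative (it assumes a split_fn, a translate_batch callable, and a joiner), not the real backend code:

```python
def translate_batched(texts, split_fn, translate_batch, batch_size, joiner=" "):
    """Split first, batch over all segments, then merge back per the
    original split plan (e.g. 150 segments at batch_size=64 -> 64+64+22)."""
    plan = [split_fn(t) for t in texts]            # segments per input
    segments = [s for segs in plan for s in segs]
    outputs = []
    for i in range(0, len(segments), batch_size):  # batch over segments
        outputs.extend(translate_batch(segments[i:i + batch_size]))
    # Re-assemble: each input gets back exactly as many segments as it produced.
    results, pos = [], 0
    for segs in plan:
        results.append(joiner.join(outputs[pos:pos + len(segs)]))
        pos += len(segs)
    return results
```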
-
Changes:
- New sentence-splitting and budget utilities: translation/text_splitter.py
- Wired into the HF local backend: translation/backends/local_seq2seq.py (line 157)
- Wired into the CT2 local backend: translation/backends/local_ctranslate2.py (line 301)
- Added tests: tests/test_translation_local_backends.py

I first went through the actual limits in the code; the key configuration is in config/config.yaml (line 133):
- nllb-200-distilled-600m: max_input_length=256, max_new_tokens=64, with ct2_decoding_length_mode=source + extra=8. The conservative input budget computed from this configuration is 56 tokens.
- opus-mt-zh-en: max_input_length=256, max_new_tokens=256. Conservative input budget is 248 tokens.
- opus-mt-en-zh: same as above, also 248 tokens.

The splitting strategy in this version:
- First split on strong boundaries: 。!?!?;;…, newlines, and the English period
- If still too long, split on weak boundaries: ,,、::()()[]【】/|
- If still too long, split on whitespace
- Only as a last resort, hard-cut within the token budget
- Over-long text is "split, translated, then re-joined": Chinese target languages are joined without spaces by default, English and similar languages with spaces, to avoid over-fragmenting

Verification: python3 -m compileall translation and tests/test_translation_local_backends.py passed
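A simplified sketch of the tiered splitting strategy, assuming character-level token counting by default; the real translation/text_splitter.py also enforces tokenizer-based budgets and the re-joining rules, and splits recursively per segment:

```python
import re

STRONG = r"[。!?!?;;…\n.]"                 # strong sentence boundaries
WEAK = r"[,,、::()()\[\]【】/|]"           # weak boundaries


def split_for_budget(text, budget, count_tokens=len):
    """Split text so each segment fits the token budget, trying strong
    boundaries, then weak, then whitespace, then a hard cut."""
    def fits(seg):
        return count_tokens(seg) <= budget

    if fits(text):
        return [text]
    for pattern in (STRONG, WEAK, r"\s+"):
        parts = [p for p in re.split(pattern, text) if p]
        if len(parts) > 1 and all(fits(p) for p in parts):
            return parts
    # Last resort: hard cut at the budget (character-based in this sketch).
    return [text[i:i + budget] for i in range(0, len(text), budget)]
```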
-
Adopted the optimal T4 configuration: ct2_inter_threads=2, ct2_max_queued_batches=16, ct2_batch_type=examples. This setting gives NLLB significantly better online-style performance while roughly preserving large-batch throughput. I did not apply the same configuration to the two Marian models, because the focused report showed complicated trade-offs: opus-mt-zh-en is better balanced under the conservative defaults, while opus-mt-en-zh gained throughput but showed volatile tail latency at c=8.

I also recorded the deployment/configuration lessons in /data/saas-search/translation/README.md and marked the optimization results in /data/saas-search/docs/TODO.txt. The key practical takeaways are now documented: use CT2 + float16, keep a single worker, set NLLB's inter_threads to 2 and max_queued_batches to 16, avoid inter_threads=4 on T4 (it hurts high-batch throughput), and keep the Marian models' default configuration conservative unless online/offline configurations are separated.