ai-saas / saas-search

14 Apr, 2026

1 commit

f07947a5 Improve portability and harden public frontend search Browse File »

tangwang
2026-04-14 20:38:56 +0800

22 Mar, 2026

1 commit

8140e942 translator model priority Browse File »

tangwang
2026-03-22 22:30:14 +0800

19 Mar, 2026

3 commits

86d8358b config optimize Browse File »

tangwang
2026-03-19 23:04:11 +0800

14e67b71 分句后的 batching 现在是“先全量分句，再按 segment 总数按模型 batch_size ... Browse File »

推理”，不再是先按原始输入条数切块。也就是说，如果 100 条请求分句后变成
150 个 segments，batch_size=64 时会按 64 + 64 + 22
三批推理，推理完再按原始分句计划合并并还原成 100 条返回。这个改动在
local_seq2seq.py (line 241) 和 local_ctranslate2.py (line 391)。

日志这边也补上了两层你要的关键信息：

分句摘要日志：Translation segmentation
summary，会打印输入条数、非空条数、发生分句的输入数、总 segments
数、当前 batch_size、每条输入分成多少段的统计，见 local_seq2seq.py (line
216) 和 local_ctranslate2.py (line 366)。
每个预测批次日志：Translation inference
batch，会打印第几批、总批数、该批 segment
数、长度统计、首条预览。CTranslate2 另外还会打印 Translation model batch
detail，补充 token 长度和 max_decoding_length，见 local_ctranslate2.py
(line 294)。
我也补了测试，覆盖了“分句后再
batching”和“日志中有分句摘要与每批推理日志”，在
test_translation_local_backends.py (line 358)。

2026-03-19 10:54:30 +0800

46ce858d 在NLLB模型的 /data/saas-search/config/config.yaml#L133 ... Browse File »

中采用了最优T4配置：ct2_inter_threads=2、ct2_max_queued_batches=16、ct2_batch_type=examples。该设置使NLLB获得了显著更优的在线式性能，同时大致保持了大批次吞吐量不变。我没有将相同配置应用于两个Marian模型，因为聚焦式报告显示了复杂的权衡：opus-mt-zh-en
在保守默认配置下更为均衡，而 opus-mt-en-zh 虽然获得了吞吐量提升，但在
c=8 时尾延迟波动较大。
我还将部署/配置经验记录在 /data/saas-search/translation/README.md
中，并在 /data/saas-search/docs/TODO.txt
中标记了优化结果。关键实践要点现已记录如下：使用CT2 +
float16，保持单worker，将NLLB的 inter_threads 设为2、max_queued_batches
设为16，在T4上避免使用
inter_threads=4（因为这会损害高批次吞吐量），除非区分在线/离线配置，否则保持Marian模型的默认配置保守。

2026-03-19 07:45:15 +0800

18 Mar, 2026

2 commits

ea293660 CTranslate2 ... Browse File »

Implemented CTranslate2 for the three local translation models and
switched the existing local_nllb / local_marian factories over to it.
The new runtime lives in local_ctranslate2.py, including HF->CT2
auto-conversion, float16 compute type mapping, Marian direction
handling, and NLLB target-prefix decoding. The service wiring is in
service.py (line 113), and the three model configs now point at explicit
ctranslate2-float16 dirs in config.yaml (line 133).

I also updated the setup path so this is usable end-to-end:
ctranslate2>=4.7.0 was added to requirements_translator_service.txt and
requirements.txt, the download script now supports pre-conversion in
download_translation_models.py (line 27), and the docs/config examples
were refreshed in translation/README.md. I installed ctranslate2 into
.venv-translator, pre-converted all three models, and the CT2 artifacts
are now already on disk:

models/translation/facebook/nllb-200-distilled-600M/ctranslate2-float16
models/translation/Helsinki-NLP/opus-mt-zh-en/ctranslate2-float16
models/translation/Helsinki-NLP/opus-mt-en-zh/ctranslate2-float16
Verification was solid. python3 -m compileall passed, direct
TranslationService smoke tests ran successfully in .venv-translator, and
the focused NLLB benchmark on the local GPU showed a clear win:

batch_size=16: HF 0.347s/batch, 46.1 items/s vs CT2 0.130s/batch, 123.0
items/s
batch_size=1: HF 0.396s/request vs CT2 0.126s/request
One caveat: translation quality on some very short phrases, especially
opus-mt-en-zh, still looks a bit rough in smoke tests, so I’d run your
real quality set before fully cutting over. If you want, I can take the
next step and update the benchmark script/report so you have a fresh
full CT2 performance report for all three models.

2026-03-18 23:15:46 +0800

cd4ce66d trans logs Browse File »

tangwang
2026-03-18 20:32:37 +0800

17 Mar, 2026

3 commits

3eff49b7 trans nllb-200-distilled-600M性能提升 Browse File »

tangwang
2026-03-17 21:29:18 +0800
0fd2f875 translate Browse File »

tangwang
2026-03-17 19:21:34 +0800

5e4dc8e4 翻译架构按“一个翻译服务 + ... Browse File »

多个独立翻译能力”重构。现在业务侧不再把翻译当 provider
选型，QueryParser 和 indexer 统一通过 6006 的 translator service client
调用；真正的能力选择、启用开关、model + scene 路由，都收口到服务端和新的
translation/ 目录里了。

这次的核心改动在
config/services_config.py、providers/translation.py、api/translator_app.py、config/config.yaml
和新的 translation/service.py。配置从旧的
services.translation.provider/providers 改成了 service_url +
default_model + default_scene + capabilities，每个能力可独立
enabled；服务端新增了统一的 backend 管理与懒加载，真实实现集中到
translation/backends/qwen_mt.py、translation/backends/llm.py、translation/backends/deepl.py，旧的
query/qwen_mt_translate.py、query/llm_translate.py、query/deepl_provider.py
只保留兼容导出。接口上，/translate 现在标准支持 scene，context
作为兼容别名继续可用，健康检查会返回默认模型、默认场景和已启用能力。

2026-03-17 15:50:53 +0800