27 Mar, 2026
1 commit
-
The flow is now: coarse_rank.output_window -> SKU selection and title suffix -> the fine-rank stage calls a lightweight reranker to trim down to fine_rank.output_window -> the final rerank calls the existing reranker, with fine_score added into the final fusion. The reranker client/provider was also changed to select a different service_url per service_profile, so fine and final can share the same service code while running as separate instances.
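A minimal sketch of the per-profile service-URL selection described above; the profile names, ports, and config shape here are assumptions for illustration, not the actual client code.

```python
from dataclasses import dataclass

# Hypothetical config shape: one reranker service entry per service_profile.
RERANKER_SERVICES = {
    "fine": "http://localhost:6201",   # lightweight reranker instance (assumed port)
    "final": "http://localhost:6202",  # existing reranker instance (assumed port)
}

@dataclass
class RerankerClient:
    service_profile: str

    @property
    def service_url(self) -> str:
        # Same client/provider code for both stages; only the target
        # instance differs per profile.
        try:
            return RERANKER_SERVICES[self.service_profile]
        except KeyError:
            raise ValueError(f"unknown service_profile: {self.service_profile!r}")
```

With this shape, `RerankerClient("fine")` and `RerankerClient("final")` hit different instances of the same service code.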
19 Mar, 2026
1 commit
-
- Text and image embedding are now split into separate services/processes, while still keeping a single replica as requested. The split lives in [embeddings/server.py](/data/saas-search/embeddings/server.py#L112), [config/services_config.py](/data/saas-search/config/services_config.py#L68), [providers/embedding.py](/data/saas-search/providers/embedding.py#L27), and the start scripts [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36), [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh), [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh).
- Independent admission control is in place now: text and image have separate inflight limits, and image can be kept much stricter than text. The request handling, reject path, `/health`, and `/ready` are in [embeddings/server.py](/data/saas-search/embeddings/server.py#L613), [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028).
- I checked the Redis embedding cache. It did exist, but there was a real flaw: cache keys did not distinguish `normalize=true` from `normalize=false`. I fixed that in [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6), and both text and image now use the same normalize-aware keying. I also added service-side BF16 cache hits that short-circuit before the model lane, so repeated requests no longer get throttled behind image inference.

**What This Means**

- Image pressure no longer blocks text, because they are on different ports/processes.
- Repeated text/image requests now return from Redis without consuming model capacity.
- Over-capacity requests are rejected quickly instead of sitting blocked.
- I did not add a load balancer or multi-replica HA, per your GPU constraint.
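The normalize-aware keying fix above could be sketched like this; the key layout and field names are assumptions, not the actual `embeddings/cache_keys.py` code.

```python
import hashlib

def embedding_cache_key(modality: str, model: str, payload: bytes, normalize: bool) -> str:
    """Build a Redis cache key for an embedding request.

    The normalize flag is part of the key, so normalize=true and
    normalize=false can never collide on the same entry (the flaw fixed above).
    """
    digest = hashlib.sha256(payload).hexdigest()
    return f"emb:{modality}:{model}:norm={int(normalize)}:{digest}"

# Same text, different normalize setting -> different keys.
k1 = embedding_cache_key("text", "bge-m3", b"red dress", normalize=True)
k2 = embedding_cache_key("text", "bge-m3", b"red dress", normalize=False)
assert k1 != k2
```

Hashing the payload keeps keys bounded in size regardless of input length; the `emb:` prefix is an assumed namespacing convention.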
I also did not build Grafana/Prometheus dashboards in this pass, but `/health` now exposes the metrics needed to wire them.

**Validation**

- Tests passed: `.venv/bin/python -m pytest -q tests/test_embedding_pipeline.py tests/test_embedding_service_limits.py` -> `10 passed`
- Stress test tool updates are in [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155)
- Fresh benchmark on split text service `6105`: 535 requests / 3s, 100% success, `174.56 rps`, avg `88.48 ms`
- Fresh benchmark on split image service `6108`: 1213 requests / 3s, 100% success, `403.32 rps`, avg `9.64 ms`
- Live health after the run showed cache hits and non-zero cache-hit latency accounting:
  - text `avg_latency_ms=4.251`
  - image `avg_latency_ms=1.462`
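The independent admission control described above (separate inflight limits per modality, fast reject when over capacity) can be sketched as follows; the class name and the limits are illustrative, not the actual server code.

```python
import threading

class AdmissionControl:
    """Per-modality inflight limit: over-capacity requests are rejected
    immediately instead of queueing behind inference."""

    def __init__(self, max_inflight: int):
        self._slots = threading.BoundedSemaphore(max_inflight)

    def try_enter(self) -> bool:
        # Non-blocking acquire: returns False at capacity instead of waiting.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()

# Separate controllers per modality; image kept much stricter than text
# (the actual limits are configuration, these numbers are assumed).
text_admission = AdmissionControl(max_inflight=32)
image_admission = AdmissionControl(max_inflight=4)
```

A request handler would call `try_enter()` up front, return a 429-style rejection on `False`, and call `leave()` in a `finally` block after inference.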
17 Mar, 2026
3 commits
-
Refactored translation into "multiple independent translation capabilities". The business side no longer treats translation as a provider choice: QueryParser and the indexer both call through the translator service client on port 6006, and the actual capability selection, enable switches, and model + scene routing are all consolidated on the server side and in the new translation/ directory.

The core changes are in config/services_config.py, providers/translation.py, api/translator_app.py, config/config.yaml, and the new translation/service.py. The config moved from the old services.translation.provider/providers to service_url + default_model + default_scene + capabilities, with each capability independently enabled. The server side gained unified backend management with lazy loading; the real implementations are consolidated into translation/backends/qwen_mt.py, translation/backends/llm.py, and translation/backends/deepl.py, while the old query/qwen_mt_translate.py, query/llm_translate.py, and query/deepl_provider.py keep only compatibility re-exports. On the API, /translate now supports scene as a standard parameter, context remains available as a compatibility alias, and the health check returns the default model, default scene, and enabled capabilities.
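A hedged sketch of what the unified backend management with lazy loading might look like; the class name, the capability dict shape, and the backend interface are assumptions, not the actual translation/service.py code.

```python
from typing import Dict, Optional

class TranslationService:
    """Routes (model, scene) to a capability backend, instantiating each
    enabled backend lazily on first use (shapes are illustrative)."""

    def __init__(self, capabilities: Dict[str, dict],
                 default_model: str, default_scene: str):
        # e.g. {"qwen_mt": {"enabled": True, "factory": QwenMTBackend}, ...}
        self._caps = capabilities
        self._default_model = default_model
        self._default_scene = default_scene
        self._backends: Dict[str, object] = {}  # lazy backend cache

    def _backend(self, name: str):
        cap = self._caps.get(name)
        if cap is None or not cap.get("enabled", False):
            raise ValueError(f"capability not enabled: {name}")
        if name not in self._backends:           # lazy load on first use
            self._backends[name] = cap["factory"]()
        return self._backends[name]

    def translate(self, text: str, model: Optional[str] = None,
                  scene: Optional[str] = None) -> str:
        return self._backend(model or self._default_model).translate(
            text, scene=scene or self._default_scene)
```

With this shape, /translate can fall back to default_model and default_scene when the request omits them, and a disabled capability fails fast with a clear error.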
-
- Rename indexer/product_annotator.py to indexer/product_enrich.py and remove the CSV-based CLI entrypoint, keeping only the in-memory analyze_products API
- Introduce dedicated product_enrich logging with a separate verbose log file for full LLM requests/responses
- Change the indexer and /indexer/enrich-content API wiring to use indexer.product_enrich instead of indexer.product_annotator, updating tests and docs accordingly
- Switch translate_prompts to share SUPPORTED_INDEX_LANGUAGES from tenant_config_loader and reuse that mapping for language code → display name
- Remove the hard SUPPORTED_LANGS constraint from the LLM content-enrichment flow, driving languages directly from tenant/indexer configuration
- Redesign LLM prompt generation to support multi-round, multi-language tables: first round in English, subsequent rounds translate the entire table (headers + cells) into target languages using English instructions
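The multi-round prompt flow in the last bullet could be sketched like this; the prompt wording and the small language mapping are illustrative stand-ins (the real mapping is shared from tenant_config_loader).

```python
# Stand-in for the shared language code -> display name mapping.
LANGUAGE_NAMES = {"de": "German", "fr": "French"}

def build_rounds(table_markdown: str, target_langs: list) -> list:
    """Round 1 produces the table in English; each later round asks the LLM
    to translate the whole table (headers + cells) into one target language,
    with the instructions themselves kept in English."""
    rounds = [
        "Fill in the product attribute table below in English.\n" + table_markdown
    ]
    for code in target_langs:
        name = LANGUAGE_NAMES.get(code, code)  # fall back to the raw code
        rounds.append(
            f"Translate the entire table above, including headers and all cells, "
            f"into {name}. Keep the markdown table structure unchanged."
        )
    return rounds
```

Driving `target_langs` from tenant/indexer configuration (rather than a hard-coded SUPPORTED_LANGS list) is what the preceding bullets describe.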
13 Mar, 2026
3 commits
-
2. Translation rate limiting: added corresponding handling (qwen-mt rate limit)
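One common way to handle a provider-side rate limit like qwen-mt's is retry with exponential backoff and jitter; a generic sketch under that assumption, not the actual handling in this codebase.

```python
import random
import time

class RateLimitError(Exception):
    """Raised when the translation provider reports a rate-limit response."""

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 0.5):
    """Retry fn() on rate-limit errors, doubling the delay each attempt
    and adding jitter so concurrent callers do not retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Capping total retries keeps a hard limit on added latency; an alternative is client-side throttling that never exceeds the provider's quota in the first place.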
12 Mar, 2026
1 commit
10 Mar, 2026
1 commit
09 Mar, 2026
2 commits
07 Mar, 2026
1 commit