27 Mar, 2026

1 commit

  • coarse_rank.output_window -> 再做 SKU 选择和 title suffix ->
    精排调用轻量 reranker 裁到 fine_rank.output_window -> 最终重排调用现有
    reranker,并在最终融合里加入 fine_score。同时把 reranker client/provider
    改成了按 service_profile 选不同 service_url,这样 fine/final
    可以共用同一套服务代码,只起不同实例。
    tangwang
     

19 Mar, 2026

1 commit

  • - Text and image embedding are now split into separate
      services/processes, while still keeping a single replica as requested.
    The split lives in
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L112),
    [config/services_config.py](/data/saas-search/config/services_config.py#L68),
    [providers/embedding.py](/data/saas-search/providers/embedding.py#L27),
    and the start scripts
    [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36),
    [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh),
    [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh).
    - Independent admission control is in place now: text and image have
      separate inflight limits, and image can be kept much stricter than
    text. The request handling, reject path, `/health`, and `/ready` are in
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L613),
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028).
    - I checked the Redis embedding cache. It did exist, but there was a
      real flaw: cache keys did not distinguish `normalize=true` from
    `normalize=false`. I fixed that in
    [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6),
    and both text and image now use the same normalize-aware keying. I also
    added service-side BF16 cache hits that short-circuit before the model
    lane, so repeated requests no longer get throttled behind image
    inference.
    
    **What This Means**
    - Image pressure no longer blocks text, because they are on different
      ports/processes.
    - Repeated text/image requests now return from Redis without consuming
      model capacity.
    - Over-capacity requests are rejected quickly instead of sitting
      blocked.
    - I did not add a load balancer or multi-replica HA, per your GPU
      constraint. I also did not build Grafana/Prometheus dashboards in this
    pass, but `/health` now exposes the metrics needed to wire them.
    
    **Validation**
    - Tests passed: `.venv/bin/python -m pytest -q
      tests/test_embedding_pipeline.py
    tests/test_embedding_service_limits.py` -> `10 passed`
    - Stress test tool updates are in
      [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155)
    - Fresh benchmark on split text service `6105`: 535 requests / 3s, 100%
      success, `174.56 rps`, avg `88.48 ms`
    - Fresh benchmark on split image service `6108`: 1213 requests / 3s,
      100% success, `403.32 rps`, avg `9.64 ms`
    - Live health after the run showed cache hits and non-zero cache-hit
      latency accounting:
      - text `avg_latency_ms=4.251`
      - image `avg_latency_ms=1.462`
    tangwang
     

17 Mar, 2026

3 commits

  • tangwang
     
  • 多个独立翻译能力”重构。现在业务侧不再把翻译当 provider
    选型,QueryParser 和 indexer 统一通过 6006 的 translator service client
    调用;真正的能力选择、启用开关、model + scene 路由,都收口到服务端和新的
    translation/ 目录里了。
    
    这次的核心改动在
    config/services_config.py、providers/translation.py、api/translator_app.py、config/config.yaml
    和新的 translation/service.py。配置从旧的
    services.translation.provider/providers 改成了 service_url +
    default_model + default_scene + capabilities,每个能力可独立
    enabled;服务端新增了统一的 backend 管理与懒加载,真实实现集中到
    translation/backends/qwen_mt.py、translation/backends/llm.py、translation/backends/deepl.py,旧的
    query/qwen_mt_translate.py、query/llm_translate.py、query/deepl_provider.py
    只保留兼容导出。接口上,/translate 现在标准支持 scene,context
    作为兼容别名继续可用,健康检查会返回默认模型、默认场景和已启用能力。
    tangwang
     
  • - Rename indexer/product_annotator.py to indexer/product_enrich.py and remove CSV-based CLI entrypoint, keeping only in-memory analyze_products API
    - Introduce dedicated product_enrich logging with separate verbose log file for full LLM requests/responses
    - Change indexer and /indexer/enrich-content API wiring to use indexer.product_enrich instead of indexer.product_annotator, updating tests and docs accordingly
    - Switch translate_prompts to share SUPPORTED_INDEX_LANGUAGES from tenant_config_loader and reuse that mapping for language code → display name
    - Remove hard SUPPORTED_LANGS constraint from LLM content-enrichment flow, driving languages directly from tenant/indexer configuration
    - Redesign LLM prompt generation to support multi-round, multi-language tables: first round in English, subsequent rounds translate the entire table (headers + cells) into target languages using English instructions
    tangwang
     

13 Mar, 2026

3 commits


12 Mar, 2026

1 commit


10 Mar, 2026

1 commit


09 Mar, 2026

2 commits


07 Mar, 2026

1 commit