23 Mar, 2026

1 commit


20 Mar, 2026

1 commit


19 Mar, 2026

3 commits

  • tangwang
     
  • tangwang
     
  • The instability is very likely real overload, but `lsof -i :6005 | wc -l
    = 75` alone does not prove it. What does matter is the live shape of the
    service: it is a single `uvicorn` worker on port `6005`, and the code
    had one shared process handling both text and image requests, with image
    work serialized behind a single lock. Under bursty image traffic,
    requests could pile up and sit blocked with almost no useful tracing,
    which matches the “only blocking observed” symptom.
    
    This commit now adds persistent log files, request IDs, per-request
    request/response/failure logs, text microbatch dispatch logs, health
    stats with active/rejected counts, and explicit overload admission
    control. New knobs are `TEXT_MAX_INFLIGHT`, `IMAGE_MAX_INFLIGHT`, and
    `EMBEDDING_OVERLOAD_STATUS_CODE`. Startup output now shows those limits
    and log paths in
    [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L80).
    I also added focused tests in
    [tests/test_embedding_service_limits.py](/data/saas-search/tests/test_embedding_service_limits.py#L1).
    
    What this means operationally:
    - Text and image are still in one process, so this is not the final
      architecture.
    - But image spikes will now be rejected quickly once the image lane is
      full instead of sitting around and consuming the worker pool.
    - Logs will now show each request, each rejection, each microbatch
      dispatch, backend time, response time, and request ID.
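
    The admission control described above can be sketched roughly as
    follows. This is a framework-free illustration, not the code in the
    service: the `InflightLimiter` class and `handle` helper are
    hypothetical, and the default values (32, 1, 503) are assumptions. Only
    the three environment variable names come from this commit.

```python
import os
import threading


class InflightLimiter:
    """Counts in-flight requests for one lane and rejects when the lane is full."""

    def __init__(self, max_inflight: int) -> None:
        self.max_inflight = max_inflight
        self.active = 0      # requests currently being processed
        self.rejected = 0    # requests turned away because the lane was full
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Never blocks: either a slot is free right now or the caller is rejected.
        with self._lock:
            if self.active >= self.max_inflight:
                self.rejected += 1
                return False
            self.active += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.active -= 1


# Variable names are from the commit message; the defaults are assumptions.
TEXT_LIMITER = InflightLimiter(int(os.environ.get("TEXT_MAX_INFLIGHT", "32")))
IMAGE_LIMITER = InflightLimiter(int(os.environ.get("IMAGE_MAX_INFLIGHT", "1")))
OVERLOAD_STATUS = int(os.environ.get("EMBEDDING_OVERLOAD_STATUS_CODE", "503"))


def handle(limiter: InflightLimiter, work):
    """Run `work()` if the lane has capacity, else return the overload status at once."""
    if not limiter.try_acquire():
        return OVERLOAD_STATUS, None
    try:
        return 200, work()
    finally:
        limiter.release()
```

    The point of the non-blocking `try_acquire` is that a full image lane
    answers immediately with the overload status instead of queueing behind
    a lock, and the `active`/`rejected` counters are the kind of numbers a
    health endpoint could report.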
    
    Verification:
    - Passed: `.venv/bin/python -m pytest -q
      tests/test_embedding_service_limits.py`
    - I also ran a wider test command, but 3 failures came from pre-existing
      drift in
      [tests/test_embedding_pipeline.py](/data/saas-search/tests/test_embedding_pipeline.py#L95),
      where the tests still monkeypatch `embeddings.text_encoder.redis.Redis`
      even though
      [embeddings/text_encoder.py](/data/saas-search/embeddings/text_encoder.py#L1)
      no longer imports `redis` that way.
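
    One plausible way this kind of drift shows up is sketched below, using
    `unittest.mock` from the standard library as a stand-in for pytest's
    `monkeypatch`. The `text_encoder` module object here is a hypothetical
    stub, not the real `embeddings/text_encoder.py`, and the actual failure
    messages in the wider run may differ.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module that no longer has a module-level `redis` name.
text_encoder = types.ModuleType("text_encoder")


def patch_redis_client():
    """Try to patch `text_encoder.redis`; return the error text if the target is gone."""
    try:
        with mock.patch.object(text_encoder, "redis"):
            return None  # patch applied and was undone on exit
    except AttributeError as exc:
        # mock refuses to patch an attribute that does not exist on the target.
        return str(exc)
```

    Once the module stops exposing `redis`, a test that patches that name
    fails before it exercises any encoder logic, so the fix belongs in the
    test's patch target rather than in the service code.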
    
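    The default model for CLIP_AS_SERVICE has been switched to ViT-L-14, and
    the configuration has been consolidated into a single, changeable entry
    point. The default now lives in CLIP_AS_SERVICE_MODEL_NAME in
    embeddings/config.py (line 29) and is currently CN-CLIP/ViT-L-14.
    scripts/start_cnclip_service.sh (line 37) reads this setting
    automatically instead of hard-coding the default model in the script,
    and also supports temporary overrides via CNCLIP_MODEL_NAME and
    --model-name. scripts/start_embedding_service.sh (line 29) and
    embeddings/server.py (line 425) now also print model information, which
    makes it easier to check which configuration is actually in use.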
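
    A minimal Python sketch of the override chain follows. The real logic
    lives in a shell script, so this only models the idea. The precedence
    shown (`--model-name` over `CNCLIP_MODEL_NAME` over the config default)
    is an assumption, and `resolve_model_name` is a hypothetical helper.

```python
import argparse

# Default named in the commit message (embeddings/config.py).
CLIP_AS_SERVICE_MODEL_NAME = "CN-CLIP/ViT-L-14"


def resolve_model_name(argv, environ):
    """Pick the model name: CLI flag, then environment, then config default.

    The ordering is an assumption; the commit only says both overrides exist.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--model-name", default=None)
    # parse_known_args ignores any other flags the start script may accept.
    args, _unknown = parser.parse_known_args(argv)
    if args.model_name:
        return args.model_name
    return environ.get("CNCLIP_MODEL_NAME") or CLIP_AS_SERVICE_MODEL_NAME
```

    Passing `argv` and `environ` in explicitly, rather than reading
    `sys.argv` and `os.environ` inside the function, keeps the resolution
    easy to test.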
    
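    The documentation was updated as well, mainly
    docs/CNCLIP_SERVICE说明文档.md (line 62) and embeddings/README.md (line
    58): it now describes a "configuration is authoritative, overrides
    allowed" mechanism rather than hard-coding a model name. The related
    summary documents and internal notes were likewise switched to
    configuration-driven wording.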
    tangwang
     

17 Mar, 2026

1 commit


11 Mar, 2026

1 commit

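  • Removed the START_* control-variable logic; by default only the core
    services backend/indexer/frontend are started.
    Optional services are now started with an explicit command:
    ./scripts/service_ctl.sh start embedding translator reranker tei cnclip.
    Unified translator port lookup on TRANSLATION_PORT (removed
    TRANSLATOR_PORT compatibility).
    Kept strict validation of unknown service names.
    Key file: service_ctl.sh
    Fixes for duplicate/ambiguous names
    Unified frontend port naming: FRONTEND_PORT is primary, PORT is only a
    fallback.
    start_frontend.sh explicitly exports PORT="${FRONTEND_PORT}", fixing the
    case where FRONTEND_PORT was configured but the service still ran on
    6003.
    Files: start_frontend.sh, frontend_server.py, env_config.py
    Continued consolidation of log/PID naming
    The unified rule logs/<service>.log, logs/<service>.pid continues to be
    rolled out.
    cnclip keeps logs/cnclip.log + logs/cnclip.pid.
    Files: service_ctl.sh, start_cnclip_service.sh, stop_cnclip_service.sh
    Aligned the backend/indexer startup style and filled in related gaps
    frontend/translator are also aligned on set -euo pipefail and use exec
    to launch the main process directly.
    Files: start_frontend.sh, start_translator.sh, start_backend.sh,
    start_indexer.sh
    Legacy entry-point cleanup
    Deleted: start_servers.py, stop_reranker.sh, stop_translator.sh.
    reranker stop logic is merged into service_ctl (including
    VLLM::EngineCore cleanup).
    The benchmark script now uses the unified entry point: service_ctl.sh
    stop reranker.
    File: benchmark_reranker_1000docs.sh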
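
    The selection and naming rules in this commit can be modeled in a few
    lines of Python. The real implementation is the shell script
    service_ctl.sh, so everything below is an illustrative sketch: the
    function names are hypothetical, and treating 6003 as the last-resort
    frontend port is an assumption drawn from the bug description.

```python
CORE_SERVICES = ("backend", "indexer", "frontend")
OPTIONAL_SERVICES = ("embedding", "translator", "reranker", "tei", "cnclip")


def select_services(requested):
    """Return the services to start: core ones by default, else the validated request."""
    if not requested:
        return list(CORE_SERVICES)
    known = set(CORE_SERVICES) | set(OPTIONAL_SERVICES)
    unknown = [name for name in requested if name not in known]
    if unknown:
        # Strict validation: fail loudly rather than silently skipping a typo.
        raise ValueError(f"unknown service(s): {', '.join(unknown)}")
    return list(requested)


def frontend_port(environ, default=6003):
    """FRONTEND_PORT wins; PORT is only a fallback; `default` is an assumption."""
    return int(environ.get("FRONTEND_PORT") or environ.get("PORT") or default)


def log_and_pid_paths(service):
    """Apply the unified logs/<service>.log and logs/<service>.pid naming rule."""
    return f"logs/{service}.log", f"logs/{service}.pid"
```

    For example, `select_services([])` yields only the three core services,
    while `select_services(["embedding", "cnclip"])` starts just those two.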
    tangwang
     

10 Mar, 2026

1 commit


09 Mar, 2026

3 commits


08 Mar, 2026

1 commit


07 Mar, 2026

1 commit


22 Dec, 2025

2 commits


08 Nov, 2025

1 commit


07 Nov, 2025

1 commit