27 Mar, 2026

2 commits


26 Mar, 2026

1 commit


23 Mar, 2026

1 commit


20 Mar, 2026

3 commits


19 Mar, 2026

5 commits

  • tangwang
     
  • tangwang
     
  • tangwang
     
  • - Text and image embedding are now split into separate
      services/processes, while still keeping a single replica as requested.
      The split lives in
      [embeddings/server.py](/data/saas-search/embeddings/server.py#L112),
      [config/services_config.py](/data/saas-search/config/services_config.py#L68),
      [providers/embedding.py](/data/saas-search/providers/embedding.py#L27),
      and the start scripts
      [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36),
      [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh),
      [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh).
    - Independent admission control is in place now: text and image have
      separate inflight limits, and image can be kept much stricter than
      text. The request handling, reject path, `/health`, and `/ready` are in
      [embeddings/server.py](/data/saas-search/embeddings/server.py#L613),
      [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and
      [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028).
    - I checked the Redis embedding cache. It did exist, but there was a
      real flaw: cache keys did not distinguish `normalize=true` from
      `normalize=false`. I fixed that in
      [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6),
      and both text and image now use the same normalize-aware keying. I also
      added service-side BF16 cache hits that short-circuit before the model
      lane, so repeated requests no longer get throttled behind image
      inference.
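
A minimal sketch of what normalize-aware keying could look like, using the `n0`/`n1` flag format (`{prefix}:{n0|n1}:{query}`) that appears in the 17 Mar commit of this log. The function name `build_cache_key`, its parameters, and the position of the namespace segment are assumptions for illustration; the actual contents of `embeddings/cache_keys.py` are not shown here.

```python
def build_cache_key(prefix: str, item: str, normalize: bool, namespace: str = "") -> str:
    """Build a cache key that differs for normalize=True and normalize=False.

    Hypothetical helper: the segment order (prefix, optional namespace,
    normalize flag, item) is an assumption, not the project's confirmed layout.
    """
    flag = "n1" if normalize else "n0"
    parts = [prefix]
    if namespace:
        parts.append(namespace)
    parts.extend([flag, item])
    return ":".join(parts)


if __name__ == "__main__":
    # The same text yields two distinct keys, so normalized and raw
    # vectors can no longer overwrite each other in the cache.
    print(build_cache_key("embedding", "cached-text", normalize=True))
    print(build_cache_key("embedding", "cached-text", normalize=False))
```

Keeping the flag inside the key, rather than in the stored value, means a lookup with the wrong `normalize` setting is simply a miss and never returns a vector of the wrong kind.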
    
    **What This Means**
    - Image pressure no longer blocks text, because they are on different
      ports/processes.
    - Repeated text/image requests now return from Redis without consuming
      model capacity.
    - Over-capacity requests are rejected quickly instead of sitting
      blocked.
    - I did not add a load balancer or multi-replica HA, per your GPU
      constraint. I also did not build Grafana/Prometheus dashboards in this
      pass, but `/health` now exposes the metrics needed to wire them.
    
    **Validation**
    - Tests passed: `.venv/bin/python -m pytest -q
      tests/test_embedding_pipeline.py
      tests/test_embedding_service_limits.py` -> `10 passed`
    - Stress test tool updates are in
      [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155)
    - Fresh benchmark on split text service `6105`: 535 requests / 3s, 100%
      success, `174.56 rps`, avg `88.48 ms`
    - Fresh benchmark on split image service `6108`: 1213 requests / 3s,
      100% success, `403.32 rps`, avg `9.64 ms`
    - Live health after the run showed cache hits and non-zero cache-hit
      latency accounting:
      - text `avg_latency_ms=4.251`
      - image `avg_latency_ms=1.462`
    tangwang
     
  • The instability is very likely real overload, but `lsof -i :6005 | wc -l
    = 75` alone does not prove it. What does matter is the live shape of the
    service: it is a single `uvicorn` worker on port `6005`, and the code
    had one shared process handling both text and image requests, with image
    work serialized behind a single lock. Under bursty image traffic,
    requests could pile up and sit blocked with almost no useful tracing,
    which matches the “only blocking observed” symptom.
    
    The embedding service now adds persistent log files, request IDs,
    per-request request/response/failure logs, text microbatch dispatch logs,
    health stats with active/rejected counts, and explicit overload admission
    control. New knobs are `TEXT_MAX_INFLIGHT`, `IMAGE_MAX_INFLIGHT`, and
    `EMBEDDING_OVERLOAD_STATUS_CODE`. Startup output now shows those limits
    and log paths in
    [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L80).
    I also added focused tests in
    [tests/test_embedding_service_limits.py](/data/saas-search/tests/test_embedding_service_limits.py#L1).
    
    What this means operationally:
    - Text and image are still in one process, so this is not the final
      architecture.
    - But image spikes will now be rejected quickly once the image lane is
      full instead of sitting around and consuming the worker pool.
    - Logs will now show each request, each rejection, each microbatch
      dispatch, backend time, response time, and request ID.
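
The admission control described above can be sketched roughly as follows. This is an illustration, not the code in `embeddings/server.py`: the `InflightLimiter` class, the `handle` helper, and the default values passed to `os.getenv` are assumptions; only the three environment variable names come from the commit message.

```python
import os
import threading


class InflightLimiter:
    """Counts in-flight requests for one lane and rejects once the lane is full."""

    def __init__(self, name: str, max_inflight: int) -> None:
        self.name = name
        self.max_inflight = max_inflight
        self.active = 0      # requests currently being processed
        self.rejected = 0    # requests turned away because the lane was full
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take a slot if one is free; never blocks."""
        with self._lock:
            if self.active >= self.max_inflight:
                self.rejected += 1
                return False
            self.active += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.active = max(0, self.active - 1)


# The fallback values below are placeholders, not the project's defaults.
text_lane = InflightLimiter("text", int(os.getenv("TEXT_MAX_INFLIGHT", "32")))
image_lane = InflightLimiter("image", int(os.getenv("IMAGE_MAX_INFLIGHT", "1")))
OVERLOAD_STATUS = int(os.getenv("EMBEDDING_OVERLOAD_STATUS_CODE", "503"))


def handle(lane: InflightLimiter, work):
    """Run work() if the lane has capacity; otherwise reject immediately."""
    if not lane.try_acquire():
        return OVERLOAD_STATUS, None
    try:
        return 200, work()
    finally:
        lane.release()
```

Because `try_acquire` never waits, a full image lane produces an immediate rejection with the configured status code rather than a request that sits blocked in the worker.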
    
    Verification:
    - Passed: `.venv/bin/python -m pytest -q
      tests/test_embedding_service_limits.py`
    - I also ran a wider test command, but 3 failures came from pre-existing
      drift in
      [tests/test_embedding_pipeline.py](/data/saas-search/tests/test_embedding_pipeline.py#L95),
      where the tests still monkeypatch `embeddings.text_encoder.redis.Redis`
      even though
      [embeddings/text_encoder.py](/data/saas-search/embeddings/text_encoder.py#L1)
      no longer imports `redis` that way.
    
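    The default CLIP_AS_SERVICE model has been switched to ViT-L-14, and the
    configuration has been consolidated into a single, overridable entry
    point. The default now lives in CLIP_AS_SERVICE_MODEL_NAME in
    embeddings/config.py (line 29) and is currently CN-CLIP/ViT-L-14.
    scripts/start_cnclip_service.sh (line 37) reads this setting
    automatically instead of hard-coding the default model in the script, and
    also supports temporary overrides through CNCLIP_MODEL_NAME and
    --model-name. scripts/start_embedding_service.sh (line 29) and
    embeddings/server.py (line 425) now also print the model information,
    which makes it easier to check which configuration is actually in use.
    
    The documentation was updated as well, mainly
    docs/CNCLIP_SERVICE说明文档.md (line 62) and embeddings/README.md (line
    58): it now describes a "configuration-driven, overridable" mechanism
    rather than a hard-coded model name. The related summary documents and
    internal notes were changed to the same configuration-driven wording.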
    tangwang
     

17 Mar, 2026

3 commits

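  • 2. Extract a reusable embedding Redis cache class (shared by text and image)
    
    Details:
    1. The embedding cache now stores BF16 in Redis (restored to FP32 on read)
    Key behavior (implemented following the flow you described):
    - On write: FP32 embedding → L2 normalize (when normalize_embeddings=True)
      → convert to BF16 → bytes (2 bytes per dimension, big-endian) →
      redis.setex
    - On read: redis.get bytes → BF16 → restore FP32 (np.float32 vector)
    Changes:
    Added embeddings/bf16.py
    - Provides float32_to_bf16 / bf16_to_float32
    - encode_embedding_for_redis(): FP32 → BF16 → bytes
    - decode_embedding_from_redis(): bytes → BF16 → FP32
    - l2_normalize_fp32(): normalizes on demand
    Modified embeddings/text_encoder.py
    - The Redis value changed from pickle.dumps(np.ndarray) to BF16 bytes
    - The cache key now includes a normalize flag: {prefix}:{n0|n1}:{query}
      (so that different normalize settings do not share cache entries)
    Modified tests/test_embedding_pipeline.py
    - The cache-hit test case now writes BF16 bytes and uses the new key:
      embedding:n1:cached-text
    Modified docs/缓存与Redis使用说明.md
    - The Key/Value format of the embedding cache is updated to BF16 bytes +
      n0/n1
    Modified scripts/redis/redis_cache_health_check.py
    - The embedding pattern is no longer hard-coded as embedding:*; it is read
      from REDIS_CONFIG["embedding_cache_prefix"]
    - The value preview now decodes BF16 and shows dim/bytes/dtype instead of
      unpickling
    Self-check:
    With the environment activated, I ran a BF16 encode/decode round-trip
    sanity check: the byte length and dimension are restored correctly, and a
    normalized vector reads back with a norm close to 1 (with some BF16
    quantization error).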
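
The write and read flow above (FP32 → optional L2 normalize → BF16 → big-endian bytes, and back) can be sketched with NumPy alone. The function names mirror those listed for `embeddings/bf16.py`, but the bodies are an illustration and not the project's code; in particular, round-to-nearest-even is an assumption (the commit does not say whether the conversion rounds or truncates), and NaN inputs are not handled specially.

```python
import numpy as np


def l2_normalize_fp32(vec: np.ndarray) -> np.ndarray:
    """L2-normalize a vector; a zero vector is returned unchanged."""
    vec = np.asarray(vec, dtype=np.float32)
    norm = float(np.linalg.norm(vec))
    return vec / norm if norm > 0 else vec


def float32_to_bf16(vec: np.ndarray) -> np.ndarray:
    """Return BF16 bit patterns (uint16), rounding to nearest even (assumed)."""
    bits = np.ascontiguousarray(vec, dtype=np.float32).view(np.uint32)
    rounding = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding) >> 16).astype(np.uint16)


def bf16_to_float32(bits16: np.ndarray) -> np.ndarray:
    """Widen BF16 bit patterns back to float32 by restoring the low 16 zero bits."""
    return (bits16.astype(np.uint32) << 16).view(np.float32)


def encode_embedding_for_redis(vec: np.ndarray) -> bytes:
    """FP32 -> BF16 -> bytes, 2 bytes per dimension, big-endian."""
    return float32_to_bf16(vec).astype(">u2").tobytes()


def decode_embedding_from_redis(data: bytes) -> np.ndarray:
    """bytes -> BF16 -> FP32 vector."""
    return bf16_to_float32(np.frombuffer(data, dtype=">u2"))


if __name__ == "__main__":
    v = l2_normalize_fp32(np.arange(1, 9, dtype=np.float32))
    restored = decode_embedding_from_redis(encode_embedding_for_redis(v))
    # The norm is close to, but not exactly, 1 because of BF16 quantization.
    print(len(encode_embedding_for_redis(v)), float(np.linalg.norm(restored)))
```

BF16 keeps the FP32 exponent and only shortens the mantissa, so the conversion is a 16-bit shift and the stored payload is half the size of raw FP32 bytes.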
    
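    2. Extract a reusable embedding Redis cache class (shared by text and image)
    Added embeddings/redis_embedding_cache.py: RedisEmbeddingCache
    - Unified Redis initialization (reads REDIS_CONFIG)
    - Unified BF16 bytes encoding/decoding (reuses embeddings/bf16.py)
    - Unified expiry policy: setex(expire_time) on write, and
      expire(expire_time) after a cache hit, so the TTL slides on every read
    - Unified handling of errors and bad data: if decoding fails, or the
      vector is not 1-D, is empty, or contains NaN/Inf, the key is deleted
      and the lookup is treated as a miss
    Now used by:
    - Text, embeddings/text_encoder.py: uses
      self.cache = RedisEmbeddingCache(key_prefix=..., namespace="");
      the key remains {prefix}:{query}
    - Image, embeddings/image_encoder.py: uses
      self.cache = RedisEmbeddingCache(key_prefix=..., namespace="image");
      the key remains {prefix}:image:{url_or_path}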
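
A rough sketch of the sliding-expiry and bad-data behavior described for `RedisEmbeddingCache`. The class below is a hypothetical stand-in, not the project's implementation: it accepts any client object exposing `get`, `setex`, `expire`, and `delete` (the method names used by redis-py), the default `expire_time` is a placeholder, and it stores big-endian FP32 bytes as a simplified codec in place of the BF16 encoding.

```python
from typing import Optional

import numpy as np


class SlidingEmbeddingCache:
    """Hypothetical stand-in for RedisEmbeddingCache (illustration only)."""

    def __init__(self, client, key_prefix: str, namespace: str = "",
                 expire_time: int = 3600) -> None:
        self.client = client            # e.g. a redis.Redis instance
        self.key_prefix = key_prefix
        self.namespace = namespace
        self.expire_time = expire_time  # seconds; placeholder default

    def _key(self, item: str) -> str:
        # "{prefix}:{item}" for an empty namespace, else "{prefix}:{namespace}:{item}"
        parts = [self.key_prefix, self.namespace, item]
        return ":".join(p for p in parts if p)

    def set(self, item: str, vec) -> None:
        data = np.asarray(vec, dtype=">f4").tobytes()  # simplified codec, not BF16
        self.client.setex(self._key(item), self.expire_time, data)

    def get(self, item: str) -> Optional[np.ndarray]:
        key = self._key(item)
        data = self.client.get(key)
        if data is None:
            return None  # plain miss
        try:
            vec = np.frombuffer(data, dtype=">f4").astype(np.float32)
        except ValueError:
            vec = None   # byte length is not a multiple of the element size
        if vec is None or vec.ndim != 1 or vec.size == 0 or not np.all(np.isfinite(vec)):
            self.client.delete(key)  # drop bad data and report a miss
            return None
        self.client.expire(key, self.expire_time)  # a hit refreshes the TTL
        return vec
```

Refreshing the TTL only on a successful hit means entries expire after a period without access, which matches the "expire when not recently accessed" policy described elsewhere in this log.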
    tangwang
     
  • tangwang
     
  • tangwang
     

11 Mar, 2026

3 commits

  • tangwang
     
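  • Removed the START_* control-variable logic; by default only the core
    services backend/indexer/frontend are started.
    Optional services are now started with an explicit command:
    ./scripts/service_ctl.sh start embedding translator reranker tei cnclip.
    The translator port is now read only from TRANSLATION_PORT (the
    TRANSLATOR_PORT fallback is removed).
    Strict validation of unknown service names is kept.
    Key file: service_ctl.sh
    Fixes for duplicate/ambiguous names
    Frontend port naming is unified: FRONTEND_PORT is primary, PORT is only a
    fallback.
    start_frontend.sh explicitly exports PORT="${FRONTEND_PORT}", which avoids
    the case where FRONTEND_PORT is configured but the service still runs on
    6003.
    Files: start_frontend.sh, frontend_server.py, env_config.py
    Continued cleanup of log/PID naming
    The unified rule logs/<service>.log, logs/<service>.pid continues to be
    applied.
    cnclip keeps logs/cnclip.log + logs/cnclip.pid.
    Files: service_ctl.sh, start_cnclip_service.sh, stop_cnclip_service.sh
    Unified backend/indexer startup style, with related items filled in
    frontend/translator are also aligned to set -euo pipefail and launch the
    main process directly with exec.
    Files: start_frontend.sh, start_translator.sh, start_backend.sh, start_indexer.sh
    Legacy entry point cleanup
    Deleted: start_servers.py, stop_reranker.sh, stop_translator.sh.
    The reranker stop logic is merged into service_ctl (including
    VLLM::EngineCore cleanup).
    The benchmark script now uses the unified entry point: service_ctl.sh stop
    reranker.
    File: benchmark_reranker_1000docs.sh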
    tangwang
     
  • ./scripts/start_tei_service.sh
    START_TEI=0 ./scripts/service_ctl.sh restart embedding
    
    curl -sS -X POST "http://127.0.0.1:6005/embed/text" \
      -H "Content-Type: application/json" \
      -d '["芭比娃娃 儿童玩具", "纯棉T恤 短袖"]'
    tangwang
     

10 Mar, 2026

2 commits


09 Mar, 2026

6 commits


08 Mar, 2026

1 commit


07 Mar, 2026

2 commits


06 Mar, 2026

1 commit


29 Dec, 2025

1 commit


22 Dec, 2025

2 commits


20 Dec, 2025

1 commit


19 Dec, 2025

1 commit


18 Dec, 2025

1 commit

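  • Two indexing features:
    1. Multilingual support. If the language configured for a shop is not zh, call translation to get a Chinese result; likewise, if it is not en, translate to en.
    Results must be cached in Redis: look up the cache first, call translation only on a cache miss, then store the result in Redis.
    This logic should live inside @query/translator.py so that callers do not need to care about it. It currently uses DictCache; replace that directly with a Redis cache.
    
    2. Fill the title embedding field. If title embedding is enabled for the shop, request the embedding model to compute an embedding from the English title, using BgeEncoder.
    
    For the caches of both modules, expiry is based on how long an entry has gone without being accessed.
    
    feat:
    1. Updated the REDIS_CONFIG configuration
    Added the user-provided settings (snapshot_db, translation_cache_expire_days, translation_cache_prefix, etc.) to config/env_config.py
    2. Modified query/translator.py
    Replaced DictCache with a Redis cache
    Implemented the translate_for_indexing method, which handles multilingual translation automatically:
    If the shop language is not zh, translate to zh
    If the shop language is not en, translate to en
    The translation logic is encapsulated inside translator.py; callers do not need to care about it
    3. Modified embeddings/text_encoder.py
    Added a Redis cache to BgeEncoder
    Implemented a sliding expiry policy (the expiry time is reset on every access)
    The cache logic follows the provided CacheManager object
    4. Modified indexer/document_transformer.py
    Added the encoder and enable_title_embedding parameters
    Implemented the _fill_title_embedding method, which uses the English title (title_en) to generate the embedding
    Updated the _fill_text_fields method to use the new translate_for_indexing method
    5. Updated indexer/indexing_utils.py
    Updated the create_document_transformer function to support the new encoder and enable_title_embedding parameters
    If title embedding is enabled and no encoder is provided, a BgeEncoder is initialized automatically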
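
The target-language rule and cache-first lookup described in this commit can be sketched as follows. This is an illustration only: the real signature of `translate_for_indexing` is not shown in the log, so the function below is named `translate_for_indexing_sketch`, takes the translation backend as a plain callable, and uses a dict-like object as a stand-in for the Redis cache. The cache key layout is also an assumption.

```python
from typing import Callable, Dict, List, MutableMapping


def target_languages(shop_language: str) -> List[str]:
    """zh and en are always needed; skip whichever one the shop already uses."""
    return [lang for lang in ("zh", "en") if lang != shop_language]


def translate_for_indexing_sketch(
    text: str,
    shop_language: str,
    translate: Callable[[str, str, str], str],  # (text, source, target) -> translation
    cache: MutableMapping[str, str],            # stand-in for the Redis cache
) -> Dict[str, str]:
    """Return {target_language: translation}, consulting the cache first."""
    results: Dict[str, str] = {}
    for target in target_languages(shop_language):
        key = f"{shop_language}:{target}:{text}"  # hypothetical key layout
        if key not in cache:
            # Cache miss: call the translation backend, then store the result.
            cache[key] = translate(text, shop_language, target)
        results[target] = cache[key]
    return results


if __name__ == "__main__":
    fake_translate = lambda text, src, dst: f"[{src}->{dst}] {text}"
    print(translate_for_indexing_sketch("Puppe", "de", fake_translate, {}))
```

Keeping the cache lookup inside the function is what lets callers stay unaware of caching, as the commit message requires.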
    tangwang
     

05 Dec, 2025

1 commit


14 Nov, 2025

1 commit


11 Nov, 2025

1 commit

  • ## 🎯 Major Features
    - Request context management system for complete request visibility
    - Structured JSON logging with automatic daily rotation
    - Performance monitoring with detailed stage timing breakdowns
    - Query analysis result storage and intermediate result tracking
    - Error and warning collection with context correlation
    
    ## 🔧 Technical Improvements
    - **Context Management**: Request-level context with reqid/uid correlation
    - **Performance Monitoring**: Automatic timing for all search pipeline stages
    - **Structured Logging**: JSON format logs with request context injection
    - **Query Enhancement**: Complete query analysis tracking and storage
    - **Error Handling**: Enhanced error tracking with context information
    
    ## 🐛 Bug Fixes
    - Fixed DeepL API endpoint (paid vs free API confusion)
    - Fixed vector generation (GPU memory cleanup)
    - Fixed logger parameter passing format (reqid/uid handling)
    - Fixed translation and embedding functionality
    
    ## 🌟 API Improvements
    - Simplified API interface (8→5 parameters, 37.5% reduction)
    - Made internal functionality transparent to users
    - Added performance info to API responses
    - Enhanced request correlation and tracking
    
    ## 📁 New Infrastructure
    - Comprehensive test suite (unit, integration, API tests)
    - CI/CD pipeline with automated quality checks
    - Performance monitoring and testing tools
    - Documentation and example usage guides
    
    ## 🔒 Security & Reliability
    - Thread-safe context management for concurrent requests
    - Automatic log rotation and structured output
    - Error isolation with detailed context information
    - Complete request lifecycle tracking
    
    🤖 Generated with Claude Code
    
    Co-Authored-By: Claude <noreply@anthropic.com>
    tangwang
     

08 Nov, 2025

1 commit