19 Mar, 2026

3 commits

  • - Text and image embedding are now split into separate
      services/processes, while still keeping a single replica as requested.
    The split lives in
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L112),
    [config/services_config.py](/data/saas-search/config/services_config.py#L68),
    [providers/embedding.py](/data/saas-search/providers/embedding.py#L27),
    and the start scripts
    [scripts/start_embedding_service.sh](/data/saas-search/scripts/start_embedding_service.sh#L36),
    [scripts/start_embedding_text_service.sh](/data/saas-search/scripts/start_embedding_text_service.sh),
    [scripts/start_embedding_image_service.sh](/data/saas-search/scripts/start_embedding_image_service.sh).
    - Independent admission control is now in place: text and image have
      separate inflight limits, and the image limit can be kept much
    stricter than the text one. The request handling, reject path,
    `/health`, and `/ready` are in
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L613),
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L786), and
    [embeddings/server.py](/data/saas-search/embeddings/server.py#L1028).
    - I checked the Redis embedding cache. It did exist, but there was a
      real flaw: cache keys did not distinguish `normalize=true` from
    `normalize=false`. I fixed that in
    [embeddings/cache_keys.py](/data/saas-search/embeddings/cache_keys.py#L6),
    and both text and image now use the same normalize-aware keying. I also
    added service-side BF16 cache hits that short-circuit before the model
    lane, so repeated requests no longer get throttled behind image
    inference.
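    The normalize-aware keying fix amounts to folding the flag into the key. A minimal sketch (the real helper lives in embeddings/cache_keys.py; this signature is an assumption):

```python
def embedding_cache_key(prefix: str, normalize: bool, query: str) -> str:
    """Cache key that separates normalized from raw embeddings.

    The n0/n1 segment is what keeps normalize=true and normalize=false
    requests from sharing one cache entry.
    """
    return f"{prefix}:{'n1' if normalize else 'n0'}:{query}"
```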
    
    **What This Means**
    - Image pressure no longer blocks text, because they are on different
      ports/processes.
    - Repeated text/image requests now return from Redis without consuming
      model capacity.
    - Over-capacity requests are rejected quickly instead of sitting in a
      blocked queue.
    - I did not add a load balancer or multi-replica HA, per your GPU
      constraint. I also did not build Grafana/Prometheus dashboards in this
    pass, but `/health` now exposes the metrics needed to wire them.
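    A minimal sketch of the independent admission-control shape described above (hypothetical names; the real reject path and limits live in embeddings/server.py):

```python
import asyncio

class OverCapacity(Exception):
    """Raised when the service's inflight budget is exhausted."""

class AdmissionGate:
    """Per-process admission control: each service (text, image) holds its
    own inflight budget; a request arriving over capacity fails fast
    instead of queuing behind the model lane."""

    def __init__(self, max_inflight: int):
        self._sem = asyncio.Semaphore(max_inflight)

    async def run(self, handler):
        if self._sem.locked():        # budget exhausted -> reject immediately
            raise OverCapacity()
        async with self._sem:         # hold one inflight slot for the call
            return await handler()
```

Under this shape the image service would simply construct its gate with a much smaller `max_inflight` than the text service.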
    
    **Validation**
    - Tests passed: `.venv/bin/python -m pytest -q
      tests/test_embedding_pipeline.py
    tests/test_embedding_service_limits.py` -> `10 passed`
    - Stress test tool updates are in
      [scripts/perf_api_benchmark.py](/data/saas-search/scripts/perf_api_benchmark.py#L155)
    - Fresh benchmark on split text service `6105`: 535 requests / 3s, 100%
      success, `174.56 rps`, avg `88.48 ms`
    - Fresh benchmark on split image service `6108`: 1213 requests / 3s,
      100% success, `403.32 rps`, avg `9.64 ms`
    - Live health after the run showed cache hits and non-zero cache-hit
      latency accounting:
      - text `avg_latency_ms=4.251`
      - image `avg_latency_ms=1.462`
    tangwang
     

17 Mar, 2026

3 commits

  • 2. Factored out a reusable embedding Redis cache class (shared by text and image)
    
    Details:
    1. The embedding cache now stores BF16 in Redis (restored to FP32 on read)
    Key behavior (implemented per the flow you specified):
    - Before write: FP32 embedding → (when normalize_embeddings=True) L2 normalize → BF16 → bytes (2 bytes/dim, big-endian) → redis.setex
    - After read: redis.get bytes → BF16 → restored FP32 (np.float32 vector)
    Changes:
    - Added embeddings/bf16.py
      - float32_to_bf16 / bf16_to_float32
      - encode_embedding_for_redis(): FP32 → BF16 → bytes
      - decode_embedding_from_redis(): bytes → BF16 → FP32
      - l2_normalize_fp32(): normalizes on demand
    - Modified embeddings/text_encoder.py
      - Redis values changed from pickle.dumps(np.ndarray) to BF16 bytes
      - Cache keys now carry a normalize flag: {prefix}:{n0|n1}:{query}, so requests with different normalize settings no longer share cache entries
    - Modified tests/test_embedding_pipeline.py
      - The cache-hit case now writes BF16 bytes and uses the new key embedding:n1:cached-text
    - Modified docs/缓存与Redis使用说明.md
      - The embedding cache key/value format is documented as BF16 bytes + n0/n1
    - Modified scripts/redis/redis_cache_health_check.py
      - The embedding pattern is no longer hard-coded as embedding:*; it reads REDIS_CONFIG["embedding_cache_prefix"]
      - Value previews decode BF16 (instead of unpickling) and show dim/bytes/dtype
    Self-check:
    Ran a BF16 encode/decode round-trip sanity check in the activated environment: byte length and recovered dimensions are correct, and a normalized vector reads back with a norm close to 1 (modulo BF16 quantization error).
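    The BF16 round trip can be sketched in plain NumPy as follows (a minimal illustration using truncation; the repo's bf16.py helpers may round differently):

```python
import numpy as np

def float32_to_bf16_bytes(vec: np.ndarray) -> bytes:
    """FP32 vector -> BF16 bytes (2 bytes per dim, big-endian).

    BF16 keeps the sign, exponent, and top 7 mantissa bits of FP32,
    i.e. the high 16 bits of each 32-bit word; this sketch truncates
    the low bits rather than rounding.
    """
    bits = np.ascontiguousarray(vec, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(">u2").tobytes()

def bf16_bytes_to_float32(buf: bytes) -> np.ndarray:
    """BF16 bytes -> FP32 vector (low mantissa bits come back as zeros)."""
    hi = np.frombuffer(buf, dtype=">u2").astype(np.uint32)
    return (hi << 16).view(np.float32)
```

A normalized vector survives the round trip with a norm close to 1, which is exactly the sanity check described above.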
    
    2. Factored out a reusable embedding Redis cache class (shared by text and image)
    Added embeddings/redis_embedding_cache.py: RedisEmbeddingCache
    - Unified Redis initialization (reads REDIS_CONFIG)
    - Unified BF16 bytes encode/decode (reuses embeddings/bf16.py)
    - Unified expiry policy: writes use setex(expire_time), and a hit calls expire(expire_time) after the read, i.e. sliding expiry that refreshes the TTL
    - Unified bad-data handling: if decoding fails, or the vector is not 1-D, is empty, or contains NaN/Inf, the key is deleted and the lookup is treated as a miss
    Already wired in:
    - Text: embeddings/text_encoder.py uses self.cache = RedisEmbeddingCache(key_prefix=..., namespace=""); the key is still {prefix}:{query}
    - Image: embeddings/image_encoder.py uses self.cache = RedisEmbeddingCache(key_prefix=..., namespace="image"); the key is still {prefix}:image:{url_or_path}
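    A condensed sketch of the shared cache class described above (illustrative only: the constructor mirrors the key_prefix/namespace wiring, but the client here is any object exposing redis-like get/setex/expire/delete, and the BF16 codec is inlined via NumPy truncation):

```python
import numpy as np

class RedisEmbeddingCache:
    """Shared text/image embedding cache: BF16 values, sliding expiry."""

    def __init__(self, client, key_prefix, namespace="", expire_time=7 * 86400):
        self.client = client
        self.prefix = key_prefix
        self.namespace = namespace
        self.expire_time = expire_time

    def _key(self, ident: str) -> str:
        # "" namespace -> {prefix}:{ident}; "image" -> {prefix}:image:{ident}
        parts = [self.prefix] + ([self.namespace] if self.namespace else []) + [ident]
        return ":".join(parts)

    def get(self, ident: str):
        key = self._key(ident)
        raw = self.client.get(key)
        if raw is None:
            return None
        hi = np.frombuffer(raw, dtype=">u2").astype(np.uint32)
        vec = (hi << 16).view(np.float32)          # BF16 bytes -> FP32
        if vec.size == 0 or not np.isfinite(vec).all():
            self.client.delete(key)                # bad data -> treat as miss
            return None
        self.client.expire(key, self.expire_time)  # sliding expiry: refresh TTL on hit
        return vec

    def set(self, ident: str, vec) -> None:
        bits = np.ascontiguousarray(vec, dtype=np.float32).view(np.uint32)
        raw = (bits >> 16).astype(">u2").tobytes() # FP32 -> BF16 bytes, big-endian
        self.client.setex(self._key(ident), self.expire_time, raw)
```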
    tangwang
     

10 Mar, 2026

1 commit


09 Mar, 2026

2 commits


07 Mar, 2026

1 commit


22 Dec, 2025

1 commit


20 Dec, 2025

1 commit


19 Dec, 2025

1 commit


18 Dec, 2025

1 commit

  • Two indexing features:
    1. Multilingual support. If the shop's configured language is not zh, call translation to get a Chinese result; likewise, if it is not en, also get an English result.
    Cache the results in Redis: check the cache first, and only call translation on a miss, then store the result back in Redis.
    This logic belongs inside @query/translator.py; callers should not have to care. It currently uses DictCache; replace that directly with a Redis cache.
    
    2. Fill the title embedding field. If title vectorization is enabled for the shop, request the embedding model with the English title to get the embedding. Use BgeEncoder.
    
    For both modules, cache expiry is based on time since last access.
    
    feat:
    1. Updated the REDIS_CONFIG configuration
    Added the user-provided settings (snapshot_db, translation_cache_expire_days, translation_cache_prefix, etc.) to config/env_config.py
    2. Modified query/translator.py
    Replaced DictCache with a Redis cache
    Implemented the translate_for_indexing method, which handles multilingual translation automatically:
    - If the shop language is not zh, translate to zh
    - If the shop language is not en, translate to en
    Translation logic is encapsulated inside translator.py; callers do not need to care
    3. Modified embeddings/text_encoder.py
    Added a Redis cache to BgeEncoder
    Implemented a sliding expiry policy (the expiry time is reset on each access)
    The cache logic follows the provided CacheManager object
    4. Modified indexer/document_transformer.py
    Added encoder and enable_title_embedding parameters
    Implemented the _fill_title_embedding method, which generates the embedding from the English title (title_en)
    Updated the _fill_text_fields method to use the new translate_for_indexing method
    5. Updated indexer/indexing_utils.py
    Updated the create_document_transformer function to accept the new encoder and enable_title_embedding parameters
    If title vectorization is enabled and no encoder is provided, a BgeEncoder is initialized automatically
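    The cache-first translation flow above can be sketched roughly as follows (hypothetical shape: `redis_client`, `call_translator`, and the key layout are assumptions, not the actual query/translator.py API):

```python
def translate_for_indexing(text, shop_lang, redis_client, call_translator,
                           prefix="translation", expire=30 * 86400):
    """Return {lang: text} for the shop language plus zh and en.

    Cache-first: Redis hit refreshes the TTL (sliding expiry); a miss
    calls the translator and stores the result with setex.
    """
    results = {shop_lang: text}
    for target in ("zh", "en"):
        if shop_lang == target:
            continue
        key = f"{prefix}:{shop_lang}:{target}:{text}"
        cached = redis_client.get(key)
        if cached is not None:
            redis_client.expire(key, expire)   # hit -> refresh sliding TTL
            results[target] = cached
            continue
        translated = call_translator(text, target)
        redis_client.setex(key, expire, translated)  # miss -> translate, store
        results[target] = translated
    return results
```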
    tangwang
     

14 Nov, 2025

1 commit


11 Nov, 2025

1 commit

  • ## 🎯 Major Features
    - Request context management system for complete request visibility
    - Structured JSON logging with automatic daily rotation
    - Performance monitoring with detailed stage timing breakdowns
    - Query analysis result storage and intermediate result tracking
    - Error and warning collection with context correlation
    
    ## 🔧 Technical Improvements
    - **Context Management**: Request-level context with reqid/uid correlation
    - **Performance Monitoring**: Automatic timing for all search pipeline stages
    - **Structured Logging**: JSON format logs with request context injection
    - **Query Enhancement**: Complete query analysis tracking and storage
    - **Error Handling**: Enhanced error tracking with context information
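    The context-injection pattern behind these items can be sketched with contextvars (illustrative names only, not the repo's actual API):

```python
import contextvars
import json
import logging

# Thread- and task-safe request context; each request sets its own copy.
request_ctx = contextvars.ContextVar("request_ctx", default={})

def set_request_context(reqid: str, uid: str) -> None:
    """Bind reqid/uid correlation fields for the current request."""
    request_ctx.set({"reqid": reqid, "uid": uid})

class ContextJsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, with the request context injected."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "msg": record.getMessage()}
        payload.update(request_ctx.get())   # reqid/uid ride along automatically
        return json.dumps(payload, ensure_ascii=False)
```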
    
    ## 🐛 Bug Fixes
    - Fixed DeepL API endpoint (paid vs free API confusion)
    - Fixed vector generation (GPU memory cleanup)
    - Fixed logger parameter passing format (reqid/uid handling)
    - Fixed translation and embedding functionality
    
    ## 🌟 API Improvements
    - Simplified API interface (8→5 parameters, 37.5% reduction)
    - Made internal functionality transparent to users
    - Added performance info to API responses
    - Enhanced request correlation and tracking
    
    ## 📁 New Infrastructure
    - Comprehensive test suite (unit, integration, API tests)
    - CI/CD pipeline with automated quality checks
    - Performance monitoring and testing tools
    - Documentation and example usage guides
    
    ## 🔒 Security & Reliability
    - Thread-safe context management for concurrent requests
    - Automatic log rotation and structured output
    - Error isolation with detailed context information
    - Complete request lifecycle tracking
    
    🤖 Generated with Claude Code
    
    Co-Authored-By: Claude <noreply@anthropic.com>
    tangwang
     

08 Nov, 2025

1 commit