14 Nov, 2025

3 commits


13 Nov, 2025

2 commits

  • tangwang
     
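  • Main changes:
    1. Removed the configurable data-source/application schema. The index is now designed only for the Shoplazza SPU and SKU tables, and the ingestion flow is hard-coded (it only serves testing; an outer application will later handle full and incremental ingestion). The search system focuses on adapting to external search requirements.
    There are currently two ingestion scripts: the earlier one, and the new one that reads the two Shoplazza tables (SKU and SPU) and organizes docs per SPU.
       - Configuration covers only ES search settings, improving maintainability
       - Created a base config (shared Shoplazza configuration)
    
    2. Index structure refactor (SPU level)
       - All customers share the search_products index, isolated by tenant_id
       - Support a nested variants field (array of SKU variants)
       - Created SPUTransformer for SPU data transformation
    
    3. API response format improvements
       - Define a stable search-result format instead of exposing the ES doc structure directly (_id, _score, and the fields inside _source)
       - Added ProductResult and VariantResult models
       - Added suggestions and related_searches fields (reserved interface, logic not yet implemented)
    
    4. Data import flow
       - Created the Shoplazza data import script (ingest_shoplazza.py)
       - The pipeline layer decides the data source; configuration contains no data-source information
       - Created test-data generation and import scripts
    
    5. Documentation updates
       - Updated the design docs to reflect the new architecture
       - Created the BASE_CONFIG_GUIDE.md usage guide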
    tangwang
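The SPU-level document layout described in this commit (one doc per SPU, a `tenant_id` field for isolation in the shared `search_products` index, and a `variants` array of SKUs) could be sketched roughly as follows. This is only an illustration, not the actual `SPUTransformer`: the function name `build_spu_docs`, the row keys (`id`, `spu_id`, `title`, `price`) and the `_id` scheme are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_spu_docs(
    tenant_id: str,
    spu_rows: List[Dict[str, Any]],
    sku_rows: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Group SKU rows under their SPU and emit one index action per SPU.

    Row keys and the "_id" scheme are hypothetical, chosen for illustration.
    """
    # Index SKU rows by the SPU they belong to.
    skus_by_spu: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for sku in sku_rows:
        skus_by_spu[sku["spu_id"]].append(sku)

    docs = []
    for spu in spu_rows:
        # SKUs whose SPU is not in spu_rows are simply never visited.
        variants = [
            {"sku_id": s["id"], "title": s.get("title"), "price": s.get("price")}
            for s in skus_by_spu.get(spu["id"], [])
        ]
        docs.append({
            "_index": "search_products",         # shared by all tenants
            "_id": f"{tenant_id}_{spu['id']}",   # assumed: unique per tenant + SPU
            "_source": {
                "tenant_id": tenant_id,          # every query must filter on this
                "spu_id": spu["id"],
                "title": spu.get("title"),
                "variants": variants,            # mapped as a nested field
            },
        })
    return docs
```

Because all tenants share one index, every search request would also need a `{"term": {"tenant_id": ...}}` filter so that results never cross tenant boundaries.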
     

12 Nov, 2025

4 commits

  • tangwang
     
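  • Core changes:
    1. Fixed the facets type issue
       - Use Pydantic models consistently (FacetResult, FacetValue)
       - SearchResult.facets changed to List[FacetResult]
       - _standardize_facets builds Pydantic objects directly
    
    2. Fixed RangeFilter to support datetimes
       - RangeFilter fields changed to Union[float, str]
       - Supports numeric ranges and ISO datetime strings
       - Fixed the 422 error on the frontend listing time filter
    
    3. Refactored the ES query structure (core)
       - Wrap the whole query in function_score
       - Text and KNN clauses go in the inner bool.should (minimum_should_match=1)
       - Filters sit in the outer bool and apply to both the text and KNN queries
       - Added a recency weighting function (days_since_last_update<=30 weight:1.1)
    
    4. RankingEngine refactor
       - Renamed to RerankEngine (more accurate semantics)
       - Disabled by default (enabled=False)
       - Prefer ES function_score scoring
    
    5. Unified convention principle
       - Removed all dict-compatibility code
       - The whole system uses Pydantic models consistently
       - build_facets accepts only str or FacetConfig
       - _build_filters accepts RangeFilter models directly
    
    Modified files:
    - search/multilang_query_builder.py: refactored query-building logic
    - search/es_query_builder.py: unified Pydantic model support
    - search/searcher.py: uses RerankEngine, updated imports
    - search/rerank_engine.py: new (renamed from ranking_engine.py)
    - search/ranking_engine.py: deleted
    - search/__init__.py: updated exports
    - api/models.py: RangeFilter supports Union[float, str]
    
    Tests verified:
    ✓ Facets returned correctly
    ✓ Filter applies to both text and KNN
    ✓ Datetime range filtering works
    ✓ Function score recency weighting works
    ✓ All tests pass
    
    Architecture principle: one unified convention, no compatibility shims, keep it simple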
    tangwang
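The query shape described in this commit (a `function_score` wrapper, an outer `bool` carrying the filters, an inner `bool.should` holding the text and KNN clauses, and a recency weight) might look roughly like the sketch below. It is not the code in `multilang_query_builder.py`: the helper names `range_clause` and `build_query`, the field names `title`, `description` and `title_embedding`, and `num_candidates=100` are assumptions, and it presumes an Elasticsearch version that accepts `knn` as a query clause and an indexed `days_since_last_update` field.

```python
from typing import Any, Dict, List, Optional, Union

# A range bound may be a number or an ISO datetime string,
# mirroring the Union[float, str] change to RangeFilter.
RangeValue = Union[float, str]


def range_clause(
    field: str,
    gte: Optional[RangeValue] = None,
    lte: Optional[RangeValue] = None,
) -> Dict[str, Any]:
    """Build an ES range filter, omitting bounds that were not supplied."""
    bounds = {k: v for k, v in (("gte", gte), ("lte", lte)) if v is not None}
    return {"range": {field: bounds}}


def build_query(
    text: str, vector: List[float], filters: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Assemble the request body; all field names here are hypothetical."""
    # Inner bool: a document must match the text clause, the KNN clause, or both.
    recall = {
        "bool": {
            "should": [
                {"multi_match": {"query": text, "fields": ["title", "description"]}},
                {"knn": {"field": "title_embedding", "query_vector": vector,
                         "num_candidates": 100}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "function_score": {
                # Outer bool: filters constrain text and KNN matches alike.
                "query": {"bool": {"must": [recall], "filter": filters}},
                # Recency boost: docs updated within 30 days get weight 1.1.
                "functions": [
                    {"filter": {"range": {"days_since_last_update": {"lte": 30}}},
                     "weight": 1.1}
                ],
                "boost_mode": "multiply",
            }
        }
    }
```

A call such as `build_query("shoes", vec, [range_clause("listing_time", gte="2025-11-01T00:00:00Z"), range_clause("price", lte=99.5)])` mixes a datetime bound and a numeric bound in the same filter list, which is the case the RangeFilter fix targets.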
     
  • tangwang
     
  • tangwang
     

11 Nov, 2025

4 commits

  • tangwang
     
  • tangwang
     
  • tangwang
     
  • ## 🎯 Major Features
    - Request context management system for complete request visibility
    - Structured JSON logging with automatic daily rotation
    - Performance monitoring with detailed stage timing breakdowns
    - Query analysis result storage and intermediate result tracking
    - Error and warning collection with context correlation
    
    ## 🔧 Technical Improvements
    - **Context Management**: Request-level context with reqid/uid correlation
    - **Performance Monitoring**: Automatic timing for all search pipeline stages
    - **Structured Logging**: JSON format logs with request context injection
    - **Query Enhancement**: Complete query analysis tracking and storage
    - **Error Handling**: Enhanced error tracking with context information
    
    ## 🐛 Bug Fixes
    - Fixed DeepL API endpoint (paid vs free API confusion)
    - Fixed vector generation (GPU memory cleanup)
    - Fixed logger parameter passing format (reqid/uid handling)
    - Fixed translation and embedding functionality
    
    ## 🌟 API Improvements
    - Simplified API interface (8→5 parameters, 37.5% reduction)
    - Made internal functionality transparent to users
    - Added performance info to API responses
    - Enhanced request correlation and tracking
    
    ## 📁 New Infrastructure
    - Comprehensive test suite (unit, integration, API tests)
    - CI/CD pipeline with automated quality checks
    - Performance monitoring and testing tools
    - Documentation and example usage guides
    
    ## 🔒 Security & Reliability
    - Thread-safe context management for concurrent requests
    - Automatic log rotation and structured output
    - Error isolation with detailed context information
    - Complete request lifecycle tracking
    
    🤖 Generated with Claude Code
    
    Co-Authored-By: Claude <noreply@anthropic.com>
    tangwang
     

10 Nov, 2025

1 commit


08 Nov, 2025

2 commits