tangwang / SearchEngine

09 Dec, 2025

1 commit

f54b3854 pu_ids parameter. There are currently 3 parameters in total: ...

tenant_id
spu_ids
delete_spu_ids

For entries in spu_ids, if the is_delete field is 1, they must also be deleted on this side.
Entries in delete_spu_ids are deleted directly.
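The rule above can be sketched as a small planning step. This is a minimal sketch, not the repository's code: it assumes the rows for spu_ids have already been fetched as dicts with hypothetical "id" and "is_delete" keys, and it assumes that an ID listed in delete_spu_ids is never re-indexed even if it also appears in spu_ids.

```python
from typing import Iterable, List, Mapping, Tuple


def plan_incremental_update(
    spu_rows: Iterable[Mapping[str, object]],
    delete_spu_ids: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Split an incremental request into IDs to index and IDs to delete.

    spu_rows: rows loaded for the requested spu_ids (hypothetical shape:
              {"id": ..., "is_delete": 0 or 1}).
    delete_spu_ids: IDs that are deleted unconditionally.
    """
    to_delete: List[str] = []
    seen_deleted = set()

    def mark_deleted(spu_id: str) -> None:
        # Keep first-seen order and avoid duplicate delete operations.
        if spu_id not in seen_deleted:
            seen_deleted.add(spu_id)
            to_delete.append(spu_id)

    for spu_id in delete_spu_ids:
        mark_deleted(str(spu_id))

    to_index: List[str] = []
    for row in spu_rows:
        spu_id = str(row["id"])
        if row.get("is_delete") == 1:
            # Rows flagged as deleted are removed from the index as well.
            mark_deleted(spu_id)
        elif spu_id not in seen_deleted:
            # Assumption: an explicit delete wins over a re-index request.
            to_index.append(spu_id)
    return to_index, to_delete
```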

2025-12-09 11:03:47 +0800

08 Dec, 2025

2 commits

c55c5e47 feat: add incremental indexing endpoint and rename the indexing endpoints ...

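New features:
- Add the POST /indexer/index incremental indexing endpoint, which indexes incrementally by a list of SPU IDs
- Add the indexer/indexer_logger.py indexing log module, which records full and incremental indexing logs to logs/indexer.log (JSON format)
- Add the index_spus_to_es method to IncrementalIndexerService to implement incremental indexing

Endpoint renames:
- POST /indexer/bulk -> POST /indexer/reindex (full index rebuild)
- POST /indexer/incremental -> POST /indexer/index (incremental indexing)
- POST /indexer/spus -> POST /indexer/documents (document lookup)

Logging:
- Full and incremental indexing operations are both recorded in logs/indexer.log
- Logs request parameters, processing steps, ES write results, success/failure counts, and other key information
- Logs can be queried by index type, tenant ID, SPU ID, and other dimensions

Documentation:
- Update the API documentation with the new endpoint names and a description of the incremental indexing endpoint
- Add log query examples (using both grep and jq)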
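To illustrate the JSON log format and the "query by dimension" idea described above, here is a minimal sketch. The record keys (index_type, tenant_id, spu_ids, success, failed) and both function names are assumptions made for the example; the actual schema written by indexer/indexer_logger.py is not shown in this commit message.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def format_log_record(index_type: str, tenant_id: str, spu_ids: List[str],
                      success: int, failed: int) -> str:
    """Serialize one indexing event as a single JSON line (hypothetical schema)."""
    record = {
        "index_type": index_type,  # e.g. "full" or "incremental"
        "tenant_id": tenant_id,
        "spu_ids": spu_ids,
        "success": success,
        "failed": failed,
    }
    return json.dumps(record, ensure_ascii=False)


def filter_records(lines: Iterable[str], **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield parsed records whose top-level fields equal every given criterion."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record
```

In practice the lines would come from an open file handle on logs/indexer.log, for example `filter_records(open("logs/indexer.log", encoding="utf-8"), tenant_id="162")`.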

2025-12-08 12:14:38 +0800

3c1f8031 api/routes/indexer.py ...

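- New bulk indexing endpoint: POST /indexer/bulk - full indexing
- SPU endpoint improvement: POST /indexer/spus - supports batch retrieval of SPU documents (up to 100)

New full indexing service
indexer/bulk_indexing_service.py

docs/搜索API对接指南.md
  - New indexing endpoint documentation: detailed descriptions of the bulk indexing and SPU indexing endpoints
  - Request examples: complete curl command examples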

2025-12-08 09:41:34 +0800

07 Dec, 2025

1 commit

0064e946 feat: incremental indexing service, tenant configuration, and translation integration ...

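Main features:
1. Incremental data retrieval service
   - Add IncrementalIndexerService to fetch the data for a single SPU
   - Add the /indexer/spu/{spu_id} API endpoint
   - Preload shared data such as category mappings at service startup
   - Extract SPUDocumentTransformer to unify the full and incremental transformation logic
   - Support language handling and translation based on tenant configuration

3. Tenant configuration system
   - Merge tenant configuration into the unified config file config/config.yaml
   - Each tenant can configure its primary language and translation options independently
   - Tenant 162 is configured with translation disabled (for testing)

4. Translation integration
   - Pass the translation prompt as the context parameter of the DeepL API
   - Support Chinese and English prompt configuration
   - Indexing scenario: synchronous translation, with caching
   - Query scenario: asynchronous translation, returns immediately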
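The indexing-side behaviour described above (per-tenant switch, synchronous call, cache) could look roughly like the sketch below. CachedTranslator, translate_fn, and the "enabled by default" rule are all assumptions for illustration; the real DeepL client and the config/config.yaml layout are not shown in this commit message, and the asynchronous query path is left out.

```python
from typing import Callable, Dict, Optional, Tuple


class CachedTranslator:
    """Synchronous translation with an in-memory cache (hypothetical helper)."""

    def __init__(self, translate_fn: Callable[[str, str], str],
                 tenant_translation_enabled: Dict[str, bool]) -> None:
        # translate_fn(text, target_lang) stands in for the real translation client.
        self._translate_fn = translate_fn
        self._enabled = tenant_translation_enabled
        self._cache: Dict[Tuple[str, str], str] = {}

    def translate_for_index(self, tenant_id: str, text: str,
                            target_lang: str) -> Optional[str]:
        # Assumption: tenants without an explicit entry have translation enabled.
        if not self._enabled.get(tenant_id, True):
            return None
        key = (text, target_lang)
        if key not in self._cache:
            # Blocking call: indexing waits for the result, then caches it.
            self._cache[key] = self._translate_fn(text, target_lang)
        return self._cache[key]
```

With `{"162": False}` as the tenant map, tenant 162 gets no translation while other tenants share the cache.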

Tests:
- Add indexer/test_indexing.py and query/test_translation.py
- Verify that translation is disabled for tenant 162
- Verify full and incremental indexing

2025-12-07 11:11:12 +0800

02 Dec, 2025

1 commit

33839b37 Option values participate in search: ...

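1. Added a config option, searchable_option_dimensions, which controls which of the child SKUs' option1_value, option2_value, and option3_value participate in retrieval (they are indexed, and the corresponding fields are included in the search fields at query time). The format is a list selecting one or more of the three.

2. The index @mappings/search_products.json needs 3 new fields, option1_values, option2_values, and option3_values, and the data ingestion (mysql->ES) module must be updated accordingly. Each field is a list obtained by extracting and deduplicating the child SKUs' option1_value, option2_value, and option3_value respectively.
Only the dimensions configured in searchable_option_dimensions are indexed. For example, with searchable_option_dimensions = ['option1'], only the option1 values are extracted, deduplicated, and indexed as a list; the other two fields are empty.

3. At query time, the index fields corresponding to searchable_option_dimensions are added to the multi_match fields with a weight of 0.5 (the weights of all fields are managed together in one place).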
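Item 3 above can be sketched as a query builder. This is an illustrative sketch only: build_multi_match and the base field "title_zh^3.0" are hypothetical, while the `field^boost` notation is standard Elasticsearch multi_match syntax.

```python
from typing import Any, Dict, Iterable, List


def build_multi_match(query: str, base_fields: Iterable[str],
                      searchable_option_dimensions: Iterable[str],
                      option_boost: float = 0.5) -> Dict[str, Any]:
    """Build a multi_match clause that includes the configured option fields."""
    fields: List[str] = list(base_fields)
    for dimension in searchable_option_dimensions:
        # "option1" -> "option1_values^0.5"
        fields.append(f"{dimension}_values^{option_boost}")
    return {"multi_match": {"query": query, "fields": fields}}
```

For example, `build_multi_match("red", ["title_zh^3.0"], ["option1"])` searches title_zh and option1_values, with the option field boosted at 0.5.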

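1. Config file changes (config/config.yaml)
✅ Added the searchable_option_dimensions option to spu_config, with a default of ['option1', 'option2', 'option3']
✅ Added 3 new field definitions: option1_values, option2_values, option3_values, of type KEYWORD with a weight of 0.5
✅ Added these 3 fields to the fields list of the default index domain so that they participate in search
2. ES index mapping changes (mappings/search_products.json)
✅ Added 3 new fields: option1_values, option2_values, option3_values, of type keyword
3. Config loader changes (config/config_loader.py)
✅ Added the searchable_option_dimensions field to the SPUConfig class
✅ Updated the config parsing logic to read searchable_option_dimensions
✅ Updated the logic that converts the config to a dict
4. Data ingestion changes (indexer/spu_transformer.py)
✅ Load the config at initialization to obtain searchable_option_dimensions
✅ Added logic to the _transform_spu_to_doc method:
Extract the option1, option2, and option3 values from all child SKUs
Deduplicate them and store them in option1_values, option2_values, option3_values
Use the config to decide which fields actually receive data (unconfigured fields are written as empty arrays)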
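The extraction step described for _transform_spu_to_doc can be sketched as follows. This is a minimal sketch under assumptions: extract_option_values is a hypothetical standalone function, SKUs are plain dicts, and deduplication keeps first-seen order (the commit does not say whether order matters).

```python
from typing import Dict, Iterable, List, Mapping

ALL_DIMENSIONS = ("option1", "option2", "option3")


def extract_option_values(
    skus: Iterable[Mapping[str, object]],
    searchable_option_dimensions: Iterable[str],
) -> Dict[str, List[object]]:
    """Collect deduplicated option values per dimension for one SPU document."""
    sku_list = list(skus)
    enabled = set(searchable_option_dimensions)
    doc_fields: Dict[str, List[object]] = {}
    for dimension in ALL_DIMENSIONS:
        values: List[object] = []
        if dimension in enabled:
            seen = set()
            for sku in sku_list:
                value = sku.get(f"{dimension}_value")
                # Skip missing/empty values and anything already collected.
                if value and value not in seen:
                    seen.add(value)
                    values.append(value)
        # Dimensions that are not configured are written as empty arrays.
        doc_fields[f"{dimension}_values"] = values
    return doc_fields
```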


2025-12-02 18:35:50 +0800

01 Dec, 2025

1 commit

99bea633 add logs

tangwang
2025-12-01 15:21:22 +0800

27 Nov, 2025

1 commit

ca91352a Update documentation ...

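1. 搜索API对接指南.md
Added an explanation of nested specifications filtering to the "Exact-match filters" section
Supports filtering by a single specification and by multiple specifications (OR logic)
Expanded the specifications facet description in the "Facet configuration" section
Added two facet modes: all specification names and specified specification names
Added scenarios 5-8 to the "Common scenario examples" section, with complete examples of specification filtering and faceting
2. 搜索API速查表.md
Added a quick reference for specifications filtering to the "Exact-match filtering" section
Added a quick reference for specifications faceting to the "Faceted search" section
Updated the complete example to include the use of specifications
3. Search-API-Examples.md
Added examples 4-6 to the "Using filters" section, demonstrating specifications filtering
Added examples 2-3 to the "Faceted search" section, demonstrating specifications faceting
Updated the complete Python and JavaScript examples to include the use of specifications
Added scenario 2.1 to the "Common use cases" section, showing a search results page with specification filtering
4. 索引字段说明v2.md
Updated the query examples for the specifications field, covering the API format and the ES query structure
Added descriptions and examples of the two facet modes
Updated the "Facet fields" description to state explicitly that faceting on specified specification names is supported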

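5. Additional parameter
Parameter description: sku_filter_dimension is an optional parameter used to filter the SKUs under each SPU by the specified dimension
Supported dimensions:
Direct option fields: option1, option2, option3
Specification names: matched via option1_name, option2_name, option3_name (e.g. color, size)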
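One way to read the two kinds of supported dimensions is as a lookup that maps sku_filter_dimension to an option slot. The sketch below is an assumption-laden illustration: resolve_option_field is a hypothetical helper, the SPU is a plain dict, and case-insensitive name matching is a guess rather than documented behaviour.

```python
from typing import Mapping, Optional

OPTION_FIELDS = ("option1", "option2", "option3")


def resolve_option_field(spu: Mapping[str, object],
                         sku_filter_dimension: str) -> Optional[str]:
    """Map a sku_filter_dimension value to "option1", "option2", or "option3"."""
    # Case 1: the dimension names an option field directly.
    if sku_filter_dimension in OPTION_FIELDS:
        return sku_filter_dimension
    # Case 2: the dimension is a specification name such as "color" or "size",
    # matched against option1_name / option2_name / option3_name.
    wanted = sku_filter_dimension.strip().lower()
    for field in OPTION_FIELDS:
        name = spu.get(f"{field}_name")
        if isinstance(name, str) and name.strip().lower() == wanted:
            return field
    return None  # no matching dimension for this SPU
```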

2025-11-27 12:13:55 +0800

26 Nov, 2025

1 commit

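577ec972 Adapt the fields and format returned to the frontend. Mainly covers field configuration, handling of an additional frontend language field for choosing among title_en, title_zh, and similar per-language fields, and extraction of facet information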

tangwang
2025-11-26 22:35:07 +0800

13 Nov, 2025

3 commits

37e994bb Renaming and code cleanup

tangwang
2025-11-13 15:18:35 +0800

4d824a77 All tenants share one unified configuration. tenantID exists only at the request level; the service level has no tenantID-specific configuration. ...

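Create the unified config file config/config.yaml (migrated from the base config, with customer_name removed)

Create the script set
Scripts for starting, stopping, restarting, mocking data into mysql, and ingesting data from mysql into ES
restart.sh
run.sh  called internally to start the frontend and backend
scripts/mock_data.sh  mock data -> mysql
scripts/ingest.sh  mysql->ES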

2025-11-13 14:18:07 +0800

1f6d15fa Refactor: SPU-level indexing, unified index architecture, and API response format improvements ...

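Main changes:
1. Removed the configurable data-source and application structure. The index is designed only for Shoplazza's spu and sku tables, and the ingestion flow is hard-coded (it only serves testing needs; later the outer application will be responsible for full and incremental ingestion). The search system focuses on adapting to external search requirements.
There are currently two ingestion scripts: the earlier one, and the current one, which reads from the two Shoplazza tables (sku and spu) and builds docs per SPU.
   - Configuration covers only ES search settings, improving maintainability
   - Create a base config (general Shoplazza config)

2. Index structure refactor (SPU level)
   - All customers share the search_products index, isolated by tenant_id
   - Support a nested variants field (array of SKU variants)
   - Create SPUTransformer for SPU data transformation

3. API response format improvements
   - Define a standard format for search results instead of directly exposing the ES doc structure (_id, _score, and the fields inside _source)
   - Add the ProductResult and VariantResult models
   - Add the suggestions and related_searches fields (reserved; logic not yet implemented)

4. Data import flow
   - Create the Shoplazza data import script (ingest_shoplazza.py)
   - The pipeline layer decides the data source; the config contains no data-source information
   - Create test data generation and import scripts

5. Documentation updates
   - Update the design docs to reflect the new architecture
   - Create the BASE_CONFIG_GUIDE.md usage guide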
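The idea in item 3, returning an agreed result shape rather than the raw ES hit, can be sketched with plain dataclasses. Only the class names ProductResult and VariantResult come from the commit message; every field name here (spu_id, title, relevance_score, sku_id, price), the hit_to_product helper, and the use of dataclasses in place of the project's model classes are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VariantResult:
    sku_id: str
    price: Optional[float] = None


@dataclass
class ProductResult:
    spu_id: str
    title: Optional[str] = None
    relevance_score: float = 0.0
    variants: List[VariantResult] = field(default_factory=list)


def hit_to_product(hit: Dict[str, Any]) -> ProductResult:
    """Convert one raw ES hit into the public result shape."""
    source = hit.get("_source") or {}
    variants = [
        VariantResult(sku_id=str(v.get("sku_id")), price=v.get("price"))
        for v in source.get("variants", [])
    ]
    return ProductResult(
        spu_id=str(hit["_id"]),
        title=source.get("title"),
        # _score can be None (for example when sorting by a field).
        relevance_score=float(hit.get("_score") or 0.0),
        variants=variants,
    )
```

Callers then see only the agreed fields, so the index mapping can change without breaking the response contract.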

2025-11-13 11:42:27 +0800

12 Nov, 2025

1 commit

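6aa246be Problem: Pydantic should convert dicts to models automatically, but if the dict structure does not fully match or validation fails, fields may end up empty or validation errors may be ignored.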

tangwang
2025-11-12 11:21:41 +0800

11 Nov, 2025

3 commits

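1f071951 Add debugging information, recording per-stage details such as the query analysis result, the retrieval expression, per-stage timings, and the retrieval expression sent to ES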

tangwang
2025-11-11 22:39:15 +0800

c86c8237 Support aggregations. Added logic for filter items, but it still has problems

tangwang
2025-11-11 20:46:04 +0800

16c42787 feat: implement request-scoped context management with structured logging ...

## 🎯 Major Features
- Request context management system for complete request visibility
- Structured JSON logging with automatic daily rotation
- Performance monitoring with detailed stage timing breakdowns
- Query analysis result storage and intermediate result tracking
- Error and warning collection with context correlation

## 🔧 Technical Improvements
- **Context Management**: Request-level context with reqid/uid correlation
- **Performance Monitoring**: Automatic timing for all search pipeline stages
- **Structured Logging**: JSON format logs with request context injection
- **Query Enhancement**: Complete query analysis tracking and storage
- **Error Handling**: Enhanced error tracking with context information

## 🐛 Bug Fixes
- Fixed DeepL API endpoint (paid vs free API confusion)
- Fixed vector generation (GPU memory cleanup)
- Fixed logger parameter passing format (reqid/uid handling)
- Fixed translation and embedding functionality

## 🌟 API Improvements
- Simplified API interface (8→5 parameters, 37.5% reduction)
- Made internal functionality transparent to users
- Added performance info to API responses
- Enhanced request correlation and tracking

## 📁 New Infrastructure
- Comprehensive test suite (unit, integration, API tests)
- CI/CD pipeline with automated quality checks
- Performance monitoring and testing tools
- Documentation and example usage guides

## 🔒 Security & Reliability
- Thread-safe context management for concurrent requests
- Automatic log rotation and structured output
- Error isolation with detailed context information
- Complete request lifecycle tracking
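## 🧪 Illustrative Sketch
A request-scoped context with stage timing, as described above, might look like the following. This is a sketch under assumptions, not the code from this commit: it uses Python's standard `contextvars` module as one possible way to get per-request isolation, and the names RequestContext, start_request, get_context, and stage are hypothetical.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class RequestContext:
    reqid: str
    uid: Optional[str] = None
    stage_ms: Dict[str, float] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)


# Each thread or asyncio task sees its own value of this variable.
_current = contextvars.ContextVar("request_context", default=None)


def start_request(uid: Optional[str] = None) -> RequestContext:
    """Create a context for the current request and make it the active one."""
    ctx = RequestContext(reqid=uuid.uuid4().hex[:8], uid=uid)
    _current.set(ctx)
    return ctx


def get_context() -> Optional[RequestContext]:
    return _current.get()


@contextmanager
def stage(name: str) -> Iterator[None]:
    """Record the elapsed time of one pipeline stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        ctx = _current.get()
        if ctx is not None:
            ctx.stage_ms[name] = (time.perf_counter() - start) * 1000.0
```

A handler would call `start_request(uid=...)` once, wrap each step in `with stage("query_parsing"): ...`, and attach `ctx.reqid` and `ctx.stage_ms` to its JSON log lines.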

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-11 12:29:10 +0800

10 Nov, 2025

2 commits

bb3c5ef8 Data ingestion flow now runs end to end

tangwang
2025-11-10 23:11:40 +0800

a406638e up

tangwang
2025-11-10 15:35:42 +0800

08 Nov, 2025

1 commit

be52af70 first commit

tangwang
2025-11-08 00:07:09 +0800