tangwang / SearchEngine

05 Dec, 2025

1 commit

8c503501 Add Alibaba Cloud-based embedding

tangwang
2025-12-05 16:58:11 +0800

29 Nov, 2025

1 commit

a10a89a3 Build test data for testing facets on category and three attributes.

tangwang
2025-11-29 09:53:31 +0800

28 Nov, 2025

1 commit

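acf1349c Script for generating fake bulk-import data (multi-variant) ...

Script: scripts/csv_to_excel_multi_variant.py

Main features:
Single-variant products (type S) - 30%
   - Product attribute is S
   - option1/option2/option3 are left empty
   - Contains all product information (title, description, price, inventory, etc.)
Multi-variant products (type M+P) - 70%
   - M row (product master):
      - Product attribute is M
      - Filled with product master information (title, description, SEO, category, etc.)
      - option1="color", option2="size", option3="material"
      - Price, inventory, SKU and other variant-level information are left empty
   - P rows (variants):
      - Product attribute is P
      - Product title is identical to the M row
      - option1/2/3 hold concrete values (the Cartesian product of color, size, and material)
      - Each SKU has its own price, inventory, SKU code, etc.
Generation rules for multi-variant products:
   - Color: randomly pick 2-10 values from color1 to color30
   - Size: randomly pick 4-8 values from 1-30
   - Material: taken from the last string after splitting the product title on spaces (special characters removed)
   - Cartesian product: generate a P row for every combination (for example, 3 colors × 5 sizes × 1 material = 15 SKUs)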
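
The generation rules above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/csv_to_excel_multi_variant.py: the function name `build_variant_rows`, the row keys (`type`, `title`, `option1`...), and the reading of "special characters" as non-word characters are all assumptions.

```python
import itertools
import random
import re


def build_variant_rows(title: str, rng: random.Random) -> list[dict]:
    """Build one M row and its P rows for a multi-variant product.

    Hypothetical helper: row keys are illustrative, and the title is
    assumed to be non-empty.
    """
    # Color: 2-10 values sampled from color1..color30.
    colors = rng.sample([f"color{i}" for i in range(1, 31)], rng.randint(2, 10))
    # Size: 4-8 values sampled from 1..30.
    sizes = rng.sample(range(1, 31), rng.randint(4, 8))
    # Material: last whitespace-separated token of the title, with
    # non-word characters stripped (an assumed meaning of "special").
    material = re.sub(r"\W+", "", title.split()[-1])

    # The M row names the option dimensions but carries no SKU data.
    rows = [{"type": "M", "title": title,
             "option1": "color", "option2": "size", "option3": "material"}]
    # One P row per element of the Cartesian product.
    for color, size, mat in itertools.product(colors, sizes, [material]):
        rows.append({"type": "P", "title": title,
                     "option1": color, "option2": str(size), "option3": mat})
    return rows
```

Passing a seeded `random.Random` keeps the generated data reproducible between runs, which is convenient for facet tests that compare counts.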

2025-11-28 13:33:20 +0800

26 Nov, 2025

1 commit

bf89b597 feat(search): adapt engine to new SPU-level index, mapping and facets

tangwang
2025-11-26 21:18:58 +0800

25 Nov, 2025

1 commit

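59b0a342 Create hand-written mapping JSON ...

   - mappings/search_products.json - complete ES index configuration (settings + mappings)
   - Based on docs/索引字段说明v2-mapping结构.md
Simplify mapping_generator.py
   - Remove all config dependencies
   - Load directly from the JSON file with load_mapping()
   - Keep the utility functions: create_index_if_not_exists, delete_index_if_exists, update_mapping
Update the data import script
   - scripts/ingest_shoplazza.py - remove the ConfigLoader dependency
   - Use load_mapping() and DEFAULT_INDEX_NAME directly
Update the indexer module
   - indexer/__init__.py - update exports
   - indexer/bulk_indexer.py - simplify IndexingPipeline, remove the config dependency
Create query configuration constants
   - search/query_config.py - hard-coded field lists and configuration options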

Usage
Create the index:
    from indexer.mapping_generator import load_mapping, create_index_if_not_exists
    from utils.es_client import ESClient

    es_client = ESClient(hosts=["http://localhost:9200"])
    mapping = load_mapping()
    create_index_if_not_exists(es_client, "search_products", mapping)
Import data:
    python scripts/ingest_shoplazza.py \
        --db-host localhost \
        --db-database saas \
        --db-username root \
        --db-password password \
        --tenant-id "1" \
        --es-host http://localhost:9200 \
        --recreate

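Notes
Changing the mapping: edit mappings/search_products.json directly
Field mapping: hard-coded in spu_transformer.py and kept consistent with the mapping
config directory: kept but no longer used; can be cleaned up later
search module: still depends on config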
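
A loader in the spirit of the load_mapping() described above could look roughly like this. It is a minimal sketch and not the repository's implementation: the name `load_mapping_sketch`, the default path constant, and the sanity check on the `mappings` key are assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical default location, taken from the path named in the commit message.
DEFAULT_MAPPING_PATH = Path("mappings/search_products.json")


def load_mapping_sketch(path: Path = DEFAULT_MAPPING_PATH) -> dict:
    """Read an index definition (settings + mappings) from a JSON file."""
    with open(path, encoding="utf-8") as f:
        mapping = json.load(f)
    # Fail early if the file does not look like an index definition.
    if "mappings" not in mapping:
        raise ValueError(f"{path} has no top-level 'mappings' key")
    return mapping
```

Keeping the whole index definition in one JSON file means a mapping change is a plain file edit, with no configuration layer between the file and the index-creation call.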

2025-11-25 22:46:51 +0800

18 Nov, 2025

1 commit

15e63baf Revise index documentation

tangwang
2025-11-18 14:03:15 +0800

14 Nov, 2025

6 commits

cadc77b6 Align index field names, variable names, and API data-structure field names with the spu/sku tables

tangwang
2025-11-14 18:51:31 +0800
38f530ff Improve documentation

tangwang
2025-11-14 13:36:14 +0800
d586fd1f Fix fields in the test data ingestion for tenant=2

tangwang
2025-11-14 13:30:12 +0800
cd3799c6 tenant2, 10,000 test records: mock -> mysql and mysql -> ES OK, keyword search OK

tangwang
2025-11-14 11:57:48 +0800
8cff1628 tenant2, 10,000 test records: mock -> mysql and mysql -> ES OK

tangwang
2025-11-14 11:19:29 +0800
325eec03 1. Logging and configuration infrastructure, usage improvements ...

2. The vector service no longer runs local inference; it calls a network service instead

tangwang
2025-11-14 10:39:49 +0800

13 Nov, 2025

8 commits

a5a3856d Search over Shoplazza data: mock data -> mysql, mysql -> ES

tangwang
2025-11-13 15:44:25 +0800
ae5a294d Renaming and code cleanup

tangwang
2025-11-13 15:20:38 +0800
41e1f8df Search over Shoplazza data: mock data -> mysql, mysql -> ES

tangwang
2025-11-13 15:02:15 +0800
362d43b6 Search over Shoplazza data

tangwang
2025-11-13 14:43:48 +0800

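4d824a77 All tenants share one unified configuration. tenantID exists only at the request level; the service level has no tenantID-specific configuration. ...

Create the unified configuration file config/config.yaml (migrated from the base config, with customer_name removed)

Create the script set
Scripts for start, stop, restart, mocking data into mysql, and ingesting data from mysql into ES
restart.sh
run.sh  called internally; starts the frontend and backend
scripts/mock_data.sh  mock data -> mysql
scripts/ingest.sh  mysql -> ES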

2025-11-13 14:18:07 +0800

fb68a0ef Configuration improvements

tangwang
2025-11-13 12:34:24 +0800

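1852e3e3 Add Base config demo workflow and database configuration ...

Main changes:
1. Create the .env file and add the MySQL database configuration (Shoplazza production environment)
2. Update config/env_config.py to add DB_CONFIG
3. Create the demo_base.sh script with the complete demo workflow:
   - Generate test data
   - Import into MySQL
   - Import into Elasticsearch
   - Start the backend service
   - Start the frontend service
4. Create create_base_frontend.py to generate the frontend JS specific to the base config
5. Create frontend/base.html, the frontend page specific to the base config
6. Update frontend_server.py to support the base.html route and the PORT environment variable
7. Create stop_base.sh, a script that stops the demo services

Usage:
  bash scripts/demo_base.sh [tenant_id]

URL:
  http://localhost:6003/base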

2025-11-13 12:09:36 +0800

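1f6d15fa Refactor: SPU-level index, unified index architecture, and API response format improvements ...

Main changes:
1. Remove the configurable data-source and application structure. The index is designed only for Shoplazza's spu and sku tables, and the ingestion flow is hard-coded (it only serves testing needs; later an outer application will own full and incremental ingestion). The search system focuses mainly on adapting to external search requirements.
There are currently two ingestion scripts: the earlier one, and the current one that reads from the two Shoplazza tables (sku + spu) and organizes docs per SPU.
   - Configuration covers only ES search-related settings, which improves maintainability
   - Create the base config (common Shoplazza config)

2. Index structure refactor (SPU dimension)
   - All customers share the search_products index, isolated by tenant_id
   - Support a nested variants field (array of SKU variants)
   - Create SPUTransformer for SPU data transformation

3. API response format improvements
   - Define an agreed format for search results instead of exposing the ES doc structure directly (_id, _score, and the fields inside _source)
   - Add the ProductResult and VariantResult models
   - Add the suggestions and related_searches fields (reserved interface; logic not yet implemented)

4. Data import flow
   - Create the Shoplazza data import script (ingest_shoplazza.py)
   - The pipeline layer decides the data source; the config contains no data-source information
   - Create test data generation and import scripts

5. Documentation updates
   - Update the design docs to reflect the new architecture
   - Create the BASE_CONFIG_GUIDE.md usage guide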
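
The SPU-level document shape and the tenant_id isolation from item 2 above can be sketched as follows. This is an illustration under stated assumptions, not the repository's SPUTransformer: the function names `build_spu_docs` and `tenant_scoped_query`, the row keys (`id`, `spu_id`, `price`, `stock`, `title`), and the choice of `title` as the searched field are all hypothetical. Only the Elasticsearch `bool`, `match`, and `term` query clauses are standard query DSL.

```python
from collections import defaultdict


def build_spu_docs(spu_rows: list[dict], sku_rows: list[dict]) -> list[dict]:
    """Group SKU rows under their SPU and emit one document per SPU."""
    skus_by_spu = defaultdict(list)
    for sku in sku_rows:
        skus_by_spu[sku["spu_id"]].append(
            {"sku_id": sku["id"], "price": sku["price"], "stock": sku["stock"]}
        )
    return [
        {
            "tenant_id": spu["tenant_id"],
            "spu_id": spu["id"],
            "title": spu["title"],
            # SKU variants travel with the SPU document as an array.
            "variants": skus_by_spu.get(spu["id"], []),
        }
        for spu in spu_rows
    ]


def tenant_scoped_query(tenant_id: str, text: str) -> dict:
    """Wrap a text query in a bool query that filters on tenant_id."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                # Filter clauses restrict results without affecting scoring.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }
```

Because every tenant shares one index, the tenant filter has to be applied on every request; building it in one helper makes it harder to forget.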

2025-11-13 11:42:27 +0800

11 Nov, 2025

4 commits

a77693fe Restructure the config directory

tangwang
2025-11-11 22:49:06 +0800
a7a8c6cb Test filtering, aggregation, and sorting

tangwang
2025-11-11 22:09:21 +0800
a7653f3c Add supplementary scripts

tangwang
2025-11-11 16:34:18 +0800

16c42787 feat: implement request-scoped context management with structured logging ...

## 🎯 Major Features
- Request context management system for complete request visibility
- Structured JSON logging with automatic daily rotation
- Performance monitoring with detailed stage timing breakdowns
- Query analysis result storage and intermediate result tracking
- Error and warning collection with context correlation

## 🔧 Technical Improvements
- **Context Management**: Request-level context with reqid/uid correlation
- **Performance Monitoring**: Automatic timing for all search pipeline stages
- **Structured Logging**: JSON format logs with request context injection
- **Query Enhancement**: Complete query analysis tracking and storage
- **Error Handling**: Enhanced error tracking with context information

## 🐛 Bug Fixes
- Fixed DeepL API endpoint (paid vs free API confusion)
- Fixed vector generation (GPU memory cleanup)
- Fixed logger parameter passing format (reqid/uid handling)
- Fixed translation and embedding functionality

## 🌟 API Improvements
- Simplified API interface (8→5 parameters, 37.5% reduction)
- Made internal functionality transparent to users
- Added performance info to API responses
- Enhanced request correlation and tracking

## 📁 New Infrastructure
- Comprehensive test suite (unit, integration, API tests)
- CI/CD pipeline with automated quality checks
- Performance monitoring and testing tools
- Documentation and example usage guides

## 🔒 Security & Reliability
- Thread-safe context management for concurrent requests
- Automatic log rotation and structured output
- Error isolation with detailed context information
- Complete request lifecycle tracking

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-11 12:29:10 +0800

10 Nov, 2025

2 commits

bb3c5ef8 Data ingestion flow now runs end to end

tangwang
2025-11-10 23:11:40 +0800
a406638e up

tangwang
2025-11-10 15:35:42 +0800

08 Nov, 2025

2 commits

2a76641e config

tangwang
2025-11-08 08:47:33 +0800
115047ee Ingest test data for one tenant; instance startup code (frontend and backend)

tangwang
2025-11-08 00:24:52 +0800