ai-saas / saas-search

13 Mar, 2026

2 commits

d4cadc13 Translation refactor

tangwang
2026-03-13 20:28:08 +0800
985752f5 1. Frontend debugging features ...
```
2. Handling for translation rate limiting (qwen-mt rate limits)
```
tangwang
2026-03-13 16:15:06 +0800

12 Mar, 2026

1 commit

5f7d7f09 性能测试报告.md (performance test report)

tangwang
2026-03-12 08:44:55 +0800

11 Mar, 2026

1 commit

28e57bb1 Logging system improvements

tangwang
2026-03-11 23:04:17 +0800

10 Mar, 2026

3 commits

ff9efda0 suggest

tangwang
2026-03-10 20:14:55 +0800

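bd96cead 1. Dynamic multilingual fields and unified strategy configuration ...

- Configuration now uses a "field base name + dynamic language suffix" scheme and no longer depends on the legacy `indexes`.
[config.yaml](/data/saas-search/config/config.yaml#L17)
- `search_fields` / `text_query_strategy` are now covered by strict validation and parsing.
[config_loader.py](/data/saas-search/config/config_loader.py#L254)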

2. Query language plan and translation wait strategy
- `QueryParser` now produces
  `query_text_by_lang`, `search_langs`, and `source_in_index_languages`.
[query_parser.py](/data/saas-search/query/query_parser.py#L41)
- Both translation paths you asked for are in place:
  - Source language not in the shop's `index_languages`: `translate_multi_async`, then wait on the
    future
  - Source language in `index_languages`: `translate_multi(...,
    async_mode=True)`, served from cache where possible
[query_parser.py](/data/saas-search/query/query_parser.py#L284)
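
The two translation paths can be sketched roughly as below. This is an illustration only, not the code in `query_parser.py`: `fake_translate`, `plan_translations`, the dict cache, and the timeout value are hypothetical stand-ins, and the real signatures of `translate_multi_async` / `translate_multi` are not shown in this log.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def fake_translate(text, target_lang):
    # Hypothetical stand-in for the real translation backend.
    return f"[{target_lang}] {text}"


def plan_translations(query, source_lang, index_languages, cache, executor, timeout=1.0):
    """Return {lang: query text} for every language that can be searched now."""
    targets = [lang for lang in index_languages if lang != source_lang]
    query_text_by_lang = {}

    if source_lang in index_languages:
        # Source language is indexed: search it immediately, use only cached
        # translations, and warm the cache in the background for the rest.
        query_text_by_lang[source_lang] = query
        for lang in targets:
            key = (query, lang)
            if key in cache:
                query_text_by_lang[lang] = cache[key]
                continue

            def _store(fut, key=key):
                if fut.exception() is None:
                    cache[key] = fut.result()

            executor.submit(fake_translate, query, lang).add_done_callback(_store)
    else:
        # Source language is not indexed: translations are required, so block
        # (up to the timeout) until the futures complete.
        futures = {lang: executor.submit(fake_translate, query, lang) for lang in targets}
        done, _ = wait(list(futures.values()), timeout=timeout)
        for lang, fut in futures.items():
            if fut in done and fut.exception() is None:
                cache[(query, lang)] = fut.result()
                query_text_by_lang[lang] = fut.result()

    return query_text_by_lang
```

In this sketch the languages that end up as keys of the returned dict play the role of `search_langs`; a translation that misses the cache or the timeout is simply left out of the current request.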

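3. Unified text strategy for ES queries (no AST branch)
- Primary recall builds `field.{lang}` dynamically from `search_langs`; translated languages are added as lower-weight
  `should` clauses.
[es_query_builder.py](/data/saas-search/search/es_query_builder.py#L454)
- The boolean AST path has been removed; only the unified text strategy remains.
[es_query_builder.py](/data/saas-search/search/es_query_builder.py#L185)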
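
A minimal sketch of what "build `field.{lang}` from `search_langs`, with translated languages as lower-weight `should` clauses" could look like. The base field names (`title`, `brief`), the boost value, and the use of `multi_match` are assumptions for illustration; the actual builder in `es_query_builder.py` may differ.

```python
def build_text_query(query_text_by_lang, search_langs, source_lang,
                     base_fields=("title", "brief"), translated_boost=0.6):
    """Build a bool query with one multi_match clause per searched language."""
    should = []
    for lang in search_langs:
        text = query_text_by_lang.get(lang)
        if not text:
            continue  # no query text for this language (e.g. translation not ready)
        clause = {
            "multi_match": {
                "query": text,
                # Field base name + dynamic language suffix, e.g. "title.en".
                "fields": [f"{base}.{lang}" for base in base_fields],
            }
        }
        if lang != source_lang:
            # Translated languages contribute with a lower weight (assumed value).
            clause["multi_match"]["boost"] = translated_boost
        should.append(clause)
    return {"bool": {"should": should, "minimum_should_match": 1}}
```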

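4. LanguageDetector improvements
- Upgraded from "Latin letters default to English" to script-first detection plus
  scoring for Latin-script languages (dictionary / diacritics / suffixes).
[language_detector.py](/data/saas-search/query/language_detector.py#L68)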
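
The "script first, then score Latin-script languages" idea can be illustrated with a small sketch. The hint tables, weights, and the mapping of Cyrillic to `ru` are deliberately tiny, hypothetical simplifications and not the contents of `language_detector.py`.

```python
import re

# Hypothetical, deliberately tiny hint tables; a real detector needs far larger ones.
LATIN_HINTS = {
    "en": {"words": {"the", "and", "for", "with"}, "marks": "", "suffixes": ("ing", "tion")},
    "es": {"words": {"el", "la", "para", "con"}, "marks": "ñáéíóú¿¡", "suffixes": ("ción", "mente")},
    "de": {"words": {"der", "die", "und", "mit"}, "marks": "äöüß", "suffixes": ("ung", "keit")},
    "fr": {"words": {"le", "les", "pour", "avec"}, "marks": "àâçèéêëîïôùû", "suffixes": ("eux", "ment")},
}


def detect_script(text):
    """Script pass: return a language for unambiguous scripts, else None."""
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:   # Hiragana / Katakana
            return "ja"
        if 0xAC00 <= code <= 0xD7AF:   # Hangul syllables
            return "ko"
        if 0x0400 <= code <= 0x04FF:   # Cyrillic (simplified to Russian here)
            return "ru"
        if 0x0600 <= code <= 0x06FF:   # Arabic
            return "ar"
    if any(0x4E00 <= ord(ch) <= 0x9FFF for ch in text):  # CJK ideographs, no kana seen
        return "zh"
    return None


def detect_language(text, default="en"):
    script_lang = detect_script(text)
    if script_lang:
        return script_lang
    lowered = text.lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    scores = {}
    for lang, hints in LATIN_HINTS.items():
        score = 2 * sum(w in hints["words"] for w in words)       # dictionary hits
        score += 2 * sum(ch in hints["marks"] for ch in lowered)  # diacritics
        score += sum(w.endswith(hints["suffixes"]) for w in words)  # suffixes
        scores[lang] = score
    best = max(scores, key=scores.get)
    # Fall back to the default only when no hint fired at all.
    return best if scores[best] > 0 else default
```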

5. Boolean capability cleanup (addendum)
- Deleted the deprecated module:
[boolean_parser.py](/data/saas-search/search/boolean_parser.py)
- `search/__init__` no longer has any related exports.
[search/__init__.py](/data/saas-search/search/__init__.py)

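6. Retiring the obsolete `indexes` (addendum)
- Compatibility functions are now generated from the dynamic fields and no longer depend on `config.indexes`.
[utils.py](/data/saas-search/config/utils.py#L24)
- The Admin config endpoint now returns the dynamic field configuration and no longer exposes `num_indexes`.
[admin.py](/data/saas-search/api/routes/admin.py#L52)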

7. suggest

2026-03-10 16:06:31 +0800

26b910bd refactor service init and tighten multi-tenant search contracts

tangwang
2026-03-10 13:09:24 +0800

02 Mar, 2026

2 commits

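316c97c4 feat: fully deliver the multi-tenant suggestion capability ...

- Add the suggestion module (mapping/builder/service), supporting per-tenant `search_suggestions_tenant_{tenant_id}` indexes
- Add the `main.py build-suggestions` CLI and `scripts/build_suggestions.sh`, supporting full builds from product title/qanchors and the last 365 days of search logs
- Implement the `/search/suggestions` endpoint (multilingual + direct results) and switch frontend autocomplete to the new backend API
- Add `README` / `RUNBOOK` / `TROUBLESHOOTING` docs for suggestion, and update the search API integration guide and cheat sheet
- Add `tests/test_suggestions.py` unit tests covering language resolution and the SuggestionService query flow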

Made-with: Cursor

2026-03-02 22:21:19 +0800

ded6f29e Add the suggestion module ...

- New `suggestion` module:
  - `suggestion/mapping.py`: generates the `search_suggestions` mapping (multilingual `completion` + `search_as_you_type`)
  - `suggestion/builder.py`: full-build program (scans `title/qanchors` in `search_products` + the MySQL table `shoplazza_search_log`)
  - `suggestion/service.py`: online query service (suggestion retrieval + a second product query for direct results)
  - `suggestion/__init__.py`

- Wired into API service initialization:
  - `api/app.py` adds `SuggestionService` initialization and `get_suggestion_service()`

- Endpoint implementation:
  - `GET /search/suggestions` in `api/routes/search.py` changed from an empty stub to a real call
  - Supported parameters:
    - `q`, `size`, `language`
    - `with_results` (whether to return products directly)
    - `result_size` (number of products per suggestion)
    - `debug`
  - Still requires `X-Tenant-ID` (or `tenant_id` in the query string)

- Model additions:
  - `api/models.py` adds suggestion request/response fields (`language`, `resolved_language`, `with_results`, `result_size`)

- CLI full-build command:
  - `main.py` adds `build-suggestions`
  - Usage:
    - `python main.py build-suggestions --tenant-id 1 --recreate`
    - Optional: `--days 30 --batch-size 500 --min-query-len 1 --es-host ...`
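
For orientation, a multilingual `completion` + `search_as_you_type` mapping of the kind attributed to `suggestion/mapping.py` might be generated along these lines. Only the two Elasticsearch field types come from the commit message; every field name here (`completion`, `sat`, `text_norm`, `score`, and so on) and the function name are hypothetical.

```python
def build_suggestion_mapping(languages):
    """Return an index body with one completion and one search_as_you_type
    sub-field per language (hypothetical field layout)."""
    properties = {
        "tenant_id": {"type": "keyword"},
        "lang": {"type": "keyword"},
        "text_norm": {"type": "keyword"},
        "score": {"type": "float"},
        "completion": {"properties": {}},  # object field holding per-language completion fields
        "sat": {"properties": {}},         # object field holding per-language search_as_you_type fields
    }
    for lang in languages:
        properties["completion"]["properties"][lang] = {"type": "completion"}
        properties["sat"]["properties"][lang] = {"type": "search_as_you_type"}
    return {"mappings": {"properties": properties}}
```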

---

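Key implementation logic (already implemented)

- Language attribution priority (as you requested):
  - `shoplazza_search_log.language` > `request_params.language` > script/model fallback
- Candidate aggregation key:
  - `(tenant_id, lang, text_norm)` (unique per document)
- Scoring:
  - Offline score based on `query_count_30d/7d + qanchor_doc_count + title_doc_count`
- Direct results:
  - For each suggestion, run a second query against `search_products_tenant_{id}` (combining `qanchors/title`)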
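
The language priority, aggregation key, and offline score above could be sketched as follows. The normalization rule, the source labels, and in particular the weights and the `log1p` damping are assumptions; the log names the inputs to the score but not the formula.

```python
import math
from collections import defaultdict


def resolve_language(log_language, request_language, fallback):
    """Pick the first non-empty source, in the priority order listed above."""
    for candidate in (log_language, request_language):
        if candidate:
            return candidate
    return fallback  # result of script/model detection


def normalize(text):
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def aggregate(tenant_id, rows):
    """rows: iterable of (text, lang, source), where source is one of
    'query_30d', 'query_7d', 'qanchor', 'title' (hypothetical labels)."""
    counters = defaultdict(lambda: defaultdict(int))
    for text, lang, source in rows:
        key = (tenant_id, lang, normalize(text))  # one suggestion doc per key
        counters[key][source] += 1
    return counters


def offline_score(counts):
    # Illustrative weights only; recent queries are assumed to matter most.
    return (1.0 * math.log1p(counts.get("query_30d", 0))
            + 1.5 * math.log1p(counts.get("query_7d", 0))
            + 0.8 * math.log1p(counts.get("qanchor", 0))
            + 0.5 * math.log1p(counts.get("title", 0)))
```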

---

Changed files

- `api/app.py`
- `api/models.py`
- `api/routes/search.py`
- `main.py`
- `suggestion/__init__.py`
- `suggestion/mapping.py`
- `suggestion/builder.py`
- `suggestion/service.py`

2026-03-02 20:35:05 +0800

05 Feb, 2026

2 commits

ff32d894 rerank

tangwang
2026-02-05 16:13:46 +0800

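506c39b7 feat(search): unify rerank logic, controlled solely by ai_search and calling the external BGE rerank service ...

- API: add the request parameter ai_search; when enabled, requests within the window go through the rerank flow
- Config: RerankConfig drops enabled/expression/description, keeping only rerank_window plus
  service_url/timeout_sec/weight_es/weight_ai; default timeout 15s
- Rerank flow: when ai_search is set and from+size<=rerank_window, fetch the top rerank_window hits from ES,
  call the external /rerank service, fuse the ES and rerank scores, then paginate by from/size; otherwise no reranking
- search/rerank_client: new module wrapping build_docs, call_rerank_service,
  fuse_scores_and_resort, run_rerank; timeouts are caught separately and logged briefly
- search/searcher: remove RerankEngine, set enable_rerank=ai_search, use the config.rerank parameters
- Delete search/rerank_engine.py (local expression-based rerank); the external service is now the single implementation
- Docs: the search API integration guide now covers ai_search and relevance_score
- Tests: the rerank config in conftest is updated to the new structure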
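
The window check and score fusion described above might look roughly like this. The min-max normalization of ES scores, the linear combination, and the default weights are assumptions; the commit names `weight_es` / `weight_ai` but not the fusion formula, and the call to the external `/rerank` service is left out.

```python
def should_rerank(ai_search, from_, size, rerank_window):
    """Rerank only when ai_search is on and the requested page fits in the window."""
    return bool(ai_search) and from_ + size <= rerank_window


def min_max(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_page(hits, rerank_scores, from_, size, weight_es=0.4, weight_ai=0.6):
    """hits: list of (doc_id, es_score) for the top rerank_window results.
    rerank_scores: one score per hit, assumed to lie in [0, 1]."""
    if not hits:
        return []
    es_norm = min_max([score for _, score in hits])
    fused = [
        (doc_id, weight_es * es + weight_ai * ai)
        for (doc_id, _), es, ai in zip(hits, es_norm, rerank_scores)
    ]
    fused.sort(key=lambda item: item[1], reverse=True)
    # Pagination happens after re-sorting, so from/size index the fused order.
    return fused[from_:from_ + size]
```

Because pagination is applied after fusion, every page inside the window is cut from the same reordered list, which is why ES has to return the full `rerank_window` rather than just the requested page.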

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-05 14:13:41 +0800

26 Jan, 2026

1 commit

3cd09b3b Translation API now calls qwen-mt-flash ...
```
Docs: 翻译模块说明.md (translation module notes)
```
tangwang
2026-01-26 13:31:41 +0800

27 Dec, 2025

1 commit

e4a39cc8 Index isolation: each tenant_id uses its own index

tangwang
2025-12-27 15:02:31 +0800

01 Dec, 2025

1 commit

99bea633 add logs

tangwang
2025-12-01 15:21:22 +0800

27 Nov, 2025

1 commit

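ca91352a Update docs ...

1. 搜索API对接指南.md (search API integration guide)
   - Added nested specifications filtering to the "Exact-match filters" section
   - Supports filtering on a single specification and on multiple specifications (OR logic)
   - Expanded the specifications facet description in the "Facet configuration" section
   - Added two facet modes: all specification names, and specified specification names
   - Added scenarios 5-8 to the "Common scenario examples" section, with complete examples of specification filtering and faceting
2. 搜索API速查表.md (search API cheat sheet)
   - Added a quick reference for specifications filtering to the "Exact-match filtering" section
   - Added a quick reference for specifications facets to the "Faceted search" section
   - Updated the complete example to include specifications
3. Search-API-Examples.md
   - Added examples 4-6 to the "Using filters" section, showing specifications filtering
   - Added examples 2-3 to the "Faceted search" section, showing specifications facets
   - Updated the complete Python and JavaScript examples to include specifications
   - Added scenario 2.1 to the "Common usage scenarios" section, showing a search results page with specification filtering
4. 索引字段说明v2.md (index field reference v2)
   - Updated the query examples for the specifications field, covering the API format and the ES query structure
   - Added descriptions and examples of the two facet modes
   - Updated the "Facet fields" description to state that faceting on specified specification names is supported

5. Additional parameter
   - sku_filter_dimension is an optional parameter that filters the SKUs under each SPU by the specified dimension
   - Supported dimensions:
     - Direct option fields: option1, option2, option3
     - Specification names: matched via option1_name, option2_name, option3_name (e.g. color, size)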

2025-11-27 12:13:55 +0800

26 Nov, 2025

1 commit

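577ec972 Adapt the fields and format returned to the frontend. Mainly covers field configuration, a language field added by the frontend to choose among title_en, title_zh, etc., and extraction of facet information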

tangwang
2025-11-26 22:35:07 +0800

13 Nov, 2025

1 commit

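1f6d15fa Refactor: SPU-level indexing, unified index architecture, and API response format improvements ...

Main changes:
1. Removed the configurable data-source/application structure. Indexes are designed only for Shoplazza's spu and sku tables, and the ingestion flow is hard-coded (it only serves testing needs; later, the outer application will handle full and incremental ingestion). The search system focuses on adapting to external search requirements.
There are currently two ingestion scripts: the earlier one, and the current one, which reads from the two Shoplazza tables (sku + spu) and organizes docs per SPU.
   - Configuration covers only ES search settings, which improves maintainability
   - Created the base configuration (common Shoplazza configuration)

2. Index structure refactor (SPU level)
   - All customers share the search_products index, isolated by tenant_id
   - Supports a nested variants field (array of SKU variants)
   - Created SPUTransformer for SPU data transformation

3. API response format improvements
   - Define a standard search result format instead of exposing the ES doc structure directly (_id, _score, and the fields inside _source)
   - Added the ProductResult and VariantResult models
   - Added the suggestions and related_searches fields (reserved; logic not yet implemented)

4. Data import flow
   - Created the Shoplazza data import script (ingest_shoplazza.py)
   - The pipeline layer decides the data source; configuration contains no data-source information
   - Created test data generation and import scripts

5. Documentation updates
   - Updated the design docs to reflect the new architecture
   - Created the BASE_CONFIG_GUIDE.md usage guide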

2025-11-13 11:42:27 +0800

12 Nov, 2025

1 commit

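6aa246be Issue: Pydantic should convert dicts to models automatically, but if the dict structure does not fully match or validation fails, fields may end up empty or validation errors may be ignored.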

tangwang
2025-11-12 11:21:41 +0800

11 Nov, 2025

3 commits

1f071951 Add debug information covering each stage, e.g. query analysis results, the search expression, per-stage timings, and the ES search expression

tangwang
2025-11-11 22:39:15 +0800
c86c8237 Support aggregations. Added logic for filter items, but it still has problems

tangwang
2025-11-11 20:46:04 +0800

16c42787 feat: implement request-scoped context management with structured logging ...

## 🎯 Major Features
- Request context management system for complete request visibility
- Structured JSON logging with automatic daily rotation
- Performance monitoring with detailed stage timing breakdowns
- Query analysis result storage and intermediate result tracking
- Error and warning collection with context correlation

## 🔧 Technical Improvements
- **Context Management**: Request-level context with reqid/uid correlation
- **Performance Monitoring**: Automatic timing for all search pipeline stages
- **Structured Logging**: JSON format logs with request context injection
- **Query Enhancement**: Complete query analysis tracking and storage
- **Error Handling**: Enhanced error tracking with context information

## 🐛 Bug Fixes
- Fixed DeepL API endpoint (paid vs free API confusion)
- Fixed vector generation (GPU memory cleanup)
- Fixed logger parameter passing format (reqid/uid handling)
- Fixed translation and embedding functionality

## 🌟 API Improvements
- Simplified API interface (8→5 parameters, 37.5% reduction)
- Made internal functionality transparent to users
- Added performance info to API responses
- Enhanced request correlation and tracking

## 📁 New Infrastructure
- Comprehensive test suite (unit, integration, API tests)
- CI/CD pipeline with automated quality checks
- Performance monitoring and testing tools
- Documentation and example usage guides

## 🔒 Security & Reliability
- Thread-safe context management for concurrent requests
- Automatic log rotation and structured output
- Error isolation with detailed context information
- Complete request lifecycle tracking
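
As an illustration of request-scoped context with structured JSON logging, here is a minimal standard-library sketch. It is not the implementation from this commit: `start_request`, `end_request`, `ContextFilter`, `JsonFormatter`, and the logger name are hypothetical, and `contextvars` is one possible way to keep the context isolated between concurrent requests.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's context; each thread or async task sees its own value.
_request_ctx = contextvars.ContextVar("request_ctx", default=None)


def start_request(uid, reqid=None):
    ctx = {"reqid": reqid or uuid.uuid4().hex[:8], "uid": uid, "warnings": []}
    token = _request_ctx.set(ctx)
    return ctx, token


def end_request(token):
    _request_ctx.reset(token)


class ContextFilter(logging.Filter):
    """Copy reqid/uid from the current context onto every log record."""

    def filter(self, record):
        ctx = _request_ctx.get() or {}
        record.reqid = ctx.get("reqid", "-")
        record.uid = ctx.get("uid", "-")
        return True


class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "reqid": record.reqid,
            "uid": record.uid,
        })


def build_logger(stream):
    logger = logging.getLogger("search.sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(ContextFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo():
    buf = io.StringIO()
    log = build_logger(buf)
    _, token = start_request(uid="u42", reqid="abc123")
    log.info("query parsed")
    end_request(token)
    log.info("outside any request")
    return [json.loads(line) for line in buf.getvalue().splitlines()]
```

For daily rotation, the `StreamHandler` could be swapped for `logging.handlers.TimedRotatingFileHandler(path, when="midnight")` with the same filter and formatter attached.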

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-11 12:29:10 +0800

10 Nov, 2025

1 commit

bb3c5ef8 Data ingestion flow working end to end

tangwang
2025-11-10 23:11:40 +0800

08 Nov, 2025

1 commit

be52af70 first commit

tangwang
2025-11-08 00:07:09 +0800