From 133771994732aa04c41b87cac7bf9e08f9d7d483 Mon Sep 17 00:00:00 2001
From: tangwang
Date: Wed, 12 Nov 2025 21:27:07 +0800
Subject: [PATCH] API optimization

---
 .cursor/plans/spu-index-b5a93a00.plan.md |  302 ++++++
 SHOPLAZZA_INTEGRATION_GUIDE.md           | 3226 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 config/config_loader.py                  |    7 +
 search/es_query_builder.py               |   11 +-
 search/multilang_query_builder.py        |   13 +-
 search/searcher.py                       |    9 +-
 test_search_with_source_fields.py        |  147 +++
 test_source_fields.py                    |  132 ++
 当前开发进度.md                          |  536 ----------
 设计文档.md                              |  536 ++++++++++
 10 files changed, 4379 insertions(+), 540 deletions(-)
 create mode 100644 .cursor/plans/spu-index-b5a93a00.plan.md
 create mode 100644 SHOPLAZZA_INTEGRATION_GUIDE.md
 create mode 100644 test_search_with_source_fields.py
 create mode 100644 test_source_fields.py
 delete mode 100644 当前开发进度.md
 create mode 100644 设计文档.md

diff --git a/.cursor/plans/spu-index-b5a93a00.plan.md b/.cursor/plans/spu-index-b5a93a00.plan.md
new file mode 100644
index 0000000..693ae80
--- /dev/null
+++ b/.cursor/plans/spu-index-b5a93a00.plan.md
@@ -0,0 +1,302 @@
+
+# SPU-Level Indexing with Shoplazza API Format & 
BASE Configuration + +## Phase 1: Schema Analysis & Design + +### 1.1 Analyze SPU/SKU Fields for Indexing + +**SPU Fields** (from `shoplazza_product_spu`): + +- **Index + Store**: `id`, `shop_id`, `handle`, `title`, `brief`, `vendor`, `product_type`, `tags`, `category`, `image_src`, `published`, `published_at` +- **Index only**: `seo_title`, `seo_description`, `seo_keywords` +- **Store only**: `description` (HTML, no tokenization) + +**SKU Fields** (from `shoplazza_product_sku`, as nested array): + +- **Index + Store**: `id`, `title`, `sku`, `barcode`, `price`, `compare_at_price`, `option1`, `option2`, `option3`, `inventory_quantity`, `image_src` + +**Price Strategy (CONFIRMED - Option B)**: + +- Flatten `min_price`, `max_price`, `compare_at_price` at SPU level for fast filtering/sorting +- Keep full variant prices in nested array for display + +### 1.2 Design BASE Schema + +**Index**: `search_products` (shared by all tenants) + +**Key fields**: + +- `tenant_id` (KEYWORD, **REQUIRED**) - always filtered, never optional +- SPU-level flattened fields +- `variants` (NESTED array) - SKU data +- Flattened: `min_price`, `max_price`, `compare_at_price` +- Multi-language: `title_zh`, `title_en`, `title_ru`, etc. 
+- Embeddings: `title_embedding`, `image_embedding` + +## Phase 2: BASE Configuration (Universal Standard) + +### 2.1 Create BASE Config + +**File**: [`config/schema/base/config.yaml`](config/schema/base/config.yaml) + +**This is the universal configuration for ALL merchants using Shoplazza tables.** + +Key points: + +- Index name: `search_products` (shared) +- Required field: `tenant_id` (always filtered in queries) +- SPU-level fields with multi-language support +- Nested variants structure +- Flattened price fields: `min_price`, `max_price`, `compare_at_price` +- Function_score configuration + +### 2.2 Update Field Types & Mapping + +**Files**: [`config/field_types.py`](config/field_types.py), [`indexer/mapping_generator.py`](indexer/mapping_generator.py) + +- Add `NESTED` field type +- Handle nested mapping generation for variants +- Auto-generate flattened price fields + +## Phase 3: Data Ingestion for BASE + +### 3.1 SPU-Level Data Transformer + +**File**: [`indexer/spu_data_transformer.py`](indexer/spu_data_transformer.py) + +Features: + +- Load SPU from `shoplazza_product_spu` +- Join SKU from `shoplazza_product_sku` (grouped by spu_id) +- Create nested variants array +- Calculate `min_price`, `max_price`, `compare_at_price` +- Generate title & image embeddings +- Inject `tenant_id` from config + +### 3.2 Test Data Generator + +**File**: [`scripts/generate_shoplazza_test_data.py`](scripts/generate_shoplazza_test_data.py) + +Generate 100 SPU records with: + +- 10 categories, multiple vendors +- Multi-language (zh/en/ru) +- Price range: $5-$500 +- 1-5 variants per SPU (color, size options) +- Insert into MySQL Shoplazza tables + +### 3.3 BASE Ingestion Script + +**File**: [`scripts/ingest_base.py`](scripts/ingest_base.py) + +- Load from MySQL `shoplazza_product_spu` + `shoplazza_product_sku` +- Use `SPUDataTransformer` +- Index into `search_products` with configured `tenant_id` + +## Phase 4: Query Updates + +### 4.1 Query Builder Enhancements + +**File**: 
[`search/multilang_query_builder.py`](search/multilang_query_builder.py) + +- **Auto-inject `tenant_id` filter** (from config, always applied) +- Support nested queries for variants +- Use flattened price fields for filters: `min_price`, `max_price` + +### 4.2 Searcher Updates + +**File**: [`search/searcher.py`](search/searcher.py) + +- Enforce `tenant_id` filtering +- Handle nested inner_hits for variants + +## Phase 5: API Response Transformation + +### 5.1 Response Transformer + +**File**: [`api/response_transformer.py`](api/response_transformer.py) + +Transform ES response to Shoplazza format: + +- Extract variants from nested array +- Map fields: `product_id`, `title`, `handle`, `vendor`, `product_type`, `tags`, `price`, `variants`, etc. +- Calculate `in_stock` from variants + +### 5.2 Update API Models + +**File**: [`api/models.py`](api/models.py) + +New models: + +- `VariantOption`, `ProductVariant`, `ProductResult` +- Updated `SearchResponse` with `results: List[ProductResult]` +- Add placeholders: `suggestions: List[str] = []`, `related_searches: List[str] = []` + +### 5.3 Update Search Routes + +**File**: [`api/routes/search.py`](api/routes/search.py) + +- Use `ResponseTransformer` to convert ES hits +- Return new Shoplazza-compatible format +- **Ensure `tenant_id` is required in request** + +## Phase 6: Legacy Migration + +### 6.1 Rename Customer1 to Legacy + +- Rename [`config/schema/customer1/`](config/schema/customer1/) to [`config/schema/customer1_legacy/`](config/schema/customer1_legacy/) +- Update config to use old index `search_customer1` (preserve for backward compatibility) +- Mark as deprecated in comments + +### 6.2 Update Scripts for BASE + +- [`run.sh`](run.sh): Use BASE config, `search_products` index +- [`restart.sh`](restart.sh): Use BASE config +- [`test_all.sh`](test_all.sh): Test BASE config +- Legacy scripts: Rename with `_legacy` suffix (e.g., `run_legacy.sh`) + +### 6.3 Update Frontend + +**Files**: [`frontend/`](frontend/) HTML/JS 
files + +- Change index name references from `search_customer1` to `search_products` +- Use BASE config endpoints +- Archive old frontend as `frontend_legacy/` if needed + +## Phase 7: API Documentation Updates + +### 7.1 Update API Docs + +**File**: [`API_DOCUMENTATION.md`](API_DOCUMENTATION.md) + +**Critical additions**: + +- **Document `tenant_id` as REQUIRED parameter** in all search requests +- Explain that `tenant_id` filter is always applied +- Update all response examples to new Shoplazza format +- Document `suggestions` and `related_searches` (not yet implemented) +- Add nested variant query examples +- Multi-tenant isolation guarantees + +### 7.2 Update Request Models + +**File**: [`api/models.py`](api/models.py) + +Add `tenant_id` to `SearchRequest`: + +```python +class SearchRequest(BaseModel): + tenant_id: str = Field(..., description="租户ID (必需)") + query: str = Field(...) + # ... other fields +``` + +## Phase 8: Design Documentation + +### 8.1 Update Design Doc + +**File**: [`设计文档.md`](设计文档.md) + +Updates: + +- **索引粒度**: 改为 SPU 维度(非SKU) +- **统一索引**: 所有租户共用 `search_products`,通过 `tenant_id` 隔离 +- **BASE配置**: 说明BASE配置为通用标准,所有新商户使用 +- **API响应格式**: 采用 Shoplazza 标准格式 +- **Price扁平化**: 说明高频字段的性能优化策略 +- **Nested变体**: 详细说明 variants 数组结构 +- **Legacy配置**: customer1等为遗留配置,仅用于兼容 + +### 8.2 Create BASE Guide + +**File**: [`docs/BASE_CONFIG_GUIDE.md`](docs/BASE_CONFIG_GUIDE.md) + +Contents: + +- BASE configuration overview +- How to generate test data +- How to run ingestion for new tenant +- Search examples +- Response format examples +- Multi-tenant isolation + +### 8.3 Create Migration Guide + +**File**: [`docs/MIGRATION_TO_BASE.md`](docs/MIGRATION_TO_BASE.md) + +- Breaking changes from SKU-level to SPU-level +- Response format changes +- How existing deployments should migrate +- Legacy config deprecation timeline + +## Phase 9: Testing + +### 9.1 Create Test Script + +**File**: [`scripts/test_base.sh`](scripts/test_base.sh) + +Steps: + +1. 
Generate 100 test SPU records +2. Run BASE ingestion with tenant_id="test_tenant" +3. Run searches, verify response format +4. Test faceted search +5. Verify multi-tenant isolation +6. Verify `tenant_id` filtering + +### 9.2 Integration Tests + +**File**: [`tests/test_base_integration.py`](tests/test_base_integration.py) + +- Test SPU-level indexing +- Test nested variant queries +- Test price filtering with flattened fields +- Test tenant_id isolation +- Test response transformation + +## Key Architectural Decisions + +### BASE Configuration Philosophy + +**BASE = Universal Standard**: All new merchants use BASE config with Shoplazza tables. No per-customer schema customization. Customization happens through: + +- Configuration parameters (analyzers, function_score, etc.) +- Extension tables (if needed for additional fields) +- NOT through separate schemas + +### Tenant Isolation + +**tenant_id is SACRED**: + +- Always present in queries (enforced at query builder) +- Never optional +- Guarantees data isolation between tenants +- Documented prominently in API docs + +### Price Flattening Rationale + +High-frequency operations (filtering, sorting) on price require optimal performance. Nested queries add overhead. Solution: Duplicate price data at SPU level (flattened) while maintaining full variant details in nested array. 
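
The flattening described above can be sketched roughly as follows. This is a minimal illustration, not the project's `SPUDataTransformer`: the helper names `flatten_prices` and `build_spu_doc` are hypothetical, and taking the SPU-level `compare_at_price` as the highest variant value is an assumption, since the plan does not say how that field is aggregated.

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional


def flatten_prices(variants: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """Derive SPU-level price fields from a list of SKU variant dicts (hypothetical helper)."""
    # Shoplazza returns prices as strings such as "99.99"; Decimal keeps the comparison exact.
    prices = [Decimal(str(v["price"])) for v in variants if v.get("price") is not None]
    compare = [
        Decimal(str(v["compare_at_price"]))
        for v in variants
        if v.get("compare_at_price") is not None
    ]
    return {
        "min_price": float(min(prices)) if prices else None,
        "max_price": float(max(prices)) if prices else None,
        # Assumption: use the highest variant compare_at_price at SPU level.
        "compare_at_price": float(max(compare)) if compare else None,
    }


def build_spu_doc(spu: Dict[str, Any], variants: List[Dict[str, Any]], tenant_id: str) -> Dict[str, Any]:
    """Assemble one SPU-level document: tenant_id, nested variants, flattened prices."""
    doc = dict(spu)
    doc["tenant_id"] = tenant_id      # always present, used as a filter at query time
    doc["variants"] = variants        # full per-SKU detail kept for display
    doc.update(flatten_prices(variants))  # duplicated at SPU level for cheap filter/sort
    return doc
```

With documents shaped like this, a price filter or sort can target `min_price` / `max_price` directly, and a nested query on `variants` is only needed when a request must match individual SKU attributes.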
+ +### Legacy vs BASE + +- **BASE**: New standard, all future merchants +- **Legacy (customer1_legacy)**: Deprecated, exists only for backward compatibility +- All scripts/frontend default to BASE +- Legacy access requires explicit suffix (`_legacy`) + +### To-dos + +- [ ] Analyze SPU/SKU fields and design unified schema with tenant_id, nested variants, and flattened price fields +- [ ] Create unified schema config for multi-tenant SPU-level indexing +- [ ] Add NESTED field type support to field_types.py and mapping generator +- [ ] Create SPU-level data transformer that joins SPU+SKU tables and creates nested variant array +- [ ] Create script to generate 100 realistic SPU+SKU test records in Shoplazza tables +- [ ] Create customer2 configuration using unified schema and Shoplazza tables only +- [ ] Create customer2 ingestion script that loads from MySQL Shoplazza tables +- [ ] Update query builder to support tenant_id filtering and nested variant queries +- [ ] Create response transformer to convert ES format to Shoplazza-compatible format +- [ ] Update API models with new Shoplazza response format (ProductResult, variants, suggestions, etc.) +- [ ] Update search routes to use response transformer and return new format +- [ ] Migrate customer1 configuration to unified schema and SPU-level indexing +- [ ] Create customer2 guide, update design docs, API docs, and create migration guide +- [ ] Create comprehensive test script for customer2 with data generation, ingestion, and search validation \ No newline at end of file diff --git a/SHOPLAZZA_INTEGRATION_GUIDE.md b/SHOPLAZZA_INTEGRATION_GUIDE.md new file mode 100644 index 0000000..71fffeb --- /dev/null +++ b/SHOPLAZZA_INTEGRATION_GUIDE.md @@ -0,0 +1,3226 @@ +# 店匠平台技术对接指南 + +## 1. 
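Overview
+
+### 1.1 About Shoplazza
+
+[Shoplazza (店匠)](https://www.shoplazza.com) is a storefront-building platform designed for cross-border e-commerce, similar to Shopify. Merchants can quickly set up their own branded independent store and use it for product sales, order management, and customer management.
+
+Shoplazza offers an open app ecosystem: third-party developers can build app plugins (APPs) and publish them to the Shoplazza App Market to provide value-added services to merchants.
+
+**Core features:**
+- Storefront building and theme customization
+- Product, order, and customer management
+- Multi-language and multi-currency support
+- Open Admin API
+- Webhook event notifications
+- OAuth 2.0 authorization
+
+### 1.2 Integration Goals
+
+This document helps the development team integrate the **Search SaaS** into the Shoplazza ecosystem and launch it as a search plugin in the App Market.
+
+**Goals:**
+1. Publish the search APP in the Shoplazza App Market
+2. Let merchants install the APP and authorize access to their store data
+3. Automatically sync each merchant's product, order, and customer data
+4. Provide a front-end search extension embedded in the merchant's store theme
+5. Offer merchants intelligent search (multi-language, semantic search, AI search)
+6. Collect and analyze merchants' search behavior data
+
+### 1.3 System Architecture
+
+```mermaid
+graph TB
+    subgraph "Shoplazza Platform"
+        A[Shoplazza App Market]
+        B[Merchant Store]
+        C[Shoplazza Admin API]
+        D[Shoplazza Webhook]
+    end
+
+    subgraph "Search SaaS Platform"
+        E[OAuth Service]
+        F[Data Sync Service]
+        G[Webhook Receiver Service]
+        H[Search API Service]
+        I[Admin Console]
+        J[Database<br/>MySQL]
+        K[Search Engine<br/>Elasticsearch]
+    end
+
+    subgraph "Front-end Extension"
+        L[Search Entry Component]
+        M[Search Results Page]
+    end
+
+    A -->|Merchant installs| E
+    B -->|OAuth authorization| E
+    E -->|Obtain token| F
+    F -->|Call API| C
+    C -->|Return data| F
+    F -->|Store| J
+    F -->|Index| K
+    D -->|Push events| G
+    G -->|Incremental update| J
+    G -->|Incremental indexing| K
+    B -->|Customize theme| L
+    L -->|Search request| H
+    M -->|Search request| H
+    H -->|Query| K
+    I -->|Manage| J
+```
+
+### 1.4 Technology Stack Requirements
+
+**Back-end services:**
+- Java (Spring Boot): OAuth, data sync, API gateway
+- Python (FastAPI): search service, vector retrieval
+- MySQL: stores shop, product, order, and related data
+- Elasticsearch: product indexing and search
+
+**Front-end extension:**
+- Liquid template language (Shoplazza themes)
+- JavaScript/TypeScript
+- HTML/CSS
+
+**Infrastructure:**
+- Public domain name (with HTTPS support)
+- SSL certificate
+- Server (with Docker deployment support)
+
+### 1.5 Prerequisites
+
+Before starting the integration, make sure that you:
+
+1. ✅ Have registered a Shoplazza Partner account
+2. ✅ Have a public domain name and an HTTPS certificate
+3. ✅ Have deployed the Search SaaS back-end services
+4. ✅ Have a test store (for development and debugging)
+5. ✅ Are familiar with the OAuth 2.0 authorization flow
+6. ✅ Are familiar with RESTful API development
+
+---
+
+## 2. Developer Preparation
+
+### 2.1 Register a Shoplazza Partner Account
+
+1. Visit the [Shoplazza Partner Center](https://partners.shoplazza.com)
+2. Click the "Register" button and fill in your company information
+3. Complete email verification and the qualification review
+4. Log in to the Partner dashboard
+
+### 2.2 Create the APP
+
+1. Log in to the [Shoplazza Partner dashboard](https://partners.shoplazza.com)
+2. Select "Apps" in the left navigation bar
+3. Click the "Create App" button
+4. Fill in the basic APP information:
+   - **App Name**: Search SaaS (or a custom name)
+   - **App Type**: Public App
+   - **Category**: Search & Discovery
+
+5. 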
系统自动生成: + - **Client ID**:应用的唯一标识 + - **Client Secret**:应用密钥(请妥善保管) + +### 2.3 配置 APP 信息 + +在 APP 设置页面,配置以下关键信息: + +#### 2.3.1 OAuth 配置 + +```yaml +Client ID: m8F9PrPnxpyrlz4ONBWRoINsa5xyNT4Qd-Fh_h7o1es +Client Secret: m2cDNrBqAa8TKeridXd4eXnhi9E7pda2gKXet_72rjo +``` + +**重定向 URI(Redirect URI):** +``` +https://your-domain.com/oauth/callback +``` + +**注意事项:** +- Redirect URI 必须使用 HTTPS 协议 +- 必须是公网可访问的地址 +- 开发环境可以使用 ngrok 等工具暴露本地服务 + +#### 2.3.2 应用权限(Scopes) + +根据业务需求,申请以下权限: + +| 权限 Scope | 说明 | 是否必需 | +|------------|------|----------| +| `read_shop` | 读取店铺信息 | ✅ 必需 | +| `write_shop` | 修改店铺信息 | ❌ 可选 | +| `read_product` | 读取商品信息 | ✅ 必需 | +| `write_product` | 修改商品信息 | ❌ 可选 | +| `read_order` | 读取订单信息 | ✅ 必需 | +| `read_customer` | 读取客户信息 | ✅ 必需 | +| `read_app_proxy` | APP 代理访问 | ✅ 必需 | +| `write_cart_transform` | 购物车转换(如需价格调整) | ❌ 可选 | + +**配置示例:** +```go +Scopes: []string{ + "read_shop", + "read_product", + "read_order", + "read_customer", + "read_app_proxy", +} +``` + +#### 2.3.3 Webhook 配置(后续注册) + +Webhook 地址(后续在代码中动态注册): +``` +https://your-domain.com/webhook/shoplazza +``` + +### 2.4 准备测试店铺 + +1. 在店匠平台注册一个测试店铺 +2. 在店铺中添加测试商品、客户、订单数据 +3. 记录店铺域名:`{shop-name}.myshoplaza.com` + +**注意:** 部分功能(如 Webhook 注册)需要店铺激活后才能使用。 + +--- + +## 3. OAuth 2.0 认证实现 + +### 3.1 OAuth 授权流程 + +店匠使用标准的 OAuth 2.0 授权码(Authorization Code)流程: + +```mermaid +sequenceDiagram + participant 商家 + participant 店匠平台 + participant 搜索SaaS + + 商家->>店匠平台: 1. 在应用市场点击"安装" + 店匠平台->>搜索SaaS: 2. 跳转到 APP URI + 搜索SaaS->>店匠平台: 3. 重定向到授权页面 + 店匠平台->>商家: 4. 显示授权确认页 + 商家->>店匠平台: 5. 点击"授权" + 店匠平台->>搜索SaaS: 6. 回调 Redirect URI(带 code) + 搜索SaaS->>店匠平台: 7. 用 code 换取 Access Token + 店匠平台->>搜索SaaS: 8. 返回 Access Token + 搜索SaaS->>搜索SaaS: 9. 保存 Token 到数据库 + 搜索SaaS->>商家: 10. 
显示安装成功页面 +``` + +### 3.2 实现步骤 + +#### 3.2.1 配置 OAuth 客户端 + +在应用启动时,初始化 OAuth 配置: + +```go +// OAuth 配置 +type OAuthConfig struct { + ClientID string + ClientSecret string + RedirectURI string + Scopes []string + AuthURL string + TokenURL string +} + +// 初始化配置 +config := &OAuthConfig{ + ClientID: "m8F9PrPnxpyrlz4ONBWRoINsa5xyNT4Qd-Fh_h7o1es", + ClientSecret: "m2cDNrBqAa8TKeridXd4eXnhi9E7pda2gKXet_72rjo", + RedirectURI: "https://your-domain.com/oauth/callback", + Scopes: []string{ + "read_shop", + "read_product", + "read_order", + "read_customer", + "read_app_proxy", + }, + AuthURL: "https://partners.shoplazza.com/partner/oauth/authorize", + TokenURL: "https://partners.shoplazza.com/partner/oauth/token", +} +``` + +#### 3.2.2 处理 APP URI 请求 + +当商家在应用市场点击"安装"时,店匠会跳转到你配置的 APP URI: + +``` +GET https://your-domain.com/oauth/install?shop={shop_domain} +``` + +**处理逻辑:** + +```http +GET /oauth/install +Query Parameters: + - shop: 店铺域名,例如 47167113-1.myshoplaza.com + +Response: + 302 Redirect to Authorization URL +``` + +**生成授权 URL:** + +```go +// 构建授权 URL +func buildAuthURL(config *OAuthConfig, shop string) string { + params := url.Values{} + params.Add("client_id", config.ClientID) + params.Add("redirect_uri", config.RedirectURI) + params.Add("scope", strings.Join(config.Scopes, " ")) + params.Add("state", shop) // 使用 shop 作为 state + + return config.AuthURL + "?" + params.Encode() +} +``` + +**授权 URL 示例:** +``` +https://partners.shoplazza.com/partner/oauth/authorize? 
+ client_id=m8F9PrPnxpyrlz4ONBWRoINsa5xyNT4Qd-Fh_h7o1es + &redirect_uri=https://your-domain.com/oauth/callback + &scope=read_shop read_product read_order read_customer read_app_proxy + &state=47167113-1.myshoplaza.com +``` + +#### 3.2.3 处理授权回调 + +商家授权后,店匠会回调你的 Redirect URI: + +``` +GET https://your-domain.com/oauth/callback?code={auth_code}&shop={shop_domain}&state={state} +``` + +**回调参数:** +- `code`:授权码(用于换取 Access Token) +- `shop`:店铺域名 +- `state`:之前传递的 state 参数 + +**处理逻辑:** + +```http +GET /oauth/callback +Query Parameters: + - code: 授权码 + - shop: 店铺域名 + - state: state 参数 + +Response: + 200 OK (HTML 页面显示安装成功) +``` + +#### 3.2.4 换取 Access Token + +使用授权码换取 Access Token: + +**请求示例(curl):** + +```bash +curl --request POST \ + --url https://partners.shoplazza.com/partner/oauth/token \ + --header 'Content-Type: application/json' \ + --data '{ + "client_id": "m8F9PrPnxpyrlz4ONBWRoINsa5xyNT4Qd-Fh_h7o1es", + "client_secret": "m2cDNrBqAa8TKeridXd4eXnhi9E7pda2gKXet_72rjo", + "code": "{authorization_code}", + "grant_type": "authorization_code", + "redirect_uri": "https://your-domain.com/oauth/callback" + }' +``` + +**响应示例:** + +```json +{ + "access_token": "V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY", + "token_type": "Bearer", + "refresh_token": "-QP6o5YpsqC47q5D2M3xHJ0YP4SPcybhm5oYlPaMUOo", + "expires_in": 31556951, + "created_at": 1740793402, + "store_id": "2286274", + "store_name": "47167113-1", + "expires_at": 1772350354, + "locale": "zh-CN" +} +``` + +**响应字段说明:** +- `access_token`:访问令牌(调用 Admin API 时使用) +- `refresh_token`:刷新令牌(用于刷新 Access Token) +- `expires_in`:过期时间(秒) +- `expires_at`:过期时间戳 +- `store_id`:店铺 ID +- `store_name`:店铺名称 +- `locale`:店铺语言 + +#### 3.2.5 保存 Token 到数据库 + +将 Token 信息保存到数据库(详见第 4 章数据模型): + +```sql +INSERT INTO shoplazza_shop_config ( + store_id, + store_name, + store_domain, + access_token, + refresh_token, + token_expires_at, + locale, + status, + created_at, + updated_at +) VALUES ( + '2286274', + '47167113-1', + '47167113-1.myshoplaza.com', + 
'V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY', + '-QP6o5YpsqC47q5D2M3xHJ0YP4SPcybhm5oYlPaMUOo', + '2026-11-02 23:21:14', + 'zh-CN', + 'active', + NOW(), + NOW() +); +``` + +### 3.3 Token 刷新机制 + +Access Token 会过期(通常为 1 年),过期后需要使用 Refresh Token 刷新。 + +**刷新 Token 请求:** + +```bash +curl --request POST \ + --url https://partners.shoplazza.com/partner/oauth/token \ + --header 'Content-Type: application/json' \ + --data '{ + "client_id": "m8F9PrPnxpyrlz4ONBWRoINsa5xyNT4Qd-Fh_h7o1es", + "client_secret": "m2cDNrBqAa8TKeridXd4eXnhi9E7pda2gKXet_72rjo", + "refresh_token": "-QP6o5YpsqC47q5D2M3xHJ0YP4SPcybhm5oYlPaMUOo", + "grant_type": "refresh_token" + }' +``` + +**响应格式与获取 Token 时相同。** + +**刷新策略:** +1. 在 Token 过期前 7 天开始尝试刷新 +2. API 调用返回 401 Unauthorized 时立即刷新 +3. 刷新成功后更新数据库中的 Token 信息 + +### 3.4 安装成功页面 + +OAuth 回调处理完成后,返回一个 HTML 页面告知商家安装成功: + +```html + + + + + 安装成功 - 搜索 SaaS + + + +
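+Installation successful!
+Search SaaS has been installed on your store.
+Store name: {{store_name}}
+
+Next steps:
+1. Go to the store admin → Theme Editor
+2. Click "Add Card" → select "APPS" → find "Search SaaS"
+3. Drag the search component onto the page
+4. Save and publish the theme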
+ + 前往店铺后台 + + +``` + +--- + +## 4. 租户和店铺管理 + +### 4.1 数据模型设计 + +#### 4.1.1 租户表(system_tenant) + +每个店铺在 SaaS 平台都是一个独立的租户。 + +```sql +CREATE TABLE `system_tenant` ( + `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT '租户ID', + `name` VARCHAR(255) NOT NULL COMMENT '租户名称', + `package_id` BIGINT DEFAULT NULL COMMENT '套餐ID', + `status` TINYINT NOT NULL DEFAULT 1 COMMENT '状态:0-禁用,1-启用', + `expire_time` DATETIME DEFAULT NULL COMMENT '过期时间', + `account_count` INT DEFAULT 0 COMMENT '账号数量', + `creator` VARCHAR(64) DEFAULT '' COMMENT '创建者', + `create_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', + `updater` VARCHAR(64) DEFAULT '' COMMENT '更新者', + `update_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间', + `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT '是否删除', + PRIMARY KEY (`id`), + KEY `idx_name` (`name`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='租户表'; +``` + +#### 4.1.2 店铺配置表(shoplazza_shop_config) + +存储店铺的基本信息和 OAuth Token。 + +```sql +CREATE TABLE `shoplazza_shop_config` ( + `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT '主键ID', + `tenant_id` BIGINT NOT NULL COMMENT '租户ID', + `store_id` VARCHAR(64) NOT NULL COMMENT '店匠店铺ID', + `store_name` VARCHAR(255) NOT NULL COMMENT '店铺名称', + `store_domain` VARCHAR(255) NOT NULL COMMENT '店铺域名', + `access_token` VARCHAR(512) NOT NULL COMMENT 'Access Token', + `refresh_token` VARCHAR(512) DEFAULT NULL COMMENT 'Refresh Token', + `token_expires_at` DATETIME NOT NULL COMMENT 'Token过期时间', + `locale` VARCHAR(16) DEFAULT 'zh-CN' COMMENT '店铺语言', + `currency` VARCHAR(16) DEFAULT 'USD' COMMENT '店铺货币', + `timezone` VARCHAR(64) DEFAULT 'Asia/Shanghai' COMMENT '店铺时区', + `status` VARCHAR(32) NOT NULL DEFAULT 'active' COMMENT '状态:active-激活,inactive-未激活,suspended-暂停', + `install_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '安装时间', + `last_sync_time` DATETIME DEFAULT NULL COMMENT '最后同步时间', + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', + `updated_at` 
DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间', + `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT '是否删除', + PRIMARY KEY (`id`), + UNIQUE KEY `uk_store_id` (`store_id`), + KEY `idx_tenant_id` (`tenant_id`), + KEY `idx_store_domain` (`store_domain`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='店匠店铺配置表'; +``` + +### 4.2 Token 管理策略 + +#### 4.2.1 Token 存储 + +- ✅ 加密存储 Access Token 和 Refresh Token +- ✅ 记录 Token 过期时间 +- ✅ 记录最后刷新时间 + +#### 4.2.2 Token 自动刷新 + +```java +public class TokenRefreshService { + + /** + * 检查并刷新即将过期的 Token + */ + @Scheduled(cron = "0 0 2 * * ?") // 每天凌晨2点执行 + public void refreshExpiringTokens() { + // 查询7天内过期的 Token + DateTime sevenDaysLater = DateTime.now().plusDays(7); + List shops = shopConfigMapper.selectExpiringTokens(sevenDaysLater); + + for (ShopConfig shop : shops) { + try { + // 刷新 Token + TokenResponse newToken = oauthClient.refreshToken(shop.getRefreshToken()); + + // 更新数据库 + shop.setAccessToken(newToken.getAccessToken()); + shop.setRefreshToken(newToken.getRefreshToken()); + shop.setTokenExpiresAt(newToken.getExpiresAt()); + shopConfigMapper.updateById(shop); + + log.info("Token refreshed for shop: {}", shop.getStoreName()); + } catch (Exception e) { + log.error("Failed to refresh token for shop: {}", shop.getStoreName(), e); + // 可选:发送告警通知 + } + } + } + + /** + * API 调用时检查 Token 是否过期 + */ + public String getValidAccessToken(String storeId) { + ShopConfig shop = shopConfigMapper.selectByStoreId(storeId); + + if (shop == null) { + throw new BusinessException("Shop not found: " + storeId); + } + + // 检查是否即将过期(提前1小时) + if (shop.getTokenExpiresAt().isBefore(DateTime.now().plusHours(1))) { + // 刷新 Token + TokenResponse newToken = oauthClient.refreshToken(shop.getRefreshToken()); + + // 更新数据库 + shop.setAccessToken(newToken.getAccessToken()); + shop.setRefreshToken(newToken.getRefreshToken()); + shop.setTokenExpiresAt(newToken.getExpiresAt()); + shopConfigMapper.updateById(shop); + } + + return 
shop.getAccessToken(); + } +} +``` + +### 4.3 租户创建流程 + +当商家完成 OAuth 授权后,自动创建租户和店铺配置: + +```java +@Transactional +public void handleOAuthCallback(TokenResponse tokenResponse) { + String storeId = tokenResponse.getStoreId(); + String storeName = tokenResponse.getStoreName(); + + // 1. 检查租户是否已存在 + Tenant tenant = tenantMapper.selectByStoreId(storeId); + if (tenant == null) { + // 创建新租户 + tenant = new Tenant(); + tenant.setName(storeName); + tenant.setStatus(1); // 启用 + tenant.setPackageId(1L); // 默认套餐 + tenantMapper.insert(tenant); + } + + // 2. 创建或更新店铺配置 + ShopConfig shop = shopConfigMapper.selectByStoreId(storeId); + if (shop == null) { + shop = new ShopConfig(); + shop.setTenantId(tenant.getId()); + shop.setStoreId(storeId); + shop.setStoreName(storeName); + } + + // 更新 Token 信息 + shop.setAccessToken(tokenResponse.getAccessToken()); + shop.setRefreshToken(tokenResponse.getRefreshToken()); + shop.setTokenExpiresAt(tokenResponse.getExpiresAt()); + shop.setLocale(tokenResponse.getLocale()); + shop.setStatus("active"); + shop.setInstallTime(new Date()); + + if (shop.getId() == null) { + shopConfigMapper.insert(shop); + } else { + shopConfigMapper.updateById(shop); + } + + // 3. 触发首次数据同步 + dataSyncService.syncAllData(shop.getId()); +} +``` + +--- + +## 5. 
店匠 Admin API 调用 + +### 5.1 API 认证方式 + +调用店匠 Admin API 时,需要在请求头中携带 Access Token: + +```http +GET /openapi/2022-01/products +Host: {shop-domain}.myshoplaza.com +access-token: V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY +Content-Type: application/json +``` + +**注意:** 请求头字段名是 `access-token`,不是 `Authorization`。 + +### 5.2 API 端点基础 URL + +店匠 API 的基础 URL 格式: + +``` +https://{shop-domain}.myshoplaza.com/openapi/{version}/{resource} +``` + +**参数说明:** +- `{shop-domain}`:店铺域名,例如 `47167113-1` +- `{version}`:API 版本,目前为 `2022-01` +- `{resource}`:资源路径,例如 `products`、`orders` + +**示例:** +``` +https://47167113-1.myshoplaza.com/openapi/2022-01/products +``` + +### 5.3 常用 API 端点 + +#### 5.3.1 店铺信息 + +```bash +# 获取店铺详情 +GET /openapi/2022-01/shop +``` + +#### 5.3.2 商品管理 + +```bash +# 获取商品列表 +GET /openapi/2022-01/products?page=1&limit=50 + +# 获取商品详情 +GET /openapi/2022-01/products/{product_id} + +# 获取商品总数 +GET /openapi/2022-01/products/count +``` + +#### 5.3.3 订单管理 + +```bash +# 获取订单列表 +GET /openapi/2022-01/orders?page=1&limit=50 + +# 获取订单详情 +GET /openapi/2022-01/orders/{order_id} + +# 获取订单总数 +GET /openapi/2022-01/orders/count +``` + +#### 5.3.4 客户管理 + +```bash +# 获取客户列表 +GET /openapi/2022-01/customers?page=1&limit=50 + +# 获取客户详情 +GET /openapi/2022-01/customers/{customer_id} + +# 获取客户总数 +GET /openapi/2022-01/customers/count +``` + +### 5.4 请求和响应格式 + +#### 5.4.1 分页查询 + +店匠 API 使用基于页码的分页: + +```http +GET /openapi/2022-01/products?page=1&limit=50&status=active +``` + +**分页参数:** +- `page`:页码,从 1 开始 +- `limit`:每页数量,最大 250 + +**响应格式:** +```json +{ + "products": [ + { + "id": "123456", + "title": "Product Name", + "variants": [...], + ... 
+ } + ] +} +``` + +#### 5.4.2 错误响应 + +API 调用失败时返回错误信息: + +```json +{ + "error": "Unauthorized", + "error_description": "Invalid access token" +} +``` + +**常见错误码:** +- `400 Bad Request`:请求参数错误 +- `401 Unauthorized`:Token 无效或过期 +- `403 Forbidden`:权限不足 +- `404 Not Found`:资源不存在 +- `429 Too Many Requests`:触发速率限制 +- `500 Internal Server Error`:服务器错误 + +### 5.5 错误处理和重试策略 + +```java +public class ShoplazzaApiClient { + + private static final int MAX_RETRIES = 3; + private static final int RETRY_DELAY_MS = 1000; + + /** + * 调用 API 并处理错误 + */ + public T callApi(String storeId, String endpoint, Class responseType) { + int retries = 0; + Exception lastException = null; + + while (retries < MAX_RETRIES) { + try { + // 获取有效的 Access Token + String accessToken = tokenManager.getValidAccessToken(storeId); + + // 构建请求 + HttpHeaders headers = new HttpHeaders(); + headers.set("access-token", accessToken); + headers.setContentType(MediaType.APPLICATION_JSON); + + HttpEntity entity = new HttpEntity<>(headers); + + // 发送请求 + ResponseEntity response = restTemplate.exchange( + endpoint, + HttpMethod.GET, + entity, + responseType + ); + + return response.getBody(); + + } catch (HttpClientErrorException e) { + if (e.getStatusCode() == HttpStatus.UNAUTHORIZED) { + // Token 过期,刷新后重试 + tokenManager.forceRefreshToken(storeId); + retries++; + continue; + } else if (e.getStatusCode() == HttpStatus.TOO_MANY_REQUESTS) { + // 触发速率限制,等待后重试 + sleep(RETRY_DELAY_MS * (retries + 1)); + retries++; + continue; + } else { + throw new BusinessException("API call failed: " + e.getMessage()); + } + } catch (Exception e) { + lastException = e; + retries++; + sleep(RETRY_DELAY_MS); + } + } + + throw new BusinessException("API call failed after retries", lastException); + } + + private void sleep(long millis) { + try { + Thread.sleep(millis); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + } +} +``` + +### 5.6 速率限制处理 + +店匠 API 有速率限制(Rate Limit),需要遵守以下规则: + +**限制说明:** +- 每个店铺每秒最多 10 
requests
+- Response headers carry the rate-limit information
+
+**Example response headers:**
+```
+X-RateLimit-Limit: 10
+X-RateLimit-Remaining: 8
+X-RateLimit-Reset: 1699800060
+```
+
+**Handling strategy:**
+1. Parse the rate-limit information from the response headers
+2. If `X-RateLimit-Remaining` is 0, wait until the `X-RateLimit-Reset` time
+3. On a 429 error, retry with exponential backoff
+
+---
+
+## 6. Data Synchronization
+
+### 6.1 Product Data Sync
+
+#### 6.1.1 API Call
+
+**Fetch the product list:**
+
+```bash
+curl --request GET \
+  --url 'https://47167113-1.myshoplaza.com/openapi/2022-01/products?page=1&limit=50' \
+  --header 'access-token: V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY' \
+  --header 'accept: application/json'
+```
+
+**Example response:**
+
+```json
+{
+  "products": [
+    {
+      "id": "193817395",
+      "title": "Bluetooth Headphones",
+      "body_html": "

High-quality Bluetooth headphones

",
+      "vendor": "Sony",
+      "product_type": "Electronics",
+      "handle": "bluetooth-headphone",
+      "published_at": "2024-01-15T10:00:00Z",
+      "created_at": "2024-01-15T09:00:00Z",
+      "updated_at": "2024-01-20T14:30:00Z",
+      "status": "active",
+      "tags": "electronics, audio, bluetooth",
+      "variants": [
+        {
+          "id": "819403847",
+          "product_id": "193817395",
+          "title": "Black / Standard",
+          "price": "99.99",
+          "compare_at_price": "129.99",
+          "sku": "BT-HP-001",
+          "inventory_quantity": 100,
+          "weight": "0.25",
+          "weight_unit": "kg",
+          "requires_shipping": true,
+          "option1": "Black",
+          "option2": "Standard",
+          "option3": null
+        }
+      ],
+      "images": [
+        {
+          "id": "638746512",
+          "product_id": "193817395",
+          "src": "https://cdn.shoplazza.com/image1.jpg",
+          "position": 1,
+          "width": 800,
+          "height": 800
+        }
+      ],
+      "options": [
+        {
+          "id": "123456",
+          "name": "Color",
+          "values": ["Black", "White", "Blue"]
+        },
+        {
+          "id": "123457",
+          "name": "Size",
+          "values": ["Standard"]
+        }
+      ]
+    }
+  ]
+}
+```
+
+#### 6.1.2 Table Design
+
+**SPU table (shoplazza_product_spu):**
+
+```sql
+CREATE TABLE `shoplazza_product_spu` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `product_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza product ID',
+  `title` VARCHAR(512) NOT NULL COMMENT 'Product title',
+  `body_html` TEXT COMMENT 'Product description (HTML)',
+  `vendor` VARCHAR(255) DEFAULT NULL COMMENT 'Vendor/brand',
+  `product_type` VARCHAR(255) DEFAULT NULL COMMENT 'Product type',
+  `handle` VARCHAR(255) DEFAULT NULL COMMENT 'Product URL handle',
+  `tags` VARCHAR(1024) DEFAULT NULL COMMENT 'Tags (comma-separated)',
+  `status` VARCHAR(32) DEFAULT 'active' COMMENT 'Status: active, draft, archived',
+  `published_at` DATETIME DEFAULT NULL COMMENT 'Published at',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY
KEY (`id`),
+  UNIQUE KEY `uk_store_product` (`store_id`, `product_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_product_type` (`product_type`),
+  KEY `idx_vendor` (`vendor`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza product SPU table';
+```
+
+**SKU table (shoplazza_product_sku):**
+
+```sql
+CREATE TABLE `shoplazza_product_sku` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `product_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza product ID',
+  `variant_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza variant ID',
+  `sku` VARCHAR(255) DEFAULT NULL COMMENT 'SKU code',
+  `title` VARCHAR(512) NOT NULL COMMENT 'Variant title',
+  `price` DECIMAL(12,2) NOT NULL COMMENT 'Price',
+  `compare_at_price` DECIMAL(12,2) DEFAULT NULL COMMENT 'Compare-at price',
+  `inventory_quantity` INT DEFAULT 0 COMMENT 'Inventory quantity',
+  `weight` DECIMAL(10,3) DEFAULT NULL COMMENT 'Weight',
+  `weight_unit` VARCHAR(16) DEFAULT NULL COMMENT 'Weight unit',
+  `option1` VARCHAR(255) DEFAULT NULL COMMENT 'Option 1 value',
+  `option2` VARCHAR(255) DEFAULT NULL COMMENT 'Option 2 value',
+  `option3` VARCHAR(255) DEFAULT NULL COMMENT 'Option 3 value',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  UNIQUE KEY `uk_store_variant` (`store_id`, `variant_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_product_id` (`product_id`),
+  KEY `idx_sku` (`sku`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza product SKU table';
+```
+
+**Image table (shoplazza_product_image):**
+
+```sql
+CREATE TABLE `shoplazza_product_image` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `product_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza product ID',
+  `image_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza image ID',
+  `src` VARCHAR(1024) NOT NULL COMMENT
'Image URL',
+  `position` INT DEFAULT 1 COMMENT 'Sort position',
+  `width` INT DEFAULT NULL COMMENT 'Image width',
+  `height` INT DEFAULT NULL COMMENT 'Image height',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  UNIQUE KEY `uk_store_image` (`store_id`, `image_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_product_id` (`product_id`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza product image table';
+```
+
+#### 6.1.3 Sync Logic
+
+```java
+@Service
+public class ProductSyncService {
+
+    @Autowired
+    private ShoplazzaApiClient apiClient;
+
+    @Autowired
+    private ProductSpuMapper spuMapper;
+
+    @Autowired
+    private ProductSkuMapper skuMapper;
+
+    @Autowired
+    private ProductImageMapper imageMapper;
+
+    /**
+     * Syncs all products of a single store.
+     */
+    public void syncProducts(Long shopConfigId) {
+        ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
+        if (shop == null) {
+            throw new BusinessException("Shop not found");
+        }
+
+        int page = 1;
+        int limit = 50;
+        boolean hasMore = true;
+        int totalSynced = 0;
+
+        while (hasMore) {
+            try {
+                // Call the API to fetch one page of products
+                String endpoint = String.format(
+                    "https://%s.myshoplaza.com/openapi/2022-01/products?page=%d&limit=%d",
+                    shop.getStoreDomain().split("\\.")[0],
+                    page,
+                    limit
+                );
+
+                ProductListResponse response = apiClient.callApi(
+                    shop.getStoreId(),
+                    endpoint,
+                    ProductListResponse.class
+                );
+
+                if (response.getProducts() == null || response.getProducts().isEmpty()) {
+                    hasMore = false;
+                    break;
+                }
+
+                // Persist the products
+                for (ProductDto product : response.getProducts()) {
+                    saveProduct(shop.getTenantId(), shop.getStoreId(), product);
+                    totalSynced++;
+                }
+
+                log.info("Synced page {} for shop {}, total: {}", page, shop.getStoreName(), totalSynced);
+
+                // Next page
+                page++;
+
+                // Pause briefly to stay under the rate limit
+                Thread.sleep(100);
+
+            } catch (Exception e) {
+                log.error("Failed to sync 
products for shop: {}", shop.getStoreName(), e);
+                throw new BusinessException("Product sync failed", e);
+            }
+        }
+
+        // Record the last sync time
+        shop.setLastSyncTime(new Date());
+        shopConfigMapper.updateById(shop);
+
+        log.info("Product sync completed for shop: {}, total synced: {}", shop.getStoreName(), totalSynced);
+    }
+
+    /**
+     * Saves a single product together with its SKUs and images.
+     * Public because proxy-based @Transactional is ignored on private methods,
+     * and because the webhook handler calls this method as well.
+     */
+    @Transactional
+    public void saveProduct(Long tenantId, String storeId, ProductDto product) {
+        // 1. Save the SPU
+        ProductSpu spu = spuMapper.selectByStoreAndProductId(storeId, product.getId());
+        if (spu == null) {
+            spu = new ProductSpu();
+            spu.setTenantId(tenantId);
+            spu.setStoreId(storeId);
+            spu.setProductId(product.getId());
+        }
+
+        spu.setTitle(product.getTitle());
+        spu.setBodyHtml(product.getBodyHtml());
+        spu.setVendor(product.getVendor());
+        spu.setProductType(product.getProductType());
+        spu.setHandle(product.getHandle());
+        spu.setTags(product.getTags());
+        spu.setStatus(product.getStatus());
+        spu.setPublishedAt(product.getPublishedAt());
+
+        if (spu.getId() == null) {
+            spuMapper.insert(spu);
+        } else {
+            spuMapper.updateById(spu);
+        }
+
+        // 2. Save the SKUs
+        if (product.getVariants() != null) {
+            for (VariantDto variant : product.getVariants()) {
+                ProductSku sku = skuMapper.selectByStoreAndVariantId(storeId, variant.getId());
+                if (sku == null) {
+                    sku = new ProductSku();
+                    sku.setTenantId(tenantId);
+                    sku.setStoreId(storeId);
+                    sku.setProductId(product.getId());
+                    sku.setVariantId(variant.getId());
+                }
+
+                sku.setSku(variant.getSku());
+                sku.setTitle(variant.getTitle());
+                sku.setPrice(new BigDecimal(variant.getPrice()));
+                sku.setCompareAtPrice(variant.getCompareAtPrice() != null ?
+                        new BigDecimal(variant.getCompareAtPrice()) : null);
+                sku.setInventoryQuantity(variant.getInventoryQuantity());
+                sku.setWeight(variant.getWeight());
+                sku.setWeightUnit(variant.getWeightUnit());
+                sku.setOption1(variant.getOption1());
+                sku.setOption2(variant.getOption2());
+                sku.setOption3(variant.getOption3());
+
+                if (sku.getId() == null) {
+                    skuMapper.insert(sku);
+                } else {
+                    skuMapper.updateById(sku);
+                }
+            }
+        }
+
+        // 3. Save the images
+        if (product.getImages() != null) {
+            for (ImageDto image : product.getImages()) {
+                ProductImage img = imageMapper.selectByStoreAndImageId(storeId, image.getId());
+                if (img == null) {
+                    img = new ProductImage();
+                    img.setTenantId(tenantId);
+                    img.setStoreId(storeId);
+                    img.setProductId(product.getId());
+                    img.setImageId(image.getId());
+                }
+
+                img.setSrc(image.getSrc());
+                img.setPosition(image.getPosition());
+                img.setWidth(image.getWidth());
+                img.setHeight(image.getHeight());
+
+                if (img.getId() == null) {
+                    imageMapper.insert(img);
+                } else {
+                    imageMapper.updateById(img);
+                }
+            }
+        }
+    }
+}
+```
+
+### 6.2 Customer Data Sync
+
+#### 6.2.1 Table Design
+
+**Customer table (shoplazza_customer):**
+
+```sql
+CREATE TABLE `shoplazza_customer` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `customer_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza customer ID',
+  `email` VARCHAR(255) DEFAULT NULL COMMENT 'Email',
+  `phone` VARCHAR(64) DEFAULT NULL COMMENT 'Phone',
+  `first_name` VARCHAR(128) DEFAULT NULL COMMENT 'First name',
+  `last_name` VARCHAR(128) DEFAULT NULL COMMENT 'Last name',
+  `orders_count` INT DEFAULT 0 COMMENT 'Number of orders',
+  `total_spent` DECIMAL(12,2) DEFAULT 0.00 COMMENT 'Total amount spent',
+  `state` VARCHAR(32) DEFAULT NULL COMMENT 'State',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  
UNIQUE KEY `uk_store_customer` (`store_id`, `customer_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_email` (`email`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza customer table';
+```
+
+**Customer address table (shoplazza_customer_address):**
+
+```sql
+CREATE TABLE `shoplazza_customer_address` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `customer_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza customer ID',
+  `address_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza address ID',
+  `first_name` VARCHAR(128) DEFAULT NULL COMMENT 'First name',
+  `last_name` VARCHAR(128) DEFAULT NULL COMMENT 'Last name',
+  `address1` VARCHAR(512) DEFAULT NULL COMMENT 'Address line 1',
+  `address2` VARCHAR(512) DEFAULT NULL COMMENT 'Address line 2',
+  `city` VARCHAR(128) DEFAULT NULL COMMENT 'City',
+  `province` VARCHAR(128) DEFAULT NULL COMMENT 'Province',
+  `country` VARCHAR(128) DEFAULT NULL COMMENT 'Country',
+  `zip` VARCHAR(32) DEFAULT NULL COMMENT 'Postal code',
+  `phone` VARCHAR(64) DEFAULT NULL COMMENT 'Phone',
+  `is_default` BIT(1) DEFAULT b'0' COMMENT 'Whether this is the default address',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  UNIQUE KEY `uk_store_address` (`store_id`, `address_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_customer_id` (`customer_id`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza customer address table';
+```
+
+#### 6.2.2 API Call Example
+
+```bash
+curl --request GET \
+  --url 'https://47167113-1.myshoplaza.com/openapi/2022-01/customers?page=1&limit=50' \
+  --header 'access-token: V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY' \
+  --header 'accept: application/json'
+```
+
+### 6.3 Order Data Sync
+
+#### 6.3.1 Table Design
+
+**Order table (shoplazza_order):**
+
+```sql
+CREATE TABLE `shoplazza_order` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  
`store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `order_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza order ID',
+  `order_number` VARCHAR(128) NOT NULL COMMENT 'Order number',
+  `customer_id` VARCHAR(64) DEFAULT NULL COMMENT 'Customer ID',
+  `email` VARCHAR(255) DEFAULT NULL COMMENT 'Customer email',
+  `total_price` DECIMAL(12,2) NOT NULL COMMENT 'Order total',
+  `subtotal_price` DECIMAL(12,2) DEFAULT NULL COMMENT 'Subtotal',
+  `total_tax` DECIMAL(12,2) DEFAULT NULL COMMENT 'Tax',
+  `total_shipping` DECIMAL(12,2) DEFAULT NULL COMMENT 'Shipping fee',
+  `currency` VARCHAR(16) DEFAULT 'USD' COMMENT 'Currency',
+  `financial_status` VARCHAR(32) DEFAULT NULL COMMENT 'Payment status',
+  `fulfillment_status` VARCHAR(32) DEFAULT NULL COMMENT 'Fulfillment status',
+  `order_status` VARCHAR(32) DEFAULT NULL COMMENT 'Order status',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  UNIQUE KEY `uk_store_order` (`store_id`, `order_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_customer_id` (`customer_id`),
+  KEY `idx_order_number` (`order_number`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza order table';
+```
+
+**Order line item table (shoplazza_order_item):**
+
+```sql
+CREATE TABLE `shoplazza_order_item` (
+  `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary key ID',
+  `tenant_id` BIGINT NOT NULL COMMENT 'Tenant ID',
+  `store_id` VARCHAR(64) NOT NULL COMMENT 'Store ID',
+  `order_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza order ID',
+  `line_item_id` VARCHAR(64) NOT NULL COMMENT 'Shoplazza line item ID',
+  `product_id` VARCHAR(64) DEFAULT NULL COMMENT 'Product ID',
+  `variant_id` VARCHAR(64) DEFAULT NULL COMMENT 'Variant ID',
+  `sku` VARCHAR(255) DEFAULT NULL COMMENT 'SKU',
+  `title` VARCHAR(512) DEFAULT NULL COMMENT 'Product title',
+  `quantity` INT NOT NULL COMMENT 'Quantity',
+  `price` DECIMAL(12,2) NOT NULL COMMENT 'Unit price',
+  `total_price` DECIMAL(12,2) NOT NULL COMMENT 'Line total',
+  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
+  
`updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
+  `deleted` BIT(1) NOT NULL DEFAULT b'0' COMMENT 'Soft-delete flag',
+  PRIMARY KEY (`id`),
+  UNIQUE KEY `uk_store_line_item` (`store_id`, `line_item_id`),
+  KEY `idx_tenant_id` (`tenant_id`),
+  KEY `idx_order_id` (`order_id`),
+  KEY `idx_product_id` (`product_id`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Shoplazza order line item table';
+```
+
+### 6.4 Sync Scheduling Strategy
+
+#### 6.4.1 Initial Full Sync
+
+After a merchant installs the app, the first full sync is triggered:
+
+```java
+@Service
+public class DataSyncService {
+
+    @Async
+    public void syncAllData(Long shopConfigId) {
+        log.info("Starting full data sync for shop: {}", shopConfigId);
+
+        try {
+            // 1. Sync products (highest priority)
+            productSyncService.syncProducts(shopConfigId);
+
+            // 2. Sync customers
+            customerSyncService.syncCustomers(shopConfigId);
+
+            // 3. Sync orders
+            orderSyncService.syncOrders(shopConfigId);
+
+            // 4. Register webhooks
+            webhookService.registerWebhooks(shopConfigId);
+
+            // 5. Index products into ES
+            esIndexService.indexProducts(shopConfigId);
+
+            log.info("Full data sync completed for shop: {}", shopConfigId);
+
+        } catch (Exception e) {
+            log.error("Full data sync failed for shop: {}", shopConfigId, e);
+            // Optional: send an alert notification
+        }
+    }
+}
+```
+
+#### 6.4.2 Scheduled Incremental Sync
+
+Scheduled tasks sync the data periodically:
+
+```java
+@Component
+public class ScheduledSyncTask {
+
+    @Autowired
+    private DataSyncService dataSyncService;
+
+    @Autowired
+    private ShopConfigMapper shopConfigMapper;
+
+    /**
+     * Syncs product data once per hour.
+     */
+    @Scheduled(cron = "0 0 * * * ?")
+    public void syncProductsHourly() {
+        List<ShopConfig> activeShops = shopConfigMapper.selectActiveShops();
+
+        for (ShopConfig shop : activeShops) {
+            try {
+                productSyncService.syncProducts(shop.getId());
+            } catch (Exception e) {
+                // Log and continue so one failing shop does not block the rest
+                log.error("Scheduled product sync failed for shop: {}", shop.getStoreName(), e);
+            }
+        }
+    }
+
+    /**
+     * Syncs customer and order data once per day (at 03:00).
+     */
+    @Scheduled(cron = "0 0 3 * * ?")
+    public void syncCustomersAndOrdersDaily() {
+        List<ShopConfig> activeShops = shopConfigMapper.selectActiveShops();
+
+        for (ShopConfig shop : activeShops) 
{
+            try {
+                customerSyncService.syncCustomers(shop.getId());
+                orderSyncService.syncOrders(shop.getId());
+            } catch (Exception e) {
+                log.error("Scheduled sync failed for shop: {}", shop.getStoreName(), e);
+            }
+        }
+    }
+}
+```
+
+#### 6.4.3 Failure Retry Mechanism
+
+Failure retries are implemented with Spring Retry:
+
+```java
+@Service
+public class RobustSyncService {
+
+    // BusinessException is listed because the sync services wrap their failures in it;
+    // without it those failures would never trigger a retry
+    @Retryable(
+        value = {ApiException.class, HttpClientErrorException.class, BusinessException.class},
+        maxAttempts = 3,
+        backoff = @Backoff(delay = 2000, multiplier = 2)
+    )
+    public void syncWithRetry(Long shopConfigId, String syncType) {
+        switch (syncType) {
+            case "products":
+                productSyncService.syncProducts(shopConfigId);
+                break;
+            case "customers":
+                customerSyncService.syncCustomers(shopConfigId);
+                break;
+            case "orders":
+                orderSyncService.syncOrders(shopConfigId);
+                break;
+            default:
+                throw new IllegalArgumentException("Unknown sync type: " + syncType);
+        }
+    }
+
+    @Recover
+    public void recoverFromSyncFailure(Exception e, Long shopConfigId, String syncType) {
+        log.error("Sync failed after retries: shop={}, type={}", shopConfigId, syncType, e);
+        // Record the failure and send an alert
+        alertService.sendAlert("Data sync failed",
+            String.format("Shop: %d, Type: %s, Error: %s", shopConfigId, syncType, e.getMessage()));
+    }
+}
+```
+
+---
+
+## 7.
Webhook Integration
+
+### 7.1 Webhook Overview
+
+Webhooks are the Shoplazza platform's event notification mechanism. When a specific event occurs in a store (such as a product update or an order creation), Shoplazza sends an HTTP POST request to the webhook address you registered, which enables real-time data sync.
+
+**Advantages:**
+- ✅ Real-time: notification is sent as soon as the event occurs
+- ✅ Fewer API calls: no frequent polling
+- ✅ Precise updates: only the changed data is updated
+
+### 7.2 Supported Webhook Topics
+
+Shoplazza supports the following webhook event types:
+
+#### 7.2.1 Products
+
+| Topic | Description | Trigger |
+|-------|------|----------|
+| `products/create` | Product created | A merchant creates a new product |
+| `products/update` | Product updated | A merchant edits product information |
+| `products/delete` | Product deleted | A merchant deletes a product |
+
+#### 7.2.2 Orders
+
+| Topic | Description | Trigger |
+|-------|------|----------|
+| `orders/create` | Order created | A buyer places an order |
+| `orders/updated` | Order updated | The order status changes |
+| `orders/paid` | Order paid | Payment for the order succeeds |
+| `orders/cancelled` | Order cancelled | The order is cancelled |
+
+#### 7.2.3 Customers
+
+| Topic | Description | Trigger |
+|-------|------|----------|
+| `customers/create` | Customer created | A new customer registers |
+| `customers/update` | Customer updated | Customer information is updated |
+| `customers/delete` | Customer deleted | A customer is deleted |
+
+### 7.3 Registering Webhooks
+
+#### 7.3.1 API Call
+
+Once a store is activated, the required webhooks are registered automatically:
+
+```bash
+curl --request POST \
+  --url 'https://47167113-1.myshoplaza.com/openapi/2022-01/webhooks' \
+  --header 'access-token: V2WDYgkTvrN68QCESZ9eHb3EjpR6EBrPyAKe-m_JwYY' \
+  --header 'accept: application/json' \
+  --header 'content-type: application/json' \
+  --data '{
+    "address": "https://your-domain.com/webhook/shoplazza",
+    "topic": "products/update"
+  }'
+```
+
+**Example response:**
+
+```json
+{
+  "webhook": {
+    "id": "123456",
+    "address": "https://your-domain.com/webhook/shoplazza",
+    "topic": "products/update",
+    "created_at": "2024-01-15T10:00:00Z",
+    "updated_at": "2024-01-15T10:00:00Z"
+  }
+}
+```
+
+#### 7.3.2 Batch Registration
+
+```java
+@Service
+public class WebhookService {
+
+    private static final List<String> WEBHOOK_TOPICS = Arrays.asList(
+        "products/create",
+        "products/update",
+        "products/delete",
+        "orders/create",
+        "orders/updated",
+        "orders/paid",
+        "customers/create",
+        "customers/update"
+    );
+
+    /**
+     * Registers all webhooks for a store.
+     */
+    public void registerWebhooks(Long shopConfigId) {
+        ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
+        
if (shop == null) {
+            throw new BusinessException("Shop not found");
+        }
+
+        String webhookUrl = buildWebhookUrl(shop.getStoreId());
+
+        for (String topic : WEBHOOK_TOPICS) {
+            try {
+                registerSingleWebhook(shop, webhookUrl, topic);
+                log.info("Registered webhook for shop: {}, topic: {}", shop.getStoreName(), topic);
+            } catch (Exception e) {
+                log.error("Failed to register webhook: shop={}, topic={}", shop.getStoreName(), topic, e);
+                // Continue registering the remaining webhooks
+            }
+        }
+    }
+
+    private void registerSingleWebhook(ShopConfig shop, String webhookUrl, String topic) {
+        String endpoint = String.format(
+            "https://%s/openapi/2022-01/webhooks",
+            shop.getStoreDomain()
+        );
+
+        WebhookRequest request = new WebhookRequest();
+        request.setAddress(webhookUrl);
+        request.setTopic(topic);
+
+        apiClient.post(shop.getStoreId(), endpoint, request, WebhookResponse.class);
+    }
+
+    private String buildWebhookUrl(String storeId) {
+        return String.format("%s/webhook/shoplazza/%s",
+            appConfig.getBaseUrl(),
+            storeId);
+    }
+}
+```
+
+### 7.4 Receiving and Processing Webhooks
+
+#### 7.4.1 Webhook Request Format
+
+Webhook requests sent by Shoplazza have the following format:
+
+```http
+POST /webhook/shoplazza/{store_id}
+Content-Type: application/json
+X-Shoplazza-Hmac-Sha256: {signature}
+X-Shoplazza-Topic: products/update
+X-Shoplazza-Shop-Domain: 47167113-1.myshoplaza.com
+
+{
+  "id": "193817395",
+  "title": "Bluetooth Headphones",
+  "variants": [...],
+  "images": [...],
+  ...
+}
+```
+
+**Request headers:**
+- `X-Shoplazza-Hmac-Sha256`: HMAC-SHA256 signature (used to verify that the request is authentic)
+- `X-Shoplazza-Topic`: event type
+- `X-Shoplazza-Shop-Domain`: store domain
+
+#### 7.4.2 Signature Verification
+
+To make sure a webhook request really comes from the Shoplazza platform, verify its signature:
+
+```java
+@RestController
+@RequestMapping("/webhook/shoplazza")
+public class WebhookController {
+
+    @Autowired
+    private WebhookService webhookService;
+
+    @PostMapping("/{storeId}")
+    public ResponseEntity<String> handleWebhook(
+            @PathVariable String storeId,
+            @RequestHeader("X-Shoplazza-Hmac-Sha256") String signature,
+            @RequestHeader("X-Shoplazza-Topic") String topic,
+            @RequestHeader("X-Shoplazza-Shop-Domain") String shopDomain,
+            @RequestBody String payload) {
+
+        try {
+            // 1. Verify the signature
+            if (!webhookService.verifySignature(storeId, payload, signature)) {
+                log.warn("Invalid webhook signature: store={}, topic={}", storeId, topic);
+                return ResponseEntity.status(HttpStatus.UNAUTHORIZED).body("Invalid signature");
+            }
+
+            // 2. Process the event (asynchronously)
+            webhookService.processWebhookAsync(storeId, topic, payload);
+
+            // 3.
Return 200 immediately (Shoplazza requires a response within 3 seconds)
+            return ResponseEntity.ok("OK");
+
+        } catch (Exception e) {
+            log.error("Failed to handle webhook: store={}, topic={}", storeId, topic, e);
+            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body("Error");
+        }
+    }
+}
+
+@Service
+public class WebhookService {
+
+    /**
+     * Verifies the webhook signature.
+     */
+    public boolean verifySignature(String storeId, String payload, String signature) {
+        ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
+        if (shop == null) {
+            return false;
+        }
+
+        // The Client Secret is used as the signing key
+        String clientSecret = appConfig.getClientSecret();
+
+        try {
+            Mac mac = Mac.getInstance("HmacSHA256");
+            SecretKeySpec secretKey = new SecretKeySpec(
+                clientSecret.getBytes(StandardCharsets.UTF_8),
+                "HmacSHA256"
+            );
+            mac.init(secretKey);
+
+            byte[] hash = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
+            String computedSignature = Base64.getEncoder().encodeToString(hash);
+
+            // Constant-time comparison, so response timing does not leak how many characters matched
+            return MessageDigest.isEqual(
+                computedSignature.getBytes(StandardCharsets.UTF_8),
+                signature.getBytes(StandardCharsets.UTF_8)
+            );
+
+        } catch (Exception e) {
+            log.error("Failed to verify signature", e);
+            return false;
+        }
+    }
+
+    /**
+     * Processes a webhook event asynchronously.
+     */
+    @Async
+    public void processWebhookAsync(String storeId, String topic, String payload) {
+        try {
+            log.info("Processing webhook: store={}, topic={}", storeId, topic);
+
+            switch (topic) {
+                case "products/create":
+                case "products/update":
+                    handleProductUpdate(storeId, payload);
+                    break;
+                case "products/delete":
+                    handleProductDelete(storeId, payload);
+                    break;
+                case "orders/create":
+                case "orders/updated":
+                case "orders/paid":
+                    handleOrderUpdate(storeId, payload);
+                    break;
+                case "orders/cancelled":
+                    handleOrderCancel(storeId, payload);
+                    break;
+                case "customers/create":
+                case "customers/update":
+                    handleCustomerUpdate(storeId, payload);
+                    break;
+                case "customers/delete":
+                    handleCustomerDelete(storeId, payload);
+                    break;
+                default:
+                    log.warn("Unknown webhook topic: {}", topic);
+            }
+
+        } catch (Exception e) {
+            log.error("Failed to process webhook: 
store={}, topic={}", storeId, topic, e);
+        }
+    }
+
+    private void handleProductUpdate(String storeId, String payload) {
+        ProductDto product = JSON.parseObject(payload, ProductDto.class);
+        ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
+
+        // Update the database
+        productSyncService.saveProduct(shop.getTenantId(), storeId, product);
+
+        // Update the ES index
+        esIndexService.indexSingleProduct(shop.getTenantId(), product.getId());
+    }
+
+    private void handleProductDelete(String storeId, String payload) {
+        ProductDto product = JSON.parseObject(payload, ProductDto.class);
+        ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
+
+        // Soft-delete the database record
+        productSpuMapper.softDeleteByProductId(storeId, product.getId());
+
+        // Remove the document from ES
+        esIndexService.deleteProduct(shop.getTenantId(), product.getId());
+    }
+
+    // ... other event handlers
+}
+```
+
+### 7.5 Idempotency
+
+To avoid processing the same event more than once, event handling must be idempotent:
+
+```java
+@Service
+public class WebhookEventService {
+
+    @Autowired
+    private RedisTemplate<String, String> redisTemplate;
+
+    /**
+     * Checks whether the event has already been processed (deduplicated via Redis).
+     */
+    public boolean isEventProcessed(String storeId, String topic, String eventId) {
+        String key = String.format("webhook:processed:%s:%s:%s", storeId, topic, eventId);
+        return Boolean.TRUE.equals(redisTemplate.hasKey(key));
+    }
+
+    /**
+     * Marks the event as processed (kept for 24 hours).
+     */
+    public void markEventProcessed(String storeId, String topic, String eventId) {
+        String key = String.format("webhook:processed:%s:%s:%s", storeId, topic, eventId);
+        redisTemplate.opsForValue().set(key, "1", 24, TimeUnit.HOURS);
+    }
+
+    /**
+     * Processes an event with an idempotency guarantee.
+     */
+    @Transactional
+    public void processEventIdempotent(String storeId, String topic, String eventId, Runnable handler) {
+        String key = String.format("webhook:processed:%s:%s:%s", storeId, topic, eventId);
+
+        // Claim the event atomically (set-if-absent with a 24-hour TTL); a separate
+        // check followed by a later mark would let two concurrent deliveries both pass
+        Boolean claimed = redisTemplate.opsForValue().setIfAbsent(key, "1", 24, TimeUnit.HOURS);
+        if (!Boolean.TRUE.equals(claimed)) {
+            log.info("Event already processed: store={}, topic={}, eventId={}", storeId, topic, eventId);
+            return;
+        }
+
+        try {
+            // Handle the event
+            handler.run();
+        } catch (RuntimeException e) {
+            // Release the claim so that a redelivery can be processed
+            redisTemplate.delete(key);
+            throw e;
+        }
+    }
+}
+```
+
+---
+
+## 8.
Elasticsearch Indexing
+
+### 8.1 Index Structure
+
+The Elasticsearch mapping is designed around the Shoplazza product structure:
+
+```json
+{
+  "settings": {
+    "number_of_shards": 3,
+    "number_of_replicas": 1,
+    "analysis": {
+      "analyzer": {
+        "chinese_ecommerce": {
+          "type": "custom",
+          "tokenizer": "ik_max_word",
+          "filter": ["lowercase"]
+        }
+      }
+    }
+  },
+  "mappings": {
+    "properties": {
+      "tenant_id": {
+        "type": "keyword"
+      },
+      "store_id": {
+        "type": "keyword"
+      },
+      "product_id": {
+        "type": "keyword"
+      },
+      "title": {
+        "type": "text",
+        "analyzer": "chinese_ecommerce",
+        "fields": {
+          "keyword": {
+            "type": "keyword"
+          },
+          "en": {
+            "type": "text",
+            "analyzer": "english"
+          }
+        }
+      },
+      "title_embedding": {
+        "type": "dense_vector",
+        "dims": 1024,
+        "index": true,
+        "similarity": "cosine"
+      },
+      "body_html": {
+        "type": "text",
+        "analyzer": "chinese_ecommerce"
+      },
+      "vendor": {
+        "type": "keyword"
+      },
+      "product_type": {
+        "type": "keyword"
+      },
+      "tags": {
+        "type": "keyword"
+      },
+      "price": {
+        "type": "float"
+      },
+      "compare_at_price": {
+        "type": "float"
+      },
+      "inventory_quantity": {
+        "type": "integer"
+      },
+      "image_url": {
+        "type": "keyword",
+        "index": false
+      },
+      "image_embedding": {
+        "type": "dense_vector",
+        "dims": 1024,
+        "index": true,
+        "similarity": "cosine"
+      },
+      "variants": {
+        "type": "nested",
+        "properties": {
+          "variant_id": {"type": "keyword"},
+          "sku": {"type": "keyword"},
+          "title": {"type": "text", "analyzer": "chinese_ecommerce"},
+          "price": {"type": "float"},
+          "inventory_quantity": {"type": "integer"},
+          "option1": {"type": "keyword"},
+          "option2": {"type": "keyword"},
+          "option3": {"type": "keyword"}
+        }
+      },
+      "status": {
+        "type": "keyword"
+      },
+      "created_at": {
+        "type": "date"
+      },
+      "updated_at": {
+        "type": "date"
+      }
+    }
+  }
+}
+```
+
+### 8.2 Index Naming Convention
+
+Index names are isolated per tenant:
+
+```
+shoplazza_products_{tenant_id}
+```
+
+For example:
+- `shoplazza_products_1`
+- `shoplazza_products_2`
+
+### 8.3 Indexing Flow
+
+#### 8.3.1 Reading Products from the Database
+
+```java
+@Service
+public class EsIndexService {
+
+    
@Autowired
+    private ProductSpuMapper spuMapper;
+
+    @Autowired
+    private ProductSkuMapper skuMapper;
+
+    @Autowired
+    private ProductImageMapper imageMapper;
+
+    @Autowired
+    private EmbeddingService embeddingService;
+
+    @Autowired
+    private RestHighLevelClient esClient;
+
+    /**
+     * Indexes all products of a store.
+     */
+    public void indexProducts(Long shopConfigId) {
+        ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
+        if (shop == null) {
+            throw new BusinessException("Shop not found");
+        }
+
+        String indexName = String.format("shoplazza_products_%d", shop.getTenantId());
+
+        // 1. Create the index if it does not exist
+        createIndexIfNotExists(indexName);
+
+        // 2. Load all products of the store
+        List<ProductSpu> products = spuMapper.selectByStoreId(shop.getStoreId());
+
+        // 3. Index in bulk
+        BulkRequest bulkRequest = new BulkRequest();
+
+        for (ProductSpu spu : products) {
+            try {
+                // Build the ES document
+                Map<String, Object> doc = buildEsDocument(shop.getTenantId(), spu);
+
+                // Add it to the bulk request
+                IndexRequest indexRequest = new IndexRequest(indexName)
+                    .id(spu.getProductId())
+                    .source(doc);
+                bulkRequest.add(indexRequest);
+
+                // Flush every 500 documents
+                if (bulkRequest.numberOfActions() >= 500) {
+                    BulkResponse bulkResponse = esClient.bulk(bulkRequest, RequestOptions.DEFAULT);
+                    if (bulkResponse.hasFailures()) {
+                        log.error("Bulk index has failures: {}", bulkResponse.buildFailureMessage());
+                    }
+                    bulkRequest = new BulkRequest();
+                }
+
+            } catch (Exception e) {
+                log.error("Failed to index product: {}", spu.getProductId(), e);
+            }
+        }
+
+        // 4.
Submit the remaining documents
+        if (bulkRequest.numberOfActions() > 0) {
+            BulkResponse bulkResponse = esClient.bulk(bulkRequest, RequestOptions.DEFAULT);
+            if (bulkResponse.hasFailures()) {
+                log.error("Bulk index has failures: {}", bulkResponse.buildFailureMessage());
+            }
+        }
+
+        log.info("Indexed {} products for shop: {}", products.size(), shop.getStoreName());
+    }
+
+    /**
+     * Builds the ES document for one product.
+     */
+    private Map<String, Object> buildEsDocument(Long tenantId, ProductSpu spu) {
+        Map<String, Object> doc = new HashMap<>();
+
+        // Basic fields
+        doc.put("tenant_id", tenantId.toString());
+        doc.put("store_id", spu.getStoreId());
+        doc.put("product_id", spu.getProductId());
+        doc.put("title", spu.getTitle());
+        doc.put("body_html", spu.getBodyHtml());
+        doc.put("vendor", spu.getVendor());
+        doc.put("product_type", spu.getProductType());
+        doc.put("status", spu.getStatus());
+        doc.put("created_at", spu.getCreatedAt());
+        doc.put("updated_at", spu.getUpdatedAt());
+
+        // Tags: split on commas and drop the surrounding whitespace ("a, b" -> ["a", "b"])
+        if (StringUtils.isNotEmpty(spu.getTags())) {
+            doc.put("tags", Arrays.asList(spu.getTags().trim().split("\\s*,\\s*")));
+        }
+
+        // Variants (SKUs)
+        List<ProductSku> skus = skuMapper.selectByProductId(spu.getProductId());
+        if (CollectionUtils.isNotEmpty(skus)) {
+            List<Map<String, Object>> variants = new ArrayList<>();
+            for (ProductSku sku : skus) {
+                Map<String, Object> variant = new HashMap<>();
+                variant.put("variant_id", sku.getVariantId());
+                variant.put("sku", sku.getSku());
+                variant.put("title", sku.getTitle());
+                variant.put("price", sku.getPrice());
+                variant.put("inventory_quantity", sku.getInventoryQuantity());
+                variant.put("option1", sku.getOption1());
+                variant.put("option2", sku.getOption2());
+                variant.put("option3", sku.getOption3());
+                variants.add(variant);
+            }
+            doc.put("variants", variants);
+
+            // Use the price and inventory of the first SKU
+            ProductSku firstSku = skus.get(0);
+            doc.put("price", firstSku.getPrice());
+            doc.put("inventory_quantity", firstSku.getInventoryQuantity());
+        }
+
+        // Images
+        List<ProductImage> images = imageMapper.selectByProductId(spu.getProductId());
+        if (CollectionUtils.isNotEmpty(images)) {
+            ProductImage firstImage = 
images.get(0);
+            doc.put("image_url", firstImage.getSrc());
+
+            // Generate the image embedding
+            try {
+                float[] imageEmbedding = embeddingService.encodeImage(firstImage.getSrc());
+                doc.put("image_embedding", imageEmbedding);
+            } catch (Exception e) {
+                log.warn("Failed to encode image: {}", firstImage.getSrc(), e);
+            }
+        }
+
+        // Generate the title embedding
+        try {
+            float[] titleEmbedding = embeddingService.encodeText(spu.getTitle());
+            doc.put("title_embedding", titleEmbedding);
+        } catch (Exception e) {
+            log.warn("Failed to encode title: {}", spu.getTitle(), e);
+        }
+
+        return doc;
+    }
+}
+```
+
+#### 8.3.2 Calling the Python Embedding Service
+
+Embeddings are generated by calling a Python service:
+
+```java
+@Service
+public class EmbeddingService {
+
+    @Autowired
+    private RestTemplate restTemplate;
+
+    @Value("${embedding.service.url}")
+    private String embeddingServiceUrl;
+
+    /**
+     * Generates a text embedding.
+     */
+    public float[] encodeText(String text) {
+        try {
+            String url = embeddingServiceUrl + "/encode/text";
+
+            Map<String, String> request = new HashMap<>();
+            request.put("text", text);
+
+            ResponseEntity<EmbeddingResponse> response = restTemplate.postForEntity(
+                url,
+                request,
+                EmbeddingResponse.class
+            );
+
+            if (response.getStatusCode().is2xxSuccessful() && response.getBody() != null) {
+                return response.getBody().getEmbedding();
+            }
+
+            throw new BusinessException("Failed to encode text");
+
+        } catch (Exception e) {
+            log.error("Failed to call embedding service", e);
+            throw new BusinessException("Embedding service error", e);
+        }
+    }
+
+    /**
+     * Generates an image embedding.
+     */
+    public float[] encodeImage(String imageUrl) {
+        try {
+            String url = embeddingServiceUrl + "/encode/image";
+
+            Map<String, String> request = new HashMap<>();
+            request.put("image_url", imageUrl);
+
+            ResponseEntity<EmbeddingResponse> response = restTemplate.postForEntity(
+                url,
+                request,
+                EmbeddingResponse.class
+            );
+
+            if (response.getStatusCode().is2xxSuccessful() && response.getBody() != null) {
+                return response.getBody().getEmbedding();
+            }
+
+            throw new BusinessException("Failed to encode image");
+
+        } catch (Exception e) {
+            log.error("Failed to call 
embedding service", e);
+            throw new BusinessException("Embedding service error", e);
+        }
+    }
+}
+```
+
+### 8.4 Incremental Index Updates
+
+Webhook events trigger incremental updates:
+
+```java
+public void indexSingleProduct(Long tenantId, String productId) {
+    String indexName = String.format("shoplazza_products_%d", tenantId);
+
+    ProductSpu spu = spuMapper.selectByProductId(productId);
+    if (spu == null) {
+        log.warn("Product not found: {}", productId);
+        return;
+    }
+
+    try {
+        // Build the document
+        Map<String, Object> doc = buildEsDocument(tenantId, spu);
+
+        // Index the document
+        IndexRequest request = new IndexRequest(indexName)
+            .id(productId)
+            .source(doc);
+
+        esClient.index(request, RequestOptions.DEFAULT);
+
+        log.info("Indexed product: {}", productId);
+
+    } catch (Exception e) {
+        log.error("Failed to index product: {}", productId, e);
+    }
+}
+
+public void deleteProduct(Long tenantId, String productId) {
+    String indexName = String.format("shoplazza_products_%d", tenantId);
+
+    try {
+        DeleteRequest request = new DeleteRequest(indexName, productId);
+        esClient.delete(request, RequestOptions.DEFAULT);
+
+        log.info("Deleted product from ES: {}", productId);
+
+    } catch (Exception e) {
+        log.error("Failed to delete product from ES: {}", productId, e);
+    }
+}
+```
+
+---
+
+## 9. 
Search Service Integration
+
+### 9.1 Calling the Search API
+
+The Java backend receives search requests from the storefront and forwards them to the Python search service:
+
+```java
+@RestController
+@RequestMapping("/api/search")
+public class SearchController {
+
+    @Autowired
+    private SearchService searchService;
+
+    // Type name assumed from the field name used below
+    @Autowired
+    private ShopConfigMapper shopConfigMapper;
+
+    @Autowired
+    private SearchLogService searchLogService;
+
+    @PostMapping("/products")
+    public ResponseEntity<SearchResponse> searchProducts(
+            @RequestParam String storeId,
+            @RequestBody SearchRequest request,
+            HttpServletRequest httpRequest) {
+
+        try {
+            // Look up the shop configuration to get the tenant_id
+            ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
+            if (shop == null) {
+                return ResponseEntity.status(HttpStatus.NOT_FOUND).body(null);
+            }
+
+            // Call the Python search service and time the call
+            long start = System.currentTimeMillis();
+            SearchResponse response = searchService.search(shop.getTenantId(), request);
+            long responseTime = System.currentTimeMillis() - start;
+
+            // Record the search log (arguments match SearchLogService.logSearch in 9.3.2)
+            searchLogService.logSearch(shop.getId(), request, response, responseTime, httpRequest);
+
+            return ResponseEntity.ok(response);
+
+        } catch (Exception e) {
+            log.error("Search failed: storeId={}, query={}", storeId, request.getQuery(), e);
+            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(null);
+        }
+    }
+}
+
+@Service
+public class SearchService {
+
+    @Autowired
+    private RestTemplate restTemplate;
+
+    @Value("${search.service.url}")
+    private String searchServiceUrl;
+
+    /**
+     * Call the Python search service
+     */
+    public SearchResponse search(Long tenantId, SearchRequest request) {
+        try {
+            String url = searchServiceUrl + "/search/";
+
+            // Add the tenant-isolation parameter
+            request.setCustomer("tenant_" + tenantId);
+
+            ResponseEntity<SearchResponse> response = restTemplate.postForEntity(
+                url,
+                request,
+                SearchResponse.class
+            );
+
+            if (response.getStatusCode().is2xxSuccessful()) {
+                return response.getBody();
+            }
+
+            throw new BusinessException("Search service returned error: " + response.getStatusCode());
+
+        } catch (Exception e) {
+            log.error("Failed to call search service", e);
+            throw new BusinessException("Search service error", e);
+        }
+    }
+}
+```
+
+### 9.2 Shop Isolation
+
+Each shop maps to one tenant and uses its own ES index:
+
+```python
+# Python search service
+@app.post("/search/")
+async def search_products(request: SearchRequest):
+    # Determine the tenant ID from the customer parameter
+    tenant_id = 
extract_tenant_id(request.customer) + + # 使用租户专属索引 + index_name = f"shoplazza_products_{tenant_id}" + + # 构建 ES 查询 + es_query = build_es_query(request) + + # 执行搜索 + response = es_client.search( + index=index_name, + body=es_query + ) + + # 返回结果 + return format_search_response(response) +``` + +### 9.3 搜索行为统计 + +#### 9.3.1 日志表设计 + +```sql +CREATE TABLE `shoplazza_search_log` ( + `id` BIGINT NOT NULL AUTO_INCREMENT COMMENT '主键ID', + `tenant_id` BIGINT NOT NULL COMMENT '租户ID', + `store_id` VARCHAR(64) NOT NULL COMMENT '店铺ID', + `customer_id` VARCHAR(64) DEFAULT NULL COMMENT '客户ID', + `session_id` VARCHAR(128) DEFAULT NULL COMMENT '会话ID', + `query` VARCHAR(512) NOT NULL COMMENT '搜索关键词', + `results_count` INT DEFAULT 0 COMMENT '结果数量', + `search_type` VARCHAR(32) DEFAULT 'text' COMMENT '搜索类型:text, image, ai', + `language` VARCHAR(16) DEFAULT NULL COMMENT '搜索语言', + `has_results` BIT(1) DEFAULT b'1' COMMENT '是否有结果', + `response_time_ms` INT DEFAULT NULL COMMENT '响应时间(毫秒)', + `ip_address` VARCHAR(64) DEFAULT NULL COMMENT 'IP地址', + `user_agent` VARCHAR(512) DEFAULT NULL COMMENT 'User Agent', + `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', + PRIMARY KEY (`id`), + KEY `idx_tenant_id` (`tenant_id`), + KEY `idx_store_id` (`store_id`), + KEY `idx_query` (`query`), + KEY `idx_created_at` (`created_at`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='店匠搜索日志表'; +``` + +#### 9.3.2 日志记录实现 + +```java +@Service +public class SearchLogService { + + @Autowired + private SearchLogMapper searchLogMapper; + + /** + * 记录搜索日志 + */ + @Async + public void logSearch(Long shopConfigId, SearchRequest request, SearchResponse response, + long responseTime, HttpServletRequest httpRequest) { + try { + ShopConfig shop = shopConfigMapper.selectById(shopConfigId); + + SearchLog log = new SearchLog(); + log.setTenantId(shop.getTenantId()); + log.setStoreId(shop.getStoreId()); + log.setCustomerId(request.getCustomerId()); + log.setSessionId(request.getSessionId()); + 
log.setQuery(request.getQuery()); + log.setResultsCount(response.getTotal()); + log.setSearchType(request.getSearchType()); + log.setLanguage(request.getLanguage()); + log.setHasResults(response.getTotal() > 0); + log.setResponseTimeMs((int) responseTime); + log.setIpAddress(getClientIp(httpRequest)); + log.setUserAgent(httpRequest.getHeader("User-Agent")); + + searchLogMapper.insert(log); + + } catch (Exception e) { + log.error("Failed to log search", e); + } + } + + /** + * 统计分析:热门搜索词 + */ + public List getHotQueries(String storeId, int limit) { + return searchLogMapper.selectHotQueries(storeId, limit); + } + + /** + * 统计分析:无结果搜索 + */ + public List getNoResultQueries(String storeId, int limit) { + return searchLogMapper.selectNoResultQueries(storeId, limit); + } +} +``` + +--- + +## 10. 前端扩展开发 + +### 10.1 主题扩展开发 + +店匠使用 Liquid 模板语言开发主题扩展。 + +#### 10.1.1 创建扩展项目 + +```bash +mkdir shoplazza-ai-search-app +cd shoplazza-ai-search-app + +# 目录结构 +├── app-blocks/ +│ ├── search-box.liquid # 搜索框组件 +│ ├── search-results.liquid # 搜索结果组件 +│ └── settings.json # 组件配置 +├── assets/ +│ ├── search-box.js # JavaScript +│ ├── search-box.css # 样式 +│ └── search-results.js +├── locales/ +│ ├── en.json # 英文翻译 +│ ├── zh-CN.json # 中文翻译 +│ └── es.json # 西班牙语翻译 +└── config.json # APP 配置 +``` + +#### 10.1.2 搜索框组件(search-box.liquid) + +```liquid + + + + + + +``` + +#### 10.1.3 搜索框 JavaScript(search-box.js) + +```javascript +// 搜索框功能 +(function() { + const config = window.AI_SEARCH_CONFIG || {}; + let searchTimeout; + + function handleSearch(event) { + event.preventDefault(); + const query = event.target.q.value.trim(); + + if (!query) return false; + + // 跳转到搜索结果页 + window.location.href = `/pages/search-results?q=${encodeURIComponent(query)}`; + return false; + } + + // 搜索建议(自动补全) + function setupAutocomplete() { + const input = document.querySelector('.search-input'); + const suggestionsContainer = document.getElementById('search-suggestions'); + + if (!input || !suggestionsContainer) return; 
+ + input.addEventListener('input', function(e) { + clearTimeout(searchTimeout); + const query = e.target.value.trim(); + + if (query.length < 2) { + suggestionsContainer.innerHTML = ''; + suggestionsContainer.style.display = 'none'; + return; + } + + searchTimeout = setTimeout(() => { + fetchSuggestions(query); + }, 300); + }); + + // 点击外部关闭建议 + document.addEventListener('click', function(e) { + if (!e.target.closest('.ai-search-box')) { + suggestionsContainer.style.display = 'none'; + } + }); + } + + async function fetchSuggestions(query) { + try { + const response = await fetch(`${config.apiEndpoint}/suggestions?q=${encodeURIComponent(query)}&store_id=${config.storeId}`); + const data = await response.json(); + + if (data.suggestions && data.suggestions.length > 0) { + renderSuggestions(data.suggestions); + } + } catch (error) { + console.error('Failed to fetch suggestions:', error); + } + } + + function renderSuggestions(suggestions) { + const container = document.getElementById('search-suggestions'); + + const html = suggestions.map(item => ` +
+ ${item.text} +
+ `).join(''); + + container.innerHTML = html; + container.style.display = 'block'; + } + + window.selectSuggestion = function(text) { + document.querySelector('.search-input').value = text; + document.getElementById('search-suggestions').style.display = 'none'; + document.querySelector('.search-form').submit(); + }; + + window.handleSearch = handleSearch; + + // 初始化 + if (document.readyState === 'loading') { + document.addEventListener('DOMContentLoaded', setupAutocomplete); + } else { + setupAutocomplete(); + } +})(); +``` + +#### 10.1.4 搜索结果页(search-results.liquid) + +```liquid +
+
+

{{ 'search.title' | t }}

+
+ {{ 'search.results_for' | t }}: +
+
+ +
+ +
+ +
+
{{ 'search.loading' | t }}
+
+ +
+
+ + + + + +``` + +#### 10.1.5 搜索结果 JavaScript(search-results.js) + +```javascript +(function() { + const config = window.AI_SEARCH_CONFIG || {}; + let currentPage = 1; + let currentQuery = ''; + let currentFilters = {}; + + // 从 URL 获取搜索参数 + function getSearchParams() { + const params = new URLSearchParams(window.location.search); + return { + query: params.get('q') || '', + page: parseInt(params.get('page')) || 1 + }; + } + + // 执行搜索 + async function performSearch() { + const params = getSearchParams(); + currentQuery = params.query; + currentPage = params.page; + + if (!currentQuery) { + showError('Please enter a search query'); + return; + } + + document.getElementById('current-query').textContent = currentQuery; + showLoading(); + + try { + const response = await fetch(config.apiEndpoint, { + method: 'POST', + headers: { + 'Content-Type': 'application/json' + }, + body: JSON.stringify({ + query: currentQuery, + page: currentPage, + size: 24, + filters: currentFilters, + facets: ['product_type', 'vendor', 'tags'], + customer: `tenant_${config.storeId}` + }) + }); + + const data = await response.json(); + + if (data.results) { + renderResults(data.results); + renderFacets(data.facets); + renderPagination(data.total, currentPage, 24); + } else { + showError('No results found'); + } + } catch (error) { + console.error('Search failed:', error); + showError('Search failed. Please try again.'); + } + } + + // 渲染搜索结果 + function renderResults(results) { + const container = document.getElementById('search-results'); + + if (results.length === 0) { + container.innerHTML = '
No products found
'; + return; + } + + const html = results.map(product => ` + + `).join(''); + + container.innerHTML = html; + } + + // 渲染分面过滤器 + function renderFacets(facets) { + const container = document.getElementById('search-filters'); + + if (!facets || Object.keys(facets).length === 0) { + container.innerHTML = ''; + return; + } + + let html = '
Filters
'; + + for (const [field, values] of Object.entries(facets)) { + if (values.length === 0) continue; + + html += ` +
+

${formatFieldName(field)}

+
+ ${values.map(item => ` + + `).join('')} +
+
+ `; + } + + container.innerHTML = html; + } + + // 切换过滤器 + window.toggleFilter = function(field, value) { + if (!currentFilters[field]) { + currentFilters[field] = []; + } + + const index = currentFilters[field].indexOf(value); + if (index > -1) { + currentFilters[field].splice(index, 1); + if (currentFilters[field].length === 0) { + delete currentFilters[field]; + } + } else { + currentFilters[field].push(value); + } + + currentPage = 1; + performSearch(); + }; + + // 渲染分页 + function renderPagination(total, page, pageSize) { + const container = document.getElementById('search-pagination'); + const totalPages = Math.ceil(total / pageSize); + + if (totalPages <= 1) { + container.innerHTML = ''; + return; + } + + let html = ''; + container.innerHTML = html; + } + + // 工具函数 + function formatPrice(price, currency) { + return new Intl.NumberFormat('en-US', { + style: 'currency', + currency: currency || 'USD' + }).format(price); + } + + function formatFieldName(field) { + return field.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase()); + } + + function showLoading() { + document.getElementById('search-results').innerHTML = '
Loading...
'; + } + + function showError(message) { + document.getElementById('search-results').innerHTML = `
${message}
`; + } + + // 初始化 + if (document.readyState === 'loading') { + document.addEventListener('DOMContentLoaded', performSearch); + } else { + performSearch(); + } +})(); +``` + +### 10.2 多语言支持 + +#### 10.2.1 中文翻译(locales/zh-CN.json) + +```json +{ + "search": { + "placeholder": "搜索商品...", + "title": "搜索结果", + "results_for": "搜索", + "loading": "加载中...", + "no_results": "未找到相关商品", + "filters": "筛选", + "clear_filters": "清除筛选" + } +} +``` + +#### 10.2.2 英文翻译(locales/en.json) + +```json +{ + "search": { + "placeholder": "Search products...", + "title": "Search Results", + "results_for": "Search results for", + "loading": "Loading...", + "no_results": "No products found", + "filters": "Filters", + "clear_filters": "Clear filters" + } +} +``` + +### 10.3 主题装修集成 + +商家可以在店铺后台的主题装修中添加搜索扩展: + +1. 进入店铺后台 → 主题 → 装修 +2. 点击"添加卡片" +3. 选择"APPS"分类 +4. 找到"AI 搜索" APP +5. 拖拽"搜索框"组件到导航栏或页面顶部 +6. 创建自定义页面"搜索结果",添加"搜索结果"组件 +7. 保存并发布主题 + +--- + +## 11. 部署和上线 + +### 11.1 域名和 SSL 配置 + +#### 11.1.1 域名申请 + +申请一个公网域名,例如: +``` +saas-ai-api.example.com +``` + +#### 11.1.2 SSL 证书配置 + +使用 Let's Encrypt 或其他 CA 颁发的 SSL 证书: + +```bash +# 使用 Certbot 申请证书 +sudo apt-get install certbot +sudo certbot certonly --standalone -d saas-ai-api.example.com +``` + +#### 11.1.3 Nginx 配置 + +```nginx +server { + listen 443 ssl http2; + server_name saas-ai-api.example.com; + + ssl_certificate /etc/letsencrypt/live/saas-ai-api.example.com/fullchain.pem; + ssl_certificate_key /etc/letsencrypt/live/saas-ai-api.example.com/privkey.pem; + + # OAuth 回调 + location /oauth/ { + proxy_pass http://localhost:8080; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } + + # Webhook 接收 + location /webhook/ { + proxy_pass http://localhost:8080; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } + + # 搜索 API + location /api/search/ { + proxy_pass http://localhost:8080; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + } +} +``` + +### 11.2 应用审核准备 + +#### 11.2.1 
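App Store Listing
+
+Fill in the app information in the Shoplazza Partner dashboard:
+
+**Basic information:**
+- App name: AI 智能搜索 (AI Smart Search)
+- App icon: upload a 512x512 PNG icon
+- App category: Search & Discovery
+- Short description: multilingual search, semantic search, and AI recommendations for your store
+- Full description: 500-2000 characters covering features, use cases, and advantages
+
+**Screenshots:**
+- At least 3 screenshots (1280x800 or 1920x1080)
+- Search box screenshot
+- Search results page screenshot
+- Admin dashboard screenshot
+
+**Demo video:**
+- A 1-2 minute demo video
+- Shows the app installation, configuration, and usage flow
+
+**Pricing:**
+- Free trial: 14 days
+- Monthly: $29.99/month
+- Yearly: $299/year (saves 17%)
+
+#### 11.2.2 Test Account
+
+Provide a test account for the Shoplazza review team:
+
+```
+Test store: test-shop-12345.myshoplaza.com
+Admin account: test@example.com
+Admin password: TestPassword123!
+```
+
+#### 11.2.3 Documentation
+
+Provide complete documentation:
+
+- **Installation guide**: how to install and configure the app
+- **User manual**: how to use the search features
+- **API documentation**: integration documentation for developers
+- **FAQ**: answers to common questions
+- **Support contact**: support@example.com
+
+### 11.3 Review and Release
+
+#### 11.3.1 Submitting for Review
+
+1. Click "Submit for review" in the Partner dashboard
+2. Fill in the review notes
+3. Wait for the result (usually 3-7 business days)
+
+#### 11.3.2 Common Review Issues
+
+Common reasons Shoplazza rejects an app:
+
+1. **Functional issues:**
+   - Core features do not work
+   - Pages load too slowly
+   - Poor mobile adaptation
+
+2. **Permission issues:**
+   - Unnecessary permissions requested
+   - Permission usage not explained
+
+3. **UI/UX issues:**
+   - UI inconsistent with the store's style
+   - Missing multilingual support
+   - Unclear user flow
+
+4. **Documentation issues:**
+   - Required documentation missing
+   - Documentation unclear
+   - Test account not accessible
+
+#### 11.3.3 Release
+
+After the review passes:
+
+1. The app is published to the Shoplazza app market automatically
+2. Merchants can search for and install your app
+3. Regular operation and promotion begin
+
+---
+
+## 12. 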
附录 + +### 12.1 API 参考 + +#### 12.1.1 店匠 API 端点速查表 + +| API | 端点 | 方法 | 说明 | +|-----|------|------|------| +| **OAuth** | +| 授权 URL | `/partner/oauth/authorize` | GET | 获取授权 | +| 获取 Token | `/partner/oauth/token` | POST | 换取 Token | +| **商品** | +| 商品列表 | `/openapi/2022-01/products` | GET | 获取商品列表 | +| 商品详情 | `/openapi/2022-01/products/{id}` | GET | 获取单个商品 | +| 商品总数 | `/openapi/2022-01/products/count` | GET | 获取商品总数 | +| **订单** | +| 订单列表 | `/openapi/2022-01/orders` | GET | 获取订单列表 | +| 订单详情 | `/openapi/2022-01/orders/{id}` | GET | 获取单个订单 | +| **客户** | +| 客户列表 | `/openapi/2022-01/customers` | GET | 获取客户列表 | +| 客户详情 | `/openapi/2022-01/customers/{id}` | GET | 获取单个客户 | +| **Webhook** | +| 注册 Webhook | `/openapi/2022-01/webhooks` | POST | 注册事件通知 | +| Webhook 列表 | `/openapi/2022-01/webhooks` | GET | 获取已注册列表 | +| 删除 Webhook | `/openapi/2022-01/webhooks/{id}` | DELETE | 删除 Webhook | + +#### 12.1.2 搜索 API 请求示例 + +**文本搜索:** + +```bash +curl -X POST http://your-domain:6002/search/ \ + -H "Content-Type: application/json" \ + -d '{ + "query": "bluetooth headphone", + "customer": "tenant_1", + "size": 20, + "from": 0, + "filters": { + "product_type": "Electronics" + }, + "facets": ["vendor", "product_type", "tags"] + }' +``` + +**图片搜索:** + +```bash +curl -X POST http://your-domain:6002/search/image \ + -H "Content-Type: application/json" \ + -d '{ + "image_url": "https://example.com/image.jpg", + "customer": "tenant_1", + "size": 20 + }' +``` + +### 12.2 数据库表结构 DDL + +完整的数据库表创建脚本请参考第 4、6 章节中的 SQL 语句。 + +**核心表列表:** +- `system_tenant` - 租户表 +- `shoplazza_shop_config` - 店铺配置表 +- `shoplazza_product_spu` - 商品 SPU 表 +- `shoplazza_product_sku` - 商品 SKU 表 +- `shoplazza_product_image` - 商品图片表 +- `shoplazza_customer` - 客户表 +- `shoplazza_customer_address` - 客户地址表 +- `shoplazza_order` - 订单表 +- `shoplazza_order_item` - 订单明细表 +- `shoplazza_search_log` - 搜索日志表 + +### 12.3 配置示例 + +#### 12.3.1 application.yml 配置 + +```yaml +# OAuth 配置 +shoplazza: + oauth: + client-id: 
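${SHOPLAZZA_CLIENT_ID}  # placeholder; variable name is illustrative, never commit real credentials
+    client-secret: ${SHOPLAZZA_CLIENT_SECRET}  # placeholder; variable name is illustrative
+    redirect-uri: https://your-domain.com/oauth/callback
+    scopes:
+      - read_shop
+      - read_product
+      - read_order
+      - read_customer
+      - read_app_proxy
+
+  # Webhook configuration
+  webhook:
+    base-url: https://your-domain.com/webhook/shoplazza
+    topics:
+      - products/create
+      - products/update
+      - products/delete
+      - orders/create
+      - customers/create
+
+# Search service configuration
+search:
+  service:
+    url: http://localhost:6002
+    timeout: 30000
+
+# Embedding service configuration
+embedding:
+  service:
+    url: http://localhost:6003
+    timeout: 60000
+
+# Elasticsearch configuration
+elasticsearch:
+  hosts: localhost:9200
+  username: elastic
+  password: changeme
+
+# Data sync configuration
+sync:
+  enabled: true
+  batch-size: 50
+  schedule:
+    products: "0 0 */1 * * ?"    # every hour
+    orders: "0 0 3 * * ?"        # daily at 03:00
+    customers: "0 0 4 * * ?"     # daily at 04:00
+```
+
+### 12.4 Troubleshooting
+
+#### 12.4.1 OAuth Authentication Fails
+
+**Problem:** the authorization callback fails with "Invalid redirect_uri"
+
+**Solution:**
+1. Check that the Redirect URI configured in the Partner dashboard matches the one in the code
+2. Make sure the Redirect URI uses HTTPS
+3. Make sure the Redirect URI is reachable from the public internet
+
+#### 12.4.2 Token Expired
+
+**Problem:** API calls return 401 Unauthorized
+
+**Solution:**
+1. Check the `token_expires_at` column in the database
+2. Use the Refresh Token to obtain a new Access Token
+3. Update the token information in the database
+
+#### 12.4.3 API Rate Limiting
+
+**Problem:** the API returns 429 Too Many Requests
+
+**Solution:**
+1. Lower the request rate
+2. Retry with exponential backoff
+3. Parse the `X-RateLimit-Reset` response header and wait until that time before retrying
+
+#### 12.4.4 Webhook Delivery Fails
+
+**Problem:** webhook events are not received, or signature verification fails
+
+**Solution:**
+1. Check that the webhook URL is reachable from the public internet
+2. Check that the signature verification logic uses the Client Secret correctly
+3. Check the webhook log in the Shoplazza dashboard to confirm the delivery status
+4. Make sure the webhook handler returns a 200 response within 3 seconds
+
+#### 12.4.5 Product Search Returns No Results
+
+**Problem:** search returns an empty result set
+
+**Solution:**
+1. Check that the ES index exists: `GET /shoplazza_products_1/_count`
+2. Check that products have been indexed: `GET /shoplazza_products_1/_search`
+3. Check that the tenant-isolation parameter is correct
+4. Check the search service log to confirm the query that was sent
+
+#### 12.4.6 Embedding Generation Fails
+
+**Problem:** image or text embedding generation fails
+
+**Solution:**
+1. Check that the embedding service is running
+2. Check that the embedding service has enough GPU/CPU resources
+3. Check that the image URL is accessible
+4. Check the embedding service log
+
+---
+
+## 13. 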
参考资料 + +### 13.1 官方文档 + +- [店匠开发者文档](https://www.shoplazza.dev/reference/overview-29) +- [店匠 OAuth 文档](https://www.shoplazza.dev/v2024.07/reference/authentication) +- [店匠 API 参考](https://www.shoplazza.dev/v2024.07/reference/overview) +- [店匠 Webhook 文档](https://www.shoplazza.dev/v2024.07/reference/webhooks) + +### 13.2 技术栈文档 + +- [OAuth 2.0 RFC 6749](https://tools.ietf.org/html/rfc6749) +- [Elasticsearch 官方文档](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html) +- [Liquid 模板语言](https://shopify.github.io/liquid/) +- [FastAPI 文档](https://fastapi.tiangolo.com/) + +### 13.3 联系支持 + +如有问题,请联系: + +- **技术支持邮箱**: support@example.com +- **开发者社区**: https://community.example.com +- **GitHub Issues**: https://github.com/your-org/search-saas/issues + +--- + +**文档版本**: v1.0 +**最后更新**: 2025-11-12 +**维护团队**: 搜索 SaaS 开发团队 + diff --git a/config/config_loader.py b/config/config_loader.py index 3b8e7d5..dd230ef 100644 --- a/config/config_loader.py +++ b/config/config_loader.py @@ -52,6 +52,13 @@ class QueryConfig: translation_api_key: Optional[str] = None translation_service: str = "deepl" # deepl, google, etc. + # ES source fields configuration - fields to return in search results + source_fields: List[str] = field(default_factory=lambda: [ + "id", "spuId", "skuNo", "spuNo", "title", "enSpuName", "brandId", + "brandName", "enBrandName", "categoryId", "categoryName", "enCategoryName", + "price", "originalPrice", "currency", "image", "status", "createdAt", "updatedAt" + ]) + @dataclass class SPUConfig: diff --git a/search/es_query_builder.py b/search/es_query_builder.py index 953b80a..4e1c07e 100644 --- a/search/es_query_builder.py +++ b/search/es_query_builder.py @@ -17,7 +17,8 @@ class ESQueryBuilder: index_name: str, match_fields: List[str], text_embedding_field: Optional[str] = None, - image_embedding_field: Optional[str] = None + image_embedding_field: Optional[str] = None, + source_fields: Optional[List[str]] = None ): """ Initialize query builder. 
@@ -27,11 +28,13 @@ class ESQueryBuilder: match_fields: Fields to search for text matching text_embedding_field: Field name for text embeddings image_embedding_field: Field name for image embeddings + source_fields: Fields to return in search results (_source includes) """ self.index_name = index_name self.match_fields = match_fields self.text_embedding_field = text_embedding_field self.image_embedding_field = image_embedding_field + self.source_fields = source_fields def build_query( self, @@ -71,6 +74,12 @@ class ESQueryBuilder: "from": from_ } + # Add _source filtering if source_fields are configured + if self.source_fields: + es_query["_source"] = { + "includes": self.source_fields + } + # Build main query if query_node and query_node.operator != 'TERM': # Complex boolean query diff --git a/search/multilang_query_builder.py b/search/multilang_query_builder.py index 571481f..a8d27dc 100644 --- a/search/multilang_query_builder.py +++ b/search/multilang_query_builder.py @@ -29,7 +29,8 @@ class MultiLanguageQueryBuilder(ESQueryBuilder): config: CustomerConfig, index_name: str, text_embedding_field: Optional[str] = None, - image_embedding_field: Optional[str] = None + image_embedding_field: Optional[str] = None, + source_fields: Optional[List[str]] = None ): """ Initialize multi-language query builder. 
@@ -39,6 +40,7 @@ class MultiLanguageQueryBuilder(ESQueryBuilder): index_name: ES index name text_embedding_field: Field name for text embeddings image_embedding_field: Field name for image embeddings + source_fields: Fields to return in search results (_source includes) """ self.config = config self.function_score_config = config.function_score @@ -50,7 +52,8 @@ class MultiLanguageQueryBuilder(ESQueryBuilder): index_name=index_name, match_fields=default_fields, text_embedding_field=text_embedding_field, - image_embedding_field=image_embedding_field + image_embedding_field=image_embedding_field, + source_fields=source_fields ) # Build domain configurations @@ -205,6 +208,12 @@ class MultiLanguageQueryBuilder(ESQueryBuilder): "query": function_score_query } + # Add _source filtering if source_fields are configured + if self.source_fields: + es_query["_source"] = { + "includes": self.source_fields + } + if min_score is not None: es_query["min_score"] = min_score diff --git a/search/searcher.py b/search/searcher.py index d8b4dba..8534d14 100644 --- a/search/searcher.py +++ b/search/searcher.py @@ -99,7 +99,8 @@ class Searcher: config=config, index_name=config.es_index_name, text_embedding_field=self.text_embedding_field, - image_embedding_field=self.image_embedding_field + image_embedding_field=self.image_embedding_field, + source_fields=config.query_config.source_fields ) def search( @@ -513,6 +514,12 @@ class Searcher: } } + # Add _source filtering if source_fields are configured + if self.config.query_config.source_fields: + es_query["_source"] = { + "includes": self.config.query_config.source_fields + } + if filters or range_filters: filter_clauses = self.query_builder._build_filters(filters, range_filters) if filter_clauses: diff --git a/test_search_with_source_fields.py b/test_search_with_source_fields.py new file mode 100644 index 0000000..b5c7897 --- /dev/null +++ b/test_search_with_source_fields.py @@ -0,0 +1,147 @@ +#!/usr/bin/env python3 +""" 
+测试实际搜索功能中的source_fields应用 +""" + +import sys +import os +import json +sys.path.append(os.path.dirname(os.path.abspath(__file__))) + +from config import ConfigLoader + +def test_search_query_structure(): + """测试搜索查询是否正确应用了source_fields""" + print("测试搜索查询中的source_fields应用...") + + try: + from search.searcher import Searcher + from utils.es_client import ESClient + + # 加载配置 + config_loader = ConfigLoader("config/schema") + config = config_loader.load_customer_config("customer1") + + print(f"✓ 配置加载成功: {config.customer_id}") + print(f" source_fields配置数量: {len(config.query_config.source_fields)}") + + # 创建ES客户端(使用模拟客户端避免实际连接) + class MockESClient: + def search(self, index_name, body, size=10, from_=0): + print(f"模拟ES搜索 - 索引: {index_name}") + print(f"查询body结构:") + print(json.dumps(body, indent=2, ensure_ascii=False)) + + # 检查_source配置 + if "_source" in body: + print("✓ 查询包含_source配置") + source_config = body["_source"] + if "includes" in source_config: + print(f"✓ source includes字段: {source_config['includes']}") + return { + 'took': 5, + 'hits': { + 'total': {'value': 0}, + 'max_score': 0.0, + 'hits': [] + } + } + else: + print("✗ _source配置中缺少includes") + return None + else: + print("✗ 查询中缺少_source配置") + return None + + def client(self): + return self + + # 创建Searcher实例 + es_client = MockESClient() + searcher = Searcher(config, es_client) + + print("\n测试文本搜索...") + result = searcher.search("test query", size=5) + + if result: + print("✓ 文本搜索测试成功") + else: + print("✗ 文本搜索测试失败") + + print("\n测试图像搜索...") + try: + result = searcher.search_by_image("http://example.com/image.jpg", size=3) + if result: + print("✓ 图像搜索测试成功") + else: + print("✗ 图像搜索测试失败") + except Exception as e: + print(f"✗ 图像搜索测试失败: {e}") + + return True + + except Exception as e: + print(f"✗ 搜索测试失败: {e}") + import traceback + traceback.print_exc() + return False + +def test_es_query_builder_integration(): + """测试ES查询构建器的集成""" + print("\n测试ES查询构建器集成...") + + try: + from search.es_query_builder import 
ESQueryBuilder + + # 创建构建器,传入空的source_fields列表 + builder = ESQueryBuilder( + index_name="test_index", + match_fields=["title", "content"], + source_fields=None # 测试空配置的情况 + ) + + query = builder.build_query("test query") + + if "_source" not in query: + print("✓ 空source_fields配置下,查询不包含_source过滤") + else: + print("⚠ 空source_fields配置下,查询仍然包含_source过滤") + + # 测试非空配置 + builder2 = ESQueryBuilder( + index_name="test_index", + match_fields=["title", "content"], + source_fields=["id", "title"] + ) + + query2 = builder2.build_query("test query") + + if "_source" in query2 and "includes" in query2["_source"]: + print("✓ 非空source_fields配置下,查询正确包含_source过滤") + else: + print("✗ 非空source_fields配置下,查询缺少_source过滤") + + return True + + except Exception as e: + print(f"✗ 查询构建器集成测试失败: {e}") + return False + +if __name__ == "__main__": + print("=" * 60) + print("搜索功能source_fields应用测试") + print("=" * 60) + + success = True + + # 运行所有测试 + success &= test_es_query_builder_integration() + success &= test_search_query_structure() + + print("\n" + "=" * 60) + if success: + print("✓ 所有测试通过!source_fields在搜索功能中正确应用。") + print("✓ ES现在只返回配置中指定的字段,减少了网络传输和响应大小。") + else: + print("✗ 部分测试失败,请检查实现。") + print("=" * 60) \ No newline at end of file diff --git a/test_source_fields.py b/test_source_fields.py new file mode 100644 index 0000000..b129bd6 --- /dev/null +++ b/test_source_fields.py @@ -0,0 +1,132 @@ +#!/usr/bin/env python3 +""" +测试ES source_fields配置的脚本 +""" + +import sys +import os +sys.path.append(os.path.dirname(os.path.abspath(__file__))) + +from config import ConfigLoader, CustomerConfig + +def test_source_fields_config(): + """测试source_fields配置是否正确加载""" + print("测试ES source_fields配置...") + + # 加载配置 + config_loader = ConfigLoader("config/schema") + + try: + # 加载customer1配置 + config = config_loader.load_customer_config("customer1") + print(f"✓ 成功加载配置: {config.customer_id}") + + # 检查source_fields配置 + source_fields = config.query_config.source_fields + print(f"✓ source_fields配置 
({len(source_fields)}个字段):") + for i, field in enumerate(source_fields, 1): + print(f" {i:2d}. {field}") + + # 检查默认字段列表是否包含预期字段 + expected_fields = ["id", "title", "brandName", "price", "image"] + for field in expected_fields: + if field in source_fields: + print(f"✓ 包含预期字段: {field}") + else: + print(f"⚠ 缺少预期字段: {field}") + + return True + + except Exception as e: + print(f"✗ 配置加载失败: {e}") + return False + +def test_es_query_builder(): + """测试ES查询构建器是否正确应用source_fields""" + print("\n测试ES查询构建器...") + + try: + from search.es_query_builder import ESQueryBuilder + + # 测试基础查询构建器 + builder = ESQueryBuilder( + index_name="test_index", + match_fields=["title", "content"], + source_fields=["id", "title", "price"] + ) + + # 构建查询 + query = builder.build_query("test query") + + print("✓ ES查询构建成功") + print(f"查询结构:") + print(f" size: {query.get('size')}") + print(f" _source: {query.get('_source')}") + + # 检查_source配置 + if "_source" in query: + source_config = query["_source"] + if "includes" in source_config: + print(f"✓ _source includes配置正确: {source_config['includes']}") + else: + print("✗ _source配置中缺少includes字段") + else: + print("✗ 查询中缺少_source配置") + + return True + + except Exception as e: + print(f"✗ ES查询构建器测试失败: {e}") + import traceback + traceback.print_exc() + return False + +def test_multilang_query_builder(): + """测试多语言查询构建器""" + print("\n测试多语言查询构建器...") + + try: + from search.multilang_query_builder import MultiLanguageQueryBuilder + + # 加载配置 + config_loader = ConfigLoader("config/schema") + config = config_loader.load_customer_config("customer1") + + # 创建多语言查询构建器 + builder = MultiLanguageQueryBuilder( + config=config, + index_name=config.es_index_name, + text_embedding_field="text_embedding", + image_embedding_field="image_embedding", + source_fields=config.query_config.source_fields + ) + + print("✓ 多语言查询构建器创建成功") + print(f" source_fields配置: {builder.source_fields}") + + return True + + except Exception as e: + print(f"✗ 多语言查询构建器测试失败: {e}") + import traceback + 
traceback.print_exc() + return False + +if __name__ == "__main__": + print("=" * 60) + print("ES Source Fields 配置测试") + print("=" * 60) + + success = True + + # 运行所有测试 + success &= test_source_fields_config() + success &= test_es_query_builder() + success &= test_multilang_query_builder() + + print("\n" + "=" * 60) + if success: + print("✓ 所有测试通过!source_fields配置已正确实现。") + else: + print("✗ 部分测试失败,请检查配置和代码。") + print("=" * 60) \ No newline at end of file diff --git a/当前开发进度.md b/当前开发进度.md deleted file mode 100644 index 7fb71f4..0000000 --- a/当前开发进度.md +++ /dev/null @@ -1,536 +0,0 @@ -# 搜索引擎通用化开发进度 - -## 项目概述 - -对后端搜索技术 做通用化。 -通用化的本质 是 对于各种业务数据、各种检索需求,都可以 用少量定制+配置化 来实现效果。 - - -**通用化的本质**:对于各种业务数据、各种检索需求,都可以用少量定制+配置化来实现效果。 - ---- - -## 1. 原始数据层的约定 - -所有租户共用主表、独立配置和扩展表,有自己独立的ES索引。 - -### 1.1 店匠主表 - -所有租户共用以下主表: -- `shoplazza_product_sku` - SKU级别商品数据 -- `shoplazza_product_spu` - SPU级别商品数据 - -### 1.2 每个租户的扩展表 - -各个租户有自己的扩展表,不同的租户根据不同的业务需要、以及不同的数据源,来定制自己的扩展表: -- 自定义属性体系 -- 多语言商品标题(中文、英文、俄文等) -- 品牌名、不同的类目和标签体系 -- 业务过滤和聚合字段 -- 权重(提权)字段 - -**数据关联方式**: -- 入索引时,商品主表 `shoplazza_product_sku` 的 `id` + `shopid` 与租户扩展表关联 -- 例如:`customer1_extension` 表存储 customer1 的自定义字段 - -### 1.3 配置化方案 - -统一通过配置文件定义: -1. ES 字段定义(字段类型、分析器、来源表/列) -2. ES mapping 结构生成 -3. 数据入库映射关系 - ---- - -## 2. 
配置系统实现 - -### 2.1 应用结构配置(字段定义) - -**配置文件位置**:`config/schema/{customer_id}_config.yaml` - -**配置内容**:定义了 ES 的输入数据有哪些字段、关联 MySQL 的哪些字段。 - -**实现情况**: - -#### 字段类型支持 -- **TEXT**:文本字段,支持多语言分析器 -- **KEYWORD**:关键词字段,用于精确匹配和聚合 -- **TEXT_EMBEDDING**:文本向量字段(1024维,dot_product相似度) -- **IMAGE_EMBEDDING**:图片向量字段(1024维,dot_product相似度) -- **INT/LONG**:整数类型 -- **FLOAT/DOUBLE**:浮点数类型 -- **DATE**:日期类型 -- **BOOLEAN**:布尔类型 - -#### 分析器支持 -- **chinese_ecommerce**:中文电商分词器(index_ansj/query_ansj) -- **english**:英文分析器 -- **russian**:俄文分析器 -- **arabic**:阿拉伯文分析器 -- **spanish**:西班牙文分析器 -- **japanese**:日文分析器 -- **standard**:标准分析器 -- **keyword**:关键词分析器 - -#### 字段配置示例 - -```yaml -fields: - # 主键字段 - - name: "skuId" - type: "LONG" - source_table: "main" # 主表 - source_column: "id" - required: true - index: true - store: true - - # 多语言文本字段 - - name: "name" - type: "TEXT" - source_table: "extension" # 扩展表 - source_column: "name" - analyzer: "chinese_ecommerce" - boost: 2.0 - index: true - store: true - - - name: "enSpuName" - type: "TEXT" - source_table: "extension" - source_column: "enSpuName" - analyzer: "english" - boost: 2.0 - - - name: "ruSkuName" - type: "TEXT" - source_table: "extension" - source_column: "ruSkuName" - analyzer: "russian" - boost: 2.0 - - # 文本向量字段 - - name: "name_embedding" - type: "TEXT_EMBEDDING" - source_table: "extension" - source_column: "name" - embedding_dims: 1024 - embedding_similarity: "dot_product" - index: true - - # 图片向量字段 - - name: "image_embedding" - type: "IMAGE_EMBEDDING" - source_table: "extension" - source_column: "imageUrl" - embedding_dims: 1024 - embedding_similarity: "dot_product" - nested: false -``` - -**实现模块**: -- `config/config_loader.py` - 配置加载器 -- `config/field_types.py` - 字段类型定义 -- `indexer/mapping_generator.py` - ES mapping 生成器 -- `indexer/data_transformer.py` - 数据转换器 - -### 2.2 索引结构配置(查询域配置) - -**配置内容**:定义了 ES 的字段索引 mapping 配置,支持各个域的查询,包括默认域的查询。 - -**实现情况**: - -#### 域(Domain)配置 -每个域定义了: -- 域名称(如 `default`, `title`, `category`, `brand`) -- 域标签(中文描述) 
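-- Search field list
-- Default analyzer
-- Weight (boost)
-- **Multilingual field mapping** (`language_field_mapping`)
-
-#### Multilingual field mapping
-
-Queries in different languages can be routed to the corresponding fields:
-
-```yaml
-indexes:
-  - name: "default"
-    label: "默认索引"
-    fields:
-      - "name"
-      - "enSpuName"
-      - "ruSkuName"
-      - "categoryName"
-      - "brandName"
-    analyzer: "chinese_ecommerce"
-    boost: 1.0
-    language_field_mapping:
-      zh:
-        - "name"
-        - "categoryName"
-        - "brandName"
-      en:
-        - "enSpuName"
-      ru:
-        - "ruSkuName"
-
-  - name: "title"
-    label: "标题索引"
-    fields:
-      - "name"
-      - "enSpuName"
-      - "ruSkuName"
-    analyzer: "chinese_ecommerce"
-    boost: 2.0
-    language_field_mapping:
-      zh:
-        - "name"
-      en:
-        - "enSpuName"
-      ru:
-        - "ruSkuName"
-```
-
-**How it works**:
-1. Detect the query language (Chinese, English, Russian, etc.)
-2. If the query language appears in `language_field_mapping`, search the fields for that language with the original query
-3. Translate the query into the other supported languages and search the fields for each of those languages
-4. Combine the results of the per-language queries to improve recall
-
-**Implementation modules**:
-- `search/multilang_query_builder.py` - multilingual query builder
-- `query/query_parser.py` - query parser (language detection and translation)
-
----
-
-## 3. Test Data Ingestion
-
-### 3.1 Data sources
-
-**Main table**: `shoplazza_product_sku`
-- Shared by all tenants
-- Contains basic product information (id, shopid, etc.)
-
-**Extension table**: `customer1_extension`
-- One per tenant
-- Contains custom fields and multilingual fields
-
-### 3.2 Ingestion method
-
-**Implementation status**:
-
-#### Command-line tool
-```bash
-python main.py ingest \
-  --customer customer1 \
-  --csv-file data/customer1_data.csv \
-  --es-host http://localhost:9200 \
-  --recreate \
-  --batch-size 100
-```
-
-#### Data flow
-1. **Data loading**: load data from a CSV file or the MySQL database
-2. **Data transformation**:
-   - Field mapping (source fields are mapped to ES fields according to the configuration)
-   - Type conversion (strings, numbers, dates, etc.)
-   - Vector generation (text vectors, image vectors)
-   - Vector caching (avoids recomputation)
-3. **Index creation**:
-   - Generate the ES mapping from the configuration
-   - Create or update the index
-4. 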
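**Bulk ingestion**:
-   - Bulk writes to ES (500 documents per batch by default)
-   - Error handling and retry mechanism
-
-#### Configuration mapping example
-
-**customer1_config.yaml** configuration:
-```yaml
-main_table: "shoplazza_product_sku"
-extension_table: "customer1_extension"
-es_index_name: "search_customer1"
-
-fields:
-  - name: "skuId"
-    source_table: "main"
-    source_column: "id"
-  - name: "name"
-    source_table: "extension"
-    source_column: "name"
-  - name: "enSpuName"
-    source_table: "extension"
-    source_column: "enSpuName"
-```
-
-**Data transformation**:
-- Main-table fields: read directly from the `id` column of the `shoplazza_product_sku` table
-- Extension-table fields: read from the corresponding column of the `customer1_extension` table
-- Vector fields: a vector is generated from the source text or image and cached
-
-**Implementation modules**:
-- `indexer/data_transformer.py` - data transformer
-- `indexer/bulk_indexer.py` - bulk indexer
-- `indexer/indexing_pipeline.py` - indexing pipeline
-- `embeddings/bge_encoder.py` - text vector encoder
-- `embeddings/clip_image_encoder.py` - image vector encoder
-
----
-
-## 4. QueryParser Implementation
-
-
-### 4.1 Query Rewriting
-
-In the rewrite dictionary, the key is a query and the value is the rewritten query expression. For example, a brand term can be rewritten to `brand|query OR name|query`; category terms, tag terms, and similar entries can be added the same way. Spelling correction, normalization, and query rewriting can all be configured through this dictionary.
-**Implementation status**:
-
-#### Configuration
-Query rewrite rules are configured in `query_config.rewrite_dictionary`:
-
-```yaml
-query_config:
-  enable_query_rewrite: true
-  rewrite_dictionary:
-    "芭比": "brand:芭比 OR name:芭比娃娃"
-    "玩具": "category:玩具"
-    "消防": "category:消防 OR name:消防"
-```
-
-#### Features
-- **Exact match**: when the query exactly matches a dictionary key, it is replaced with the value
-- **Partial match**: when the query contains a dictionary key, that part is replaced
-- **Boolean expressions**: the value can be a complex boolean expression (AND, OR, domain queries, etc.)
-
-#### Implementation modules
-- `query/query_rewriter.py` - query rewriter
-- `query/query_parser.py` - query parser (integrates rewriting)
-
-### 4.2 Translation
-
-**Implementation status**:
-
-#### Configuration
-```yaml
-query_config:
-  supported_languages:
-    - "zh"
-    - "en"
-    - "ru"
-  default_language: "zh"
-  enable_translation: true
-  translation_service: "deepl"
-  translation_api_key: null  # set via environment variable
-```
-
-#### Features
-1. **Language detection**: the query language is detected automatically
-2. **Smart translation**:
-   - A Chinese query is translated into English and Russian
-   - An English query is translated into Chinese and Russian
-   - A query in any other language is translated into all supported languages
-3. **Domain-aware translation**:
-   - If the domain has a `language_field_mapping`, the query is translated only into languages present in the mapping
-   - This avoids unnecessary translation and improves efficiency
-4. 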
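**Translation cache**: translation results are cached to avoid repeated API calls
-
-#### Workflow
-```
-Query input → language detection → target language selection → translation → multilingual query construction
-```
-
-#### Implementation modules
-- `query/language_detector.py` - language detector
-- `query/translator.py` - translator (DeepL API)
-- `query/query_parser.py` - query parser (integrates translation)
-
-### 4.3 Text Embedding
-
-If text_embedding queries are enabled in the configuration and the query includes a query on the default domain, the default-domain query terms are converted into a vector, which the searcher later uses in retrieval.
-
-**Implementation status**:
-
-#### Configuration
-```yaml
-query_config:
-  enable_text_embedding: true
-```
-
-#### Features
-1. **Conditional generation**:
-   - A vector is generated only when `enable_text_embedding=true`
-   - A vector is generated only for queries on the `default` domain
-2. **Embedding model**: BGE-M3 (1024-dimensional vectors)
-3. **Purpose**: semantic search (KNN retrieval)
-
-#### Implementation modules
-- `embeddings/bge_encoder.py` - BGE text encoder
-- `query/query_parser.py` - query parser (integrates vector generation)
-
----
-
-## 5. Searcher Implementation
-
-The design follows OpenSearch, which defines its own index structure configuration and supports its own retrieval expressions and sort expressions. These are the basis for per-customer configuration, covering both index structure and ranking strategy.
-For example, business filtering strategies can be expressed directly, such as `brand|耐克 AND cate2|xxx`, and sorting by a specific field can be done with a sort expression.
-
-Queries run against the default domain by default, and relevance tuning concentrates on this domain. It blends semantic relevance with multilingual relevance (based on configuration, the query can be translated into specified languages and run against the fields for those languages) to compensate for the limits of traditional query analysis (query rewriting, spelling correction, term weighting, etc.). Configured word lists can also switch a query into a broad-match mode to improve relevance.
-
-### 5.1 Boolean Expression Parsing
-
-**Implementation status**:
-
-#### Supported operators
-- **AND**: all terms must match
-- **OR**: any term may match
-- **RANK**: ranking boost (similar to OR, but affects ranking)
-- **ANDNOT**: exclusion (the first term matches, the second does not)
-- **()**: parenthesized grouping
-
-#### Precedence (highest to lowest)
-1. `()` - parentheses
-2. `ANDNOT` - exclusion
-3. `AND` - and
-4. `OR` - or
-5. `RANK` - ranking
-
-#### Example
-```
-laptop AND (gaming OR professional) ANDNOT cheap
-```
-
-#### Implementation modules
-- `search/boolean_parser.py` - boolean expression parser
-- `search/searcher.py` - searcher (integrates boolean parsing)
-
-### 5.2 Multilingual Search
-
-**Implementation status**:
-
-#### How it works
-1. **Query parsing**:
-   - Extract the domain (e.g. `title:查询` → domain=`title`, query=`查询`)
-   - Detect the query language
-   - Generate translations
-2. **Multilingual query construction**:
-   - If the domain has a `language_field_mapping`:
-     - Search the fields for the detected language with the original query (boost * 1.5)
-     - Search the fields for the other languages with the translated queries (boost * 1.0)
-   - If the domain has no `language_field_mapping`:
-     - Search all fields
-3. 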
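**Query combination**:
-   - The per-language queries are combined as `should` clauses
-   - This improves recall
-
-#### Example
-```
-Query: "芭比娃娃"
-Domain: default
-Detected language: zh
-
-Generated queries:
-- Chinese query "芭比娃娃" → searches name, categoryName, brandName (boost * 1.5)
-- English translation "Barbie doll" → searches enSpuName (boost * 1.0)
-- Russian translation "Кукла Барби" → searches ruSkuName (boost * 1.0)
-```
-
-#### Implementation modules
-- `search/multilang_query_builder.py` - multilingual query builder
-- `search/searcher.py` - searcher (uses the multilingual builder)
-
-### 5.3 Relevance Scoring (Ranking)
-
-**Implementation status**:
-
-#### Current implementation
-**Formula**: `bm25() + 0.2 * text_embedding_relevance()`
-
-- **bm25()**: BM25 text relevance score
-  - Includes multilingual scoring
-  - Internally, the query is translated into multiple languages according to the configuration
-  - Each translation is searched against the fields for its language
-  - Chinese fields use the Chinese tokenizer; English fields use the English analyzer
-- **text_embedding_relevance()**: text vector relevance score (the KNN retrieval score)
-  - Weight: 0.2
-
-#### Configuration
-```yaml
-ranking:
-  expression: "bm25() + 0.2*text_embedding_relevance()"
-  description: "BM25 text relevance combined with semantic embedding similarity"
-```
-
-#### Extensibility
-- Expression-based configuration (can be extended in the future)
-- Custom functions (e.g. `timeliness()`, `field_value()`)
-
-#### Implementation modules
-- `search/ranking_engine.py` - ranking engine
-- `search/searcher.py` - searcher (integrates ranking)
-
----
-
-## 6. Summary of Completed Features
-
-### 6.1 Configuration system
-- ✅ Field definition configuration (type, analyzer, source table/column)
-- ✅ Index domain configuration (multi-domain queries, multilingual mapping)
-- ✅ Query configuration (rewrite dictionary, translation settings)
-- ✅ Ranking configuration (expression-based)
-- ✅ Configuration validation (field existence, type checks, analyzer matching)
-
-### 6.2 Data indexing
-- ✅ Data transformation (field mapping, type conversion)
-- ✅ Vector generation (text vectors, image vectors)
-- ✅ Vector caching (avoids recomputation)
-- ✅ Bulk indexing (error handling, retry mechanism)
-- ✅ Automatic ES mapping generation
-
-### 6.3 Query processing
-- ✅ Query rewriting (dictionary-configured)
-- ✅ Language detection
-- ✅ Multilingual translation (DeepL API)
-- ✅ Text embedding (BGE-M3)
-- ✅ Domain extraction (supports the `domain:query` syntax)
-
-### 6.4 Search
-- ✅ Boolean expression parsing (AND, OR, RANK, ANDNOT, parentheses)
-- ✅ Multilingual query construction (language routing, field mapping)
-- ✅ Semantic search (KNN retrieval)
-- ✅ Relevance ranking (BM25 + vector similarity)
-- ✅ Result aggregation (Faceted Search)
-
-### 6.5 API service
-- ✅ RESTful API (FastAPI)
-- ✅ Search endpoints (text search, image search)
-- ✅ Document lookup endpoint
-- ✅ Front-end UI (HTML + JavaScript)
-
----
-
-## 7. Technology Stack
-
-- **Backend**: Python 3.6+
-- **Search engine**: Elasticsearch
-- **Database**: MySQL (Shoplazza)
-- **Embedding models**: BGE-M3 (text), CN-CLIP (images)
-- **Translation service**: DeepL API
-- **API framework**: FastAPI
-- **Front end**: HTML + JavaScript
-
----
-
-## 8. Configuration File Example
-
-For a complete configuration example, see `config/schema/customer1_config.yaml`
-
----
-
-## 9. 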
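Related Documents
-
-- `MULTILANG_FEATURE.md` - detailed description of the multilingual feature
-- `QUICKSTART.md` - quick start guide
-- `HighLevelDesign.md` - high-level design document
-- `IMPLEMENTATION_SUMMARY.md` - implementation summary
-- `商品数据源入ES配置规范.md` - data source configuration specification
diff --git a/设计文档.md b/设计文档.md
new file mode 100644
index 0000000..7fb71f4
--- /dev/null
+++ b/设计文档.md
@@ -0,0 +1,536 @@
+# Search Engine Generalization: Development Progress
+
+## Project Overview
+
+This project generalizes the back-end search technology.
+In essence, any kind of business data and any retrieval requirement should be achievable with a small amount of customization plus configuration.
+
+
+**The essence of generalization**: any kind of business data and any retrieval requirement can be served with a small amount of customization plus configuration.
+
+---
+
+## 1. Conventions for the Raw Data Layer
+
+All tenants share the main tables; each tenant has its own configuration, its own extension table, and its own ES index.
+
+### 1.1 Shoplazza main tables
+
+All tenants share the following main tables:
+- `shoplazza_product_sku` - SKU-level product data
+- `shoplazza_product_spu` - SPU-level product data
+
+### 1.2 Per-tenant extension tables
+
+Each tenant has its own extension table, customized to its business needs and data sources:
+- Custom attribute system
+- Multilingual product titles (Chinese, English, Russian, etc.)
+- Brand names, and tenant-specific category and tag systems
+- Business filter and aggregation fields
+- Weight (boost) fields
+
+**How the data is joined**:
+- At indexing time, the main product table `shoplazza_product_sku` is joined to the tenant extension table on `id` + `shopid`
+- For example, the `customer1_extension` table stores the custom fields of customer1
+
+### 1.3 Configuration-driven approach
+
+The following are all defined through configuration files:
+1. ES field definitions (field type, analyzer, source table/column)
+2. Generation of the ES mapping structure
+3. Data ingestion mapping
+
+---
+
+## 2. Configuration System Implementation
+
+### 2.1 Application Structure Configuration (Field Definitions)
+
+**Configuration file location**: `config/schema/{customer_id}_config.yaml`
+
+**Contents**: defines which fields make up the ES input data and which MySQL columns they map to.
+
+**Implementation status**:
+
+#### Supported field types
+- **TEXT**: text field; supports multilingual analyzers
+- **KEYWORD**: keyword field, used for exact matching and aggregation
+- **TEXT_EMBEDDING**: text vector field (1024 dimensions, dot_product similarity)
+- **IMAGE_EMBEDDING**: image vector field (1024 dimensions, dot_product similarity)
+- **INT/LONG**: integer types
+- **FLOAT/DOUBLE**: floating-point types
+- **DATE**: date type
+- **BOOLEAN**: boolean type
+
+#### Supported analyzers
+- **chinese_ecommerce**: Chinese e-commerce tokenizer (index_ansj/query_ansj)
+- **english**: English analyzer
+- **russian**: Russian analyzer
+- **arabic**: Arabic analyzer
+- **spanish**: Spanish analyzer
+- **japanese**: Japanese analyzer
+- **standard**: standard analyzer
+- **keyword**: keyword analyzer
+
+#### Field configuration example
+
+```yaml
+fields:
+  # Primary key field
+  - name: "skuId"
+    type: "LONG"
+    source_table: "main"  # main table
+    source_column: "id"
+    required: true
+    index: true
+    store: true
+
+  # Multilingual text fields
+  - name: "name"
+    type: "TEXT"
+    source_table: "extension"  # extension table
+    source_column: "name"
+    analyzer: "chinese_ecommerce"
+    boost: 2.0
+    index: true
+    store: true
+
+  - name: "enSpuName"
+    type: "TEXT"
+    source_table: "extension"
+    source_column: "enSpuName"
+    analyzer: "english"
+    boost: 2.0
+
+  - name: "ruSkuName"
+    type: "TEXT"
+    source_table: "extension"
+    source_column: "ruSkuName"
+    analyzer: "russian"
+    boost: 2.0
+
+  # Text vector field
+  - name: "name_embedding"
+    type: "TEXT_EMBEDDING"
+    source_table: "extension"
+    source_column: "name"
+    embedding_dims: 1024
+    embedding_similarity: "dot_product"
+    index: true
+
+  # Image vector field
+  - name: "image_embedding"
+    type: "IMAGE_EMBEDDING"
+    source_table: "extension"
+    source_column: "imageUrl"
+    embedding_dims: 1024
+    embedding_similarity: "dot_product"
+    nested: false
+```
+
+**Implementation modules**:
+- `config/config_loader.py` - configuration loader
+- `config/field_types.py` - field type definitions
+- `indexer/mapping_generator.py` - ES mapping generator
+- `indexer/data_transformer.py` - data transformer
+
+### 2.2 Index Structure Configuration (Query Domain Configuration)
+
+**Contents**: defines the ES field index mapping configuration, which supports queries on each domain, including the default domain.
+
+**Implementation status**:
+
+#### Domain configuration
+Each domain defines:
+- Domain name (e.g. `default`, `title`, `category`, `brand`)
+- Domain label (Chinese description)
+- Search field list
+- Default analyzer
+- Weight (boost)
+- **Multilingual field mapping** (`language_field_mapping`)
+
+#### Multilingual field mapping
+
+Queries in different languages can be routed to the corresponding fields:
+
+```yaml
+indexes:
+  - name: "default"
+    label: "默认索引"
+    fields:
+      - "name"
+      - "enSpuName"
+      - "ruSkuName"
+      - "categoryName"
+      - "brandName"
+    analyzer: "chinese_ecommerce"
+    boost: 1.0
+    language_field_mapping:
+      zh:
+        - "name"
+        - "categoryName"
+        - "brandName"
+      en:
+        - "enSpuName"
+      ru:
+        - "ruSkuName"
+
+  - name: "title"
+    label: "标题索引"
+    fields:
+      - "name"
+      - "enSpuName"
+      - "ruSkuName"
+    analyzer: "chinese_ecommerce"
+    boost: 2.0
+    language_field_mapping:
+      zh:
+        - "name"
+      en:
+        - "enSpuName"
+      ru:
+        - "ruSkuName"
+```
+
+**How it works**:
+1. Detect the query language (Chinese, English, Russian, etc.)
+2. If the query language appears in `language_field_mapping`, search the fields for that language with the original query
+3. Translate the query into the other supported languages and search the fields for each of those languages
+4. Combine the results of the per-language queries to improve recall
+
+**Implementation modules**:
+- `search/multilang_query_builder.py` - multilingual query builder
+- `query/query_parser.py` - query parser (language detection and translation)
+
+---
+
+## 3. 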
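Test Data Ingestion
+
+### 3.1 Data sources
+
+**Main table**: `shoplazza_product_sku`
+- Shared by all tenants
+- Contains basic product information (id, shopid, etc.)
+
+**Extension table**: `customer1_extension`
+- One per tenant
+- Contains custom fields and multilingual fields
+
+### 3.2 Ingestion method
+
+**Implementation status**:
+
+#### Command-line tool
+```bash
+python main.py ingest \
+  --customer customer1 \
+  --csv-file data/customer1_data.csv \
+  --es-host http://localhost:9200 \
+  --recreate \
+  --batch-size 100
+```
+
+#### Data flow
+1. **Data loading**: load data from a CSV file or the MySQL database
+2. **Data transformation**:
+   - Field mapping (source fields are mapped to ES fields according to the configuration)
+   - Type conversion (strings, numbers, dates, etc.)
+   - Vector generation (text vectors, image vectors)
+   - Vector caching (avoids recomputation)
+3. **Index creation**:
+   - Generate the ES mapping from the configuration
+   - Create or update the index
+4. **Bulk ingestion**:
+   - Bulk writes to ES (500 documents per batch by default)
+   - Error handling and retry mechanism
+
+#### Configuration mapping example
+
+**customer1_config.yaml** configuration:
+```yaml
+main_table: "shoplazza_product_sku"
+extension_table: "customer1_extension"
+es_index_name: "search_customer1"
+
+fields:
+  - name: "skuId"
+    source_table: "main"
+    source_column: "id"
+  - name: "name"
+    source_table: "extension"
+    source_column: "name"
+  - name: "enSpuName"
+    source_table: "extension"
+    source_column: "enSpuName"
+```
+
+**Data transformation**:
+- Main-table fields: read directly from the `id` column of the `shoplazza_product_sku` table
+- Extension-table fields: read from the corresponding column of the `customer1_extension` table
+- Vector fields: a vector is generated from the source text or image and cached
+
+**Implementation modules**:
+- `indexer/data_transformer.py` - data transformer
+- `indexer/bulk_indexer.py` - bulk indexer
+- `indexer/indexing_pipeline.py` - indexing pipeline
+- `embeddings/bge_encoder.py` - text vector encoder
+- `embeddings/clip_image_encoder.py` - image vector encoder
+
+---
+
+## 4. 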
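QueryParser Implementation
+
+
+### 4.1 Query Rewriting
+
+In the rewrite dictionary, the key is a query and the value is the rewritten query expression. For example, a brand term can be rewritten to `brand|query OR name|query`; category terms, tag terms, and similar entries can be added the same way. Spelling correction, normalization, and query rewriting can all be configured through this dictionary.
+**Implementation status**:
+
+#### Configuration
+Query rewrite rules are configured in `query_config.rewrite_dictionary`:
+
+```yaml
+query_config:
+  enable_query_rewrite: true
+  rewrite_dictionary:
+    "芭比": "brand:芭比 OR name:芭比娃娃"
+    "玩具": "category:玩具"
+    "消防": "category:消防 OR name:消防"
+```
+
+#### Features
+- **Exact match**: when the query exactly matches a dictionary key, it is replaced with the value
+- **Partial match**: when the query contains a dictionary key, that part is replaced
+- **Boolean expressions**: the value can be a complex boolean expression (AND, OR, domain queries, etc.)
+
+#### Implementation modules
+- `query/query_rewriter.py` - query rewriter
+- `query/query_parser.py` - query parser (integrates rewriting)
+
+### 4.2 Translation
+
+**Implementation status**:
+
+#### Configuration
+```yaml
+query_config:
+  supported_languages:
+    - "zh"
+    - "en"
+    - "ru"
+  default_language: "zh"
+  enable_translation: true
+  translation_service: "deepl"
+  translation_api_key: null  # set via environment variable
+```
+
+#### Features
+1. **Language detection**: the query language is detected automatically
+2. **Smart translation**:
+   - A Chinese query is translated into English and Russian
+   - An English query is translated into Chinese and Russian
+   - A query in any other language is translated into all supported languages
+3. **Domain-aware translation**:
+   - If the domain has a `language_field_mapping`, the query is translated only into languages present in the mapping
+   - This avoids unnecessary translation and improves efficiency
+4. **Translation cache**: translation results are cached to avoid repeated API calls
+
+#### Workflow
+```
+Query input → language detection → target language selection → translation → multilingual query construction
+```
+
+#### Implementation modules
+- `query/language_detector.py` - language detector
+- `query/translator.py` - translator (DeepL API)
+- `query/query_parser.py` - query parser (integrates translation)
+
+### 4.3 Text Embedding
+
+If text_embedding queries are enabled in the configuration and the query includes a query on the default domain, the default-domain query terms are converted into a vector, which the searcher later uses in retrieval.
+
+**Implementation status**:
+
+#### Configuration
+```yaml
+query_config:
+  enable_text_embedding: true
+```
+
+#### Features
+1. **Conditional generation**:
+   - A vector is generated only when `enable_text_embedding=true`
+   - A vector is generated only for queries on the `default` domain
+2. **Embedding model**: BGE-M3 (1024-dimensional vectors)
+3. **Purpose**: semantic search (KNN retrieval)
+
+#### Implementation modules
+- `embeddings/bge_encoder.py` - BGE text encoder
+- `query/query_parser.py` - query parser (integrates vector generation)
+
+---
+
+## 5. 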
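Searcher Implementation
+
+The design follows OpenSearch, which defines its own index structure configuration and supports its own retrieval expressions and sort expressions. These are the basis for per-customer configuration, covering both index structure and ranking strategy.
+For example, business filtering strategies can be expressed directly, such as `brand|耐克 AND cate2|xxx`, and sorting by a specific field can be done with a sort expression.
+
+Queries run against the default domain by default, and relevance tuning concentrates on this domain. It blends semantic relevance with multilingual relevance (based on configuration, the query can be translated into specified languages and run against the fields for those languages) to compensate for the limits of traditional query analysis (query rewriting, spelling correction, term weighting, etc.). Configured word lists can also switch a query into a broad-match mode to improve relevance.
+
+### 5.1 Boolean Expression Parsing
+
+**Implementation status**:
+
+#### Supported operators
+- **AND**: all terms must match
+- **OR**: any term may match
+- **RANK**: ranking boost (similar to OR, but affects ranking)
+- **ANDNOT**: exclusion (the first term matches, the second does not)
+- **()**: parenthesized grouping
+
+#### Precedence (highest to lowest)
+1. `()` - parentheses
+2. `ANDNOT` - exclusion
+3. `AND` - and
+4. `OR` - or
+5. `RANK` - ranking
+
+#### Example
+```
+laptop AND (gaming OR professional) ANDNOT cheap
+```
+
+#### Implementation modules
+- `search/boolean_parser.py` - boolean expression parser
+- `search/searcher.py` - searcher (integrates boolean parsing)
+
+### 5.2 Multilingual Search
+
+**Implementation status**:
+
+#### How it works
+1. **Query parsing**:
+   - Extract the domain (e.g. `title:查询` → domain=`title`, query=`查询`)
+   - Detect the query language
+   - Generate translations
+2. **Multilingual query construction**:
+   - If the domain has a `language_field_mapping`:
+     - Search the fields for the detected language with the original query (boost * 1.5)
+     - Search the fields for the other languages with the translated queries (boost * 1.0)
+   - If the domain has no `language_field_mapping`:
+     - Search all fields
+3. 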
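**Query combination**:
+   - The per-language queries are combined as `should` clauses
+   - This improves recall
+
+#### Example
+```
+Query: "芭比娃娃"
+Domain: default
+Detected language: zh
+
+Generated queries:
+- Chinese query "芭比娃娃" → searches name, categoryName, brandName (boost * 1.5)
+- English translation "Barbie doll" → searches enSpuName (boost * 1.0)
+- Russian translation "Кукла Барби" → searches ruSkuName (boost * 1.0)
+```
+
+#### Implementation modules
+- `search/multilang_query_builder.py` - multilingual query builder
+- `search/searcher.py` - searcher (uses the multilingual builder)
+
+### 5.3 Relevance Scoring (Ranking)
+
+**Implementation status**:
+
+#### Current implementation
+**Formula**: `bm25() + 0.2 * text_embedding_relevance()`
+
+- **bm25()**: BM25 text relevance score
+  - Includes multilingual scoring
+  - Internally, the query is translated into multiple languages according to the configuration
+  - Each translation is searched against the fields for its language
+  - Chinese fields use the Chinese tokenizer; English fields use the English analyzer
+- **text_embedding_relevance()**: text vector relevance score (the KNN retrieval score)
+  - Weight: 0.2
+
+#### Configuration
+```yaml
+ranking:
+  expression: "bm25() + 0.2*text_embedding_relevance()"
+  description: "BM25 text relevance combined with semantic embedding similarity"
+```
+
+#### Extensibility
+- Expression-based configuration (can be extended in the future)
+- Custom functions (e.g. `timeliness()`, `field_value()`)
+
+#### Implementation modules
+- `search/ranking_engine.py` - ranking engine
+- `search/searcher.py` - searcher (integrates ranking)
+
+---
+
+## 6. Summary of Completed Features
+
+### 6.1 Configuration system
+- ✅ Field definition configuration (type, analyzer, source table/column)
+- ✅ Index domain configuration (multi-domain queries, multilingual mapping)
+- ✅ Query configuration (rewrite dictionary, translation settings)
+- ✅ Ranking configuration (expression-based)
+- ✅ Configuration validation (field existence, type checks, analyzer matching)
+
+### 6.2 Data indexing
+- ✅ Data transformation (field mapping, type conversion)
+- ✅ Vector generation (text vectors, image vectors)
+- ✅ Vector caching (avoids recomputation)
+- ✅ Bulk indexing (error handling, retry mechanism)
+- ✅ Automatic ES mapping generation
+
+### 6.3 Query processing
+- ✅ Query rewriting (dictionary-configured)
+- ✅ Language detection
+- ✅ Multilingual translation (DeepL API)
+- ✅ Text embedding (BGE-M3)
+- ✅ Domain extraction (supports the `domain:query` syntax)
+
+### 6.4 Search
+- ✅ Boolean expression parsing (AND, OR, RANK, ANDNOT, parentheses)
+- ✅ Multilingual query construction (language routing, field mapping)
+- ✅ Semantic search (KNN retrieval)
+- ✅ Relevance ranking (BM25 + vector similarity)
+- ✅ Result aggregation (Faceted Search)
+
+### 6.5 API service
+- ✅ RESTful API (FastAPI)
+- ✅ Search endpoints (text search, image search)
+- ✅ Document lookup endpoint
+- ✅ Front-end UI (HTML + JavaScript)
+
+---
+
+## 7. Technology Stack
+
+- **Backend**: Python 3.6+
+- **Search engine**: Elasticsearch
+- **Database**: MySQL (Shoplazza)
+- **Embedding models**: BGE-M3 (text), CN-CLIP (images)
+- **Translation service**: DeepL API
+- **API framework**: FastAPI
+- **Front end**: HTML + JavaScript
+
+---
+
+## 8. Configuration File Example
+
+For a complete configuration example, see `config/schema/customer1_config.yaml`
+
+---
+
+## 9. 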
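Related Documents
+
+- `MULTILANG_FEATURE.md` - detailed description of the multilingual feature
+- `QUICKSTART.md` - quick start guide
+- `HighLevelDesign.md` - high-level design document
+- `IMPLEMENTATION_SUMMARY.md` - implementation summary
+- `商品数据源入ES配置规范.md` - data source configuration specification
-- 
libgit2 0.21.2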