Commit 06e7908273c7d5a5cda8cec9830907945f0f8c56

Authored by tangwang
1 parent 13377199

API optimization

.cursor/plans/将数据pipeline相关配置从索引配置中剥离.md 0 → 100644 (plan: separate data-pipeline settings from the index configuration)
... ... @@ -0,0 +1,469 @@
  1 +<!-- b5a93a00-49d7-4266-8dbf-3d3f708334ed c9ba91cf-2b58-440d-86d1-35b805e5d3cf -->
  2 +# Configuration and Pipeline Separation Refactoring
  3 +
  4 +## Overview
  5 +
  6 +Implement clean separation between **Search Configuration** (customer-facing, ES/search focused) and **Data Pipeline** (internal ETL, script-controlled). Configuration files will only contain search engine settings, while data source and transformation logic will be controlled entirely by script parameters.
  7 +
  8 +## Phase 1: Configuration File Cleanup
  9 +
  10 +### 1.1 Clean BASE Configuration
  11 +
  12 +**File**: [`config/schema/base/config.yaml`](config/schema/base/config.yaml)
  13 +
  14 +**Remove** (data pipeline concerns):
  15 +
  16 +- `mysql_config` section
  17 +- `main_table` field
  18 +- `sku_table` field
  19 +- `extension_table` field
  20 +- `source_table` in field definitions
  21 +- `source_column` in field definitions
  22 +
  23 +**Keep** (search configuration):
  24 +
  25 +- `customer_name`
  26 +- `es_index_name`
  27 +- `es_settings`
  28 +- `fields` (simplified, no source mapping)
  29 +- `indexes` (search domains)
  30 +- `query_config`
  31 +- `function_score`
  32 +- `rerank`
  33 +- `spu_config`
  34 +- `tenant_config` (as template)
  35 +- `default_facets`
  36 +
  37 +**Simplify field definitions**:
  38 +
  39 +```yaml
  40 +fields:
  41 +  - name: "title"
  42 +    type: "TEXT"
  43 +    analyzer: "chinese_ecommerce"
  44 +    boost: 3.0
  45 +    index: true
  46 +    store: true
  47 +    # NO source_table, NO source_column
  48 +```
  49 +
  50 +### 1.2 Update Legacy Configuration
  51 +
  52 +**File**: [`config/schema/customer1_legacy/config.yaml`](config/schema/customer1_legacy/config.yaml)
  53 +
  54 +Apply the same cleanup as the BASE config, and mark it as legacy in comments.
  55 +
  56 +## Phase 2: Transformer Architecture Refactoring
  57 +
  58 +### 2.1 Create Base Transformer Class
  59 +
  60 +**File**: [`indexer/base_transformer.py`](indexer/base_transformer.py) (NEW)
  61 +
  62 +Create abstract base class with shared logic:
  63 +
  64 +- `__init__` with config, encoders, cache
  65 +- `_convert_value()` - type conversion (shared)
  66 +- `_generate_text_embeddings()` - text embedding (shared)
  67 +- `_generate_image_embeddings()` - image embedding (shared)
  68 +- `_inject_tenant_id()` - tenant_id injection (shared)
  69 +- `@abstractmethod transform()` - to be implemented by subclasses
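A minimal Python sketch of the planned base class follows. It assumes documents are built as plain dicts and that the config's field types are available as a `{field_name: type}` dict. The type names other than `TEXT`, the conversion rules, and the constructor signature are illustrative assumptions. Embedding generation is left out.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseDataTransformer(ABC):
    """Shared logic for SKU- and SPU-level transformers (illustrative sketch)."""

    def __init__(self, field_types: Dict[str, str], tenant_id: Optional[str] = None,
                 text_encoder=None, image_encoder=None):
        # field_types maps an ES field name to its config type, e.g. {"title": "TEXT"}
        self.field_types = field_types
        self.tenant_id = tenant_id
        self.text_encoder = text_encoder
        self.image_encoder = image_encoder
        self._embedding_cache: Dict[str, Any] = {}

    def _convert_value(self, field: str, value: Any) -> Any:
        """Convert a raw source value to the type declared in the search config."""
        if value is None:
            return None
        field_type = self.field_types.get(field, "KEYWORD")  # assumed fallback type
        if field_type in ("INT", "LONG"):
            return int(value)
        if field_type in ("DOUBLE", "FLOAT"):
            return float(value)
        if field_type == "BOOLEAN":
            if isinstance(value, str):
                return value.strip().lower() in ("1", "true", "yes")
            return bool(value)
        return str(value)

    def _inject_tenant_id(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        """Stamp the runtime tenant_id onto a document."""
        if self.tenant_id is None:
            raise ValueError("tenant_id must be set at runtime before transforming")
        doc["tenant_id"] = self.tenant_id
        return doc

    @abstractmethod
    def transform(self, *args, **kwargs) -> List[Dict[str, Any]]:
        """Implemented by DataTransformer / SPUDataTransformer."""
```

Subclasses only implement `transform()`. Type conversion and tenant stamping then behave identically at both SKU and SPU level.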
  70 +
  71 +### 2.2 Refactor DataTransformer
  72 +
  73 +**File**: [`indexer/data_transformer.py`](indexer/data_transformer.py)
  74 +
  75 +Changes:
  76 +
  77 +- Inherit from `BaseDataTransformer`
  78 +- Remove dependency on `source_table`, `source_column` from config
  79 +- Accept field mapping as parameter (from script)
  80 +- Implement `transform(df, field_mapping)` method
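The following sketches how `transform(df, field_mapping)` might apply a script-supplied mapping. It uses a list of dicts in place of a DataFrame so it stays dependency-free. It assumes the mapping has the shape `{es_field: source_column}`. The helper name `apply_field_mapping` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def apply_field_mapping(rows: Iterable[Dict[str, Any]],
                        field_mapping: Dict[str, str],
                        tenant_id: str) -> List[Dict[str, Any]]:
    """Build ES documents from source rows using a script-supplied mapping.

    field_mapping is assumed to be {es_field: source_column}. Columns that are
    missing or null in a row are skipped instead of being indexed as null.
    """
    docs = []
    for row in rows:
        doc = {es_field: row[column]
               for es_field, column in field_mapping.items()
               if row.get(column) is not None}
        doc["tenant_id"] = tenant_id  # injected at runtime, never read from config
        docs.append(doc)
    return docs
```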
  81 +
  82 +### 2.3 Refactor SPUDataTransformer
  83 +
  84 +**File**: [`indexer/spu_data_transformer.py`](indexer/spu_data_transformer.py)
  85 +
  86 +Changes:
  87 +
  88 +- Inherit from `BaseDataTransformer`
  89 +- Remove dependency on config's table names
  90 +- Accept field mapping as parameter
  91 +- Implement `transform(spu_df, sku_df, spu_field_mapping, sku_field_mapping)` method
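For the SPU transformer, each mapping is applied separately and the SKUs are then nested under their parent SPU. This sketch makes two naming assumptions: each SKU row carries a `spu_id` column pointing at the SPU's `id`, and SKUs are stored under a `skus` field. The function name `transform_spu` is also an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, List


def _map_row(row: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    # mapping is assumed to be {es_field: source_column}
    return {field: row[col] for field, col in mapping.items() if col in row}


def transform_spu(spu_rows: List[Dict[str, Any]],
                  sku_rows: List[Dict[str, Any]],
                  spu_field_mapping: Dict[str, str],
                  sku_field_mapping: Dict[str, str],
                  spu_key: str = "id",
                  sku_parent_key: str = "spu_id") -> List[Dict[str, Any]]:
    """Produce one document per SPU with its mapped SKUs nested under "skus"."""
    skus_by_spu = defaultdict(list)
    for sku in sku_rows:
        skus_by_spu[sku[sku_parent_key]].append(_map_row(sku, sku_field_mapping))

    docs = []
    for spu in spu_rows:
        doc = _map_row(spu, spu_field_mapping)
        # .get() does not trigger the defaultdict factory, so SPUs without SKUs get []
        doc["skus"] = skus_by_spu.get(spu[spu_key], [])
        docs.append(doc)
    return docs
```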
  92 +
  93 +### 2.4 Create Transformer Factory
  94 +
  95 +**File**: [`indexer/transformer_factory.py`](indexer/transformer_factory.py) (NEW)
  96 +
  97 +Factory to create appropriate transformer based on parameters:
  98 +
  99 +```python
  100 +class TransformerFactory:  # assumes CustomerConfig and the transformer classes are imported
  101 +    @staticmethod
  102 +    def create(
  103 +        transformer_type: str,  # 'sku' or 'spu'
  104 +        config: CustomerConfig,
  105 +        text_encoder=None,
  106 +        image_encoder=None,
  107 +    ) -> BaseDataTransformer:
  108 +        if transformer_type == 'spu':
  109 +            return SPUDataTransformer(config, text_encoder, image_encoder)
  110 +        elif transformer_type == 'sku':
  111 +            return DataTransformer(config, text_encoder, image_encoder)
  112 +        else:
  113 +            raise ValueError(f"Unknown transformer type: {transformer_type}")
  114 +```
  115 +
  116 +### 2.5 Update Package Exports
  117 +
  118 +**File**: [`indexer/__init__.py`](indexer/__init__.py)
  119 +
  120 +Export new structure:
  121 +
  122 +```python
  123 +from .base_transformer import BaseDataTransformer
  124 +from .data_transformer import DataTransformer
  125 +from .spu_data_transformer import SPUDataTransformer
  126 +from .transformer_factory import TransformerFactory
  127 +# BulkIndexer and IndexingPipeline keep their existing import lines (not shown)
  128 +__all__ = [
  129 +    'BaseDataTransformer',
  130 +    'DataTransformer',
  131 +    'SPUDataTransformer',
  132 +    'TransformerFactory',  # Recommended for new code
  133 +    'BulkIndexer',
  134 +    'IndexingPipeline',
  135 +]
  136 +```
  137 +
  138 +## Phase 3: Script Refactoring
  139 +
  140 +### 3.1 Create Unified Ingestion Script
  141 +
  142 +**File**: [`scripts/ingest_universal.py`](scripts/ingest_universal.py) (NEW)
  143 +
  144 +Universal ingestion script with full parameter control:
  145 +
  146 +**Parameters**:
  147 +
  148 +```bash
  149 +# Search configuration (pure)
  150 +--config base # Which search config to use
  151 +
  152 +# Runtime parameters
  153 +--tenant-id shop_12345 # REQUIRED tenant identifier
  154 +--es-host http://localhost:9200
  155 +--es-username elastic
  156 +--es-password xxx
  157 +
  158 +# Data source parameters (pipeline concern)
  159 +--data-source mysql # mysql, csv, api, etc.
  160 +--mysql-host 120.79.247.228
  161 +--mysql-port 3316
  162 +--mysql-database saas
  163 +--mysql-username saas
  164 +--mysql-password xxx
  165 +
  166 +# Transformer parameters (pipeline concern)
  167 +--transformer spu # spu or sku
  168 +--spu-table shoplazza_product_spu
  169 +--sku-table shoplazza_product_sku
  170 +--shop-id 1 # Filter by shop_id
  171 +
  172 +# Field mapping (optional, uses defaults if not provided)
  173 +--field-mapping mapping.json
  174 +
  175 +# Processing parameters
  176 +--batch-size 100
  177 +--limit 1000
  178 +--skip-embeddings
  179 +--recreate-index
  180 +```
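The parameter list above maps naturally onto `argparse`. A sketch follows. Some choices are assumptions taken from the examples rather than settled decisions: the defaults (`--config base`, the ES host, `--data-source mysql`), the restriction of `--transformer` to `spu`/`sku`, and the MySQL cross-check.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Universal ingestion (sketch)")
    # Search configuration (pure)
    p.add_argument("--config", default="base", help="Which search config to use")
    # Runtime parameters
    p.add_argument("--tenant-id", required=True, help="Tenant identifier (required)")
    p.add_argument("--es-host", default="http://localhost:9200")
    p.add_argument("--es-username")
    p.add_argument("--es-password")
    # Data source parameters (pipeline concern)
    p.add_argument("--data-source", default="mysql", help="mysql, csv, api, ...")
    p.add_argument("--mysql-host")
    p.add_argument("--mysql-port", type=int)
    p.add_argument("--mysql-database")
    p.add_argument("--mysql-username")
    p.add_argument("--mysql-password")
    # Transformer parameters (pipeline concern)
    p.add_argument("--transformer", choices=["spu", "sku"], default="spu")
    p.add_argument("--spu-table")
    p.add_argument("--sku-table")
    p.add_argument("--shop-id", type=int, help="Filter by shop_id")
    # Field mapping (optional; defaults are used when omitted)
    p.add_argument("--field-mapping", help="Path to a JSON mapping file")
    # Processing parameters
    p.add_argument("--batch-size", type=int, default=100)
    p.add_argument("--limit", type=int)
    p.add_argument("--skip-embeddings", action="store_true")
    p.add_argument("--recreate-index", action="store_true")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Cross-field check: a MySQL source needs at least a host
    if args.data_source == "mysql" and not args.mysql_host:
        parser.error("--data-source mysql requires --mysql-host")
    return args
```

Making `--tenant-id` required is what lets the config stay free of tenant data. A run without it fails at parse time instead of indexing untagged documents.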
  181 +
  182 +**Logic**:
  183 +
  184 +1. Load search config (clean, no data source info)
  185 +2. Set tenant_id from parameter
  186 +3. Connect to data source based on `--data-source` parameter
  187 +4. Load data from tables specified by parameters
  188 +5. Create transformer based on `--transformer` parameter
  189 +6. Apply field mapping (default or custom)
  190 +7. Transform and index
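Step 6 could resolve the mapping in two stages: start from a per-transformer default, then overlay a custom JSON file passed via `--field-mapping`. In this sketch the `DEFAULT_MAPPINGS` table and `resolve_field_mapping` are hypothetical. The section names `spu_fields`/`sku_fields` follow the mapping format in section 3.3.

```python
import json
from typing import Dict, Optional

# Hypothetical built-in defaults; the real ones would come from mappings/*.json
DEFAULT_MAPPINGS: Dict[str, Dict[str, Dict[str, str]]] = {
    "spu": {
        "spu_fields": {"id": "id", "title": "title", "description": "description"},
        "sku_fields": {"id": "id", "price": "price", "sku": "sku"},
    },
}


def resolve_field_mapping(transformer: str,
                          mapping_path: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Start from the transformer's default mapping, then overlay a custom file."""
    base = DEFAULT_MAPPINGS.get(transformer)
    if base is None:
        raise ValueError(f"No default mapping for transformer: {transformer}")
    # Copy each section so callers cannot mutate the defaults
    merged = {section: dict(fields) for section, fields in base.items()}
    if mapping_path:
        with open(mapping_path, encoding="utf-8") as fh:
            custom = json.load(fh)
        for section, fields in custom.items():
            merged.setdefault(section, {}).update(fields)
    return merged
```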
  191 +
  192 +### 3.2 Update BASE Ingestion Script
  193 +
  194 +**File**: [`scripts/ingest_base.py`](scripts/ingest_base.py)
  195 +
  196 +Update to use script parameters instead of config values:
  197 +
  198 +- Remove dependency on `config.mysql_config`
  199 +- Remove dependency on `config.main_table`, `config.sku_table`
  200 +- Get all data source info from command-line arguments
  201 +- Use TransformerFactory
  202 +
  203 +### 3.3 Create Field Mapping Helper
  204 +
  205 +**File**: [`scripts/field_mapping_generator.py`](scripts/field_mapping_generator.py) (NEW)
  206 +
  207 +Helper script to generate default field mappings:
  208 +
  209 +```bash
  210 +# Generate default mapping for Shoplazza SPU schema
  211 +python scripts/field_mapping_generator.py \
  212 +  --source shoplazza \
  213 +  --level spu \
  214 +  --output mappings/shoplazza_spu.json
  215 +```
  216 +
  217 +Output example:
  218 +
  219 +```json
  220 +{
  221 +  "spu_fields": {
  222 +    "id": "id",
  223 +    "title": "title",
  224 +    "description": "description",
  225 +    ...
  226 +  },
  227 +  "sku_fields": {
  228 +    "id": "id",
  229 +    "price": "price",
  230 +    "sku": "sku",
  231 +    ...
  232 +  }
  233 +}
  234 +```
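A possible shape for `scripts/field_mapping_generator.py` that produces the format above is sketched below. The hard-coded `KNOWN_SCHEMAS` column lists are placeholders; a real generator would read the source schema. The generated identity mappings are only a starting point to edit by hand.

```python
import argparse
import json
from typing import Dict, List, Optional

# Placeholder column lists per source and level; a real generator would read
# the source schema (e.g. from the database) instead of hard-coding it.
KNOWN_SCHEMAS: Dict[str, Dict[str, Dict[str, List[str]]]] = {
    "shoplazza": {
        "spu": {
            "spu_fields": ["id", "title", "description"],
            "sku_fields": ["id", "price", "sku"],
        },
    },
}


def generate_mapping(source: str, level: str) -> Dict[str, Dict[str, str]]:
    """Return an identity mapping {es_field: source_column} for each section."""
    try:
        sections = KNOWN_SCHEMAS[source][level]
    except KeyError:
        raise ValueError(f"No schema known for source={source!r}, level={level!r}") from None
    return {section: {col: col for col in cols} for section, cols in sections.items()}


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Generate a default field mapping")
    parser.add_argument("--source", required=True)
    parser.add_argument("--level", required=True, choices=["spu", "sku"])
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)
    mapping = generate_mapping(args.source, args.level)
    with open(args.output, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2, ensure_ascii=False)
```

`main()` accepts an argument list so it can be tested directly. The script's entry point would call it with no arguments.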
  235 +
  236 +## Phase 4: Configuration Loader Updates
  237 +
  238 +### 4.1 Simplify ConfigLoader
  239 +
  240 +**File**: [`config/config_loader.py`](config/config_loader.py)
  241 +
  242 +Changes:
  243 +
  244 +- Remove parsing of `mysql_config`
  245 +- Remove parsing of `main_table`, `sku_table`, `extension_table`
  246 +- Remove validation of source_table/source_column in fields
  247 +- Simplify field parsing (no source mapping)
  248 +- Keep validation of ES/search related config
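One way the slimmed-down loader could treat old-format files is to fail loudly on pipeline keys instead of silently ignoring them, so stale configs surface during migration. The required-key list and the error format are assumptions. A more lenient loader could warn instead.

```python
from typing import Any, Dict

# Keys that belonged to the data pipeline and are no longer accepted
REMOVED_TOP_LEVEL_KEYS = ("mysql_config", "main_table", "sku_table", "extension_table")
REMOVED_FIELD_KEYS = ("source_table", "source_column")
# Assumed minimum for a usable search config
REQUIRED_KEYS = ("customer_name", "es_index_name", "fields")


def validate_search_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Reject pipeline settings and check the search-related essentials."""
    errors = [f"'{key}' is a data-pipeline setting; pass it as a script parameter"
              for key in REMOVED_TOP_LEVEL_KEYS if key in raw]
    for i, field in enumerate(raw.get("fields", [])):
        for key in REMOVED_FIELD_KEYS:
            if key in field:
                errors.append(f"fields[{i}] ({field.get('name')}): remove '{key}'")
    errors += [f"missing required key '{key}'" for key in REQUIRED_KEYS if key not in raw]
    if errors:
        raise ValueError("Invalid search config:\n  " + "\n  ".join(errors))
    return raw
```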
  249 +
  250 +### 4.2 Update CustomerConfig Model
  251 +
  252 +**File**: [`config/__init__.py`](config/__init__.py), or wherever `CustomerConfig` is defined
  253 +
  254 +Remove attributes:
  255 +
  256 +- `mysql_config`
  257 +- `main_table`
  258 +- `sku_table`
  259 +- `extension_table`
  260 +
  261 +Add attributes:
  262 +
  263 +- `tenant_id` (runtime, default None)
  264 +
  265 +Simplify FieldConfig:
  266 +
  267 +- Remove `source_table`
  268 +- Remove `source_column`
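The trimmed models might look like the dataclasses below. Only the removed and added attributes come from this plan. The optional attributes and their defaults (`boost=1.0`, `store=False`, and so on) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FieldConfig:
    # Search-side attributes only; no source_table / source_column
    name: str
    type: str
    analyzer: Optional[str] = None
    boost: float = 1.0
    index: bool = True
    store: bool = False


@dataclass
class CustomerConfig:
    customer_name: str
    es_index_name: str
    fields: List[FieldConfig] = field(default_factory=list)
    es_settings: Dict[str, Any] = field(default_factory=dict)
    query_config: Dict[str, Any] = field(default_factory=dict)
    # Runtime-only: set by the ingestion script, never read from YAML
    tenant_id: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "CustomerConfig":
        return cls(
            customer_name=raw["customer_name"],
            es_index_name=raw["es_index_name"],
            fields=[FieldConfig(**f) for f in raw.get("fields", [])],
            es_settings=raw.get("es_settings", {}),
            query_config=raw.get("query_config", {}),
        )
```

`FieldConfig` no longer declares `source_table` or `source_column`. As a result, an old-format field entry fails with a `TypeError` at load time.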
  269 +
  270 +## Phase 5: Documentation Updates
  271 +
  272 +### 5.1 Create Pipeline Guide
  273 +
  274 +**File**: [`docs/DATA_PIPELINE_GUIDE.md`](docs/DATA_PIPELINE_GUIDE.md) (NEW)
  275 +
  276 +Document:
  277 +
  278 +- Separation of concerns (config vs pipeline)
  279 +- How to use `ingest_universal.py`
  280 +- Default field mappings for common sources
  281 +- Custom field mapping examples
  282 +- Transformer selection guide
  283 +
  284 +### 5.2 Update BASE Config Guide
  285 +
  286 +**File**: [`docs/BASE_CONFIG_GUIDE.md`](docs/BASE_CONFIG_GUIDE.md)
  287 +
  288 +Update to reflect:
  289 +
  290 +- Config only contains search settings
  291 +- No data source configuration
  292 +- How tenant_id is injected at runtime
  293 +- Examples of using same config with different data sources
  294 +
  295 +### 5.3 Update API Documentation
  296 +
  297 +**File**: [`API_DOCUMENTATION.md`](API_DOCUMENTATION.md)
  298 +
  299 +No changes needed (the API layer is unaware of the data pipeline).
  300 +
  301 +### 5.4 Update Design Documentation
  302 +
  303 +**File**: [`设计文档.md`](设计文档.md) (design document)
  304 +
  305 +Add section on configuration architecture:
  306 +
  307 +- Clear separation between search config and pipeline
  308 +- Benefits of this approach
  309 +- How to extend for new data sources
  310 +
  311 +## Phase 6: Create Default Field Mappings
  312 +
  313 +### 6.1 Shoplazza SPU Mapping
  314 +
  315 +**File**: [`mappings/shoplazza_spu.json`](mappings/shoplazza_spu.json) (NEW)
  316 +
  317 +Default field mapping for Shoplazza SPU/SKU tables to BASE config fields.
  318 +
  319 +### 6.2 Shoplazza SKU Mapping (Legacy)
  320 +
  321 +**File**: [`mappings/shoplazza_sku_legacy.json`](mappings/shoplazza_sku_legacy.json) (NEW)
  322 +
  323 +Default field mapping for legacy SKU-level indexing.
  324 +
  325 +### 6.3 CSV Template Mapping
  326 +
  327 +**File**: [`mappings/csv_template.json`](mappings/csv_template.json) (NEW)
  328 +
  329 +Example mapping for CSV data sources.
  330 +
  331 +## Phase 7: Testing & Validation
  332 +
  333 +### 7.1 Test Script with Different Sources
  334 +
  335 +Test `ingest_universal.py` with:
  336 +
  337 +1. MySQL Shoplazza tables (SPU level)
  338 +2. MySQL Shoplazza tables (SKU level, legacy)
  339 +3. CSV files (if time permits)
  340 +
  341 +### 7.2 Verify Configuration Portability
  342 +
  343 +Test same BASE config with:
  344 +
  345 +- Different data sources
  346 +- Different field mappings
  347 +- Different transformers
  348 +
  349 +### 7.3 Update Test Scripts
  350 +
  351 +**File**: [`scripts/test_base.sh`](scripts/test_base.sh)
  352 +
  353 +Update to use new script parameters.
  354 +
  355 +## Phase 8: Migration & Cleanup
  356 +
  357 +### 8.1 Create Migration Guide
  358 +
  359 +**File**: [`docs/CONFIG_MIGRATION_GUIDE.md`](docs/CONFIG_MIGRATION_GUIDE.md) (NEW)
  360 +
  361 +Guide for migrating from old config format to new:
  362 +
  363 +- What changed
  364 +- How to update existing configs
  365 +- How to update ingestion scripts
  366 +- Breaking changes
  367 +
  368 +### 8.2 Update Example Configs
  369 +
  370 +Update all example configurations to new format.
  371 +
  372 +### 8.3 Mark Old Scripts as Deprecated
  373 +
  374 +Add deprecation warnings to scripts that still use old config format.
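A sketch of such a warning follows. It assumes old scripts can inspect the raw config dict before using it. `warn_if_legacy_config` is a hypothetical helper.

```python
import warnings
from typing import Any, Dict

LEGACY_KEYS = ("mysql_config", "main_table", "sku_table", "extension_table")


def warn_if_legacy_config(raw_config: Dict[str, Any], script_name: str) -> bool:
    """Emit a DeprecationWarning when a script is fed an old-format config."""
    legacy = [key for key in LEGACY_KEYS if key in raw_config]
    if legacy:
        warnings.warn(
            f"{script_name}: config keys {legacy} are deprecated; "
            "pass data-source settings as command-line parameters instead "
            "(see docs/CONFIG_MIGRATION_GUIDE.md)",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False
```

Python hides `DeprecationWarning` by default unless it is triggered from `__main__`. If end users running the scripts should always see it, use `FutureWarning` instead: that is the category the standard library documents for warnings aimed at application users.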
  375 +
  376 +## Key Design Principles
  377 +
  378 +### 1. Separation of Concerns
  379 +
  380 +**Search Configuration** (customer-facing):
  381 +
  382 +- What fields exist in ES
  383 +- How fields are analyzed/indexed
  384 +- Search strategies and ranking
  385 +- Facets and aggregations
  386 +- Query processing rules
  387 +
  388 +**Data Pipeline** (internal):
  389 +
  390 +- Where data comes from
  391 +- How to connect to data sources
  392 +- Which tables/files to read
  393 +- How to transform data
  394 +- Field mapping logic
  395 +
  396 +### 2. Configuration Portability
  397 +
  398 +Same search config can be used with:
  399 +
  400 +- Different data sources (MySQL, CSV, API)
  401 +- Different schemas (with appropriate mapping)
  402 +- Different transformation strategies
  403 +
  404 +### 3. Flexibility
  405 +
  406 +Pipeline decisions (transformer, data source, field mapping) are made at runtime, not in config.
  407 +
  408 +## Migration Path
  409 +
  410 +### For Existing Users
  411 +
  412 +1. Update config files (remove data source settings)
  413 +2. Update ingestion commands (add new parameters)
  414 +3. Optionally create field mapping files for convenience
  415 +
  416 +### For New Users
  417 +
  418 +1. Copy BASE config (already clean)
  419 +2. Run `ingest_universal.py` with appropriate parameters
  420 +3. Provide custom field mapping if needed
  421 +
  422 +## Success Criteria
  423 +
  424 +- [ ] BASE config contains ZERO data source information
  425 +- [ ] Same config works with MySQL and CSV sources
  426 +- [ ] Pipeline fully controlled by script parameters
  427 +- [ ] Transformers work with external field mapping
  428 +- [ ] Documentation clearly separates concerns
  429 +- [ ] Tests validate portability
  430 +- [ ] Migration guide provided
  431 +
  432 +## Estimated Effort
  433 +
  434 +- Configuration cleanup: 2 hours
  435 +- Transformer refactoring: 4-5 hours
  436 +- Script refactoring: 3-4 hours
  437 +- Config loader updates: 2 hours
  438 +- Documentation: 2-3 hours
  439 +- Testing & validation: 2-3 hours
  440 +- **Total: 15-19 hours**
  441 +
  442 +## Benefits
  443 +
  444 +✅ **Clean separation of concerns**
  445 +
  446 +✅ **Configuration reusability across data sources**
  447 +
  448 +✅ **Customers don't need to understand ETL**
  449 +
  450 +✅ **Easier to add new data sources**
  451 +
  452 +✅ **More flexible pipeline control**
  453 +
  454 +✅ **Reduced configuration complexity**
  455 +
  456 +### To-dos
  457 +
  458 +- [ ] Clean BASE and legacy configs: remove mysql_config, table names, source_table/source_column from fields
  459 +- [ ] Create BaseDataTransformer abstract class with shared logic (type conversion, embeddings, tenant_id)
  460 +- [ ] Refactor DataTransformer and SPUDataTransformer to inherit from base, accept field mapping as parameter
  461 +- [ ] Create TransformerFactory for creating transformers based on type parameter
  462 +- [ ] Create ingest_universal.py with full parameter control for data source, transformer, field mapping
  463 +- [ ] Update scripts/ingest_base.py to use parameters instead of config for data source
  464 +- [ ] Create field_mapping_generator.py and default mapping files (shoplazza_spu.json, etc.)
  465 +- [ ] Simplify ConfigLoader to only parse search config, remove data source parsing
  466 +- [ ] Create DATA_PIPELINE_GUIDE.md documenting pipeline approach and config separation
  467 +- [ ] Update BASE_CONFIG_GUIDE.md to reflect config-only-search-settings approach
  468 +- [ ] Create CONFIG_MIGRATION_GUIDE.md for migrating from old to new config format
  469 +- [ ] Test same config with different data sources and validate portability
0 470 \ No newline at end of file
.cursor/plans/spu-index-b5a93a00.plan.md renamed to .cursor/plans/所有tenant按同一份所有_返回接口优化.md
api/SearchEngine.code-workspace 0 → 100644
... ... @@ -0,0 +1,8 @@
  1 +{
  2 +  "folders": [
  3 +    {
  4 +      "path": ".."
  5 +    }
  6 +  ],
  7 +  "settings": {}
  8 +}
0 9 \ No newline at end of file
config/schema/base/config.yaml 0 → 100644
SHOPLAZZA_INTEGRATION_GUIDE.md renamed to docs/店匠相关资料/SHOPLAZZA_INTEGRATION_GUIDE.md (店匠相关资料: Shoplazza-related materials)
docs/店匠相关资料/店匠官方参考文档.md 0 → 100644 (Shoplazza official reference documentation)
... ... @@ -0,0 +1,13 @@
  1 +### 13.1 Official Documentation
  2 +
  3 +- [Shoplazza Developer Documentation](https://www.shoplazza.dev/reference/overview-29)
  4 +- [Shoplazza OAuth Documentation](https://www.shoplazza.dev/v2024.07/reference/authentication)
  5 +- [Shoplazza API Reference](https://www.shoplazza.dev/v2024.07/reference/overview)
  6 +- [Shoplazza Webhook Documentation](https://www.shoplazza.dev/v2024.07/reference/webhooks)
  7 +
  8 +### 13.2 Tech Stack Documentation
  9 +
  10 +- [OAuth 2.0 RFC 6749](https://tools.ietf.org/html/rfc6749)
  11 +- [Elasticsearch Official Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html)
  12 +- [Liquid Template Language](https://shopify.github.io/liquid/)
  13 +- [FastAPI Documentation](https://fastapi.tiangolo.com/)
docs/店匠相关资料/搜索web后端调用python搜索接口.md 0 → 100644 (how the search web backend calls the Python search API)
... ... @@ -0,0 +1,261 @@
  1 +Great question! Let me explain in detail how the search application's call chain relates to OAuth.
  2 +
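  3 +## Call Chain Overview
  4 +
  5 +### 1. **What OAuth Is For**
  6 +
  7 +OAuth is **not used for front-end search calls**; it is mainly used for:
  8 +
  9 +```mermaid
  10 +graph LR
  11 +    A[Merchant installs app] --> B[OAuth authorization]
  12 +    B --> C[Obtain access token]
  13 +    C --> D[Backend pulls product data]
  14 +    D --> E[Build ES index]
  15 +    E --> F[Search service ready]
  16 +```
  17 +
  18 +**What the OAuth token is used for:**
  19 +- ✅ Your backend calling the Shoplazza Admin API (pulling product, order, and customer data)
  20 +- ✅ Registering webhooks (receiving data-change notifications)
  21 +- ❌ **Not** for calls made when a shopper searches on the storefront
  22 +
  23 +### 2. **The Actual Front-End Search Call Chain**
  24 +
  25 +When a shopper searches for products in a store:
  26 +
  27 +```
  28 +Shopper's browser → search box component (Liquid/JS) → your search API → Elasticsearch → results
  29 +```
  30 +
  31 +**Key points:**
  32 +- Front-end JavaScript **calls your public search API directly**
  33 +- No OAuth token is required
  34 +- A `store_id` parameter must be passed to identify the store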
  35 +
  36 +### 3. **Two Options for the Search Endpoint**
  37 +
  39 +
  40 +## Detailed Answer
  41 +
  42 +### 📍 **Option A: Call Directly from the Front End (recommended for public search)**
  43 +
  44 +**Flow:**
  45 +
  46 +```javascript
  47 +// On the storefront page (in the shopper's browser)
  48 +const response = await fetch('https://your-domain.com/api/search/products', {
  49 +  method: 'POST',
  50 +  headers: {
  51 +    'Content-Type': 'application/json'
  52 +  },
  53 +  body: JSON.stringify({
  54 +    query: "蓝牙耳机", // "Bluetooth earphones"
  55 +    tenant: "tenant_47167113-1", // store identifier
  56 +    size: 24,
  57 +    filters: {},
  58 +    facets: ['product_type', 'vendor']
  59 +  })
  60 +});
  61 +```
  62 +
  63 +**Your search API needs to:**
  64 +
  65 +1. **Allow cross-origin requests (CORS)**:
  66 +```python
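  67 +from fastapi import FastAPI  # Python FastAPI example
  68 +from fastapi.middleware.cors import CORSMiddleware
  69 +app = FastAPI()
  70 +app.add_middleware(
  71 +    CORSMiddleware,
  72 +    allow_origins=["*"],  # or a whitelist of Shoplazza domains
  73 +    allow_credentials=False,  # public search sends no cookies; "*" cannot be combined with credentials
  74 +    allow_methods=["POST"],
  75 +    allow_headers=["*"],
  76 +)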
  77 +```
  78 +
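  79 +2. **Isolate data by store_id**:
  80 +```python
  81 +@app.post("/api/search/products")
  82 +async def search(request: SearchRequest):
  83 +    # Extract tenant_id from the tenant parameter (SearchRequest, extract_tenant_id and es_client are defined elsewhere)
  84 +    tenant_id = extract_tenant_id(request.tenant)
  85 +
  86 +    # Use the tenant's dedicated index
  87 +    index_name = f"shoplazza_products_{tenant_id}"
  88 +    # Minimal illustrative query; the real service builds a richer one
  89 +    query = {"query": {"match": {"title": request.query}}, "size": request.size}
  90 +    results = es_client.search(index=index_name, body=query)  # execute the search
  91 +    return results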
  92 +```
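`extract_tenant_id` is not defined in this document. A hypothetical version is sketched below, assuming tenant strings look like `tenant_<id>` as in the examples. Whitelisting characters matters here because the id is interpolated into an index name. Without it, a value such as `*` would turn `shoplazza_products_{tenant_id}` into a wildcard that matches every tenant's index. The pattern also restricts ids to lowercase, because Elasticsearch index names must be lowercase.

```python
import re

# Hypothetical format: "tenant_" followed by lowercase letters, digits, "-" or "_"
_TENANT_RE = re.compile(r"tenant_([a-z0-9_-]{1,64})")


def extract_tenant_id(tenant: str) -> str:
    """Turn "tenant_47167113-1" into "47167113-1", rejecting anything else."""
    match = _TENANT_RE.fullmatch(tenant or "")
    if match is None:
        raise ValueError(f"Invalid tenant identifier: {tenant!r}")
    return match.group(1)
```

In the FastAPI handler, a `ValueError` from this helper would typically be turned into an HTTP 400 response.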
  93 +
  94 +3. **No OAuth token authentication is required** (these are public queries)
  95 +
  96 +---
  97 +
  98 +### 📍 **Option B: Proxy Through the Java Backend (more secure)**
  99 +
  100 +**Flow:**
  101 +
  102 +```
  103 +Shopper's browser → Java backend (/api/search/products?storeId=xxx) → Python search service → ES
  104 +```
  105 +
  106 +**Java backend code:**
  107 +
  108 +```java
  109 +@RestController
  110 +@RequestMapping("/api/search")
  111 +public class SearchController {
  112 +    // shopConfigMapper, restTemplate and searchLogService are injected beans (not shown)
  113 +    @PostMapping("/products")
  114 +    public ResponseEntity<SearchResponse> search(
  115 +            @RequestParam String storeId, // store ID taken from the URL parameter
  116 +            @RequestBody SearchRequest request) {
  117 +
  118 +        // 1. Validate the store ID (optional: check a domain whitelist)
  119 +        ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
  120 +        if (shop == null) {
  121 +            return ResponseEntity.notFound().build();
  122 +        }
  123 +
  124 +        // 2. Add the tenant isolation parameter
  125 +        request.setTenant("tenant_" + shop.getTenantId());
  126 +
  127 +        // 3. Call the Python search service
  128 +        SearchResponse response = restTemplate.postForObject(
  129 +                "http://localhost:6002/search/",
  130 +                request,
  131 +                SearchResponse.class
  132 +        );
  133 +
  134 +        // 4. Log the search
  135 +        searchLogService.logSearch(shop.getId(), request.getQuery(), response.getTotal());
  136 +
  137 +        return ResponseEntity.ok(response);
  138 +    }
  139 +}
  140 +```
  141 +
  142 +**Front-end call (with store_id):**
  143 +
  144 +```javascript
  145 +const response = await fetch(
  146 +  `https://your-domain.com/api/search/products?storeId=${config.storeId}`,
  147 +  {
  148 +    method: 'POST',
  149 +    headers: { 'Content-Type': 'application/json' },
  150 +    body: JSON.stringify({
  151 +      query: "蓝牙耳机", // "Bluetooth earphones"
  152 +      size: 24,
  153 +      filters: {},
  154 +      facets: ['product_type', 'vendor']
  155 +    })
  156 +  }
  157 +);
  158 +```
  159 +
  160 +---
  161 +
  162 +## 🔐 Where OAuth Fits in the Overall System
  163 +
  164 +```mermaid
  165 +graph TB
  166 +    subgraph "1. Merchant installation (uses OAuth)"
  167 +        A[Merchant installs app] --> B[OAuth authorization]
  168 +        B --> C[Obtain access token]
  169 +        C --> D[Store token in database]
  170 +    end
  171 +
  172 +    subgraph "2. Data preparation (uses OAuth token)"
  173 +        D --> E[Scheduled job starts]
  174 +        E --> F[Call Shoplazza API with token]
  175 +        F --> G[Pull product and order data]
  176 +        G --> H[Build ES index]
  177 +    end
  178 +
  179 +    subgraph "3. Shopper search (no OAuth needed)"
  180 +        I[Shopper visits store] --> J[Enters search term]
  181 +        J --> K[Front-end JS calls search API directly]
  182 +        K --> L[Search ES index]
  183 +        L --> M[Return results]
  184 +    end
  185 +
  186 +    H -. after index is built .-> L
  187 +```
  188 +
  189 +**Key takeaways:**
  190 +- **OAuth token** = your backend ↔ Shoplazza Admin API (for pulling data)
  191 +- **Front-end search** = shopper's browser ↔ your search API (no OAuth needed)
  192 +
  193 +---
  194 +
  195 +## ✅ What You Need to Do
  196 +
  197 +### 1. **Search API Design**
  198 +
  199 +Your Python search service should already support the request below; the `tenant` field is essential because it isolates different stores:
  200 +
  201 +```http
  202 +POST http://your-domain:6002/search/
  203 +Content-Type: application/json
  204 +
  205 +{
  206 +  "query": "蓝牙耳机",
  207 +  "tenant": "tenant_1",
  208 +  "size": 20,
  209 +  "filters": {},
  210 +  "facets": ["product_type", "vendor"]
  211 +}
  212 +```
  213 +
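  214 +### 2. **CORS Configuration** (if the front end calls the API directly)
  215 +
  216 +Add the following to the Python FastAPI service:
  217 +
  218 +```python
  219 +from fastapi import FastAPI
  220 +from fastapi.middleware.cors import CORSMiddleware
  221 +app = FastAPI()
  222 +app.add_middleware(
  223 +    CORSMiddleware,
  224 +    # allow_origins does not expand wildcards, so match store subdomains with a regex
  225 +    allow_origin_regex=r"https://[a-z0-9-]+\.myshoplaza\.com",  # Shoplazza store domains
  226 +    allow_origins=["https://your-domain.com"],
  227 +    allow_methods=["POST", "GET"],
  228 +    allow_headers=["*"],
  229 +)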
  230 +```
  231 +
  232 +### 3. **Passing the Store Identifier**
  233 +
  234 +Get the store domain in the front-end Liquid template:
  235 +
  236 +```liquid
  237 +<script>
  238 +window.AI_SEARCH_CONFIG = {
  239 +  storeId: "{{ shop.domain }}", // injected automatically by Shoplazza
  240 +  apiEndpoint: "https://your-domain.com/api/search/products"
  241 +};
  242 +</script>
  243 +```
  244 +
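  245 +### 4. **Security Considerations**
  246 +
  247 +- ✅ Tenant isolation: query a different index depending on the `tenant` parameter
  248 +- ✅ Domain whitelist: only allow calls from Shoplazza domains
  249 +- ✅ Rate limiting: guard against abusive requests
  250 +- ❌ **No need** to validate an OAuth token on every search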
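For the rate-limiting item, a per-key token bucket is one common choice. The sketch below is in-memory and single-process; several API workers would need a shared store such as Redis instead. The class name and the idea of keying on client IP or tenant are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


class TokenBucketLimiter:
    """Per-key token bucket: `rate` requests/second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = _Bucket(tokens=self.capacity, updated=now)
        # Refill in proportion to elapsed time, capped at capacity
        bucket.tokens = min(self.capacity, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

A request handler would call `allow(client_ip)` before querying Elasticsearch and return HTTP 429 when it is `False`.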
  251 +
  252 +---
  253 +
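  254 +## 🎯 Summary
  255 +
  256 +1. **OAuth is only used for background data sync**, not for front-end search
  257 +2. **Front-end search calls your public API directly**, with data isolated via the `storeId` parameter
  258 +3. **Your search endpoint can be unauthenticated**, but it must enforce proper tenant isolation
  259 +4. **Recommended approach**: front end → your Java backend → Python search service (easier logging, analytics, and security control)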
  260 +
  261 +Any other questions?
0 262 \ No newline at end of file
docs/店匠相关资料/记录tenant和token-获取商品信息.md 0 → 100644 (recording tenants and tokens; fetching product information)
... ... @@ -0,0 +1,426 @@
  1 +Let me explain these tasks in detail and look at the related code:
  2 +
  4 +
  5 +## 📋 Detailed Task Explanations
  6 +
  7 +Let me walk through these development tasks one by one:
  8 +
  9 +---
  10 +
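  11 +## 1️⃣ **Migrating OAuth - Based on the Go Implementation**
  12 +
  13 +**Goal:** Port the existing Go OAuth implementation to the Java project
  14 +
  15 +**Specific work:**
  16 +```text
  17 +// Java endpoints to implement:
  18 +GET  /oauth/install?shop={shop_domain}   // handle the app installation request
  19 +GET  /oauth/callback?code=xxx&shop=xxx   // handle the OAuth authorization callback
  20 +POST /partner/oauth/token                // exchange for / refresh the access token
  21 +```
  22 +
  23 +**Go code functionality to reference:**
  24 +- Generate the authorization URL and redirect to it
  25 +- Handle the authorization callback
  26 +- Exchange the code for an access token
  27 +- Parse and store the token response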
  28 +
  29 +---
  30 +
  31 +## 2️⃣ **Storing the AccessToken - in shoplazza_shop_config**
  32 +
  33 +**Goal:** After OAuth succeeds, save the token information to the database
  34 +
  35 +**Diagram:**
  36 +```
  37 +Shoplazza platform             Search SaaS platform
  38 +------------------             --------------------
  39 +[App store]
  40 +     ↓
  41 +[Merchant installs app] -----→ OAuth authorization flow
  42 +     ↓                             ↓
  43 +[Merchant authorizes]   -----→ [Item 2] Create tenant + store tokens
  44 +                                   ↓
  45 +                               system_tenant (new)
  46 +                               shoplazza_shop_config (new)
  47 +                               stores AccessToken and RefreshToken
  48 +                                   ↓
  49 +                               [Item 3] Scheduled token refresh
  50 +```
  51 +
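  52 +**Token acquisition and usage flow**
  53 +
  54 +```mermaid
  55 +sequenceDiagram
  56 +    participant Merchant
  57 +    participant Shoplazza
  58 +    participant Backend as Your backend
  59 +    participant DB as Database
  60 +
  61 +    Note over Merchant,Backend: 1. OAuth authorization phase
  62 +    Merchant->>Shoplazza: Install app
  63 +    Shoplazza->>Backend: Redirect for authorization
  64 +    Merchant->>Shoplazza: Approve authorization
  65 +    Shoplazza->>Backend: Callback with code
  66 +    Backend->>Shoplazza: Exchange code for token
  67 +    Shoplazza->>Backend: Return access token
  68 +    Backend->>DB: Store in shoplazza_shop_config
  69 +
  70 +    Note over Backend,DB: 2. Webhook registration phase
  71 +    Backend->>DB: Read access token
  72 +    Backend->>Shoplazza: Register webhook (with access token)
  73 +    Shoplazza->>Backend: Webhook registered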
  74 +```
  75 +
  76 +**Core logic:**
  77 +```java
  78 +@Transactional
  79 +public void handleOAuthCallback(String storeId, String storeName, TokenResponse tokenResponse) {
  80 +    // 1. Check whether the tenant exists; create it if not
  81 +    Tenant tenant = tenantMapper.selectByStoreId(storeId);
  82 +    if (tenant == null) {
  83 +        tenant = new Tenant();
  84 +        tenant.setName(storeName);
  85 +        tenantMapper.insert(tenant); // 👈 create a new tenant
  86 +    }
  87 +
  88 +    // 2. Create or update the shop configuration
  89 +    ShopConfig shop = shopConfigMapper.selectByStoreId(storeId);
  90 +    if (shop == null) {
  91 +        shop = new ShopConfig();
  92 +        shop.setTenantId(tenant.getId());
  93 +        shop.setStoreId(storeId);
  94 +        shop.setStoreName(storeName);
  95 +    }
  96 +
  97 +    // 3. Save the token information
  98 +    shop.setAccessToken(tokenResponse.getAccessToken());   // 👈 stored
  99 +    shop.setRefreshToken(tokenResponse.getRefreshToken()); // 👈 stored
  100 +    shop.setTokenExpiresAt(tokenResponse.getExpiresAt());  // 👈 stored
  101 +    shop.setLocale(tokenResponse.getLocale());
  102 +    shop.setStatus("active");
  103 +
  104 +    shopConfigMapper.insertOrUpdate(shop);
  105 +}
  106 +```
  107 +
  108 +**Table:** `shoplazza_shop_config` (already designed in Chapter 4 of the document)
  109 +
  110 +### 📊 Token Table Relationships
  111 +
  112 +```text
  113 +-- Data stored in the shoplazza_shop_config table
  114 ++----------+----------------+----------------------------------------+
  115 +| store_id | store_name     | access_token                           |
  116 ++----------+----------------+----------------------------------------+
  117 +| 2286274  | 47167113-1     | V2WDYgkTvrN68QCESZ9eHb3EjpR6EB...      | 👈 saved during OAuth
  118 ++----------+----------------+----------------------------------------+
  119 +                  ↓
  120 +     read and used when registering webhooks
  121 +```
  122 +
  123 +### 🔐 Two Uses of the Token
  124 +
  125 +**This access token has two main uses in your system:**
  126 +
  127 +1. **Pulling data** - calling the Shoplazza Admin API
  128 +   - Pull products: `GET /openapi/2022-01/products`
  129 +   - Pull orders: `GET /openapi/2022-01/orders`
  130 +   - Pull customers: `GET /openapi/2022-01/customers`
  131 +
  132 +2. **Registering webhooks** - so Shoplazza proactively pushes data changes
  133 +   - Register: `POST /openapi/2022-01/webhooks` (token required)
  134 +   - Receive: Shoplazza pushes to your `/webhook/shoplazza/{storeId}` endpoint (no token needed)
  135 +
  136 +### ⚠️ Caveats
  137 +
  138 +```java
  139 +// Make sure the token is valid before registering webhooks
  140 +public void registerWebhooks(Long shopConfigId) {
  141 +    ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
  142 +
  143 +    // Check whether the token has expired
  144 +    if (shop.getTokenExpiresAt().before(new Date())) {
  145 +        // Token has expired: refresh it first
  146 +        tokenService.refreshToken(shop);
  147 +        shop = shopConfigMapper.selectById(shopConfigId); // re-read
  148 +    }
  149 +
  150 +    // Register webhooks with a valid token
  151 +    String accessToken = shop.getAccessToken();
  152 +    // ... registration logic
  153 +}
  154 +```
  155 +
  156 +---
  157 +
  158 +## 3️⃣ **Implementing RefreshToken - Scheduled Job That Handles Multiple Shops**
  159 +
  160 +**Goal:** Automatically refresh access tokens that are about to expire
  161 +
  162 +**Implementation:**
  163 +
  164 +```java
  165 +@Scheduled(cron = "0 0 2 * * ?") // runs every day at 2:00 AM
  166 +public void refreshExpiringTokens() {
  167 +    // 1. Find all shops whose tokens expire within 7 days
  168 +    DateTime sevenDaysLater = DateTime.now().plusDays(7);
  169 +    List<ShopConfig> shops = shopConfigMapper.selectExpiringTokens(sevenDaysLater);
  170 +
  171 +    // 2. Iterate over the shops and refresh each token
  172 +    for (ShopConfig shop : shops) {
  173 +        try {
  174 +            TokenResponse newToken = oauthClient.refreshToken(
  175 +                    shop.getRefreshToken(),
  176 +                    clientId,
  177 +                    clientSecret
  178 +            );
  179 +
  180 +            // 3. Update the token in the database
  181 +            shop.setAccessToken(newToken.getAccessToken());
  182 +            shop.setRefreshToken(newToken.getRefreshToken());
  183 +            shop.setTokenExpiresAt(newToken.getExpiresAt());
  184 +            shopConfigMapper.updateById(shop);
  185 +
  186 +            log.info("Token refreshed for shop: {}", shop.getStoreName());
  187 +        } catch (Exception e) {
  188 +            log.error("Failed to refresh token for shop: {}", shop.getStoreName(), e);
  189 +            // Send an alert notification
  190 +        }
  191 +    }
  192 +}
  193 +```
  194 +
  195 +**Key points:**
  196 +- ✅ Process multiple shops in one batch
  197 +- ✅ Refresh 7 days before expiry (so tokens never lapse)
  198 +- ✅ Exception handling and alerting
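Below, the same batch-refresh logic is sketched in Python to show the two properties listed above. First, only tokens inside the 7-day window are refreshed. Second, one shop's failure does not stop the rest. The `Shop` record and the `refresh` callback are hypothetical stand-ins for the Java mapper and OAuth client. The callback returns a new access token, refresh token and expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass
class Shop:
    store_name: str
    refresh_token: str
    access_token: str
    expires_at: datetime


def refresh_expiring_tokens(shops: List[Shop],
                            refresh: Callable[[str], Tuple[str, str, datetime]],
                            now: datetime,
                            window: timedelta = timedelta(days=7)) -> Tuple[List[str], List[str]]:
    """Refresh every token expiring within `window`; one failure never stops the batch."""
    refreshed, failed = [], []
    for shop in shops:
        if shop.expires_at > now + window:
            continue  # not due yet
        try:
            # Tuple assignment happens only after refresh() succeeds
            shop.access_token, shop.refresh_token, shop.expires_at = refresh(shop.refresh_token)
            refreshed.append(shop.store_name)
        except Exception:  # record and move on; alerting would hook in here
            failed.append(shop.store_name)
    return refreshed, failed
```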
  199 +
  200 +---
  201 +
  202 +## 4️⃣ **Optimizing Bulk Product Fetching - Verify Pagination**
  203 +
  204 +**Goal:** Complete the product data sync and make sure pagination is handled correctly
  205 +
  206 +**Current issue:** The code may fetch only the first page instead of iterating through all pages
  207 +
  208 +**To verify and optimize:**
  209 +
  210 +```java
  211 +public void syncProducts(Long shopConfigId) throws InterruptedException {
  212 +    ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
  213 +
  214 +    int page = 1;
  215 +    int limit = 50;
  216 +    boolean hasMore = true;
  217 +
  218 +    while (hasMore) { // 👈 key: loop until there is no more data
  219 +        // Call the Shoplazza API
  220 +        String url = String.format(
  221 +                "https://%s/openapi/2022-01/products?page=%d&limit=%d",
  222 +                shop.getStoreDomain(), page, limit
  223 +        );
  224 +
  225 +        ProductListResponse response = apiClient.get(url, shop.getAccessToken());
  226 +
  227 +        // Check whether there is more data
  228 +        if (response.getProducts() == null || response.getProducts().isEmpty()) {
  229 +            hasMore = false; // 👈 no more data, exit the loop
  230 +            break;
  231 +        }
  232 +
  233 +        // Save the products on the current page
  234 +        for (ProductDto product : response.getProducts()) {
  235 +            saveProduct(shop.getTenantId(), shop.getStoreId(), product);
  236 +        }
  237 +
  238 +        page++; // 👈 next page
  239 +        Thread.sleep(100); // avoid hitting the rate limit
  240 +    }
  241 +}
  242 +```
  243 +
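  244 +**Verification points:**
  245 +- ✅ Pagination parameters are passed correctly
  246 +- ✅ The loop termination condition is correct
  247 +- ✅ Empty pages are handled
  248 +- ✅ Rate limits are respected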
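The termination rules above can be captured once in a reusable generator that the product, customer and order syncs could all share. This is a Python sketch in which `fetch_page` is a hypothetical callback. The short-page early exit assumes the API returns exactly `limit` items on every page except the last. If that is not guaranteed, rely on the empty-page check alone.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_pages(fetch_page: Callable[[int, int], List[T]],
                  limit: int = 50,
                  delay_seconds: float = 0.1,
                  max_pages: int = 10_000) -> Iterator[T]:
    """Yield every item from a page/limit API, stopping at an empty or short page."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page, limit)
        if not items:
            return  # empty page: nothing left
        yield from items
        if len(items) < limit:
            return  # short page: it was the last one, so skip the extra request
        time.sleep(delay_seconds)  # stay under the API rate limit
    # Guard against an API that never returns an empty page
    raise RuntimeError(f"Stopped after {max_pages} pages; possible pagination loop")
```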
  249 +
  250 +---
  251 +
  252 +## 5️⃣ **Optimizing Bulk Customer Fetching - Verify Pagination**
  253 +
  254 +**Goal:** Complete the customer data sync, in the same way as the product sync
  255 +
  256 +**Implementation logic:**
  257 +```java
  258 +public void syncCustomers(Long shopConfigId) {
  259 +    // Same as syncProducts: iterate over every page
  260 +    String url = "https://{shop}/openapi/2022-01/customers?page={page}&limit=50";
  261 +
  262 +    // Fetch all pages in a loop
  263 +    // Save to the shoplazza_customer and shoplazza_customer_address tables
  264 +}
  265 +```
  266 +
  267 +---
  268 +
  269 +## 6️⃣ **Optimizing Bulk Order Fetching - Verify Pagination**
  270 +
  271 +**Goal:** Complete the order data sync
  272 +
  273 +**Implementation logic:**
  274 +```java
  275 +public void syncOrders(Long shopConfigId) {
  276 +    String url = "https://{shop}/openapi/2022-01/orders?page={page}&limit=50";
  277 +
  278 +    // Save to the shoplazza_order and shoplazza_order_item tables
  279 +}
  280 +```
  281 +
  282 +---
  283 +
  284 +## 7️⃣ **Fetching Shop Information - New Implementation with a New Database Table**
  285 +
  286 +**Goal:** Fetch the shop's detailed configuration
  287 +
  288 +**API call:**
  289 +```http
  290 +GET /openapi/2022-01/shop
  291 +```
  292 +
  293 +**Possible response fields:**
  294 +```json
  295 +{
  296 +  "id": "2286274",
  297 +  "name": "47167113-1",
  298 +  "domain": "47167113-1.myshoplaza.com",
  299 +  "email": "shop@example.com",
  300 +  "currency": "USD",
  301 +  "timezone": "Asia/Shanghai",
  302 +  "locale": "zh-CN",
  303 +  "address": {...},
  304 +  "phone": "+86 123456789"
  305 +}
  306 +```
  307 +
  308 +**Table to design:**
  309 +```sql
  310 +CREATE TABLE `shoplazza_shop_info` (
  311 +  `id` BIGINT NOT NULL AUTO_INCREMENT,
  312 +  `store_id` VARCHAR(64) NOT NULL,
  313 +  `shop_name` VARCHAR(255),
  314 +  `domain` VARCHAR(255),
  315 +  `email` VARCHAR(255),
  316 +  `currency` VARCHAR(16),
  317 +  `timezone` VARCHAR(64),
  318 +  `locale` VARCHAR(16),
  319 +  `phone` VARCHAR(64),
  320 +  `address` JSON,          -- full address information
  321 +  `plan_name` VARCHAR(64), -- subscription plan name
  322 +  `created_at` DATETIME,
  323 +  `updated_at` DATETIME,
  324 +  PRIMARY KEY (`id`),
  325 +  UNIQUE KEY `uk_store_id` (`store_id`)
  326 +) COMMENT='Shop details table';
  327 +```
  328 +
  329 +---
  330 +
  331 +## 8️⃣ **Registering Shop Webhooks - New Implementation Requiring Security Verification**
  332 +
  333 +**Goal:** Register webhooks for each shop to receive real-time data-change notifications
  334 +
  335 +**Implementation steps:**
  336 +
  337 +### A. Register Webhooks (called proactively by the backend)
  338 +
  339 +```java
  340 +@Service
  341 +public class WebhookService {
  342 +
  343 +    private static final List<String> WEBHOOK_TOPICS = Arrays.asList(
  344 +            "products/create", "products/update", "products/delete",
  345 +            "orders/create", "orders/updated", "customers/create"
  346 +    );
  347 +
  348 +    public void registerWebhooks(Long shopConfigId) {
  349 +        ShopConfig shop = shopConfigMapper.selectById(shopConfigId);
  350 +        String webhookUrl = "https://your-domain.com/webhook/shoplazza/" + shop.getStoreId();
  351 +
  352 +        for (String topic : WEBHOOK_TOPICS) {
  353 +            // Call the Shoplazza API to register the webhook
  354 +            apiClient.post(
  355 +                    "https://" + shop.getStoreDomain() + "/openapi/2022-01/webhooks",
  356 +                    shop.getAccessToken(),
  357 +                    Map.of("address", webhookUrl, "topic", topic)
  358 +            );
  359 +        }
  360 +    }
  361 +}
  362 +```
  363 +
  364 +### B. Receive Webhooks (pushed by Shoplazza)
  365 +
  366 +```java
  367 +@RestController
  368 +@RequestMapping("/webhook/shoplazza")
  369 +public class WebhookController {
  370 +    // clientSecret (the app's Client Secret) and webhookService are injected (not shown)
  371 +    @PostMapping("/{storeId}")
  372 +    public ResponseEntity<String> handleWebhook(
  373 +            @PathVariable String storeId,
  374 +            @RequestHeader("X-Shoplazza-Hmac-Sha256") String signature, // 👈 security verification
  375 +            @RequestHeader("X-Shoplazza-Topic") String topic,
  376 +            @RequestBody String payload) throws GeneralSecurityException {
  377 +
  378 +        // 1. Verify the signature (security check)
  379 +        if (!verifySignature(payload, signature, clientSecret)) {
  380 +            return ResponseEntity.status(401).body("Invalid signature");
  381 +        }
  382 +
  383 +        // 2. Process the event asynchronously
  384 +        webhookService.processAsync(storeId, topic, payload);
  385 +
  386 +        // 3. Return 200 immediately (Shoplazza requires a response within 3 seconds)
  387 +        return ResponseEntity.ok("OK");
  388 +    }
  389 +
  390 +    // HMAC-SHA256 signature verification with a constant-time comparison
  391 +    private boolean verifySignature(String payload, String signature, String secret) throws GeneralSecurityException {
  392 +        Mac mac = Mac.getInstance("HmacSHA256");
  393 +        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
  394 +        byte[] hash = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
  395 +        String computed = Base64.getEncoder().encodeToString(hash);
  396 +        return MessageDigest.isEqual(computed.getBytes(StandardCharsets.UTF_8), signature.getBytes(StandardCharsets.UTF_8));
  397 +    }
  398 +}
  399 +```
  400 +
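  401 +**Security verification key points:**
  402 +- ✅ Verify signatures with HMAC-SHA256
  403 +- ✅ Use the app's Client Secret as the signing key
  404 +- ✅ Respond within 3 seconds
  405 +- ✅ Process events asynchronously to avoid timeouts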
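The same check can be done in Python. This version assumes, as the Java example does, that the header carries the base64-encoded HMAC-SHA256 of the raw request body, keyed with the Client Secret. Confirm the exact encoding in Shoplazza's webhook documentation. Compute the signature over the raw body bytes before any JSON parsing, because re-serializing can change whitespace or key order.

```python
import base64
import hashlib
import hmac


def verify_webhook_signature(payload: bytes, signature: str, client_secret: str) -> bool:
    """Check a base64-encoded HMAC-SHA256 of the raw request body."""
    digest = hmac.new(client_secret.encode("utf-8"), payload, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # compare_digest avoids leaking, via response time, how much of the signature matched
    return hmac.compare_digest(expected, (signature or "").encode("utf-8"))
```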
  406 +
  407 +---
  408 +
  409 +
  410 +
  411 +
  412 +
  413 +
  414 +
  415 +
  416 +
  417 +
  418 +
  419 +
  420 +
  421 +
  422 +
  423 +
  424 +
  425 +
  426 +