`.cursor/plans/所有tenant按同一份所有_返回接口优化.md` (commit `13377199` by tangwang, "接口优化" / "API optimization")

  <!-- b5a93a00-49d7-4266-8dbf-3d3f708334ed 23831e56-f1c5-48ab-8ed5-b11125ad0cf9 -->
  # SPU-Level Indexing with Shoplazza API Format & BASE Configuration
  
  ## Phase 1: Schema Analysis & Design
  
  ### 1.1 Analyze SPU/SKU Fields for Indexing
  
  **SPU Fields** (from `shoplazza_product_spu`):
  
  - **Index + Store**: `id`, `shop_id`, `handle`, `title`, `brief`, `vendor`, `product_type`, `tags`, `category`, `image_src`, `published`, `published_at`
  - **Index only**: `seo_title`, `seo_description`, `seo_keywords`
  - **Store only**: `description` (HTML, no tokenization)
  
  **SKU Fields** (from `shoplazza_product_sku`, as nested array):
  
  - **Index + Store**: `id`, `title`, `sku`, `barcode`, `price`, `compare_at_price`, `option1`, `option2`, `option3`, `inventory_quantity`, `image_src`
  
  **Price Strategy (CONFIRMED - Option B)**:
  
  - Flatten `min_price`, `max_price`, `compare_at_price` at SPU level for fast filtering/sorting
  - Keep full variant prices in nested array for display
  
  ### 1.2 Design BASE Schema
  
  **Index**: `search_products` (shared by all tenants)
  
  **Key fields**:
  
  - `tenant_id` (KEYWORD, **REQUIRED**) - always filtered, never optional
  - SPU-level flattened fields
  - `variants` (NESTED array) - SKU data
  - Flattened: `min_price`, `max_price`, `compare_at_price`
  - Multi-language: `title_zh`, `title_en`, `title_ru`, etc.
  - Embeddings: `title_embedding`, `image_embedding`
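
A minimal sketch of how the key fields above could translate into an Elasticsearch mapping, written as a Python dict. The per-field types, `EMBEDDING_DIMS`, and the `VARIANT_PROPERTIES` name are assumptions for illustration, not values from `config/schema/base/config.yaml`. The "Index only" SEO fields would also need `_source` `excludes`, which is omitted here.

```python
# Sketch of the `search_products` mapping. Field types and EMBEDDING_DIMS are
# assumptions, not values taken from the project's BASE config.
EMBEDDING_DIMS = 1024  # hypothetical: must equal the embedding model's output size

VARIANT_PROPERTIES = {
    "id": {"type": "keyword"},
    "title": {"type": "text"},
    "sku": {"type": "keyword"},
    "barcode": {"type": "keyword"},
    "price": {"type": "float"},
    "compare_at_price": {"type": "float"},
    "option1": {"type": "keyword"},
    "option2": {"type": "keyword"},
    "option3": {"type": "keyword"},
    "inventory_quantity": {"type": "integer"},
    "image_src": {"type": "keyword", "index": False},
}

SEARCH_PRODUCTS_MAPPING = {
    "properties": {
        # Tenant isolation key: keyword, so it supports exact `term` filters.
        "tenant_id": {"type": "keyword"},
        # SPU-level fields
        "id": {"type": "keyword"},
        "shop_id": {"type": "keyword"},
        "handle": {"type": "keyword"},
        "title_zh": {"type": "text"},
        "title_en": {"type": "text"},
        "title_ru": {"type": "text"},
        "brief": {"type": "text"},
        "vendor": {"type": "keyword"},
        "product_type": {"type": "keyword"},
        "tags": {"type": "keyword"},
        "category": {"type": "keyword"},
        "image_src": {"type": "keyword", "index": False},
        "published": {"type": "boolean"},
        "published_at": {"type": "date"},
        "seo_title": {"type": "text"},
        "seo_description": {"type": "text"},
        # Store only: kept in _source but not searchable.
        "description": {"type": "text", "index": False},
        # Flattened price fields for fast filtering and sorting
        "min_price": {"type": "float"},
        "max_price": {"type": "float"},
        "compare_at_price": {"type": "float"},
        # Embeddings
        "title_embedding": {"type": "dense_vector", "dims": EMBEDDING_DIMS},
        "image_embedding": {"type": "dense_vector", "dims": EMBEDDING_DIMS},
        # SKU data: each array element is indexed as a hidden nested document.
        "variants": {"type": "nested", "properties": VARIANT_PROPERTIES},
    }
}
```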
  
  ## Phase 2: BASE Configuration (Universal Standard)
  
  ### 2.1 Create BASE Config
  
  **File**: [`config/schema/base/config.yaml`](config/schema/base/config.yaml)
  
  **This is the universal configuration for ALL merchants using Shoplazza tables.**
  
  Key points:
  
  - Index name: `search_products` (shared)
  - Required field: `tenant_id` (always filtered in queries)
  - SPU-level fields with multi-language support
  - Nested variants structure
  - Flattened price fields: `min_price`, `max_price`, `compare_at_price`
  - `function_score` configuration
  
  ### 2.2 Update Field Types & Mapping
  
  **Files**: [`config/field_types.py`](config/field_types.py), [`indexer/mapping_generator.py`](indexer/mapping_generator.py)
  
  - Add `NESTED` field type
  - Handle nested mapping generation for variants
  - Auto-generate flattened price fields
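
Here is one possible shape for the `NESTED` support. It assumes hypothetical `FieldType` and `FieldSpec` names, and the real `config/field_types.py` may differ. Nested fields recurse into their children. The three flattened price fields are added automatically whenever a `variants` field is present.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class FieldType(Enum):
    KEYWORD = "keyword"
    TEXT = "text"
    FLOAT = "float"
    INTEGER = "integer"
    NESTED = "nested"  # array of objects, each indexed as a hidden sub-document


@dataclass
class FieldSpec:
    name: str
    type: FieldType
    children: List["FieldSpec"] = field(default_factory=list)  # used only by NESTED


def field_mapping(spec: FieldSpec) -> Dict[str, Any]:
    """Build the mapping for one field, recursing into nested children."""
    if spec.type is FieldType.NESTED:
        return {
            "type": "nested",
            "properties": {child.name: field_mapping(child) for child in spec.children},
        }
    return {"type": spec.type.value}


PRICE_FIELDS = ("min_price", "max_price", "compare_at_price")


def generate_mapping(specs: List[FieldSpec]) -> Dict[str, Any]:
    """Build an index-creation body; auto-add flattened prices next to `variants`."""
    props = {spec.name: field_mapping(spec) for spec in specs}
    if "variants" in props:
        for name in PRICE_FIELDS:
            props.setdefault(name, {"type": "float"})
    return {"mappings": {"properties": props}}
```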
  
  ## Phase 3: Data Ingestion for BASE
  
  ### 3.1 SPU-Level Data Transformer
  
  **File**: [`indexer/spu_data_transformer.py`](indexer/spu_data_transformer.py)
  
  Features:
  
  - Load SPU from `shoplazza_product_spu`
  - Join SKU from `shoplazza_product_sku` (grouped by spu_id)
  - Create nested variants array
  - Calculate `min_price`, `max_price`, `compare_at_price`
  - Generate title & image embeddings
  - Inject `tenant_id` from config
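
The transformer steps can be sketched as a pure function over rows that have already been fetched. The sketch makes these assumptions:

- Rows arrive as dicts keyed by column name.
- The function name `build_spu_documents` is hypothetical.
- The SPU-level `compare_at_price` is the highest variant value.
- Embedding generation is left out.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional

VARIANT_COLUMNS = ("id", "title", "sku", "barcode", "option1", "option2", "option3", "image_src")
SPU_COLUMNS = ("id", "shop_id", "handle", "title", "vendor", "product_type", "tags", "category")


def _to_float(value: Any) -> Optional[float]:
    # MySQL DECIMAL columns arrive as Decimal (or str in fixtures); normalize to float.
    return None if value is None else float(value)


def build_spu_documents(
    spus: Iterable[Dict[str, Any]],
    skus: Iterable[Dict[str, Any]],
    tenant_id: str,
) -> List[Dict[str, Any]]:
    """Join SKU rows onto their SPU and compute the flattened price fields."""
    # Group SKU rows by parent SPU (the `spu_id` column).
    skus_by_spu: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for sku in skus:
        skus_by_spu[sku["spu_id"]].append(sku)

    docs: List[Dict[str, Any]] = []
    for spu in spus:
        variants = []
        for sku in skus_by_spu.get(spu["id"], []):
            variant = {col: sku.get(col) for col in VARIANT_COLUMNS}
            variant["price"] = float(sku["price"])
            variant["compare_at_price"] = _to_float(sku.get("compare_at_price"))
            variant["inventory_quantity"] = int(sku.get("inventory_quantity") or 0)
            variants.append(variant)

        prices = [v["price"] for v in variants]
        compare_prices = [v["compare_at_price"] for v in variants if v["compare_at_price"] is not None]
        doc = {col: spu.get(col) for col in SPU_COLUMNS}
        doc.update(
            tenant_id=tenant_id,  # injected from config, never trusted from the row
            variants=variants,
            min_price=min(prices) if prices else None,
            max_price=max(prices) if prices else None,
            # Assumption: the SPU-level compare-at price is the highest variant value.
            compare_at_price=max(compare_prices) if compare_prices else None,
        )
        docs.append(doc)
    return docs
```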
  
  ### 3.2 Test Data Generator
  
  **File**: [`scripts/generate_shoplazza_test_data.py`](scripts/generate_shoplazza_test_data.py)
  
  Generate 100 SPU records with:
  
  - 10 categories, multiple vendors
  - Multi-language (zh/en/ru)
  - Price range: $5-$500
  - 1-5 variants per SPU (color, size options)
  - Insert into MySQL Shoplazza tables
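
A seeded generator keeps test runs reproducible. This sketch only builds the SPU and SKU rows in memory and leaves out the MySQL insert step. The function name, the category and vendor names, and the `title_zh`/`title_ru` columns are hypothetical.

```python
import random
from itertools import product
from typing import Any, Dict, List, Tuple

CATEGORIES = [f"category_{i}" for i in range(1, 11)]  # 10 placeholder categories
VENDORS = ["vendor_a", "vendor_b", "vendor_c", "vendor_d"]
COLOR_SIZE = list(product(["Red", "Blue", "Black"], ["S", "M", "L", "XL"]))  # 12 combos


def generate_rows(n_spu: int = 100, seed: int = 42) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (spu_rows, sku_rows) with 1-5 variants per SPU priced $5-$500."""
    rng = random.Random(seed)  # seeded so repeated runs produce identical data
    spus: List[Dict[str, Any]] = []
    skus: List[Dict[str, Any]] = []
    next_sku_id = 1
    for spu_id in range(1, n_spu + 1):
        spus.append({
            "id": spu_id,
            "title": f"Product {spu_id}",
            "title_zh": f"商品 {spu_id}",
            "title_ru": f"Товар {spu_id}",
            "category": rng.choice(CATEGORIES),
            "vendor": rng.choice(VENDORS),
        })
        base_price = rng.uniform(5, 500)
        # Pick 1-5 distinct color/size combinations so variants are not duplicated.
        for color, size in rng.sample(COLOR_SIZE, rng.randint(1, 5)):
            price = round(min(500.0, max(5.0, base_price * rng.uniform(0.9, 1.1))), 2)
            skus.append({
                "id": next_sku_id,
                "spu_id": spu_id,
                "option1": color,
                "option2": size,
                "price": price,
                "inventory_quantity": rng.randint(0, 100),
            })
            next_sku_id += 1
    return spus, skus
```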
  
  ### 3.3 BASE Ingestion Script
  
  **File**: [`scripts/ingest_base.py`](scripts/ingest_base.py)
  
  - Load from MySQL `shoplazza_product_spu` + `shoplazza_product_sku`
  - Use `SPUDataTransformer`
  - Index into `search_products` with configured `tenant_id`
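
Here is a sketch of turning transformed documents into bulk index actions for the shared index. The action keys (`_op_type`, `_index`, `_id`, `_source`) follow the format accepted by `elasticsearch.helpers.bulk`. Prefixing `_id` with the tenant is an assumption made here, so that SPU ids from different tenants cannot overwrite each other in `search_products`.

```python
from typing import Any, Dict, Iterable, Iterator

INDEX_NAME = "search_products"


def bulk_actions(docs: Iterable[Dict[str, Any]], tenant_id: str) -> Iterator[Dict[str, Any]]:
    """Yield bulk index actions, refusing documents tagged with another tenant."""
    for doc in docs:
        if doc.get("tenant_id") != tenant_id:
            raise ValueError(f"document {doc.get('id')!r} is not tagged with tenant {tenant_id!r}")
        yield {
            "_op_type": "index",
            "_index": INDEX_NAME,
            # Assumption: tenant-prefixed ids keep SPU ids unique across tenants.
            "_id": f"{tenant_id}_{doc['id']}",
            "_source": doc,
        }
```

With the official Python client this would typically be consumed as `helpers.bulk(es, bulk_actions(docs, tenant_id))`.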
  
  ## Phase 4: Query Updates
  
  ### 4.1 Query Builder Enhancements
  
  **File**: [`search/multilang_query_builder.py`](search/multilang_query_builder.py)
  
  - **Auto-inject `tenant_id` filter** (from config, always applied)
  - Support nested queries for variants
  - Use flattened price fields for filters: `min_price`, `max_price`
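
Here is a sketch of the query body the builder could emit. It assumes a hypothetical `build_search_query` helper and a single `option1` variant filter. The `tenant_id` term always goes into `bool.filter`, which does not affect scoring. For prices, the sketch uses overlap semantics on the flattened fields: an SPU matches when its `[min_price, max_price]` span intersects the requested range. That is a design choice made here; the plan does not specify it.

```python
from typing import Any, Dict, List, Optional

TITLE_FIELDS = ["title_zh", "title_en", "title_ru"]


def build_search_query(
    tenant_id: str,
    query: str,
    price_min: Optional[float] = None,
    price_max: Optional[float] = None,
    variant_option: Optional[str] = None,
    size: int = 20,
) -> Dict[str, Any]:
    """Build a search body that is always scoped to one tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    filters: List[Dict[str, Any]] = [{"term": {"tenant_id": tenant_id}}]
    # Overlap semantics on flattened fields (no nested join needed).
    if price_min is not None:
        filters.append({"range": {"max_price": {"gte": price_min}}})
    if price_max is not None:
        filters.append({"range": {"min_price": {"lte": price_max}}})

    must: List[Dict[str, Any]] = [{"multi_match": {"query": query, "fields": TITLE_FIELDS}}]
    if variant_option:
        must.append({
            "nested": {
                "path": "variants",
                "query": {"term": {"variants.option1": variant_option}},
                # Return up to 10 matching variants per SPU (the default is 3).
                "inner_hits": {"size": 10},
            }
        })
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}
```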
  
  ### 4.2 Searcher Updates
  
  **File**: [`search/searcher.py`](search/searcher.py)
  
  - Enforce `tenant_id` filtering
  - Handle nested inner_hits for variants
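
When a `nested` clause sets `inner_hits`, Elasticsearch returns the matching variants under `hit["inner_hits"]["variants"]["hits"]["hits"]`, each with its own `_source`. Below is a minimal extractor with a hypothetical name. It falls back to all variants when no nested clause was used. By default `inner_hits` returns at most 3 matches per hit, so the query should set `size` if more variants are needed.

```python
from typing import Any, Dict, List


def matched_variants(hit: Dict[str, Any], path: str = "variants") -> List[Dict[str, Any]]:
    """Return the variants that matched the nested clause, else all variants."""
    inner = hit.get("inner_hits", {}).get(path)
    if inner is None:
        # No nested clause (or no inner_hits requested): show every variant.
        return list(hit.get("_source", {}).get(path, []))
    return [inner_hit["_source"] for inner_hit in inner["hits"]["hits"]]
```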
  
  ## Phase 5: API Response Transformation
  
  ### 5.1 Response Transformer
  
  **File**: [`api/response_transformer.py`](api/response_transformer.py)
  
  Transform ES response to Shoplazza format:
  
  - Extract variants from nested array
  - Map fields: `product_id`, `title`, `handle`, `vendor`, `product_type`, `tags`, `price`, `variants`, etc.
  - Calculate `in_stock` from variants
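
Here is a sketch of the hit-to-result mapping. It assumes the document layout produced by the transformer and uses hypothetical output keys (`variant_id`, `options`). Two choices are assumptions:

- The headline `price` is `min_price`.
- Options are named after their columns (`option1`, `option2`, `option3`). Real option names such as "Color" would come from data this plan does not list.

```python
from typing import Any, Dict, List

OPTION_COLUMNS = ("option1", "option2", "option3")


def _variant_result(v: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "variant_id": v.get("id"),
        "title": v.get("title"),
        "sku": v.get("sku"),
        "price": v.get("price"),
        "compare_at_price": v.get("compare_at_price"),
        "inventory_quantity": v.get("inventory_quantity") or 0,
        # Assumption: option names are the column names; only non-empty values are kept.
        "options": [{"name": col, "value": v[col]} for col in OPTION_COLUMNS if v.get(col)],
    }


def to_product_result(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one ES hit into a Shoplazza-style product result dict."""
    src = hit["_source"]
    variants: List[Dict[str, Any]] = [_variant_result(v) for v in src.get("variants") or []]
    return {
        "product_id": src["id"],
        "title": src.get("title"),
        "handle": src.get("handle"),
        "vendor": src.get("vendor"),
        "product_type": src.get("product_type"),
        "tags": src.get("tags") or [],
        # Assumption: the headline price is the cheapest variant (flattened min_price).
        "price": src.get("min_price"),
        "compare_at_price": src.get("compare_at_price"),
        "variants": variants,
        # In stock if any variant has positive inventory.
        "in_stock": any(v["inventory_quantity"] > 0 for v in variants),
    }
```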
  
  ### 5.2 Update API Models
  
  **File**: [`api/models.py`](api/models.py)
  
  New models:
  
  - `VariantOption`, `ProductVariant`, `ProductResult`
  - Updated `SearchResponse` with `results: List[ProductResult]`
  - Add placeholders: `suggestions: List[str] = []`, `related_searches: List[str] = []`
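
The model shapes can be sketched with standard-library dataclasses so the example runs without dependencies. The real `api/models.py` models would subclass pydantic `BaseModel`. Field names beyond those listed above (such as `total` and `variant_id`) are hypothetical. Pydantic copies a `= []` default for each instance, but dataclasses reject mutable defaults outright, which is why this sketch uses `field(default_factory=list)`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantOption:
    name: str  # e.g. "option1", or a display name such as "Color"
    value: str


@dataclass
class ProductVariant:
    variant_id: str
    price: float
    title: Optional[str] = None
    sku: Optional[str] = None
    compare_at_price: Optional[float] = None
    inventory_quantity: int = 0
    options: List[VariantOption] = field(default_factory=list)


@dataclass
class ProductResult:
    product_id: str
    title: Optional[str] = None
    handle: Optional[str] = None
    vendor: Optional[str] = None
    product_type: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    price: Optional[float] = None
    compare_at_price: Optional[float] = None
    in_stock: bool = False
    variants: List[ProductVariant] = field(default_factory=list)


@dataclass
class SearchResponse:
    results: List[ProductResult]
    total: int = 0  # hypothetical field
    # Placeholders: always present in the response, empty until implemented.
    suggestions: List[str] = field(default_factory=list)
    related_searches: List[str] = field(default_factory=list)
```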
  
  ### 5.3 Update Search Routes
  
  **File**: [`api/routes/search.py`](api/routes/search.py)
  
  - Use `ResponseTransformer` to convert ES hits
  - Return new Shoplazza-compatible format
  - **Ensure `tenant_id` is required in request**
  
  ## Phase 6: Legacy Migration
  
  ### 6.1 Rename Tenant1 to Legacy
  
  - Rename [`config/schema/tenant1/`](config/schema/tenant1/) to [`config/schema/tenant1_legacy/`](config/schema/tenant1_legacy/)
  - Update config to use old index `search_tenant1` (preserve for backward compatibility)
  - Mark as deprecated in comments
  
  ### 6.2 Update Scripts for BASE
  
  - [`run.sh`](run.sh): Use BASE config, `search_products` index
  - [`restart.sh`](restart.sh): Use BASE config
  - [`test_all.sh`](test_all.sh): Test BASE config
  - Legacy scripts: Rename with `_legacy` suffix (e.g., `run_legacy.sh`)
  
  ### 6.3 Update Frontend
  
  **Files**: [`frontend/`](frontend/) HTML/JS files
  
  - Change index name references from `search_tenant1` to `search_products`
  - Use BASE config endpoints
  - Archive old frontend as `frontend_legacy/` if needed
  
  ## Phase 7: API Documentation Updates
  
  ### 7.1 Update API Docs
  
  **File**: [`API_DOCUMENTATION.md`](API_DOCUMENTATION.md)
  
  **Critical additions**:
  
  - **Document `tenant_id` as REQUIRED parameter** in all search requests
  - Explain that `tenant_id` filter is always applied
  - Update all response examples to new Shoplazza format
  - Document `suggestions` and `related_searches` (not yet implemented)
  - Add nested variant query examples
  - Multi-tenant isolation guarantees
  
  ### 7.2 Update Request Models
  
  **File**: [`api/models.py`](api/models.py)
  
  Add `tenant_id` to `SearchRequest`:
  
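  ```python
  from pydantic import BaseModel, Field

  class SearchRequest(BaseModel):
      # Required and non-empty: every search is scoped to exactly one tenant.
      tenant_id: str = Field(..., min_length=1, description="Tenant ID (required)")
      query: str = Field(...)
      # ... other fields
  ```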
  
  ## Phase 8: Design Documentation
  
  ### 8.1 Update Design Doc
  
  **File**: [`设计文档.md`](设计文档.md) (design document)
  
  Updates:
  
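  - **Index granularity**: switch to the SPU level (not SKU)
  - **Unified index**: all tenants share `search_products`, isolated by `tenant_id`
  - **BASE configuration**: explain that BASE is the universal standard used by all new merchants
  - **API response format**: adopt the Shoplazza standard format
  - **Price flattening**: explain the performance-optimization strategy for high-frequency fields
  - **Nested variants**: describe the structure of the `variants` array in detail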
  - **Legacy configuration**: tenant1 and similar configs are legacy, kept only for compatibility
  
  ### 8.2 Create BASE Guide
  
  **File**: [`docs/BASE_CONFIG_GUIDE.md`](docs/BASE_CONFIG_GUIDE.md)
  
  Contents:
  
  - BASE configuration overview
  - How to generate test data
  - How to run ingestion for new tenant
  - Search examples
  - Response format examples
  - Multi-tenant isolation
  
  ### 8.3 Create Migration Guide
  
  **File**: [`docs/MIGRATION_TO_BASE.md`](docs/MIGRATION_TO_BASE.md)
  
  - Breaking changes from SKU-level to SPU-level
  - Response format changes
  - How existing deployments should migrate
  - Legacy config deprecation timeline
  
  ## Phase 9: Testing
  
  ### 9.1 Create Test Script
  
  **File**: [`scripts/test_base.sh`](scripts/test_base.sh)
  
  Steps:
  
  1. Generate 100 test SPU records
  2. Run BASE ingestion with `tenant_id="test_tenant"`
  3. Run searches, verify response format
  4. Test faceted search
  5. Verify multi-tenant isolation
  6. Verify `tenant_id` filtering
  
  ### 9.2 Integration Tests
  
  **File**: [`tests/test_base_integration.py`](tests/test_base_integration.py)
  
  - Test SPU-level indexing
  - Test nested variant queries
  - Test price filtering with flattened fields
  - Test tenant_id isolation
  - Test response transformation
  
  ## Key Architectural Decisions
  
  ### BASE Configuration Philosophy
  
  **BASE = Universal Standard**: All new merchants use BASE config with Shoplazza tables. No per-tenant schema customization. Customization happens through:
  
  - Configuration parameters (analyzers, function_score, etc.)
  - Extension tables (if needed for additional fields)
  - NOT through separate schemas
  
  ### Tenant Isolation
  
  **tenant_id is SACRED**:
  
  - Always present in queries (enforced at query builder)
  - Never optional
  - Guarantees data isolation between tenants
  - Documented prominently in API docs
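
One way to make the filter impossible to forget is to wrap whatever query a caller built. The tenant `term` filter then sits outside the caller's query, so no `should` or `must_not` clause inside it can bypass the filter. Here is a sketch with a hypothetical `enforce_tenant_filter` helper:

```python
import copy
from typing import Any, Dict


def enforce_tenant_filter(body: Dict[str, Any], tenant_id: str) -> Dict[str, Any]:
    """Return a copy of `body` whose query is AND-ed with a tenant_id term filter."""
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise ValueError("tenant_id is required for every search")
    body = copy.deepcopy(body)  # never mutate the caller's request body
    original_query = body.get("query", {"match_all": {}})
    body["query"] = {
        "bool": {
            "must": [original_query],
            # Filter context: restricts matches without changing scores.
            "filter": [{"term": {"tenant_id": tenant_id}}],
        }
    }
    return body
```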
  
  ### Price Flattening Rationale
  
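  Price filtering and sorting are high-frequency operations and need to be fast. Querying prices inside `variants` requires nested queries, which add overhead because each nested object is indexed as a separate hidden document that must be joined back to its parent SPU. Solution: duplicate price data at the SPU level (`min_price`, `max_price`, `compare_at_price`) while keeping full variant details in the nested array.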
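
The two approaches answer different questions. The flattened fields answer "does this SPU's price span overlap the range?", while a nested query answers "does any single variant fall in the range?". The sketch below shows the nested alternatives (filter and sort) alongside two hypothetical local evaluators that mirror each behavior. An SPU with variants at $5 and $500 passes the flattened check for $100-$200 even though no variant falls in that range.

```python
from typing import Any, Dict, List

# Flattened sort: a plain top-level field, no nested context needed.
FLATTENED_PRICE_SORT: List[Dict[str, Any]] = [{"min_price": {"order": "asc"}}]

# Nested sort: must name the nested path and how to reduce many variants to one value.
NESTED_PRICE_SORT: List[Dict[str, Any]] = [
    {"variants.price": {"order": "asc", "mode": "min", "nested": {"path": "variants"}}}
]


def nested_price_filter(lo: float, hi: float) -> Dict[str, Any]:
    """Exact filter: at least one variant priced in [lo, hi] (needs a nested join)."""
    return {
        "nested": {
            "path": "variants",
            "query": {"range": {"variants.price": {"gte": lo, "lte": hi}}},
        }
    }


def matches_flattened(doc: Dict[str, Any], lo: float, hi: float) -> bool:
    """Local mirror of the flattened overlap check: [min_price, max_price] meets [lo, hi]."""
    return doc["max_price"] >= lo and doc["min_price"] <= hi


def matches_nested(doc: Dict[str, Any], lo: float, hi: float) -> bool:
    """Local mirror of nested_price_filter: some variant price lies in [lo, hi]."""
    return any(lo <= v["price"] <= hi for v in doc["variants"])
```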
  
  ### Legacy vs BASE
  
  - **BASE**: New standard, all future merchants
  - **Legacy (tenant1_legacy)**: Deprecated, exists only for backward compatibility
  - All scripts/frontend default to BASE
  - Legacy access requires explicit suffix (`_legacy`)
  
  ### To-dos
  
  - [ ] Analyze SPU/SKU fields and design unified schema with tenant_id, nested variants, and flattened price fields
  - [ ] Create unified schema config for multi-tenant SPU-level indexing
  - [ ] Add NESTED field type support to field_types.py and mapping generator
  - [ ] Create SPU-level data transformer that joins SPU+SKU tables and creates nested variant array
  - [ ] Create script to generate 100 realistic SPU+SKU test records in Shoplazza tables
  - [ ] Create BASE configuration using the unified schema and Shoplazza tables only
  - [ ] Create BASE ingestion script that loads from MySQL Shoplazza tables
  - [ ] Update query builder to support tenant_id filtering and nested variant queries
  - [ ] Create response transformer to convert ES format to Shoplazza-compatible format
  - [ ] Update API models with new Shoplazza response format (ProductResult, variants, suggestions, etc.)
  - [ ] Update search routes to use response transformer and return new format
  - [ ] Migrate tenant1 configuration to unified schema and SPU-level indexing
  - [ ] Create BASE guide, update design docs and API docs, and create migration guide
  - [ ] Create comprehensive test script for BASE with data generation, ingestion, and search validation