# SPU-Level Indexing with Shoplazza API Format & BASE Configuration

## Phase 1: Schema Analysis & Design

### 1.1 Analyze SPU/SKU Fields for Indexing

**SPU Fields** (from `shoplazza_product_spu`):

- **Index + Store**: `id`, `shop_id`, `handle`, `title`, `brief`, `vendor`, `product_type`, `tags`, `category`, `image_src`, `published`, `published_at`
- **Index only**: `seo_title`, `seo_description`, `seo_keywords`
- **Store only**: `description` (HTML, no tokenization)

**SKU Fields** (from `shoplazza_product_sku`, as nested array):

- **Index + Store**: `id`, `title`, `sku`, `barcode`, `price`, `compare_at_price`, `option1`, `option2`, `option3`, `inventory_quantity`, `image_src`

**Price Strategy (CONFIRMED - Option B)**:

- Flatten `min_price`, `max_price`, `compare_at_price` at SPU level for fast filtering/sorting
- Keep full variant prices in nested array for display

### 1.2 Design BASE Schema

**Index**: `search_products` (shared by all tenants)

**Key fields**:

- `tenant_id` (KEYWORD, **REQUIRED**) - always filtered, never optional
- SPU-level flattened fields
- `variants` (NESTED array) - SKU data
- Flattened: `min_price`, `max_price`, `compare_at_price`
- Multi-language: `title_zh`, `title_en`, `title_ru`, etc.
- Embeddings: `title_embedding`, `image_embedding`

## Phase 2: BASE Configuration (Universal Standard)

### 2.1 Create BASE Config

**File**: [`config/schema/base/config.yaml`](config/schema/base/config.yaml)

**This is the universal configuration for ALL merchants using Shoplazza tables.**

Key points:

- Index name: `search_products` (shared)
- Required field: `tenant_id` (always filtered in queries)
- SPU-level fields with multi-language support
- Nested variants structure
- Flattened price fields: `min_price`, `max_price`, `compare_at_price`
- `function_score` configuration

### 2.2 Update Field Types & Mapping

**Files**: [`config/field_types.py`](config/field_types.py), [`indexer/mapping_generator.py`](indexer/mapping_generator.py)

- Add `NESTED` field type
- Handle nested mapping generation for variants
- Auto-generate flattened price fields

## Phase 3: Data Ingestion for BASE

### 3.1 SPU-Level Data Transformer

**File**: [`indexer/spu_data_transformer.py`](indexer/spu_data_transformer.py)

Features:

- Load SPU from `shoplazza_product_spu`
- Join SKU from `shoplazza_product_sku` (grouped by `spu_id`)
- Create nested variants array
- Calculate `min_price`, `max_price`, `compare_at_price`
- Generate title & image embeddings
- Inject `tenant_id` from config

### 3.2 Test Data Generator

**File**: [`scripts/generate_shoplazza_test_data.py`](scripts/generate_shoplazza_test_data.py)

Generate 100 SPU records with:

- 10 categories, multiple vendors
- Multi-language (zh/en/ru)
- Price range: $5-$500
- 1-5 variants per SPU (color, size options)
- Insert into MySQL Shoplazza tables

### 3.3 BASE Ingestion Script

**File**: [`scripts/ingest_base.py`](scripts/ingest_base.py)

- Load from MySQL `shoplazza_product_spu` + `shoplazza_product_sku`
- Use `SPUDataTransformer`
- Index into `search_products` with configured `tenant_id`

## Phase 4: Query Updates

### 4.1 Query Builder Enhancements

**File**: [`search/multilang_query_builder.py`](search/multilang_query_builder.py)

- **Auto-inject `tenant_id` filter** (from config, always applied)
- Support nested queries for variants
- Use flattened price fields for filters: `min_price`, `max_price`

### 4.2 Searcher Updates

**File**: [`search/searcher.py`](search/searcher.py)

- Enforce `tenant_id` filtering
- Handle nested `inner_hits` for variants

## Phase 5: API Response Transformation

### 5.1 Response Transformer

**File**: [`api/response_transformer.py`](api/response_transformer.py)

Transform ES response to Shoplazza format:

- Extract variants from nested array
- Map fields: `product_id`, `title`, `handle`, `vendor`, `product_type`, `tags`, `price`, `variants`, etc.
- Calculate `in_stock` from variants

### 5.2 Update API Models

**File**: [`api/models.py`](api/models.py)

New models:

- `VariantOption`, `ProductVariant`, `ProductResult`
- Updated `SearchResponse` with `results: List[ProductResult]`
- Add placeholders: `suggestions: List[str] = []`, `related_searches: List[str] = []`

### 5.3 Update Search Routes

**File**: [`api/routes/search.py`](api/routes/search.py)

- Use `ResponseTransformer` to convert ES hits
- Return new Shoplazza-compatible format
- **Ensure `tenant_id` is required in request**

## Phase 6: Legacy Migration

### 6.1 Rename Tenant1 to Legacy

- Rename [`config/schema/tenant1/`](config/schema/tenant1/) to [`config/schema/tenant1_legacy/`](config/schema/tenant1_legacy/)
- Update config to use old index `search_tenant1` (preserve for backward compatibility)
- Mark as deprecated in comments

### 6.2 Update Scripts for BASE

- [`run.sh`](run.sh): Use BASE config, `search_products` index
- [`restart.sh`](restart.sh): Use BASE config
- [`test_all.sh`](test_all.sh): Test BASE config
- Legacy scripts: Rename with `_legacy` suffix (e.g., `run_legacy.sh`)

### 6.3 Update Frontend

**Files**: [`frontend/`](frontend/) HTML/JS files

- Change index name references from `search_tenant1` to `search_products`
- Use BASE config endpoints
- Archive old frontend as `frontend_legacy/` if needed

## Phase 7: API Documentation Updates

### 7.1 Update API Docs

**File**: [`API_DOCUMENTATION.md`](API_DOCUMENTATION.md)

**Critical additions**:

- **Document `tenant_id` as REQUIRED parameter** in all search requests
- Explain that `tenant_id` filter is always applied
- Update all response examples to new Shoplazza format
- Document `suggestions` and `related_searches` (not yet implemented)
- Add nested variant query examples
- Multi-tenant isolation guarantees

### 7.2 Update Request Models

**File**: [`api/models.py`](api/models.py)

Add `tenant_id` to `SearchRequest`:

```python
from pydantic import BaseModel, Field


class SearchRequest(BaseModel):
    # Required on every request; "..." marks the field as mandatory in Pydantic.
    tenant_id: str = Field(..., description="Tenant ID (required)")
    query: str = Field(...)
    # ... other fields
```

## Phase 8: Design Documentation

### 8.1 Update Design Doc

**File**: [`设计文档.md`](设计文档.md)

Updates:

- **Index granularity**: Switch to SPU level (not SKU)
- **Unified index**: All tenants share `search_products`, isolated by `tenant_id`
- **BASE configuration**: Explain that BASE is the universal standard used by all new merchants
- **API response format**: Adopt the standard Shoplazza format
- **Price flattening**: Explain the performance optimization strategy for high-frequency fields
- **Nested variants**: Describe the `variants` array structure in detail
- **Legacy configuration**: `tenant1` and similar are legacy configurations, kept only for compatibility

### 8.2 Create BASE Guide

**File**: [`docs/BASE_CONFIG_GUIDE.md`](docs/BASE_CONFIG_GUIDE.md)

Contents:

- BASE configuration overview
- How to generate test data
- How to run ingestion for new tenant
- Search examples
- Response format examples
- Multi-tenant isolation

### 8.3 Create Migration Guide

**File**: [`docs/MIGRATION_TO_BASE.md`](docs/MIGRATION_TO_BASE.md)

- Breaking changes from SKU-level to SPU-level
- Response format changes
- How existing deployments should migrate
- Legacy config deprecation timeline

## Phase 9: Testing

### 9.1 Create Test Script

**File**: [`scripts/test_base.sh`](scripts/test_base.sh)

Steps:

1. Generate 100 test SPU records
2. Run BASE ingestion with `tenant_id="test_tenant"`
3. Run searches, verify response format
4. Test faceted search
5. Verify multi-tenant isolation
6. Verify `tenant_id` filtering

### 9.2 Integration Tests

**File**: [`tests/test_base_integration.py`](tests/test_base_integration.py)

- Test SPU-level indexing
- Test nested variant queries
- Test price filtering with flattened fields
- Test `tenant_id` isolation
- Test response transformation

## Key Architectural Decisions

### BASE Configuration Philosophy

**BASE = Universal Standard**: All new merchants use BASE config with Shoplazza tables. No per-tenant schema customization. Customization happens through:

- Configuration parameters (analyzers, `function_score`, etc.)
- Extension tables (if needed for additional fields)
- NOT through separate schemas

### Tenant Isolation

**`tenant_id` is SACRED**:

- Always present in queries (enforced at query builder)
- Never optional
- Guarantees data isolation between tenants
- Documented prominently in API docs

### Price Flattening Rationale

High-frequency operations (filtering, sorting) on price require optimal performance. Nested queries add overhead. Solution: Duplicate price data at SPU level (flattened) while maintaining full variant details in nested array.
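
The flattening and tenant filtering described above can be sketched roughly as follows. This is only an illustration, not the project's actual transformer or query builder: the function names `flatten_prices`, `build_spu_doc`, and `price_filter_query` are hypothetical, taking the maximum `compare_at_price` across variants is an assumption (the plan does not say how that field is aggregated), and the range-overlap filter is one possible way to use the flattened fields.

```python
from typing import Any, Dict, List, Optional


def flatten_prices(variants: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """Derive SPU-level price fields from the SKU (variant) dicts."""
    prices = [float(v["price"]) for v in variants if v.get("price") is not None]
    compare = [
        float(v["compare_at_price"])
        for v in variants
        if v.get("compare_at_price") is not None
    ]
    return {
        "min_price": min(prices) if prices else None,
        "max_price": max(prices) if prices else None,
        # Assumption: use the highest compare_at_price among the variants.
        "compare_at_price": max(compare) if compare else None,
    }


def build_spu_doc(
    spu: Dict[str, Any], variants: List[Dict[str, Any]], tenant_id: str
) -> Dict[str, Any]:
    """Build one SPU-level document: tenant_id, nested variants, flattened prices."""
    doc = dict(spu)
    doc["tenant_id"] = tenant_id   # always injected, never optional
    doc["variants"] = variants     # full per-SKU detail stays in the nested array
    doc.update(flatten_prices(variants))
    return doc


def price_filter_query(
    tenant_id: str, lo: Optional[float] = None, hi: Optional[float] = None
) -> Dict[str, Any]:
    """Bool query body: the tenant filter comes first and is always present.

    Assumption: an SPU matches when its [min_price, max_price] interval
    overlaps the requested [lo, hi] range, so no nested query is needed.
    """
    filters: List[Dict[str, Any]] = [{"term": {"tenant_id": tenant_id}}]
    if hi is not None:
        filters.append({"range": {"min_price": {"lte": hi}}})
    if lo is not None:
        filters.append({"range": {"max_price": {"gte": lo}}})
    return {"bool": {"filter": filters}}
```

Because the price filters touch only top-level fields, the nested `variants` array is needed only when a query targets variant attributes or when variants are returned for display.
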
### Legacy vs BASE

- **BASE**: New standard, all future merchants
- **Legacy (tenant1_legacy)**: Deprecated, exists only for backward compatibility
- All scripts/frontend default to BASE
- Legacy access requires explicit suffix (`_legacy`)

### To-dos

- [ ] Analyze SPU/SKU fields and design unified schema with tenant_id, nested variants, and flattened price fields
- [ ] Create unified schema config for multi-tenant SPU-level indexing
- [ ] Add NESTED field type support to field_types.py and mapping generator
- [ ] Create SPU-level data transformer that joins SPU+SKU tables and creates nested variant array
- [ ] Create script to generate 100 realistic SPU+SKU test records in Shoplazza tables
- [ ] Create tenant2 configuration using unified schema and Shoplazza tables only
- [ ] Create tenant2 ingestion script that loads from MySQL Shoplazza tables
- [ ] Update query builder to support tenant_id filtering and nested variant queries
- [ ] Create response transformer to convert ES format to Shoplazza-compatible format
- [ ] Update API models with new Shoplazza response format (ProductResult, variants, suggestions, etc.)
- [ ] Update search routes to use response transformer and return new format
- [ ] Migrate tenant1 configuration to unified schema and SPU-level indexing
- [ ] Create tenant2 guide, update design docs, API docs, and create migration guide
- [ ] Create comprehensive test script for tenant2 with data generation, ingestion, and search validation