SPU-Level Indexing with Shoplazza API Format & BASE Configuration
Phase 1: Schema Analysis & Design
1.1 Analyze SPU/SKU Fields for Indexing
SPU Fields (from shoplazza_product_spu):
- Index + Store: id, shop_id, handle, title, brief, vendor, product_type, tags, category, image_src, published, published_at
- Index only: seo_title, seo_description, seo_keywords
- Store only: description (HTML, no tokenization)
SKU Fields (from shoplazza_product_sku, as nested array):
- Index + Store: id, title, sku, barcode, price, compare_at_price, option1, option2, option3, inventory_quantity, image_src
Price Strategy (CONFIRMED - Option B):
- Flatten min_price, max_price, compare_at_price at SPU level for fast filtering/sorting
- Keep full variant prices in nested array for display
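The flattening step above could be sketched roughly as follows. The function name `flatten_prices` is hypothetical, and the rule for the SPU-level compare_at_price (taking the highest variant value) is an assumption, since the plan does not state how it is aggregated.

```python
from typing import Optional


def flatten_prices(variants: list[dict]) -> dict[str, Optional[float]]:
    """Derive SPU-level price fields from a list of variant (SKU) dicts."""
    prices = [float(v["price"]) for v in variants if v.get("price") is not None]
    compare = [
        float(v["compare_at_price"])
        for v in variants
        if v.get("compare_at_price") is not None
    ]
    return {
        "min_price": min(prices) if prices else None,
        "max_price": max(prices) if prices else None,
        # Assumption: use the highest variant compare_at_price at SPU level.
        "compare_at_price": max(compare) if compare else None,
    }
```

The variants themselves stay untouched, so the nested array can still carry the full per-variant prices for display.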
1.2 Design BASE Schema
Index: search_products (shared by all tenants)
Key fields:
- tenant_id (KEYWORD, REQUIRED) - always filtered, never optional
- SPU-level flattened fields
- variants (NESTED array) - SKU data
- Flattened: min_price, max_price, compare_at_price
- Multi-language: title_zh, title_en, title_ru, etc.
- Embeddings: title_embedding, image_embedding
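As an illustration only, the key fields listed above might translate into an Elasticsearch mapping along these lines. The helper name `build_base_mapping`, the embedding dimension, the numeric types, and the subset of variant fields shown are all assumptions rather than decisions recorded in this plan.

```python
EMBEDDING_DIMS = 1024  # assumption: depends on the embedding model actually used


def build_base_mapping(languages: tuple[str, ...] = ("zh", "en", "ru")) -> dict:
    """Build a mapping body for the shared search_products index."""
    properties: dict = {
        "tenant_id": {"type": "keyword"},
        "min_price": {"type": "float"},
        "max_price": {"type": "float"},
        "compare_at_price": {"type": "float"},
        # SKU rows live in a nested array so each variant is matched as a unit.
        "variants": {
            "type": "nested",
            "properties": {
                "id": {"type": "keyword"},
                "sku": {"type": "keyword"},
                "price": {"type": "float"},
                "inventory_quantity": {"type": "integer"},
            },
        },
        "title_embedding": {"type": "dense_vector", "dims": EMBEDDING_DIMS},
        "image_embedding": {"type": "dense_vector", "dims": EMBEDDING_DIMS},
    }
    # One text field per language: title_zh, title_en, title_ru, ...
    for lang in languages:
        properties[f"title_{lang}"] = {"type": "text"}
    return {"mappings": {"properties": properties}}
```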
Phase 2: BASE Configuration (Universal Standard)
2.1 Create BASE Config
File: <code>config/schema/base/config.yaml</code>
This is the universal configuration for ALL merchants using Shoplazza tables.
Key points:
- Index name: search_products (shared)
- Required field: tenant_id (always filtered in queries)
- SPU-level fields with multi-language support
- Nested variants structure
- Flattened price fields: min_price, max_price, compare_at_price
- Function_score configuration
2.2 Update Field Types & Mapping
Files: <code>config/field_types.py</code>, <code>indexer/mapping_generator.py</code>
- Add NESTED field type
- Handle nested mapping generation for variants
- Auto-generate flattened price fields
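A minimal sketch of how a NESTED field type and the recursive mapping generation might fit together. `FieldType`, `FieldConfig`, and `generate_mapping` are hypothetical names; the real contents of `config/field_types.py` and `indexer/mapping_generator.py` are not shown in this plan and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    KEYWORD = "keyword"
    TEXT = "text"
    FLOAT = "float"
    NESTED = "nested"


@dataclass
class FieldConfig:
    name: str
    field_type: FieldType
    children: list["FieldConfig"] = field(default_factory=list)


FLATTENED_PRICE_FIELDS = ("min_price", "max_price", "compare_at_price")


def generate_mapping(fields: list[FieldConfig], add_price_fields: bool = True) -> dict:
    """Turn field configs into an Elasticsearch `properties` dict."""
    properties: dict = {}
    for cfg in fields:
        entry: dict = {"type": cfg.field_type.value}
        if cfg.field_type is FieldType.NESTED:
            # Recurse for child fields; flattened prices belong to the top level only.
            entry["properties"] = generate_mapping(cfg.children, add_price_fields=False)
        properties[cfg.name] = entry
    if add_price_fields:
        for name in FLATTENED_PRICE_FIELDS:
            properties.setdefault(name, {"type": FieldType.FLOAT.value})
    return properties
```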
Phase 3: Data Ingestion for BASE
3.1 SPU-Level Data Transformer
File: <code>indexer/spu_data_transformer.py</code>
Features:
- Load SPU from shoplazza_product_spu
- Join SKU from shoplazza_product_sku (grouped by spu_id)
- Create nested variants array
- Calculate min_price, max_price, compare_at_price
- Generate title & image embeddings
- Inject tenant_id from config
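The join-and-nest part of the features above might look roughly like this, assuming the SPU and SKU rows have already been loaded from MySQL as plain dicts. The function name `build_spu_documents` is hypothetical, and price flattening and embedding generation are left out to keep the sketch short.

```python
from collections import defaultdict


def build_spu_documents(
    spu_rows: list[dict], sku_rows: list[dict], tenant_id: str
) -> list[dict]:
    """Group SKU rows by spu_id and attach them to each SPU as `variants`."""
    if not tenant_id:
        raise ValueError("tenant_id is required")

    skus_by_spu: dict = defaultdict(list)
    for sku in sku_rows:
        # Drop the join key; the parent document already identifies the SPU.
        variant = {k: v for k, v in sku.items() if k != "spu_id"}
        skus_by_spu[sku["spu_id"]].append(variant)

    documents = []
    for spu in spu_rows:
        doc = dict(spu)
        doc["tenant_id"] = tenant_id  # injected from config, never from the row
        doc["variants"] = skus_by_spu.get(spu["id"], [])
        documents.append(doc)
    return documents
```

An SPU with no SKU rows ends up with an empty variants array rather than being dropped, which is one possible choice and not something the plan specifies.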
3.2 Test Data Generator
File: <code>scripts/generate_shoplazza_test_data.py</code>
Generate 100 SPU records with:
- 10 categories, multiple vendors
- Multi-language (zh/en/ru)
- Price range: $5-$500
- 1-5 variants per SPU (color, size options)
- Insert into MySQL Shoplazza tables
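A rough sketch of the in-memory part of such a generator follows. All names and sample values here are illustrative assumptions, multi-language titles are omitted, and the MySQL insert step is left out because the connection details are not part of this plan.

```python
import random

CATEGORIES = [f"category_{i}" for i in range(1, 11)]  # 10 placeholder categories
VENDORS = ["vendor_a", "vendor_b", "vendor_c"]
COLORS = ["red", "blue", "black"]
SIZES = ["S", "M", "L"]


def generate_spus(count: int = 100, seed: int = 42) -> list[dict]:
    """Generate SPU dicts, each carrying 1-5 variant dicts."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    spus = []
    next_sku_id = 1
    for spu_id in range(1, count + 1):
        base_price = round(rng.uniform(5, 500), 2)
        variants = []
        for _ in range(rng.randint(1, 5)):
            variants.append({
                "id": next_sku_id,
                "spu_id": spu_id,
                "option1": rng.choice(COLORS),
                "option2": rng.choice(SIZES),
                "price": base_price,
                "inventory_quantity": rng.randint(0, 50),
            })
            next_sku_id += 1
        spus.append({
            "id": spu_id,
            "title": f"Test product {spu_id}",
            "category": rng.choice(CATEGORIES),
            "vendor": rng.choice(VENDORS),
            "variants": variants,
        })
    return spus
```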
3.3 BASE Ingestion Script
File: <code>scripts/ingest_base.py</code>
- Load from MySQL shoplazza_product_spu + shoplazza_product_sku
- Use SPUDataTransformer
- Index into search_products with configured tenant_id
Phase 4: Query Updates
4.1 Query Builder Enhancements
File: <code>search/multilang_query_builder.py</code>
- Auto-inject tenant_id filter (from config, always applied)
- Support nested queries for variants
- Use flattened price fields for filters: min_price, max_price
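The enhancements above could be sketched as a function that always adds the tenant filter and uses the flattened price fields for range filtering. `build_search_body`, `price_from`, and `price_to` are hypothetical names. Treating a product as matching when its price range overlaps the requested range is one possible interpretation, not something the plan specifies.

```python
from typing import Optional

TITLE_FIELDS = ["title_zh", "title_en", "title_ru"]


def build_search_body(
    tenant_id: str,
    text: str,
    price_from: Optional[float] = None,
    price_to: Optional[float] = None,
) -> dict:
    """Build an Elasticsearch request body with a mandatory tenant filter."""
    if not tenant_id:
        raise ValueError("tenant_id is required")

    # The tenant filter is always the first filter clause.
    filters: list[dict] = [{"term": {"tenant_id": tenant_id}}]
    if price_from is not None:
        # At least one variant costs price_from or more.
        filters.append({"range": {"max_price": {"gte": price_from}}})
    if price_to is not None:
        # At least one variant costs price_to or less.
        filters.append({"range": {"min_price": {"lte": price_to}}})

    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": TITLE_FIELDS}}],
                "filter": filters,
            }
        }
    }
```

Because the price clauses sit in the filter context and touch only top-level fields, no nested query is needed for price filtering.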
4.2 Searcher Updates
File: <code>search/searcher.py</code>
- Enforce tenant_id filtering
- Handle nested inner_hits for variants
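For the nested side, a sketch of a variant-level query with inner_hits and of reading the matched variants back from a hit. The function names are hypothetical, and the example assumes variants.option1 is mapped as a keyword so that a term query applies.

```python
def variant_option_query(tenant_id: str, option_value: str) -> dict:
    """Match products that have a variant whose option1 equals option_value."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {
                        "nested": {
                            "path": "variants",
                            "query": {"term": {"variants.option1": option_value}},
                            # Ask Elasticsearch to return the variants that matched.
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }


def matched_variants(hit: dict) -> list[dict]:
    """Extract the matching variant sources from one search hit."""
    # By default the inner_hits entry is keyed by the nested path.
    inner = hit.get("inner_hits", {}).get("variants", {})
    return [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
```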
Phase 5: API Response Transformation
5.1 Response Transformer
File: <code>api/response_transformer.py</code>
Transform ES response to Shoplazza format:
- Extract variants from nested array
- Map fields: product_id, title, handle, vendor, product_type, tags, price, variants, etc.
- Calculate in_stock from variants
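The transformation above might be sketched as a pure function over a single ES hit. `transform_hit` is a hypothetical name. Using min_price as the product-level price and defining in_stock as "any variant has positive inventory" are both assumptions.

```python
def transform_hit(hit: dict) -> dict:
    """Convert one Elasticsearch hit into a Shoplazza-style product dict."""
    source = hit["_source"]
    variants = source.get("variants") or []
    return {
        "product_id": source.get("id"),
        "title": source.get("title"),
        "handle": source.get("handle"),
        "vendor": source.get("vendor"),
        "product_type": source.get("product_type"),
        "tags": source.get("tags"),
        # Assumption: expose the lowest variant price as the product price.
        "price": source.get("min_price"),
        "variants": variants,
        # Assumption: in stock when any variant has inventory left.
        "in_stock": any((v.get("inventory_quantity") or 0) > 0 for v in variants),
    }
```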
5.2 Update API Models
File: <code>api/models.py</code>
New models:
- VariantOption, ProductVariant, ProductResult
- Updated SearchResponse with results: List[ProductResult]
- Add placeholders: suggestions: List[str] = [], related_searches: List[str] = []
5.3 Update Search Routes
File: <code>api/routes/search.py</code>
- Use ResponseTransformer to convert ES hits
- Return new Shoplazza-compatible format
- Ensure tenant_id is required in request
Phase 6: Legacy Migration
6.1 Rename Tenant1 to Legacy
- Rename <code>config/schema/tenant1/</code> to <code>config/schema/tenant1_legacy/</code>
- Update config to use old index search_tenant1 (preserve for backward compatibility)
- Mark as deprecated in comments
6.2 Update Scripts for BASE
- <code>run.sh</code>: Use BASE config, search_products index
- <code>restart.sh</code>: Use BASE config
- <code>test_all.sh</code>: Test BASE config
- Legacy scripts: Rename with _legacy suffix (e.g., run_legacy.sh)
6.3 Update Frontend
Files: <code>frontend/</code> HTML/JS files
- Change index name references from search_tenant1 to search_products
- Use BASE config endpoints
- Archive old frontend as frontend_legacy/ if needed
Phase 7: API Documentation Updates
7.1 Update API Docs
File: <code>API_DOCUMENTATION.md</code>
Critical additions:
- Document tenant_id as REQUIRED parameter in all search requests
- Explain that tenant_id filter is always applied
- Update all response examples to new Shoplazza format
- Document suggestions and related_searches (not yet implemented)
- Add nested variant query examples
- Document multi-tenant isolation guarantees
7.2 Update Request Models
File: <code>api/models.py</code>
Add tenant_id to SearchRequest:
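from pydantic import BaseModel, Field

class SearchRequest(BaseModel):
    # Required on every request; the tenant filter is built from this value.
    tenant_id: str = Field(..., description="Tenant ID (required)")
    query: str = Field(...)
    # ... other fields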
Phase 8: Design Documentation
8.1 Update Design Doc
File: <code>设计文档.md</code>
Updates:
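- Index granularity: changed to SPU level (not SKU)
- Unified index: all tenants share search_products, isolated by tenant_id
- BASE config: explain that the BASE config is the universal standard used by all new merchants
- API response format: adopt the Shoplazza standard format
- Price flattening: explain the performance optimization strategy for high-frequency fields
- Nested variants: describe the variants array structure in detail
- Legacy config: tenant1 and similar are legacy configs, kept only for compatibility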
8.2 Create BASE Guide
File: <code>docs/BASE_CONFIG_GUIDE.md</code>
Contents:
- BASE configuration overview
- How to generate test data
- How to run ingestion for new tenant
- Search examples
- Response format examples
- Multi-tenant isolation
8.3 Create Migration Guide
File: <code>docs/MIGRATION_TO_BASE.md</code>
- Breaking changes from SKU-level to SPU-level
- Response format changes
- How existing deployments should migrate
- Legacy config deprecation timeline
Phase 9: Testing
9.1 Create Test Script
File: <code>scripts/test_base.sh</code>
Steps:
- Generate 100 test SPU records
- Run BASE ingestion with tenant_id="test_tenant"
- Run searches, verify response format
- Test faceted search
- Verify multi-tenant isolation
- Verify tenant_id filtering
9.2 Integration Tests
File: <code>tests/test_base_integration.py</code>
- Test SPU-level indexing
- Test nested variant queries
- Test price filtering with flattened fields
- Test tenant_id isolation
- Test response transformation
Key Architectural Decisions
BASE Configuration Philosophy
BASE = Universal Standard: All new merchants use BASE config with Shoplazza tables. No per-tenant schema customization. Customization happens through:
- Configuration parameters (analyzers, function_score, etc.)
- Extension tables (if needed for additional fields)
- NOT through separate schemas
Tenant Isolation
tenant_id is SACRED:
- Always present in queries (enforced at query builder)
- Never optional
- Guarantees data isolation between tenants
- Documented prominently in API docs
Price Flattening Rationale
High-frequency operations (filtering, sorting) on price require optimal performance. Nested queries add overhead. Solution: Duplicate price data at SPU level (flattened) while maintaining full variant details in nested array.
Legacy vs BASE
- BASE: New standard, all future merchants
- Legacy (tenant1_legacy): Deprecated, exists only for backward compatibility
- All scripts/frontend default to BASE
- Legacy access requires explicit suffix (_legacy)
To-dos
- [ ] Analyze SPU/SKU fields and design unified schema with tenant_id, nested variants, and flattened price fields
- [ ] Create unified schema config for multi-tenant SPU-level indexing
- [ ] Add NESTED field type support to field_types.py and mapping generator
- [ ] Create SPU-level data transformer that joins SPU+SKU tables and creates nested variant array
- [ ] Create script to generate 100 realistic SPU+SKU test records in Shoplazza tables
- [ ] Create tenant2 configuration using unified schema and Shoplazza tables only
- [ ] Create tenant2 ingestion script that loads from MySQL Shoplazza tables
- [ ] Update query builder to support tenant_id filtering and nested variant queries
- [ ] Create response transformer to convert ES format to Shoplazza-compatible format
- [ ] Update API models with new Shoplazza response format (ProductResult, variants, suggestions, etc.)
- [ ] Update search routes to use response transformer and return new format
- [ ] Migrate tenant1 configuration to unified schema and SPU-level indexing
- [ ] Create tenant2 guide, update design docs, API docs, and create migration guide
- [ ] Create comprehensive test script for tenant2 with data generation, ingestion, and search validation