# SPU-Level Indexing with Shoplazza API Format & BASE Configuration

## Phase 1: Schema Analysis & Design

### 1.1 Analyze SPU/SKU Fields for Indexing

**SPU Fields** (from `shoplazza_product_spu`):

- **Index + Store**: `id`, `shop_id`, `handle`, `title`, `brief`, `vendor`, `product_type`, `tags`, `category`, `image_src`, `published`, `published_at`
- **Index only**: `seo_title`, `seo_description`, `seo_keywords`
- **Store only**: `description` (HTML, no tokenization)

**SKU Fields** (from `shoplazza_product_sku`, as nested array):

- **Index + Store**: `id`, `title`, `sku`, `barcode`, `price`, `compare_at_price`, `option1`, `option2`, `option3`, `inventory_quantity`, `image_src`

**Price Strategy (CONFIRMED - Option B)**:

- Flatten `min_price`, `max_price`, `compare_at_price` at SPU level for fast filtering/sorting
- Keep full variant prices in nested array for display
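A minimal sketch of the flattening step, assuming each variant is a dict whose `price` and `compare_at_price` values may be numbers, numeric strings, or null. The plan does not say how the SPU-level `compare_at_price` is chosen; this sketch assumes the highest non-null variant value. `flatten_prices` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def flatten_prices(variants: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """Derive the SPU-level price fields from the nested SKU variants (hypothetical helper)."""
    prices = [float(v["price"]) for v in variants if v.get("price") is not None]
    # Assumption: the SPU-level compare_at_price is the highest non-null variant value.
    compares = [
        float(v["compare_at_price"])
        for v in variants
        if v.get("compare_at_price") is not None
    ]
    return {
        "min_price": min(prices) if prices else None,
        "max_price": max(prices) if prices else None,
        "compare_at_price": max(compares) if compares else None,
    }


variants = [
    {"price": "19.99", "compare_at_price": "29.99"},
    {"price": "24.99", "compare_at_price": None},
]
print(flatten_prices(variants))
# {'min_price': 19.99, 'max_price': 24.99, 'compare_at_price': 29.99}
```

The variants themselves stay untouched in the nested array; only these three scalars are copied up to the SPU document.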

### 1.2 Design BASE Schema

**Index**: `search_products` (shared by all tenants)

**Key fields**:

- `tenant_id` (KEYWORD, **REQUIRED**) - always filtered, never optional
- SPU-level flattened fields
- `variants` (NESTED array) - SKU data
- Flattened: `min_price`, `max_price`, `compare_at_price`
- Multi-language: `title_zh`, `title_en`, `title_ru`, etc.
- Embeddings: `title_embedding`, `image_embedding`

## Phase 2: BASE Configuration (Universal Standard)

### 2.1 Create BASE Config

**File**: [`config/schema/base/config.yaml`](config/schema/base/config.yaml)

**This is the universal configuration for ALL merchants using Shoplazza tables.**

Key points:

- Index name: `search_products` (shared)
- Required field: `tenant_id` (always filtered in queries)
- SPU-level fields with multi-language support
- Nested variants structure
- Flattened price fields: `min_price`, `max_price`, `compare_at_price`
- `function_score` configuration

### 2.2 Update Field Types & Mapping

**Files**: [`config/field_types.py`](config/field_types.py), [`indexer/mapping_generator.py`](indexer/mapping_generator.py)

- Add `NESTED` field type
- Handle nested mapping generation for variants
- Auto-generate flattened price fields
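One way the mapping generator could handle `NESTED`, sketched under assumptions: the real `config/field_types.py` structure is not shown here, so `FieldType`, `FieldSpec`, and `generate_mapping` are hypothetical names, and prices are mapped as plain `float` (a `scaled_float` would be an alternative). Nested fields recurse into `properties`, and the flattened price fields are added automatically whenever a nested `variants` field is present.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class FieldType(Enum):
    # Hypothetical subset; NESTED is the type this phase adds.
    KEYWORD = "keyword"
    TEXT = "text"
    FLOAT = "float"
    INTEGER = "integer"
    NESTED = "nested"


@dataclass
class FieldSpec:
    name: str
    field_type: FieldType
    children: List["FieldSpec"] = field(default_factory=list)  # only used by NESTED


def field_mapping(spec: FieldSpec) -> Dict[str, Any]:
    if spec.field_type is FieldType.NESTED:
        # Recurse so each child becomes a property of the nested object.
        return {
            "type": "nested",
            "properties": {c.name: field_mapping(c) for c in spec.children},
        }
    return {"type": spec.field_type.value}


def generate_mapping(fields: List[FieldSpec]) -> Dict[str, Any]:
    properties = {f.name: field_mapping(f) for f in fields}
    # With a nested "variants" field, auto-add the flattened SPU-level price fields.
    if any(f.name == "variants" and f.field_type is FieldType.NESTED for f in fields):
        for name in ("min_price", "max_price", "compare_at_price"):
            properties.setdefault(name, {"type": "float"})
    return {"mappings": {"properties": properties}}


mapping = generate_mapping([
    FieldSpec("tenant_id", FieldType.KEYWORD),
    FieldSpec("title_en", FieldType.TEXT),
    FieldSpec("variants", FieldType.NESTED, [
        FieldSpec("sku", FieldType.KEYWORD),
        FieldSpec("price", FieldType.FLOAT),
    ]),
])
```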

## Phase 3: Data Ingestion for BASE

### 3.1 SPU-Level Data Transformer

**File**: [`indexer/spu_data_transformer.py`](indexer/spu_data_transformer.py)

Features:

- Load SPU from `shoplazza_product_spu`
- Join SKUs from `shoplazza_product_sku` (grouped by `spu_id`)
- Create nested variants array
- Calculate `min_price`, `max_price`, `compare_at_price`
- Generate title & image embeddings
- Inject `tenant_id` from config

### 3.2 Test Data Generator

**File**: [`scripts/generate_shoplazza_test_data.py`](scripts/generate_shoplazza_test_data.py)

Generate 100 SPU records with:

- 10 categories, multiple vendors
- Multi-language (zh/en/ru)
- Price range: $5-$500
- 1-5 variants per SPU (color, size options)
- Insert into MySQL Shoplazza tables
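A sketch of the generation logic, assuming rows are built in memory first and inserted into MySQL afterwards (the insert step is omitted). The category and vendor names are made-up placeholders, and the `title_zh` / `title_ru` columns are hypothetical, since the plan does not say how multi-language titles are stored in the Shoplazza tables. A fixed seed keeps test runs reproducible.

```python
import random
from typing import Any, Dict, List, Tuple

CATEGORIES = ["Apparel", "Shoes", "Bags", "Jewelry", "Home",
              "Kitchen", "Toys", "Beauty", "Sports", "Electronics"]
VENDORS = ["Acme", "Globex", "Initech", "Umbrella", "Hooli"]
COLORS = ["Red", "Blue", "Black", "White"]
SIZES = ["S", "M", "L", "XL"]


def generate_test_data(
    n_spu: int = 100, seed: int = 42
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    rng = random.Random(seed)  # seeded so test runs are reproducible
    option_pairs = [(c, s) for c in COLORS for s in SIZES]
    spus: List[Dict[str, Any]] = []
    skus: List[Dict[str, Any]] = []
    sku_id = 1
    for spu_id in range(1, n_spu + 1):
        category = CATEGORIES[(spu_id - 1) % len(CATEGORIES)]
        spus.append({
            "id": spu_id,
            "handle": f"{category.lower()}-{spu_id}",
            "title": f"{category} item {spu_id}",
            "title_zh": f"测试商品 {spu_id}",          # hypothetical column
            "title_ru": f"Тестовый товар {spu_id}",   # hypothetical column
            "vendor": rng.choice(VENDORS),
            "category": category,
            "published": True,
        })
        # 1-5 variants per SPU, each a distinct (color, size) combination.
        for color, size in rng.sample(option_pairs, rng.randint(1, 5)):
            skus.append({
                "id": sku_id,
                "spu_id": spu_id,
                "title": f"{color} / {size}",
                "sku": f"SKU-{spu_id:04d}-{sku_id:05d}",
                "price": round(rng.uniform(5, 500), 2),
                "option1": color,
                "option2": size,
                "inventory_quantity": rng.randint(0, 100),
            })
            sku_id += 1
    return spus, skus
```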

### 3.3 BASE Ingestion Script

**File**: [`scripts/ingest_base.py`](scripts/ingest_base.py)

- Load from MySQL `shoplazza_product_spu` + `shoplazza_product_sku`
- Use `SPUDataTransformer`
- Index into `search_products` with configured `tenant_id`
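The indexing step maps naturally onto the Elasticsearch `_bulk` API, which takes newline-delimited JSON: one action line followed by one document line per record. A minimal sketch of building that body; the `{tenant_id}_{id}` document id scheme is an assumption (it keeps SPU ids from different tenants from colliding in the shared index), and `to_bulk_ndjson` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable


def to_bulk_ndjson(docs: Iterable[Dict[str, Any]], index: str = "search_products") -> str:
    """Serialize SPU documents into an Elasticsearch _bulk request body."""
    lines = []
    for doc in docs:
        if not doc.get("tenant_id"):
            raise ValueError(f"document {doc.get('id')!r} has no tenant_id")
        # Assumption: prefix _id with tenant_id so ids never collide across tenants.
        action = {"index": {"_index": index, "_id": f"{doc['tenant_id']}_{doc['id']}"}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The _bulk API expects newline-delimited JSON terminated by a final newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string could be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`; the official Python client's `elasticsearch.helpers.bulk` is an alternative that builds the body itself.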

## Phase 4: Query Updates

### 4.1 Query Builder Enhancements

**File**: [`search/multilang_query_builder.py`](search/multilang_query_builder.py)

- **Auto-inject `tenant_id` filter** (from config, always applied)
- Support nested queries for variants
- Use flattened price fields for filters: `min_price`, `max_price`
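A sketch of the query body these three changes produce, assuming the `title_{lang}` naming from the schema design and that `variants.option1..3` are searchable fields. `build_search_query` and its parameters are hypothetical. The price filter keeps products whose `[min_price, max_price]` range overlaps the requested range, and the nested clause requests `inner_hits` so the matching SKUs come back with each SPU.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_search_query(
    tenant_id: str,
    text: str,
    min_price: Optional[float] = None,
    max_price: Optional[float] = None,
    variant_option: Optional[str] = None,
    languages: Sequence[str] = ("zh", "en", "ru"),
) -> Dict[str, Any]:
    if not tenant_id:
        raise ValueError("tenant_id is required")
    # The tenant filter is always present; callers cannot opt out.
    filters: List[Dict[str, Any]] = [{"term": {"tenant_id": tenant_id}}]
    # Flattened price fields: overlap test, no nested query needed.
    if min_price is not None:
        filters.append({"range": {"max_price": {"gte": min_price}}})
    if max_price is not None:
        filters.append({"range": {"min_price": {"lte": max_price}}})
    # Variant-level constraint: nested query, with inner_hits to return matching SKUs.
    if variant_option:
        filters.append({
            "nested": {
                "path": "variants",
                "query": {
                    "multi_match": {
                        "query": variant_option,
                        "fields": ["variants.option1", "variants.option2", "variants.option3"],
                    }
                },
                "inner_hits": {},
            }
        })
    must = (
        [{"multi_match": {"query": text, "fields": [f"title_{lang}" for lang in languages]}}]
        if text
        else [{"match_all": {}}]
    )
    return {"query": {"bool": {"must": must, "filter": filters}}}
```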

### 4.2 Searcher Updates

**File**: [`search/searcher.py`](search/searcher.py)

- Enforce `tenant_id` filtering
- Handle nested inner_hits for variants
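Both searcher duties can be sketched over raw response hits. In an Elasticsearch response, a nested clause with `inner_hits` returns the matching nested objects under `hit["inner_hits"]["variants"]["hits"]["hits"]`, each carrying just that object as its `_source`. The post-check assumes `tenant_id` is included in `_source`; both function names are hypothetical.

```python
from typing import Any, Dict, List


def check_tenant(hits: List[Dict[str, Any]], tenant_id: str) -> None:
    """Defensive post-check: fail loudly if any hit belongs to another tenant."""
    leaked = [h.get("_id") for h in hits if h.get("_source", {}).get("tenant_id") != tenant_id]
    if leaked:
        raise RuntimeError(f"hits outside tenant {tenant_id!r}: {leaked}")


def matched_variants(hit: Dict[str, Any], path: str = "variants") -> List[Dict[str, Any]]:
    """Return the variants that matched the nested clause, else all variants."""
    inner = hit.get("inner_hits", {}).get(path)
    if inner is None:
        # No nested clause with inner_hits in the query: fall back to every variant.
        return list(hit.get("_source", {}).get(path, []))
    # Each inner hit's _source is only the matching nested object.
    return [h["_source"] for h in inner["hits"]["hits"]]
```

The tenant check duplicates the query-level filter on purpose: if a code path ever builds a query without it, the request fails instead of leaking another tenant's products.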

## Phase 5: API Response Transformation

### 5.1 Response Transformer

**File**: [`api/response_transformer.py`](api/response_transformer.py)

Transform ES response to Shoplazza format:

- Extract variants from nested array
- Map fields: `product_id`, `title`, `handle`, `vendor`, `product_type`, `tags`, `price`, `variants`, etc.
- Calculate `in_stock` from variants
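A sketch of the per-hit transformation, returning a plain dict so it stays independent of the API models. Several choices here are assumptions the plan leaves open: the headline `price` is the flattened `min_price`, `in_stock` means at least one variant has positive `inventory_quantity` (ignoring any "continue selling when out of stock" policy), and option names fall back to `option1`..`option3` because the real option names are not in the SKU rows shown. `to_product_result` is a hypothetical name.

```python
from typing import Any, Dict


def to_product_result(hit: Dict[str, Any]) -> Dict[str, Any]:
    src = hit["_source"]
    variants = src.get("variants", [])
    return {
        "product_id": str(src["id"]),
        "title": src.get("title"),
        "handle": src.get("handle"),
        "vendor": src.get("vendor"),
        "product_type": src.get("product_type"),
        "tags": src.get("tags", []),
        # Assumption: headline price is the cheapest variant (flattened min_price).
        "price": src.get("min_price"),
        "compare_at_price": src.get("compare_at_price"),
        # Assumption: in stock if any variant has positive inventory.
        "in_stock": any((v.get("inventory_quantity") or 0) > 0 for v in variants),
        "variants": [
            {
                "variant_id": str(v["id"]),
                "title": v.get("title"),
                "sku": v.get("sku"),
                "price": v.get("price"),
                # Placeholder option names; real names would come from the SPU options.
                "options": [
                    {"name": f"option{i}", "value": v[f"option{i}"]}
                    for i in (1, 2, 3)
                    if v.get(f"option{i}")
                ],
            }
            for v in variants
        ],
    }
```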

### 5.2 Update API Models

**File**: [`api/models.py`](api/models.py)

New models:

- `VariantOption`, `ProductVariant`, `ProductResult`
- Updated `SearchResponse` with `results: List[ProductResult]`
- Add placeholders: `suggestions: List[str] = []`, `related_searches: List[str] = []`
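A sketch of these models in Pydantic (as the existing `SearchRequest` uses), written to work with both Pydantic v1 and v2. The individual field names and the `total` field are assumptions beyond what the plan lists; `Field(default_factory=list)` expresses the empty-list placeholder defaults explicitly.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class VariantOption(BaseModel):
    name: str   # e.g. "Color"
    value: str  # e.g. "Red"


class ProductVariant(BaseModel):
    variant_id: str
    title: Optional[str] = None
    sku: Optional[str] = None
    price: Optional[float] = None
    compare_at_price: Optional[float] = None
    inventory_quantity: int = 0
    options: List[VariantOption] = Field(default_factory=list)


class ProductResult(BaseModel):
    product_id: str
    title: Optional[str] = None
    handle: Optional[str] = None
    vendor: Optional[str] = None
    product_type: Optional[str] = None
    tags: List[str] = Field(default_factory=list)
    price: Optional[float] = None
    compare_at_price: Optional[float] = None
    in_stock: bool = False
    variants: List[ProductVariant] = Field(default_factory=list)


class SearchResponse(BaseModel):
    results: List[ProductResult]
    total: int  # hypothetical field
    # Placeholders: always empty until suggestions / related searches are implemented.
    suggestions: List[str] = Field(default_factory=list)
    related_searches: List[str] = Field(default_factory=list)
```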

### 5.3 Update Search Routes

**File**: [`api/routes/search.py`](api/routes/search.py)

- Use `ResponseTransformer` to convert ES hits
- Return new Shoplazza-compatible format
- **Ensure `tenant_id` is required in request**

## Phase 6: Legacy Migration

### 6.1 Rename Tenant1 to Legacy

- Rename [`config/schema/tenant1/`](config/schema/tenant1/) to [`config/schema/tenant1_legacy/`](config/schema/tenant1_legacy/)
- Update config to use old index `search_tenant1` (preserve for backward compatibility)
- Mark as deprecated in comments

### 6.2 Update Scripts for BASE

- [`run.sh`](run.sh): Use BASE config, `search_products` index
- [`restart.sh`](restart.sh): Use BASE config
- [`test_all.sh`](test_all.sh): Test BASE config
- Legacy scripts: Rename with `_legacy` suffix (e.g., `run_legacy.sh`)

### 6.3 Update Frontend

**Files**: [`frontend/`](frontend/) HTML/JS files

- Change index name references from `search_tenant1` to `search_products`
- Use BASE config endpoints
- Archive old frontend as `frontend_legacy/` if needed

## Phase 7: API Documentation Updates

### 7.1 Update API Docs

**File**: [`API_DOCUMENTATION.md`](API_DOCUMENTATION.md)

**Critical additions**:

- **Document `tenant_id` as REQUIRED parameter** in all search requests
- Explain that `tenant_id` filter is always applied
- Update all response examples to new Shoplazza format
- Document `suggestions` and `related_searches` as placeholders (always empty; not yet implemented)
- Add nested variant query examples
- Describe the multi-tenant isolation guarantees

### 7.2 Update Request Models

**File**: [`api/models.py`](api/models.py)

Add `tenant_id` to `SearchRequest`:

```python
from pydantic import BaseModel, Field


class SearchRequest(BaseModel):
    tenant_id: str = Field(..., description="Tenant ID (required)")
    query: str = Field(...)
    # ... other fields
```

## Phase 8: Design Documentation

### 8.1 Update Design Doc

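**File**: [`设计文档.md`](设计文档.md) (design document)

Updates:

- **Index granularity**: switch to SPU level (instead of SKU)
- **Unified index**: all tenants share `search_products`, isolated by `tenant_id`
- **BASE configuration**: explain that BASE is the universal standard used by all new merchants
- **API response format**: adopt the Shoplazza standard format
- **Price flattening**: explain the performance optimization strategy for high-frequency fields
- **Nested variants**: describe the `variants` array structure in detail
- **Legacy configuration**: tenant1 and similar configs are legacy, kept only for compatibility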

### 8.2 Create BASE Guide

**File**: [`docs/BASE_CONFIG_GUIDE.md`](docs/BASE_CONFIG_GUIDE.md)

Contents:

- BASE configuration overview
- How to generate test data
- How to run ingestion for new tenant
- Search examples
- Response format examples
- Multi-tenant isolation

### 8.3 Create Migration Guide

**File**: [`docs/MIGRATION_TO_BASE.md`](docs/MIGRATION_TO_BASE.md)

- Breaking changes from SKU-level to SPU-level
- Response format changes
- How existing deployments should migrate
- Legacy config deprecation timeline

## Phase 9: Testing

### 9.1 Create Test Script

**File**: [`scripts/test_base.sh`](scripts/test_base.sh)

Steps:

1. Generate 100 test SPU records
2. Run BASE ingestion with tenant_id="test_tenant"
3. Run searches, verify response format
4. Test faceted search
5. Verify multi-tenant isolation
6. Verify `tenant_id` filtering

### 9.2 Integration Tests

**File**: [`tests/test_base_integration.py`](tests/test_base_integration.py)

- Test SPU-level indexing
- Test nested variant queries
- Test price filtering with flattened fields
- Test tenant_id isolation
- Test response transformation

## Key Architectural Decisions

### BASE Configuration Philosophy

**BASE = Universal Standard**: All new merchants use BASE config with Shoplazza tables. No per-tenant schema customization. Customization happens through:

- Configuration parameters (analyzers, function_score, etc.)
- Extension tables (if needed for additional fields)
- NOT through separate schemas

### Tenant Isolation

**`tenant_id` is SACRED**:

- Always present in queries (enforced at query builder)
- Never optional
- Guarantees data isolation between tenants
- Documented prominently in API docs
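One way to make "always present, never optional" hold regardless of what a caller sends is to wrap the whole incoming query rather than append to it. A minimal sketch, assuming request bodies are plain dicts; `scope_to_tenant` is a hypothetical helper. Because the caller's query ends up inside `must` next to a `filter` clause, no `should` or `bool` trick in the caller's query can widen the result set beyond the tenant.

```python
import copy
from typing import Any, Dict


def scope_to_tenant(body: Dict[str, Any], tenant_id: str) -> Dict[str, Any]:
    """Return a copy of a search body that can only match the given tenant's documents."""
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise ValueError("tenant_id is required and must be a non-empty string")
    scoped = copy.deepcopy(body)  # never mutate the caller's body
    inner_query = scoped.get("query", {"match_all": {}})
    scoped["query"] = {
        "bool": {
            "must": [inner_query],
            "filter": [{"term": {"tenant_id": tenant_id}}],
        }
    }
    return scoped
```

Aggregations in the same body run over the filtered hits, so facets stay tenant-scoped as well, with one exception: a `global` aggregation ignores the query and should be rejected.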

### Price Flattening Rationale

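Filtering and sorting by price are high-frequency operations and need to be fast. Nested queries add overhead, because Elasticsearch indexes each nested variant as a separate hidden document that has to be joined back to its parent SPU at query time. Solution: duplicate the price data at SPU level (flattened) while keeping the full variant details in the nested array.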
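The trade-off can be made concrete by comparing the two query shapes. The flattened sort on `min_price` orders results exactly like a nested sort with `mode: min`, but the flattened range filter is an overlap test and therefore approximate: a product with variants at $5 and $100 passes a $20-$50 filter even though no variant is in range. The pure-Python predicates at the end only illustrate that difference; all function names are hypothetical.

```python
from typing import Any, Dict, List


def flattened_price_filter(lo: float, hi: float) -> Dict[str, Any]:
    # Two plain range clauses on SPU-level fields; no nested join.
    return {"bool": {"filter": [
        {"range": {"max_price": {"gte": lo}}},
        {"range": {"min_price": {"lte": hi}}},
    ]}}


def nested_price_filter(lo: float, hi: float) -> Dict[str, Any]:
    # Exact: some variant price lies in [lo, hi], but each candidate
    # SPU must be joined with its hidden variant documents.
    return {"nested": {"path": "variants",
                       "query": {"range": {"variants.price": {"gte": lo, "lte": hi}}}}}


def flattened_price_sort(order: str = "asc") -> List[Dict[str, Any]]:
    return [{"min_price": {"order": order}}]


def nested_price_sort(order: str = "asc") -> List[Dict[str, Any]]:
    # Same ordering as sorting on min_price, computed from nested docs at query time.
    return [{"variants.price": {"order": order, "mode": "min", "nested": {"path": "variants"}}}]


# Illustration only: what each filter matches for a single document.
def flattened_match(doc: Dict[str, Any], lo: float, hi: float) -> bool:
    return doc["max_price"] >= lo and doc["min_price"] <= hi


def nested_match(doc: Dict[str, Any], lo: float, hi: float) -> bool:
    return any(lo <= v["price"] <= hi for v in doc["variants"])


doc = {"min_price": 5.0, "max_price": 100.0, "variants": [{"price": 5.0}, {"price": 100.0}]}
print(flattened_match(doc, 20, 50), nested_match(doc, 20, 50))  # True False
```

If exact price-range semantics matter for a given filter, the nested form is still available; the flattened fields cover the common case cheaply.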

### Legacy vs BASE

- **BASE**: New standard, all future merchants
- **Legacy (tenant1_legacy)**: Deprecated, exists only for backward compatibility
- All scripts/frontend default to BASE
- Legacy access requires explicit suffix (`_legacy`)

### To-dos

- [ ] Analyze SPU/SKU fields and design the BASE schema with `tenant_id`, nested variants, and flattened price fields
- [ ] Create the BASE schema definition for multi-tenant SPU-level indexing
- [ ] Add NESTED field type support to `field_types.py` and the mapping generator
- [ ] Create the SPU-level data transformer that joins the SPU and SKU tables and builds the nested variants array
- [ ] Create a script to generate 100 realistic SPU+SKU test records in the Shoplazza tables
- [ ] Create the BASE config (`config/schema/base/config.yaml`) using the Shoplazza tables only
- [ ] Create the BASE ingestion script that loads from the MySQL Shoplazza tables
- [ ] Update the query builder to support `tenant_id` filtering and nested variant queries
- [ ] Create the response transformer to convert ES output to the Shoplazza-compatible format
- [ ] Update API models with the new Shoplazza response format (`ProductResult`, variants, suggestions, etc.)
- [ ] Update search routes to use the response transformer and return the new format
- [ ] Rename the tenant1 config to `tenant1_legacy` (kept on the old `search_tenant1` index) and switch scripts and frontend to BASE
- [ ] Create the BASE config guide, update the design doc and API docs, and create the migration guide
- [ ] Create a comprehensive BASE test script covering data generation, ingestion, and search validation