Commit acf1349c2b29f77dcd2a26a4118ea89db088d8a8

Authored by tangwang
1 parent ca91352a

Script for generating fake bulk-import data (multi-variant)

Script: scripts/csv_to_excel_multi_variant.py

Main features:

Single-variant products (S type): 30%
- Product attribute is S
- option1/option2/option3 are left blank
- The row carries all product information (title, description, price, inventory, etc.)

Multi-variant products (M+P type): 70%
- M row (product body):
  - Product attribute is M
  - Carries the product-level information (title, description, SEO, category, etc.)
  - option1="color", option2="size", option3="material"
  - Price, inventory, SKU, and other variant-level fields are left blank
- P rows (variants):
  - Product attribute is P
  - Product title matches the M row
  - option1/2/3 hold concrete values (the Cartesian product of color, size, material)
  - Each SKU gets its own price, inventory, SKU code, etc.

Multi-variant generation rules (see the sketch after this list):
- Color: 2-10 colors sampled at random from the 30 predefined color names
- Size: 4-8 sizes sampled at random from 1-30
- Material: the last whitespace-separated token of the product title (special characters stripped)
- Cartesian product: one P row is generated per combination (e.g., 3 colors × 5 sizes × 1 material = 15 SKUs)
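
As a quick reference, here is a minimal sketch of the variant-generation rule described above. The color list is a simplified stand-in (the script defines 30 named colors such as Red and Blue); see scripts/csv_to_excel_multi_variant.py below for the real implementation.

```python
import itertools
import random

# Simplified stand-in for the script's 30 predefined color names.
COLORS = [f"Color{i}" for i in range(1, 31)]

def build_variants(title: str) -> list:
    """Return one (color, size, material) tuple per P row / SKU."""
    colors = random.sample(COLORS, random.randint(2, 10))        # 2-10 colors
    sizes = random.sample([str(i) for i in range(1, 31)],
                          random.randint(4, 8))                  # 4-8 sizes
    parts = title.strip().split()
    material = parts[-1] if parts else "default"                 # last token of the title
    return list(itertools.product(colors, sizes, [material]))

variants = build_variants("fire extinguisher set plastic")
print(len(variants))  # e.g., 5 colors x 6 sizes x 1 material = 30 SKUs
```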
CLAUDE.md
... ... @@ -4,18 +4,25 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
4 4  
5 5 ## Project Overview
6 6  
7   -This is a **Search Engine SaaS** project for e-commerce product search, designed for Shoplazza (店匠) independent sites. The system provides Elasticsearch-based product search capabilities with multi-language support, text/image embeddings, and configurable ranking.
  7 +This is a **Search Engine SaaS** platform for e-commerce product search, specifically designed for Shoplazza (店匠) independent sites. It's a multi-tenant configurable search system built on Elasticsearch with AI-powered search capabilities.
8 8  
9 9 **Tech Stack:**
10   -- Elasticsearch as the search engine backend
  10 +- Elasticsearch 8.x as the search engine backend
11 11 - MySQL (Shoplazza database) as the primary data source
12   -- Python for data processing and ingestion
  12 +- Python 3.10 with PyTorch/CUDA support
13 13 - BGE-M3 model for text embeddings (1024-dim vectors)
14 14 - CN-CLIP (ViT-H-14) for image embeddings
  15 +- FastAPI for REST API layer
15 16  
16   -## Database Configuration
  17 +## Development Environment
17 18  
18   -**Shoplazza Production Database:**
  19 +**Required Environment Setup:**
  20 +```bash
  21 +source /home/tw/miniconda3/etc/profile.d/conda.sh
  22 +conda activate searchengine
  23 +```
  24 +
  25 +**Database Configuration:**
19 26 ```
20 27 host: 120.79.247.228
21 28 port: 3316
... ... @@ -24,85 +31,217 @@ username: saas
24 31 password: P89cZHS5d7dFyc9R
25 32 ```
26 33  
27   -**Main Tables:**
28   -- `shoplazza_product_sku` - SKU level product data
29   -- `shoplazza_product_spu` - SPU level product data
  34 +## Common Development Commands
30 35  
31   -## Architecture
  36 +### Environment Setup
  37 +```bash
  38 +# Complete environment setup
  39 +./setup.sh
32 40  
33   -### Data Flow
34   -1. **Data Source (MySQL)** → Main tables (`shoplazza_product_sku`, `shoplazza_product_spu`) + tenant extension tables
35   -2. **Indexer** → Reads from MySQL, applies transformations (embeddings, etc.), writes to Elasticsearch
36   -3. **Query Parser** → Query rewriting, translation, text embedding conversion
37   -4. **Searcher** → Executes searches against Elasticsearch with configurable ranking
  41 +# Install Python dependencies
  42 +pip install -r requirements.txt
  43 +```
38 44  
39   -### Multi-Tenant Design
40   -Each tenant has their own extension table to store custom attributes, multi-language fields (titles, brand names, tags, categories), and business-specific metadata. The main SKU table is joined with tenant extension tables during indexing.
  45 +### Data Management
  46 +```bash
  47 +# Generate test data (Tenant1 Mock + Tenant2 CSV)
  48 +./scripts/mock_data.sh
41 49  
42   -### Configuration System
  50 +# Ingest data to Elasticsearch
  51 +./scripts/ingest.sh <tenant_id> [recreate] # e.g., ./scripts/ingest.sh 1 true
  52 +python main.py ingest data.csv --limit 1000 --batch-size 50
  53 +```
43 54  
44   -The system uses two types of configurations per tenant:
  55 +### Running Services
  56 +```bash
  57 +# Start all services (production)
  58 +./run.sh
45 59  
46   -1. **Application Structure Config** (`IndexerConfig`) - Defines:
47   - - Input field mappings from MySQL to Elasticsearch
48   - - Field types (TEXT, EMBEDDING, LITERAL, INT, DOUBLE, etc.)
49   - - Which fields require preprocessing (embeddings, transformations)
  60 +# Start development server with auto-reload
  61 +./scripts/start_backend.sh
  62 +python main.py serve --host 0.0.0.0 --port 6002 --reload
50 63  
51   -2. **Index Structure Config** - Defines:
52   - - Elasticsearch field mappings and analyzers
53   - - Supported analyzers: Chinese (ansj), English, Arabic, Spanish, Russian, Japanese
54   - - Query domain definitions (default, category_name, title, brand_name, etc.)
55   - - BM25 parameters and similarity configurations
  64 +# Start frontend debugging UI
  65 +./scripts/start_frontend.sh
  66 +```
56 67  
57   -### Query Processing
  68 +### Testing
  69 +```bash
  70 +# Run all tests
  71 +python -m pytest tests/
58 72  
59   -The `queryParser` performs:
60   -1. **Query Rewriting** - Dictionary-based rewriting (brand terms, category terms, synonyms, corrections)
61   -2. **Translation** - Language detection and translation to support multi-language search (e.g., zh↔en)
62   -3. **Text Embedding** - Converts query text to vectors when vector search is enabled
  73 +# Run specific test types
  74 +python -m pytest tests/unit/ # Unit tests
  75 +python -m pytest tests/integration/ # Integration tests
  76 +python -m pytest -m "api" # API tests only
63 77  
64   -### Search and Ranking
  78 +# Test search from command line
  79 +python main.py search "query" --tenant-id 1 --size 10
  80 +```
65 81  
66   -The `searcher` supports:
67   -- Boolean operators: AND, OR, RANK, ANDNOT with parentheses
68   -- Operator precedence: `()` > `ANDNOT` > `AND` > `OR` > `RANK`
69   -- Configurable ranking expressions for the `default` domain:
70   - - Example: `static_bm25() + 0.2*text_embedding_relevance() + general_score*2 + timeliness(end_time)`
71   - - Combines BM25 text relevance, embedding similarity, product scores, and time decay
  82 +### Development Utilities
  83 +```bash
  84 +# Stop all services
  85 +./scripts/stop.sh
72 86  
73   -### Embedding Modules
  87 +# Test environment (for CI/development)
  88 +./scripts/start_test_environment.sh
  89 +./scripts/stop_test_environment.sh
74 90  
75   -**Text Embedding** - Uses BGE-M3 model (`Xorbits/bge-m3`):
76   -- Singleton pattern with thread-safe initialization
77   -- Generates 1024-dimensional vectors
78   -- Configured for GPU/CUDA acceleration
  91 +# Install server dependencies
  92 +./scripts/install_server_deps.sh
  93 +```
79 94  
80   -**Image Embedding** - Uses CN-CLIP model (ViT-H-14):
81   -- Downloads and validates images from URLs
82   -- Preprocesses images (resize, RGB conversion)
83   -- Generates 1024-dimensional vectors
84   -- Supports both local and remote images
  95 +## Architecture Overview
  96 +
  97 +### Core Components
  98 +```
  99 +/data/tw/SearchEngine/
  100 +├── api/ # FastAPI REST API service (port 6002)
  101 +├── config/ # Configuration management system
  102 +├── indexer/ # MySQL → Elasticsearch data pipeline
  103 +├── search/ # Search engine and ranking logic
  104 +├── query/ # Query parsing, translation, rewriting
  105 +├── embeddings/ # ML models (BGE-M3, CN-CLIP)
  106 +├── scripts/ # Automation and utility scripts
  107 +├── utils/ # Shared utilities (ES client, etc.)
  108 +├── frontend/ # Simple debugging UI
  109 +├── mappings/ # Elasticsearch index mappings
  110 +└── tests/ # Unit and integration tests
  111 +```
  112 +
  113 +### Data Flow Architecture
  114 +**Pipeline**: MySQL → Indexer → Elasticsearch → API → Frontend
  115 +
  116 +1. **Data Source Layer**:
  117 + - Shoplazza MySQL database with `shoplazza_product_sku` and `shoplazza_product_spu` tables
  118 + - Tenant-specific extension tables for custom attributes and multi-language fields
  119 +
  120 +2. **Indexing Layer** (`indexer/`):
  121 + - Reads from MySQL, applies transformations with embeddings
  122 + - Uses `DataTransformer` and `IndexingPipeline` for batch processing
  123 + - Supports both full and incremental indexing with embedding caching
  124 +
  125 +3. **Query Processing Layer** (`query/`):
  126 + - `QueryParser`: Handles query rewriting, translation, and text embedding conversion
  127 + - Multi-language support with automatic detection and translation
  128 + - Boolean logic parsing with operator precedence: `()` > `ANDNOT` > `AND` > `OR` > `RANK`
  129 +
  130 +4. **Search Engine Layer** (`search/`):
  131 + - `Searcher`: Executes hybrid searches combining BM25 and dense vectors
  132 + - Configurable ranking expressions with function_score support
  133 + - Multi-tenant isolation via `tenant_id` field
  134 +
  135 +5. **API Layer** (`api/`):
  136 + - FastAPI service on port 6002 with multi-tenant support
  137 + - Text search: `POST /search/`
  138 + - Image search: `POST /image-search/`
  139 + - Tenant identification via `X-Tenant-ID` header
85 140  
86   -## Test Data
87   -、
88   -**Tenant1 Test Dataset:**
89   -- Location: `data/tenant1/goods_with_pic.5years_congku.csv.shuf.1w`
90   -- Contains 10,000 shuffled product records with images
91   -- Processing script: `data/tenant1/task2_process_goods.py`
92   - - Extracts product data from MySQL
93   - - Maps images from filebank database
94   - - Creates inverted index (URL → SKU list)
  141 +### Multi-Tenant Configuration System
95 142  
96   -## Key Implementation Notes
  143 +The system uses centralized configuration through `config/config.yaml`:
97 144  
98   -1. **Data Sync:** Full data sync from MySQL to Elasticsearch is handled by a separate Java project (not in this repo). This repo may include a simple full-load implementation for testing purposes.
  145 +1. **Field Configuration** (`config/field_types.py`):
  146 + - Defines field types: TEXT, KEYWORD, EMBEDDING, INT, DOUBLE, etc.
  147 + - Specifies analyzers: Chinese (ansj), English, Arabic, Spanish, Russian, Japanese
  148 + - Required fields and preprocessing rules
99 149  
100   -2. **Extension Tables:** When designing tenant configurations, determine which fields exist in the main SKU table vs. which need to be added to tenant-specific extension tables.
  150 +2. **Index Configuration** (`mappings/search_products.json`):
  151 + - Unified index structure shared by all tenants
  152 + - Elasticsearch field mappings and analyzer configurations
  153 + - BM25 similarity with modified parameters (`b=0.0, k1=0.0`)
101 154  
102   -3. **Embedding Caching:** For periodic full indexing, embedding results should be cached to avoid recomputation.
  155 +3. **Query Configuration** (`search/query_config.py`):
  156 + - Query domain definitions (default, category_name, title, brand_name, etc.)
  157 + - Ranking expressions and function_score configurations
  158 + - Translation and embedding settings
  159 +
  160 +### Embedding Models
  161 +
  162 +**Text Embedding** (`embeddings/bge_encoder.py`):
  163 +- Uses BGE-M3 model (`Xorbits/bge-m3`)
  164 +- Singleton pattern with thread-safe initialization
  165 +- Generates 1024-dimensional vectors with GPU/CUDA support
  166 +- Configurable caching to avoid recomputation
103 167  
104   -4. **ES Similarity Configuration:** All text fields use modified BM25 with `b=0.0, k1=0.0` as the default similarity.
  168 +**Image Embedding** (`embeddings/clip_encoder.py`):
  169 +- Uses CN-CLIP model (ViT-H-14)
  170 +- Downloads and preprocesses images from URLs
  171 +- Supports both local and remote image processing
  172 +- Generates 1024-dimensional vectors
  173 +
  174 +### Search and Ranking
105 175  
106   -5. **Multi-Language Support:** The system is designed for cross-border e-commerce with at minimum Chinese and English support, with extensibility for other languages (Arabic, Spanish, Russian, Japanese).
107   -- Remember this project's environment: source /home/tw/miniconda3/etc/profile.d/conda.sh && conda activate searchengine
  176 +**Hybrid Search Approach**:
  177 +- Combines traditional BM25 text relevance with dense vector similarity
  178 +- Supports text embeddings (BGE-M3) and image embeddings (CN-CLIP)
  179 +- Configurable ranking expressions like: `static_bm25() + 0.2*text_embedding_relevance() + general_score*2 + timeliness(end_time)`
  180 +
  181 +**Boolean Search Support**:
  182 +- Full boolean logic with AND, OR, ANDNOT, RANK operators
  183 +- Parentheses for complex query structures
  184 +- Configurable operator precedence
  185 +
  186 +**Faceted Search**:
  187 +- Terms and range faceting support
  188 +- Multi-dimensional filtering capabilities
  189 +- Configurable facet fields and aggregations
  190 +
  191 +## Testing Infrastructure
  192 +
  193 +**Test Framework**: pytest with async support
  194 +
  195 +**Test Structure**:
  196 +- `tests/conftest.py`: Comprehensive test fixtures and configuration
  197 +- `tests/unit/`: Unit tests for individual components
  198 +- `tests/integration/`: Integration tests for system workflows
  199 +- Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.api`
  200 +
  201 +**Test Data**:
  202 +- Tenant1: Mock data with 10,000 product records
  203 +- Tenant2: CSV-based test dataset
  204 +- Automated test data generation via `scripts/mock_data.sh`
  205 +
  206 +**Key Test Fixtures** (from `conftest.py`):
  207 +- `sample_search_config`: Complete configuration for testing
  208 +- `mock_es_client`: Mocked Elasticsearch client
  209 +- `test_searcher`: Searcher instance with mock dependencies
  210 +- `temp_config_file`: Temporary YAML configuration for tests
  211 +
  212 +## API Endpoints
  213 +
  214 +**Main API** (FastAPI on port 6002):
  215 +- `POST /search/` - Text search with multi-language support
  216 +- `POST /image-search/` - Image search using CN-CLIP embeddings
  217 +- Health check and management endpoints
  218 +- Multi-tenant support via `X-Tenant-ID` header
  219 +
  220 +**API Features**:
  221 +- Hybrid search combining text and vector similarity
  222 +- Configurable ranking and filtering
  223 +- Faceted search with aggregations
  224 +- Multi-language query processing and translation
  225 +- Real-time search with configurable result sizes
  226 +
  227 +## Key Implementation Details
  228 +
  229 +1. **Environment Variables**: All sensitive configuration stored in `.env` (template: `.env.example`)
  230 +2. **Configuration Management**: Dynamic config loading through `config/config_loader.py`
  231 +3. **Error Handling**: Comprehensive error handling with proper HTTP status codes
  232 +4. **Performance**: Batch processing for indexing, embedding caching, and connection pooling
  233 +5. **Logging**: Structured logging with request tracing for debugging
  234 +6. **Security**: Tenant isolation at the index level with proper access controls
  235 +
  236 +## Database Tables
  237 +
  238 +**Main Tables**:
  239 +- `shoplazza_product_sku` - SKU level product data with pricing and inventory
  240 +- `shoplazza_product_spu` - SPU level product data with categories and attributes
  241 +- Tenant extension tables for custom fields and multi-language content
  242 +
  243 +**Data Processing**:
  244 +- Full data sync handled by separate Java project (not in this repo)
  245 +- This repository includes test implementations for development and debugging
  246 +- Extension tables joined with main tables during indexing process
108 247  
... ...
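
Editor's note on the query processing documented above: the boolean precedence `()` > `ANDNOT` > `AND` > `OR` > `RANK` maps naturally onto a precedence-climbing parser. The sketch below is illustrative only (it is not the repo's QueryParser) and assumes left-associative operators:

```python
# Higher number = binds tighter; parentheses are handled structurally.
PRECEDENCE = {"ANDNOT": 4, "AND": 3, "OR": 2, "RANK": 1}

def parse(tokens: list, min_prec: int = 1):
    """Precedence-climbing parse of a flat token list into a nested tuple AST."""
    if tokens[0] == "(":
        tokens.pop(0)                  # consume "("
        left = parse(tokens, 1)
        tokens.pop(0)                  # consume ")"
    else:
        left = tokens.pop(0)           # a bare query term
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)  # +1 makes operators left-associative
        left = (op, left, right)
    return left

print(parse("( a OR b ) ANDNOT c".split()))
# ('ANDNOT', ('OR', 'a', 'b'), 'c')
```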
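
Likewise, the example ranking expression `static_bm25() + 0.2*text_embedding_relevance() + general_score*2 + timeliness(end_time)` combines per-document signals additively. The helpers below are hypothetical stand-ins showing one plausible reading, with timeliness modeled as exponential decay (the actual decay function is not specified above):

```python
import math
import time

def timeliness(end_time: float, half_life_days: float = 30.0) -> float:
    """Exponential time decay: a recent end_time scores near 1.0."""
    age_days = max(0.0, (time.time() - end_time) / 86400)
    return math.exp(-age_days * math.log(2) / half_life_days)

def final_score(bm25: float, emb_sim: float, general_score: float, end_time: float) -> float:
    # Mirrors: static_bm25() + 0.2*text_embedding_relevance() + general_score*2 + timeliness(end_time)
    return bm25 + 0.2 * emb_sim + general_score * 2 + timeliness(end_time)

print(final_score(bm25=12.4, emb_sim=0.83, general_score=1.5,
                  end_time=time.time() - 7 * 86400))
```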
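
And for the API surface: a minimal client call against the documented `POST /search/` endpoint might look as follows. Only the path, port 6002, and the `X-Tenant-ID` header come from the notes above; the request body fields are illustrative assumptions:

```python
import requests  # third-party HTTP client (pip install requests)

# Text search against the FastAPI service; tenant selected via header.
# The JSON body schema is a guess for illustration, not the repo's actual model.
resp = requests.post(
    "http://localhost:6002/search/",
    headers={"X-Tenant-ID": "1"},
    json={"query": "fire extinguisher", "size": 10},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```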
README.md
... ... @@ -2,6 +2,10 @@
2 2  
3 3 A multi-tenant, configurable search platform for cross-border independent storefronts (Shoplazza 店匠 and similar). This README is the project's navigation entry point, helping you locate the more detailed documentation for each stage.
4 4  
  5 +## Project Environment
  6 +source /home/tw/miniconda3/etc/profile.d/conda.sh
  7 +conda activate searchengine
  8 +
5 9 ## Core Capabilities at a Glance
6 10  
7 11 - **Multi-language + auto-translation**: language detection and routing for Chinese, English, Russian, and more (BGE-M3, DeepL)
... ...
scripts/csv_to_excel_multi_variant.py 0 → 100755
... ... @@ -0,0 +1,616 @@
  1 +#!/usr/bin/env python3
  2 +"""
  3 +Convert CSV data to Excel import template with multi-variant support.
  4 +
  5 +Reads CSV file (goods_with_pic.5years_congku.csv.shuf.1w) and generates Excel file
  6 +based on the template format (商品导入模板.xlsx).
  7 +
  8 +Features:
  9 +- 30% products as Single variant (S type)
  10 +- 70% products as Multi variant (M+P type) with color, size, material options
  11 +"""
  12 +
  13 +import sys
  14 +import os
  15 +import csv
  16 +import random
  17 +import argparse
  18 +import re
  19 +from pathlib import Path
  20 +from datetime import datetime, timedelta
  21 +import itertools
  22 +from openpyxl import load_workbook
  23 +from openpyxl.styles import Alignment
  24 +
  25 +# Add parent directory to path
  26 +sys.path.insert(0, str(Path(__file__).parent.parent))
  27 +
  28 +# Color definitions
  29 +COLORS = [
  30 + "Red", "Blue", "Green", "Yellow", "Black", "White", "Orange", "Purple",
  31 + "Pink", "Brown", "Gray", "Navy", "Beige", "Cream", "Maroon", "Olive",
  32 + "Teal", "Cyan", "Magenta", "Lime", "Indigo", "Gold", "Silver", "Bronze",
  33 + "Coral", "Turquoise", "Violet", "Khaki", "Charcoal", "Ivory"
  34 +]
  35 +
  36 +
  37 +def clean_value(value):
  38 + """
  39 + Clean and normalize value.
  40 +
  41 + Args:
  42 + value: Value to clean
  43 +
  44 + Returns:
  45 + Cleaned string value
  46 + """
  47 + if value is None:
  48 + return ''
  49 + value = str(value).strip()
  50 + # Remove surrounding quotes
  51 + if value.startswith('"') and value.endswith('"'):
  52 + value = value[1:-1]
  53 + return value
  54 +
  55 +
  56 +def parse_csv_row(row: dict) -> dict:
  57 + """
  58 + Parse CSV row and extract fields.
  59 +
  60 + Args:
  61 + row: CSV row dictionary
  62 +
  63 + Returns:
  64 + Parsed data dictionary
  65 + """
  66 + return {
  67 + 'skuId': clean_value(row.get('skuId', '')),
  68 + 'name': clean_value(row.get('name', '')),
  69 + 'name_pinyin': clean_value(row.get('name_pinyin', '')),
  70 + 'create_time': clean_value(row.get('create_time', '')),
  71 + 'ruSkuName': clean_value(row.get('ruSkuName', '')),
  72 + 'enSpuName': clean_value(row.get('enSpuName', '')),
  73 + 'categoryName': clean_value(row.get('categoryName', '')),
  74 + 'supplierName': clean_value(row.get('supplierName', '')),
  75 + 'brandName': clean_value(row.get('brandName', '')),
  76 + 'file_id': clean_value(row.get('file_id', '')),
  77 + 'days_since_last_update': clean_value(row.get('days_since_last_update', '')),
  78 + 'id': clean_value(row.get('id', '')),
  79 + 'imageUrl': clean_value(row.get('imageUrl', ''))
  80 + }
  81 +
  82 +
  83 +def generate_handle(title: str) -> str:
  84 + """
  85 + Generate URL-friendly handle from title.
  86 +
  87 + Args:
  88 + title: Product title
  89 +
  90 + Returns:
  91 + URL-friendly handle (ASCII only)
  92 + """
  93 + # Convert to lowercase
  94 + handle = title.lower()
  95 +
  96 + # Remove non-ASCII characters, keep only letters, numbers, spaces, and hyphens
  97 + handle = re.sub(r'[^a-z0-9\s-]', '', handle)
  98 +
  99 + # Replace spaces and multiple hyphens with single hyphen
  100 + handle = re.sub(r'[-\s]+', '-', handle)
  101 + handle = handle.strip('-')
  102 +
  103 + # Limit length
  104 + if len(handle) > 255:
  105 + handle = handle[:255]
  106 +
  107 + return handle or 'product'
  108 +
  109 +
  110 +def extract_material_from_title(title: str) -> str:
  111 + """
  112 + Extract material from title by taking the last word after splitting by space.
  113 +
  114 +    The last whitespace-separated token of the title is used as the material.
  115 +    Example: "消防套 塑料【英文包装】" -> the last token is "塑料【英文包装】".
  116 +
  117 + Args:
  118 + title: Product title
  119 +
  120 + Returns:
  121 + Material string (single value)
  122 + """
  123 + if not title:
  124 + return 'default'
  125 +
  126 +    # Split by whitespace only; tokens are otherwise kept as-is
  127 + parts = title.strip().split()
  128 + if parts:
  129 +        # Take the last token
  130 + material = parts[-1]
  131 + # Remove brackets but keep content
  132 + material = re.sub(r'[【】\[\]()()]', '', material)
  133 + material = material.strip()
  134 + if material:
  135 + return material
  136 +
  137 + return 'default'
  138 +
  139 +
  140 +def generate_single_variant_row(csv_data: dict, base_sku_id: int = 1) -> dict:
  141 + """
  142 + Generate Excel row for Single variant (S type) product.
  143 +
  144 + Args:
  145 + csv_data: Parsed CSV row data
  146 + base_sku_id: Base SKU ID for generating SKU code
  147 +
  148 + Returns:
  149 + Dictionary mapping Excel column names to values
  150 + """
  151 + # Parse create_time
  152 + try:
  153 + created_at = datetime.strptime(csv_data['create_time'], '%Y-%m-%d %H:%M:%S')
  154 + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S')
  155 +    except ValueError:
  156 + created_at = datetime.now() - timedelta(days=random.randint(1, 365))
  157 + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S')
  158 +
  159 + # Generate title - use name or enSpuName
  160 + title = csv_data['name'] or csv_data['enSpuName'] or 'Product'
  161 +
  162 + # Generate handle - prefer enSpuName, then name_pinyin, then title
  163 + handle_source = csv_data['enSpuName'] or csv_data['name_pinyin'] or title
  164 + handle = generate_handle(handle_source)
  165 + if handle and not handle.startswith('products/'):
  166 + handle = f'products/{handle}'
  167 +
  168 + # Generate SEO fields
  169 + seo_title = f"{title} - {csv_data['categoryName']}" if csv_data['categoryName'] else title
  170 + seo_description = f"购买{csv_data['brandName']}{title}" if csv_data['brandName'] else title
  171 + seo_keywords_parts = [title]
  172 + if csv_data['categoryName']:
  173 + seo_keywords_parts.append(csv_data['categoryName'])
  174 + if csv_data['brandName']:
  175 + seo_keywords_parts.append(csv_data['brandName'])
  176 + seo_keywords = ','.join(seo_keywords_parts)
  177 +
  178 + # Generate tags from category and brand
  179 + tags_parts = []
  180 + if csv_data['categoryName']:
  181 + tags_parts.append(csv_data['categoryName'])
  182 + if csv_data['brandName']:
  183 + tags_parts.append(csv_data['brandName'])
  184 + tags = ','.join(tags_parts) if tags_parts else ''
  185 +
  186 + # Generate prices
  187 + price = round(random.uniform(50, 500), 2)
  188 + compare_at_price = round(price * random.uniform(1.2, 1.5), 2)
  189 + cost_price = round(price * 0.6, 2)
  190 +
  191 + # Generate random stock
  192 + inventory_quantity = random.randint(0, 100)
  193 +
  194 + # Generate random weight
  195 + weight = round(random.uniform(0.1, 5.0), 2)
  196 + weight_unit = 'kg'
  197 +
  198 + # Use skuId as SKU code
  199 + sku_code = csv_data['skuId'] or f'SKU-{base_sku_id}'
  200 +
  201 + # Generate barcode
  202 + try:
  203 + sku_id = int(csv_data['skuId']) if csv_data['skuId'] else base_sku_id
  204 + barcode = f"BAR{sku_id:08d}"
  205 +    except ValueError:
  206 + barcode = f"BAR{base_sku_id:08d}"
  207 +
  208 + # Build description
  209 + description = f"<p>{csv_data['name']}</p>" if csv_data['name'] else ''
  210 +
  211 + # Build brief (subtitle)
  212 + brief = csv_data['name'] or ''
  213 +
  214 + # Excel row data
  215 + excel_row = {
  216 + '商品ID': '', # Empty for new products
  217 + '创建时间': create_time_str,
  218 + '商品标题*': title,
  219 + '商品属性*': 'S', # Single variant product
  220 + '商品副标题': brief,
  221 + '商品描述': description,
  222 + 'SEO标题': seo_title,
  223 + 'SEO描述': seo_description,
  224 + 'SEO URL Handle': handle,
  225 + 'SEO URL 重定向': 'N',
  226 + 'SEO关键词': seo_keywords,
  227 + '商品上架': 'Y',
  228 + '需要物流': 'Y',
  229 + '商品收税': 'N',
  230 + '商品spu': '',
  231 + '启用虚拟销量': 'N',
  232 + '虚拟销量值': '',
  233 + '跟踪库存': 'Y',
  234 + '库存规则*': '1',
  235 + '专辑名称': csv_data['categoryName'] or '',
  236 + '标签': tags,
  237 + '供应商名称': csv_data['supplierName'] or '',
  238 + '供应商URL': '',
  239 + '款式1': '', # Empty for S type
  240 + '款式2': '', # Empty for S type
  241 + '款式3': '', # Empty for S type
  242 + '商品售价*': price,
  243 + '商品原价': compare_at_price,
  244 + '成本价': cost_price,
  245 + '商品SKU': sku_code,
  246 + '商品重量': weight,
  247 + '重量单位': weight_unit,
  248 + '商品条形码': barcode,
  249 + '商品库存': inventory_quantity,
  250 + '尺寸信息': '',
  251 + '原产地国别': '',
  252 + 'HS(协调制度)代码': '',
  253 + '商品图片*': csv_data['imageUrl'] or '',
  254 + '商品备注': '',
  255 + '款式备注': '',
  256 + '商品主图': csv_data['imageUrl'] or '',
  257 + }
  258 +
  259 + return excel_row
  260 +
  261 +
  262 +def generate_multi_variant_rows(csv_data: dict, base_sku_id: int = 1) -> list:
  263 + """
  264 + Generate Excel rows for Multi variant (M+P type) product.
  265 +
  266 + Returns a list of rows:
  267 +    - First row: M (main product) with option names
  268 +    - Following rows: P (variants) with option values
  269 +
  270 + Args:
  271 + csv_data: Parsed CSV row data
  272 + base_sku_id: Base SKU ID for generating SKU codes
  273 +
  274 + Returns:
  275 + List of dictionaries mapping Excel column names to values
  276 + """
  277 + rows = []
  278 +
  279 + # Parse create_time
  280 + try:
  281 + created_at = datetime.strptime(csv_data['create_time'], '%Y-%m-%d %H:%M:%S')
  282 + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S')
  283 +    except ValueError:
  284 + created_at = datetime.now() - timedelta(days=random.randint(1, 365))
  285 + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S')
  286 +
  287 + # Generate title
  288 + title = csv_data['name'] or csv_data['enSpuName'] or 'Product'
  289 +
  290 + # Generate handle
  291 + handle_source = csv_data['enSpuName'] or csv_data['name_pinyin'] or title
  292 + handle = generate_handle(handle_source)
  293 + if handle and not handle.startswith('products/'):
  294 + handle = f'products/{handle}'
  295 +
  296 + # Generate SEO fields
  297 + seo_title = f"{title} - {csv_data['categoryName']}" if csv_data['categoryName'] else title
  298 + seo_description = f"购买{csv_data['brandName']}{title}" if csv_data['brandName'] else title
  299 + seo_keywords_parts = [title]
  300 + if csv_data['categoryName']:
  301 + seo_keywords_parts.append(csv_data['categoryName'])
  302 + if csv_data['brandName']:
  303 + seo_keywords_parts.append(csv_data['brandName'])
  304 + seo_keywords = ','.join(seo_keywords_parts)
  305 +
  306 + # Generate tags
  307 + tags_parts = []
  308 + if csv_data['categoryName']:
  309 + tags_parts.append(csv_data['categoryName'])
  310 + if csv_data['brandName']:
  311 + tags_parts.append(csv_data['brandName'])
  312 + tags = ','.join(tags_parts) if tags_parts else ''
  313 +
  314 + # Extract material from title (last word after splitting by space)
  315 + material = extract_material_from_title(title)
  316 +
  317 + # Generate color options: randomly select 2-10 colors from COLORS list
  318 + num_colors = random.randint(2, 10)
  319 + selected_colors = random.sample(COLORS, min(num_colors, len(COLORS)))
  320 +
  321 + # Generate size options: 1-30, randomly select 4-8
  322 + num_sizes = random.randint(4, 8)
  323 + all_sizes = [str(i) for i in range(1, 31)]
  324 + selected_sizes = random.sample(all_sizes, num_sizes)
  325 +
  326 + # Material has only one value
  327 + materials = [material]
  328 +
  329 + # Generate all combinations (Cartesian product)
  330 + variants = list(itertools.product(selected_colors, selected_sizes, materials))
  331 +
  332 +    # Generate M row (main product)
  333 + description = f"<p>{csv_data['name']}</p>" if csv_data['name'] else ''
  334 + brief = csv_data['name'] or ''
  335 +
  336 + m_row = {
  337 + '商品ID': '',
  338 + '创建时间': create_time_str,
  339 + '商品标题*': title,
  340 + '商品属性*': 'M', # Main product
  341 + '商品副标题': brief,
  342 + '商品描述': description,
  343 + 'SEO标题': seo_title,
  344 + 'SEO描述': seo_description,
  345 + 'SEO URL Handle': handle,
  346 + 'SEO URL 重定向': 'N',
  347 + 'SEO关键词': seo_keywords,
  348 + '商品上架': 'Y',
  349 + '需要物流': 'Y',
  350 + '商品收税': 'N',
  351 + '商品spu': '',
  352 + '启用虚拟销量': 'N',
  353 + '虚拟销量值': '',
  354 + '跟踪库存': 'Y',
  355 + '库存规则*': '1',
  356 + '专辑名称': csv_data['categoryName'] or '',
  357 + '标签': tags,
  358 + '供应商名称': csv_data['supplierName'] or '',
  359 + '供应商URL': '',
  360 + '款式1': 'color', # Option name
  361 + '款式2': 'size', # Option name
  362 + '款式3': 'material', # Option name
  363 + '商品售价*': '', # Empty for M row
  364 + '商品原价': '',
  365 + '成本价': '',
  366 + '商品SKU': '', # Empty for M row
  367 + '商品重量': '',
  368 + '重量单位': '',
  369 + '商品条形码': '',
  370 + '商品库存': '', # Empty for M row
  371 + '尺寸信息': '',
  372 + '原产地国别': '',
  373 + 'HS(协调制度)代码': '',
  374 + '商品图片*': csv_data['imageUrl'] or '', # Main product image
  375 + '商品备注': '',
  376 + '款式备注': '',
  377 + '商品主图': csv_data['imageUrl'] or '',
  378 + }
  379 + rows.append(m_row)
  380 +
  381 +    # Generate P rows, one per variant combination
  382 + base_price = round(random.uniform(50, 500), 2)
  383 +
  384 + for variant_idx, (color, size, mat) in enumerate(variants):
  385 + # Generate price variation (within ±20% of base)
  386 + price = round(base_price * random.uniform(0.8, 1.2), 2)
  387 + compare_at_price = round(price * random.uniform(1.2, 1.5), 2)
  388 + cost_price = round(price * 0.6, 2)
  389 +
  390 + # Generate random stock
  391 + inventory_quantity = random.randint(0, 100)
  392 +
  393 + # Generate random weight
  394 + weight = round(random.uniform(0.1, 5.0), 2)
  395 + weight_unit = 'kg'
  396 +
  397 + # Generate SKU code
  398 + sku_code = f"{csv_data['skuId']}-{color}-{size}-{mat}" if csv_data['skuId'] else f'SKU-{base_sku_id}-{variant_idx+1}'
  399 +
  400 + # Generate barcode
  401 + barcode = f"BAR{base_sku_id:08d}{variant_idx+1:03d}"
  402 +
  403 + p_row = {
  404 + '商品ID': '',
  405 + '创建时间': create_time_str,
  406 + '商品标题*': title, # Same as M row
  407 + '商品属性*': 'P', # Variant
  408 + '商品副标题': '', # Empty for P row
  409 + '商品描述': '', # Empty for P row
  410 + 'SEO标题': '', # Empty for P row
  411 + 'SEO描述': '', # Empty for P row
  412 + 'SEO URL Handle': '', # Empty for P row
  413 + 'SEO URL 重定向': '',
  414 + 'SEO关键词': '',
  415 + '商品上架': 'Y',
  416 + '需要物流': 'Y',
  417 + '商品收税': 'N',
  418 + '商品spu': '',
  419 + '启用虚拟销量': 'N',
  420 + '虚拟销量值': '',
  421 + '跟踪库存': 'Y',
  422 + '库存规则*': '1',
  423 + '专辑名称': '', # Empty for P row
  424 + '标签': '', # Empty for P row
  425 + '供应商名称': '', # Empty for P row
  426 + '供应商URL': '',
  427 + '款式1': color, # Option value
  428 + '款式2': size, # Option value
  429 + '款式3': mat, # Option value
  430 + '商品售价*': price,
  431 + '商品原价': compare_at_price,
  432 + '成本价': cost_price,
  433 + '商品SKU': sku_code,
  434 + '商品重量': weight,
  435 + '重量单位': weight_unit,
  436 + '商品条形码': barcode,
  437 + '商品库存': inventory_quantity,
  438 + '尺寸信息': '',
  439 + '原产地国别': '',
  440 + 'HS(协调制度)代码': '',
  441 + '商品图片*': '', # Empty for P row (uses main product image)
  442 + '商品备注': '',
  443 + '款式备注': '',
  444 + '商品主图': '',
  445 + }
  446 + rows.append(p_row)
  447 +
  448 + return rows
  449 +
  450 +
  451 +def read_csv_file(csv_file: str) -> list:
  452 + """
  453 + Read CSV file and return list of parsed rows.
  454 +
  455 + Args:
  456 + csv_file: Path to CSV file
  457 +
  458 + Returns:
  459 + List of parsed CSV data dictionaries
  460 + """
  461 + csv_data_list = []
  462 +
  463 + with open(csv_file, 'r', encoding='utf-8') as f:
  464 + reader = csv.DictReader(f)
  465 + for row in reader:
  466 + parsed = parse_csv_row(row)
  467 + csv_data_list.append(parsed)
  468 +
  469 + return csv_data_list
  470 +
  471 +
  472 +def create_excel_from_template(template_file: str, output_file: str, excel_rows: list):
  473 + """
  474 + Create Excel file from template and fill with data rows.
  475 +
  476 + Args:
  477 + template_file: Path to Excel template file
  478 + output_file: Path to output Excel file
  479 + excel_rows: List of dictionaries mapping Excel column names to values
  480 + """
  481 + # Load template
  482 + wb = load_workbook(template_file)
  483 + ws = wb.active # Use the active sheet (Sheet4)
  484 +
  485 + # Find header row (row 2)
  486 + header_row_idx = 2
  487 +
  488 + # Get column mapping from header row
  489 + column_mapping = {}
  490 + for col_idx in range(1, ws.max_column + 1):
  491 + cell_value = ws.cell(row=header_row_idx, column=col_idx).value
  492 + if cell_value:
  493 + column_mapping[cell_value] = col_idx
  494 +
  495 + # Start writing data from row 4
  496 + data_start_row = 4
  497 +
  498 + # Clear existing data rows
  499 + last_template_row = ws.max_row
  500 + if last_template_row >= data_start_row:
  501 + for row in range(data_start_row, last_template_row + 1):
  502 + for col in range(1, ws.max_column + 1):
  503 + ws.cell(row=row, column=col).value = None
  504 +
  505 + # Write data rows
  506 + for row_idx, excel_row in enumerate(excel_rows):
  507 + excel_row_num = data_start_row + row_idx
  508 +
  509 + # Write each field to corresponding column
  510 + for field_name, col_idx in column_mapping.items():
  511 + if field_name in excel_row:
  512 + cell = ws.cell(row=excel_row_num, column=col_idx)
  513 + value = excel_row[field_name]
  514 + cell.value = value
  515 +
  516 + # Set alignment
  517 + if isinstance(value, str):
  518 + cell.alignment = Alignment(vertical='top', wrap_text=True)
  519 + elif isinstance(value, (int, float)):
  520 + cell.alignment = Alignment(vertical='top')
  521 +
  522 + # Save workbook
  523 + wb.save(output_file)
  524 + print(f"Excel file created: {output_file}")
  525 + print(f" - Total rows: {len(excel_rows)}")
  526 +
  527 +
  528 +def main():
  529 + parser = argparse.ArgumentParser(description='Convert CSV data to Excel import template with multi-variant support')
  530 + parser.add_argument('--csv-file',
  531 + default='data/customer1/goods_with_pic.5years_congku.csv.shuf.1w',
  532 + help='CSV file path')
  533 + parser.add_argument('--template',
  534 + default='docs/商品导入模板.xlsx',
  535 + help='Excel template file path')
  536 + parser.add_argument('--output',
  537 + default='商品导入数据.xlsx',
  538 + help='Output Excel file path')
  539 + parser.add_argument('--limit',
  540 + type=int,
  541 + default=None,
  542 + help='Limit number of products to process')
  543 + parser.add_argument('--single-ratio',
  544 + type=float,
  545 + default=0.3,
  546 + help='Ratio of single variant products (default: 0.3 = 30%%)')
  547 + parser.add_argument('--seed',
  548 + type=int,
  549 + default=None,
  550 + help='Random seed for reproducible results')
  551 +
  552 + args = parser.parse_args()
  553 +
  554 + # Set random seed if provided
  555 + if args.seed is not None:
  556 + random.seed(args.seed)
  557 +
  558 + # Check if files exist
  559 + if not os.path.exists(args.csv_file):
  560 + print(f"Error: CSV file not found: {args.csv_file}")
  561 + sys.exit(1)
  562 +
  563 + if not os.path.exists(args.template):
  564 + print(f"Error: Template file not found: {args.template}")
  565 + sys.exit(1)
  566 +
  567 + # Read CSV file
  568 + print(f"Reading CSV file: {args.csv_file}")
  569 + csv_data_list = read_csv_file(args.csv_file)
  570 + print(f"Read {len(csv_data_list)} rows from CSV")
  571 +
  572 + # Limit products if specified
  573 + if args.limit:
  574 + csv_data_list = csv_data_list[:args.limit]
  575 + print(f"Limited to {len(csv_data_list)} products")
  576 +
  577 + # Generate Excel rows
  578 + print(f"\nGenerating Excel rows...")
  579 + print(f" - Single variant ratio: {args.single_ratio*100:.0f}%")
  580 + print(f" - Multi variant ratio: {(1-args.single_ratio)*100:.0f}%")
  581 +
  582 + excel_rows = []
  583 + single_count = 0
  584 + multi_count = 0
  585 +
  586 + for idx, csv_data in enumerate(csv_data_list):
  587 + # Decide if this product should be single or multi variant
  588 + is_single = random.random() < args.single_ratio
  589 +
  590 + if is_single:
  591 + # Generate single variant (S type)
  592 + row = generate_single_variant_row(csv_data, base_sku_id=idx+1)
  593 + excel_rows.append(row)
  594 + single_count += 1
  595 + else:
  596 + # Generate multi variant (M+P type)
  597 + rows = generate_multi_variant_rows(csv_data, base_sku_id=idx+1)
  598 + excel_rows.extend(rows)
  599 + multi_count += 1
  600 +
  601 + print(f"\nGenerated:")
  602 + print(f" - Single variant products: {single_count}")
  603 + print(f" - Multi variant products: {multi_count}")
  604 + print(f" - Total Excel rows: {len(excel_rows)}")
  605 +
  606 + # Create Excel file
  607 + print(f"\nCreating Excel file from template: {args.template}")
  608 + print(f"Output file: {args.output}")
  609 + create_excel_from_template(args.template, args.output, excel_rows)
  610 +
  611 + print(f"\nDone! Generated {len(excel_rows)} rows in Excel file.")
  612 +
  613 +
  614 +if __name__ == '__main__':
  615 + main()
  616 +
... ...
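
Usage note: with the defaults defined in main(), a reproducible smoke test would be `python scripts/csv_to_excel_multi_variant.py --limit 100 --seed 42`. That reads the first 100 rows of the bundled CSV, splits them roughly 30% S / 70% M+P, and writes 商品导入数据.xlsx based on the 商品导入模板.xlsx template.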