Commit 8810a6fa9c3779e2fa48766c049a8296618d496f

Authored by tangwang
1 parent e7f2b240

重构

1 # ==================== 1 # ====================
2 # OpenAI Configuration 2 # OpenAI Configuration
3 # ==================== 3 # ====================
4 -OPENAI_API_KEY=  
5 -OPENAI_MODEL=gpt-4o-mini  
6 -OPENAI_EMBEDDING_MODEL=text-embedding-3-small 4 +OPENAI_API_KEY=<REDACTED — a real DashScope key was committed here; revoke/rotate it and keep .env.example key-less>
  5 +OPENAI_MODEL=qwen-plus
  6 +# Base URL for Qwen/DashScope (OpenAI-compatible API)
  7 +# 北京: https://dashscope.aliyuncs.com/compatible-mode/v1
  8 +# 弗吉尼亚: https://dashscope-us.aliyuncs.com/compatible-mode/v1
  9 +# 新加坡: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  10 +OPENAI_API_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
7 OPENAI_TEMPERATURE=1 11 OPENAI_TEMPERATURE=1
8 OPENAI_MAX_TOKENS=1000 12 OPENAI_MAX_TOKENS=1000
9 13
10 # ==================== 14 # ====================
11 -# CLIP Server Configuration  
12 -# ====================  
13 -CLIP_SERVER_URL=grpc://localhost:51000  
14 -  
15 -# ====================  
16 -# Milvus Configuration  
17 -# ====================  
18 -MILVUS_HOST=localhost  
19 -MILVUS_PORT=19530  
20 -  
21 -# Collection settings  
22 -TEXT_COLLECTION_NAME=text_embeddings  
23 -IMAGE_COLLECTION_NAME=image_embeddings  
24 -TEXT_DIM=1536  
25 -IMAGE_DIM=512  
26 -  
27 -# ====================  
28 # Search Configuration 15 # Search Configuration
29 # ==================== 16 # ====================
30 TOP_K_RESULTS=30 17 TOP_K_RESULTS=30
31 SIMILARITY_THRESHOLD=0.6 18 SIMILARITY_THRESHOLD=0.6
32 19
  20 +# Search API (see docs/搜索API对接指南.md)
  21 +SEARCH_API_BASE_URL=http://120.76.41.98:6002
  22 +SEARCH_API_TENANT_ID=162
  23 +
33 # ==================== 24 # ====================
34 # Application Configuration 25 # Application Configuration
35 # ==================== 26 # ====================
@@ -53,7 +53,6 @@ data/** @@ -53,7 +53,6 @@ data/**
53 *.db 53 *.db
54 *.sqlite 54 *.sqlite
55 *.sqlite3 55 *.sqlite3
56 -data/milvus_lite.db  
57 56
58 # Docker volumes 57 # Docker volumes
59 volumes/ 58 volumes/
@@ -12,9 +12,9 @@ OmniShopAgent autonomously decides which tools to call, maintains conversation s @@ -12,9 +12,9 @@ OmniShopAgent autonomously decides which tools to call, maintains conversation s
12 12
13 **Key Features:** 13 **Key Features:**
14 - Autonomous tool selection and execution 14 - Autonomous tool selection and execution
15 -- Multi-modal search (text + image) 15 +- Text search via Search API
16 - Conversational context awareness 16 - Conversational context awareness
17 -- Real-time visual analysis 17 +- Real-time visual analysis (style extraction from images)
18 18
19 ## Tech Stack 19 ## Tech Stack
20 20
@@ -22,9 +22,7 @@ OmniShopAgent autonomously decides which tools to call, maintains conversation s @@ -22,9 +22,7 @@ OmniShopAgent autonomously decides which tools to call, maintains conversation s
22 |-----------|-----------| 22 |-----------|-----------|
23 | **Agent Framework** | LangGraph | 23 | **Agent Framework** | LangGraph |
24 | **LLM** | any LLM supported by LangChain | 24 | **LLM** | any LLM supported by LangChain |
25 -| **Text Embedding** | text-embedding-3-small |  
26 -| **Image Embedding** | CLIP ViT-B/32 |  
27 -| **Vector Database** | Milvus | 25 +| **Search** | Search API (HTTP) |
28 | **Frontend** | Streamlit | 26 | **Frontend** | Streamlit |
29 | **Dataset** | Kaggle Fashion Products | 27 | **Dataset** | Kaggle Fashion Products |
30 28
@@ -52,8 +50,7 @@ graph LR @@ -52,8 +50,7 @@ graph LR
52 ``` 50 ```
53 51
54 **Available Tools:** 52 **Available Tools:**
55 -- `search_products(query)` - Text-based semantic search  
56 -- `search_by_image(image_path)` - Visual similarity search 53 +- `search_products(query)` - Text-based product search via Search API
57 - `analyze_image_style(image_path)` - VLM style analysis 54 - `analyze_image_style(image_path)` - VLM style analysis
58 55
59 56
@@ -66,12 +63,6 @@ User: "winter coats for women" @@ -66,12 +63,6 @@ User: "winter coats for women"
66 Agent: search_products("winter coats women") → Returns 5 products 63 Agent: search_products("winter coats women") → Returns 5 products
67 ``` 64 ```
68 65
69 -**Image Upload:**  
70 -```  
71 -User: [uploads sneaker photo] "find similar"  
72 -Agent: search_by_image(path) → Returns visually similar shoes  
73 -```  
74 -  
75 **Style Analysis + Search:** 66 **Style Analysis + Search:**
76 ``` 67 ```
77 User: [uploads vintage jacket] "what style is this? find matching pants" 68 User: [uploads vintage jacket] "what style is this? find matching pants"
@@ -93,6 +84,8 @@ Agent: [remembers context] → search_products("red formal dresses") → Results @@ -93,6 +84,8 @@ Agent: [remembers context] → search_products("red formal dresses") → Results
93 User: [uploads office outfit] "I like the shirt but need something more casual" 84 User: [uploads office outfit] "I like the shirt but need something more casual"
94 Agent: analyze_image_style(path) → Extracts shirt details 85 Agent: analyze_image_style(path) → Extracts shirt details
95 search_products("casual shirt [color] [style]") → Returns casual alternatives 86 search_products("casual shirt [color] [style]") → Returns casual alternatives
  87 +
  88 +**Note:** For image uploads "find similar", use analyze_image_style first to extract attributes, then search_products with the description.
96 ``` 89 ```
97 90
98 ## Installation 91 ## Installation
@@ -100,7 +93,6 @@ Agent: analyze_image_style(path) → Extracts shirt details @@ -100,7 +93,6 @@ Agent: analyze_image_style(path) → Extracts shirt details
100 **Prerequisites:** 93 **Prerequisites:**
101 - Python 3.12+ (LangChain 1.x 要求 Python 3.10+) 94 - Python 3.12+ (LangChain 1.x 要求 Python 3.10+)
102 - OpenAI API Key 95 - OpenAI API Key
103 -- Docker & Docker Compose  
104 96
105 ### 1. Setup Environment 97 ### 1. Setup Environment
106 ```bash 98 ```bash
@@ -116,38 +108,14 @@ cp .env.example .env @@ -116,38 +108,14 @@ cp .env.example .env
116 # Edit .env and add your OPENAI_API_KEY 108 # Edit .env and add your OPENAI_API_KEY
117 ``` 109 ```
118 110
119 -### 2. Download Dataset  
120 -Download the [Fashion Product Images Dataset](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) from Kaggle and extract to `./data/`:  
121 -  
122 -```python  
123 -python scripts/download_dataset.py  
124 -```  
125 -  
126 -Expected structure:  
127 -```  
128 -data/  
129 -├── images/ # ~44k product images  
130 -├── styles.csv # Product metadata  
131 -└── images.csv # Image filenames  
132 -```  
133 -  
134 -### 3. Start Services  
135 -  
136 -```bash  
137 -docker-compose up  
138 -python -m clip_server  
139 -```  
140 -  
141 -  
142 -### 4. Index Data 111 +### 2. (Optional) Download Dataset
  112 +For image style analysis, you may download the [Fashion Product Images Dataset](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) from Kaggle:
143 113
144 ```bash 114 ```bash
145 -python scripts/index_data.py 115 +python scripts/download_dataset.py
146 ``` 116 ```
147 117
148 -This generates and stores text/image embeddings for all 44k products in Milvus.  
149 -  
150 -### 5. Launch Application 118 +### 3. Launch Application
151 ```bash 119 ```bash
152 # 使用启动脚本(推荐) 120 # 使用启动脚本(推荐)
153 ./scripts/start.sh 121 ./scripts/start.sh
@@ -155,6 +123,9 @@ This generates and stores text/image embeddings for all 44k products in Milvus. @@ -155,6 +123,9 @@ This generates and stores text/image embeddings for all 44k products in Milvus.
155 # 或直接运行 123 # 或直接运行
156 streamlit run app.py 124 streamlit run app.py
157 ``` 125 ```
  126 +
  127 +Product search uses the external Search API. Configure `SEARCH_API_BASE_URL` and `SEARCH_API_TENANT_ID` in `.env` if needed.
  128 +
158 Opens at `http://localhost:8501` 129 Opens at `http://localhost:8501`
159 130
160 ### CentOS 8 部署 131 ### CentOS 8 部署
app/agents/shopping_agent.py
@@ -52,11 +52,14 @@ class ShoppingAgent: @@ -52,11 +52,14 @@ class ShoppingAgent:
52 self.session_id = session_id or "default" 52 self.session_id = session_id or "default"
53 53
54 # Initialize LLM 54 # Initialize LLM
55 - self.llm = ChatOpenAI( 55 + llm_kwargs = dict(
56 model=settings.openai_model, 56 model=settings.openai_model,
57 temperature=settings.openai_temperature, 57 temperature=settings.openai_temperature,
58 api_key=settings.openai_api_key, 58 api_key=settings.openai_api_key,
59 ) 59 )
  60 + if settings.openai_api_base_url:
  61 + llm_kwargs["base_url"] = settings.openai_api_base_url
  62 + self.llm = ChatOpenAI(**llm_kwargs)
60 63
61 # Get tools and bind to model 64 # Get tools and bind to model
62 self.tools = get_all_tools() 65 self.tools = get_all_tools()
@@ -73,12 +76,11 @@ class ShoppingAgent: @@ -73,12 +76,11 @@ class ShoppingAgent:
73 # System prompt for the agent 76 # System prompt for the agent
74 system_prompt = """You are an intelligent fashion shopping assistant. You can: 77 system_prompt = """You are an intelligent fashion shopping assistant. You can:
75 1. Search for products by text description (use search_products) 78 1. Search for products by text description (use search_products)
76 -2. Find visually similar products from images (use search_by_image)  
77 -3. Analyze image style and attributes (use analyze_image_style) 79 +2. Analyze image style and attributes (use analyze_image_style)
78 80
79 When a user asks about products: 81 When a user asks about products:
80 - For text queries: use search_products directly 82 - For text queries: use search_products directly
81 -- For image uploads: decide if you need to analyze_image_style first, then search 83 +- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
82 - You can call multiple tools in sequence if needed 84 - You can call multiple tools in sequence if needed
83 - Always provide helpful, friendly responses 85 - Always provide helpful, friendly responses
84 86
@@ -4,6 +4,7 @@ Loads environment variables and provides configuration objects @@ -4,6 +4,7 @@ Loads environment variables and provides configuration objects
4 """ 4 """
5 5
6 import os 6 import os
  7 +from typing import Optional
7 8
8 from pydantic_settings import BaseSettings 9 from pydantic_settings import BaseSettings
9 10
@@ -17,47 +18,20 @@ class Settings(BaseSettings): @@ -17,47 +18,20 @@ class Settings(BaseSettings):
17 # OpenAI Configuration 18 # OpenAI Configuration
18 openai_api_key: str 19 openai_api_key: str
19 openai_model: str = "gpt-4o-mini" 20 openai_model: str = "gpt-4o-mini"
20 - openai_embedding_model: str = "text-embedding-3-small"  
21 openai_temperature: float = 0.7 21 openai_temperature: float = 0.7
22 openai_max_tokens: int = 1000 22 openai_max_tokens: int = 1000
23 -  
24 - # CLIP Server Configuration  
25 - clip_server_url: str = "grpc://localhost:51000"  
26 -  
27 - # Milvus Configuration  
28 - milvus_uri: str = "http://localhost:19530"  
29 - milvus_host: str = "localhost"  
30 - milvus_port: int = 19530  
31 - text_collection_name: str = "text_embeddings"  
32 - image_collection_name: str = "image_embeddings"  
33 - text_dim: int = 1536  
34 - image_dim: int = 512  
35 -  
36 - @property  
37 - def milvus_uri_absolute(self) -> str:  
38 - """Get absolute path for Milvus URI  
39 -  
40 - Returns:  
41 - - For http/https URIs: returns as-is (Milvus Standalone)  
42 - - For file paths starting with ./: converts to absolute path (Milvus Lite)  
43 - - For other paths: returns as-is  
44 - """  
45 - import os  
46 -  
47 - # If it's a network URI, return as-is (Milvus Standalone)  
48 - if self.milvus_uri.startswith(("http://", "https://")):  
49 - return self.milvus_uri  
50 - # If it's a relative path, convert to absolute (Milvus Lite)  
51 - if self.milvus_uri.startswith("./"):  
52 - base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  
53 - return os.path.join(base_dir, self.milvus_uri[2:])  
54 - # Otherwise return as-is  
55 - return self.milvus_uri 23 + # Base URL for OpenAI-compatible APIs (e.g. Qwen/DashScope)
  24 + # Qwen 北京: https://dashscope.aliyuncs.com/compatible-mode/v1
  25 + openai_api_base_url: Optional[str] = None
56 26
57 # Search Configuration 27 # Search Configuration
58 top_k_results: int = 10 28 top_k_results: int = 10
59 similarity_threshold: float = 0.6 29 similarity_threshold: float = 0.6
60 30
  31 + # Search API (see docs/搜索API对接指南.md)
  32 + search_api_base_url: str = "http://120.76.41.98:6002"
  33 + search_api_tenant_id: str = "162"
  34 +
61 # Application Configuration 35 # Application Configuration
62 app_host: str = "0.0.0.0" 36 app_host: str = "0.0.0.0"
63 app_port: int = 8000 37 app_port: int = 8000
@@ -73,6 +47,7 @@ class Settings(BaseSettings): @@ -73,6 +47,7 @@ class Settings(BaseSettings):
73 env_file = ".env" 47 env_file = ".env"
74 env_file_encoding = "utf-8" 48 env_file_encoding = "utf-8"
75 case_sensitive = False 49 case_sensitive = False
  50 + extra = "ignore"
76 51
77 52
78 # Global settings instance 53 # Global settings instance
app/services/__init__.py
1 """ 1 """
2 Services Module 2 Services Module
3 -Provides database and embedding services for the application  
4 """ 3 """
5 -  
6 -from app.services.embedding_service import EmbeddingService, get_embedding_service  
7 -from app.services.milvus_service import MilvusService, get_milvus_service  
8 -  
9 -__all__ = [  
10 - "EmbeddingService",  
11 - "get_embedding_service",  
12 - "MilvusService",  
13 - "get_milvus_service",  
14 -]  
app/services/embedding_service.py deleted
@@ -1,293 +0,0 @@ @@ -1,293 +0,0 @@
1 -"""  
2 -Embedding Service for Text and Image Embeddings  
3 -Supports OpenAI text embeddings and CLIP image embeddings  
4 -"""  
5 -  
6 -import logging  
7 -from pathlib import Path  
8 -from typing import List, Optional, Union  
9 -  
10 -import numpy as np  
11 -from clip_client import Client as ClipClient  
12 -from openai import OpenAI  
13 -  
14 -from app.config import settings  
15 -  
16 -logger = logging.getLogger(__name__)  
17 -  
18 -  
19 -class EmbeddingService:  
20 - """Service for generating text and image embeddings"""  
21 -  
22 - def __init__(  
23 - self,  
24 - openai_api_key: Optional[str] = None,  
25 - clip_server_url: Optional[str] = None,  
26 - ):  
27 - """Initialize embedding service  
28 -  
29 - Args:  
30 - openai_api_key: OpenAI API key. If None, uses settings.openai_api_key  
31 - clip_server_url: CLIP server URL. If None, uses settings.clip_server_url  
32 - """  
33 - # Initialize OpenAI client for text embeddings  
34 - self.openai_api_key = openai_api_key or settings.openai_api_key  
35 - self.openai_client = OpenAI(api_key=self.openai_api_key)  
36 - self.text_embedding_model = settings.openai_embedding_model  
37 -  
38 - # Initialize CLIP client for image embeddings  
39 - self.clip_server_url = clip_server_url or settings.clip_server_url  
40 - self.clip_client: Optional[ClipClient] = None  
41 -  
42 - logger.info("Embedding service initialized")  
43 -  
44 - def connect_clip(self) -> None:  
45 - """Connect to CLIP server"""  
46 - try:  
47 - self.clip_client = ClipClient(server=self.clip_server_url)  
48 - logger.info(f"Connected to CLIP server at {self.clip_server_url}")  
49 - except Exception as e:  
50 - logger.error(f"Failed to connect to CLIP server: {e}")  
51 - raise  
52 -  
53 - def disconnect_clip(self) -> None:  
54 - """Disconnect from CLIP server"""  
55 - if self.clip_client:  
56 - # Note: clip_client doesn't have explicit close method  
57 - self.clip_client = None  
58 - logger.info("Disconnected from CLIP server")  
59 -  
60 - def get_text_embedding(self, text: str) -> List[float]:  
61 - """Get embedding for a single text  
62 -  
63 - Args:  
64 - text: Input text  
65 -  
66 - Returns:  
67 - Embedding vector as list of floats  
68 - """  
69 - try:  
70 - response = self.openai_client.embeddings.create(  
71 - input=text, model=self.text_embedding_model  
72 - )  
73 - embedding = response.data[0].embedding  
74 - logger.debug(f"Generated text embedding for: {text[:50]}...")  
75 - return embedding  
76 - except Exception as e:  
77 - logger.error(f"Failed to generate text embedding: {e}")  
78 - raise  
79 -  
80 - def get_text_embeddings_batch(  
81 - self, texts: List[str], batch_size: int = 100  
82 - ) -> List[List[float]]:  
83 - """Get embeddings for multiple texts in batches  
84 -  
85 - Args:  
86 - texts: List of input texts  
87 - batch_size: Number of texts to process at once  
88 -  
89 - Returns:  
90 - List of embedding vectors  
91 - """  
92 - all_embeddings = []  
93 -  
94 - for i in range(0, len(texts), batch_size):  
95 - batch = texts[i : i + batch_size]  
96 -  
97 - try:  
98 - response = self.openai_client.embeddings.create(  
99 - input=batch, model=self.text_embedding_model  
100 - )  
101 -  
102 - # Extract embeddings in the correct order  
103 - embeddings = [item.embedding for item in response.data]  
104 - all_embeddings.extend(embeddings)  
105 -  
106 - logger.info(  
107 - f"Generated text embeddings for batch {i // batch_size + 1}: {len(embeddings)} embeddings"  
108 - )  
109 -  
110 - except Exception as e:  
111 - logger.error(  
112 - f"Failed to generate text embeddings for batch {i // batch_size + 1}: {e}"  
113 - )  
114 - raise  
115 -  
116 - return all_embeddings  
117 -  
118 - def get_image_embedding(self, image_path: Union[str, Path]) -> List[float]:  
119 - """Get CLIP embedding for a single image  
120 -  
121 - Args:  
122 - image_path: Path to image file  
123 -  
124 - Returns:  
125 - Embedding vector as list of floats  
126 - """  
127 - if not self.clip_client:  
128 - raise RuntimeError("CLIP client not connected. Call connect_clip() first.")  
129 -  
130 - image_path = Path(image_path)  
131 - if not image_path.exists():  
132 - raise FileNotFoundError(f"Image not found: {image_path}")  
133 -  
134 - try:  
135 - # Get embedding from CLIP server using image path (as string)  
136 - result = self.clip_client.encode([str(image_path)])  
137 -  
138 - # Extract embedding - result is numpy array  
139 - import numpy as np  
140 -  
141 - if isinstance(result, np.ndarray):  
142 - # If result is numpy array, use first element  
143 - embedding = (  
144 - result[0].tolist() if len(result.shape) > 1 else result.tolist()  
145 - )  
146 - else:  
147 - # If result is DocumentArray  
148 - embedding = result[0].embedding.tolist()  
149 -  
150 - logger.debug(f"Generated image embedding for: {image_path.name}")  
151 - return embedding  
152 -  
153 - except Exception as e:  
154 - logger.error(f"Failed to generate image embedding for {image_path}: {e}")  
155 - raise  
156 -  
157 - def get_image_embeddings_batch(  
158 - self, image_paths: List[Union[str, Path]], batch_size: int = 32  
159 - ) -> List[Optional[List[float]]]:  
160 - """Get CLIP embeddings for multiple images in batches  
161 -  
162 - Args:  
163 - image_paths: List of paths to image files  
164 - batch_size: Number of images to process at once  
165 -  
166 - Returns:  
167 - List of embedding vectors (None for failed images)  
168 - """  
169 - if not self.clip_client:  
170 - raise RuntimeError("CLIP client not connected. Call connect_clip() first.")  
171 -  
172 - all_embeddings = []  
173 -  
174 - for i in range(0, len(image_paths), batch_size):  
175 - batch_paths = image_paths[i : i + batch_size]  
176 - valid_paths = []  
177 - valid_indices = []  
178 -  
179 - # Check which images exist  
180 - for idx, path in enumerate(batch_paths):  
181 - path = Path(path)  
182 - if path.exists():  
183 - valid_paths.append(str(path))  
184 - valid_indices.append(idx)  
185 - else:  
186 - logger.warning(f"Image not found: {path}")  
187 -  
188 - # Get embeddings for valid images  
189 - if valid_paths:  
190 - try:  
191 - # Send paths as strings to CLIP server  
192 - result = self.clip_client.encode(valid_paths)  
193 -  
194 - # Create embeddings list with None for missing images  
195 - batch_embeddings = [None] * len(batch_paths)  
196 -  
197 - # Handle result format - could be numpy array or DocumentArray  
198 - import numpy as np  
199 -  
200 - if isinstance(result, np.ndarray):  
201 - # Result is numpy array - shape (n_images, embedding_dim)  
202 - for idx in range(len(result)):  
203 - original_idx = valid_indices[idx]  
204 - batch_embeddings[original_idx] = result[idx].tolist()  
205 - else:  
206 - # Result is DocumentArray  
207 - for idx, doc in enumerate(result):  
208 - original_idx = valid_indices[idx]  
209 - batch_embeddings[original_idx] = doc.embedding.tolist()  
210 -  
211 - all_embeddings.extend(batch_embeddings)  
212 -  
213 - logger.info(  
214 - f"Generated image embeddings for batch {i // batch_size + 1}: "  
215 - f"{len(valid_paths)}/{len(batch_paths)} successful"  
216 - )  
217 -  
218 - except Exception as e:  
219 - logger.error(  
220 - f"Failed to generate image embeddings for batch {i // batch_size + 1}: {e}"  
221 - )  
222 - # Add None for all images in failed batch  
223 - all_embeddings.extend([None] * len(batch_paths))  
224 - else:  
225 - # All images in batch failed to load  
226 - all_embeddings.extend([None] * len(batch_paths))  
227 -  
228 - return all_embeddings  
229 -  
230 - def get_text_embedding_from_image(  
231 - self, image_path: Union[str, Path]  
232 - ) -> List[float]:  
233 - """Get text-based embedding by describing the image  
234 - This is useful for cross-modal search  
235 -  
236 - Note: This is a placeholder for future implementation  
237 - that could use vision models to generate text descriptions  
238 -  
239 - Args:  
240 - image_path: Path to image file  
241 -  
242 - Returns:  
243 - Text embedding vector  
244 - """  
245 - # For now, we just return the image embedding  
246 - # In the future, this could use a vision-language model to generate  
247 - # a text description and then embed that  
248 - raise NotImplementedError("Text embedding from image not yet implemented")  
249 -  
250 - def cosine_similarity(  
251 - self, embedding1: List[float], embedding2: List[float]  
252 - ) -> float:  
253 - """Calculate cosine similarity between two embeddings  
254 -  
255 - Args:  
256 - embedding1: First embedding vector  
257 - embedding2: Second embedding vector  
258 -  
259 - Returns:  
260 - Cosine similarity score (0-1)  
261 - """  
262 - vec1 = np.array(embedding1)  
263 - vec2 = np.array(embedding2)  
264 -  
265 - # Normalize vectors  
266 - vec1_norm = vec1 / np.linalg.norm(vec1)  
267 - vec2_norm = vec2 / np.linalg.norm(vec2)  
268 -  
269 - # Calculate cosine similarity  
270 - similarity = np.dot(vec1_norm, vec2_norm)  
271 -  
272 - return float(similarity)  
273 -  
274 - def get_embedding_dimensions(self) -> dict:  
275 - """Get the dimensions of text and image embeddings  
276 -  
277 - Returns:  
278 - Dictionary with text_dim and image_dim  
279 - """  
280 - return {"text_dim": settings.text_dim, "image_dim": settings.image_dim}  
281 -  
282 -  
283 -# Global instance  
284 -_embedding_service: Optional[EmbeddingService] = None  
285 -  
286 -  
287 -def get_embedding_service() -> EmbeddingService:  
288 - """Get or create the global embedding service instance"""  
289 - global _embedding_service  
290 - if _embedding_service is None:  
291 - _embedding_service = EmbeddingService()  
292 - _embedding_service.connect_clip()  
293 - return _embedding_service  
app/services/milvus_service.py deleted
@@ -1,480 +0,0 @@ @@ -1,480 +0,0 @@
1 -"""  
2 -Milvus Service for Vector Storage and Similarity Search  
3 -Manages text and image embeddings in separate collections  
4 -"""  
5 -  
6 -import logging  
7 -from typing import Any, Dict, List, Optional  
8 -  
9 -from pymilvus import (  
10 - DataType,  
11 - MilvusClient,  
12 -)  
13 -  
14 -from app.config import settings  
15 -  
16 -logger = logging.getLogger(__name__)  
17 -  
18 -  
19 -class MilvusService:  
20 - """Service for managing vector embeddings in Milvus"""  
21 -  
22 - def __init__(self, uri: Optional[str] = None):  
23 - """Initialize Milvus service  
24 -  
25 - Args:  
26 - uri: Milvus connection URI. If None, uses settings.milvus_uri  
27 - """  
28 - if uri:  
29 - self.uri = uri  
30 - else:  
31 - # Use absolute path for Milvus Lite  
32 - self.uri = settings.milvus_uri_absolute  
33 - self.text_collection_name = settings.text_collection_name  
34 - self.image_collection_name = settings.image_collection_name  
35 - self.text_dim = settings.text_dim  
36 - self.image_dim = settings.image_dim  
37 -  
38 - # Use MilvusClient for simplified operations  
39 - self._client: Optional[MilvusClient] = None  
40 -  
41 - logger.info(f"Initializing Milvus service with URI: {self.uri}")  
42 -  
43 - def is_connected(self) -> bool:  
44 - """Check if connected to Milvus"""  
45 - return self._client is not None  
46 -  
47 - def connect(self) -> None:  
48 - """Connect to Milvus"""  
49 - if self.is_connected():  
50 - return  
51 - try:  
52 - self._client = MilvusClient(uri=self.uri)  
53 - logger.info(f"Connected to Milvus at {self.uri}")  
54 - except Exception as e:  
55 - logger.error(f"Failed to connect to Milvus: {e}")  
56 - raise  
57 -  
58 - def disconnect(self) -> None:  
59 - """Disconnect from Milvus"""  
60 - if self._client:  
61 - self._client.close()  
62 - self._client = None  
63 - logger.info("Disconnected from Milvus")  
64 -  
65 - @property  
66 - def client(self) -> MilvusClient:  
67 - """Get the Milvus client"""  
68 - if not self._client:  
69 - raise RuntimeError("Milvus not connected. Call connect() first.")  
70 - return self._client  
71 -  
72 - def create_text_collection(self, recreate: bool = False) -> None:  
73 - """Create collection for text embeddings with product metadata  
74 -  
75 - Args:  
76 - recreate: If True, drop existing collection and recreate  
77 - """  
78 - if recreate and self.client.has_collection(self.text_collection_name):  
79 - self.client.drop_collection(self.text_collection_name)  
80 - logger.info(f"Dropped existing collection: {self.text_collection_name}")  
81 -  
82 - if self.client.has_collection(self.text_collection_name):  
83 - logger.info(f"Text collection already exists: {self.text_collection_name}")  
84 - return  
85 -  
86 - # Create collection with schema (includes metadata fields)  
87 - schema = MilvusClient.create_schema(  
88 - auto_id=False,  
89 - enable_dynamic_field=True, # Allow additional metadata fields  
90 - )  
91 -  
92 - # Core fields  
93 - schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)  
94 - schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=2000)  
95 - schema.add_field(  
96 - field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=self.text_dim  
97 - )  
98 -  
99 - # Product metadata fields  
100 - schema.add_field(  
101 - field_name="productDisplayName", datatype=DataType.VARCHAR, max_length=500  
102 - )  
103 - schema.add_field(field_name="gender", datatype=DataType.VARCHAR, max_length=50)  
104 - schema.add_field(  
105 - field_name="masterCategory", datatype=DataType.VARCHAR, max_length=100  
106 - )  
107 - schema.add_field(  
108 - field_name="subCategory", datatype=DataType.VARCHAR, max_length=100  
109 - )  
110 - schema.add_field(  
111 - field_name="articleType", datatype=DataType.VARCHAR, max_length=100  
112 - )  
113 - schema.add_field(  
114 - field_name="baseColour", datatype=DataType.VARCHAR, max_length=50  
115 - )  
116 - schema.add_field(field_name="season", datatype=DataType.VARCHAR, max_length=50)  
117 - schema.add_field(field_name="usage", datatype=DataType.VARCHAR, max_length=50)  
118 -  
119 - # Create index parameters  
120 - index_params = self.client.prepare_index_params()  
121 - index_params.add_index(  
122 - field_name="embedding",  
123 - index_type="AUTOINDEX",  
124 - metric_type="COSINE",  
125 - )  
126 -  
127 - # Create collection  
128 - self.client.create_collection(  
129 - collection_name=self.text_collection_name,  
130 - schema=schema,  
131 - index_params=index_params,  
132 - )  
133 -  
134 - logger.info(  
135 - f"Created text collection with metadata: {self.text_collection_name}"  
136 - )  
137 -  
138 - def create_image_collection(self, recreate: bool = False) -> None:  
139 - """Create collection for image embeddings with product metadata  
140 -  
141 - Args:  
142 - recreate: If True, drop existing collection and recreate  
143 - """  
144 - if recreate and self.client.has_collection(self.image_collection_name):  
145 - self.client.drop_collection(self.image_collection_name)  
146 - logger.info(f"Dropped existing collection: {self.image_collection_name}")  
147 -  
148 - if self.client.has_collection(self.image_collection_name):  
149 - logger.info(  
150 - f"Image collection already exists: {self.image_collection_name}"  
151 - )  
152 - return  
153 -  
154 - # Create collection with schema (includes metadata fields)  
155 - schema = MilvusClient.create_schema(  
156 - auto_id=False,  
157 - enable_dynamic_field=True, # Allow additional metadata fields  
158 - )  
159 -  
160 - # Core fields  
161 - schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)  
162 - schema.add_field(  
163 - field_name="image_path", datatype=DataType.VARCHAR, max_length=500  
164 - )  
165 - schema.add_field(  
166 - field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=self.image_dim  
167 - )  
168 -  
169 - # Product metadata fields  
170 - schema.add_field(  
171 - field_name="productDisplayName", datatype=DataType.VARCHAR, max_length=500  
172 - )  
173 - schema.add_field(field_name="gender", datatype=DataType.VARCHAR, max_length=50)  
174 - schema.add_field(  
175 - field_name="masterCategory", datatype=DataType.VARCHAR, max_length=100  
176 - )  
177 - schema.add_field(  
178 - field_name="subCategory", datatype=DataType.VARCHAR, max_length=100  
179 - )  
180 - schema.add_field(  
181 - field_name="articleType", datatype=DataType.VARCHAR, max_length=100  
182 - )  
183 - schema.add_field(  
184 - field_name="baseColour", datatype=DataType.VARCHAR, max_length=50  
185 - )  
186 - schema.add_field(field_name="season", datatype=DataType.VARCHAR, max_length=50)  
187 - schema.add_field(field_name="usage", datatype=DataType.VARCHAR, max_length=50)  
188 -  
189 - # Create index parameters  
190 - index_params = self.client.prepare_index_params()  
191 - index_params.add_index(  
192 - field_name="embedding",  
193 - index_type="AUTOINDEX",  
194 - metric_type="COSINE",  
195 - )  
196 -  
197 - # Create collection  
198 - self.client.create_collection(  
199 - collection_name=self.image_collection_name,  
200 - schema=schema,  
201 - index_params=index_params,  
202 - )  
203 -  
204 - logger.info(  
205 - f"Created image collection with metadata: {self.image_collection_name}"  
206 - )  
207 -  
208 - def insert_text_embeddings(  
209 - self,  
210 - embeddings: List[Dict[str, Any]],  
211 - ) -> int:  
212 - """Insert text embeddings with metadata into collection  
213 -  
214 - Args:  
215 - embeddings: List of dictionaries with keys:  
216 - - id: unique ID (product ID)  
217 - - text: the text that was embedded  
218 - - embedding: the embedding vector  
219 - - productDisplayName, gender, masterCategory, etc. (metadata)  
220 -  
221 - Returns:  
222 - Number of inserted embeddings  
223 - """  
224 - if not embeddings:  
225 - return 0  
226 -  
227 - try:  
228 - # Insert data directly (all fields including metadata)  
229 - # Milvus will accept all fields defined in schema + dynamic fields  
230 - data = embeddings  
231 -  
232 - # Insert data  
233 - result = self.client.insert(  
234 - collection_name=self.text_collection_name,  
235 - data=data,  
236 - )  
237 -  
238 - logger.info(f"Inserted {len(data)} text embeddings")  
239 - return len(data)  
240 -  
241 - except Exception as e:  
242 - logger.error(f"Failed to insert text embeddings: {e}")  
243 - raise  
244 -  
245 - def insert_image_embeddings(  
246 - self,  
247 - embeddings: List[Dict[str, Any]],  
248 - ) -> int:  
249 - """Insert image embeddings with metadata into collection  
250 -  
251 - Args:  
252 - embeddings: List of dictionaries with keys:  
253 - - id: unique ID (product ID)  
254 - - image_path: path to the image file  
255 - - embedding: the embedding vector  
256 - - productDisplayName, gender, masterCategory, etc. (metadata)  
257 -  
258 - Returns:  
259 - Number of inserted embeddings  
260 - """  
261 - if not embeddings:  
262 - return 0  
263 -  
264 - try:  
265 - # Insert data directly (all fields including metadata)  
266 - # Milvus will accept all fields defined in schema + dynamic fields  
267 - data = embeddings  
268 -  
269 - # Insert data  
270 - result = self.client.insert(  
271 - collection_name=self.image_collection_name,  
272 - data=data,  
273 - )  
274 -  
275 - logger.info(f"Inserted {len(data)} image embeddings")  
276 - return len(data)  
277 -  
278 - except Exception as e:  
279 - logger.error(f"Failed to insert image embeddings: {e}")  
280 - raise  
281 -  
282 - def search_similar_text(  
283 - self,  
284 - query_embedding: List[float],  
285 - limit: int = 10,  
286 - filters: Optional[str] = None,  
287 - output_fields: Optional[List[str]] = None,  
288 - ) -> List[Dict[str, Any]]:  
289 - """Search for similar text embeddings  
290 -  
291 - Args:  
292 - query_embedding: Query embedding vector  
293 - limit: Maximum number of results  
294 - filters: Filter expression (e.g., "product_id in [1, 2, 3]")  
295 - output_fields: List of fields to return  
296 -  
297 - Returns:  
298 - List of search results with fields:  
299 - - id: embedding ID  
300 - - distance: similarity distance  
301 - - entity: the matched entity with requested fields  
302 - """  
303 - try:  
304 - if output_fields is None:  
305 - output_fields = [  
306 - "id",  
307 - "text",  
308 - "productDisplayName",  
309 - "gender",  
310 - "masterCategory",  
311 - "subCategory",  
312 - "articleType",  
313 - "baseColour",  
314 - ]  
315 -  
316 - search_params = {}  
317 - if filters:  
318 - search_params["expr"] = filters  
319 -  
320 - results = self.client.search(  
321 - collection_name=self.text_collection_name,  
322 - data=[query_embedding],  
323 - limit=limit,  
324 - output_fields=output_fields,  
325 - search_params=search_params,  
326 - )  
327 -  
328 - # Format results  
329 - formatted_results = []  
330 - if results and len(results) > 0:  
331 - for hit in results[0]:  
332 - result = {"id": hit.get("id"), "distance": hit.get("distance")}  
333 - # Extract fields from entity  
334 - entity = hit.get("entity", {})  
335 - for field in output_fields:  
336 - if field in entity:  
337 - result[field] = entity.get(field)  
338 - formatted_results.append(result)  
339 -  
340 - logger.debug(f"Found {len(formatted_results)} similar text embeddings")  
341 - return formatted_results  
342 -  
343 - except Exception as e:  
344 - logger.error(f"Failed to search similar text: {e}")  
345 - raise  
346 -  
347 - def search_similar_images(  
348 - self,  
349 - query_embedding: List[float],  
350 - limit: int = 10,  
351 - filters: Optional[str] = None,  
352 - output_fields: Optional[List[str]] = None,  
353 - ) -> List[Dict[str, Any]]:  
354 - """Search for similar image embeddings  
355 -  
356 - Args:  
357 - query_embedding: Query embedding vector  
358 - limit: Maximum number of results  
359 - filters: Filter expression (e.g., "product_id in [1, 2, 3]")  
360 - output_fields: List of fields to return  
361 -  
362 - Returns:  
363 - List of search results with fields:  
364 - - id: embedding ID  
365 - - distance: similarity distance  
366 - - entity: the matched entity with requested fields  
367 - """  
368 - try:  
369 - if output_fields is None:  
370 - output_fields = [  
371 - "id",  
372 - "image_path",  
373 - "productDisplayName",  
374 - "gender",  
375 - "masterCategory",  
376 - "subCategory",  
377 - "articleType",  
378 - "baseColour",  
379 - ]  
380 -  
381 - search_params = {}  
382 - if filters:  
383 - search_params["expr"] = filters  
384 -  
385 - results = self.client.search(  
386 - collection_name=self.image_collection_name,  
387 - data=[query_embedding],  
388 - limit=limit,  
389 - output_fields=output_fields,  
390 - search_params=search_params,  
391 - )  
392 -  
393 - # Format results  
394 - formatted_results = []  
395 - if results and len(results) > 0:  
396 - for hit in results[0]:  
397 - result = {"id": hit.get("id"), "distance": hit.get("distance")}  
398 - # Extract fields from entity  
399 - entity = hit.get("entity", {})  
400 - for field in output_fields:  
401 - if field in entity:  
402 - result[field] = entity.get(field)  
403 - formatted_results.append(result)  
404 -  
405 - logger.debug(f"Found {len(formatted_results)} similar image embeddings")  
406 - return formatted_results  
407 -  
408 - except Exception as e:  
409 - logger.error(f"Failed to search similar images: {e}")  
410 - raise  
411 -  
412 - def get_collection_stats(self, collection_name: str) -> Dict[str, Any]:  
413 - """Get statistics for a collection  
414 -  
415 - Args:  
416 - collection_name: Name of the collection  
417 -  
418 - Returns:  
419 - Dictionary with collection statistics  
420 - """  
421 - try:  
422 - stats = self.client.get_collection_stats(collection_name)  
423 - return {  
424 - "collection_name": collection_name,  
425 - "row_count": stats.get("row_count", 0),  
426 - }  
427 - except Exception as e:  
428 - logger.error(f"Failed to get collection stats: {e}")  
429 - return {"collection_name": collection_name, "row_count": 0}  
430 -  
431 - def delete_by_ids(self, collection_name: str, ids: List[int]) -> int:  
432 - """Delete embeddings by IDs  
433 -  
434 - Args:  
435 - collection_name: Name of the collection  
436 - ids: List of IDs to delete  
437 -  
438 - Returns:  
439 - Number of deleted embeddings  
440 - """  
441 - if not ids:  
442 - return 0  
443 -  
444 - try:  
445 - self.client.delete(  
446 - collection_name=collection_name,  
447 - ids=ids,  
448 - )  
449 - logger.info(f"Deleted {len(ids)} embeddings from {collection_name}")  
450 - return len(ids)  
451 - except Exception as e:  
452 - logger.error(f"Failed to delete embeddings: {e}")  
453 - raise  
454 -  
455 - def clear_collection(self, collection_name: str) -> None:  
456 - """Clear all data from a collection  
457 -  
458 - Args:  
459 - collection_name: Name of the collection  
460 - """  
461 - try:  
462 - if self.client.has_collection(collection_name):  
463 - self.client.drop_collection(collection_name)  
464 - logger.info(f"Dropped collection: {collection_name}")  
465 - except Exception as e:  
466 - logger.error(f"Failed to clear collection: {e}")  
467 - raise  
468 -  
469 -  
470 -# Global instance  
471 -_milvus_service: Optional[MilvusService] = None  
472 -  
473 -  
474 -def get_milvus_service() -> MilvusService:  
475 - """Get or create the global Milvus service instance"""  
476 - global _milvus_service  
477 - if _milvus_service is None:  
478 - _milvus_service = MilvusService()  
479 - _milvus_service.connect()  
480 - return _milvus_service  
app/tools/__init__.py
@@ -5,13 +5,11 @@ LangChain Tools for Product Search and Discovery @@ -5,13 +5,11 @@ LangChain Tools for Product Search and Discovery
5 from app.tools.search_tools import ( 5 from app.tools.search_tools import (
6 analyze_image_style, 6 analyze_image_style,
7 get_all_tools, 7 get_all_tools,
8 - search_by_image,  
9 search_products, 8 search_products,
10 ) 9 )
11 10
12 __all__ = [ 11 __all__ = [
13 "search_products", 12 "search_products",
14 - "search_by_image",  
15 "analyze_image_style", 13 "analyze_image_style",
16 "get_all_tools", 14 "get_all_tools",
17 ] 15 ]
app/tools/search_tools.py
1 """ 1 """
2 Search Tools for Product Discovery 2 Search Tools for Product Discovery
3 -Provides text-based, image-based, and VLM reasoning capabilities 3 +Provides text-based search via Search API and VLM style analysis
4 """ 4 """
5 5
6 import base64 6 import base64
@@ -8,40 +8,24 @@ import logging @@ -8,40 +8,24 @@ import logging
8 from pathlib import Path 8 from pathlib import Path
9 from typing import Optional 9 from typing import Optional
10 10
  11 +import requests
11 from langchain_core.tools import tool 12 from langchain_core.tools import tool
12 from openai import OpenAI 13 from openai import OpenAI
13 14
14 from app.config import settings 15 from app.config import settings
15 -from app.services.embedding_service import EmbeddingService  
16 -from app.services.milvus_service import MilvusService  
17 16
18 logger = logging.getLogger(__name__) 17 logger = logging.getLogger(__name__)
19 18
20 -# Initialize services as singletons  
21 -_embedding_service: Optional[EmbeddingService] = None  
22 -_milvus_service: Optional[MilvusService] = None  
23 _openai_client: Optional[OpenAI] = None 19 _openai_client: Optional[OpenAI] = None
24 20
25 21
26 -def get_embedding_service() -> EmbeddingService:  
27 - global _embedding_service  
28 - if _embedding_service is None:  
29 - _embedding_service = EmbeddingService()  
30 - return _embedding_service  
31 -  
32 -  
33 -def get_milvus_service() -> MilvusService:  
34 - global _milvus_service  
35 - if _milvus_service is None:  
36 - _milvus_service = MilvusService()  
37 - _milvus_service.connect()  
38 - return _milvus_service  
39 -  
40 -  
41 def get_openai_client() -> OpenAI: 22 def get_openai_client() -> OpenAI:
42 global _openai_client 23 global _openai_client
43 if _openai_client is None: 24 if _openai_client is None:
44 - _openai_client = OpenAI(api_key=settings.openai_api_key) 25 + kwargs = {"api_key": settings.openai_api_key}
  26 + if settings.openai_api_base_url:
  27 + kwargs["base_url"] = settings.openai_api_base_url
  28 + _openai_client = OpenAI(**kwargs)
45 return _openai_client 29 return _openai_client
46 30
47 31
@@ -64,30 +48,26 @@ def search_products(query: str, limit: int = 5) -> str: @@ -64,30 +48,26 @@ def search_products(query: str, limit: int = 5) -> str:
64 try: 48 try:
65 logger.info(f"Searching products: '{query}', limit: {limit}") 49 logger.info(f"Searching products: '{query}', limit: {limit}")
66 50
67 - embedding_service = get_embedding_service()  
68 - milvus_service = get_milvus_service()  
69 -  
70 - if not milvus_service.is_connected():  
71 - milvus_service.connect()  
72 -  
73 - query_embedding = embedding_service.get_text_embedding(query)  
74 -  
75 - results = milvus_service.search_similar_text(  
76 - query_embedding=query_embedding,  
77 - limit=min(limit, 20),  
78 - filters=None,  
79 - output_fields=[  
80 - "id",  
81 - "productDisplayName",  
82 - "gender",  
83 - "masterCategory",  
84 - "subCategory",  
85 - "articleType",  
86 - "baseColour",  
87 - "season",  
88 - "usage",  
89 - ],  
90 - ) 51 + url = f"{settings.search_api_base_url.rstrip('/')}/search/"
  52 + headers = {
  53 + "Content-Type": "application/json",
  54 + "X-Tenant-ID": settings.search_api_tenant_id,
  55 + }
  56 + payload = {
  57 + "query": query,
  58 + "size": min(limit, 20),
  59 + "from": 0,
  60 + "language": "zh",
  61 + }
  62 +
  63 + response = requests.post(url, json=payload, headers=headers, timeout=60)
  64 +
  65 + if response.status_code != 200:
  66 + logger.error(f"Search API error: {response.status_code} - {response.text}")
  67 + return f"Error searching products: API returned {response.status_code}"
  68 +
  69 + data = response.json()
  70 + results = data.get("results", [])
91 71
92 if not results: 72 if not results:
93 return "No products found matching your search." 73 return "No products found matching your search."
@@ -95,131 +75,40 @@ def search_products(query: str, limit: int = 5) -> str: @@ -95,131 +75,40 @@ def search_products(query: str, limit: int = 5) -> str:
95 output = f"Found {len(results)} product(s):\n\n" 75 output = f"Found {len(results)} product(s):\n\n"
96 76
97 for idx, product in enumerate(results, 1): 77 for idx, product in enumerate(results, 1):
98 - output += f"{idx}. {product.get('productDisplayName', 'Unknown Product')}\n"  
99 - output += f" ID: {product.get('id', 'N/A')}\n"  
100 - output += f" Category: {product.get('masterCategory', 'N/A')} > {product.get('subCategory', 'N/A')} > {product.get('articleType', 'N/A')}\n"  
101 - output += f" Color: {product.get('baseColour', 'N/A')}\n"  
102 - output += f" Gender: {product.get('gender', 'N/A')}\n"  
103 -  
104 - if product.get("season"):  
105 - output += f" Season: {product.get('season')}\n"  
106 - if product.get("usage"):  
107 - output += f" Usage: {product.get('usage')}\n"  
108 -  
109 - if "distance" in product:  
110 - similarity = 1 - product["distance"]  
111 - output += f" Relevance: {similarity:.2%}\n" 78 + output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
  79 + output += f" ID: {product.get('spu_id', 'N/A')}\n"
  80 + output += f" Category: {product.get('category_path', product.get('category_name', 'N/A'))}\n"
  81 + if product.get("vendor"):
  82 + output += f" Brand: {product.get('vendor')}\n"
  83 + if product.get("price") is not None:
  84 + output += f" Price: {product.get('price')}\n"
  85 +
  86 + # 规格/颜色信息
  87 + specs = product.get("specifications", [])
  88 + if specs:
  89 + color_spec = next(
  90 + (s for s in specs if s.get("name") == "color"),
  91 + None,
  92 + )
  93 + if color_spec:
  94 + output += f" Color: {color_spec.get('value', 'N/A')}\n"
  95 +
  96 + if product.get("relevance_score") is not None:
  97 + output += f" Relevance: {product['relevance_score']:.2f}\n"
112 98
113 output += "\n" 99 output += "\n"
114 100
115 return output.strip() 101 return output.strip()
116 102
  103 + except requests.exceptions.RequestException as e:
  104 + logger.error(f"Error searching products (network): {e}", exc_info=True)
  105 + return f"Error searching products: {str(e)}"
117 except Exception as e: 106 except Exception as e:
118 logger.error(f"Error searching products: {e}", exc_info=True) 107 logger.error(f"Error searching products: {e}", exc_info=True)
119 return f"Error searching products: {str(e)}" 108 return f"Error searching products: {str(e)}"
120 109
121 110
122 @tool 111 @tool
123 -def search_by_image(image_path: str, limit: int = 5) -> str:  
124 - """Find similar fashion products using an image.  
125 -  
126 - Use when users want visually similar items:  
127 - - User uploads an image and asks "find similar items"  
128 - - "Show me products that look like this"  
129 -  
130 - Args:  
131 - image_path: Path to the image file  
132 - limit: Maximum number of results (1-20)  
133 -  
134 - Returns:  
135 - Formatted string with similar products  
136 - """  
137 - try:  
138 - logger.info(f"Image search: '{image_path}', limit: {limit}")  
139 -  
140 - img_path = Path(image_path)  
141 - if not img_path.exists():  
142 - return f"Error: Image file not found at '{image_path}'"  
143 -  
144 - embedding_service = get_embedding_service()  
145 - milvus_service = get_milvus_service()  
146 -  
147 - if not milvus_service.is_connected():  
148 - milvus_service.connect()  
149 -  
150 - if (  
151 - not hasattr(embedding_service, "clip_client")  
152 - or embedding_service.clip_client is None  
153 - ):  
154 - embedding_service.connect_clip()  
155 -  
156 - image_embedding = embedding_service.get_image_embedding(image_path)  
157 -  
158 - if image_embedding is None:  
159 - return "Error: Failed to generate embedding for image"  
160 -  
161 - results = milvus_service.search_similar_images(  
162 - query_embedding=image_embedding,  
163 - limit=min(limit + 1, 21),  
164 - filters=None,  
165 - output_fields=[  
166 - "id",  
167 - "image_path",  
168 - "productDisplayName",  
169 - "gender",  
170 - "masterCategory",  
171 - "subCategory",  
172 - "articleType",  
173 - "baseColour",  
174 - "season",  
175 - "usage",  
176 - ],  
177 - )  
178 -  
179 - if not results:  
180 - return "No similar products found."  
181 -  
182 - # Filter out the query image itself  
183 - query_id = img_path.stem  
184 - filtered_results = []  
185 - for result in results:  
186 - result_path = result.get("image_path", "")  
187 - if Path(result_path).stem != query_id:  
188 - filtered_results.append(result)  
189 - if len(filtered_results) >= limit:  
190 - break  
191 -  
192 - if not filtered_results:  
193 - return "No similar products found."  
194 -  
195 - output = f"Found {len(filtered_results)} visually similar product(s):\n\n"  
196 -  
197 - for idx, product in enumerate(filtered_results, 1):  
198 - output += f"{idx}. {product.get('productDisplayName', 'Unknown Product')}\n"  
199 - output += f" ID: {product.get('id', 'N/A')}\n"  
200 - output += f" Category: {product.get('masterCategory', 'N/A')} > {product.get('subCategory', 'N/A')} > {product.get('articleType', 'N/A')}\n"  
201 - output += f" Color: {product.get('baseColour', 'N/A')}\n"  
202 - output += f" Gender: {product.get('gender', 'N/A')}\n"  
203 -  
204 - if product.get("season"):  
205 - output += f" Season: {product.get('season')}\n"  
206 - if product.get("usage"):  
207 - output += f" Usage: {product.get('usage')}\n"  
208 -  
209 - if "distance" in product:  
210 - similarity = 1 - product["distance"]  
211 - output += f" Visual Similarity: {similarity:.2%}\n"  
212 -  
213 - output += "\n"  
214 -  
215 - return output.strip()  
216 -  
217 - except Exception as e:  
218 - logger.error(f"Error in image search: {e}", exc_info=True)  
219 - return f"Error searching by image: {str(e)}"  
220 -  
221 -  
222 -@tool  
223 def analyze_image_style(image_path: str) -> str: 112 def analyze_image_style(image_path: str) -> str:
224 """Analyze a fashion product image using AI vision to extract detailed style information. 113 """Analyze a fashion product image using AI vision to extract detailed style information.
225 114
@@ -291,4 +180,4 @@ Provide a comprehensive yet concise description (3-4 sentences).""" @@ -291,4 +180,4 @@ Provide a comprehensive yet concise description (3-4 sentences)."""
291 180
292 def get_all_tools(): 181 def get_all_tools():
293 """Get all available tools for the agent""" 182 """Get all available tools for the agent"""
294 - return [search_products, search_by_image, analyze_image_style] 183 + return [search_products, analyze_image_style]
docker-compose.yml deleted
@@ -1,76 +0,0 @@ @@ -1,76 +0,0 @@
1 -version: '3.5'  
2 -  
3 -services:  
4 - etcd:  
5 - container_name: milvus-etcd  
6 - image: quay.io/coreos/etcd:v3.5.5  
7 - environment:  
8 - - ETCD_AUTO_COMPACTION_MODE=revision  
9 - - ETCD_AUTO_COMPACTION_RETENTION=1000  
10 - - ETCD_QUOTA_BACKEND_BYTES=4294967296  
11 - - ETCD_SNAPSHOT_COUNT=50000  
12 - volumes:  
13 - - ./volumes/etcd:/etcd  
14 - command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd  
15 - healthcheck:  
16 - test: ["CMD", "etcdctl", "endpoint", "health"]  
17 - interval: 30s  
18 - timeout: 20s  
19 - retries: 3  
20 -  
21 - minio:  
22 - container_name: milvus-minio  
23 - image: minio/minio:RELEASE.2023-03-20T20-16-18Z  
24 - environment:  
25 - MINIO_ACCESS_KEY: minioadmin  
26 - MINIO_SECRET_KEY: minioadmin  
27 - ports:  
28 - - "9001:9001"  
29 - - "9000:9000"  
30 - volumes:  
31 - - ./volumes/minio:/minio_data  
32 - command: minio server /minio_data --console-address ":9001"  
33 - healthcheck:  
34 - test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]  
35 - interval: 30s  
36 - timeout: 20s  
37 - retries: 3  
38 -  
39 - standalone:  
40 - container_name: milvus-standalone  
41 - image: milvusdb/milvus:v2.4.0  
42 - command: ["milvus", "run", "standalone"]  
43 - security_opt:  
44 - - seccomp:unconfined  
45 - environment:  
46 - ETCD_ENDPOINTS: etcd:2379  
47 - MINIO_ADDRESS: minio:9000  
48 - volumes:  
49 - - ./volumes/milvus:/var/lib/milvus  
50 - healthcheck:  
51 - test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]  
52 - interval: 30s  
53 - start_period: 90s  
54 - timeout: 20s  
55 - retries: 3  
56 - ports:  
57 - - "19530:19530"  
58 - - "9091:9091"  
59 - depends_on:  
60 - - "etcd"  
61 - - "minio"  
62 -  
63 - attu:  
64 - container_name: milvus-attu  
65 - image: zilliz/attu:v2.4  
66 - environment:  
67 - MILVUS_URL: milvus-standalone:19530  
68 - ports:  
69 - - "8000:3000"  
70 - depends_on:  
71 - - "standalone"  
72 -  
73 -networks:  
74 - default:  
75 - name: milvus  
76 -  
docs/DEPLOY_CENTOS8.md
1 -# OmniShopAgent centOS 8 部署指南 1 +# OmniShopAgent CentOS 8 部署指南
2 2
3 ## 一、环境要求 3 ## 一、环境要求
4 4
@@ -6,8 +6,8 @@ @@ -6,8 +6,8 @@
6 |------|------| 6 |------|------|
7 | 操作系统 | CentOS 8.x | 7 | 操作系统 | CentOS 8.x |
8 | Python | 3.12+(LangChain 1.x 要求 3.10+) | 8 | Python | 3.12+(LangChain 1.x 要求 3.10+) |
9 -| 内存 | 建议 8GB+(Milvus + CLIP 较占内存) |  
10 -| 磁盘 | 建议 20GB+(含数据集) | 9 +| 内存 | 建议 4GB+ |
  10 +| 磁盘 | 建议 10GB+ |
11 11
12 ## 二、快速部署步骤 12 ## 二、快速部署步骤
13 13
@@ -21,7 +21,6 @@ chmod +x scripts/*.sh @@ -21,7 +21,6 @@ chmod +x scripts/*.sh
21 21
22 该脚本会: 22 该脚本会:
23 - 安装系统依赖(gcc、openssl-devel 等) 23 - 安装系统依赖(gcc、openssl-devel 等)
24 -- 安装 Docker(用于 Milvus)  
25 - 安装 Python 3.12(conda 或源码编译) 24 - 安装 Python 3.12(conda 或源码编译)
26 - 创建虚拟环境并安装 requirements.txt 25 - 创建虚拟环境并安装 requirements.txt
27 26
@@ -59,17 +58,7 @@ make -j $(nproc) @@ -59,17 +58,7 @@ make -j $(nproc)
59 sudo make altinstall 58 sudo make altinstall
60 ``` 59 ```
61 60
62 -#### 步骤 3:安装 Docker  
63 -  
64 -```bash  
65 -sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo  
66 -sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin  
67 -sudo systemctl enable docker && sudo systemctl start docker  
68 -sudo usermod -aG docker $USER  
69 -# 执行 newgrp docker 或重新登录  
70 -```  
71 -  
72 -#### 步骤 4:创建虚拟环境并安装依赖 61 +#### 步骤 3:创建虚拟环境并安装依赖
73 62
74 ```bash 63 ```bash
75 cd /path/to/shop_agent 64 cd /path/to/shop_agent
@@ -79,46 +68,35 @@ pip install -U pip @@ -79,46 +68,35 @@ pip install -U pip
79 pip install -r requirements.txt 68 pip install -r requirements.txt
80 ``` 69 ```
81 70
82 -#### 步骤 5:配置环境变量 71 +#### 步骤 4:配置环境变量
83 72
84 ```bash 73 ```bash
85 cp .env.example .env 74 cp .env.example .env
86 # 编辑 .env,至少配置: 75 # 编辑 .env,至少配置:
87 # OPENAI_API_KEY=sk-xxx 76 # OPENAI_API_KEY=sk-xxx
88 -# MILVUS_HOST=localhost  
89 -# MILVUS_PORT=19530  
90 -# CLIP_SERVER_URL=grpc://localhost:51000 77 +# SEARCH_API_BASE_URL=http://120.76.41.98:6002
  78 +# SEARCH_API_TENANT_ID=162
91 ``` 79 ```
92 80
93 -## 三、数据准备 81 +## 三、数据准备(可选)
94 82
95 ### 3.1 下载数据集 83 ### 3.1 下载数据集
96 84
  85 +如需图片风格分析功能,可下载 Kaggle 数据集:
  86 +
97 ```bash 87 ```bash
98 # 需先配置 Kaggle API:~/.kaggle/kaggle.json 88 # 需先配置 Kaggle API:~/.kaggle/kaggle.json
99 python scripts/download_dataset.py 89 python scripts/download_dataset.py
100 ``` 90 ```
101 91
102 -### 3.2 启动 Milvus 并索引数据  
103 -  
104 -```bash  
105 -# 启动 Milvus  
106 -./scripts/run_milvus.sh  
107 -  
108 -# 等待就绪后,创建索引  
109 -python scripts/index_data.py  
110 -```  
111 -  
112 ## 四、启动服务 92 ## 四、启动服务
113 93
114 ### 4.1 启动脚本说明 94 ### 4.1 启动脚本说明
115 95
116 | 脚本 | 用途 | 96 | 脚本 | 用途 |
117 |------|------| 97 |------|------|
118 -| `start.sh` | 主启动脚本:启动 Milvus + Streamlit |  
119 -| `stop.sh` | 停止所有服务 |  
120 -| `run_milvus.sh` | 仅启动 Milvus |  
121 -| `run_clip.sh` | 仅启动 CLIP(图像搜索需此服务) | 98 +| `start.sh` | 主启动脚本:启动 Streamlit |
  99 +| `stop.sh` | 停止 Streamlit |
122 | `check_services.sh` | 健康检查 | 100 | `check_services.sh` | 健康检查 |
123 101
124 ### 4.2 启动应用 102 ### 4.2 启动应用
@@ -127,14 +105,7 @@ python scripts/index_data.py @@ -127,14 +105,7 @@ python scripts/index_data.py
127 # 方式 1:使用 start.sh(推荐) 105 # 方式 1:使用 start.sh(推荐)
128 ./scripts/start.sh 106 ./scripts/start.sh
129 107
130 -# 方式 2:分步启动  
131 -# 终端 1:Milvus  
132 -./scripts/run_milvus.sh  
133 -  
134 -# 终端 2:CLIP(图像搜索需要)  
135 -./scripts/run_clip.sh  
136 -  
137 -# 终端 3:Streamlit 108 +# 方式 2:直接运行
138 source venv/bin/activate 109 source venv/bin/activate
139 streamlit run app.py --server.port=8501 --server.address=0.0.0.0 110 streamlit run app.py --server.port=8501 --server.address=0.0.0.0
140 ``` 111 ```
@@ -142,7 +113,6 @@ streamlit run app.py --server.port=8501 --server.address=0.0.0.0 @@ -142,7 +113,6 @@ streamlit run app.py --server.port=8501 --server.address=0.0.0.0
142 ### 4.3 访问地址 113 ### 4.3 访问地址
143 114
144 - **Streamlit 应用**:http://服务器IP:8501 115 - **Streamlit 应用**:http://服务器IP:8501
145 -- **Milvus Attu 管理界面**:http://服务器IP:8000  
146 116
147 ## 五、生产部署建议 117 ## 五、生产部署建议
148 118
@@ -153,7 +123,7 @@ streamlit run app.py --server.port=8501 --server.address=0.0.0.0 @@ -153,7 +123,7 @@ streamlit run app.py --server.port=8501 --server.address=0.0.0.0
153 ```ini 123 ```ini
154 [Unit] 124 [Unit]
155 Description=OmniShopAgent Streamlit App 125 Description=OmniShopAgent Streamlit App
156 -After=network.target docker.service 126 +After=network.target
157 127
158 [Service] 128 [Service]
159 Type=simple 129 Type=simple
@@ -194,7 +164,6 @@ server { @@ -194,7 +164,6 @@ server {
194 164
195 ```bash 165 ```bash
196 sudo firewall-cmd --permanent --add-port=8501/tcp 166 sudo firewall-cmd --permanent --add-port=8501/tcp
197 -sudo firewall-cmd --permanent --add-port=19530/tcp  
198 sudo firewall-cmd --reload 167 sudo firewall-cmd --reload
199 ``` 168 ```
200 169
@@ -203,14 +172,8 @@ sudo firewall-cmd --reload @@ -203,14 +172,8 @@ sudo firewall-cmd --reload
203 ### Q: Python 3.12 编译失败? 172 ### Q: Python 3.12 编译失败?
204 A: 确保已安装 `openssl-devel`、`libffi-devel`,或直接使用 Miniconda。 173 A: 确保已安装 `openssl-devel`、`libffi-devel`,或直接使用 Miniconda。
205 174
206 -### Q: Docker 权限不足?  
207 -A: 执行 `sudo usermod -aG docker $USER` 后重新登录。  
208 -  
209 -### Q: Milvus 启动超时?  
210 -A: 首次启动需拉取镜像,可能较慢。可检查 `docker compose logs -f standalone`。  
211 -  
212 -### Q: 图像搜索不可用?  
213 -A: 需单独启动 CLIP 服务:`./scripts/run_clip.sh`。 175 +### Q: Search API 连接失败?
  176 +A: 检查 `.env` 中 `SEARCH_API_BASE_URL` 和 `SEARCH_API_TENANT_ID` 配置,确保网络可访问搜索服务。
214 177
215 ### Q: 健康检查? 178 ### Q: 健康检查?
216 A: 执行 `./scripts/check_services.sh` 查看各组件状态。 179 A: 执行 `./scripts/check_services.sh` 查看各组件状态。
docs/Skills实现方案-LangChain1.0.md
@@ -7,7 +7,7 @@ Agent 鍦 system prompt 涓彧鐪嬪埌鎶鑳芥憳瑕侊紝鎸夐渶鍔犺浇璇︾粏鎶鑳藉唴瀹 @@ -7,7 +7,7 @@ Agent 鍦 system prompt 涓彧鐪嬪埌鎶鑳芥憳瑕侊紝鎸夐渶鍔犺浇璇︾粏鎶鑳藉唴瀹
7 7
8 | 鎶鑳 | 鑻辨枃鏍囪瘑 | 鑱岃矗 | 8 | 鎶鑳 | 鑻辨枃鏍囪瘑 | 鑱岃矗 |
9 |------|----------|------| 9 |------|----------|------|
10 -| 鏌ユ壘鐩稿叧鍟嗗搧 | lookup_related | 鍩轰簬鏂囨湰/鍥剧墖鏌ユ壘鐩镐技鎴栫浉鍏冲晢鍝 | 10 +| 鏌ユ壘鐩稿叧鍟嗗搧 | lookup_related | 鍩轰簬鏂囨湰/鍥剧墖鏌ユ壘鐩镐技鎴栫浉鍏冲晢鍝侊紙鍥剧墖闇鍏堝垎鏋愰鏍硷級 |
11 | 鎼滅储鍟嗗搧 | search_products | 鎸夎嚜鐒惰瑷鎻忚堪鎼滅储鍟嗗搧 | 11 | 鎼滅储鍟嗗搧 | search_products | 鎸夎嚜鐒惰瑷鎻忚堪鎼滅储鍟嗗搧 |
12 | 妫楠屽晢鍝 | check_product | 妫楠屽晢鍝佹槸鍚︾鍚堢敤鎴疯姹 | 12 | 妫楠屽晢鍝 | check_product | 妫楠屽晢鍝佹槸鍚︾鍚堢敤鎴疯姹 |
13 | 缁撴灉鍖呰 | result_packaging | 鏍煎紡鍖栥佹帓搴忋佺瓫閫夊苟鍛堢幇缁撴灉 | 13 | 缁撴灉鍖呰 | result_packaging | 鏍煎紡鍖栥佹帓搴忋佺瓫閫夊苟鍛堢幇缁撴灉 |
@@ -24,7 +24,7 @@ Agent 鍦 system prompt 涓彧鐪嬪埌鎶鑳芥憳瑕侊紝鎸夐渶鍔犺浇璇︾粏鎶鑳藉唴瀹 @@ -24,7 +24,7 @@ Agent 鍦 system prompt 涓彧鐪嬪埌鎶鑳芥憳瑕侊紝鎸夐渶鍔犺浇璇︾粏鎶鑳藉唴瀹
24 | **鏂瑰紡 A锛歝reate_agent + 鑷畾涔 Skill 涓棿浠** | 璐墿瀵艰喘绛変笟鍔 Agent | `langchain>=1.0`銆乣langgraph>=1.0` | 24 | **鏂瑰紡 A锛歝reate_agent + 鑷畾涔 Skill 涓棿浠** | 璐墿瀵艰喘绛変笟鍔 Agent | `langchain>=1.0`銆乣langgraph>=1.0` |
25 | **鏂瑰紡 B锛欴eep Agents + SKILL.md** | 渚濊禆鏂囦欢绯荤粺銆佸鎶鑳界洰褰 | `deepagents` | 25 | **鏂瑰紡 B锛欴eep Agents + SKILL.md** | 渚濊禆鏂囦欢绯荤粺銆佸鎶鑳界洰褰 | `deepagents` |
26 26
27 -璐墿瀵艰喘鍦烘櫙鎺ㄨ崘**鏂瑰紡 A**锛屾洿鏄撲笌鐜版湁 Milvus銆丆LIP 绛夋湇鍔¢泦鎴愩 27 +璐墿瀵艰喘鍦烘櫙鎺ㄨ崘**鏂瑰紡 A**锛屾洿鏄撲笌鐜版湁 Search API 绛夋湇鍔¢泦鎴愩
28 28
29 ### 2.2 鏍稿績鎬濊矾锛歅rogressive Disclosure 29 ### 2.2 鏍稿績鎬濊矾锛歅rogressive Disclosure
30 30
@@ -58,7 +58,7 @@ class Skill(TypedDict): @@ -58,7 +58,7 @@ class Skill(TypedDict):
58 SKILLS: list[Skill] = [ 58 SKILLS: list[Skill] = [
59 { 59 {
60 "name": "lookup_related", 60 "name": "lookup_related",
61 - "description": "鏌ユ壘涓庢煇鍟嗗搧鐩稿叧鐨勫叾浠栧晢鍝侊紝鏀寔浠ュ浘鎼滃浘銆佹枃鏈浉浼笺佸悓鍝佺被鎺ㄨ崘銆", 61 + "description": "鏌ユ壘涓庢煇鍟嗗搧鐩稿叧鐨勫叾浠栧晢鍝侊紝鏀寔鏂囨湰鐩镐技銆佸悓鍝佺被鎺ㄨ崘銆",
62 "content": """# 鏌ユ壘鐩稿叧鍟嗗搧 62 "content": """# 鏌ユ壘鐩稿叧鍟嗗搧
63 63
64 ## 閫傜敤鍦烘櫙 64 ## 閫傜敤鍦烘櫙
@@ -67,12 +67,11 @@ SKILLS: list[Skill] = [ @@ -67,12 +67,11 @@ SKILLS: list[Skill] = [
67 - 鐢ㄦ埛宸叉湁涓浠跺晢鍝侊紝鎯虫壘鐩稿叧娆 67 - 鐢ㄦ埛宸叉湁涓浠跺晢鍝侊紝鎯虫壘鐩稿叧娆
68 68
69 ## 鎿嶄綔姝ラ 69 ## 鎿嶄綔姝ラ
70 -1. **鏈夊浘鐗**锛氬厛璋冪敤 `analyze_image_style` 鐞嗚В椋庢牸锛屽啀璋冪敤 `search_by_image` 鎴 `search_products` 70 +1. **鏈夊浘鐗**锛氬厛璋冪敤 `analyze_image_style` 鐞嗚В椋庢牸锛屽啀璋冪敤 `search_products` 鐢ㄦ弿杩版悳绱
71 2. **鏃犲浘鐗**锛氱敤 `search_products` 鎻忚堪鍝佺被+椋庢牸+棰滆壊 71 2. **鏃犲浘鐗**锛氱敤 `search_products` 鎻忚堪鍝佺被+椋庢牸+棰滆壊
72 3. 鍙粨鍚堜笂涓嬫枃涓殑鍟嗗搧 ID銆佸搧绫诲仛鍚屽搧绫绘帹鑽 72 3. 鍙粨鍚堜笂涓嬫枃涓殑鍟嗗搧 ID銆佸搧绫诲仛鍚屽搧绫绘帹鑽
73 73
74 ## 鍙敤宸ュ叿 74 ## 鍙敤宸ュ叿
75 -- `search_by_image(image_path, limit)`锛氫互鍥炬悳鍥  
76 - `search_products(query, limit)`锛氭枃鏈悳绱 75 - `search_products(query, limit)`锛氭枃鏈悳绱
77 - `analyze_image_style(image_path)`锛氬垎鏋愬浘鐗囬鏍""", 76 - `analyze_image_style(image_path)`锛氬垎鏋愬浘鐗囬鏍""",
78 }, 77 },
@@ -225,15 +224,14 @@ class ShoppingSkillMiddleware(AgentMiddleware): @@ -225,15 +224,14 @@ class ShoppingSkillMiddleware(AgentMiddleware):
225 from langchain.agents import create_agent 224 from langchain.agents import create_agent
226 from langgraph.checkpoint.memory import MemorySaver 225 from langgraph.checkpoint.memory import MemorySaver
227 226
228 -# 鍩虹宸ュ叿锛堟悳绱€佷互鍥炬悳鍥俱侀鏍煎垎鏋愮瓑锛  
229 -from app.tools.search_tools import search_products, search_by_image, analyze_image_style 227 +# 鍩虹宸ュ叿锛堟悳绱€侀鏍煎垎鏋愮瓑锛
  228 +from app.tools.search_tools import search_products, analyze_image_style
230 229
231 agent = create_agent( 230 agent = create_agent(
232 model="gpt-4o-mini", 231 model="gpt-4o-mini",
233 tools=[ 232 tools=[
234 load_skill, # 鎶鑳藉姞杞 233 load_skill, # 鎶鑳藉姞杞
235 search_products, 234 search_products,
236 - search_by_image,  
237 analyze_image_style, 235 analyze_image_style,
238 ], 236 ],
239 system_prompt="""浣犳槸鏅鸿兘鏃跺皻璐墿鍔╂墜銆傛牴鎹敤鎴烽渶姹傦紝鍏堝垽鏂娇鐢ㄥ摢涓妧鑳斤紝蹇呰鏃剁敤 load_skill 鍔犺浇鎶鑳借鎯呫 237 system_prompt="""浣犳槸鏅鸿兘鏃跺皻璐墿鍔╂墜銆傛牴鎹敤鎴烽渶姹傦紝鍏堝垽鏂娇鐢ㄥ摢涓妧鑳斤紝蹇呰鏃剁敤 load_skill 鍔犺浇鎶鑳借鎯呫
@@ -250,7 +248,7 @@ agent = create_agent( @@ -250,7 +248,7 @@ agent = create_agent(
250 248
251 | 鑳藉姏 | 鎶鑳 | 宸ュ叿 | 249 | 鑳藉姏 | 鎶鑳 | 宸ュ叿 |
252 |------|------|------| 250 |------|------|------|
253 -| 鏌ユ壘鐩稿叧 | lookup_related | search_by_image, search_products, analyze_image_style | 251 +| 鏌ユ壘鐩稿叧 | lookup_related | search_products, analyze_image_style |
254 | 鎼滅储鍟嗗搧 | search_products | search_products | 252 | 鎼滅储鍟嗗搧 | search_products | search_products |
255 | 妫楠屽晢鍝 | check_product | search_products锛堢敤 query 琛ㄨ揪绾︽潫锛 | 253 | 妫楠屽晢鍝 | check_product | search_products锛堢敤 query 琛ㄨ揪绾︽潫锛 |
256 | 缁撴灉鍖呰 | result_packaging | 鏃狅紙绾 prompt 绾︽潫锛 | 254 | 缁撴灉鍖呰 | result_packaging | 鏃狅紙绾 prompt 绾︽潫锛 |
技术实现报告.md renamed to docs/技术实现报告.md
@@ -7,7 +7,7 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模 @@ -7,7 +7,7 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模
7 ### 核心特性 7 ### 核心特性
8 8
9 - **自主工具选择与执行**:Agent 根据用户意图自主选择并调用工具 9 - **自主工具选择与执行**:Agent 根据用户意图自主选择并调用工具
10 -- **多模态搜索**:支持文本搜索 + 图像搜索 10 +- **文本搜索**:通过 Search API 进行商品搜索
11 - **对话上下文感知**:多轮对话中保持上下文记忆 11 - **对话上下文感知**:多轮对话中保持上下文记忆
12 - **实时视觉分析**:基于 VLM 的图片风格分析 12 - **实时视觉分析**:基于 VLM 的图片风格分析
13 13
@@ -20,9 +20,7 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模 @@ -20,9 +20,7 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模
20 | 运行环境 | Python 3.12 | 20 | 运行环境 | Python 3.12 |
21 | Agent 框架 | LangGraph 1.x | 21 | Agent 框架 | LangGraph 1.x |
22 | LLM 框架 | LangChain 1.x(支持任意 LLM,默认 gpt-4o-mini) | 22 | LLM 框架 | LangChain 1.x(支持任意 LLM,默认 gpt-4o-mini) |
23 -| 文本向量 | text-embedding-3-small |  
24 -| 图像向量 | CLIP ViT-B/32 |  
25 -| 向量数据库 | Milvus | 23 +| 搜索服务 | Search API (HTTP) |
26 | 前端 | Streamlit | 24 | 前端 | Streamlit |
27 | 数据集 | Kaggle Fashion Products | 25 | 数据集 | Kaggle Fashion Products |
28 26
@@ -45,23 +43,21 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模 @@ -45,23 +43,21 @@ OmniShopAgent 是一个基于 **LangGraph** 和 **ReAct 模式** 的自主多模
45 │ │ START → Agent → [Has tool_calls?] → Tools → Agent → END │ │ 43 │ │ START → Agent → [Has tool_calls?] → Tools → Agent → END │ │
46 │ └───────────────────────────────────────────────────────────┘ │ 44 │ └───────────────────────────────────────────────────────────┘ │
47 └─────────────────────────────────────────────────────────────────┘ 45 └─────────────────────────────────────────────────────────────────┘
48 - │ │ │  
49 - ▼ ▼ ▼  
50 -┌──────────────┐ ┌──────────────────┐ ┌─────────────────────┐  
51 -│ search_ │ │ search_by_image │ │ analyze_image_style │  
52 -│ products │ │ │ │ (OpenAI Vision) │  
53 -└──────┬───────┘ └────────┬─────────┘ └──────────┬───────────┘  
54 - │ │ │  
55 - ▼ ▼ ▼ 46 + │ │
  47 + ▼ ▼
  48 +┌──────────────┐ ┌─────────────────────┐
  49 +│ search_ │ │ analyze_image_style │
  50 +│ products │ │ (OpenAI Vision) │
  51 +└──────┬───────┘ └──────────┬──────────┘
  52 + │ │
  53 + ▼ │
  54 +┌──────────────────┐ │
  55 +│ Search API │ │
  56 +│ (HTTP POST) │ │
  57 +└──────────────────┘ │
  58 + ▼
56 ┌─────────────────────────────────────────────────────────────────┐ 59 ┌─────────────────────────────────────────────────────────────────┐
57 -│ EmbeddingService (embedding_service.py) │  
58 -│ OpenAI API (文本) │ CLIP Server (图像) │  
59 -└─────────────────────────────────────────────────────────────────┘  
60 - │  
61 - ▼  
62 -┌─────────────────────────────────────────────────────────────────┐  
63 -│ MilvusService (milvus_service.py) │  
64 -│ text_embeddings 集合 │ image_embeddings 集合 │ 60 +│ OpenAI API (VLM 风格分析) │
65 └─────────────────────────────────────────────────────────────────┘ 61 └─────────────────────────────────────────────────────────────────┘
66 ``` 62 ```
67 63
@@ -140,12 +136,11 @@ def _build_graph(self): @@ -140,12 +136,11 @@ def _build_graph(self):
140 ```python 136 ```python
141 system_prompt = """You are an intelligent fashion shopping assistant. You can: 137 system_prompt = """You are an intelligent fashion shopping assistant. You can:
142 1. Search for products by text description (use search_products) 138 1. Search for products by text description (use search_products)
143 -2. Find visually similar products from images (use search_by_image)  
144 -3. Analyze image style and attributes (use analyze_image_style) 139 +2. Analyze image style and attributes (use analyze_image_style)
145 140
146 When a user asks about products: 141 When a user asks about products:
147 - For text queries: use search_products directly 142 - For text queries: use search_products directly
148 -- For image uploads: decide if you need to analyze_image_style first, then search 143 +- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
149 - You can call multiple tools in sequence if needed 144 - You can call multiple tools in sequence if needed
150 - Always provide helpful, friendly responses 145 - Always provide helpful, friendly responses
151 146
@@ -198,41 +193,38 @@ def chat(self, query: str, image_path: Optional[str] = None) -> dict: @@ -198,41 +193,38 @@ def chat(self, query: str, image_path: Optional[str] = None) -> dict:
198 193
199 ### 4.2 搜索工具实现(search_tools.py) 194 ### 4.2 搜索工具实现(search_tools.py)
200 195
201 -#### 4.2.1 文本语义搜索 196 +#### 4.2.1 文本搜索(Search API)
202 197
203 ```python 198 ```python
204 @tool 199 @tool
205 def search_products(query: str, limit: int = 5) -> str: 200 def search_products(query: str, limit: int = 5) -> str:
206 """Search for fashion products using natural language descriptions.""" 201 """Search for fashion products using natural language descriptions."""
207 try: 202 try:
208 - embedding_service = get_embedding_service()  
209 - milvus_service = get_milvus_service()  
210 -  
211 - query_embedding = embedding_service.get_text_embedding(query)  
212 -  
213 - results = milvus_service.search_similar_text(  
214 - query_embedding=query_embedding,  
215 - limit=min(limit, 20),  
216 - filters=None,  
217 - output_fields=[  
218 - "id", "productDisplayName", "gender", "masterCategory",  
219 - "subCategory", "articleType", "baseColour", "season", "usage",  
220 - ],  
221 - ) 203 + url = f"{settings.search_api_base_url.rstrip('/')}/search/"
  204 + headers = {
  205 + "Content-Type": "application/json",
  206 + "X-Tenant-ID": settings.search_api_tenant_id,
  207 + }
  208 + payload = {
  209 + "query": query,
  210 + "size": min(limit, 20),
  211 + "from": 0,
  212 + "language": "zh",
  213 + }
  214 +
  215 + response = requests.post(url, json=payload, headers=headers, timeout=60)
  216 + data = response.json()
  217 + results = data.get("results", [])
222 218
223 if not results: 219 if not results:
224 return "No products found matching your search." 220 return "No products found matching your search."
225 221
226 output = f"Found {len(results)} product(s):\n\n" 222 output = f"Found {len(results)} product(s):\n\n"
227 for idx, product in enumerate(results, 1): 223 for idx, product in enumerate(results, 1):
228 - output += f"{idx}. {product.get('productDisplayName', 'Unknown Product')}\n"  
229 - output += f" ID: {product.get('id', 'N/A')}\n"  
230 - output += f" Category: {product.get('masterCategory')} > {product.get('subCategory')} > {product.get('articleType')}\n"  
231 - output += f" Color: {product.get('baseColour')}\n"  
232 - output += f" Gender: {product.get('gender')}\n"  
233 - if "distance" in product:  
234 - similarity = 1 - product["distance"]  
235 - output += f" Relevance: {similarity:.2%}\n" 224 + output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
  225 + output += f" ID: {product.get('spu_id', 'N/A')}\n"
  226 + output += f" Category: {product.get('category_path', 'N/A')}\n"
  227 + output += f" Price: {product.get('price')}\n"
236 output += "\n" 228 output += "\n"
237 229
238 return output.strip() 230 return output.strip()
@@ -240,38 +232,7 @@ def search_products(query: str, limit: int = 5) -> str: @@ -240,38 +232,7 @@ def search_products(query: str, limit: int = 5) -> str:
240 return f"Error searching products: {str(e)}" 232 return f"Error searching products: {str(e)}"
241 ``` 233 ```
242 234
243 -#### 4.2.2 图像相似度搜索  
244 -  
245 -```python  
246 -@tool  
247 -def search_by_image(image_path: str, limit: int = 5) -> str:  
248 - """Find similar fashion products using an image."""  
249 - if not Path(image_path).exists():  
250 - return f"Error: Image file not found at '{image_path}'"  
251 -  
252 - embedding_service = get_embedding_service()  
253 - milvus_service = get_milvus_service()  
254 -  
255 - if not embedding_service.clip_client:  
256 - embedding_service.connect_clip()  
257 -  
258 - image_embedding = embedding_service.get_image_embedding(image_path)  
259 -  
260 - results = milvus_service.search_similar_images(  
261 - query_embedding=image_embedding,  
262 - limit=min(limit + 1, 21),  
263 - output_fields=[...],  
264 - )  
265 -  
266 - # 过滤掉查询图像本身(如上传的是商品库中的图)  
267 - query_id = Path(image_path).stem  
268 - filtered_results = [r for r in results if Path(r.get("image_path", "")).stem != query_id]  
269 - filtered_results = filtered_results[:limit]  
270 -  
271 -  
272 -```  
273 -  
274 -#### 4.2.3 视觉分析(VLM) 235 +#### 4.2.2 视觉分析(VLM)
275 236
276 ```python 237 ```python
277 @tool 238 @tool
@@ -310,161 +271,9 @@ Provide a comprehensive yet concise description (3-4 sentences).""" @@ -310,161 +271,9 @@ Provide a comprehensive yet concise description (3-4 sentences)."""
310 271
311 --- 272 ---
312 273
313 -### 4.3 向量服务实现  
314 -  
315 -#### 4.3.1 EmbeddingService(embedding_service.py)  
316 -  
317 -```python  
318 -class EmbeddingService:  
319 - def get_text_embedding(self, text: str) -> List[float]:  
320 - """OpenAI text-embedding-3-small"""  
321 - response = self.openai_client.embeddings.create(  
322 - input=text, model=self.text_embedding_model  
323 - )  
324 - return response.data[0].embedding  
325 -  
326 - def get_image_embedding(self, image_path: Union[str, Path]) -> List[float]:  
327 - """CLIP 图像向量"""  
328 - if not self.clip_client:  
329 - raise RuntimeError("CLIP client not connected. Call connect_clip() first.")  
330 - result = self.clip_client.encode([str(image_path)])  
331 - if isinstance(result, np.ndarray):  
332 - embedding = result[0].tolist() if len(result.shape) > 1 else result.tolist()  
333 - else:  
334 - embedding = result[0].embedding.tolist()  
335 - return embedding  
336 -  
337 - def get_text_embeddings_batch(self, texts: List[str], batch_size: int = 100) -> List[List[float]]:  
338 - """批量文本嵌入,用于索引"""  
339 - for i in range(0, len(texts), batch_size):  
340 - batch = texts[i : i + batch_size]  
341 - response = self.openai_client.embeddings.create(input=batch, ...)  
342 - embeddings = [item.embedding for item in response.data]  
343 - all_embeddings.extend(embeddings)  
344 - return all_embeddings  
345 -```  
346 -  
347 -#### 4.3.2 MilvusService(milvus_service.py) 274 +### 4.3 Streamlit 前端(app.py)
348 275
349 -**文本集合 Schema:**  
350 -  
351 -```python  
352 -schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=True)  
353 -schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)  
354 -schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=2000)  
355 -schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=self.text_dim) # 1536  
356 -schema.add_field(field_name="productDisplayName", datatype=DataType.VARCHAR, max_length=500)  
357 -schema.add_field(field_name="gender", datatype=DataType.VARCHAR, max_length=50)  
358 -schema.add_field(field_name="masterCategory", datatype=DataType.VARCHAR, max_length=100)  
359 -# ... 更多元数据字段  
360 -```  
361 -  
362 -**图像集合 Schema:**  
363 -  
364 -```python  
365 -schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)  
366 -schema.add_field(field_name="image_path", datatype=DataType.VARCHAR, max_length=500)  
367 -schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=self.image_dim) # 512  
368 -# ... 产品元数据  
369 -```  
370 -  
371 -**相似度搜索:**  
372 -  
373 -```python  
374 -def search_similar_text(self, query_embedding, limit=10, output_fields=None):  
375 - results = self.client.search(  
376 - collection_name=self.text_collection_name,  
377 - data=[query_embedding],  
378 - limit=limit,  
379 - output_fields=output_fields,  
380 - )  
381 - formatted_results = []  
382 - for hit in results[0]:  
383 - result = {"id": hit.get("id"), "distance": hit.get("distance")}  
384 - entity = hit.get("entity", {})  
385 - for field in output_fields:  
386 - if field in entity:  
387 - result[field] = entity.get(field)  
388 - formatted_results.append(result)  
389 - return formatted_results  
390 -```  
391 -  
392 ----  
393 -  
394 -### 4.4 数据索引脚本(index_data.py)  
395 -  
396 -#### 4.4.1 产品数据加载  
397 -  
398 -```python  
399 -def _load_products_from_csv(self) -> Dict[int, Dict[str, Any]]:  
400 - products = {}  
401 - # 加载 images.csv 映射  
402 - with open(self.images_csv, "r") as f:  
403 - images_dict = {int(row["filename"].split(".")[0]): row["link"] for row in csv.DictReader(f)}  
404 -  
405 - # 加载 styles.csv  
406 - with open(self.styles_csv, "r") as f:  
407 - for row in csv.DictReader(f):  
408 - product_id = int(row["id"])  
409 - products[product_id] = {  
410 - "id": product_id,  
411 - "gender": row.get("gender", ""),  
412 - "masterCategory": row.get("masterCategory", ""),  
413 - "subCategory": row.get("subCategory", ""),  
414 - "articleType": row.get("articleType", ""),  
415 - "baseColour": row.get("baseColour", ""),  
416 - "season": row.get("season", ""),  
417 - "usage": row.get("usage", ""),  
418 - "productDisplayName": row.get("productDisplayName", ""),  
419 - "imagePath": f"{product_id}.jpg",  
420 - }  
421 - return products  
422 -```  
423 -  
424 -#### 4.4.2 文本索引  
425 -  
426 -```python  
427 -def _create_product_text(self, product: Dict[str, Any]) -> str:  
428 - """构造产品文本用于 embedding"""  
429 - parts = [  
430 - product.get("productDisplayName", ""),  
431 - f"Gender: {product.get('gender', '')}",  
432 - f"Category: {product.get('masterCategory', '')} > {product.get('subCategory', '')}",  
433 - f"Type: {product.get('articleType', '')}",  
434 - f"Color: {product.get('baseColour', '')}",  
435 - f"Season: {product.get('season', '')}",  
436 - f"Usage: {product.get('usage', '')}",  
437 - ]  
438 - return " | ".join([p for p in parts if p and p != "Gender: " and p != "Color: "])  
439 -```  
440 -  
441 -#### 4.4.3 批量索引流程  
442 -  
443 -```python  
444 -# 文本索引  
445 -texts = [self._create_product_text(p) for p in products]  
446 -embeddings = self.embedding_service.get_text_embeddings_batch(texts, batch_size=50)  
447 -milvus_data = [{  
448 - "id": product_id,  
449 - "text": text[:2000],  
450 - "embedding": embedding,  
451 - "productDisplayName": product["productDisplayName"][:500],  
452 - "gender": product["gender"][:50],  
453 - # ... 其他元数据  
454 -} for product_id, text, embedding in zip(...)]  
455 -self.milvus_service.insert_text_embeddings(milvus_data)  
456 -  
457 -# 图像索引  
458 -image_paths = [self.image_dir / p["imagePath"] for p in products]  
459 -embeddings = self.embedding_service.get_image_embeddings_batch(image_paths, batch_size=32)  
460 -# 类似插入 image_embeddings 集合  
461 -```  
462 -  
463 ----  
464 -  
465 -### 4.5 Streamlit 前端(app.py)  
466 -  
467 -#### 4.5.1 会话与 Agent 初始化 276 +#### 4.3.1 会话与 Agent 初始化
468 277
469 ```python 278 ```python
470 def initialize_session(): 279 def initialize_session():
@@ -478,7 +287,7 @@ def initialize_session(): @@ -478,7 +287,7 @@ def initialize_session():
478 st.session_state.uploaded_image = None 287 st.session_state.uploaded_image = None
479 ``` 288 ```
480 289
481 -#### 4.5.2 产品信息解析 290 +#### 4.3.2 产品信息解析
482 291
483 ```python 292 ```python
484 def extract_products_from_response(response: str) -> list: 293 def extract_products_from_response(response: str) -> list:
@@ -501,7 +310,7 @@ def extract_products_from_response(response: str) -> list: @@ -501,7 +310,7 @@ def extract_products_from_response(response: str) -> list:
501 return products 310 return products
502 ``` 311 ```
503 312
504 -#### 4.5.3 多轮对话中的图片引用 313 +#### 4.3.3 多轮对话中的图片引用
505 314
506 ```python 315 ```python
507 # 用户输入 "make them formal" 时,若上一条消息有图片,则引用该图片 316 # 用户输入 "make them formal" 时,若上一条消息有图片,则引用该图片
@@ -514,28 +323,14 @@ if any(ref in query_lower for ref in ["this", "that", "the image", "it"]): @@ -514,28 +323,14 @@ if any(ref in query_lower for ref in ["this", "that", "the image", "it"]):
514 323
515 --- 324 ---
516 325
517 -### 4.6 配置管理(config.py) 326 +### 4.4 配置管理(config.py)
518 327
519 ```python 328 ```python
520 class Settings(BaseSettings): 329 class Settings(BaseSettings):
521 openai_api_key: str 330 openai_api_key: str
522 openai_model: str = "gpt-4o-mini" 331 openai_model: str = "gpt-4o-mini"
523 - openai_embedding_model: str = "text-embedding-3-small"  
524 - clip_server_url: str = "grpc://localhost:51000"  
525 - milvus_uri: str = "http://localhost:19530"  
526 - text_collection_name: str = "text_embeddings"  
527 - image_collection_name: str = "image_embeddings"  
528 - text_dim: int = 1536  
529 - image_dim: int = 512  
530 -  
531 - @property  
532 - def milvus_uri_absolute(self) -> str:  
533 - """支持 Milvus Standalone 和 Milvus Lite"""  
534 - if self.milvus_uri.startswith(("http://", "https://")):  
535 - return self.milvus_uri  
536 - if self.milvus_uri.startswith("./"):  
537 - return os.path.join(base_dir, self.milvus_uri[2:])  
538 - return self.milvus_uri 332 + search_api_base_url: str = "http://120.76.41.98:6002"
  333 + search_api_tenant_id: str = "162"
539 334
540 class Config: 335 class Config:
541 env_file = ".env" 336 env_file = ".env"
@@ -547,35 +342,22 @@ class Settings(BaseSettings): @@ -547,35 +342,22 @@ class Settings(BaseSettings):
547 342
548 ### 5.1 依赖服务 343 ### 5.1 依赖服务
549 344
550 -```yaml  
551 -# docker-compose.yml 提供  
552 -- etcd: 元数据存储  
553 -- minio: 对象存储  
554 -- milvus-standalone: 向量数据库  
555 -- attu: Milvus 管理界面  
556 -``` 345 +- **Search API**:外部搜索服务(HTTP)
  346 +- **OpenAI API**:LLM 与 VLM 图像分析
557 347
558 ### 5.2 启动流程 348 ### 5.2 启动流程
559 349
560 ```bash 350 ```bash
561 # 1. 环境 351 # 1. 环境
562 pip install -r requirements.txt 352 pip install -r requirements.txt
563 -cp .env.example .env # 配置 OPENAI_API_KEY 353 +cp .env.example .env # 配置 OPENAI_API_KEY、SEARCH_API_* 等
564 354
565 -# 2. 下载数据 355 +# 2. (可选)下载数据
566 python scripts/download_dataset.py # Kaggle Fashion Product Images Dataset 356 python scripts/download_dataset.py # Kaggle Fashion Product Images Dataset
567 357
568 -# 3. 启动 CLIP 服务(需单独运行)  
569 -python -m clip_server  
570 -  
571 -# 4. 启动 Milvus  
572 -docker-compose up  
573 -  
574 -# 5. 索引数据  
575 -python scripts/index_data.py  
576 -  
577 -# 6. 启动应用 358 +# 3. 启动应用
578 streamlit run app.py 359 streamlit run app.py
  360 +# 或 ./scripts/start.sh
579 ``` 361 ```
580 362
581 --- 363 ---
@@ -585,7 +367,6 @@ streamlit run app.py @@ -585,7 +367,6 @@ streamlit run app.py
585 | 场景 | 用户输入 | Agent 行为 | 工具调用 | 367 | 场景 | 用户输入 | Agent 行为 | 工具调用 |
586 |------|----------|------------|----------| 368 |------|----------|------------|----------|
587 | 文本搜索 | "winter coats for women" | 直接文本搜索 | `search_products("winter coats women")` | 369 | 文本搜索 | "winter coats for women" | 直接文本搜索 | `search_products("winter coats women")` |
588 -| 图像搜索 | [上传图片] "find similar" | 图像相似度搜索 | `search_by_image(path)` |  
589 | 风格分析+搜索 | [上传复古夹克] "what style? find matching pants" | 先分析风格再搜索 | `analyze_image_style(path)` → `search_products("vintage pants casual")` | 370 | 风格分析+搜索 | [上传复古夹克] "what style? find matching pants" | 先分析风格再搜索 | `analyze_image_style(path)` → `search_products("vintage pants casual")` |
590 | 多轮上下文 | [第1轮] "show me red dresses"<br>[第2轮] "make them formal" | 结合上下文 | `search_products("red formal dresses")` | 371 | 多轮上下文 | [第1轮] "show me red dresses"<br>[第2轮] "make them formal" | 结合上下文 | `search_products("red formal dresses")` |
591 372
@@ -595,10 +376,9 @@ streamlit run app.py @@ -595,10 +376,9 @@ streamlit run app.py
595 376
596 1. **ReAct 模式**:Agent 自主决定何时调用工具、调用哪些工具、是否继续调用。 377 1. **ReAct 模式**:Agent 自主决定何时调用工具、调用哪些工具、是否继续调用。
597 2. **LangGraph 状态图**:`START → Agent → [条件] → Tools → Agent → END`,支持多轮工具调用。 378 2. **LangGraph 状态图**:`START → Agent → [条件] → Tools → Agent → END`,支持多轮工具调用。
598 -3. **多模态**:文本 + 图像 + VLM 分析,覆盖文本搜索、以图搜图、风格理解。  
599 -4. **双向量集合**:Milvus 中 text_embeddings / image_embeddings 分别存储,支持不同模态的检索。  
600 -5. **会话持久化**:`MemorySaver` + `thread_id` 实现多轮对话记忆。  
601 -6. **格式约束**:System prompt 严格限制产品输出格式,便于前端解析和展示。 379 +3. **搜索与风格分析**:Search API 文本搜索 + VLM 图像风格分析。
  380 +4. **会话持久化**:`MemorySaver` + `thread_id` 实现多轮对话记忆。
  381 +5. **格式约束**:System prompt 严格限制产品输出格式,便于前端解析和展示。
602 382
603 --- 383 ---
604 384
@@ -611,8 +391,6 @@ OmniShopAgent/ @@ -611,8 +391,6 @@ OmniShopAgent/
611 │ │ └── shopping_agent.py 391 │ │ └── shopping_agent.py
612 │ ├── config.py 392 │ ├── config.py
613 │ ├── services/ 393 │ ├── services/
614 -│ │ ├── embedding_service.py  
615 -│ │ └── milvus_service.py  
616 │ └── tools/ 394 │ └── tools/
617 │ └── search_tools.py 395 │ └── search_tools.py
618 ├── scripts/ 396 ├── scripts/
docs/搜索API对接指南.md 0 → 100644
@@ -0,0 +1,1651 @@ @@ -0,0 +1,1651 @@
  1 +# 搜索API接口对接指南
  2 +
  3 +本文档为搜索服务的使用方提供完整的API对接指南,包括接口说明、请求参数、响应格式和使用示例。
  4 +
  5 +## 目录
  6 +
  7 +1. [快速开始](#快速开始)
  8 + - 1.1 [基础信息](#11-基础信息)
  9 + - 1.2 [最简单的搜索请求](#12-最简单的搜索请求)
  10 + - 1.3 [带过滤与分页的搜索](#13-带过滤与分页的搜索)
  11 + - 1.4 [开启分面的搜索](#14-开启分面的搜索)
  12 +
  13 +2. [接口概览](#接口概览)
  14 +
  15 +3. [搜索接口](#搜索接口)
  16 + - 3.1 [接口信息](#31-接口信息)
  17 + - 3.2 [请求参数](#32-请求参数)
  18 + - 3.3 [过滤器详解](#33-过滤器详解)
  19 + - 3.4 [分面配置](#34-分面配置)
  20 + - 3.5 [SKU筛选维度](#35-sku筛选维度)
  21 + - 3.6 [布尔表达式语法](#36-布尔表达式语法)
  22 + - 3.7 [搜索建议接口](#37-搜索建议接口)
  23 + - 3.8 [即时搜索接口](#38-即时搜索接口)
  24 + - 3.9 [获取单个文档](#39-获取单个文档)
  25 +
  26 +4. [响应格式说明](#响应格式说明)
  27 + - 4.1 [标准响应结构](#41-标准响应结构)
  28 + - 4.2 [响应字段说明](#42-响应字段说明)
  29 + - 4.2.1 [query_info 说明](#421-query_info-说明)
  30 + - 4.3 [SpuResult字段说明](#43-spuresult字段说明)
  31 + - 4.4 [SkuResult字段说明](#44-skuresult字段说明)
  32 + - 4.5 [多语言字段说明](#45-多语言字段说明)
  33 +
  34 +5. [索引接口](#索引接口)
  35 + - 5.0 [为租户创建索引](#50-为租户创建索引)
  36 + - 5.1 [全量索引接口](#51-全量索引接口)
  37 + - 5.2 [增量索引接口](#52-增量索引接口)
  38 + - 5.3 [查询文档接口](#53-查询文档接口)
  39 + - 5.4 [索引健康检查接口](#54-索引健康检查接口)
  40 +
  41 +6. [管理接口](#管理接口)
  42 + - 6.1 [健康检查](#61-健康检查)
  43 + - 6.2 [获取配置](#62-获取配置)
  44 + - 6.3 [索引统计](#63-索引统计)
  45 +
  46 +7. [常见场景示例](#常见场景示例)
  47 + - 7.1 [基础搜索与排序](#71-基础搜索与排序)
  48 + - 7.2 [过滤搜索](#72-过滤搜索)
  49 + - 7.3 [分面搜索](#73-分面搜索)
  50 + - 7.4 [规格过滤与分面](#74-规格过滤与分面)
  51 + - 7.5 [SKU筛选](#75-sku筛选)
  52 + - 7.6 [布尔表达式搜索](#76-布尔表达式搜索)
  53 + - 7.7 [分页查询](#77-分页查询)
  54 +
  55 +8. [数据模型](#数据模型)
  56 + - 8.1 [商品字段定义](#81-商品字段定义)
  57 + - 8.2 [字段类型速查](#82-字段类型速查)
  58 + - 8.3 [常用字段列表](#83-常用字段列表)
  59 + - 8.4 [支持的分析器](#84-支持的分析器)
  60 +
  61 +---
  62 +
  63 +## 快速开始
  64 +
  65 +### 1.1 基础信息
  66 +
  67 +- **Base URL**: `http://120.76.41.98:6002`
  68 +- **协议**: HTTP/HTTPS
  69 +- **数据格式**: JSON
  70 +- **字符编码**: UTF-8
  71 +- **请求方法**: POST(搜索接口)
  72 +
  73 +**重要提示**: `tenant_id` 通过 HTTP Header `X-Tenant-ID` 传递,不在请求体中。
  74 +
  75 +### 1.2 最简单的搜索请求
  76 +
  77 +```bash
  78 +curl -X POST "http://120.76.41.98:6002/search/" \
  79 + -H "Content-Type: application/json" \
  80 + -H "X-Tenant-ID: 162" \
  81 + -d '{"query": "芭比娃娃"}'
  82 +```
  83 +
  84 +### 1.3 带过滤与分页的搜索
  85 +
  86 +```bash
  87 +curl -X POST "http://120.76.41.98:6002/search/" \
  88 + -H "Content-Type: application/json" \
  89 + -H "X-Tenant-ID: 162" \
  90 + -d '{
  91 + "query": "芭比娃娃",
  92 + "size": 5,
  93 + "from": 10,
  94 + "range_filters": {
  95 + "min_price": {
  96 + "gte": 50,
  97 + "lte": 200
  98 + },
  99 + "create_time": {
  100 + "gte": "2020-01-01T00:00:00Z"
  101 + }
  102 + },
  103 + "sort_by": "price",
  104 + "sort_order": "asc"
  105 + }'
  106 +```
  107 +
  108 +### 1.4 开启分面的搜索
  109 +
  110 +```bash
  111 +curl -X POST "http://120.76.41.98:6002/search/" \
  112 + -H "Content-Type: application/json" \
  113 + -H "X-Tenant-ID: 162" \
  114 + -d '{
  115 + "query": "芭比娃娃",
  116 + "facets": [
  117 + {"field": "category1_name", "size": 10, "type": "terms"},
  118 + {"field": "specifications.color", "size": 10, "type": "terms"},
  119 + {"field": "specifications.size", "size": 10, "type": "terms"}
  120 + ],
  121 + "min_score": 0.2
  122 + }'
  123 +```
  124 +
  125 +---
  126 +
  127 +## 接口概览
  128 +
  129 +| 接口 | HTTP Method | Endpoint | 说明 |
  130 +|------|------|------|------|
  131 +| 搜索 | POST | `/search/` | 执行搜索查询 |
  132 +| 搜索建议 | GET | `/search/suggestions` | 搜索建议(框架,暂未实现) ⚠️ TODO |
  133 +| 即时搜索 | GET | `/search/instant` | 边输入边搜索(框架) ⚠️ TODO |
  134 +| 获取文档 | GET | `/search/{doc_id}` | 获取单个文档 |
  135 +| 全量索引 | POST | `/indexer/reindex` | 全量索引接口(导入数据,不删除索引) |
  136 +| 增量索引 | POST | `/indexer/index` | 增量索引接口(指定SPU ID列表进行索引,支持自动检测删除和显式删除) |
  137 +| 查询文档 | POST | `/indexer/documents` | 查询SPU文档数据(不写入ES) |
  138 +| 索引健康检查 | GET | `/indexer/health` | 检查索引服务状态 |
  139 +| 健康检查 | GET | `/admin/health` | 服务健康检查 |
  140 +| 获取配置 | GET | `/admin/config` | 获取租户配置 |
  141 +| 索引统计 | GET | `/admin/stats` | 获取索引统计信息 |
  142 +
  143 +---
  144 +
  145 +## 搜索接口
  146 +
  147 +### 3.1 接口信息
  148 +
  149 +- **端点**: `POST /search/`
  150 +- **描述**: 执行文本搜索查询,支持多语言、布尔表达式、过滤器和分面搜索
  151 +
  152 +### 3.2 请求参数
  153 +
  154 +#### 完整请求体结构
  155 +
  156 +```json
  157 +{
  158 + "query": "string (required)",
  159 + "size": 10,
  160 + "from": 0,
  161 + "language": "zh",
  162 + "filters": {},
  163 + "range_filters": {},
  164 + "facets": [],
  165 + "sort_by": "string",
  166 + "sort_order": "desc",
  167 + "min_score": 0.0,
  168 + "sku_filter_dimension": ["string"],
  169 + "debug": false,
  170 + "enable_rerank": false,
  171 + "rerank_query_template": "{query}",
  172 + "rerank_doc_template": "{title}",
  173 + "user_id": "string",
  174 + "session_id": "string"
  175 +}
  176 +```
  177 +
  178 +#### 参数详细说明
  179 +
  180 +| 参数 | 类型 | 必填 | 默认值 | 说明 |
  181 +|------|------|------|--------|------|
  182 +| `query` | string | Y | - | 搜索查询字符串,支持布尔表达式(AND, OR, RANK, ANDNOT) |
  183 +| `size` | integer | N | 10 | 返回结果数量(1-100) |
  184 +| `from` | integer | N | 0 | 分页偏移量(用于分页) |
  185 +| `language` | string | N | "zh" | 返回语言:`zh`(中文)或 `en`(英文)。后端会根据此参数选择对应的中英文字段返回 |
  186 +| `filters` | object | N | null | 精确匹配过滤器(见[过滤器详解](#33-过滤器详解)) |
  187 +| `range_filters` | object | N | null | 数值范围过滤器(见[过滤器详解](#33-过滤器详解)) |
  188 +| `facets` | array | N | null | 分面配置(见[分面配置](#34-分面配置)) |
  189 +| `sort_by` | string | N | null | 排序字段名。支持:`price`(价格)、`sales`(销量)、`create_time`(创建时间)、`update_time`(更新时间)。默认按相关性排序 |
  190 +| `sort_order` | string | N | "desc" | 排序方向:`asc`(升序)或 `desc`(降序)。注意:`price`+`asc`=价格从低到高,`price`+`desc`=价格从高到低(后端自动映射为min_price或max_price) |
  191 +| `min_score` | float | N | null | 最小相关性分数阈值 |
  192 +| `sku_filter_dimension` | array[string] | N | null | 子SKU筛选维度列表(见[SKU筛选维度](#35-sku筛选维度)) |
  193 +| `debug` | boolean | N | false | 是否返回调试信息 |
  194 +| `enable_rerank` | boolean | N | false | 是否开启重排(调用外部重排服务对 ES 结果进行二次排序)。开启后若 `from+size<=rerank_window` 才会触发重排 |
  195 +| `rerank_query_template` | string | N | null | 重排 query 模板(可选)。支持 `{query}` 占位符;不传则使用服务端配置 |
  196 +| `rerank_doc_template` | string | N | null | 重排 doc 模板(可选)。支持 `{title} {brief} {vendor} {description} {category_path}`;不传则使用服务端配置 |
  197 +| `user_id` | string | N | null | 用户ID(用于个性化,预留) |
  198 +| `session_id` | string | N | null | 会话ID(用于分析,预留) |
  199 +
  200 +### 3.3 过滤器详解
  201 +
  202 +#### 3.3.1 精确匹配过滤器 (filters)
  203 +
  204 +用于精确匹配或多值匹配。对于普通字段,数组表示 OR 逻辑(匹配任意一个值);对于 specifications 字段,按维度分组处理。**任意字段名加 `_all` 后缀**表示多值 AND 逻辑(必须同时匹配所有值)。
  205 +
  206 +**格式**:
  207 +```json
  208 +{
  209 + "filters": {
  210 + "category_name": "手机", // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  211 + "category1_name": "服装", // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  212 + "category2_name": "男装", // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  213 + "category3_name": "衬衫", // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  214 + "vendor.zh.keyword": ["奇乐", "品牌A"], // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  215 + "tags": "手机", // 可以为单值 或者 数组 匹配数组中任意一个(OR)
  216 + "tags_all": ["手机", "促销", "新品"], // *_all:多值为 AND,必须同时包含所有标签
  217 + "category1_name_all": ["服装", "男装"], // 同上,适用于任意可过滤字段
  218 + // specifications 嵌套过滤(特殊格式)
  219 + "specifications": {
  220 + "name": "color",
  221 + "value": "white"
  222 + }
  223 + }
  224 +}
  225 +```
  226 +
  227 +**支持的值类型**:
  228 +- 字符串:精确匹配
  229 +- 整数:精确匹配
  230 +- 布尔值:精确匹配
  231 +- 数组:匹配任意值(OR 逻辑);若字段名以 `_all` 结尾,则数组表示 AND 逻辑(必须同时匹配所有值)
  232 +- 对象:specifications 嵌套过滤(见下文)
  233 +
  234 +**`*_all` 语义(多值 AND)**:
  235 +- 任意过滤字段均可使用 `_all` 后缀,对应 ES 字段名为去掉 `_all` 后的名称。
  236 +- 例如:`tags_all: ["A", "B"]` 表示文档的 `tags` 必须**同时包含** A 和 B;`vendor.zh.keyword_all: ["奇乐", "品牌A"]` 表示同时匹配两个品牌(通常用于 keyword 多值场景)。
  237 +- `specifications_all`:传列表 `[{"name":"color","value":"white"},{"name":"size","value":"256GB"}]` 时,表示所有列出的规格条件都要满足(与 `specifications` 多维度时的 AND 一致;若同维度多值则要求文档同时满足多个值,一般用于嵌套多值场景)。
  238 +
  239 +**Specifications 嵌套过滤**:
  240 +
  241 +`specifications` 是嵌套字段,支持按规格名称和值进行过滤。
  242 +
  243 +**单个规格过滤**:
  244 +```json
  245 +{
  246 + "filters": {
  247 + "specifications": {
  248 + "name": "color",
  249 + "value": "white"
  250 + }
  251 + }
  252 +}
  253 +```
  254 +查询规格名称为"color"且值为"white"的商品。
  255 +
  256 +**多个规格过滤(按维度分组)**:
  257 +```json
  258 +{
  259 + "filters": {
  260 + "specifications": [
  261 + {"name": "color", "value": "white"},
  262 + {"name": "size", "value": "256GB"}
  263 + ]
  264 + }
  265 +}
  266 +```
  267 +查询同时满足所有规格的商品(color=white **且** size=256GB)。
  268 +
  269 +**相同维度的多个值(OR 逻辑)**:
  270 +```json
  271 +{
  272 + "filters": {
  273 + "specifications": [
  274 + {"name": "size", "value": "3"},
  275 + {"name": "size", "value": "4"},
  276 + {"name": "size", "value": "5"},
  277 + {"name": "color", "value": "green"}
  278 + ]
  279 + }
  280 +}
  281 +```
  282 +查询满足 (size=3 **或** size=4 **或** size=5) **且** color=green 的商品。
  283 +
  284 +**过滤逻辑说明**:
  285 +- **不同维度**(不同的 `name`)之间是 **AND** 关系(求交集)
  286 +- **相同维度**(相同的 `name`)的多个值之间是 **OR** 关系(求并集)
  287 +
  288 +**常用过滤字段**(详见[常用字段列表](#83-常用字段列表)):
  289 +- `category_name`: 类目名称
  290 +- `category1_name`, `category2_name`, `category3_name`: 多级类目
  291 +- `category_id`: 类目ID
  292 +- `vendor.zh.keyword`, `vendor.en.keyword`: 供应商/品牌(使用keyword子字段)
  293 +- `tags`: 标签(keyword类型,支持数组)
  294 +- `option1_name`, `option2_name`, `option3_name`: 选项名称
  295 +- `specifications`: 规格过滤(嵌套字段,格式见上文)
  296 +- 以上任意字段均可加 `_all` 后缀表示多值 AND,如 `tags_all`、`category1_name_all`。
  297 +
  298 +#### 3.3.2 范围过滤器 (range_filters)
  299 +
  300 +用于数值字段的范围过滤。
  301 +
  302 +**格式**:
  303 +```json
  304 +{
  305 + "range_filters": {
  306 + "min_price": {
  307 + "gte": 50, // 大于等于
  308 + "lte": 200 // 小于等于
  309 + },
  310 + "max_price": {
  311 + "gt": 100 // 大于
  312 + },
  313 + "create_time": {
  314 + "gte": "2024-01-01T00:00:00Z" // 日期时间字符串
  315 + }
  316 + }
  317 +}
  318 +```
  319 +
  320 +**支持的操作符**:
  321 +- `gte`: 大于等于 (>=)
  322 +- `gt`: 大于 (>)
  323 +- `lte`: 小于等于 (<=)
  324 +- `lt`: 小于 (<)
  325 +
  326 +**注意**: 至少需要指定一个操作符。
  327 +
  328 +**常用范围字段**(详见[常用字段列表](#83-常用字段列表)):
  329 +- `min_price`: 最低价格
  330 +- `max_price`: 最高价格
  331 +- `compare_at_price`: 原价
  332 +- `create_time`: 创建时间
  333 +- `update_time`: 更新时间
  334 +
  335 +### 3.4 分面配置
  336 +
  337 +用于生成分面统计(分组聚合),常用于构建筛选器UI。
  338 +
  339 +#### 3.4.1 配置格式
  340 +
  341 +```json
  342 +{
  343 + "facets": [
  344 + {
  345 + "field": "category1_name",
  346 + "size": 15,
  347 + "type": "terms",
  348 + "disjunctive": false
  349 + },
  350 + {
  351 + "field": "brand_name",
  352 + "size": 10,
  353 + "type": "terms",
  354 + "disjunctive": true
  355 + },
  356 + {
  357 + "field": "specifications.color",
  358 + "size": 20,
  359 + "type": "terms",
  360 + "disjunctive": true
  361 + },
  362 + {
  363 + "field": "min_price",
  364 + "type": "range",
  365 + "ranges": [
  366 + {"key": "0-50", "to": 50},
  367 + {"key": "50-100", "from": 50, "to": 100},
  368 + {"key": "100-200", "from": 100, "to": 200},
  369 + {"key": "200+", "from": 200}
  370 + ]
  371 + }
  372 + ]
  373 +}
  374 +```
  375 +
  376 +#### 3.4.2 Facet 字段说明
  377 +
  378 +| 字段 | 类型 | 必填 | 默认值 | 说明 |
  379 +|------|------|------|--------|------|
  380 +| `field` | string | 是 | - | 分面字段名 |
  381 +| `size` | int | 否 | 10 | 返回的分面值数量(1-100) |
  382 +| `type` | string | 否 | "terms" | 分面类型:`terms`(词条聚合)或 `range`(范围聚合) |
  383 +| `disjunctive` | bool | 否 | false | 是否支持多选(disjunctive faceting)。启用后,选中该分面的过滤器时,仍会显示其他可选项 |
  384 +| `ranges` | array | 否 | null | 范围配置(仅 `type="range"` 时需要) |
  385 +
  386 +#### 3.4.3 disjunctive字段说明
  387 +
  388 +**重要特性**: `disjunctive` 字段控制分面的行为模式。启用后,选中该分面的过滤器时,仍会显示其他可选项
  389 +
  390 +**标准模式 (disjunctive: false)**:
  391 +- **行为**: 选中某个分面值后,该分面只显示选中的值
  392 +- **适用场景**: 层级类目、互斥选择
  393 +- **示例**: 类目下钻(玩具 > 娃娃 > 芭比)
  394 +
  395 +**Multi-Select 模式 (disjunctive: true)** ⭐:
  396 +- **行为**: 选中某个分面值后,该分面仍显示所有可选项
  397 +- **适用场景**: 颜色、品牌、尺码等可切换属性
  398 +- **示例**: 选择了"红色"后,仍能看到"蓝色"、"绿色"等选项
  399 +
  400 +**推荐配置**:
  401 +
  402 +| 分面类型 | disjunctive | 原因 |
  403 +|---------|-------------|------|
  404 +| 颜色 | `true` | 用户需要切换颜色 |
  405 +| 品牌 | `true` | 用户需要比较品牌 |
  406 +| 尺码 | `true` | 用户需要查看其他尺码 |
  407 +| 类目 | `false` | 层级下钻 |
  408 +| 价格区间 | `false` | 互斥选择 |
  409 +
  410 +#### 3.4.4 规格分面说明
  411 +
  412 +`specifications` 是嵌套字段,支持两种分面模式:
  413 +
  414 +**模式1:所有规格名称的分面**:
  415 +```json
  416 +{
  417 + "facets": [
  418 + {
  419 + "field": "specifications",
  420 + "size": 10,
  421 + "type": "terms"
  422 + }
  423 + ]
  424 +}
  425 +```
  426 +返回所有规格名称(name)及其对应的值(value)列表。每个 name 会生成一个独立的分面结果。
  427 +
  428 +**模式2:指定规格名称的分面**:
  429 +```json
  430 +{
  431 + "facets": [
  432 + {
  433 + "field": "specifications.color",
  434 + "size": 20,
  435 + "type": "terms",
  436 + "disjunctive": true
  437 + },
  438 + {
  439 + "field": "specifications.size",
  440 + "size": 15,
  441 + "type": "terms",
  442 + "disjunctive": true
  443 + }
  444 + ]
  445 +}
  446 +```
  447 +只返回指定规格名称的值列表。格式:`specifications.{name}`,其中 `{name}` 是规格名称(如"color"、"size"、"material")。
  448 +
  449 +**返回格式示例**:
  450 +```json
  451 +{
  452 + "facets": [
  453 + {
  454 + "field": "specifications.color",
  455 + "label": "color",
  456 + "type": "terms",
  457 + "values": [
  458 + {"value": "white", "count": 50, "selected": true}, // ✓ selected 字段由后端标记
  459 + {"value": "black", "count": 30, "selected": false},
  460 + {"value": "red", "count": 20, "selected": false}
  461 + ]
  462 + },
  463 + {
  464 + "field": "specifications.size",
  465 + "label": "size",
  466 + "type": "terms",
  467 + "values": [
  468 + {"value": "256GB", "count": 40, "selected": false},
  469 + {"value": "512GB", "count": 20, "selected": false}
  470 + ]
  471 + }
  472 + ]
  473 +}
  474 +```
  475 +
  476 +### 3.5 SKU筛选维度
  477 +
  478 +**功能说明**:
  479 +`sku_filter_dimension` 用于控制搜索列表页中 **每个 SPU 下方可切换的子款式(子 SKU)维度**,为字符串列表。
  480 +在店铺的 **主题装修配置** 中,商家可以为店铺设置一个或多个子款式筛选维度(例如 `color`、`size`),前端列表页会在每个 SPU 下展示这些维度对应的子 SKU 列表,用户可以通过点击不同维度值(如不同颜色)来切换展示的子款式。
  481 +当指定 `sku_filter_dimension` 后,后端会根据店铺的这项配置,从所有 SKU 中筛选出这些维度组合对应的子 SKU 数据:系统会按指定维度**组合**对 SKU 进行分组,每个维度组合只返回第一个 SKU(从简实现,选择该组合下的第一款),其余不在这些维度组合中的子 SKU 将不返回。
  482 +
  483 +**支持的维度值**:
  484 +1. **直接选项字段**: `option1`、`option2`、`option3`
  485 + - 直接使用对应的 `option1_value`、`option2_value`、`option3_value` 字段进行分组
  486 +
  487 +2. **规格/选项名称**: 通过 `option1_name`、`option2_name`、`option3_name` 匹配
  488 + - 例如:如果 `option1_name` 为 `"color"`,则可以使用 `sku_filter_dimension: ["color"]` 来按颜色分组
  489 +
  490 +**示例**:
  491 +
  492 +**按颜色筛选(假设 option1_name = "color")**:
  493 +```json
  494 +{
  495 + "query": "芭比娃娃",
  496 + "sku_filter_dimension": ["color"]
  497 +}
  498 +```
  499 +
  500 +**按选项1筛选**:
  501 +```json
  502 +{
  503 + "query": "芭比娃娃",
  504 + "sku_filter_dimension": ["option1"]
  505 +}
  506 +```
  507 +
  508 +**按颜色 + 尺寸组合筛选(假设 option1_name = "color", option2_name = "size")**:
  509 +```json
  510 +{
  511 + "query": "芭比娃娃",
  512 + "sku_filter_dimension": ["color", "size"]
  513 +}
  514 +```
  515 +
  516 +### 3.6 布尔表达式语法
  517 +
  518 +搜索查询支持布尔表达式,提供更灵活的搜索能力。
  519 +
  520 +**支持的操作符**:
  521 +
  522 +| 操作符 | 描述 | 示例 |
  523 +|--------|------|------|
  524 +| `AND` | 所有词必须匹配 | `玩具 AND 乐高` |
  525 +| `OR` | 任意词匹配 | `芭比 OR 娃娃` |
  526 +| `ANDNOT` | 排除特定词 | `玩具 ANDNOT 电动` |
  527 +| `RANK` | 排序加权(不强制匹配) | `玩具 RANK 乐高` |
  528 +| `()` | 分组 | `玩具 AND (乐高 OR 芭比)` |
  529 +
  530 +**操作符优先级**(从高到低):
  531 +1. `()` - 括号
  532 +2. `ANDNOT` - 排除
  533 +3. `AND` - 与
  534 +4. `OR` - 或
  535 +5. `RANK` - 排序
  536 +
  537 +**示例**:
  538 +```
  539 +"芭比娃娃" // 简单查询
  540 +"玩具 AND 乐高" // AND 查询
  541 +"芭比 OR 娃娃" // OR 查询
  542 +"玩具 ANDNOT 电动" // 排除查询
  543 +"玩具 AND (乐高 OR 芭比)" // 复杂查询
  544 +```
  545 +
  546 +### 3.7 搜索建议接口
  547 +
  548 +> ⚠️ **TODO**: 此接口当前为框架实现,功能暂未实现,仅返回空结果。接口和响应格式已经固定,可平滑扩展。
  549 +
  550 +- **端点**: `GET /search/suggestions`
  551 +- **描述**: 返回搜索建议(自动补全/热词)。当前为框架实现,接口和响应格式已经固定,可平滑扩展。
  552 +
  553 +#### 查询参数
  554 +
  555 +| 参数 | 类型 | 必填 | 默认值 | 描述 |
  556 +|------|------|------|--------|------|
  557 +| `q` | string | Y | - | 查询字符串(至少 1 个字符) |
  558 +| `size` | integer | N | 5 | 返回建议数量(1-20) |
  559 +| `types` | string | N | `query` | 建议类型(逗号分隔):`query`, `product`, `category`, `brand` |
  560 +
  561 +#### 响应示例
  562 +
  563 +```json
  564 +{
  565 + "query": "芭",
  566 + "suggestions": [
  567 + {
  568 + "text": "芭比娃娃",
  569 + "type": "query",
  570 + "highlight": "<em>芭</em>比娃娃",
  571 + "popularity": 850
  572 + }
  573 + ],
  574 + "took_ms": 5
  575 +}
  576 +```
  577 +
  578 +#### 请求示例
  579 +
  580 +```bash
  581 +curl "http://localhost:6002/search/suggestions?q=芭&size=5&types=query,product"
  582 +```
  583 +
  584 +### 3.8 即时搜索接口
  585 +
  586 +> ⚠️ **TODO**: 此接口当前为框架实现,暂未做专门优化,底层直接调用标准搜索接口。后续需要优化即时搜索性能(添加防抖/节流、实现结果缓存、简化返回字段)。
  587 +
  588 +- **端点**: `GET /search/instant`
  589 +- **描述**: 边输入边搜索,采用轻量参数响应当前输入。底层复用标准搜索能力。
  590 +
  591 +#### 查询参数
  592 +
  593 +| 参数 | 类型 | 必填 | 默认值 | 描述 |
  594 +|------|------|------|--------|------|
  595 +| `q` | string | Y | - | 搜索查询(至少 2 个字符) |
  596 +| `size` | integer | N | 5 | 返回结果数量(1-20) |
  597 +
  598 +#### 请求示例
  599 +
  600 +```bash
  601 +curl "http://localhost:6002/search/instant?q=玩具&size=5"
  602 +```
  603 +
  604 +### 3.9 获取单个文档
  605 +
  606 +- **端点**: `GET /search/{doc_id}`
  607 +- **描述**: 根据文档 ID 获取单个商品详情,用于点击结果后的详情页或排查问题。
  608 +
  609 +#### 路径参数
  610 +
  611 +| 参数 | 类型 | 描述 |
  612 +|------|------|------|
  613 +| `doc_id` | string | 商品或文档 ID |
  614 +
  615 +#### 响应示例
  616 +
  617 +```json
  618 +{
  619 + "id": "12345",
  620 + "source": {
  621 + "title": {
  622 + "zh": "芭比时尚娃娃"
  623 + },
  624 + "min_price": 89.99,
  625 + "category1_name": "玩具"
  626 + }
  627 +}
  628 +```
  629 +
  630 +#### 请求示例
  631 +
  632 +```bash
  633 +curl "http://localhost:6002/search/12345"
  634 +```
  635 +
  636 +---
  637 +
  638 +## 响应格式说明
  639 +
  640 +### 4.1 标准响应结构
  641 +
  642 +```json
  643 +{
  644 + "results": [
  645 + {
  646 + "spu_id": "12345",
  647 + "title": "芭比时尚娃娃",
  648 + "brief": "高品质芭比娃娃",
  649 + "description": "详细描述...",
  650 + "vendor": "美泰",
  651 + "category": "玩具",
  652 + "category_path": "玩具/娃娃/时尚",
  653 + "category_name": "时尚",
  654 + "category_id": "cat_001",
  655 + "category_level": 3,
  656 + "category1_name": "玩具",
  657 + "category2_name": "娃娃",
  658 + "category3_name": "时尚",
  659 + "tags": ["娃娃", "玩具", "女孩"],
  660 + "price": 89.99,
  661 + "compare_at_price": 129.99,
  662 + "currency": "USD",
  663 + "image_url": "https://example.com/image.jpg",
  664 + "in_stock": true,
  665 + "sku_prices": [89.99, 99.99, 109.99],
  666 + "sku_weights": [100, 150, 200],
  667 + "sku_weight_units": ["g", "g", "g"],
  668 + "total_inventory": 500,
  669 + "option1_name": "color",
  670 + "option2_name": "size",
  671 + "option3_name": null,
  672 + "specifications": [
  673 + {"sku_id": "sku_001", "name": "color", "value": "pink"},
  674 + {"sku_id": "sku_001", "name": "size", "value": "standard"}
  675 + ],
  676 + "skus": [
  677 + {
  678 + "sku_id": "67890",
  679 + "price": 89.99,
  680 + "compare_at_price": 129.99,
  681 + "sku": "BARBIE-001",
  682 + "stock": 100,
  683 + "weight": 0.1,
  684 + "weight_unit": "kg",
  685 + "option1_value": "pink",
  686 + "option2_value": "standard",
  687 + "option3_value": null,
  688 + "image_src": "https://example.com/sku1.jpg"
  689 + }
  690 + ],
  691 + "relevance_score": 8.5
  692 + }
  693 + ],
  694 + "total": 118,
  695 + "max_score": 8.5,
  696 + "facets": [
  697 + {
  698 + "field": "category1_name",
  699 + "label": "category1_name",
  700 + "type": "terms",
  701 + "values": [
  702 + {
  703 + "value": "玩具",
  704 + "label": "玩具",
  705 + "count": 85,
  706 + "selected": false
  707 + }
  708 + ]
  709 + },
  710 + {
  711 + "field": "specifications.color",
  712 + "label": "color",
  713 + "type": "terms",
  714 + "values": [
  715 + {
  716 + "value": "pink",
  717 + "label": "pink",
  718 + "count": 30,
  719 + "selected": false
  720 + }
  721 + ]
  722 + }
  723 + ],
  724 + "query_info": {
  725 + "original_query": "芭比娃娃",
  726 + "query_normalized": "芭比娃娃",
  727 + "rewritten_query": "芭比娃娃",
  728 + "detected_language": "zh",
  729 + "translations": {
  730 + "en": "barbie doll"
  731 + },
  732 + "domain": "default"
  733 + },
  734 + "suggestions": [],
  735 + "related_searches": [],
  736 + "took_ms": 45,
  737 + "performance_info": null,
  738 + "debug_info": null
  739 +}
  740 +```
  741 +
  742 +### 4.2 响应字段说明
  743 +
  744 +| 字段 | 类型 | 说明 |
  745 +|------|------|------|
  746 +| `results` | array | 搜索结果列表(SpuResult对象数组) |
  747 +| `results[].spu_id` | string | SPU ID |
  748 +| `results[].title` | string | 商品标题 |
  749 +| `results[].price` | float | 价格(min_price) |
  750 +| `results[].skus` | array | SKU列表(如果指定了`sku_filter_dimension`,则按维度过滤后的SKU) |
  751 +| `results[].relevance_score` | float | 相关性分数 |
  752 +| `total` | integer | 匹配的总文档数 |
  753 +| `max_score` | float | 最高相关性分数 |
  754 +| `facets` | array | 分面统计结果 |
  755 +| `query_info` | object | query处理信息 |
  756 +| `took_ms` | integer | 搜索耗时(毫秒) |
  757 +
  758 +#### 4.2.1 query_info 说明
  759 +
  760 +`query_info` 包含本次搜索的查询解析与处理结果:
  761 +
  762 +| 子字段 | 类型 | 说明 |
  763 +|--------|------|------|
  764 +| `original_query` | string | 用户原始查询 |
  765 +| `query_normalized` | string | 归一化后的查询(去空白、大小写等预处理,用于后续解析与改写) |
  766 +| `rewritten_query` | string | 重写后的查询(同义词/词典扩展等) |
  767 +| `detected_language` | string | 检测到的查询语言(如 `zh`、`en`) |
  768 +| `translations` | object | 翻译结果,键为语言代码,值为翻译文本 |
  769 +| `domain` | string | 查询域(如 `default`、`title`、`brand` 等) |
  770 +
  771 +### 4.3 SpuResult字段说明
  772 +
  773 +| 字段 | 类型 | 说明 |
  774 +|------|------|------|
  775 +| `spu_id` | string | SPU ID |
  776 +| `title` | string | 商品标题(根据language参数自动选择 `title.zh` 或 `title.en`) |
  777 +| `brief` | string | 商品短描述(根据language参数自动选择) |
  778 +| `description` | string | 商品详细描述(根据language参数自动选择) |
  779 +| `vendor` | string | 供应商/品牌(根据language参数自动选择) |
  780 +| `category` | string | 类目(兼容字段,等同于category_name) |
  781 +| `category_path` | string | 类目路径(多级,用于面包屑,根据language参数自动选择) |
  782 +| `category_name` | string | 类目名称(展示用,根据language参数自动选择) |
  783 +| `category_id` | string | 类目ID |
  784 +| `category_level` | integer | 类目层级(1/2/3) |
  785 +| `category1_name` | string | 一级类目名称 |
  786 +| `category2_name` | string | 二级类目名称 |
  787 +| `category3_name` | string | 三级类目名称 |
  788 +| `tags` | array[string] | 标签列表 |
  789 +| `price` | float | 价格(min_price) |
  790 +| `compare_at_price` | float | 原价 |
  791 +| `currency` | string | 货币单位(默认USD) |
  792 +| `image_url` | string | 主图URL |
  793 +| `in_stock` | boolean | 是否有库存(任意SKU有库存即为true) |
  794 +| `sku_prices` | array[float] | 所有SKU价格列表 |
  795 +| `sku_weights` | array[integer] | 所有SKU重量列表 |
  796 +| `sku_weight_units` | array[string] | 所有SKU重量单位列表 |
  797 +| `total_inventory` | integer | 总库存 |
  798 +| `sales` | integer | 销量(展示销量) |
  799 +| `option1_name` | string | 选项1名称(如"color") |
  800 +| `option2_name` | string | 选项2名称(如"size") |
  801 +| `option3_name` | string | 选项3名称 |
  802 +| `specifications` | array[object] | 规格列表(与ES specifications字段对应) |
  803 +| `skus` | array | SKU 列表 |
  804 +| `relevance_score` | float | 相关性分数(默认为 ES 原始分数;当开启 AI 搜索时为融合后的最终分数) |
  805 +
  806 +### 4.4 SkuResult字段说明
  807 +
  808 +| 字段 | 类型 | 说明 |
  809 +|------|------|------|
  810 +| `sku_id` | string | SKU ID |
  811 +| `price` | float | 价格 |
  812 +| `compare_at_price` | float | 原价 |
  813 +| `sku` | string | SKU编码(sku_code) |
  814 +| `stock` | integer | 库存数量 |
  815 +| `weight` | float | 重量 |
  816 +| `weight_unit` | string | 重量单位 |
  817 +| `option1_value` | string | 选项1取值(如color值) |
  818 +| `option2_value` | string | 选项2取值(如size值) |
  819 +| `option3_value` | string | 选项3取值 |
  820 +| `image_src` | string | SKU图片地址 |
  821 +
  822 +### 4.5 多语言字段说明
  823 +
  824 +- `title`, `brief`, `description`, `vendor`, `category_path`, `category_name` 会根据请求的 `language` 参数自动选择对应的中英文字段
  825 +- `language="zh"`: 优先返回 `*_zh` 字段,如果为空则回退到 `*_en` 字段
  826 +- `language="en"`: 优先返回 `*_en` 字段,如果为空则回退到 `*_zh` 字段
  827 +
  828 +---
  829 +
  830 +## 索引接口
  831 +
  832 +### 5.0 为租户创建索引
  833 +
  834 +为租户创建索引需要两个步骤:
  835 +
  836 +1. **创建索引结构**(可选,仅在需要更新 mapping 时执行)
  837 + - 使用脚本创建 ES 索引结构(基于 `mappings/search_products.json`)
  838 + - 如果索引已存在,会提示用户确认(会删除现有数据)
  839 +
  840 +2. **导入数据**(必需)
  841 + - 使用全量索引接口 `/indexer/reindex` 导入数据
  842 +
  843 +**创建索引结构**:
  844 +
  845 +```bash
  846 +./scripts/create_tenant_index.sh 170
  847 +```
  848 +
  849 +脚本会自动从项目根目录的 `.env` 文件加载 ES 配置。
  850 +
  851 +**注意事项**:
  852 +- ⚠️ 如果索引已存在,脚本会提示确认,确认后会删除现有数据
  853 +- 创建索引后,**必须**调用 `/indexer/reindex` 导入数据
  854 +- 如果只是更新数据而不需要修改索引结构,直接使用 `/indexer/reindex` 即可
  855 +
  856 +---
  857 +
  858 +### 5.1 全量索引接口
  859 +
  860 +- **端点**: `POST /indexer/reindex`
  861 +- **描述**: 全量索引,将指定租户的所有SPU数据导入到ES索引(不会删除现有索引)
  862 +
  863 +#### 请求参数
  864 +
  865 +```json
  866 +{
  867 + "tenant_id": "162",
  868 + "batch_size": 500
  869 +}
  870 +```
  871 +
  872 +| 参数 | 类型 | 必填 | 默认值 | 说明 |
  873 +|------|------|------|--------|------|
  874 +| `tenant_id` | string | Y | - | 租户ID |
  875 +| `batch_size` | integer | N | 500 | 批量导入大小 |
  876 +
  877 +#### 响应格式
  878 +
  879 +**成功响应(200 OK)**:
  880 +```json
  881 +{
  882 + "success": true,
  883 + "total": 1000,
  884 + "indexed": 1000,
  885 + "failed": 0,
  886 + "elapsed_time": 12.34,
  887 + "index_name": "search_products",
  888 + "tenant_id": "162"
  889 +}
  890 +```
  891 +
  892 +**错误响应**:
  893 +- `400 Bad Request`: 参数错误
  894 +- `503 Service Unavailable`: 服务未初始化
  895 +
  896 +#### 请求示例
  897 +
  898 +**全量索引(不会删除现有索引)**:
  899 +```bash
  900 +curl -X POST "http://localhost:6004/indexer/reindex" \
  901 + -H "Content-Type: application/json" \
  902 + -d '{
  903 + "tenant_id": "162",
  904 + "batch_size": 500
  905 + }'
  906 +```
  907 +
  908 +**查看日志**:
  909 +```bash
  910 +# 查看API日志(包含索引操作日志)
  911 +tail -f logs/api.log
  912 +
  913 +# 或者查看所有日志文件
  914 +tail -f logs/*.log
  915 +```
  916 +
  917 +> ⚠️ **重要提示**:如需 **创建索引结构**,请参考 [5.0 为租户创建索引](#50-为租户创建索引) 章节,使用 `scripts/create_tenant_index.sh` 脚本。创建后需要调用 `/indexer/reindex` 导入数据。
  918 +
  919 +**查看索引日志**:
  920 +
  921 +索引操作的所有关键信息都会记录到 `logs/indexer.log` 文件中(JSON 格式),包括:
  922 +- 请求开始和结束时间
  923 +- 租户ID、SPU ID、操作类型
  924 +- 每个SPU的处理状态
  925 +- ES批量写入结果
  926 +- 成功/失败统计和详细错误信息
  927 +
  928 +```bash
  929 +# 实时查看索引日志(包含全量和增量索引的所有操作)
  930 +tail -f logs/indexer.log
  931 +
  932 +# 使用 grep 查询(简单方式)
  933 +# 查看全量索引日志
  934 +grep "\"index_type\":\"bulk\"" logs/indexer.log | tail -100
  935 +
  936 +# 查看增量索引日志
  937 +grep "\"index_type\":\"incremental\"" logs/indexer.log | tail -100
  938 +
  939 +# 查看特定租户的索引日志
  940 +grep "\"tenant_id\":\"162\"" logs/indexer.log | tail -100
  941 +
  942 +# 使用 jq 查询(推荐,更精确的 JSON 查询)
  943 +# 安装 jq: sudo apt-get install jq 或 brew install jq
  944 +
  945 +# 查看全量索引日志
  946 +cat logs/indexer.log | jq 'select(.index_type == "bulk")' | tail -100
  947 +
  948 +# 查看增量索引日志
  949 +cat logs/indexer.log | jq 'select(.index_type == "incremental")' | tail -100
  950 +
  951 +# 查看特定租户的索引日志
  952 +cat logs/indexer.log | jq 'select(.tenant_id == "162")' | tail -100
  953 +
  954 +# 查看失败的索引操作
  955 +cat logs/indexer.log | jq 'select(.operation == "request_complete" and .failed_count > 0)'
  956 +
  957 +# 查看特定SPU的处理日志
  958 +cat logs/indexer.log | jq 'select(.spu_id == "123")'
  959 +
  960 +# 查看最近的索引请求统计
  961 +cat logs/indexer.log | jq 'select(.operation == "request_complete") | {timestamp, index_type, tenant_id, total_count, success_count, failed_count, elapsed_time}'
  962 +```
  963 +
  964 +### 5.2 增量索引接口
  965 +
  966 +- **端点**: `POST /indexer/index`
  967 +- **描述**: 增量索引接口,根据指定的SPU ID列表进行索引,直接将数据写入ES。用于增量更新指定商品。
  968 +
  969 +**删除说明**:
  970 +- `spu_ids`中的SPU:如果数据库`deleted=1`,自动从ES删除,响应状态为`deleted`
  971 +- `delete_spu_ids`中的SPU:直接删除,响应状态为`deleted`、`not_found`或`failed`
  972 +
  973 +#### 请求参数
  974 +
  975 +```json
  976 +{
  977 + "tenant_id": "162",
  978 + "spu_ids": ["123", "456", "789"],
  979 + "delete_spu_ids": ["100", "101"]
  980 +}
  981 +```
  982 +
  983 +| 参数 | 类型 | 必填 | 说明 |
  984 +|------|------|------|------|
  985 +| `tenant_id` | string | Y | 租户ID |
  986 +| `spu_ids` | array[string] | N | SPU ID列表(1-100个),要索引的SPU。如果为空,则只执行删除操作 |
  987 +| `delete_spu_ids` | array[string] | N | 显式指定要删除的SPU ID列表(1-100个),可选。无论数据库状态如何,都会从ES中删除这些SPU |
  988 +
  989 +**注意**:
  990 +- `spu_ids` 和 `delete_spu_ids` 不能同时为空
  991 +- 每个列表最多支持100个SPU ID
  992 +- 如果SPU在`spu_ids`中且数据库`deleted=1`,会自动从ES删除(自动检测删除)
  993 +
  994 +#### 响应格式
  995 +
  996 +```json
  997 +{
  998 + "spu_ids": [
  999 + {
  1000 + "spu_id": "123",
  1001 + "status": "indexed"
  1002 + },
  1003 + {
  1004 + "spu_id": "456",
  1005 + "status": "deleted"
  1006 + },
  1007 + {
  1008 + "spu_id": "789",
  1009 + "status": "failed",
  1010 + "msg": "SPU not found (unexpected)"
  1011 + }
  1012 + ],
  1013 + "delete_spu_ids": [
  1014 + {
  1015 + "spu_id": "100",
  1016 + "status": "deleted"
  1017 + },
  1018 + {
  1019 + "spu_id": "101",
  1020 + "status": "not_found"
  1021 + },
  1022 + {
  1023 + "spu_id": "102",
  1024 + "status": "failed",
  1025 + "msg": "Failed to delete from ES: Connection timeout"
  1026 + }
  1027 + ],
  1028 + "total": 6,
  1029 + "success_count": 4,
  1030 + "failed_count": 2,
  1031 + "elapsed_time": 1.23,
  1032 + "index_name": "search_products",
  1033 + "tenant_id": "162"
  1034 +}
  1035 +```
  1036 +
  1037 +| 字段 | 类型 | 说明 |
  1038 +|------|------|------|
  1039 +| `spu_ids` | array | spu_ids对应的响应列表,每个元素包含 `spu_id` 和 `status` |
  1040 +| `spu_ids[].status` | string | 状态:`indexed`(已索引)、`deleted`(已删除,自动检测)、`failed`(失败) |
  1041 +| `spu_ids[].msg` | string | 当status为`failed`时,包含失败原因(可选) |
  1042 +| `delete_spu_ids` | array | delete_spu_ids对应的响应列表,每个元素包含 `spu_id` 和 `status` |
  1043 +| `delete_spu_ids[].status` | string | 状态:`deleted`(已删除)、`not_found`(ES中不存在)、`failed`(失败) |
  1044 +| `delete_spu_ids[].msg` | string | 当status为`failed`时,包含失败原因(可选) |
  1045 +| `total` | integer | 总处理数量(spu_ids数量 + delete_spu_ids数量) |
  1046 +| `success_count` | integer | 成功数量(indexed + deleted + not_found) |
  1047 +| `failed_count` | integer | 失败数量 |
  1048 +| `elapsed_time` | float | 耗时(秒) |
  1049 +| `index_name` | string | 索引名称 |
  1050 +| `tenant_id` | string | 租户ID |
  1051 +
  1052 +**状态说明**:
  1053 +- `spu_ids` 的状态:
  1054 + - `indexed`: SPU已成功索引到ES
  1055 + - `deleted`: SPU在数据库中被标记为deleted=1,已从ES删除(自动检测)
  1056 + - `failed`: 处理失败,会包含`msg`字段说明失败原因
  1057 +- `delete_spu_ids` 的状态:
  1058 + - `deleted`: SPU已从ES成功删除
  1059 + - `not_found`: SPU在ES中不存在(也算成功,可能已经被删除过)
  1060 + - `failed`: 删除失败,会包含`msg`字段说明失败原因
  1061 +
  1062 +#### 请求示例
  1063 +
  1064 +**示例1:普通增量索引(自动检测删除)**:
  1065 +```bash
  1066 +curl -X POST "http://localhost:6004/indexer/index" \
  1067 + -H "Content-Type: application/json" \
  1068 + -d '{
  1069 + "tenant_id": "162",
  1070 + "spu_ids": ["123", "456", "789"]
  1071 + }'
  1072 +```
  1073 +说明:如果SPU 456在数据库中`deleted=1`,会自动从ES删除,在响应中`spu_ids`列表里456的状态为`deleted`。
  1074 +
  1075 +**示例2:显式删除(批量删除)**:
  1076 +```bash
  1077 +curl -X POST "http://localhost:6004/indexer/index" \
  1078 + -H "Content-Type: application/json" \
  1079 + -d '{
  1080 + "tenant_id": "162",
  1081 + "spu_ids": ["123", "456"],
  1082 + "delete_spu_ids": ["100", "101", "102"]
  1083 + }'
  1084 +```
  1085 +说明:SPU 100、101、102会被显式删除,无论数据库状态如何。
  1086 +
  1087 +**示例3:仅删除(不索引)**:
  1088 +```bash
  1089 +curl -X POST "http://localhost:6004/indexer/index" \
  1090 + -H "Content-Type: application/json" \
  1091 + -d '{
  1092 + "tenant_id": "162",
  1093 + "spu_ids": [],
  1094 + "delete_spu_ids": ["100", "101"]
  1095 + }'
  1096 +```
  1097 +说明:只执行删除操作,不进行索引。
  1098 +
  1099 +**示例4:混合操作(索引+删除)**:
  1100 +```bash
  1101 +curl -X POST "http://localhost:6004/indexer/index" \
  1102 + -H "Content-Type: application/json" \
  1103 + -d '{
  1104 + "tenant_id": "162",
  1105 + "spu_ids": ["123", "456", "789"],
  1106 + "delete_spu_ids": ["100", "101"]
  1107 + }'
  1108 +```
  1109 +说明:同时执行索引和删除操作。
  1110 +
  1111 +#### 日志说明
  1112 +
  1113 +增量索引操作的所有关键信息都会记录到 `logs/indexer.log` 文件中(JSON格式),包括:
  1114 +- 请求开始和结束时间
  1115 +- 每个SPU的处理状态(获取、转换、索引、删除)
  1116 +- ES批量写入结果
  1117 +- 成功/失败统计
  1118 +- 详细的错误信息
  1119 +
  1120 +日志查询方式请参考[5.1节查看索引日志](#51-全量索引接口)部分。
  1121 +
  1122 +### 5.3 查询文档接口
  1123 +
  1124 +- **端点**: `POST /indexer/documents`
  1125 +- **描述**: 查询文档接口,根据SPU ID列表获取ES文档数据(**不写入ES**)。用于查看、调试或验证SPU数据。
  1126 +
  1127 +#### 请求参数
  1128 +
  1129 +```json
  1130 +{
  1131 + "tenant_id": "162",
  1132 + "spu_ids": ["123", "456", "789"]
  1133 +}
  1134 +```
  1135 +
  1136 +| 参数 | 类型 | 必填 | 说明 |
  1137 +|------|------|------|------|
  1138 +| `tenant_id` | string | Y | 租户ID |
  1139 +| `spu_ids` | array[string] | Y | SPU ID列表(1-100个) |
  1140 +
  1141 +#### 响应格式
  1142 +
  1143 +```json
  1144 +{
  1145 + "success": [
  1146 + {
  1147 + "spu_id": "123",
  1148 + "document": {
  1149 + "tenant_id": "162",
  1150 + "spu_id": "123",
  1151 + "title": {
  1152 + "zh": "商品标题"
  1153 + },
  1154 + ...
  1155 + }
  1156 + },
  1157 + {
  1158 + "spu_id": "456",
  1159 + "document": {...}
  1160 + }
  1161 + ],
  1162 + "failed": [
  1163 + {
  1164 + "spu_id": "789",
  1165 + "error": "SPU not found or deleted"
  1166 + }
  1167 + ],
  1168 + "total": 3,
  1169 + "success_count": 2,
  1170 + "failed_count": 1
  1171 +}
  1172 +```
  1173 +
  1174 +| 字段 | 类型 | 说明 |
  1175 +|------|------|------|
  1176 +| `success` | array | 成功获取的SPU列表,每个元素包含 `spu_id` 和 `document`(完整的ES文档数据) |
  1177 +| `failed` | array | 失败的SPU列表,每个元素包含 `spu_id` 和 `error`(失败原因) |
  1178 +| `total` | integer | 总SPU数量 |
  1179 +| `success_count` | integer | 成功数量 |
  1180 +| `failed_count` | integer | 失败数量 |
  1181 +
  1182 +#### 请求示例
  1183 +
  1184 +**单个SPU查询**:
  1185 +```bash
  1186 +curl -X POST "http://localhost:6004/indexer/documents" \
  1187 + -H "Content-Type: application/json" \
  1188 + -d '{
  1189 + "tenant_id": "162",
  1190 + "spu_ids": ["123"]
  1191 + }'
  1192 +```
  1193 +
  1194 +**批量SPU查询**:
  1195 +```bash
  1196 +curl -X POST "http://localhost:6004/indexer/documents" \
  1197 + -H "Content-Type: application/json" \
  1198 + -d '{
  1199 + "tenant_id": "162",
  1200 + "spu_ids": ["123", "456", "789"]
  1201 + }'
  1202 +```
  1203 +
  1204 +#### 与 `/indexer/index` 的区别
  1205 +
  1206 +| 接口 | 功能 | 是否写入ES | 返回内容 |
  1207 +|------|------|-----------|----------|
  1208 +| `/indexer/documents` | 查询SPU文档数据 | 否 | 返回完整的ES文档数据 |
  1209 +| `/indexer/index` | 增量索引 | 是 | 返回成功/失败列表和统计信息 |
  1210 +
  1211 +**使用场景**:
  1212 +- `/indexer/documents`:用于查看、调试或验证SPU数据,不修改ES索引
  1213 +- `/indexer/index`:用于实际的增量索引操作,将更新的SPU数据同步到ES
  1214 +
  1215 +### 5.4 索引健康检查接口
  1216 +
  1217 +- **端点**: `GET /indexer/health`
  1218 +- **描述**: 检查索引服务的健康状态
  1219 +
  1220 +#### 响应格式
  1221 +
  1222 +```json
  1223 +{
  1224 + "status": "available",
  1225 + "database": "connected",
  1226 + "preloaded_data": {
  1227 + "category_mappings": 150
  1228 + }
  1229 +}
  1230 +```
  1231 +
  1232 +#### 请求示例
  1233 +
  1234 +```bash
  1235 +curl -X GET "http://localhost:6004/indexer/health"
  1236 +```
  1237 +
  1238 +---
  1239 +
  1240 +## 管理接口
  1241 +
  1242 +### 6.1 健康检查
  1243 +
  1244 +- **端点**: `GET /admin/health`
  1245 +- **描述**: 检查服务与依赖(如 Elasticsearch)状态。
  1246 +
  1247 +```json
  1248 +{
  1249 + "status": "healthy",
  1250 + "elasticsearch": "connected",
  1251 + "tenant_id": "tenant1"
  1252 +}
  1253 +```
  1254 +
  1255 +### 6.2 获取配置
  1256 +
  1257 +- **端点**: `GET /admin/config`
  1258 +- **描述**: 返回当前租户的脱敏配置,便于核对索引及排序表达式。
  1259 +
  1260 +```json
  1261 +{
  1262 + "tenant_id": "tenant1",
  1263 + "tenant_name": "Tenant1 Test Instance",
  1264 + "es_index_name": "search_tenant1",
  1265 + "num_fields": 20,
  1266 + "num_indexes": 4,
  1267 + "supported_languages": ["zh", "en", "ru"],
  1268 + "ranking_expression": "bm25() + 0.2*text_embedding_relevance()",
  1269 + "spu_enabled": false
  1270 +}
  1271 +```
  1272 +
  1273 +### 6.3 索引统计
  1274 +
  1275 +- **端点**: `GET /admin/stats`
  1276 +- **描述**: 获取索引文档数量与磁盘大小,方便监控。
  1277 +
  1278 +```json
  1279 +{
  1280 + "index_name": "search_tenant1",
  1281 + "document_count": 10000,
  1282 + "size_mb": 523.45
  1283 +}
  1284 +```
  1285 +
  1286 +---
  1287 +
  1288 +## 常见场景示例
  1289 +
  1290 +### 7.1 基础搜索与排序
  1291 +
  1292 +**按价格从低到高排序**:
  1293 +```json
  1294 +{
  1295 + "query": "玩具",
  1296 + "size": 20,
  1297 + "from": 0,
  1298 + "sort_by": "price",
  1299 + "sort_order": "asc"
  1300 +}
  1301 +```
  1302 +
  1303 +**按价格从高到低排序**:
  1304 +```json
  1305 +{
  1306 + "query": "玩具",
  1307 + "size": 20,
  1308 + "from": 0,
  1309 + "sort_by": "price",
  1310 + "sort_order": "desc"
  1311 +}
  1312 +```
  1313 +
  1314 +**按销量从高到低排序**:
  1315 +```json
  1316 +{
  1317 + "query": "玩具",
  1318 + "size": 20,
  1319 + "from": 0,
  1320 + "sort_by": "sales",
  1321 + "sort_order": "desc"
  1322 +}
  1323 +```
  1324 +
  1325 +**按默认(相关性)排序**:
  1326 +```json
  1327 +{
  1328 + "query": "玩具",
  1329 + "size": 20,
  1330 + "from": 0
  1331 +}
  1332 +```
  1333 +
  1334 +### 7.2 过滤搜索
  1335 +
  1336 +**需求**: 搜索"玩具",筛选类目为"益智玩具",价格在50-200之间
  1337 +
  1338 +```json
  1339 +{
  1340 + "query": "玩具",
  1341 + "size": 20,
  1342 + "language": "zh",
  1343 + "filters": {
  1344 + "category_name": "益智玩具"
  1345 + },
  1346 + "range_filters": {
  1347 + "min_price": {
  1348 + "gte": 50,
  1349 + "lte": 200
  1350 + }
  1351 + }
  1352 +}
  1353 +```
  1354 +
  1355 +**需求**: 搜索"手机",筛选多个品牌,价格范围
  1356 +
  1357 +```json
  1358 +{
  1359 + "query": "手机",
  1360 + "size": 20,
  1361 + "language": "zh",
  1362 + "filters": {
  1363 + "vendor.zh.keyword": ["品牌A", "品牌B"]
  1364 + },
  1365 + "range_filters": {
  1366 + "min_price": {
  1367 + "gte": 50,
  1368 + "lte": 200
  1369 + }
  1370 + }
  1371 +}
  1372 +```
  1373 +
  1374 +### 7.3 分面搜索
  1375 +
  1376 +**需求**: 搜索"玩具",获取类目和规格的分面统计,用于构建筛选器
  1377 +
  1378 +```json
  1379 +{
  1380 + "query": "玩具",
  1381 + "size": 20,
  1382 + "language": "zh",
  1383 + "facets": [
  1384 + {"field": "category1_name", "size": 15, "type": "terms"},
  1385 + {"field": "category2_name", "size": 10, "type": "terms"},
  1386 + {"field": "specifications", "size": 10, "type": "terms"}
  1387 + ]
  1388 +}
  1389 +```
  1390 +
  1391 +**需求**: 搜索"手机",获取价格区间和规格的分面统计
  1392 +
  1393 +```json
  1394 +{
  1395 + "query": "手机",
  1396 + "size": 20,
  1397 + "language": "zh",
  1398 + "facets": [
  1399 + {
  1400 + "field": "min_price",
  1401 + "type": "range",
  1402 + "ranges": [
  1403 + {"key": "0-50", "to": 50},
  1404 + {"key": "50-100", "from": 50, "to": 100},
  1405 + {"key": "100-200", "from": 100, "to": 200},
  1406 + {"key": "200+", "from": 200}
  1407 + ]
  1408 + },
  1409 + {
  1410 + "field": "specifications",
  1411 + "size": 10,
  1412 + "type": "terms"
  1413 + }
  1414 + ]
  1415 +}
  1416 +```
  1417 +
  1418 +### 7.4 规格过滤与分面
  1419 +
  1420 +**需求**: 搜索"手机",筛选color为"white"的商品
  1421 +
  1422 +```json
  1423 +{
  1424 + "query": "手机",
  1425 + "size": 20,
  1426 + "language": "zh",
  1427 + "filters": {
  1428 + "specifications": {
  1429 + "name": "color",
  1430 + "value": "white"
  1431 + }
  1432 + }
  1433 +}
  1434 +```
  1435 +
  1436 +**需求**: 搜索"手机",筛选color为"white"且size为"256GB"的商品
  1437 +
  1438 +```json
  1439 +{
  1440 + "query": "手机",
  1441 + "size": 20,
  1442 + "language": "zh",
  1443 + "filters": {
  1444 + "specifications": [
  1445 + {"name": "color", "value": "white"},
  1446 + {"name": "size", "value": "256GB"}
  1447 + ]
  1448 + }
  1449 +}
  1450 +```
  1451 +
  1452 +**需求**: 搜索"手机",筛选size为"3"、"4"或"5",且color为"green"的商品
  1453 +
  1454 +```json
  1455 +{
  1456 + "query": "手机",
  1457 + "size": 20,
  1458 + "language": "zh",
  1459 + "filters": {
  1460 + "specifications": [
  1461 + {"name": "size", "value": "3"},
  1462 + {"name": "size", "value": "4"},
  1463 + {"name": "size", "value": "5"},
  1464 + {"name": "color", "value": "green"}
  1465 + ]
  1466 + }
  1467 +}
  1468 +```
  1469 +
  1470 +**需求**: 搜索"手机",获取所有规格的分面统计
  1471 +
  1472 +```json
  1473 +{
  1474 + "query": "手机",
  1475 + "size": 20,
  1476 + "language": "zh",
  1477 + "facets": [
  1478 + {"field": "specifications", "size": 10, "type": "terms"}
  1479 + ]
  1480 +}
  1481 +```
  1482 +
  1483 +**需求**: 只获取"color"和"size"规格的分面统计
  1484 +
  1485 +```json
  1486 +{
  1487 + "query": "手机",
  1488 + "size": 20,
  1489 + "language": "zh",
  1490 + "facets": [
  1491 + {"field": "specifications.color", "size": 20, "type": "terms"},
  1492 + {"field": "specifications.size", "size": 15, "type": "terms"}
  1493 + ]
  1494 +}
  1495 +```
  1496 +
  1497 +**需求**: 搜索"手机",筛选类目和规格,并获取对应的分面统计
  1498 +
  1499 +```json
  1500 +{
  1501 + "query": "手机",
  1502 + "size": 20,
  1503 + "language": "zh",
  1504 + "filters": {
  1505 + "category_name": "手机",
  1506 + "specifications": {
  1507 + "name": "color",
  1508 + "value": "white"
  1509 + }
  1510 + },
  1511 + "facets": [
  1512 + {"field": "category1_name", "size": 15, "type": "terms"},
  1513 + {"field": "category2_name", "size": 10, "type": "terms"},
  1514 + {"field": "specifications.color", "size": 20, "type": "terms"},
  1515 + {"field": "specifications.size", "size": 15, "type": "terms"}
  1516 + ]
  1517 +}
  1518 +```
  1519 +
  1520 +### 7.5 SKU筛选
  1521 +
  1522 +**需求**: 搜索"芭比娃娃",每个SPU下按颜色筛选,每种颜色只显示一个SKU
  1523 +
  1524 +```json
  1525 +{
  1526 + "query": "芭比娃娃",
  1527 + "size": 20,
  1528 + "sku_filter_dimension": ["color"]
  1529 +}
  1530 +```
  1531 +
  1532 +**说明**:
  1533 +- 如果 `option1_name` 为 `"color"`,则使用 `sku_filter_dimension: ["color"]` 可以按颜色分组
  1534 +- 每个SPU下,每种颜色只会返回第一个SKU
  1535 +- 如果维度不匹配,返回所有SKU(不进行过滤)
  1536 +
  1537 +### 7.6 布尔表达式搜索
  1538 +
  1539 +**需求**: 搜索包含"手机"和"智能"的商品,排除"二手"
  1540 +
  1541 +```json
  1542 +{
  1543 + "query": "手机 AND 智能 ANDNOT 二手",
  1544 + "size": 20
  1545 +}
  1546 +```
  1547 +
  1548 +### 7.7 分页查询
  1549 +
  1550 +**需求**: 获取第2页结果(每页20条)
  1551 +
  1552 +```json
  1553 +{
  1554 + "query": "手机",
  1555 + "size": 20,
  1556 + "from": 20
  1557 +}
  1558 +```
  1559 +
  1560 +---
  1561 +
  1562 +## 数据模型
  1563 +
  1564 +### 8.1 商品字段定义
  1565 +
  1566 +| 字段名 | 类型 | 描述 |
  1567 +|--------|------|------|
  1568 +| `tenant_id` | keyword | 租户ID(多租户隔离) |
  1569 +| `spu_id` | keyword | SPU ID |
  1570 +| `title.<lang>` | object/text | 商品标题(多语言对象,如 `title.zh`, `title.en`) |
  1571 +| `brief.<lang>` | object/text | 商品短描述(多语言对象,如 `brief.zh`, `brief.en`) |
  1572 +| `description.<lang>` | object/text | 商品详细描述(多语言对象,如 `description.zh`, `description.en`) |
  1573 +| `vendor.<lang>` | object/text | 供应商/品牌(多语言对象,且带 keyword 子字段,如 `vendor.zh.keyword`) |
  1574 +| `category_path.<lang>` | object/text | 类目路径(多语言对象,用于搜索,如 `category_path.zh`) |
  1575 +| `category_name_text.<lang>` | object/text | 类目名称(多语言对象,用于搜索,如 `category_name_text.zh`) |
  1576 +| `category_id` | keyword | 类目ID |
  1577 +| `category_name` | keyword | 类目名称(用于过滤) |
  1578 +| `category_level` | integer | 类目层级 |
  1579 +| `category1_name`, `category2_name`, `category3_name` | keyword | 多级类目名称(用于过滤和分面) |
  1580 +| `tags` | keyword | 标签(数组) |
  1581 +| `specifications` | nested | 规格(嵌套对象数组) |
  1582 +| `option1_name`, `option2_name`, `option3_name` | keyword | 选项名称 |
  1583 +| `min_price`, `max_price` | float | 最低/最高价格 |
  1584 +| `compare_at_price` | float | 原价 |
  1585 +| `sku_prices` | float | SKU价格列表(数组) |
  1586 +| `sku_weights` | long | SKU重量列表(数组) |
  1587 +| `sku_weight_units` | keyword | SKU重量单位列表(数组) |
  1588 +| `total_inventory` | long | 总库存 |
  1589 +| `sales` | long | 销量(展示销量) |
  1590 +| `skus` | nested | SKU详细信息(嵌套对象数组) |
  1591 +| `create_time`, `update_time` | date | 创建/更新时间 |
  1592 +| `title_embedding` | dense_vector | 标题向量(1024维,仅用于搜索) |
  1593 +| `image_embedding` | nested | 图片向量(嵌套,仅用于搜索) |
  1594 +
  1595 +> 所有租户共享统一的索引结构。文本字段支持中英文双语,后端根据 `language` 参数自动选择对应字段返回。
  1596 +
  1597 +### 8.2 字段类型速查
  1598 +
  1599 +| 类型 | ES Mapping | 用途 |
  1600 +|------|------------|------|
  1601 +| `text` | `text` | 全文检索(支持中英文分析器) |
  1602 +| `keyword` | `keyword` | 精确匹配、聚合、排序 |
  1603 +| `integer` | `integer` | 整数 |
  1604 +| `long` | `long` | 长整数 |
  1605 +| `float` | `float` | 浮点数 |
  1606 +| `date` | `date` | 日期时间 |
  1607 +| `nested` | `nested` | 嵌套对象(specifications, skus, image_embedding) |
  1608 +| `dense_vector` | `dense_vector` | 向量字段(title_embedding,仅用于搜索) |
  1609 +
  1610 +### 8.3 常用字段列表
  1611 +
  1612 +#### 过滤字段
  1613 +
  1614 +- `category_name`: 类目名称
  1615 +- `category1_name`, `category2_name`, `category3_name`: 多级类目
  1616 +- `category_id`: 类目ID
  1617 +- `vendor.zh.keyword`, `vendor.en.keyword`: 供应商/品牌(使用keyword子字段)
  1618 +- `tags`: 标签(keyword类型)
  1619 +- `option1_name`, `option2_name`, `option3_name`: 选项名称
  1620 +- `specifications`: 规格过滤(嵌套字段,格式见[过滤器详解](#33-过滤器详解))
  1621 +
  1622 +#### 范围字段
  1623 +
  1624 +- `min_price`: 最低价格
  1625 +- `max_price`: 最高价格
  1626 +- `compare_at_price`: 原价
  1627 +- `create_time`: 创建时间
  1628 +- `update_time`: 更新时间
  1629 +
  1630 +#### 排序字段
  1631 +
  1632 +- `price`: 价格(后端自动根据sort_order映射:asc→min_price,desc→max_price)
  1633 +- `sales`: 销量
  1634 +- `create_time`: 创建时间
  1635 +- `update_time`: 更新时间
  1636 +- `relevance_score`: 相关性分数(默认,不指定sort_by时使用)
  1637 +
  1638 +**注意**: 前端只需传 `price`,后端会自动处理:
  1639 +- `sort_by: "price"` + `sort_order: "asc"` → 按 `min_price` 升序(价格从低到高)
  1640 +- `sort_by: "price"` + `sort_order: "desc"` → 按 `max_price` 降序(价格从高到低)
  1641 +
  1642 +### 8.4 支持的分析器
  1643 +
  1644 +| 分析器 | 语言 | 描述 |
  1645 +|--------|------|------|
  1646 +| `index_ansj` | 中文 | 中文索引分析器(用于中文字段) |
  1647 +| `query_ansj` | 中文 | 中文查询分析器(用于中文字段) |
  1648 +| `hanlp_index` ⚠️ TODO(暂不支持) | 中文 | 中文索引分析器(用于中文字段) |
  1649 +| `hanlp_standard` ⚠️ TODO(暂不支持) | 中文 | 中文查询分析器(用于中文字段) |
  1650 +| `english` | 英文 | 标准英文分析器(用于英文字段) |
  1651 +| `lowercase` | - | 小写标准化器(用于keyword子字段) |
@@ -12,13 +12,9 @@ langchain-openai>=0.2.0 @@ -12,13 +12,9 @@ langchain-openai>=0.2.0
12 langgraph>=1.0.0 12 langgraph>=1.0.0
13 openai>=1.12.0 13 openai>=1.12.0
14 14
15 -# Embeddings & Vision  
16 -clip-client>=3.5.0 # CLIP-as-Service client 15 +# Vision (VLM image analysis)
17 Pillow>=10.2.0 # Image processing 16 Pillow>=10.2.0 # Image processing
18 17
19 -# Vector Database  
20 -pymilvus>=2.3.6  
21 -  
22 # Databases 18 # Databases
23 pymongo>=4.6.1 19 pymongo>=4.6.1
24 20
scripts/check_services.sh
1 #!/usr/bin/env bash 1 #!/usr/bin/env bash
2 # ============================================================================= 2 # =============================================================================
3 # OmniShopAgent - 服务健康检查脚本 3 # OmniShopAgent - 服务健康检查脚本
4 -# 检查 Milvus、CLIP、Streamlit 等依赖服务状态 4 +# 检查 Streamlit、Search API 等依赖
5 # ============================================================================= 5 # =============================================================================
6 set -euo pipefail 6 set -euo pipefail
7 7
@@ -49,40 +49,16 @@ else @@ -49,40 +49,16 @@ else
49 echo -e "${RED}FAIL${NC} 未找到" 49 echo -e "${RED}FAIL${NC} 未找到"
50 fi 50 fi
51 51
52 -# 4. Milvus  
53 -echo -n "[Milvus] "  
54 -if command -v docker &>/dev/null; then  
55 - if docker ps --format '{{.Names}}' 2>/dev/null | grep -q milvus-standalone; then  
56 - if curl -s -o /dev/null -w "%{http_code}" http://localhost:9091/healthz 2>/dev/null | grep -q 200; then  
57 - echo -e "${GREEN}OK${NC} localhost:19530"  
58 - else  
59 - echo -e "${YELLOW}WARN${NC} 容器运行中,健康检查未响应"  
60 - fi  
61 - else  
62 - echo -e "${YELLOW}WARN${NC} 未运行 (docker compose up -d)"  
63 - fi  
64 -else  
65 - echo -e "${YELLOW}SKIP${NC} Docker 未安装"  
66 -fi  
67 -  
68 -# 5. CLIP 服务(可选)  
69 -echo -n "[CLIP] "  
70 -if timeout 2 bash -c 'echo >/dev/tcp/localhost/51000' 2>/dev/null; then  
71 - echo -e "${GREEN}OK${NC} localhost:51000"  
72 -else  
73 - echo -e "${YELLOW}WARN${NC} 未运行 (图像搜索需启动: python -m clip_server launch)"  
74 -fi  
75 -  
76 -# 6. 数据目录 52 +# 4. 数据目录(可选,用于图片上传)
77 echo -n "[数据] " 53 echo -n "[数据] "
78 if [ -d "$PROJECT_ROOT/data/images" ] && [ -f "$PROJECT_ROOT/data/styles.csv" ]; then 54 if [ -d "$PROJECT_ROOT/data/images" ] && [ -f "$PROJECT_ROOT/data/styles.csv" ]; then
79 IMG_COUNT=$(find "$PROJECT_ROOT/data/images" -name "*.jpg" 2>/dev/null | wc -l) 55 IMG_COUNT=$(find "$PROJECT_ROOT/data/images" -name "*.jpg" 2>/dev/null | wc -l)
80 echo -e "${GREEN}OK${NC} $IMG_COUNT 张图片" 56 echo -e "${GREEN}OK${NC} $IMG_COUNT 张图片"
81 else 57 else
82 - echo -e "${YELLOW}WARN${NC} 未找到 data/images 或 data/styles.csv (运行 download_dataset.py)" 58 + echo -e "${YELLOW}WARN${NC} 未找到 data/images 或 data/styles.csv (可选,用于图片风格分析)"
83 fi 59 fi
84 60
85 -# 7. Streamlit 61 +# 5. Streamlit
86 echo -n "[Streamlit] " 62 echo -n "[Streamlit] "
87 if pgrep -f "streamlit run app.py" >/dev/null 2>&1; then 63 if pgrep -f "streamlit run app.py" >/dev/null 2>&1; then
88 echo -e "${GREEN}OK${NC} 运行中" 64 echo -e "${GREEN}OK${NC} 运行中"
scripts/index_data.py deleted
@@ -1,467 +0,0 @@ @@ -1,467 +0,0 @@
1 -"""  
2 -Data Indexing Script  
3 -Generates embeddings for products and stores them in Milvus  
4 -"""  
5 -  
6 -import csv  
7 -import logging  
8 -import os  
9 -import sys  
10 -from pathlib import Path  
11 -from typing import Any, Dict, Optional  
12 -  
13 -from tqdm import tqdm  
14 -  
15 -# Add parent directory to path  
16 -sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))  
17 -  
18 -# Import config and settings first  
19 -# Direct imports from files to avoid __init__.py circular issues  
20 -import importlib.util  
21 -  
22 -from app.config import get_absolute_path, settings  
23 -  
24 -  
25 -def load_service_module(module_name, file_name):  
26 - """Load a service module directly from file"""  
27 - spec = importlib.util.spec_from_file_location(  
28 - module_name,  
29 - os.path.join(  
30 - os.path.dirname(os.path.dirname(os.path.abspath(__file__))),  
31 - f"app/services/{file_name}",  
32 - ),  
33 - )  
34 - module = importlib.util.module_from_spec(spec)  
35 - spec.loader.exec_module(module)  
36 - return module  
37 -  
38 -  
39 -embedding_module = load_service_module("embedding_service", "embedding_service.py")  
40 -milvus_module = load_service_module("milvus_service", "milvus_service.py")  
41 -  
42 -EmbeddingService = embedding_module.EmbeddingService  
43 -MilvusService = milvus_module.MilvusService  
44 -  
45 -# Configure logging  
46 -logging.basicConfig(  
47 - level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"  
48 -)  
49 -logger = logging.getLogger(__name__)  
50 -  
51 -  
52 -class DataIndexer:  
53 - """Index product data by generating and storing embeddings"""  
54 -  
55 - def __init__(self):  
56 - """Initialize services"""  
57 - self.embedding_service = EmbeddingService()  
58 - self.milvus_service = MilvusService()  
59 -  
60 - self.image_dir = Path(get_absolute_path(settings.image_data_path))  
61 - self.styles_csv = get_absolute_path("./data/styles.csv")  
62 - self.images_csv = get_absolute_path("./data/images.csv")  
63 -  
64 - # Load product data from CSV  
65 - self.products = self._load_products_from_csv()  
66 -  
67 - def _load_products_from_csv(self) -> Dict[int, Dict[str, Any]]:  
68 - """Load products from CSV files"""  
69 - products = {}  
70 -  
71 - # Load images mapping  
72 - images_dict = {}  
73 - with open(self.images_csv, "r", encoding="utf-8") as f:  
74 - reader = csv.DictReader(f)  
75 - for row in reader:  
76 - product_id = int(row["filename"].split(".")[0])  
77 - images_dict[product_id] = row["link"]  
78 -  
79 - # Load styles/products  
80 - with open(self.styles_csv, "r", encoding="utf-8") as f:  
81 - reader = csv.DictReader(f)  
82 - for row in reader:  
83 - try:  
84 - product_id = int(row["id"])  
85 - products[product_id] = {  
86 - "id": product_id,  
87 - "gender": row.get("gender", ""),  
88 - "masterCategory": row.get("masterCategory", ""),  
89 - "subCategory": row.get("subCategory", ""),  
90 - "articleType": row.get("articleType", ""),  
91 - "baseColour": row.get("baseColour", ""),  
92 - "season": row.get("season", ""),  
93 - "year": int(row["year"]) if row.get("year") else 0,  
94 - "usage": row.get("usage", ""),  
95 - "productDisplayName": row.get("productDisplayName", ""),  
96 - "imageUrl": images_dict.get(product_id, ""),  
97 - "imagePath": f"{product_id}.jpg",  
98 - }  
99 - except (ValueError, KeyError) as e:  
100 - logger.warning(f"Error loading product {row.get('id')}: {e}")  
101 - continue  
102 -  
103 - logger.info(f"Loaded {len(products)} products from CSV")  
104 - return products  
105 -  
106 - def setup(self) -> None:  
107 - """Setup connections and collections"""  
108 - logger.info("Setting up services...")  
109 -  
110 - # Connect to CLIP server  
111 - self.embedding_service.connect_clip()  
112 - logger.info("✓ CLIP server connected")  
113 -  
114 - # Connect to Milvus  
115 - self.milvus_service.connect()  
116 - logger.info("✓ Milvus connected")  
117 -  
118 - # Create Milvus collections  
119 - self.milvus_service.create_text_collection(recreate=False)  
120 - self.milvus_service.create_image_collection(recreate=False)  
121 - logger.info("✓ Milvus collections ready")  
122 -  
123 - def teardown(self) -> None:  
124 - """Close all connections"""  
125 - logger.info("Closing connections...")  
126 - self.embedding_service.disconnect_clip()  
127 - self.milvus_service.disconnect()  
128 - logger.info("✓ All connections closed")  
129 -  
130 - def index_text_embeddings(  
131 - self, batch_size: int = 100, skip: int = 0, limit: Optional[int] = None  
132 - ) -> Dict[str, int]:  
133 - """Generate and store text embeddings for products  
134 -  
135 - Args:  
136 - batch_size: Number of products to process at once  
137 - skip: Number of products to skip  
138 - limit: Maximum number of products to process (None for all)  
139 -  
140 - Returns:  
141 - Dictionary with indexing statistics  
142 - """  
143 - logger.info("Starting text embedding indexing...")  
144 -  
145 - # Get products list  
146 - product_ids = list(self.products.keys())[skip:]  
147 - if limit:  
148 - product_ids = product_ids[:limit]  
149 -  
150 - total_products = len(product_ids)  
151 - processed = 0  
152 - inserted = 0  
153 - errors = 0  
154 -  
155 - with tqdm(total=total_products, desc="Indexing text embeddings") as pbar:  
156 - while processed < total_products:  
157 - # Get batch of products  
158 - current_batch_size = min(batch_size, total_products - processed)  
159 - batch_ids = product_ids[processed : processed + current_batch_size]  
160 - products = [self.products[pid] for pid in batch_ids]  
161 -  
162 - if not products:  
163 - break  
164 -  
165 - try:  
166 - # Prepare texts for embedding  
167 - texts = []  
168 - text_mappings = []  
169 -  
170 - for product in products:  
171 - # Create text representation of product  
172 - text = self._create_product_text(product)  
173 - texts.append(text)  
174 - text_mappings.append(  
175 - {"product_id": product["id"], "text": text}  
176 - )  
177 -  
178 - # Generate embeddings  
179 - embeddings = self.embedding_service.get_text_embeddings_batch(  
180 - texts, batch_size=50 # OpenAI batch size  
181 - )  
182 -  
183 - # Prepare data for Milvus (with metadata)  
184 - milvus_data = []  
185 - for idx, (mapping, embedding) in enumerate(  
186 - zip(text_mappings, embeddings)  
187 - ):  
188 - product_id = mapping["product_id"]  
189 - product = self.products[product_id]  
190 -  
191 - milvus_data.append(  
192 - {  
193 - "id": product_id,  
194 - "text": mapping["text"][  
195 - :2000  
196 - ], # Truncate to max length  
197 - "embedding": embedding,  
198 - # Product metadata  
199 - "productDisplayName": product["productDisplayName"][  
200 - :500  
201 - ],  
202 - "gender": product["gender"][:50],  
203 - "masterCategory": product["masterCategory"][:100],  
204 - "subCategory": product["subCategory"][:100],  
205 - "articleType": product["articleType"][:100],  
206 - "baseColour": product["baseColour"][:50],  
207 - "season": product["season"][:50],  
208 - "usage": product["usage"][:50],  
209 - "year": product["year"],  
210 - "imageUrl": product["imageUrl"],  
211 - "imagePath": product["imagePath"],  
212 - }  
213 - )  
214 -  
215 - # Insert into Milvus  
216 - count = self.milvus_service.insert_text_embeddings(milvus_data)  
217 - inserted += count  
218 -  
219 - except Exception as e:  
220 - logger.error(  
221 - f"Error processing text batch at offset {processed}: {e}"  
222 - )  
223 - errors += len(products)  
224 -  
225 - processed += len(products)  
226 - pbar.update(len(products))  
227 -  
228 - stats = {"total_processed": processed, "inserted": inserted, "errors": errors}  
229 -  
230 - logger.info(f"Text embedding indexing completed: {stats}")  
231 - return stats  
232 -  
233 - def index_image_embeddings(  
234 - self, batch_size: int = 32, skip: int = 0, limit: Optional[int] = None  
235 - ) -> Dict[str, int]:  
236 - """Generate and store image embeddings for products  
237 -  
238 - Args:  
239 - batch_size: Number of images to process at once  
240 - skip: Number of products to skip  
241 - limit: Maximum number of products to process (None for all)  
242 -  
243 - Returns:  
244 - Dictionary with indexing statistics  
245 - """  
246 - logger.info("Starting image embedding indexing...")  
247 -  
248 - # Get products list  
249 - product_ids = list(self.products.keys())[skip:]  
250 - if limit:  
251 - product_ids = product_ids[:limit]  
252 -  
253 - total_products = len(product_ids)  
254 - processed = 0  
255 - inserted = 0  
256 - errors = 0  
257 -  
258 - with tqdm(total=total_products, desc="Indexing image embeddings") as pbar:  
259 - while processed < total_products:  
260 - # Get batch of products  
261 - current_batch_size = min(batch_size, total_products - processed)  
262 - batch_ids = product_ids[processed : processed + current_batch_size]  
263 - products = [self.products[pid] for pid in batch_ids]  
264 -  
265 - if not products:  
266 - break  
267 -  
268 - try:  
269 - # Prepare image paths  
270 - image_paths = []  
271 - image_mappings = []  
272 -  
273 - for product in products:  
274 - image_path = self.image_dir / product["imagePath"]  
275 - image_paths.append(image_path)  
276 - image_mappings.append(  
277 - {  
278 - "product_id": product["id"],  
279 - "image_path": product["imagePath"],  
280 - }  
281 - )  
282 -  
283 - # Generate embeddings  
284 - embeddings = self.embedding_service.get_image_embeddings_batch(  
285 - image_paths, batch_size=batch_size  
286 - )  
287 -  
288 - # Prepare data for Milvus (with metadata)  
289 - milvus_data = []  
290 - for idx, (mapping, embedding) in enumerate(  
291 - zip(image_mappings, embeddings)  
292 - ):  
293 - if embedding is not None:  
294 - product_id = mapping["product_id"]  
295 - product = self.products[product_id]  
296 -  
297 - milvus_data.append(  
298 - {  
299 - "id": product_id,  
300 - "image_path": mapping["image_path"],  
301 - "embedding": embedding,  
302 - # Product metadata  
303 - "productDisplayName": product["productDisplayName"][  
304 - :500  
305 - ],  
306 - "gender": product["gender"][:50],  
307 - "masterCategory": product["masterCategory"][:100],  
308 - "subCategory": product["subCategory"][:100],  
309 - "articleType": product["articleType"][:100],  
310 - "baseColour": product["baseColour"][:50],  
311 - "season": product["season"][:50],  
312 - "usage": product["usage"][:50],  
313 - "year": product["year"],  
314 - "imageUrl": product["imageUrl"],  
315 - }  
316 - )  
317 - else:  
318 - errors += 1  
319 -  
320 - # Insert into Milvus  
321 - if milvus_data:  
322 - count = self.milvus_service.insert_image_embeddings(milvus_data)  
323 - inserted += count  
324 -  
325 - except Exception as e:  
326 - logger.error(  
327 - f"Error processing image batch at offset {processed}: {e}"  
328 - )  
329 - errors += len(products)  
330 -  
331 - processed += len(products)  
332 - pbar.update(len(products))  
333 -  
334 - stats = {"total_processed": processed, "inserted": inserted, "errors": errors}  
335 -  
336 - logger.info(f"Image embedding indexing completed: {stats}")  
337 - return stats  
338 -  
339 - def _create_product_text(self, product: Dict[str, Any]) -> str:  
340 - """Create text representation of product for embedding  
341 -  
342 - Args:  
343 - product: Product document  
344 -  
345 - Returns:  
346 - Text representation  
347 - """  
348 - # Create a natural language description  
349 - parts = [  
350 - product.get("productDisplayName", ""),  
351 - f"Gender: {product.get('gender', '')}",  
352 - f"Category: {product.get('masterCategory', '')} > {product.get('subCategory', '')}",  
353 - f"Type: {product.get('articleType', '')}",  
354 - f"Color: {product.get('baseColour', '')}",  
355 - f"Season: {product.get('season', '')}",  
356 - f"Usage: {product.get('usage', '')}",  
357 - ]  
358 -  
359 - text = " | ".join(  
360 - [p for p in parts if p and p != "Gender: " and p != "Color: "]  
361 - )  
362 - return text  
363 -  
364 - def get_stats(self) -> Dict[str, Any]:  
365 - """Get indexing statistics  
366 -  
367 - Returns:  
368 - Dictionary with statistics  
369 - """  
370 - text_stats = self.milvus_service.get_collection_stats(  
371 - self.milvus_service.text_collection_name  
372 - )  
373 - image_stats = self.milvus_service.get_collection_stats(  
374 - self.milvus_service.image_collection_name  
375 - )  
376 -  
377 - return {  
378 - "total_products": len(self.products),  
379 - "milvus_text": text_stats,  
380 - "milvus_image": image_stats,  
381 - }  
382 -  
383 -  
384 -def main():  
385 - """Main function"""  
386 - import argparse  
387 -  
388 - parser = argparse.ArgumentParser(description="Index product data for search")  
389 - parser.add_argument(  
390 - "--mode",  
391 - choices=["text", "image", "both"],  
392 - default="both",  
393 - help="Which embeddings to index",  
394 - )  
395 - parser.add_argument(  
396 - "--batch-size", type=int, default=100, help="Batch size for processing"  
397 - )  
398 - parser.add_argument(  
399 - "--skip", type=int, default=0, help="Number of products to skip"  
400 - )  
401 - parser.add_argument(  
402 - "--limit", type=int, default=None, help="Maximum number of products to process"  
403 - )  
404 - parser.add_argument("--stats", action="store_true", help="Show statistics only")  
405 -  
406 - args = parser.parse_args()  
407 -  
408 - # Create indexer  
409 - indexer = DataIndexer()  
410 -  
411 - try:  
412 - # Setup services  
413 - indexer.setup()  
414 -  
415 - if args.stats:  
416 - # Show statistics  
417 - stats = indexer.get_stats()  
418 - print("\n=== Indexing Statistics ===")  
419 - print(f"\nTotal Products in CSV: {stats['total_products']}")  
420 -  
421 - print("\nMilvus Text Embeddings:")  
422 - print(f" Collection: {stats['milvus_text']['collection_name']}")  
423 - print(f" Total embeddings: {stats['milvus_text']['row_count']}")  
424 -  
425 - print("\nMilvus Image Embeddings:")  
426 - print(f" Collection: {stats['milvus_image']['collection_name']}")  
427 - print(f" Total embeddings: {stats['milvus_image']['row_count']}")  
428 -  
429 - print(  
430 - f"\nCoverage: {stats['milvus_image']['row_count'] / stats['total_products'] * 100:.1f}%"  
431 - )  
432 - else:  
433 - # Index data  
434 - if args.mode in ["text", "both"]:  
435 - logger.info("=== Indexing Text Embeddings ===")  
436 - text_stats = indexer.index_text_embeddings(  
437 - batch_size=args.batch_size, skip=args.skip, limit=args.limit  
438 - )  
439 - print(f"\nText Indexing Results: {text_stats}")  
440 -  
441 - if args.mode in ["image", "both"]:  
442 - logger.info("=== Indexing Image Embeddings ===")  
443 - image_stats = indexer.index_image_embeddings(  
444 - batch_size=min(args.batch_size, 32), # Smaller batch for images  
445 - skip=args.skip,  
446 - limit=args.limit,  
447 - )  
448 - print(f"\nImage Indexing Results: {image_stats}")  
449 -  
450 - # Show final statistics  
451 - logger.info("\n=== Final Statistics ===")  
452 - stats = indexer.get_stats()  
453 - print(f"Total products: {stats['total_products']}")  
454 - print(f"Text embeddings: {stats['milvus_text']['row_count']}")  
455 - print(f"Image embeddings: {stats['milvus_image']['row_count']}")  
456 -  
457 - except KeyboardInterrupt:  
458 - logger.info("\nIndexing interrupted by user")  
459 - except Exception as e:  
460 - logger.error(f"Error during indexing: {e}", exc_info=True)  
461 - sys.exit(1)  
462 - finally:  
463 - indexer.teardown()  
464 -  
465 -  
466 -if __name__ == "__main__":  
467 - main()  
scripts/run_clip.sh deleted
@@ -1,22 +0,0 @@ @@ -1,22 +0,0 @@
1 -#!/usr/bin/env bash  
2 -# =============================================================================  
3 -# OmniShopAgent - 启动 CLIP 图像向量服务  
4 -# 图像搜索、以图搜图功能依赖此服务  
5 -# =============================================================================  
6 -set -euo pipefail  
7 -  
8 -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"  
9 -PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"  
10 -VENV_DIR="${VENV_DIR:-$PROJECT_ROOT/venv}"  
11 -  
12 -cd "$PROJECT_ROOT"  
13 -  
14 -if [ -d "$VENV_DIR" ]; then  
15 - set +u  
16 - source "$VENV_DIR/bin/activate"  
17 - set -u  
18 -fi  
19 -  
20 -echo "启动 CLIP 服务 (端口 51000)..."  
21 -echo "按 Ctrl+C 停止"  
22 -exec python -m clip_server launch  
scripts/run_milvus.sh deleted
@@ -1,31 +0,0 @@ @@ -1,31 +0,0 @@
1 -#!/usr/bin/env bash  
2 -# =============================================================================  
3 -# OmniShopAgent - 启动 Milvus 向量数据库  
4 -# 使用 Docker Compose 启动 Milvus 及相关依赖  
5 -# =============================================================================  
6 -set -euo pipefail  
7 -  
8 -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"  
9 -PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"  
10 -  
11 -cd "$PROJECT_ROOT"  
12 -  
13 -if ! command -v docker &>/dev/null; then  
14 - echo "错误: 未安装 Docker。请先运行 setup_env_centos8.sh"  
15 - exit 1  
16 -fi  
17 -  
18 -echo "启动 Milvus..."  
19 -docker compose up -d 2>/dev/null || docker-compose up -d 2>/dev/null || {  
20 - echo "错误: 无法执行 docker compose。请确保已安装 Docker Compose"  
21 - exit 1  
22 -}  
23 -  
24 -echo "等待 Milvus 就绪 (约 60 秒)..."  
25 -sleep 60  
26 -  
27 -if curl -s -o /dev/null -w "%{http_code}" http://localhost:9091/healthz 2>/dev/null | grep -q 200; then  
28 - echo "Milvus 已就绪: localhost:19530"  
29 -else  
30 - echo "提示: Milvus 可能仍在启动,请稍后执行 check_services.sh 检查"  
31 -fi  
scripts/setup_env_centos8.sh
@@ -41,9 +41,9 @@ sudo dnf install -y \ @@ -41,9 +41,9 @@ sudo dnf install -y \
41 tar 41 tar
42 42
43 # ----------------------------------------------------------------------------- 43 # -----------------------------------------------------------------------------
44 -# 2. 安装 Docker(用于 Milvus 44 +# 2. 检查 Docker(可选
45 # ----------------------------------------------------------------------------- 45 # -----------------------------------------------------------------------------
46 -echo "[2/4] 检查/安装 Docker..." 46 +echo "[2/4] 检查 Docker..."
47 if ! command -v docker &>/dev/null; then 47 if ! command -v docker &>/dev/null; then
48 echo " 安装 Docker..." 48 echo " 安装 Docker..."
49 sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo 2>/dev/null || { 49 sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo 2>/dev/null || {
@@ -142,11 +142,9 @@ echo "==========================================" @@ -142,11 +142,9 @@ echo "=========================================="
142 echo "环境准备完成!" 142 echo "环境准备完成!"
143 echo "==========================================" 143 echo "=========================================="
144 echo "下一步:" 144 echo "下一步:"
145 -echo " 1. 编辑 .env 配置 OPENAI_API_KEY"  
146 -echo " 2. 下载数据: python scripts/download_dataset.py"  
147 -echo " 3. 启动 Milvus: ./scripts/run_milvus.sh"  
148 -echo " 4. 索引数据: python scripts/index_data.py"  
149 -echo " 5. 启动应用: ./scripts/start.sh" 145 +echo " 1. 编辑 .env 配置 OPENAI_API_KEY、SEARCH_API_BASE_URL 等"
  146 +echo " 2. (可选)下载数据: python scripts/download_dataset.py"
  147 +echo " 3. 启动应用: ./scripts/start.sh"
150 echo "" 148 echo ""
151 echo "激活虚拟环境: source $VENV_DIR/bin/activate" 149 echo "激活虚拟环境: source $VENV_DIR/bin/activate"
152 echo "==========================================" 150 echo "=========================================="
1 #!/usr/bin/env bash 1 #!/usr/bin/env bash
2 # ============================================================================= 2 # =============================================================================
3 # OmniShopAgent - 启动脚本 3 # OmniShopAgent - 启动脚本
4 -# 启动 Milvus、CLIP(可选)、Streamlit 应用 4 +# 启动 Streamlit 应用
5 # ============================================================================= 5 # =============================================================================
6 set -euo pipefail 6 set -euo pipefail
7 7
8 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" 8 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
9 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" 9 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
10 VENV_DIR="${VENV_DIR:-$PROJECT_ROOT/venv}" 10 VENV_DIR="${VENV_DIR:-$PROJECT_ROOT/venv}"
11 -STREAMLIT_PORT="${STREAMLIT_PORT:-8501}" 11 +STREAMLIT_PORT="${STREAMLIT_PORT:-6008}"
12 STREAMLIT_HOST="${STREAMLIT_HOST:-0.0.0.0}" 12 STREAMLIT_HOST="${STREAMLIT_HOST:-0.0.0.0}"
13 13
14 cd "$PROJECT_ROOT" 14 cd "$PROJECT_ROOT"
@@ -27,30 +27,7 @@ echo "==========================================" @@ -27,30 +27,7 @@ echo "=========================================="
27 echo "OmniShopAgent 启动" 27 echo "OmniShopAgent 启动"
28 echo "==========================================" 28 echo "=========================================="
29 29
30 -# 1. 启动 Milvus(Docker)  
31 -if command -v docker &>/dev/null; then  
32 - echo "[1/3] 检查 Milvus..."  
33 - if ! docker ps --format '{{.Names}}' 2>/dev/null | grep -q milvus-standalone; then  
34 - echo " 启动 Milvus (docker compose)..."  
35 - docker compose up -d 2>/dev/null || docker-compose up -d 2>/dev/null || {  
36 - echo " 警告: 无法启动 Milvus,请手动执行: docker compose up -d"  
37 - }  
38 - echo " 等待 Milvus 就绪 (30s)..."  
39 - sleep 30  
40 - else  
41 - echo " Milvus 已运行"  
42 - fi  
43 -else  
44 - echo "[1/3] 跳过 Milvus: 未安装 Docker"  
45 -fi  
46 -  
47 -# 2. 检查 CLIP(可选,图像搜索需要)  
48 -echo "[2/3] 检查 CLIP 服务..."  
49 -echo " 提示: 图像搜索需 CLIP。若未启动,请另开终端执行: python -m clip_server launch"  
50 -echo " 文本搜索可无需 CLIP。"  
51 -  
52 -# 3. 启动 Streamlit  
53 -echo "[3/3] 启动 Streamlit (端口 $STREAMLIT_PORT)..." 30 +echo "[1/1] 启动 Streamlit (端口 $STREAMLIT_PORT)..."
54 echo "" 31 echo ""
55 echo " 访问: http://$STREAMLIT_HOST:$STREAMLIT_PORT" 32 echo " 访问: http://$STREAMLIT_HOST:$STREAMLIT_PORT"
56 echo " 按 Ctrl+C 停止" 33 echo " 按 Ctrl+C 停止"
1 #!/usr/bin/env bash 1 #!/usr/bin/env bash
2 # ============================================================================= 2 # =============================================================================
3 # OmniShopAgent - 停止脚本 3 # OmniShopAgent - 停止脚本
4 -# 停止 Streamlit 进程及 Milvus 容器 4 +# 停止 Streamlit 进程
5 # ============================================================================= 5 # =============================================================================
6 set -euo pipefail 6 set -euo pipefail
7 7
8 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" 8 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
9 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" 9 PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
10 -STREAMLIT_PORT="${STREAMLIT_PORT:-8501}" 10 +STREAMLIT_PORT="${STREAMLIT_PORT:-6008}"
11 11
12 echo "==========================================" 12 echo "=========================================="
13 echo "OmniShopAgent 停止" 13 echo "OmniShopAgent 停止"
14 echo "==========================================" 14 echo "=========================================="
15 15
16 # 1. 停止 Streamlit 进程 16 # 1. 停止 Streamlit 进程
17 -echo "[1/2] 停止 Streamlit..." 17 +echo "[1/1] 停止 Streamlit..."
18 if pgrep -f "streamlit run app.py" >/dev/null 2>&1; then 18 if pgrep -f "streamlit run app.py" >/dev/null 2>&1; then
19 pkill -f "streamlit run app.py" 2>/dev/null || true 19 pkill -f "streamlit run app.py" 2>/dev/null || true
20 echo " Streamlit 已停止" 20 echo " Streamlit 已停止"
@@ -31,16 +31,6 @@ if command -v lsof &>/dev/null; then @@ -31,16 +31,6 @@ if command -v lsof &>/dev/null; then
31 fi 31 fi
32 fi 32 fi
33 33
34 -# 2. 可选:停止 Milvus 容器  
35 -echo "[2/2] 停止 Milvus..."  
36 -if command -v docker &>/dev/null; then  
37 - cd "$PROJECT_ROOT"  
38 - docker compose down 2>/dev/null || docker-compose down 2>/dev/null || true  
39 - echo " Milvus 已停止"  
40 -else  
41 - echo " Docker 未安装,跳过"  
42 -fi  
43 -  
44 echo "==========================================" 34 echo "=========================================="
45 echo "OmniShopAgent 已停止" 35 echo "OmniShopAgent 已停止"
46 echo "==========================================" 36 echo "=========================================="