# ShopAgent Technical Implementation Report

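## 1. Project Overview

ShopAgent is an autonomous multimodal fashion shopping agent built on **LangGraph** and the **ReAct pattern**. The agent decides on its own which tools to call, maintains conversation state, and determines when to reply, which enables product discovery and recommendation.

### Core Features

- **Autonomous tool selection and execution**: the agent picks and calls tools based on user intent
- **Text search**: product search through the Search API
- **Conversation context awareness**: context is retained across multi-turn conversations
- **Real-time visual analysis**: VLM-based image style analysis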

---

## 2. Tech Stack

| Component | Technology |
|-----------|------------|
| Runtime | Python 3.12 |
| Agent framework | LangGraph 1.x |
| LLM framework | LangChain 1.x (works with any LLM; default gpt-4o-mini) |
| Search service | Search API (HTTP) |
| Frontend | Streamlit |
| Dataset | Kaggle Fashion Products |

---

## 3. System Architecture

### 3.1 Overall Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                     Streamlit frontend (app.py)                 │
└─────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────┐
│              ShoppingAgent (shopping_agent.py)                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  LangGraph StateGraph + ReAct Pattern                      │  │
│  │  START → Agent → [Has tool_calls?] → Tools → Agent → END   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
        │                    │
        ▼                    ▼
┌──────────────┐   ┌─────────────────────┐
│ search_      │   │ analyze_image_style  │
│ products     │   │ (OpenAI Vision)      │
└──────┬───────┘   └──────────┬──────────┘
       │                      │
       ▼                      │
┌──────────────────┐          │
│   Search API     │          │
│ (HTTP POST)      │          │
└──────────────────┘          │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│           OpenAI API (VLM style analysis)                       │
└─────────────────────────────────────────────────────────────────┘
```

### 3.2 Agent Flow Diagram (LangGraph)

```mermaid
graph LR
    START --> Agent
    Agent -->|Has tool_calls| Tools
    Agent -->|No tool_calls| END
    Tools --> Agent
```

---

## 4. Key Code Implementation

### 4.1 Core Agent Implementation (shopping_agent.py)

#### 4.1.1 State Definition

```python
from typing import Optional, Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

class AgentState(TypedDict):
    """State for the shopping agent with message accumulation"""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    current_image_path: Optional[str]  # Track uploaded image
```

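- `messages` uses the `add_messages` reducer, so messages returned by each node are appended to the history instead of overwriting it, which supports multi-turn conversation
- `current_image_path` stores the path of the currently uploaded image for tools to use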
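
The accumulation behavior can be illustrated without LangGraph. The sketch below is a simplified stand-in for a reducer such as `add_messages`: it appends new messages to the existing list and replaces a message when the ids match. The function name `merge_messages` and the plain-dict message shape are assumptions made for illustration; the real reducer operates on LangChain message objects.

```python
from typing import Dict, List

# Hypothetical message shape: {"id": ..., "role": ..., "content": ...}
Message = Dict[str, str]


def merge_messages(existing: List[Message], new: List[Message]) -> List[Message]:
    """Append new messages; a message whose id already exists replaces the old one."""
    merged = list(existing)  # copy so the previous state is not mutated
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in new:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg  # update the existing entry
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)
    return merged


history = [{"id": "1", "role": "user", "content": "show me red dresses"}]
update = [{"id": "2", "role": "assistant", "content": "Here are some red dresses."}]
history = merge_messages(history, update)
print(len(history))
```

A node therefore only needs to return the messages it produced; the reducer takes care of merging them into the running history.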

#### 4.1.2 LangGraph Graph Construction

```python
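# Imports used by this method (placed at module level in shopping_agent.py)
from langchain_core.messages import SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode


def _build_graph(self):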
    """Build the LangGraph StateGraph"""
    
    def agent_node(state: AgentState):
        """Agent decision node - decides which tools to call or when to respond"""
        messages = state["messages"]
        if not any(isinstance(m, SystemMessage) for m in messages):
            messages = [SystemMessage(content=system_prompt)] + list(messages)
        response = self.llm_with_tools.invoke(messages)
        return {"messages": [response]}

    tool_node = ToolNode(self.tools)

    def should_continue(state: AgentState):
        """Determine if agent should continue or end"""
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges("agent", should_continue, ["tools", END])
    workflow.add_edge("tools", "agent")

    checkpointer = MemorySaver()
    return workflow.compile(checkpointer=checkpointer)
```

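Key points:
- **agent_node**: passes the messages to the LLM, which decides whether to call a tool
- **should_continue**: routes to the tool node if the last message has `tool_calls`; otherwise ends the run
- **MemorySaver**: checkpoints conversation state per `thread_id`; it is an in-memory checkpointer, so state does not survive a process restart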
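
The control flow of the graph can be sketched in plain Python, without LangGraph, to make the loop explicit. Everything here is a stand-in: `fake_llm`, the `TOOLS` registry, the dict-based messages, and the `max_steps` guard are assumptions for illustration, not part of the project.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry; the real agent binds search_products and analyze_image_style.
TOOLS: Dict[str, Callable[..., str]] = {
    "search_products": lambda query: f"results for {query!r}",
}


def fake_llm(messages: List[dict]) -> dict:
    """Stand-in for the LLM: request a search once, then answer in plain text."""
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "search_products", "args": {"query": "red dress"}}],
        }
    return {"role": "assistant", "content": "Here is what I found.", "tool_calls": []}


def run_loop(messages: List[dict], max_steps: int = 5) -> List[dict]:
    """agent -> (tools -> agent)* -> end, mirroring the conditional edge."""
    for _ in range(max_steps):
        reply = fake_llm(messages)  # "agent" node
        messages = messages + [reply]
        if not reply["tool_calls"]:  # should_continue: no tool calls, so stop
            break
        for call in reply["tool_calls"]:  # "tools" node
            result = TOOLS[call["name"]](**call["args"])
            messages = messages + [{"role": "tool", "content": result}]
    return messages


final = run_loop([{"role": "user", "content": "red dress"}])
print([m["role"] for m in final])
```

The loop ends as soon as the model replies without requesting a tool, which is the same decision `should_continue` makes on the last message.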

#### 4.1.3 System Prompt Design

```python
system_prompt = """You are an intelligent fashion shopping assistant. You can:
1. Search for products by text description (use search_products)
2. Analyze image style and attributes (use analyze_image_style)

When a user asks about products:
- For text queries: use search_products directly
- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
- You can call multiple tools in sequence if needed
- Always provide helpful, friendly responses

CRITICAL FORMATTING RULES:
When presenting product results, you MUST use this EXACT format for EACH product:
1. [Product Name]
   ID: [Product ID Number]
   Category: [Category]
   Color: [Color]
   Gender: [Gender]
   (Include Season, Usage, Relevance if available)
..."""
```

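The system prompt constrains tool usage and the output format so that the frontend can parse product information from the reply.

#### 4.1.4 Chat Entry Point and Streaming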

```python
def chat(self, query: str, image_path: Optional[str] = None) -> dict:
    # Build input message
    message_content = query
    if image_path:
        message_content = f"{query}\n[User uploaded image: {image_path}]"

    config = {"configurable": {"thread_id": self.session_id}}
    input_state = {
        "messages": [HumanMessage(content=message_content)],
        "current_image_path": image_path,
    }

    tool_calls = []
    for event in self.graph.stream(input_state, config=config):
        if "agent" in event:
            for msg in event["agent"].get("messages", []):
                if hasattr(msg, "tool_calls") and msg.tool_calls:
                    for tc in msg.tool_calls:
                        tool_calls.append({"name": tc["name"], "args": tc.get("args", {})})
        if "tools" in event:
            # Record tool execution results
            ...

    final_state = self.graph.get_state(config)
    response_text = final_state.values["messages"][-1].content

    return {"response": response_text, "tool_calls": tool_calls, "error": False}
```
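
The tool-call bookkeeping in the streaming loop can be exercised in isolation. The sketch below assumes each streamed event is a dict keyed by node name, as in the loop above; the `collect_tool_calls` helper and the hand-written events built from `SimpleNamespace` are illustrative stand-ins for real message objects.

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_tool_calls(events: Iterable[dict]) -> List[dict]:
    """Gather the name and arguments of every tool call emitted by the agent node."""
    calls = []
    for event in events:
        for msg in event.get("agent", {}).get("messages", []):
            # Messages without tool calls contribute nothing
            for tc in getattr(msg, "tool_calls", None) or []:
                calls.append({"name": tc["name"], "args": tc.get("args", {})})
    return calls


# Hand-written events that imitate per-node updates from a streamed run.
events = [
    {"agent": {"messages": [SimpleNamespace(content="", tool_calls=[
        {"name": "search_products", "args": {"query": "red formal dresses"}}])]}},
    {"tools": {"messages": [SimpleNamespace(content="Found 1 product(s): ...")]}},
    {"agent": {"messages": [SimpleNamespace(content="Here are some options.", tool_calls=[])]}},
]
print(collect_tool_calls(events))
```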

---

### 4.2 Search Tool Implementation (search_tools.py)

#### 4.2.1 Text Search (Search API)

```python
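import requests
from langchain_core.tools import tool

# `settings` is assumed to be the Settings instance defined in config.py


@tool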
def search_products(query: str, limit: int = 5) -> str:
    """Search for fashion products using natural language descriptions."""
    try:
        url = f"{settings.search_api_base_url.rstrip('/')}/search/"
        headers = {
            "Content-Type": "application/json",
            "X-Tenant-ID": settings.search_api_tenant_id,
        }
        payload = {
            "query": query,
            "size": min(limit, 20),
            "from": 0,
            "language": "zh",
        }

        response = requests.post(url, json=payload, headers=headers, timeout=60)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        data = response.json()
        results = data.get("results", [])

        if not results:
            return "No products found matching your search."

        output = f"Found {len(results)} product(s):\n\n"
        for idx, product in enumerate(results, 1):
            output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
            output += f"   ID: {product.get('spu_id', 'N/A')}\n"
            output += f"   Category: {product.get('category_path', 'N/A')}\n"
            output += f"   Price: {product.get('price')}\n"
            output += "\n"

        return output.strip()
    except Exception as e:
        return f"Error searching products: {str(e)}"
```
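
Separating payload construction and result formatting from the HTTP call makes both testable offline. The following sketch reuses the field names from the tool above (`title`, `spu_id`, `category_path`, `price`); the helper names `build_payload` and `format_results`, the lower bound of 1 on the page size, and the sample hit are assumptions added for illustration.

```python
from typing import Any, Dict, List


def build_payload(query: str, limit: int = 5) -> Dict[str, Any]:
    """Build the request body; the page size is clamped to the range 1..20."""
    return {"query": query, "size": max(1, min(limit, 20)), "from": 0, "language": "zh"}


def format_results(results: List[Dict[str, Any]]) -> str:
    """Render search hits in the numbered text layout returned to the agent."""
    if not results:
        return "No products found matching your search."
    lines = [f"Found {len(results)} product(s):", ""]
    for idx, product in enumerate(results, 1):
        lines.append(f"{idx}. {product.get('title', 'Unknown Product')}")
        lines.append(f"   ID: {product.get('spu_id', 'N/A')}")
        lines.append(f"   Category: {product.get('category_path', 'N/A')}")
        lines.append(f"   Price: {product.get('price', 'N/A')}")
        lines.append("")
    return "\n".join(lines).strip()


# Made-up sample hit, used only to exercise the formatter.
sample = [{"title": "Red Wrap Dress", "spu_id": 101, "category_path": "Women/Dresses", "price": 39.9}]
print(build_payload("red dress", limit=50)["size"])
print(format_results(sample))
```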

#### 4.2.2 Visual Analysis (VLM)

```python
@tool
def analyze_image_style(image_path: str) -> str:
    """Analyze a fashion product image using AI vision to extract detailed style information."""
    # Read the image and base64-encode it for the data URL sent to the model
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")

    prompt = """Analyze this fashion product image and provide a detailed description.
Include:
- Product type (e.g., shirt, dress, shoes, pants, bag)
- Primary colors
- Style/design (e.g., casual, formal, sporty, vintage, modern)
- Pattern or texture (e.g., plain, striped, checked, floral)
- Key features (e.g., collar type, sleeve length, fit)
- Material appearance (if obvious, e.g., denim, cotton, leather)
- Suitable occasion (e.g., office wear, party, casual, sports)
Provide a comprehensive yet concise description (3-4 sentences)."""

    client = get_openai_client()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}", "detail": "high"}},
            ],
        }],
        max_tokens=500,
        temperature=0.3,
    )

    return response.choices[0].message.content.strip()
```
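
The tool above labels every upload as `image/jpeg` in the data URL. One possible refinement, sketched here with the standard library only, is to guess the MIME type from the file extension and fall back to JPEG. The helper name `image_to_data_url` is hypothetical and the demo file contains placeholder bytes, not a real image.

```python
import base64
import mimetypes
import os
import tempfile
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a base64 data URL, guessing the MIME type from the extension."""
    path = Path(image_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # fall back to the type used in the tool above
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.png")
    Path(demo).write_bytes(b"abc")  # placeholder bytes, not a real image
    print(image_to_data_url(demo))
```

The returned string can be passed as the `url` value of the `image_url` content part.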

---

### 4.3 Streamlit Frontend (app.py)

#### 4.3.1 Session and Agent Initialization

```python
def initialize_session():
    if "session_id" not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
    if "shopping_agent" not in st.session_state:
        st.session_state.shopping_agent = ShoppingAgent(session_id=st.session_state.session_id)
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "uploaded_image" not in st.session_state:
        st.session_state.uploaded_image = None
```

#### 4.3.2 Product Information Parsing

```python
import re


def extract_products_from_response(response: str) -> list:
    """Parse product information from the agent's reply."""
    products = []
    current_product = None  # product currently being assembled
    for line in response.split("\n"):
        if re.match(r"^\*?\*?\d+\.\s+", line):
            # A numbered line starts a new product; flush the previous one
            if current_product:
                products.append(current_product)
            current_product = {"name": re.sub(r"^\*?\*?\d+\.\s+", "", line).replace("**", "").strip()}
        elif current_product is None:
            continue  # skip any text before the first numbered product
        elif "ID:" in line:
            id_match = re.search(r"(?:ID|id):\s*(\d+)", line)
            if id_match:
                current_product["id"] = id_match.group(1)
        elif "Category:" in line:
            cat_match = re.search(r"Category:\s*(.+?)(?:\n|$)", line)
            if cat_match:
                current_product["category"] = cat_match.group(1).strip()
        # ... Color, Gender, Season, Usage, Similarity/Relevance
    if current_product:
        products.append(current_product)  # flush the last product
    return products
```
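
To show how the prompt format and the parser fit together, here is a compact, self-contained variant run against a hand-written reply. It is a sketch rather than the project's parser: `parse_products`, the two regular expressions, and the sample text are assumptions, and the field list simply mirrors the fields named in the system prompt.

```python
import re
from typing import Dict, List

# Hand-written reply in the format requested by the system prompt.
SAMPLE = """Here are some options:
1. **Red Wrap Dress**
   ID: 101
   Category: Women/Dresses
   Color: Red
2. Black Blazer
   ID: 202
   Category: Women/Jackets
"""

HEADER = re.compile(r"^\*{0,2}(\d+)\.\s+(.*)$")
FIELD = re.compile(r"^\s*(ID|Category|Color|Gender|Season|Usage|Relevance):\s*(.+)$")


def parse_products(text: str) -> List[Dict[str, str]]:
    """Collect one dict per numbered product, with lower-cased field names as keys."""
    products: List[Dict[str, str]] = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # Strip Markdown bold markers from the product name
            products.append({"name": header.group(2).replace("**", "").strip()})
            continue
        field = FIELD.match(line)
        if field and products:  # ignore field lines that precede any product
            products[-1][field.group(1).lower()] = field.group(2).strip()
    return products


for product in parse_products(SAMPLE):
    print(product)
```

Appending each product as soon as its numbered line appears avoids the need to flush the last product after the loop.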

#### 4.3.3 Image References in Multi-Turn Conversation

```python
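# If the query refers back to an image (e.g. "this", "the image") and an earlier
# user message carries one, reuse that image. Note: these are substring checks,
# so "it" also matches inside words such as "white".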
if any(ref in query_lower for ref in ["this", "that", "the image", "it"]):
    for msg in reversed(st.session_state.messages):
        if msg.get("role") == "user" and msg.get("image_path"):
            image_path = msg["image_path"]
            break
```
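
Because the check above uses substring matching, a short word like "it" also matches inside unrelated words. A possible alternative, sketched below, matches the same reference words on word boundaries with a regular expression. The function name `find_referenced_image` and the sample history (including the path `uploads/jacket.jpg`) are hypothetical.

```python
import re
from typing import List, Optional

# Same reference words as the snippet above, matched as whole words or phrases.
IMAGE_REFERENCE = re.compile(r"\b(this|that|the image|it)\b", re.IGNORECASE)


def find_referenced_image(query: str, messages: List[dict]) -> Optional[str]:
    """Return the most recent user image path if the query refers back to an image."""
    if not IMAGE_REFERENCE.search(query):
        return None
    for msg in reversed(messages):
        if msg.get("role") == "user" and msg.get("image_path"):
            return msg["image_path"]
    return None


history = [
    {"role": "user", "content": "what style is this?", "image_path": "uploads/jacket.jpg"},
    {"role": "assistant", "content": "A vintage denim jacket."},
]
print(find_referenced_image("find pants that match it", history))
print(find_referenced_image("white shirts with stripes", history))
```

The second query contains "it" only inside "white" and "with", so no image is attached.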

---

### 4.4 Configuration Management (config.py)

```python
class Settings(BaseSettings):
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://120.76.41.98:6002"
    search_api_tenant_id: str = "162"

    class Config:
        env_file = ".env"
```

---

## 5. Deployment and Running

### 5.1 Dependent Services

- **Search API**: external search service (HTTP)
- **OpenAI API**: LLM and VLM image analysis

### 5.2 Startup Procedure

```bash
# 1. Environment
pip install -r requirements.txt
cp .env.example .env  # set OPENAI_API_KEY, SEARCH_API_*, etc.

# 2. (Optional) Download the data
python scripts/download_dataset.py  # Kaggle Fashion Product Images Dataset

# 3. Start the application
streamlit run app.py
# or ./scripts/start.sh
```

---

## 6. Typical Interaction Flows

| Scenario | User input | Agent behavior | Tool calls |
|----------|------------|----------------|------------|
| Text search | "winter coats for women" | Runs a text search directly | `search_products("winter coats women")` |
| Style analysis + search | [uploads a vintage jacket] "what style? find matching pants" | Analyzes the style first, then searches | `analyze_image_style(path)` → `search_products("vintage pants casual")` |
| Multi-turn context | [Turn 1] "show me red dresses"<br>[Turn 2] "make them formal" | Combines the request with prior context | `search_products("red formal dresses")` |

---

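## 7. Design Summary

1. **ReAct pattern**: the agent decides on its own when to call tools, which tools to call, and whether to keep calling them.
2. **LangGraph state graph**: `START → Agent → [condition] → Tools → Agent → END`, which supports multiple rounds of tool calls.
3. **Search and style analysis**: text search through the Search API plus VLM-based image style analysis.
4. **Session persistence**: `MemorySaver` with a `thread_id` provides multi-turn conversation memory.
5. **Format constraints**: the system prompt strictly specifies the product output format, which simplifies parsing and display in the frontend.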

---

## 8. Appendix: Project Structure

```
ShopAgent/
├── app/
│   ├── agents/
│   │   └── shopping_agent.py
│   ├── config.py
│   ├── services/
│   └── tools/
│       └── search_tools.py
├── scripts/
│   ├── download_dataset.py
│   └── index_data.py
├── app.py
├── docker-compose.yml
└── requirements.txt
```