# ShopAgent Technical Implementation Report

## 1. Project Overview

ShopAgent is an autonomous multimodal fashion shopping agent built on **LangGraph** and the **ReAct pattern**. The system decides on its own which tools to call, maintains conversation state, and determines when to reply, which enables intelligent product discovery and recommendation.

### Core Features

- **Autonomous tool selection and execution**: the agent selects and calls tools based on user intent
- **Text search**: product search through the Search API
- **Conversation context awareness**: context is retained across multi-turn conversations
- **Real-time visual analysis**: VLM-based image style analysis

---

## 2. Technology Stack

| Component | Technology |
|------|----------|
| Runtime | Python 3.12 |
| Agent framework | LangGraph 1.x |
| LLM framework | LangChain 1.x (supports any LLM; defaults to gpt-4o-mini) |
| Search service | Search API (HTTP) |
| Frontend | Streamlit |
| Dataset | Kaggle Fashion Products |

---

## 3. System Architecture

### 3.1 Overall Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                   Streamlit Frontend (app.py)                   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                ShoppingAgent (shopping_agent.py)                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │           LangGraph StateGraph + ReAct Pattern            │  │
│  │  START → Agent → [Has tool_calls?] → Tools → Agent → END  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │                         │
         ▼                         ▼
  ┌──────────────┐      ┌─────────────────────┐
  │ search_      │      │ analyze_image_style │
  │ products     │      │ (OpenAI Vision)     │
  └──────┬───────┘      └──────────┬──────────┘
         │                         │
         ▼                         │
┌──────────────────┐               │
│ Search API       │               │
│ (HTTP POST)      │               │
└──────────────────┘               │
                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                 OpenAI API (VLM style analysis)                 │
└─────────────────────────────────────────────────────────────────┘
```

### 3.2 Agent Flow Diagram (LangGraph)

```mermaid
graph LR
    START --> Agent
    Agent -->|Has tool_calls| Tools
    Agent -->|No tool_calls| END
    Tools --> Agent
```

---

## 4. Key Code Implementation

### 4.1 Core Agent Implementation (shopping_agent.py)

#### 4.1.1 State Definition

```python
from typing import Optional, Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class AgentState(TypedDict):
    """State for the shopping agent with message accumulation"""

    messages: Annotated[Sequence[BaseMessage], add_messages]
    current_image_path: Optional[str]  # Track uploaded image
```

- `messages` uses the `add_messages` reducer, so each update is appended to the message history instead of replacing it; this is what supports multi-turn conversation
- `current_image_path` stores the path of the currently uploaded image for tools to use

#### 4.1.2 LangGraph Graph Construction

```python
from langchain_core.messages import SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode


def _build_graph(self):
    """Build the LangGraph StateGraph"""

    def agent_node(state: AgentState):
        """Agent decision node - decides which tools to call or when to respond"""
        messages = state["messages"]
        # Prepend the system prompt (Section 4.1.3) if the history has none yet
        if not any(isinstance(m, SystemMessage) for m in messages):
            messages = [SystemMessage(content=system_prompt)] + list(messages)
        response = self.llm_with_tools.invoke(messages)
        return {"messages": [response]}

    tool_node = ToolNode(self.tools)

    def should_continue(state: AgentState):
        """Determine if agent should continue or end"""
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges("agent", should_continue, ["tools", END])
    workflow.add_edge("tools", "agent")

    checkpointer = MemorySaver()
    return workflow.compile(checkpointer=checkpointer)
```

Key points:

- **agent_node**: passes the messages to the LLM, which decides whether to call tools
- **should_continue**: routes to the tool node if the last message carries `tool_calls`; otherwise the run ends
- **MemorySaver**: checkpoints conversation state per `thread_id`; the state is held in memory, so it does not survive a process restart

#### 4.1.3 System Prompt Design

```python
system_prompt = """You are an intelligent fashion shopping assistant. You can:
1. Search for products by text description (use search_products)
2. Analyze image style and attributes (use analyze_image_style)

When a user asks about products:
- For text queries: use search_products directly
- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
- You can call multiple tools in sequence if needed
- Always provide helpful, friendly responses

CRITICAL FORMATTING RULES:
When presenting product results, you MUST use this EXACT format for EACH product:

1. [Product Name]
   ID: [Product ID Number]
   Category: [Category]
   Color: [Color]
   Gender: [Gender]
   (Include Season, Usage, Relevance if available)
..."""
```

The system prompt constrains both tool usage and the output format, so that the frontend can reliably parse product information from the reply.

#### 4.1.4 Conversation Entry Point and Streaming

```python
from typing import Optional

from langchain_core.messages import HumanMessage


def chat(self, query: str, image_path: Optional[str] = None) -> dict:
    # Build input message
    message_content = query
    if image_path:
        message_content = f"{query}\n[User uploaded image: {image_path}]"

    # The thread_id selects which checkpointed conversation to continue
    config = {"configurable": {"thread_id": self.session_id}}
    input_state = {
        "messages": [HumanMessage(content=message_content)],
        "current_image_path": image_path,
    }

    tool_calls = []
    for event in self.graph.stream(input_state, config=config):
        if "agent" in event:
            for msg in event["agent"].get("messages", []):
                if hasattr(msg, "tool_calls") and msg.tool_calls:
                    for tc in msg.tool_calls:
                        tool_calls.append({"name": tc["name"], "args": tc.get("args", {})})
        if "tools" in event:
            # Record tool execution results
            ...

    final_state = self.graph.get_state(config)
    response_text = final_state.values["messages"][-1].content
    return {"response": response_text, "tool_calls": tool_calls, "error": False}
```

---

### 4.2 Search Tool Implementation (search_tools.py)

#### 4.2.1 Text Search (Search API)

```python
import requests
from langchain_core.tools import tool

# `settings` is the Settings instance described in Section 4.4


@tool
def search_products(query: str, limit: int = 5) -> str:
    """Search for fashion products using natural language descriptions."""
    try:
        url = f"{settings.search_api_base_url.rstrip('/')}/search/"
        headers = {
            "Content-Type": "application/json",
            "X-Tenant-ID": settings.search_api_tenant_id,
        }
        payload = {
            "query": query,
            "size": min(limit, 20),  # cap the page size at 20
            "from": 0,
            "language": "zh",
        }
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        # Surface HTTP errors instead of parsing an error body as results
        response.raise_for_status()
        data = response.json()
        results = data.get("results", [])

        if not results:
            return "No products found matching your search."

        output = f"Found {len(results)} product(s):\n\n"
        for idx, product in enumerate(results, 1):
            output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
            output += f"   ID: {product.get('spu_id', 'N/A')}\n"
            output += f"   Category: {product.get('category_path', 'N/A')}\n"
            output += f"   Price: {product.get('price')}\n"
            output += "\n"
        return output.strip()
    except Exception as e:
        return f"Error searching products: {str(e)}"
```

#### 4.2.2 Visual Analysis (VLM)

```python
import base64


@tool
def analyze_image_style(image_path: str) -> str:
    """Analyze a fashion product image using AI vision to extract detailed style information."""
    # Read the image and encode it as base64 for the data URL below
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")

    prompt = """Analyze this fashion product image and provide a detailed description.

Include:
- Product type (e.g., shirt, dress, shoes, pants, bag)
- Primary colors
- Style/design (e.g., casual, formal, sporty, vintage, modern)
- Pattern or texture (e.g., plain, striped, checked, floral)
- Key features (e.g., collar type, sleeve length, fit)
- Material appearance (if obvious, e.g., denim, cotton, leather)
- Suitable occasion (e.g., office wear, party, casual, sports)

Provide a comprehensive yet concise description (3-4 sentences)."""

    client = get_openai_client()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}", "detail": "high"}},
            ],
        }],
        max_tokens=500,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()
```

---

### 4.3 Streamlit Frontend (app.py)

#### 4.3.1 Session and Agent Initialization

```python
import uuid

import streamlit as st


def initialize_session():
    if "session_id" not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
    if "shopping_agent" not in st.session_state:
        st.session_state.shopping_agent = ShoppingAgent(session_id=st.session_state.session_id)
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "uploaded_image" not in st.session_state:
        st.session_state.uploaded_image = None
```

#### 4.3.2 Product Information Parsing

```python
import re


def extract_products_from_response(response: str) -> list:
    """Parse product information from the agent's reply"""
    products = []
    current_product = {}  # must exist before the first numbered line is seen
    for line in response.split("\n"):
        if re.match(r"^\*?\*?\d+\.\s+", line):
            # A numbered line starts a new product; store the previous one
            if current_product:
                products.append(current_product)
            current_product = {"name": re.sub(r"^\*?\*?\d+\.\s+", "", line).replace("**", "").strip()}
        elif "ID:" in line:
            id_match = re.search(r"(?:ID|id):\s*(\d+)", line)
            if id_match:
                current_product["id"] = id_match.group(1)
        elif "Category:" in line:
            cat_match = re.search(r"Category:\s*(.+?)(?:\n|$)", line)
            if cat_match:
                current_product["category"] = cat_match.group(1).strip()
        # ... Color, Gender, Season, Usage, Similarity/Relevance
    if current_product:
        products.append(current_product)  # do not drop the last product
    return products
```

#### 4.3.3 Image References in Multi-Turn Conversation

```python
# When the query contains a reference word such as "this" or "it",
# reuse the most recent image the user uploaded.
# Caveat: these are substring checks, so "it" also matches inside words such as "white".
if any(ref in query_lower for ref in ["this", "that", "the image", "it"]):
    for msg in reversed(st.session_state.messages):
        if msg.get("role") == "user" and msg.get("image_path"):
            image_path = msg["image_path"]
            break
```

---

### 4.4 Configuration Management (config.py)

```python
class Settings(BaseSettings):
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://120.76.41.98:6002"
    search_api_tenant_id: str = "162"

    class Config:
        env_file = ".env"
```

---

## 5. Deployment and Running

### 5.1 Dependent Services

- **Search API**: external search service (HTTP)
- **OpenAI API**: LLM and VLM image analysis

### 5.2 Startup Procedure

```bash
# 1. Environment
pip install -r requirements.txt
cp .env.example .env  # configure OPENAI_API_KEY, SEARCH_API_*, etc.

# 2. (Optional) Download data
python scripts/download_dataset.py  # Kaggle Fashion Product Images Dataset

# 3. Start the application
streamlit run app.py  # or ./scripts/start.sh
```

---

## 6. Typical Interaction Flows

| Scenario | User input | Agent behavior | Tool calls |
|------|----------|------------|----------|
| Text search | "winter coats for women" | Searches by text directly | `search_products("winter coats women")` |
| Style analysis + search | [uploads a vintage jacket] "what style? find matching pants" | Analyzes the style first, then searches | `analyze_image_style(path)` → `search_products("vintage pants casual")` |
| Multi-turn context | [Turn 1] "show me red dresses"<br>[Turn 2] "make them formal" | Combines the new request with the earlier turn | `search_products("red formal dresses")` |

---

## 7. Design Summary

1. **ReAct pattern**: the agent decides on its own when to call tools, which tools to call, and whether to keep calling them.
2. **LangGraph state graph**: `START → Agent → [condition] → Tools → Agent → END`, which supports multiple rounds of tool calls.
3. **Search and style analysis**: Search API text search plus VLM image style analysis.
4. **Session persistence**: `MemorySaver` + `thread_id` provide multi-turn conversation memory.
5. **Format constraints**: the system prompt strictly constrains the product output format so the frontend can parse and display it.

---

## 8. Appendix: Project Structure

```
ShopAgent/
├── app/
│   ├── agents/
│   │   └── shopping_agent.py
│   ├── config.py
│   ├── services/
│   └── tools/
│       └── search_tools.py
├── scripts/
│   ├── download_dataset.py
│   └── index_data.py
├── app.py
├── docker-compose.yml
└── requirements.txt
```
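
As a supplement to Section 4.1.2, the agent → tools → agent routing can be illustrated without any framework dependency. The following is a minimal plain-Python sketch of that loop, not the LangGraph implementation: the dict-based messages, `run_react_loop`, `fake_llm`, and `fake_search` are hypothetical stand-ins introduced only for illustration, and the `max_steps` guard is an assumption of this sketch.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]  # hypothetical stand-in for LangChain message objects


def should_continue(messages: List[Message]) -> str:
    """Route to "tools" when the last message requests tool calls, else "end"."""
    return "tools" if messages[-1].get("tool_calls") else "end"


def run_react_loop(
    llm: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., str]],
    user_query: str,
    max_steps: int = 5,
) -> List[Message]:
    """Alternate between the agent step and the tool step until no tool is requested."""
    messages: List[Message] = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        messages.append(llm(messages))  # agent step
        if should_continue(messages) == "end":
            break
        for call in messages[-1]["tool_calls"]:  # tool step
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages


def fake_llm(messages: List[Message]) -> Message:
    """Scripted model: request one search, then answer from the tool output."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"Here is what I found: {messages[-1]['content']}"}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "search_products", "args": {"query": messages[-1]["content"]}}],
    }


def fake_search(query: str) -> str:
    """Hypothetical tool that echoes the query instead of calling the Search API."""
    return f"1 result for '{query}'"


history = run_react_loop(fake_llm, {"search_products": fake_search}, "red dresses")
print([m["role"] for m in history])
```

The loop ends as soon as the model returns a message without `tool_calls`, which mirrors the `should_continue` branch that returns `END` in the graph.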
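
As a supplement to Sections 4.1.3 and 4.3.2, the link between the prompt's output format and frontend parsing can be shown end to end. The sketch below is a self-contained variant of the parsing idea, not the project's `extract_products_from_response`: the name `parse_products`, the `FIELDS` tuple, the lowercase keys, and the sample reply are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Field labels named in the system prompt format; lowercase keys are this sketch's choice.
FIELDS = ("ID", "Category", "Color", "Gender", "Season", "Usage", "Relevance")


def parse_products(response: str) -> List[Dict[str, str]]:
    """Parse numbered product entries followed by 'Field: value' lines."""
    products: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in response.splitlines():
        line = raw_line.strip()
        # A line such as "1. Name" or "**1. Name**" starts a new product
        header = re.match(r"^\*{0,2}(\d+)\.\s+(.*)$", line)
        if header:
            if current:
                products.append(current)
            current = {"name": header.group(2).replace("**", "").strip()}
            continue
        # Attribute lines are only accepted once a product has been started
        field = re.match(r"^(\w+):\s*(.+)$", line)
        if field and field.group(1) in FIELDS and current:
            current[field.group(1).lower()] = field.group(2).strip()
    if current:  # flush the last product
        products.append(current)
    return products


# Hypothetical agent reply that follows the prompt's format
sample = """Here are some options:

1. **Red Wrap Dress**
   ID: 12345
   Category: Apparel > Dress
   Color: Red
2. Black Blazer
   ID: 67890
   Gender: Women
"""
parsed = parse_products(sample)
print(parsed)
```

Because parsing depends on the numbered heading and the `Field: value` lines, any drift from the prompt's format shows up as missing keys in the parsed dictionaries, which is why the prompt states the format so strictly.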
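
As a supplement to Section 4.3.3, the reference-word check there uses substring matching, so a short word like "it" can also match inside longer words. One possible alternative, sketched here under assumptions, is to match whole words with a regular expression. The function name `resolve_image_reference`, the message dictionaries, and the path `uploads/jacket.jpg` are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Reference words taken from the snippet in Section 4.3.3, matched as whole words
REFERENCE_PATTERN = re.compile(r"\b(this|that|the image|it)\b", re.IGNORECASE)


def resolve_image_reference(query: str, history: List[Dict[str, str]]) -> Optional[str]:
    """Return the latest user-uploaded image path if the query refers back to an image."""
    if not REFERENCE_PATTERN.search(query):
        return None
    # Walk the history backwards to find the most recent user message with an image
    for msg in reversed(history):
        if msg.get("role") == "user" and msg.get("image_path"):
            return msg["image_path"]
    return None


# Hypothetical conversation history with one uploaded image
history = [
    {"role": "user", "content": "what style is this?", "image_path": "uploads/jacket.jpg"},
    {"role": "assistant", "content": "A vintage denim jacket."},
]
print(resolve_image_reference("find pants that match it", history))
print(resolve_image_reference("show me white shirts", history))
```

With word boundaries, "white" no longer triggers an image lookup, while a standalone "it" still does.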