  # ShopAgent Project Technical Implementation Report
  
  ## 1. Project Overview
  
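  ShopAgent is an autonomous multimodal fashion shopping agent built on **LangGraph** and the **ReAct pattern**. The system decides on its own which tools to call, maintains conversation state, and determines when to reply, enabling intelligent product discovery and recommendation.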
  
  ### Core Features

  - **Autonomous tool selection and execution**: the agent selects and calls tools on its own based on the user's intent
  - **Text search**: product search through the Search API
  - **Conversation context awareness**: retains context across multi-turn conversations
  - **Real-time visual analysis**: VLM-based image style analysis
  
  ---
  
  ## 2. Technology Stack

  | Component | Technology |
  |------|----------|
  | Runtime | Python 3.12 |
  | Agent framework | LangGraph 1.x |
  | LLM framework | LangChain 1.x (works with any LLM; defaults to gpt-4o-mini) |
  | Search service | Search API (HTTP) |
  | Frontend | Streamlit |
  | Dataset | Kaggle Fashion Products |
  
  ---
  
  ## 3. System Architecture

  ### 3.1 Overall Architecture Diagram
  
  ```
  ┌─────────────────────────────────────────────────────────────────┐
  │                   Streamlit frontend (app.py)                   │
  └─────────────────────────────────────────────────────────────────┘
  
  
  ┌─────────────────────────────────────────────────────────────────┐
  │              ShoppingAgent (shopping_agent.py)                    │
  │  ┌───────────────────────────────────────────────────────────┐  │
  │  │  LangGraph StateGraph + ReAct Pattern                      │  │
  │  │  START → Agent → [Has tool_calls?] → Tools → Agent → END   │  │
  │  └───────────────────────────────────────────────────────────┘  │
  └─────────────────────────────────────────────────────────────────┘
          │                    │
          ▼                    ▼
  ┌──────────────┐   ┌─────────────────────┐
  │ search_      │   │ analyze_image_style  │
  │ products     │   │ (OpenAI Vision)      │
  └──────┬───────┘   └──────────┬──────────┘
         │                      │
         ▼                      │
  ┌──────────────────┐          │
  │   Search API     │          │
  │ (HTTP POST)      │          │
  └──────────────────┘          │
  
  ┌─────────────────────────────────────────────────────────────────┐
  │           OpenAI API (VLM style analysis)                       │
  └─────────────────────────────────────────────────────────────────┘
  ```
  
  ### 3.2 Agent Flow Diagram (LangGraph)
  
  ```mermaid
  graph LR
      START --> Agent
      Agent -->|Has tool_calls| Tools
      Agent -->|No tool_calls| END
      Tools --> Agent
  ```
  
  ---
  
  ## 4. Key Code Implementation

  ### 4.1 Agent Core Implementation (shopping_agent.py)

  #### 4.1.1 State Definition
  
  ```python
  from typing import Optional, Sequence

  from langchain_core.messages import BaseMessage
  from langgraph.graph.message import add_messages
  from typing_extensions import Annotated, TypedDict
  
  class AgentState(TypedDict):
      """State for the shopping agent with message accumulation"""
      messages: Annotated[Sequence[BaseMessage], add_messages]
      current_image_path: Optional[str]  # Track uploaded image
  ```
  
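  - `messages` uses the `add_messages` reducer to accumulate messages, which is what supports multi-turn conversations
  - `current_image_path` stores the path of the currently uploaded image for tools to use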
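
To illustrate what the reducer contributes, here is a minimal pure-Python sketch of the merge behaviour `add_messages` is documented to provide: new messages are appended, and a message whose `id` matches an existing one replaces it. The `merge_messages` helper and the dict-based message shape are illustrative assumptions, not LangGraph's implementation.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"id": ..., "role": ..., "content": ...}


def merge_messages(existing: List[Message], new: List[Message]) -> List[Message]:
    """Append new messages; replace an existing message when the ids match."""
    merged = list(existing)
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in new:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg  # same id: overwrite in place
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)  # unseen id: accumulate
    return merged


history = [{"id": "1", "role": "user", "content": "show me red dresses"}]
history = merge_messages(history, [{"id": "2", "role": "assistant", "content": "Here are 5 dresses."}])
history = merge_messages(history, [{"id": "3", "role": "user", "content": "make them formal"}])
print([m["content"] for m in history])
```

Because each node returns only its new messages, the reducer is what turns those partial updates into a growing conversation history.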
  
  #### 4.1.2 Building the LangGraph Graph
  
  ```python
  def _build_graph(self):
      """Build the LangGraph StateGraph"""
      
      def agent_node(state: AgentState):
          """Agent decision node - decides which tools to call or when to respond"""
          messages = state["messages"]
          if not any(isinstance(m, SystemMessage) for m in messages):
              messages = [SystemMessage(content=system_prompt)] + list(messages)
          response = self.llm_with_tools.invoke(messages)
          return {"messages": [response]}
  
      tool_node = ToolNode(self.tools)
  
      def should_continue(state: AgentState):
          """Determine if agent should continue or end"""
          last_message = state["messages"][-1]
          if hasattr(last_message, "tool_calls") and last_message.tool_calls:
              return "tools"
          return END
  
      workflow = StateGraph(AgentState)
      workflow.add_node("agent", agent_node)
      workflow.add_node("tools", tool_node)
      workflow.add_edge(START, "agent")
      workflow.add_conditional_edges("agent", should_continue, ["tools", END])
      workflow.add_edge("tools", "agent")
  
      checkpointer = MemorySaver()
      return workflow.compile(checkpointer=checkpointer)
  ```
  
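  Key points:
  - **agent_node**: passes the messages to the LLM, which decides whether to call a tool
  - **should_continue**: routes to the tools node if the last message has `tool_calls`, otherwise ends the run
  - **MemorySaver**: checkpoints conversation state per `thread_id`; it keeps state in memory, so history does not survive a process restart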
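
The control flow of this graph can be sketched without LangGraph at all. The following stdlib-only simulation is an illustration under stated assumptions: `fake_llm`, the `TOOLS` table, and the dict-based messages are hypothetical stand-ins, and `max_steps` is an added safety guard that the original graph does not show.

```python
from typing import Callable, Dict, List

END = "__end__"  # stand-in for LangGraph's END sentinel


def fake_llm(messages: List[dict]) -> dict:
    """Hypothetical model: request one search, then answer once a tool result exists."""
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "Found 1 coat.", "tool_calls": []}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "search_products", "args": {"query": "winter coats"}}],
    }


TOOLS: Dict[str, Callable[..., str]] = {
    "search_products": lambda query: f"1. Wool coat (query={query})",
}


def should_continue(messages: List[dict]) -> str:
    """Route to the tools node while the last message carries tool calls."""
    return "tools" if messages[-1].get("tool_calls") else END


def run(messages: List[dict], max_steps: int = 5) -> List[dict]:
    for _ in range(max_steps):  # guard against an endless tool loop
        messages = messages + [fake_llm(messages)]  # agent node
        if should_continue(messages) == END:
            break
        for call in messages[-1]["tool_calls"]:  # tools node
            result = TOOLS[call["name"]](**call["args"])
            messages = messages + [{"role": "tool", "content": result}]
    return messages


trace = run([{"role": "user", "content": "winter coats for women"}])
print([m["role"] for m in trace])
```

The trace alternates between the agent and tools steps until the model returns a message without tool calls, which mirrors the `Tools → Agent` edge and the conditional edge to `END`.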
  
  #### 4.1.3 System Prompt Design
  
  ```python
  system_prompt = """You are an intelligent fashion shopping assistant. You can:
  1. Search for products by text description (use search_products)
  2. Analyze image style and attributes (use analyze_image_style)
  
  When a user asks about products:
  - For text queries: use search_products directly
  - For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
  - You can call multiple tools in sequence if needed
  - Always provide helpful, friendly responses
  
  CRITICAL FORMATTING RULES:
  When presenting product results, you MUST use this EXACT format for EACH product:
  1. [Product Name]
     ID: [Product ID Number]
     Category: [Category]
     Color: [Color]
     Gender: [Gender]
     (Include Season, Usage, Relevance if available)
  ..."""
  ```
  
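  The system prompt constrains both tool usage and the output format, so that the frontend can parse product information correctly.

  #### 4.1.4 Chat Entry Point and Streaming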
  
  ```python
  def chat(self, query: str, image_path: Optional[str] = None) -> dict:
      # Build input message
      message_content = query
      if image_path:
          message_content = f"{query}\n[User uploaded image: {image_path}]"
  
      config = {"configurable": {"thread_id": self.session_id}}
      input_state = {
          "messages": [HumanMessage(content=message_content)],
          "current_image_path": image_path,
      }
  
      tool_calls = []
      for event in self.graph.stream(input_state, config=config):
          if "agent" in event:
              for msg in event["agent"].get("messages", []):
                  if hasattr(msg, "tool_calls") and msg.tool_calls:
                      for tc in msg.tool_calls:
                          tool_calls.append({"name": tc["name"], "args": tc.get("args", {})})
          if "tools" in event:
              # Record tool execution results
              ...
  
      final_state = self.graph.get_state(config)
      response_text = final_state.values["messages"][-1].content
  
      return {"response": response_text, "tool_calls": tool_calls, "error": False}
  ```
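
The `thread_id` passed in `config` is what scopes conversation memory to one session. As a rough illustration of that idea only, the sketch below keeps one message list per thread in a plain dictionary; `InMemoryThreads` is a hypothetical stand-in and does not reproduce how `MemorySaver` stores checkpoints.

```python
from collections import defaultdict
from typing import DefaultDict, List


class InMemoryThreads:
    """Hypothetical stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: DefaultDict[str, List[str]] = defaultdict(list)

    def append(self, thread_id: str, message: str) -> List[str]:
        self._threads[thread_id].append(message)
        return list(self._threads[thread_id])  # copy so callers cannot mutate state


store = InMemoryThreads()
store.append("session-a", "show me red dresses")
history_a = store.append("session-a", "make them formal")
history_b = store.append("session-b", "winter coats for women")
print(history_a)
print(history_b)
```

Two sessions with different ids never see each other's history, which is why the frontend generates a fresh `session_id` per browser session.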
  
  ---
  
  ### 4.2 Search Tool Implementation (search_tools.py)
  
  #### 4.2.1 Text Search (Search API)
  
  ```python
  @tool
  def search_products(query: str, limit: int = 5) -> str:
      """Search for fashion products using natural language descriptions."""
      try:
          url = f"{settings.search_api_base_url.rstrip('/')}/search/"
          headers = {
              "Content-Type": "application/json",
              "X-Tenant-ID": settings.search_api_tenant_id,
          }
          payload = {
              "query": query,
              "size": min(limit, 20),
              "from": 0,
              "language": "zh",
          }
  
          response = requests.post(url, json=payload, headers=headers, timeout=60)
          data = response.json()
          results = data.get("results", [])
  
          if not results:
              return "No products found matching your search."
  
          output = f"Found {len(results)} product(s):\n\n"
          for idx, product in enumerate(results, 1):
              output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
              output += f"   ID: {product.get('spu_id', 'N/A')}\n"
              output += f"   Category: {product.get('category_path', 'N/A')}\n"
              output += f"   Price: {product.get('price')}\n"
              output += "\n"
  
          return output.strip()
      except Exception as e:
          return f"Error searching products: {str(e)}"
  ```
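
The request building and result formatting in this tool can be tested without a network call if they are split into pure functions. The sketch below assumes that refactoring; `build_payload`, `format_results`, and the sample record are illustrative and not part of the project. It also skips the price line when no price is present, a small deviation from the code above, which would print `None`.

```python
from typing import Any, Dict, List


def build_payload(query: str, limit: int = 5, language: str = "zh") -> Dict[str, Any]:
    """Mirror the request body used above, capping the page size at 20."""
    return {"query": query, "size": min(limit, 20), "from": 0, "language": language}


def format_results(results: List[Dict[str, Any]]) -> str:
    """Render results in the numbered layout that the tool returns to the agent."""
    if not results:
        return "No products found matching your search."
    lines = [f"Found {len(results)} product(s):", ""]
    for idx, product in enumerate(results, 1):
        lines.append(f"{idx}. {product.get('title', 'Unknown Product')}")
        lines.append(f"   ID: {product.get('spu_id', 'N/A')}")
        lines.append(f"   Category: {product.get('category_path', 'N/A')}")
        if product.get("price") is not None:  # skip the line instead of printing "None"
            lines.append(f"   Price: {product['price']}")
        lines.append("")
    return "\n".join(lines).strip()


# Made-up sample record using the field names read by the tool above.
sample = [{"title": "Wool Coat", "spu_id": 101, "category_path": "Women/Outerwear", "price": 89.0}]
print(build_payload("winter coats", limit=50))
print(format_results(sample))
```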
  
  #### 4.2.2 Visual Analysis (VLM)
  
  ```python
  @tool
  def analyze_image_style(image_path: str) -> str:
      """Analyze a fashion product image using AI vision to extract detailed style information."""
      with open(image_path, "rb") as image_file:
          image_data = base64.b64encode(image_file.read()).decode("utf-8")
  
      prompt = """Analyze this fashion product image and provide a detailed description.
  Include:
  - Product type (e.g., shirt, dress, shoes, pants, bag)
  - Primary colors
  - Style/design (e.g., casual, formal, sporty, vintage, modern)
  - Pattern or texture (e.g., plain, striped, checked, floral)
  - Key features (e.g., collar type, sleeve length, fit)
  - Material appearance (if obvious, e.g., denim, cotton, leather)
  - Suitable occasion (e.g., office wear, party, casual, sports)
  Provide a comprehensive yet concise description (3-4 sentences)."""
  
      client = get_openai_client()
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": prompt},
                  {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}", "detail": "high"}},
              ],
          }],
          max_tokens=500,
          temperature=0.3,
      )
  
      return response.choices[0].message.content.strip()
  ```
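
The tool above always labels the upload as `image/jpeg`. One possible refinement, sketched here as an assumption rather than as the project's code, is to guess the MIME type from the file suffix with the standard library before building the data URL. The `image_to_data_url` helper is hypothetical.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Encode an image file as a base64 data URL, guessing the MIME type from its suffix."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {image_path}")
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime  # fall back to the type the original code assumes
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Demo with a throwaway file; the bytes are placeholders, not a real image.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "jacket.png"
    sample.write_bytes(b"placeholder bytes")
    print(image_to_data_url(str(sample))[:22])
```

The returned string can be used as the `url` value of the `image_url` content part shown in the tool.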
  
  ---
  
  ### 4.3 Streamlit Frontend (app.py)
  
  #### 4.3.1 Session and Agent Initialization
  
  ```python
  def initialize_session():
      if "session_id" not in st.session_state:
          st.session_state.session_id = str(uuid.uuid4())
      if "shopping_agent" not in st.session_state:
          st.session_state.shopping_agent = ShoppingAgent(session_id=st.session_state.session_id)
      if "messages" not in st.session_state:
          st.session_state.messages = []
      if "uploaded_image" not in st.session_state:
          st.session_state.uploaded_image = None
  ```
  
  #### 4.3.2 Product Information Parsing
  
  ```python
  import re

  def extract_products_from_response(response: str) -> list:
      """Parse product information from the agent's reply."""
      products = []
      current_product = {}  # product being built; empty until the first numbered line
      for line in response.split("\n"):
          if re.match(r"^\*?\*?\d+\.\s+", line):
              # A numbered line starts a new product; store the previous one first
              if current_product:
                  products.append(current_product)
              current_product = {"name": re.sub(r"^\*?\*?\d+\.\s+", "", line).replace("**", "").strip()}
          elif "ID:" in line:
              id_match = re.search(r"(?:ID|id):\s*(\d+)", line)
              if id_match:
                  current_product["id"] = id_match.group(1)
          elif "Category:" in line:
              cat_match = re.search(r"Category:\s*(.+?)(?:\n|$)", line)
              if cat_match:
                  current_product["category"] = cat_match.group(1).strip()
          # ... Color, Gender, Season, Usage, Similarity/Relevance
      if current_product:
          products.append(current_product)  # the last product has no following numbered line
      return products
  ```
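
To show how the format mandated by the system prompt lines up with this kind of parsing, here is a compact, self-contained variant run against a made-up reply. The `parse_products` function, its two regular expressions, and the sample text are illustrative assumptions; the project's parser is the one shown above.

```python
import re
from typing import Dict, List

HEADER = re.compile(r"^\*{0,2}(\d+)\.\s+(.*)$")
FIELD = re.compile(r"^\s*(ID|Category|Color|Gender|Season|Usage|Relevance):\s*(.+?)\s*$")


def parse_products(reply: str) -> List[Dict[str, str]]:
    """Split a formatted reply into one dict per numbered product."""
    products: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in reply.splitlines():
        header = HEADER.match(line.strip())
        field = FIELD.match(line)
        if header:
            if current:
                products.append(current)
            current = {"name": header.group(2).replace("**", "").strip()}
        elif field and current:
            current[field.group(1).lower()] = field.group(2)
    if current:
        products.append(current)  # the last product has no following header
    return products


reply = """Here are some options:
1. **Classic Wool Coat**
   ID: 101
   Category: Outerwear
   Color: Navy
2. Denim Jacket
   ID: 202
   Gender: Women
"""
for product in parse_products(reply):
    print(product)
```

Lines that are neither a numbered header nor a known field, such as the introductory sentence, are ignored, so the agent can still wrap the list in friendly prose.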
  
  #### 4.3.3 Image References in Multi-Turn Conversations
  
  ```python
  # When the query refers back to an image (e.g. "find pants to match this"), reuse the most recently uploaded image
  if any(ref in query_lower for ref in ["this", "that", "the image", "it"]):
      for msg in reversed(st.session_state.messages):
          if msg.get("role") == "user" and msg.get("image_path"):
              image_path = msg["image_path"]
              break
  ```
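
Note that the substring test above also fires on words that merely contain a trigger, for example "it" inside "white". A hedged alternative, assuming whole-word matching is the intended behaviour, is a word-boundary regular expression. The `refers_to_previous_image` helper is hypothetical.

```python
import re

# Same trigger phrases as the snippet above, matched as whole words only.
REFERENCE_PATTERN = re.compile(r"\b(?:this|that|the image|it)\b", re.IGNORECASE)


def refers_to_previous_image(query: str) -> bool:
    """Return True when the query contains one of the reference phrases as a whole word."""
    return REFERENCE_PATTERN.search(query) is not None


print(refers_to_previous_image("find pants that match it"))
print(refers_to_previous_image("white cotton shirts"))
```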
  
  ---
  
  ### 4.4 Configuration Management (config.py)
  
  ```python
  class Settings(BaseSettings):
      openai_api_key: str
      openai_model: str = "gpt-4o-mini"
      search_api_base_url: str = "http://120.76.41.98:6002"
      search_api_tenant_id: str = "162"
  
      class Config:
          env_file = ".env"
  ```
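
A settings class of this kind reads each field from an environment variable of the same name, falls back to the declared default, and treats fields without a default as required. The stdlib-only sketch below imitates that behaviour for a subset of the fields; `SearchSettings` and `load_settings` are hypothetical names, and the real project relies on `BaseSettings` rather than this code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical stdlib-only subset of the settings shown above."""

    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_tenant_id: str = "162"


def load_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read upper-cased variable names from env; fail fast if the API key is missing."""
    if "OPENAI_API_KEY" not in env:
        raise KeyError("OPENAI_API_KEY must be set")
    return SearchSettings(
        openai_api_key=env["OPENAI_API_KEY"],
        openai_model=env.get("OPENAI_MODEL", SearchSettings.openai_model),
        search_api_tenant_id=env.get("SEARCH_API_TENANT_ID", SearchSettings.search_api_tenant_id),
    )


# Placeholder values for illustration only.
settings = load_settings({"OPENAI_API_KEY": "sk-placeholder", "OPENAI_MODEL": "gpt-4o"})
print(settings.openai_model, settings.search_api_tenant_id)
```

Passing the environment as a mapping keeps the loader easy to test, since no real environment variables have to be modified.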
  
  ---
  
  ## 5. Deployment and Running

  ### 5.1 External Dependencies
  
  - **Search API**: external search service (HTTP)
  - **OpenAI API**: LLM and VLM image analysis
  
  ### 5.2 Startup Procedure
  
  ```bash
  # 1. Environment
  pip install -r requirements.txt
  cp .env.example .env  # set OPENAI_API_KEY and SEARCH_API_*
  
  # 2. (Optional) Download the data
  python scripts/download_dataset.py  # Kaggle Fashion Product Images Dataset
  
  # 3. Start the application
  streamlit run app.py
  # or ./scripts/start.sh
  ```
  
  ---
  
  ## 6. Typical Interaction Flows

  | Scenario | User input | Agent behavior | Tool calls |
  |------|----------|------------|----------|
  | Text search | "winter coats for women" | Searches by text directly | `search_products("winter coats women")` |
  | Style analysis + search | [uploads a vintage jacket] "what style? find matching pants" | Analyzes the style first, then searches | `analyze_image_style(path)` → `search_products("vintage pants casual")` |
  | Multi-turn context | [Turn 1] "show me red dresses"<br>[Turn 2] "make them formal" | Combines the request with the earlier context | `search_products("red formal dresses")` |
  
  ---
  
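  ## 7. Design Summary

  1. **ReAct pattern**: the agent decides on its own when to call tools, which tools to call, and whether to keep calling them.
  2. **LangGraph state graph**: `START → Agent → [condition] → Tools → Agent → END`, which supports multiple rounds of tool calls.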
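  3. **Search and style analysis**: text search through the Search API plus VLM-based image style analysis.
  4. **Session persistence**: `MemorySaver` plus `thread_id` provide multi-turn conversation memory.
  5. **Format constraints**: the system prompt strictly constrains the product output format so the frontend can parse and display it.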
  
  ---
  
  ## 8. Appendix: Project Structure
  
  ```
  ShopAgent/
  ├── app/
  │   ├── agents/
  │   │   └── shopping_agent.py
  │   ├── config.py
  │   ├── services/
  │   └── tools/
  │       └── search_tools.py
  ├── scripts/
  │   ├── download_dataset.py
  │   └── index_data.py
  ├── app.py
  ├── docker-compose.yml
  └── requirements.txt
  ```