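OmniShopAgent Technical Implementation Report
1. Project Overview
OmniShopAgent is an autonomous multimodal fashion shopping agent built on LangGraph and the ReAct pattern. It decides on its own which tools to call, maintains conversation state, and determines when to reply, in order to support product discovery and recommendation.
Core features
- Autonomous tool selection and execution: the agent picks and calls tools based on the user's intent
- Text search: product search through the Search API
- Conversation context awareness: context is retained across turns of a multi-turn conversation
- Real-time visual analysis: VLM-based analysis of image style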
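2. Technology Stack
| Component | Technology |
|---|---|
| Runtime | Python 3.12 |
| Agent framework | LangGraph 1.x |
| LLM framework | LangChain 1.x (works with any LLM; default gpt-4o-mini) |
| Search service | Search API (HTTP) |
| Frontend | Streamlit |
| Dataset | Kaggle Fashion Products |
3. System Architecture
3.1 Overall architecture diagram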
┌─────────────────────────────────────────────────────────────────┐
│                   Streamlit frontend (app.py)                   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                ShoppingAgent (shopping_agent.py)                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │           LangGraph StateGraph + ReAct Pattern            │  │
│  │  START → Agent → [Has tool_calls?] → Tools → Agent → END  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
          │                                    │
          ▼                                    ▼
   ┌──────────────┐                 ┌─────────────────────┐
   │   search_    │                 │ analyze_image_style │
   │   products   │                 │   (OpenAI Vision)   │
   └──────┬───────┘                 └──────────┬──────────┘
          │                                    │
          ▼                                    │
 ┌──────────────────┐                          │
 │    Search API    │                          │
 │   (HTTP POST)    │                          │
 └──────────────────┘                          │
                                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                 OpenAI API (VLM style analysis)                 │
└─────────────────────────────────────────────────────────────────┘
3.2 Agent flow diagram (LangGraph)
graph LR
START --> Agent
Agent -->|Has tool_calls| Tools
Agent -->|No tool_calls| END
Tools --> Agent
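The loop in this diagram can be illustrated without LangGraph. The following is a minimal sketch, assuming a hypothetical `fake_llm` stand-in for the model, a hypothetical `TOOLS` registry, and plain dicts instead of LangChain message objects; none of these names come from the project.

```python
from typing import Callable

# Hypothetical tool registry; the real project registers LangChain tools instead.
TOOLS: dict[str, Callable[..., str]] = {
    "search_products": lambda query: f"results for {query!r}",
}


def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the model: request one tool call, then give a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        call = {"name": "search_products", "args": {"query": "red dress"}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": "Here is what I found.", "tool_calls": []}


def run_agent(user_text: str, max_steps: int = 5) -> list[dict]:
    """Alternate between the Agent step and the Tools step until no tool is requested."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = fake_llm(messages)            # Agent node
        messages.append(reply)
        if not reply["tool_calls"]:           # conditional edge: no tool_calls -> END
            break
        for call in reply["tool_calls"]:      # Tools node
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
    return messages


history = run_agent("show me red dresses")
```

The `max_steps` cap is an addition of this sketch; it keeps a misbehaving model from looping forever between the two nodes.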
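4. Key Code Implementation
4.1 Agent core (shopping_agent.py)
4.1.1 State definition
from typing import Optional, Sequence
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

class AgentState(TypedDict):
    """State for the shopping agent with message accumulation"""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    current_image_path: Optional[str]  # Track uploaded image

- messages uses the add_messages reducer, so each node's output is appended to the existing history, which supports multi-turn conversations.
- current_image_path stores the path of the currently uploaded image for tools to use.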
4.1.2 Building the LangGraph graph
from langchain_core.messages import SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

# Method of ShoppingAgent; system_prompt is the string shown in 4.1.3.
def _build_graph(self):
    """Build the LangGraph StateGraph"""
    def agent_node(state: AgentState):
        """Agent decision node - decides which tools to call or when to respond"""
        messages = state["messages"]
        # Prepend the system prompt only if the history does not contain one yet.
        if not any(isinstance(m, SystemMessage) for m in messages):
            messages = [SystemMessage(content=system_prompt)] + list(messages)
        response = self.llm_with_tools.invoke(messages)
        return {"messages": [response]}

    tool_node = ToolNode(self.tools)

    def should_continue(state: AgentState):
        """Determine if agent should continue or end"""
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges("agent", should_continue, ["tools", END])
    workflow.add_edge("tools", "agent")
    checkpointer = MemorySaver()
    return workflow.compile(checkpointer=checkpointer)
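Key points:
- agent_node: passes the messages to the LLM, which decides whether to call a tool
- should_continue: routes to the tools node if the last message has tool_calls, otherwise ends the run
- MemorySaver: keeps conversation state per thread_id (held in process memory, so it does not survive a restart)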
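The role of `thread_id` can be pictured with a small in-memory store that keeps one saved state per thread. This is only a sketch of the idea: `InMemoryCheckpoints` and `handle_turn` are hypothetical names, and the real checkpointer stores richer checkpoint data than a single dict per thread.

```python
import copy


class InMemoryCheckpoints:
    """Hypothetical stand-in for a checkpointer: one saved state per thread_id."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def load(self, thread_id: str) -> dict:
        # Return a copy so callers cannot mutate the saved state by accident.
        return copy.deepcopy(self._store.get(thread_id, {"messages": []}))

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = copy.deepcopy(state)


def handle_turn(store: InMemoryCheckpoints, thread_id: str, user_text: str) -> dict:
    """Load the thread's history, add the new turn, and save it back."""
    state = store.load(thread_id)
    state["messages"].append(user_text)
    store.save(thread_id, state)
    return state


store = InMemoryCheckpoints()
handle_turn(store, "session-a", "show me red dresses")
handle_turn(store, "session-b", "winter coats")
state_a = handle_turn(store, "session-a", "make them formal")
```

Each session sees only its own history, and everything is lost when the process exits, which is the trade-off of an in-memory store.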
4.1.3 System prompt design
system_prompt = """You are an intelligent fashion shopping assistant. You can:
1. Search for products by text description (use search_products)
2. Analyze image style and attributes (use analyze_image_style)
When a user asks about products:
- For text queries: use search_products directly
- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
- You can call multiple tools in sequence if needed
- Always provide helpful, friendly responses
CRITICAL FORMATTING RULES:
When presenting product results, you MUST use this EXACT format for EACH product:
1. [Product Name]
ID: [Product ID Number]
Category: [Category]
Color: [Color]
Gender: [Gender]
(Include Season, Usage, Relevance if available)
..."""
The system prompt constrains both tool usage and output format, so that the frontend can parse product information reliably.
4.1.4 Chat entry point and streaming
from typing import Optional
from langchain_core.messages import HumanMessage

# Method of ShoppingAgent.
def chat(self, query: str, image_path: Optional[str] = None) -> dict:
    # Build input message
    message_content = query
    if image_path:
        message_content = f"{query}\n[User uploaded image: {image_path}]"
    # thread_id selects which saved conversation the checkpointer resumes.
    config = {"configurable": {"thread_id": self.session_id}}
    input_state = {
        "messages": [HumanMessage(content=message_content)],
        "current_image_path": image_path,
    }
    tool_calls = []
    # Each streamed event is keyed by the node that just ran ("agent" or "tools").
    for event in self.graph.stream(input_state, config=config):
        if "agent" in event:
            for msg in event["agent"].get("messages", []):
                if hasattr(msg, "tool_calls") and msg.tool_calls:
                    for tc in msg.tool_calls:
                        tool_calls.append({"name": tc["name"], "args": tc.get("args", {})})
        if "tools" in event:
            # Record the tool execution results
            ...
    final_state = self.graph.get_state(config)
    response_text = final_state.values["messages"][-1].content
    return {"response": response_text, "tool_calls": tool_calls, "error": False}
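The event handling in `chat` can be exercised offline with hand-written events. The sketch below assumes each event is a dict keyed by node name and uses plain dicts in place of message objects; `collect_tool_calls` and `sample_events` are hypothetical names for this illustration.

```python
def collect_tool_calls(events) -> list[dict]:
    """Gather {'name', 'args'} records from the 'agent' updates in a stream of events."""
    calls = []
    for event in events:
        for msg in event.get("agent", {}).get("messages", []):
            for tc in msg.get("tool_calls") or []:
                calls.append({"name": tc["name"], "args": tc.get("args", {})})
    return calls


# Hand-written events shaped like per-node updates: agent -> tools -> agent.
sample_events = [
    {"agent": {"messages": [{"content": "", "tool_calls": [
        {"name": "search_products", "args": {"query": "red formal dresses"}},
    ]}]}},
    {"tools": {"messages": [{"content": "Found 2 product(s): ..."}]}},
    {"agent": {"messages": [{"content": "Here are two options.", "tool_calls": []}]}},
]

tool_calls = collect_tool_calls(sample_events)
```

Only the first agent update contributes a record; the tools update and the final answer carry no tool calls.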
4.2 Search tools (search_tools.py)
4.2.1 Text search (Search API)
import requests
from langchain_core.tools import tool

# settings is the Settings instance from config.py (see 4.4).
@tool
def search_products(query: str, limit: int = 5) -> str:
    """Search for fashion products using natural language descriptions."""
    try:
        url = f"{settings.search_api_base_url.rstrip('/')}/search/"
        headers = {
            "Content-Type": "application/json",
            "X-Tenant-ID": settings.search_api_tenant_id,
        }
        payload = {
            "query": query,
            "size": min(limit, 20),  # cap the page size at 20
            "from": 0,
            "language": "zh",
        }
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        data = response.json()
        results = data.get("results", [])
        if not results:
            return "No products found matching your search."
        # Format the hits as the numbered list the agent passes on to the user.
        output = f"Found {len(results)} product(s):\n\n"
        for idx, product in enumerate(results, 1):
            output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
            output += f"   ID: {product.get('spu_id', 'N/A')}\n"
            output += f"   Category: {product.get('category_path', 'N/A')}\n"
            output += f"   Price: {product.get('price')}\n"
            output += "\n"
        return output.strip()
    except Exception as e:
        # Return the error as text so the agent can report it instead of crashing.
        return f"Error searching products: {str(e)}"
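Separating payload construction and result formatting from the HTTP call makes both easy to check without a running Search API. The following is a sketch under that assumption: `build_search_payload` and `format_results` are hypothetical helpers, the sample product is made up, and clamping the size to at least 1 is an addition that the original tool does not perform.

```python
def build_search_payload(query: str, limit: int = 5, language: str = "zh") -> dict:
    """Build the request body, clamping the page size to the range 1..20."""
    size = max(1, min(limit, 20))
    return {"query": query.strip(), "size": size, "from": 0, "language": language}


def format_results(results: list[dict]) -> str:
    """Render search hits as a numbered plain-text list."""
    if not results:
        return "No products found matching your search."
    lines = [f"Found {len(results)} product(s):", ""]
    for idx, product in enumerate(results, 1):
        lines.append(f"{idx}. {product.get('title', 'Unknown Product')}")
        lines.append(f"   ID: {product.get('spu_id', 'N/A')}")
        lines.append(f"   Category: {product.get('category_path', 'N/A')}")
        if product.get("price") is not None:  # skip the line rather than print "None"
            lines.append(f"   Price: {product['price']}")
        lines.append("")
    return "\n".join(lines).strip()


payload = build_search_payload("  winter coats women ", limit=50)
# Made-up sample hit, used only to show the output shape.
text = format_results([{"title": "Wool Coat", "spu_id": 42, "category_path": "Outerwear"}])
```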
4.2.2 Visual analysis (VLM)
import base64
from langchain_core.tools import tool

# get_openai_client() is a project helper that returns an OpenAI client.
@tool
def analyze_image_style(image_path: str) -> str:
    """Analyze a fashion product image using AI vision to extract detailed style information."""
    # Read the image and base64-encode it so it can be sent inline as a data URL.
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")
    prompt = """Analyze this fashion product image and provide a detailed description.
Include:
- Product type (e.g., shirt, dress, shoes, pants, bag)
- Primary colors
- Style/design (e.g., casual, formal, sporty, vintage, modern)
- Pattern or texture (e.g., plain, striped, checked, floral)
- Key features (e.g., collar type, sleeve length, fit)
- Material appearance (if obvious, e.g., denim, cotton, leather)
- Suitable occasion (e.g., office wear, party, casual, sports)
Provide a comprehensive yet concise description (3-4 sentences)."""
    client = get_openai_client()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}", "detail": "high"}},
            ],
        }],
        max_tokens=500,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()
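The tool above always labels the upload as `image/jpeg`. One possible refinement, sketched here as an assumption rather than as the project's code, is to derive the MIME type from the file name and to fail early on missing or non-image files. `image_to_data_url` is a hypothetical helper.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL whose MIME type matches its extension."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {image_path}")
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage with a throwaway file; the bytes are a placeholder, not a real PNG image.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "jacket.png"
    sample.write_bytes(b"\x89PNG placeholder")
    url = image_to_data_url(str(sample))
```

The resulting string can be passed as the `url` field of an `image_url` content part in place of the hard-coded JPEG prefix.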
4.3 Streamlit frontend (app.py)
4.3.1 Session and agent initialization
import uuid
import streamlit as st

def initialize_session():
    # st.session_state survives Streamlit reruns, so each key is created only once.
    if "session_id" not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
    if "shopping_agent" not in st.session_state:
        st.session_state.shopping_agent = ShoppingAgent(session_id=st.session_state.session_id)
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "uploaded_image" not in st.session_state:
        st.session_state.uploaded_image = None
4.3.2 Product information parsing
import re

def extract_products_from_response(response: str) -> list:
    """Parse product information from the agent's reply"""
    products = []
    current_product = {}  # product currently being assembled
    for line in response.split("\n"):
        # A numbered line such as "1. Name" or "**1. Name**" starts a new product.
        if re.match(r"^\*?\*?\d+\.\s+", line):
            if current_product:
                products.append(current_product)
            current_product = {"name": re.sub(r"^\*?\*?\d+\.\s+", "", line).replace("**", "").strip()}
        elif "ID:" in line:
            id_match = re.search(r"(?:ID|id):\s*(\d+)", line)
            if id_match:
                current_product["id"] = id_match.group(1)
        elif "Category:" in line:
            cat_match = re.search(r"Category:\s*(.+?)(?:\n|$)", line)
            if cat_match:
                current_product["category"] = cat_match.group(1).strip()
        # ... Color, Gender, Season, Usage, Similarity/Relevance
    # Append the last product, which no following numbered line would flush.
    if current_product:
        products.append(current_product)
    return products
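A worked example helps show what the parser is expected to produce for a reply that follows the system prompt's format. The sketch below is a condensed, table-driven variant written for illustration, not the project's function: `parse_products`, `ITEM_RE` and `FIELD_RE` are hypothetical names, only four fields are handled, and the sample reply is made up.

```python
import re

ITEM_RE = re.compile(r"^\*{0,2}(\d+)\.\s+(.*)$")              # "1. Name" or "**1. Name**"
FIELD_RE = re.compile(r"^(ID|Category|Color|Gender):\s*(.+)$")


def parse_products(reply: str) -> list[dict]:
    """Collect one dict per numbered item, with lower-cased field names as keys."""
    products: list[dict] = []
    current = None
    for raw in reply.splitlines():
        line = raw.strip()
        item = ITEM_RE.match(line)
        if item:
            current = {"name": item.group(2).replace("**", "").strip()}
            products.append(current)  # appended immediately, so the last item is kept
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1).lower()] = field.group(2).strip()
    return products


reply = """Here are some options:

1. **Red Wrap Dress**
   ID: 1001
   Category: Apparel > Dresses
   Color: Red

2. Black Blazer
   ID: 1002
   Gender: Women"""

parsed = parse_products(reply)
```

Appending each product as soon as its numbered line is seen avoids the need for a separate flush after the loop.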
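4.3.3 Image references in multi-turn conversations
# When a follow-up query refers back to an earlier image (for example
# "find pants that match this"), reuse the most recent user-uploaded image.
# Note: `ref in query_lower` is a substring test, so "it" also matches words such as "white".
if any(ref in query_lower for ref in ["this", "that", "the image", "it"]):
    for msg in reversed(st.session_state.messages):
        if msg.get("role") == "user" and msg.get("image_path"):
            image_path = msg["image_path"]
            break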
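Because the check above is a plain substring test, a query such as "show me white shirts" would also trigger it. One way to tighten it, sketched here as an assumption and not as the project's behaviour, is to match whole words with a regular expression. `REFERENCE_RE` and `resolve_image_path` are hypothetical names, and adding "them" to the word list is a choice made for this sketch.

```python
import re

# Whole-word match for referring expressions; "them" is an addition of this sketch.
REFERENCE_RE = re.compile(r"\b(this|that|it|them|the image)\b", re.IGNORECASE)


def resolve_image_path(query: str, history: list[dict]):
    """Return the most recent user-uploaded image path if the query refers back to it."""
    if not REFERENCE_RE.search(query):
        return None
    for msg in reversed(history):
        if msg.get("role") == "user" and msg.get("image_path"):
            return msg["image_path"]
    return None


history = [
    {"role": "user", "content": "what style is this?", "image_path": "/tmp/jacket.jpg"},
    {"role": "assistant", "content": "A vintage denim jacket."},
]

matched = resolve_image_path("find pants that match it", history)
unmatched = resolve_image_path("show me white shirts", history)
```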
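4.4 Configuration management (config.py)
from pydantic_settings import BaseSettings  # pydantic v2; in pydantic v1 this lives in pydantic

class Settings(BaseSettings):
    openai_api_key: str  # required, no default
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://120.76.41.98:6002"
    search_api_tenant_id: str = "162"

    class Config:
        env_file = ".env"  # values in .env override the defaults above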
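The same idea, a required key plus overridable defaults read from the environment, can be sketched with the standard library alone. `SketchSettings` and `load_settings` are hypothetical names, the base URL is a placeholder, and the upper-cased variable names are an assumption based on the `OPENAI_API_KEY` and `SEARCH_API_*` names mentioned in the startup steps.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSettings:
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://localhost:6002"  # placeholder, not the real host
    search_api_tenant_id: str = "162"


def load_settings(env=None) -> SketchSettings:
    """Build settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    if "OPENAI_API_KEY" not in env:
        raise RuntimeError("OPENAI_API_KEY is required")
    optional = ("openai_model", "search_api_base_url", "search_api_tenant_id")
    # Assumed convention: each field is overridden by its upper-cased name.
    overrides = {name: env[name.upper()] for name in optional if name.upper() in env}
    return SketchSettings(openai_api_key=env["OPENAI_API_KEY"], **overrides)


settings = load_settings({"OPENAI_API_KEY": "sk-test", "SEARCH_API_TENANT_ID": "7"})
```

Passing the mapping explicitly keeps the loader easy to test; in normal use it falls back to `os.environ`.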
5. Deployment and Running
5.1 Service dependencies
- Search API: external search service (HTTP)
- OpenAI API: LLM and VLM image analysis
5.2 Startup procedure
# 1. Set up the environment
pip install -r requirements.txt
cp .env.example .env  # set OPENAI_API_KEY, SEARCH_API_*, etc.
# 2. (Optional) Download the data
python scripts/download_dataset.py  # Kaggle Fashion Product Images Dataset
# 3. Start the application
streamlit run app.py
# or: ./scripts/start.sh
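6. Typical Interaction Flows
| Scenario | User input | Agent behavior | Tool calls |
|---|---|---|---|
| Text search | "winter coats for women" | Searches by text directly | search_products("winter coats women") |
| Style analysis + search | [uploads a vintage jacket] "what style? find matching pants" | Analyzes the style first, then searches | analyze_image_style(path) → search_products("vintage pants casual") |
| Multi-turn context | Turn 1: "show me red dresses"; Turn 2: "make them formal" | Combines the new request with the earlier context | search_products("red formal dresses") |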
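7. Design Summary
- ReAct pattern: the agent decides on its own when to call tools, which tools to call, and whether to keep calling them.
- LangGraph state graph: START → Agent → [condition] → Tools → Agent → END, which supports multiple rounds of tool calls.
- Search and style analysis: text search through the Search API plus VLM-based image style analysis.
- Session persistence: MemorySaver + thread_id provide multi-turn conversation memory.
- Format constraints: the system prompt strictly fixes the product output format, which makes frontend parsing and display straightforward.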
8. Appendix: Project Structure
OmniShopAgent/
├── app/
│   ├── agents/
│   │   └── shopping_agent.py
│   ├── config.py
│   ├── services/
│   └── tools/
│       └── search_tools.py
├── scripts/
│   ├── download_dataset.py
│   └── index_data.py
├── app.py
├── docker-compose.yml
└── requirements.txt