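# OmniShopAgent Technical Implementation Report

## 1. Project Overview

OmniShopAgent is an autonomous, multimodal fashion-shopping agent built on LangGraph and the ReAct pattern. The system decides for itself which tools to call, maintains conversation state, and determines when to reply. Together, these capabilities support conversational product discovery and recommendation.

### Core Features

- Autonomous tool selection and execution: the agent chooses and calls tools based on the user's intent
- Text search: product search through the Search API
- Context-aware conversation: context is retained across the turns of a multi-turn dialogue
- Real-time visual analysis: VLM-based style analysis of uploaded images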


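## 2. Tech Stack

| Component | Technology |
|---|---|
| Runtime | Python 3.12 |
| Agent framework | LangGraph 1.x |
| LLM framework | LangChain 1.x (model-agnostic; defaults to gpt-4o-mini) |
| Search service | Search API (HTTP) |
| Frontend | Streamlit |
| Dataset | Kaggle Fashion Products |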


## 3. System Architecture
### 3.1 Overall Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                   Streamlit frontend (app.py)                   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                ShoppingAgent (shopping_agent.py)                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  LangGraph StateGraph + ReAct pattern                     │  │
│  │  START → Agent → [Has tool_calls?] → Tools → Agent → END  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
       │                      │
       ▼                      ▼
┌──────────────┐   ┌─────────────────────┐
│ search_      │   │ analyze_image_style │
│ products     │   │ (OpenAI Vision)     │
└──────┬───────┘   └──────────┬──────────┘
       │                      │
       ▼                      │
┌──────────────────┐          │
│   Search API     │          │
│   (HTTP POST)    │          │
└──────────────────┘          │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                 OpenAI API (VLM style analysis)                 │
└─────────────────────────────────────────────────────────────────┘
```

### 3.2 Agent Flow (LangGraph)

```mermaid
graph LR
    START --> Agent
    Agent -->|Has tool_calls| Tools
    Agent -->|No tool_calls| END
    Tools --> Agent
```
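The same control flow can be written as a plain loop, which makes the routing rule explicit. The sketch below is a dependency-free illustration, not LangGraph code: `run_react_loop`, the dict-based message shape, and the `max_steps` guard are assumptions made for the example (LangGraph bounds runaway loops with its own `recursion_limit` setting).

```python
from typing import Callable

# Hypothetical message shape: {"role": ..., "content": ..., optional "tool_calls": [...]}
Message = dict


def run_react_loop(
    llm: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    messages: list[Message],
    max_steps: int = 10,
) -> list[Message]:
    """Toy version of START -> agent -> [tool_calls?] -> tools -> agent -> END."""
    for _ in range(max_steps):
        reply = llm(messages)  # "agent" node: the LLM sees the full history
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:  # should_continue: no tool calls, so route to END
            return messages
        for call in calls:  # "tools" node: run every requested tool, then loop back
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("max_steps reached without a final answer")
```

In the compiled graph, `agent_node` plays the role of `llm(...)`, `ToolNode` corresponds to the inner `for` loop, and `should_continue` corresponds to the `if not calls` check.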


## 4. Key Code Implementation
### 4.1 Agent Core (shopping_agent.py)
#### 4.1.1 State Definition
```python
from typing import Optional, Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

class AgentState(TypedDict):
    """State for the shopping agent with message accumulation"""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    current_image_path: Optional[str]  # Track uploaded image
```


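- `messages` is annotated with the `add_messages` reducer, so the messages each node returns are appended to the existing history instead of replacing it; together with the checkpointer, this is what carries context across turns.
- `current_image_path` records the path of the currently uploaded image in the graph state. The LLM itself receives the path through the user message text (see `chat()` in 4.1.4) and passes it to the tools as an argument.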
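The reducer semantics can be illustrated without LangGraph. The function below is a simplified stand-in, not LangGraph's code: the real `add_messages` also normalizes message-like inputs and assigns IDs to messages that lack one, whereas this sketch assumes every message is a dict that already has an `id`. A new ID is appended; an existing ID is updated in place.

```python
def merge_messages(left: list[dict], right: list[dict]) -> list[dict]:
    """Simplified add_messages-style reducer (hypothetical helper)."""
    merged = list(left)  # copy so the previous state is not mutated
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in right:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg  # same id: replace the old message
        else:
            index_by_id[msg["id"]] = len(merged)
            merged.append(msg)  # new id: append to the history
    return merged
```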

#### 4.1.2 Building the LangGraph Graph
```python
from langchain_core.messages import SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

def _build_graph(self):
    """Build the LangGraph StateGraph"""

    def agent_node(state: AgentState):
        """Agent decision node - decides which tools to call or when to respond"""
        messages = state["messages"]
        # Prepend the system prompt for this LLM call only (it is not written back to state)
        if not any(isinstance(m, SystemMessage) for m in messages):
            messages = [SystemMessage(content=system_prompt)] + list(messages)
        response = self.llm_with_tools.invoke(messages)
        return {"messages": [response]}

    # Prebuilt node that executes the tool calls in the last AI message
    tool_node = ToolNode(self.tools)

    def should_continue(state: AgentState):
        """Determine if agent should continue or end"""
        last_message = state["messages"][-1]
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
        return END

    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges("agent", should_continue, ["tools", END])
    workflow.add_edge("tools", "agent")  # feed tool results back to the LLM

    # Checkpointer that saves graph state per thread_id
    checkpointer = MemorySaver()
    return workflow.compile(checkpointer=checkpointer)
```


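Key points:

- `agent_node`: prepends the system prompt when needed and passes the messages to the LLM, which decides whether to call tools.
- `should_continue`: routes to the `tools` node if the last message contains `tool_calls`; otherwise the run ends.
- `MemorySaver`: checkpoints the conversation state per `thread_id`. Checkpoints are kept in process memory, so the history is lost when the app restarts.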
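The `thread_id` is what keeps conversations apart: each Streamlit session creates its own `ShoppingAgent(session_id=...)`, and `chat()` passes that ID as `thread_id`. A toy illustration of per-thread isolation follows; the `ThreadMemory` class is hypothetical and is not part of LangGraph's API.

```python
from collections import defaultdict


class ThreadMemory:
    """Toy per-thread store illustrating what a checkpointer keyed by thread_id provides."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list[str]:
        # .get() does not trigger the defaultdict factory, so unknown IDs stay absent
        return list(self._threads.get(thread_id, []))
```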

#### 4.1.3 System Prompt Design
```python
system_prompt = """You are an intelligent fashion shopping assistant. You can:
1. Search for products by text description (use search_products)
2. Analyze image style and attributes (use analyze_image_style)

When a user asks about products:
- For text queries: use search_products directly
- For image uploads: use analyze_image_style first to understand the product, then use search_products with the extracted description
- You can call multiple tools in sequence if needed
- Always provide helpful, friendly responses

CRITICAL FORMATTING RULES:
When presenting product results, you MUST use this EXACT format for EACH product:
1. [Product Name]
   ID: [Product ID Number]
   Category: [Category]
   Color: [Color]
   Gender: [Gender]
   (Include Season, Usage, Relevance if available)
..."""
```


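The system prompt constrains both tool usage and the output format so that the frontend can parse product information from the reply. The format is requested rather than enforced, so the parser should tolerate small deviations such as bold markers around the numbered line.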
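A deterministic renderer for the mandated layout can be useful, for example to generate fixture text when testing the frontend parser (4.3.2). This is a hypothetical helper: `format_product` and its lowercase dict keys are assumptions, not the project's code.

```python
def format_product(idx: int, product: dict) -> str:
    """Render one product in the layout the system prompt requires."""
    lines = [
        f"{idx}. {product['name']}",
        f"   ID: {product['id']}",
        f"   Category: {product['category']}",
        f"   Color: {product['color']}",
        f"   Gender: {product['gender']}",
    ]
    # Optional fields are emitted only when present, as the prompt allows
    for key in ("season", "usage", "relevance"):
        if product.get(key) is not None:
            lines.append(f"   {key.capitalize()}: {product[key]}")
    return "\n".join(lines)
```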
#### 4.1.4 Chat Entry Point and Streaming
```python
from typing import Optional

from langchain_core.messages import HumanMessage

def chat(self, query: str, image_path: Optional[str] = None) -> dict:
    # Build the input message; the image path is embedded in the text so the LLM
    # can pass it to analyze_image_style
    message_content = query
    if image_path:
        message_content = f"{query}\n[User uploaded image: {image_path}]"

    # thread_id selects this session's checkpoint in MemorySaver
    config = {"configurable": {"thread_id": self.session_id}}
    input_state = {
        "messages": [HumanMessage(content=message_content)],
        "current_image_path": image_path,
    }

    tool_calls = []
    # Each event is keyed by the node that just ran ("agent" or "tools")
    for event in self.graph.stream(input_state, config=config):
        if "agent" in event:
            for msg in event["agent"].get("messages", []):
                if hasattr(msg, "tool_calls") and msg.tool_calls:
                    for tc in msg.tool_calls:
                        tool_calls.append({"name": tc["name"], "args": tc.get("args", {})})
        if "tools" in event:
            # Record tool execution results
            ...

    # Read the final answer from the checkpointed state
    final_state = self.graph.get_state(config)
    response_text = final_state.values["messages"][-1].content

    return {"response": response_text, "tool_calls": tool_calls, "error": False}
```
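The event handling in `chat()` is easier to unit-test when it is pulled into a pure function. A sketch, assuming events have the `{node_name: update}` shape that the loop in `chat()` already relies on; `collect_tool_calls` is a hypothetical name, and `SimpleNamespace` objects stand in for LangChain message objects.

```python
from types import SimpleNamespace


def collect_tool_calls(events) -> list[dict]:
    """Collect {name, args} for each tool call requested by the agent node during one run."""
    calls = []
    for event in events:  # each event maps the node that just ran to its state update
        for msg in event.get("agent", {}).get("messages", []):
            for tc in getattr(msg, "tool_calls", None) or []:
                calls.append({"name": tc["name"], "args": tc.get("args", {})})
    return calls


# Example events; SimpleNamespace objects stand in for LangChain messages
example_events = [
    {"agent": {"messages": [SimpleNamespace(tool_calls=[{"name": "search_products", "args": {"query": "red dresses"}}])]}},
    {"tools": {"messages": [SimpleNamespace(content="Found 5 product(s): ...")]}},
    {"agent": {"messages": [SimpleNamespace(tool_calls=[], content="Here are five red dresses.")]}},
]
```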


### 4.2 Search Tools (search_tools.py)
#### 4.2.1 Text Search (Search API)
```python
import requests
from langchain_core.tools import tool

from app.config import settings  # assumed: module-level Settings instance from config.py (4.4)

@tool
def search_products(query: str, limit: int = 5) -> str:
    """Search for fashion products using natural language descriptions."""
    try:
        url = f"{settings.search_api_base_url.rstrip('/')}/search/"
        headers = {
            "Content-Type": "application/json",
            "X-Tenant-ID": settings.search_api_tenant_id,
        }
        payload = {
            "query": query,
            "size": min(limit, 20),  # cap the page size at 20
            "from": 0,
            "language": "zh",
        }

        response = requests.post(url, json=payload, headers=headers, timeout=60)
        response.raise_for_status()  # fail on HTTP 4xx/5xx instead of parsing an error body
        data = response.json()
        results = data.get("results", [])

        if not results:
            return "No products found matching your search."

        # Numbered text block that the LLM reads and reformats for the user
        output = f"Found {len(results)} product(s):\n\n"
        for idx, product in enumerate(results, 1):
            output += f"{idx}. {product.get('title', 'Unknown Product')}\n"
            output += f"   ID: {product.get('spu_id', 'N/A')}\n"
            output += f"   Category: {product.get('category_path', 'N/A')}\n"
            output += f"   Price: {product.get('price')}\n"
            output += "\n"

        return output.strip()
    except Exception as e:
        # Return the error as text so the agent can report it instead of crashing the run
        return f"Error searching products: {str(e)}"
```
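Separating request construction and result formatting from the network call makes both testable without a live Search API. The sketch mirrors the tool's logic; `build_search_request` and `format_results` are hypothetical helper names.

```python
def build_search_request(base_url: str, tenant_id: str, query: str, limit: int = 5) -> tuple[str, dict, dict]:
    """Return (url, headers, payload) as search_products sends them."""
    url = f"{base_url.rstrip('/')}/search/"
    headers = {"Content-Type": "application/json", "X-Tenant-ID": tenant_id}
    payload = {"query": query, "size": min(limit, 20), "from": 0, "language": "zh"}
    return url, headers, payload


def format_results(results: list[dict]) -> str:
    """Produce the same numbered text block as search_products."""
    if not results:
        return "No products found matching your search."
    lines = [f"Found {len(results)} product(s):", ""]
    for idx, product in enumerate(results, 1):
        lines += [
            f"{idx}. {product.get('title', 'Unknown Product')}",
            f"   ID: {product.get('spu_id', 'N/A')}",
            f"   Category: {product.get('category_path', 'N/A')}",
            f"   Price: {product.get('price')}",
            "",  # blank line between products
        ]
    return "\n".join(lines).strip()
```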

#### 4.2.2 Visual Analysis (VLM)
```python
import base64

from langchain_core.tools import tool

@tool
def analyze_image_style(image_path: str) -> str:
    """Analyze a fashion product image using AI vision to extract detailed style information."""
    # Read the local file and base64-encode it for an inline data URL
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode("utf-8")

    prompt = """Analyze this fashion product image and provide a detailed description.
Include:
- Product type (e.g., shirt, dress, shoes, pants, bag)
- Primary colors
- Style/design (e.g., casual, formal, sporty, vintage, modern)
- Pattern or texture (e.g., plain, striped, checked, floral)
- Key features (e.g., collar type, sleeve length, fit)
- Material appearance (if obvious, e.g., denim, cotton, leather)
- Suitable occasion (e.g., office wear, party, casual, sports)
Provide a comprehensive yet concise description (3-4 sentences)."""

    client = get_openai_client()  # project helper returning an OpenAI client (defined elsewhere)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # The MIME type is hard-coded to JPEG
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}", "detail": "high"}},
            ],
        }],
        max_tokens=500,
        temperature=0.3,
    )

    return response.choices[0].message.content.strip()
```
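Because the tool labels every upload as JPEG, PNG or WebP files are sent with a mismatched MIME type. A small helper could derive the type from the file name instead. `to_data_url` is hypothetical, and falling back to `image/jpeg` for unknown extensions is an assumption that preserves the current behavior.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Build a data URL for image bytes, choosing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # fall back to the type the tool currently hard-codes
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Inside the tool, this would replace the inline f-string: `with open(image_path, "rb") as f: url = to_data_url(f.read(), image_path)`.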


### 4.3 Streamlit Frontend (app.py)
#### 4.3.1 Session and Agent Initialization
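```python
import uuid

import streamlit as st

from app.agents.shopping_agent import ShoppingAgent  # import path assumed from the project layout

def initialize_session():
    # st.session_state survives Streamlit reruns, so each object is created once per browser session
    if "session_id" not in st.session_state:
        st.session_state.session_id = str(uuid.uuid4())
    if "shopping_agent" not in st.session_state:
        st.session_state.shopping_agent = ShoppingAgent(session_id=st.session_state.session_id)
    if "messages" not in st.session_state:
        st.session_state.messages = []
    if "uploaded_image" not in st.session_state:
        st.session_state.uploaded_image = None
```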
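The repeated `if key not in st.session_state` guards could be factored into one helper. A sketch, with `init_once` as a hypothetical name. `st.session_state` supports dict-style `in` checks and item assignment, so a plain dict can stand in for it in tests. A lazy factory is used rather than `dict.setdefault` because `setdefault` evaluates its default argument eagerly and would construct a new `ShoppingAgent` on every rerun.

```python
from typing import Callable, MutableMapping


def init_once(state: MutableMapping, key: str, factory: Callable[[], object]) -> object:
    """Store factory() under key only if key is missing; later reruns reuse the stored value."""
    if key not in state:
        state[key] = factory()
    return state[key]


# Usage inside app.py (order matters: the agent needs session_id):
#   init_once(st.session_state, "session_id", lambda: str(uuid.uuid4()))
#   init_once(st.session_state, "shopping_agent",
#             lambda: ShoppingAgent(session_id=st.session_state.session_id))
```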

#### 4.3.2 Parsing Product Information
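```python
import re

def extract_products_from_response(response: str) -> list:
    """Parse product entries out of the agent's reply."""
    products = []
    current_product = None  # entry being filled in; None until the first numbered line
    for line in response.split("\n"):
        if re.match(r"^\*?\*?\d+\.\s+", line):
            # A numbered line ("1. Name" or "**1. Name**") starts a new product
            if current_product:
                products.append(current_product)
            current_product = {"name": re.sub(r"^\*?\*?\d+\.\s+", "", line).replace("**", "").strip()}
        elif current_product is None:
            continue  # ignore text before the first product
        elif "ID:" in line:
            id_match = re.search(r"(?:ID|id):\s*(\d+)", line)
            if id_match:
                current_product["id"] = id_match.group(1)
        elif "Category:" in line:
            cat_match = re.search(r"Category:\s*(.+?)(?:\n|$)", line)
            if cat_match:
                current_product["category"] = cat_match.group(1).strip()
        # ... Color, Gender, Season, Usage, Similarity/Relevance
    if current_product:
        products.append(current_product)  # flush the last product
    return products
```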
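An alternative design appends each product as soon as its numbered line is seen and writes fields into `products[-1]`, which removes the need for a `current_product` variable and a final flush. A sketch that assumes the layout mandated by the system prompt in 4.1.3; `parse_products` and the exact field list are assumptions.

```python
import re

NUMBERED = re.compile(r"^\*{0,2}\d+\.\s+")  # "1. Name" or "**1. Name**"
FIELD = re.compile(r"^\s*(ID|Category|Color|Gender|Season|Usage|Relevance|Similarity):\s*(.+)$")


def parse_products(response: str) -> list[dict]:
    """Parse numbered product blocks; 'Key: value' lines become lowercase keys."""
    products: list[dict] = []
    for line in response.splitlines():
        if NUMBERED.match(line):
            name = NUMBERED.sub("", line).replace("**", "").strip()
            products.append({"name": name})  # start a new product immediately
        elif products and (m := FIELD.match(line)):
            products[-1][m.group(1).lower()] = m.group(2).strip()
    return products
```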

#### 4.3.3 Image References in Multi-turn Dialogue
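```python
import re

# If the query refers back to an earlier item ("this", "that", "it", "the image"),
# reuse the most recent image the user uploaded in this session.
# `query` is the user's text input; word boundaries keep "it" from matching inside "with" or "outfit".
query_lower = query.lower()
if re.search(r"\b(this|that|it|the image)\b", query_lower):
    for msg in reversed(st.session_state.messages):
        if msg.get("role") == "user" and msg.get("image_path"):
            image_path = msg["image_path"]
            break
```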


### 4.4 Configuration (config.py)
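```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Each field can be overridden by the matching environment variable
    # (e.g. OPENAI_API_KEY, SEARCH_API_BASE_URL) or by an entry in .env
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://120.76.41.98:6002"
    search_api_tenant_id: str = "162"

    model_config = SettingsConfigDict(env_file=".env")

settings = Settings()  # shared instance imported by the tools as `settings`
```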
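In pydantic-settings, environment variables take priority over entries in the `.env` file, which in turn override the class defaults. This is why a stale shell variable can mask an edited `.env`. A dependency-free sketch of that precedence follows; `SettingsSketch` and `load_settings` are hypothetical illustrations, not pydantic-settings code.

```python
from dataclasses import dataclass, fields


@dataclass
class SettingsSketch:
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    search_api_base_url: str = "http://120.76.41.98:6002"
    search_api_tenant_id: str = "162"


def load_settings(environ: dict[str, str], dotenv: dict[str, str]) -> SettingsSketch:
    """Resolve each field as: environment variable, then .env entry, then class default."""
    values = {}
    for f in fields(SettingsSketch):
        key = f.name.upper()  # this sketch only checks the upper-case variable name
        if key in environ:
            values[f.name] = environ[key]
        elif key in dotenv:
            values[f.name] = dotenv[key]
    return SettingsSketch(**values)  # a missing required field raises TypeError
```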


## 5. Deployment and Running
### 5.1 External Services

- Search API: external product search service (HTTP)
- OpenAI API: LLM calls and VLM image analysis

### 5.2 Startup

```bash
# 1. Environment setup
pip install -r requirements.txt
cp .env.example .env  # then set OPENAI_API_KEY, SEARCH_API_*, etc.

# 2. (Optional) download the dataset
python scripts/download_dataset.py  # Kaggle Fashion Product Images Dataset

# 3. Start the app
streamlit run app.py
# or ./scripts/start.sh
```


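## 6. Typical Interaction Flows

| Scenario | User input | Agent behavior | Tool calls |
|---|---|---|---|
| Text search | "winter coats for women" | Searches directly by text | `search_products("winter coats women")` |
| Style analysis + search | [uploads a vintage jacket] "what style? find matching pants" | Analyzes the image style first, then searches | `analyze_image_style(path)` → `search_products("vintage pants casual")` |
| Multi-turn context | Turn 1: "show me red dresses"<br>Turn 2: "make them formal" | Combines the new request with the previous turn | `search_products("red formal dresses")` |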

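## 7. Design Summary

- ReAct pattern: the agent decides on its own when to call tools, which tools to call, and whether to keep calling them.
- LangGraph state graph: START → Agent → [condition] → Tools → Agent → END, which allows several rounds of tool calls within a single user turn.
- Search and style analysis: text search through the Search API, plus VLM-based image style analysis.
- Session memory: MemorySaver + thread_id provide multi-turn conversation memory.
- Format constraints: the system prompt strictly prescribes the product output format so the frontend can parse and display it.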


## 8. Appendix: Project Structure

```text
OmniShopAgent/
├── app/
│   ├── agents/
│   │   └── shopping_agent.py
│   ├── config.py
│   ├── services/
│   └── tools/
│       └── search_tools.py
├── scripts/
│   ├── download_dataset.py
│   └── index_data.py
├── app.py
├── docker-compose.yml
└── requirements.txt
```