
OmniShopAgent

An autonomous multi-modal fashion shopping agent powered by LangGraph and the ReAct pattern.

Demo

πŸ“„ demo.pdf

Overview

OmniShopAgent autonomously decides which tools to call, maintains conversation state, and determines when to respond. Built with LangGraph, it uses agentic patterns for intelligent product discovery.

Key Features:

  • Autonomous tool selection and execution
  • Multi-modal search (text + image)
  • Conversational context awareness
  • Real-time visual analysis

Tech Stack

Component         Technology
Agent Framework   LangGraph
LLM               Any LLM supported by LangChain
Text Embedding    text-embedding-3-small
Image Embedding   CLIP ViT-B/32
Vector Database   Milvus
Frontend          Streamlit
Dataset           Kaggle Fashion Products
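Both text and image search boil down to nearest-neighbor lookup over embedding vectors, which Milvus handles at scale. A toy sketch of the underlying idea, ranking stored vectors by cosine similarity (the 3-d vectors here are invented for the demo):

```python
# Rank catalog vectors by cosine similarity to a query vector.
# Real embeddings are 512/1536-dimensional; these 3-d vectors are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

catalog = {
    "red dress":   [0.9, 0.1, 0.0],
    "winter coat": [0.1, 0.8, 0.3],
    "sneakers":    [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    ranked = sorted(catalog, key=lambda n: cosine(query_vec, catalog[n]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.85, 0.15, 0.05]))  # "red dress" ranks first
```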

Architecture

Agent Flow:

graph LR
    START --> Agent
    Agent -->|Has tool_calls| Tools
    Agent -->|No tool_calls| END
    Tools --> Agent

    subgraph "Agent Node"
        A[Receive Messages] --> B[LLM Reasoning]
        B --> C{Need Tools?}
        C -->|Yes| D[Generate tool_calls]
        C -->|No| E[Generate Response]
    end

    subgraph "Tool Node"
        F[Execute Tools] --> G[Return ToolMessage]
    end
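The conditional edges in the graph above ("Has tool_calls" vs. "No tool_calls") correspond to a routing function that LangGraph calls after the agent node to pick the next node. A minimal sketch, using plain dicts in place of the real message objects (the actual project may name the nodes differently):

```python
# Routing function for the conditional edge: inspect the last message and
# send the graph to the tool node or to END. Plain dicts stand in for
# LangChain message objects (an assumption for illustration).
END = "__end__"

def route_after_agent(state):
    last = state["messages"][-1]
    # Has tool_calls -> run tools; otherwise the agent's answer is final.
    return "tools" if last.get("tool_calls") else END

print(route_after_agent({"messages": [{"tool_calls": [{"name": "search_products"}]}]}))
print(route_after_agent({"messages": [{"content": "Here you go!"}]}))
```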

Available Tools:

  • search_products(query) - Text-based semantic search
  • search_by_image(image_path) - Visual similarity search
  • analyze_image_style(image_path) - VLM style analysis

Examples

Text Search:

User: "winter coats for women"
Agent: search_products("winter coats women") β†’ Returns 5 products

Image Upload:

User: [uploads sneaker photo] "find similar"
Agent: search_by_image(path) β†’ Returns visually similar shoes

Style Analysis + Search:

User: [uploads vintage jacket] "what style is this? find matching pants"
Agent: analyze_image_style(path) β†’ "Vintage denim bomber..."
       search_products("vintage pants casual") β†’ Returns matching items

Multi-turn Context:

Turn 1: "show me red dresses"
Agent: search_products("red dresses") β†’ Results

Turn 2: "make them formal"
Agent: [remembers context] β†’ search_products("red formal dresses") β†’ Results
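Turn 2 works because the agent keeps the whole conversation in its state, so the LLM sees "red dresses" from turn 1 when interpreting "make them formal". A toy sketch with a stubbed rewriter in place of the model:

```python
# Why multi-turn refinement works: the full message history is passed back
# each turn. stub_llm_rewrite is an assumption standing in for the real LLM.

def stub_llm_rewrite(messages):
    # Toy logic: recover the original topic, then apply the refinement.
    first = messages[0]["content"]           # "show me red dresses"
    topic = first.replace("show me ", "")    # "red dresses"
    if "formal" in messages[-1]["content"]:
        return topic.replace("red", "red formal")
    return topic

conversation = [
    {"role": "user", "content": "show me red dresses"},
    {"role": "assistant", "content": "search_products('red dresses')"},
    {"role": "user", "content": "make them formal"},
]
print(stub_llm_rewrite(conversation))  # -> "red formal dresses"
```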

Complex Reasoning:

User: [uploads office outfit] "I like the shirt but need something more casual"
Agent: analyze_image_style(path) β†’ Extracts shirt details
       search_products("casual shirt [color] [style]") β†’ Returns casual alternatives

Installation

Prerequisites:

  • Python 3.12+ (LangChain 1.x 要求 Python 3.10+)
  • OpenAI API Key
  • Docker & Docker Compose

1. Setup Environment

# Clone and install dependencies
git clone <repository-url>
cd OmniShopAgent
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

2. Download Dataset

Download the Fashion Product Images Dataset from Kaggle and extract to ./data/:

python scripts/download_dataset.py

Expected structure:

data/
β”œβ”€β”€ images/       # ~44k product images
β”œβ”€β”€ styles.csv    # Product metadata
└── images.csv    # Image filenames
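As a quick sanity check on the layout above, the metadata in styles.csv can be loaded with the standard library alone. The column names below follow the Kaggle dataset, but verify them against your download; the two-row sample stands in for the real file so the snippet runs anywhere:

```python
# Load product metadata keyed by id. An in-memory sample replaces
# data/styles.csv here; swap in open("data/styles.csv") for the real file.
import csv, io

sample = io.StringIO(
    "id,gender,articleType,baseColour,productDisplayName\n"
    "15970,Men,Shirts,Navy Blue,Turtle Check Men Navy Blue Shirt\n"
)

products = {row["id"]: row for row in csv.DictReader(sample)}
print(products["15970"]["articleType"])  # -> Shirts
```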

3. Start Services

# Start Milvus and its dependencies
docker-compose up

# Start the CLIP embedding server (in a separate terminal)
python -m clip_server

4. Index Data

python scripts/index_data.py

This generates text and image embeddings for all ~44k products and stores them in Milvus.
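Inserting ~44k vectors one at a time would be slow, so indexing scripts typically batch the writes. A sketch of that pattern, with `embed` and `insert_batch` as stubs for the real embedding model and Milvus client (both assumptions):

```python
# Batched indexing sketch: accumulate rows, flush every batch_size inserts,
# and flush the final partial batch. Stubs replace the model and Milvus.

def embed(text):
    return [float(len(text))]          # stub embedding

inserted = []
def insert_batch(rows):
    inserted.extend(rows)              # stub for a Milvus insert call

def index_all(items, batch_size=2):
    batch = []
    for item in items:
        batch.append({"id": item["id"], "vector": embed(item["name"])})
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []
    if batch:
        insert_batch(batch)            # don't drop the trailing partial batch

index_all([{"id": i, "name": f"product {i}"} for i in range(5)])
print(len(inserted))  # -> 5
```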

5. Launch Application

# δ½Ώη”¨ε―εŠ¨θ„šζœ¬οΌˆζŽ¨θοΌ‰
./scripts/start.sh

# Or run directly
streamlit run app.py

Opens at http://localhost:8501

CentOS 8 Deployment

See docs/DEPLOY_CENTOS8.md for details.