
OmniShopAgent

An autonomous multi-modal fashion shopping agent powered by LangGraph and the ReAct pattern.

Demo

πŸ“„ demo.pdf

Overview

OmniShopAgent autonomously decides which tools to call, maintains conversation state, and determines when to respond. Built with LangGraph, it uses agentic patterns for intelligent product discovery.

Key Features:

  • Autonomous tool selection and execution
  • Multi-modal search (text + image)
  • Conversational context awareness
  • Real-time visual analysis

Tech Stack

Component         Technology
Agent Framework   LangGraph
LLM               Any LLM supported by LangChain
Text Embedding    text-embedding-3-small
Image Embedding   CLIP ViT-B/32
Vector Database   Milvus
Frontend          Streamlit
Dataset           Kaggle Fashion Products
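Both text and image search boil down to nearest-neighbor lookup over embedding vectors, which Milvus handles at scale. A toy sketch of the underlying idea, ranking stored vectors by cosine similarity (the 3-d vectors here are invented for the demo):

```python
# Rank catalog vectors by cosine similarity to a query vector.
# Real embeddings are 512/1536-dimensional; these 3-d vectors are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

catalog = {
    "red dress":   [0.9, 0.1, 0.0],
    "winter coat": [0.1, 0.8, 0.3],
    "sneakers":    [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    ranked = sorted(catalog, key=lambda n: cosine(query_vec, catalog[n]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.85, 0.15, 0.05]))  # "red dress" ranks first
```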

Architecture

Agent Flow:

graph LR
    START --> Agent
    Agent -->|Has tool_calls| Tools
    Agent -->|No tool_calls| END
    Tools --> Agent

    subgraph "Agent Node"
        A[Receive Messages] --> B[LLM Reasoning]
        B --> C{Need Tools?}
        C -->|Yes| D[Generate tool_calls]
        C -->|No| E[Generate Response]
    end

    subgraph "Tool Node"
        F[Execute Tools] --> G[Return ToolMessage]
    end
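The conditional edges in the graph above ("Has tool_calls" vs. "No tool_calls") correspond to a routing function that LangGraph calls after the agent node to pick the next node. A minimal sketch, using plain dicts in place of the real message objects (the actual project may name the nodes differently):

```python
# Routing function for the conditional edge: inspect the last message and
# send the graph to the tool node or to END. Plain dicts stand in for
# LangChain message objects (an assumption for illustration).
END = "__end__"

def route_after_agent(state):
    last = state["messages"][-1]
    # Has tool_calls -> run tools; otherwise the agent's answer is final.
    return "tools" if last.get("tool_calls") else END

print(route_after_agent({"messages": [{"tool_calls": [{"name": "search_products"}]}]}))
print(route_after_agent({"messages": [{"content": "Here you go!"}]}))
```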

Available Tools:

  • search_products(query) - Text-based semantic search
  • search_by_image(image_path) - Visual similarity search
  • analyze_image_style(image_path) - VLM style analysis

Examples

Text Search:

User: "winter coats for women"
Agent: search_products("winter coats women") β†’ Returns 5 products

Image Upload:

User: [uploads sneaker photo] "find similar"
Agent: search_by_image(path) β†’ Returns visually similar shoes

Style Analysis + Search:

User: [uploads vintage jacket] "what style is this? find matching pants"
Agent: analyze_image_style(path) β†’ "Vintage denim bomber..."
       search_products("vintage pants casual") β†’ Returns matching items

Multi-turn Context:

Turn 1: "show me red dresses"
Agent: search_products("red dresses") β†’ Results

Turn 2: "make them formal"
Agent: [remembers context] β†’ search_products("red formal dresses") β†’ Results
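Turn 2 works because the agent keeps the whole conversation in its state, so the LLM sees "red dresses" from turn 1 when interpreting "make them formal". A toy sketch with a stubbed rewriter in place of the model:

```python
# Why multi-turn refinement works: the full message history is passed back
# each turn. stub_llm_rewrite is an assumption standing in for the real LLM.

def stub_llm_rewrite(messages):
    # Toy logic: recover the original topic, then apply the refinement.
    first = messages[0]["content"]           # "show me red dresses"
    topic = first.replace("show me ", "")    # "red dresses"
    if "formal" in messages[-1]["content"]:
        return topic.replace("red", "red formal")
    return topic

conversation = [
    {"role": "user", "content": "show me red dresses"},
    {"role": "assistant", "content": "search_products('red dresses')"},
    {"role": "user", "content": "make them formal"},
]
print(stub_llm_rewrite(conversation))  # -> "red formal dresses"
```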

Complex Reasoning:

User: [uploads office outfit] "I like the shirt but need something more casual"
Agent: analyze_image_style(path) β†’ Extracts shirt details
       search_products("casual shirt [color] [style]") β†’ Returns casual alternatives

Installation

Prerequisites:

  • Python 3.12+ (LangChain 1.x 要求 Python 3.10+)
  • OpenAI API Key
  • Docker & Docker Compose

1. Setup Environment

# Clone and install dependencies
git clone <repository-url>
cd OmniShopAgent
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

2. Download Dataset

Download the Fashion Product Images Dataset from Kaggle and extract to ./data/:

python scripts/download_dataset.py

Expected structure:

data/
β”œβ”€β”€ images/       # ~44k product images
β”œβ”€β”€ styles.csv    # Product metadata
└── images.csv    # Image filenames
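As a quick sanity check on the layout above, the metadata in styles.csv can be loaded with the standard library alone. The column names below follow the Kaggle dataset, but verify them against your download; the two-row sample stands in for the real file so the snippet runs anywhere:

```python
# Load product metadata keyed by id. An in-memory sample replaces
# data/styles.csv here; swap in open("data/styles.csv") for the real file.
import csv, io

sample = io.StringIO(
    "id,gender,articleType,baseColour,productDisplayName\n"
    "15970,Men,Shirts,Navy Blue,Turtle Check Men Navy Blue Shirt\n"
)

products = {row["id"]: row for row in csv.DictReader(sample)}
print(products["15970"]["articleType"])  # -> Shirts
```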

3. Start Services

# Start Milvus and its dependencies
docker-compose up

# Start the CLIP embedding server (in a separate terminal)
python -m clip_server

4. Index Data

python scripts/index_data.py

This generates text and image embeddings for all ~44k products and stores them in Milvus.
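Inserting ~44k vectors one at a time would be slow, so indexing scripts typically batch the writes. A sketch of that pattern, with `embed` and `insert_batch` as stubs for the real embedding model and Milvus client (both assumptions):

```python
# Batched indexing sketch: accumulate rows, flush every batch_size inserts,
# and flush the final partial batch. Stubs replace the model and Milvus.

def embed(text):
    return [float(len(text))]          # stub embedding

inserted = []
def insert_batch(rows):
    inserted.extend(rows)              # stub for a Milvus insert call

def index_all(items, batch_size=2):
    batch = []
    for item in items:
        batch.append({"id": item["id"], "vector": embed(item["name"])})
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []
    if batch:
        insert_batch(batch)            # don't drop the trailing partial batch

index_all([{"id": i, "name": f"product {i}"} for i in range(5)])
print(len(inserted))  # -> 5
```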

5. Launch Application

# δ½Ώη”¨ε―εŠ¨θ„šζœ¬οΌˆζŽ¨θοΌ‰
./scripts/start.sh

# Or run directly
streamlit run app.py

Opens at http://localhost:8501

CentOS 8 Deployment

See docs/DEPLOY_CENTOS8.md for details.