  # OmniShopAgent
  
  An autonomous multi-modal fashion shopping agent powered by **LangGraph** and the **ReAct** pattern.
  
  ## Demo
  
  πŸ“„ **[demo.pdf](./demo.pdf)**
  
  ## Overview
  
  OmniShopAgent autonomously decides which tools to call, maintains conversation state, and determines when to respond. Built with **LangGraph**, it uses agentic patterns for intelligent product discovery.
  
  **Key Features:**
  - Autonomous tool selection and execution
  - Multi-modal search (text + image)
  - Conversational context awareness
  - Real-time visual analysis 
  
  ## Tech Stack
  
  | Component | Technology |
  |-----------|-----------|
  | **Agent Framework** | LangGraph |
  | **LLM** | any LLM supported by LangChain |
  | **Text Embedding** | text-embedding-3-small |
  | **Image Embedding** | CLIP ViT-B/32 |
  | **Vector Database** | Milvus |
  | **Frontend** | Streamlit |
  | **Dataset** | Kaggle Fashion Products |
  
  ## Architecture
  
  **Agent Flow:**
  
  ```mermaid
  graph LR
      START --> Agent
      Agent -->|Has tool_calls| Tools
      Agent -->|No tool_calls| END
      Tools --> Agent
      
      subgraph "Agent Node"
          A[Receive Messages] --> B[LLM Reasoning]
          B --> C{Need Tools?}
          C -->|Yes| D[Generate tool_calls]
          C -->|No| E[Generate Response]
      end
      
      subgraph "Tool Node"
          F[Execute Tools] --> G[Return ToolMessage]
      end
  ```
  
  **Available Tools:**
  - `search_products(query)` - Text-based semantic search
  - `search_by_image(image_path)` - Visual similarity search  
  - `analyze_image_style(image_path)` - VLM style analysis
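The routing above can be sketched as a minimal, framework-free loop. Everything here is illustrative: `fake_llm` stands in for the real LLM, and the toy `search_products` replaces the actual Milvus-backed tool — the project itself wires this up with LangGraph.

```python
# Sketch of the agent loop in the diagram above: the agent node either
# emits tool_calls (route to the Tool node) or a final answer (route to END).

def fake_llm(messages):
    """Stand-in for the real LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "tool_calls": [
            {"name": "search_products", "args": {"query": "winter coats women"}}]}
    return {"role": "assistant", "content": "Here are 5 winter coats."}

def search_products(query):
    """Toy tool; the real one runs semantic search against Milvus."""
    return f"5 products matching '{query}'"

TOOLS = {"search_products": search_products}

def run_agent(user_input):
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = fake_llm(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):       # No tool_calls -> END
            return reply["content"]
        for call in reply["tool_calls"]:      # Has tool_calls -> Tool node
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})

print(run_agent("winter coats for women"))  # Here are 5 winter coats.
```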
  
  
  
  ## Examples
  
  **Text Search:**
  ```
  User: "winter coats for women"
  Agent: search_products("winter coats women") β†’ Returns 5 products
  ```
  
  **Image Upload:**
  ```
  User: [uploads sneaker photo] "find similar"
  Agent: search_by_image(path) β†’ Returns visually similar shoes
  ```
  
  **Style Analysis + Search:**
  ```
  User: [uploads vintage jacket] "what style is this? find matching pants"
  Agent: analyze_image_style(path) β†’ "Vintage denim bomber..."
         search_products("vintage pants casual") β†’ Returns matching items
  ```
  
  **Multi-turn Context:**
  ```
  Turn 1: "show me red dresses"
  Agent: search_products("red dresses") β†’ Results
  
  Turn 2: "make them formal"
  Agent: [remembers context] β†’ search_products("red formal dresses") β†’ Results
  ```
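The context carry-over works because the full message history is replayed to the LLM on every turn, and the LLM rewrites the follow-up into a standalone query. A toy stand-in for that rewriting step (the string heuristic here is fake — the real rewrite is done by the LLM's reasoning, not string splicing):

```python
def refine_query(history, follow_up):
    # Toy heuristic: splice the new adjective into the previous query.
    # In the actual agent, the LLM performs this rewrite from context.
    words = history[-1].split()        # e.g. ["red", "dresses"]
    adjective = follow_up.split()[-1]  # e.g. "formal"
    return " ".join(words[:-1] + [adjective, words[-1]])

history = ["red dresses"]
print(refine_query(history, "make them formal"))  # red formal dresses
```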
  
  **Complex Reasoning:**
  ```
  User: [uploads office outfit] "I like the shirt but need something more casual"
  Agent: analyze_image_style(path) β†’ Extracts shirt details
         search_products("casual shirt [color] [style]") β†’ Returns casual alternatives
  ```
  
  ## Installation
  
  **Prerequisites:**
  - Python 3.12+ (LangChain 1.x requires Python 3.10+)
  - OpenAI API Key
  - Docker & Docker Compose
  
  ### 1. Setup Environment
  ```bash
  # Clone and install dependencies
  git clone <repository-url>
  cd OmniShopAgent
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  
  # Configure environment variables
  cp .env.example .env
  # Edit .env and add your OPENAI_API_KEY
  ```
  
  ### 2. Download Dataset
  Download the [Fashion Product Images Dataset](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) from Kaggle and extract it to `./data/`:
  
  ```bash
  python scripts/download_dataset.py
  ```
  
  Expected structure:
  ```
  data/
  β”œβ”€β”€ images/       # ~44k product images
  β”œβ”€β”€ styles.csv    # Product metadata
  └── images.csv    # Image filenames
  ```
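As a quick sanity check that the metadata loads, the sketch below parses two sample rows in the shape of `styles.csv` (column names follow the Kaggle dataset; the exact text assembled per product for embedding is an assumption, not the project's actual format):

```python
import csv
import io

# Tiny in-memory stand-in for styles.csv; the real file has ~44k rows.
sample = io.StringIO(
    "id,gender,masterCategory,articleType,baseColour,productDisplayName\n"
    "15970,Men,Apparel,Shirts,Navy Blue,Turtle Check Men Navy Blue Shirt\n"
    "39386,Men,Apparel,Jeans,Blue,Peter England Men Party Blue Jeans\n"
)

rows = list(csv.DictReader(sample))

# One plausible way to build the text that gets embedded per product:
# display name plus colour and article type.
docs = [f'{r["productDisplayName"]} ({r["baseColour"]} {r["articleType"]})'
        for r in rows]
print(docs[0])  # Turtle Check Men Navy Blue Shirt (Navy Blue Shirts)
```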
  
  ### 3. Start Services
  
  ```bash
  # Start Milvus (vector database) via Docker Compose
  docker-compose up
  
  # In a separate terminal, start the CLIP embedding server
  python -m clip_server
  ```
  
  
  ### 4. Index Data 
  
  ```bash
  python scripts/index_data.py
  ```
  
  This generates text and image embeddings for all ~44k products and stores them in Milvus.
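Retrieval itself reduces to nearest-neighbor search over those embedding vectors. A toy cosine-similarity version (illustrative only: the hard-coded 3-dimensional vectors stand in for real text-embedding-3-small / CLIP embeddings, and Milvus does this search at 44k scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical pre-computed product embeddings (real ones come from the
# embedding models above and live in Milvus, not in a dict).
catalog = {
    "red dress":   [0.9, 0.1, 0.0],
    "winter coat": [0.1, 0.9, 0.2],
    "sneakers":    [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    ranked = sorted(catalog,
                    key=lambda name: cosine(query_vec, catalog[name]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.85, 0.15, 0.05]))  # ['red dress', 'winter coat']
```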
  
  ### 5. Launch Application
  ```bash
  # δ½Ώη”¨ε―εŠ¨θ„šζœ¬οΌˆζŽ¨θοΌ‰
  ./scripts/start.sh
  
  # Or run directly
  streamlit run app.py
  ```
  The app opens at `http://localhost:8501`.
  
  ### CentOS 8 Deployment
  See [docs/DEPLOY_CENTOS8.md](docs/DEPLOY_CENTOS8.md) for details.