# Quick Start Guide

## Prerequisites

1. **Python 3.8+**
2. **Elasticsearch 8.x** (running locally on `localhost:9200` or on a remote host)
3. **Optional**: CUDA-enabled GPU for faster embeddings

## Installation

### 1. Install Dependencies

```bash
cd /data/tw/SearchEngine  # adjust to wherever the repository is checked out
pip install -r requirements.txt
```

### 2. Set Environment Variables (Optional)

```bash
# Elasticsearch
export ES_HOST="http://localhost:9200"

# DeepL API (for translation)
export DEEPL_API_KEY="your-api-key-here"

# Customer ID
export CUSTOMER_ID="customer1"
```
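
For scripts that talk to the service, the same three variables can be read in Python. This is a minimal sketch, not the project's own configuration loader: the `Settings` class and `load_settings` function are hypothetical names, and the fallback values are simply the example values from this guide, not confirmed application defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in this guide."""
    es_host: str
    deepl_api_key: Optional[str]
    customer_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read ES_HOST, DEEPL_API_KEY and CUSTOMER_ID from a mapping.

    The fallbacks are the example values shown above (an assumption).
    An unset or empty DEEPL_API_KEY becomes None.
    """
    return Settings(
        es_host=env.get("ES_HOST", "http://localhost:9200"),
        deepl_api_key=env.get("DEEPL_API_KEY") or None,
        customer_id=env.get("CUSTOMER_ID", "customer1"),
    )
```

Calling `load_settings()` with no arguments reads the current process environment; passing a plain dict makes the function easy to test.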

## Running the System

### Option 1: Quick Test (Without Full Data)

```bash
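# 1. Start Elasticsearch (if not running)
# ES 8.x enables TLS and authentication by default; disable security for local HTTP testing only
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.11.0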

# 2. Ingest sample data (100 documents for quick test)
cd data/customer1
python ingest_customer1.py \
  --limit 100 \
  --recreate-index \
  --es-host http://localhost:9200 \
  --skip-embeddings

# 3. Start API service
cd ../..
python -m api.app --host 0.0.0.0 --port 6002

# 4. Test search
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "消防", "size": 5}'
```

### Option 2: Full System with Embeddings

```bash
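# 1. Start Elasticsearch
# ES 8.x enables TLS and authentication by default; disable security for local HTTP testing only
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "ES_JAVA_OPTS=-Xms4g -Xmx4g" elasticsearch:8.11.0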

# 2. Ingest full dataset with embeddings (GPU recommended; takes ~10-30 min with one)
cd data/customer1
python ingest_customer1.py \
  --csv goods_with_pic.5years_congku.csv.shuf.1w \
  --recreate-index \
  --batch-size 100 \
  --es-host http://localhost:9200

# 3. Start API service
cd ../..
python -m api.app \
  --host 0.0.0.0 \
  --port 6002 \
  --customer customer1 \
  --es-host http://localhost:9200

# 4. Test various searches
# Simple search
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "芭比娃娃", "size": 10}'

# Boolean search
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "toy AND (barbie OR doll)", "size": 10}'

# With filters
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "娃娃", "size": 10, "filters": {"categoryName_keyword": "芭比"}}'
```
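
The same searches can be issued from Python using only the standard library. The sketch below mirrors the `curl` calls above: it POSTs a JSON body with `query`, `size`, and an optional `filters` object to `/search/`. The function names are hypothetical, and because this guide does not describe the response schema, `search` returns the parsed JSON as-is without assuming any fields.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:6002"  # host and port used throughout this guide


def build_search_request(query: str, size: int = 10,
                         filters: Optional[Dict[str, Any]] = None,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /search/ without sending it."""
    payload: Dict[str, Any] = {"query": query, "size": size}
    if filters:
        payload["filters"] = filters
    return urllib.request.Request(
        url=f"{base_url}/search/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, size: int = 10,
           filters: Optional[Dict[str, Any]] = None,
           timeout: float = 10.0) -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_search_request(query, size, filters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `search("娃娃", filters={"categoryName_keyword": "芭比"})` corresponds to the filtered `curl` call above and requires the API service to be running.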

## API Documentation

Once the service is running, visit:
- **Swagger UI**: http://localhost:6002/docs
- **ReDoc**: http://localhost:6002/redoc

## Common Issues

### Issue: Elasticsearch connection failed
**Solution**: Ensure Elasticsearch is running and reachable from this machine:
```bash
curl http://localhost:9200
```
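
The same check can be scripted, for example before starting an ingest. This is a minimal sketch with hypothetical function names; it relies on the fact that Elasticsearch serves a JSON document containing `version.number` at its root URL, and it assumes an unsecured HTTP endpoint like the one used in this guide.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional


def parse_es_version(body: bytes) -> Optional[str]:
    """Extract version.number from the JSON served at the Elasticsearch root URL."""
    try:
        info = json.loads(body.decode("utf-8"))
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return None
    if not isinstance(info, dict):
        return None
    version = info.get("version")
    return version.get("number") if isinstance(version, dict) else None


def es_version(es_host: str = "http://localhost:9200",
               timeout: float = 3.0) -> Optional[str]:
    """Return the server version, or None if the host cannot be reached."""
    try:
        with urllib.request.urlopen(es_host, timeout=timeout) as response:
            return parse_es_version(response.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None
```

A `None` result means the host refused the connection, timed out, or answered with something other than the expected JSON.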

### Issue: Model download fails
**Solution**: Check your internet connection; the models are downloaded from Hugging Face/ModelScope.
```bash
# Pre-download models (optional)
python -c "from embeddings import BgeEncoder; BgeEncoder()"
python -c "from embeddings import CLIPImageEncoder; CLIPImageEncoder()"
```

### Issue: Out of memory during embedding generation
**Solution**: Reduce the batch size (see `--batch-size` in Option 2) or skip embeddings initially:
```bash
python ingest_customer1.py --skip-embeddings --limit 1000
```
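
To see why a smaller batch size helps, consider how batching bounds the number of items held in memory at once. The helper below is an illustrative sketch only, not the project's ingest code: `batched` is a hypothetical name, and the real `--batch-size` handling may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items.

    Only one batch is materialized at a time, so peak memory grows with
    batch_size rather than with the size of the whole dataset.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With this pattern, an encoder would process each yielded batch before the next one is read, so halving `batch_size` roughly halves the size of each chunk sent to the model.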

### Issue: Translation not working
**Solution**: Set a DeepL API key. Without one, translation runs in mock mode and returns the original text.
```bash
export DEEPL_API_KEY="your-key"
```

## Testing

### Test Health
```bash
curl http://localhost:6002/admin/health
```
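
In scripts, it can be useful to wait for the service to come up before sending searches. The sketch below polls a probe until it succeeds or the attempts run out. The function names are hypothetical, and it assumes that `/admin/health` answers with HTTP 200 once the service is ready, which this guide does not state explicitly.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no need to sleep after the final try
            sleep(delay)
    return False
```

A typical call would be `wait_until_ready(lambda: http_ok("http://localhost:6002/admin/health"))`, which returns `False` after roughly 30 seconds if the service never responds.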

### Test Configuration
```bash
curl http://localhost:6002/admin/config
```

### Test Index Stats
```bash
curl http://localhost:6002/admin/stats
```

### Test Search
```bash
# Chinese query (auto-translates to English/Russian)
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "消防套", "size": 5}'

# English query
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "fire control set", "size": 5}'

# Russian query
curl -X POST http://localhost:6002/search/ \
  -H "Content-Type: application/json" \
  -d '{"query": "Наборы для пожаротушения", "size": 5}'
```

## What's Next?

1. **Customize Configuration**: Edit `config/schema/customer1_config.yaml`
2. **Add More Data**: Ingest your own product data
3. **Tune Ranking**: Adjust the ranking expression in the config
4. **Add Rewrite Rules**: Update them via the `/admin/rewrite-rules` API endpoint
5. **Monitor Performance**: Check the `/admin/stats` endpoint
6. **Scale**: Deploy to production with a production-grade Elasticsearch cluster

## Architecture Quick Reference

```
Query Flow:
User Query → QueryParser (normalize, rewrite, translate, embed)
         → Searcher (boolean parse, build ES query)
         → Elasticsearch (BM25 + KNN)
         → RankingEngine (custom scoring)
         → Results

Indexing Flow:
CSV Data → DataTransformer (field mapping, embeddings)
        → BulkIndexer (batch processing)
        → Elasticsearch
```
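
To make the "Elasticsearch (BM25 + KNN)" step concrete, the sketch below builds a hybrid request body of the kind Elasticsearch 8.x accepts: a `match` query (scored with BM25) alongside a top-level `knn` clause. It is an illustration, not the query the `Searcher` actually generates. The function name and the field names `name` and `name_embedding` are hypothetical; the real fields are defined in the customer config.

```python
from typing import Any, Dict, List


def build_hybrid_query(text: str, query_vector: List[float], size: int = 10,
                       text_field: str = "name",              # hypothetical field
                       vector_field: str = "name_embedding",  # hypothetical field
                       ) -> Dict[str, Any]:
    """Build a search body that combines a BM25 match query with kNN retrieval."""
    return {
        "size": size,
        # Lexical part: a match query on an analyzed text field.
        "query": {"match": {text_field: text}},
        # Vector part: approximate nearest neighbours on a dense_vector field.
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": size,
            # num_candidates must be at least k; more candidates trade speed for recall.
            "num_candidates": max(size * 10, 100),
        },
    }
```

When both clauses are present, Elasticsearch combines the lexical and vector matches into one result list, which a component such as the `RankingEngine` could then rescore.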

## Support

For issues or questions, refer to:
- **README.md**: Comprehensive documentation
- **IMPLEMENTATION_SUMMARY.md**: Technical details
- **CLAUDE.md**: Development guidelines
- **API Docs**: http://localhost:6002/docs