# Quick Start - New Version

## 🚀 Get Started in 5 Minutes

### 1. Install Dependencies

```bash
cd /home/tw/recommendation   # project root; adjust to your own checkout path
pip install -r requirements.txt
```

**New dependency**: `elasticsearch>=8.0.0`

### 2. Test the ES Connection

```bash
cd offline_tasks
python scripts/test_es_connection.py
```

A ✓ in the output means the test passed.
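Here is a minimal sketch of the checks a connection test like this can run. It is not the contents of `test_es_connection.py`. The vector field names `name_vector` and `pic_vector` are hypothetical. The client is passed in as a parameter, so you can exercise the logic without a live cluster.

```python
from typing import Any, Dict, List

# Hypothetical field names; check your index mapping for the real ones.
VECTOR_FIELDS = ("name_vector", "pic_vector")


def check_es(client: Any, index: str, item_id: str) -> List[str]:
    """Return a list of problems found; an empty list means every check passed.

    `client` is expected to behave like elasticsearch.Elasticsearch (8.x):
    ping() -> bool, exists(index=..., id=...) -> truthy if the doc exists,
    get(index=..., id=...) -> dict-like response with a "_source" key.
    """
    if not client.ping():
        return ["cluster unreachable"]
    if not client.exists(index=index, id=item_id):
        return [f"item {item_id} not found in index {index}"]
    source: Dict[str, Any] = client.get(index=index, id=item_id)["_source"]
    # Every expected vector field must be present and non-empty
    return [f"missing or empty field: {f}" for f in VECTOR_FIELDS if not source.get(f)]
```

To run it against a real cluster, pass these arguments:

- a client from the `elasticsearch` 8.x package, such as `Elasticsearch(ES_CONFIG['host'], basic_auth=(ES_CONFIG['username'], ES_CONFIG['password']))`
- the index `spu`
- an item ID such as `3302275`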

### 3. Run All Tasks

```bash
python run_all.py
```

That's it! No arguments are required.

### 4. Load into Redis

```bash
python scripts/load_index_to_redis.py
```
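Here is a rough sketch of what loading involves. It is not the code of `load_index_to_redis.py`, and the function names are hypothetical. It assumes two things, both taken from the query examples:

- keys follow the layout `item:similar:{algorithm}:{item_id}`;
- values are JSON lists of `[item_id, score]` pairs.

Writes go through a redis-py pipeline, so many `SET` commands share one network round trip.

```python
import json
from typing import Iterable, List, Tuple

Neighbor = Tuple[int, float]


def to_redis_entry(algorithm: str, item_id: int, neighbors: List[Neighbor]) -> Tuple[str, str]:
    """Build one (key, value) pair, e.g. item:similar:swing:123456 -> [[234567, 0.8523], ...]."""
    key = f"item:similar:{algorithm}:{item_id}"
    value = json.dumps([[nid, score] for nid, score in neighbors])
    return key, value


def load_entries(client, entries: Iterable[Tuple[str, str]], ttl_seconds: int,
                 batch_size: int = 1000) -> int:
    """Write entries with a TTL through a non-transactional pipeline; return how many were written."""
    pipe = client.pipeline(transaction=False)
    count = 0
    for key, value in entries:
        pipe.set(key, value, ex=ttl_seconds)  # ex = expiry in seconds
        count += 1
        if count % batch_size == 0:
            pipe.execute()  # flush a full batch; the pipeline resets afterwards
    pipe.execute()  # flush whatever is left
    return count
```

With a real client, pass a `redis.Redis(...)` instance as `client`.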

## 📋 Running Individual Tasks

### i2i Similarity Indexes

```bash
# Swing algorithm
python scripts/i2i_swing.py --lookback_days 30 --top_n 50 --time_decay

# Session W2V
python scripts/i2i_session_w2v.py --lookback_days 30 --top_n 50 --save_model

# DeepWalk
python scripts/i2i_deepwalk.py --lookback_days 30 --top_n 50 --save_model

# Content similarity (ES vectors) - no arguments needed!
python scripts/i2i_content_similar.py
```

### Interest Aggregation

```bash
python scripts/interest_aggregation.py --lookback_days 30 --top_n 1000
```

## 🎯 Key Changes

### Simplify! Simplify! Simplify!

#### Before (v1.0)
```bash
python run_all.py \
  --lookback_days 30 \
  --top_n 50 \
  --skip-interest \
  --only-content \
  --debug
```

#### Now (v2.0)
```bash
python run_all.py
# or
python run_all.py --debug  # enable debug mode
```

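### Content Similarity Index

#### Before
- 1 index: `i2i_content_hybrid_*.txt`
- Based on: product attributes (category, supplier, etc.)
- Arguments: `--method hybrid --top_n 50`

#### Now
- **2 indexes**:
  - `i2i_content_name_*.txt` (name vectors)
  - `i2i_content_pic_*.txt` (image vectors)
- Based on: deep-learning embedding vectors stored in Elasticsearch
- Arguments: **none required!**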

## 📊 Output Files

### File Locations
```
offline_tasks/output/
├── i2i_swing_20251017.txt          # Swing similarity index
├── i2i_session_w2v_20251017.txt    # Session W2V similarity index
├── i2i_deepwalk_20251017.txt       # DeepWalk similarity index
├── i2i_content_name_20251017.txt   # Name-vector similarity index ⭐new
├── i2i_content_pic_20251017.txt    # Image-vector similarity index ⭐new
├── interest_aggregation_hot_20251017.txt    # Popular items
├── interest_aggregation_cart_20251017.txt   # Add-to-cart items
├── interest_aggregation_new_20251017.txt    # New arrivals
└── interest_aggregation_global_20251017.txt # Global popular items
```
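Output file names end in a `YYYYMMDD` date stamp. You can therefore pick the newest file for an index by parsing that suffix. Here is a small sketch; the helper name `latest_output` is hypothetical.

```python
import re
from typing import Iterable, Optional

# <index_name>_<YYYYMMDD>.txt, e.g. i2i_swing_20251017.txt
_NAME_RE = re.compile(r"^(?P<index>.+)_(?P<date>\d{8})\.txt$")


def latest_output(filenames: Iterable[str], index_name: str) -> Optional[str]:
    """Return the file with the most recent date stamp for index_name, or None."""
    best_date, best_name = "", None
    for name in filenames:
        m = _NAME_RE.match(name)
        # YYYYMMDD strings sort chronologically, so plain string comparison works
        if m and m.group("index") == index_name and m.group("date") > best_date:
            best_date, best_name = m.group("date"), name
    return best_name
```

For example, `latest_output(os.listdir("offline_tasks/output"), "i2i_content_name")` returns the newest name-vector index file.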

### File Format

Each line has the following layout, where `\t` stands for a tab character:

```
item_id \t item_name \t similar_id1:score1,similar_id2:score2,...
```
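Here is a minimal parser for one line of this format. It assumes the item name itself contains no tab characters. IDs stay as strings and scores become floats. The function name is illustrative.

```python
from typing import List, Tuple


def parse_index_line(line: str) -> Tuple[str, str, List[Tuple[str, float]]]:
    """Split one tab-separated index line into (item_id, item_name, [(similar_id, score), ...])."""
    item_id, item_name, neighbors_field = line.rstrip("\n").split("\t")
    neighbors = []
    for pair in neighbors_field.split(","):
        if not pair:
            continue  # tolerate an empty neighbor list or a trailing comma
        # rpartition splits on the last ':' so the score is always the final part
        sim_id, _, score = pair.rpartition(":")
        neighbors.append((sim_id, float(score)))
    return item_id, item_name, neighbors
```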

## 🔍 Query Examples

### Querying from Python

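```python
import json

import redis

# Connect to Redis; decode_responses=True returns str instead of bytes
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


def get_json(key):
    """Return the decoded JSON value at key, or None if the key does not exist."""
    raw = r.get(key)  # None when the key is missing; json.loads(None) would raise TypeError
    return json.loads(raw) if raw is not None else None


# 1. Swing similar items
similar = get_json('item:similar:swing:123456')
# Returns: [[234567, 0.8523], [345678, 0.7842], ...]

# 2. Name-vector similar items ⭐new
similar = get_json('item:similar:content_name:123456')
# Returns: [[234567, 0.9234], [345678, 0.8756], ...]

# 3. Image-vector similar items ⭐new
similar = get_json('item:similar:content_pic:123456')
# Returns: [[567890, 0.8123], [678901, 0.7856], ...]

# 4. Popular items
hot_items = get_json('interest:hot:platform:PC')
# Returns: [123456, 234567, 345678, ...]
```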

### Querying with redis-cli

```bash
# Connect to Redis
redis-cli

# Swing similar items
GET item:similar:swing:123456

# Name-vector similar items ⭐new
GET item:similar:content_name:123456

# Image-vector similar items ⭐new
GET item:similar:content_pic:123456

# Popular items
GET interest:hot:platform:PC
```

## ⚙️ Configuration

### ES Configuration (i2i_content_similar.py)

```python
ES_CONFIG = {
    'host': 'http://localhost:9200',
    'index_name': 'spu',
    'username': 'essa',
    'password': '4hOaLaf41y2VuI8y'
}
```
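The configuration above keeps the password in source code. Another option is to read the credentials from environment variables. Here is a sketch; the variable names `ES_HOST`, `ES_INDEX`, `ES_USERNAME` and `ES_PASSWORD` are assumptions, not something the scripts currently read. Host and index fall back to the values above, while the credentials must be set.

```python
import os
from typing import Dict, Mapping, Optional


def load_es_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an ES_CONFIG-shaped dict from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    missing = [name for name in ("ES_USERNAME", "ES_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        'host': env.get('ES_HOST', 'http://localhost:9200'),
        'index_name': env.get('ES_INDEX', 'spu'),
        'username': env['ES_USERNAME'],
        'password': env['ES_PASSWORD'],
    }
```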

### Algorithm Parameters (i2i_content_similar.py)

```python
TOP_N = 50            # number of similar items kept per item
KNN_K = 100           # number of neighbors returned by each kNN query
KNN_CANDIDATES = 200  # candidate pool size for the kNN search
```
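The sketch below shows how these three numbers might fit together. It builds a request body for the `knn` option of the Elasticsearch search API (Elasticsearch 8.4 and later) and then trims the hits. The vector field name and the use of `_id` as the item ID are assumptions; this is not the script's actual query.

In the search API:

- `k` is the number of neighbors returned;
- `num_candidates` is the number of candidates considered per shard, and must be at least `k`.

The query item usually comes back as its own nearest neighbor, so the sketch drops it before keeping `TOP_N`.

```python
from typing import Any, Dict, List, Sequence, Tuple

TOP_N = 50
KNN_K = 100
KNN_CANDIDATES = 200


def build_knn_body(field: str, query_vector: Sequence[float]) -> Dict[str, Any]:
    """Request body for POST /<index>/_search using the knn option."""
    return {
        "knn": {
            "field": field,                    # hypothetical dense_vector field, e.g. "name_vector"
            "query_vector": list(query_vector),
            "k": KNN_K,                        # neighbors returned as top hits
            "num_candidates": KNN_CANDIDATES,  # candidates considered per shard
        },
        "size": KNN_K,     # hits returned are capped by size (10 by default)
        "_source": False,  # only _id and _score are needed
    }


def top_similar(query_id: str, hits: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Drop the query item itself and keep the TOP_N highest-scoring hits."""
    pairs = [(h["_id"], h["_score"]) for h in hits if h["_id"] != query_id]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:TOP_N]
```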

### Global Configuration (offline_config.py)

```python
DEFAULT_LOOKBACK_DAYS = 30    # days of behavior history to look back over
DEFAULT_I2I_TOP_N = 50        # Top N for i2i indexes
DEFAULT_INTEREST_TOP_N = 1000 # Top N for interest aggregation
```
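`DEFAULT_LOOKBACK_DAYS` defines a trailing window of behavior data. This document does not say whether the run day itself is included. The sketch below assumes it is excluded and returns a half-open `[start, end)` date range; the helper name is hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

DEFAULT_LOOKBACK_DAYS = 30


def lookback_window(run_date: date, days: int = DEFAULT_LOOKBACK_DAYS) -> Tuple[date, date]:
    """Return (start, end) covering the `days` full days before run_date; end is exclusive."""
    end = run_date                      # exclusive: the run day's partial data is left out
    start = end - timedelta(days=days)  # inclusive
    return start, end
```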

## 🔧 Troubleshooting

### ES Connection Fails

```bash
# 1. Check that ES is running
curl -u essa:4hOaLaf41y2VuI8y http://localhost:9200

# 2. Run the test script
python scripts/test_es_connection.py

# 3. Check the configuration
# Edit ES_CONFIG in scripts/i2i_content_similar.py
```

### Item ID Not Found

By default the test script looks up item ID `"3302275"`. If that item does not exist in your index, change it:

```python
# Edit test_es_connection.py
test_item_id = "your_item_id"
```

### Redis Connection Fails

```bash
# Check the Redis settings (run from the project root)
grep REDIS offline_tasks/config/offline_config.py

# Test the Redis connection; a running server replies PONG
redis-cli ping
```

### Output Files Missing

```bash
# Check the output directory (run from the project root)
ls -lh offline_tasks/output/

# List the most recently generated files first
ls -lht offline_tasks/output/ | head -10
```

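## 📚 Further Documentation

- **ES vector similarity**: `scripts/ES_VECTOR_SIMILARITY.md`
- **Update notes**: `CONTENT_SIMILARITY_UPDATE.md`
- **Change summary**: `CHANGES_SUMMARY.md`
- **Redis data spec**: `REDIS_DATA_SPEC.md`

## 🎓 Learning Path

### New Users
1. Read this document ✓
2. Run `test_es_connection.py`
3. Run `run_all.py`
4. Inspect the `output/` directory
5. Load the results into Redis and query them

### Advanced Usage
1. Read `ES_VECTOR_SIMILARITY.md`
2. Learn how vector similarity works
3. Tune ES query performance
4. Customize the algorithm parameters

### Developers
1. Read `CONTENT_SIMILARITY_UPDATE.md`
2. Understand the technical architecture
3. Read the comments in the source code
4. Contribute improvements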

## 🚨 Important Notes

### ⚠️ Breaking Changes

1. **`i2i_content_similar.py` no longer accepts arguments**
   - Old: `--method`, `--top_n`, `--debug`
   - New: no arguments

2. **Redis key format changed**
   - Old: `item:similar:content:{item_id}`
   - New: `item:similar:content_name:{item_id}` and `item:similar:content_pic:{item_id}`

3. **Output files changed**
   - Old: `i2i_content_hybrid_*.txt`
   - New: `i2i_content_name_*.txt` and `i2i_content_pic_*.txt`
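Clients that still read `item:similar:content:{item_id}` need to switch to the two new keys. Below is a sketch of a transitional read helper. It assumes that old keys may still exist during the rollout and that they hold the same JSON list format; the names are hypothetical.

The helper reads both new keys first and falls back to the old key only when neither exists. The `client` can be any object with a redis-py-style `get` that returns `None` for missing keys.

```python
import json
from typing import Any, Dict, Optional

NEW_CONTENT_KEYS = {
    "content_name": "item:similar:content_name:{item_id}",
    "content_pic": "item:similar:content_pic:{item_id}",
}
OLD_CONTENT_KEY = "item:similar:content:{item_id}"


def read_content_similar(client: Any, item_id: int) -> Dict[str, Optional[list]]:
    """Read both new content indexes; use the old v1.0 key only if neither new key exists."""
    result: Dict[str, Optional[list]] = {}
    for name, pattern in NEW_CONTENT_KEYS.items():
        raw = client.get(pattern.format(item_id=item_id))
        result[name] = json.loads(raw) if raw is not None else None
    if all(v is None for v in result.values()):
        raw = client.get(OLD_CONTENT_KEY.format(item_id=item_id))
        if raw is not None:
            result["content_legacy"] = json.loads(raw)
    return result
```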

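### ✅ Backward Compatibility

- The Swing, W2V, and DeepWalk algorithms are unaffected
- Interest aggregation is unaffected
- The Redis loader remains backward compatible
- The other i2i indexes keep working

## 💡 Best Practices

### Run Frequency
- **Behavior-based similarity** (Swing, W2V, DeepWalk): daily
- **Content-based similarity** (name vectors, image vectors): weekly
- **Interest aggregation**: daily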
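If a single daily job drives everything, this schedule can be encoded directly. Behavior-based tasks and interest aggregation run every day, and content similarity runs once a week. The weekday chosen for the weekly run (Monday here) is an assumption. The task names simply mirror the script names. A crontab entry would work just as well.

```python
from datetime import date
from typing import List

DAILY_TASKS = ["i2i_swing", "i2i_session_w2v", "i2i_deepwalk", "interest_aggregation"]
WEEKLY_TASKS = ["i2i_content_similar"]
WEEKLY_RUN_WEEKDAY = 0  # Monday, per date.weekday() (Monday == 0)


def tasks_due(run_date: date) -> List[str]:
    """Return the tasks scheduled for run_date."""
    tasks = list(DAILY_TASKS)
    if run_date.weekday() == WEEKLY_RUN_WEEKDAY:
        tasks += WEEKLY_TASKS
    return tasks
```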

### Redis TTL
- **Behavior-based similarity**: 7 days
- **Content-based similarity**: 30 days
- **Interest aggregation**: 3-7 days
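These TTLs map onto the `ex` argument (expiry in seconds) of redis-py's `set`. In the sketch below, key prefixes are grouped according to the key formats in the query examples; that grouping is an assumption. Interest aggregation uses 7 days, the top of the suggested 3-7 day range.

```python
DAY = 24 * 60 * 60  # seconds

# Key prefix -> TTL in seconds; the longest matching prefix wins
TTL_BY_PREFIX = {
    "item:similar:content_name:": 30 * DAY,
    "item:similar:content_pic:": 30 * DAY,
    "item:similar:": 7 * DAY,  # swing, session_w2v, deepwalk
    "interest:": 7 * DAY,      # upper end of the suggested 3-7 days
}


def ttl_for_key(key: str) -> int:
    """Pick the TTL for a Redis key by its longest matching prefix."""
    matches = [p for p in TTL_BY_PREFIX if key.startswith(p)]
    if not matches:
        raise KeyError(f"no TTL rule for key: {key}")
    return TTL_BY_PREFIX[max(matches, key=len)]

# Usage with a redis.Redis client r:
#   r.set(key, value, ex=ttl_for_key(key))
```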

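### Performance Tips
1. Use `--debug` mode when debugging
2. Test on a small dataset first
3. Clean up expired data regularly
4. Monitor ES query performance

## 🎉 Summary

The new version is much simpler to use. Key improvements:

1. ✅ **No arguments**: neither `run_all.py` nor `i2i_content_similar.py` requires arguments
2. ✅ **More powerful**: built on deep-learning embedding vectors for more accurate results
3. ✅ **Multi-dimensional**: two views of each item, name and image
4. ✅ **Faster**: similarity lookups use Elasticsearch kNN queries
5. ✅ **Easier to maintain**: simpler code and clearer configuration

Switch to the new version for a simpler, more capable recommendation system!

---

**Feedback**: if you run into problems, see the detailed documentation or contact the development team.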