diff --git a/.cursor/plans/所有租户共用一套统一配置.tenantID只在请求层级.服务层级没有tenantID相关的独立配置.md b/.cursor/plans/所有租户共用一套统一配置.tenantID只在请求层级.服务层级没有tenantID相关的独立配置.md
new file mode 100644
index 0000000..e427dfb
--- /dev/null
+++ b/.cursor/plans/所有租户共用一套统一配置.tenantID只在请求层级.服务层级没有tenantID相关的独立配置.md
@@ -0,0 +1,342 @@
+
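+# Multi-Tenant Architecture Refactoring Plan
+
+## Overview
+
+Refactor the search service from per-tenant startup into a true multi-tenant architecture:
+
+- The service starts without a tenant ID; all tenants share one configuration
+- Delete the customer1 configuration, drop the base layer, and consolidate into config.yaml
+- Unify the script interface: start, stop, restart, and data ingestion
+- Unify the data ingestion flow; ES holds a single index
+- The frontend accepts a tenant ID in an input to the left of the search box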
+
+## Phase 1: Configuration File Restructuring
+
+### 1.1 Create the unified configuration file
+
+**File**: `config/config.yaml` (NEW)
+
+- Move `config/schema/base/config.yaml` to `config/config.yaml`
+- Delete the `customer_name` field (no longer needed)
+- Delete `customer_id`-related logic
+- Fix the index name to `search_products`
+- Ensure the `tenant_id` field is included (required)
+
+### 1.2 Delete the customer1 configuration
+
+**Files to delete**:
+
+- `config/schema/customer1/config.yaml`
+- `config/schema/customer1/` directory (if empty)
+
+### 1.3 Update ConfigLoader
+
+**File**: `config/config_loader.py`
+
+Modify the `load_customer_config()` method:
+
+- Remove the `customer_id` parameter
+- Rename it to `load_config()`
+- Load `config/config.yaml` directly
+- Remove the lookup of `config/schema/{customer_id}/config.yaml`
+- Remove `customer_id` field validation
+- Update the `CustomerConfig` class: remove the `customer_id` field
+
+### 1.4 Update configuration validation
+
+**File**: `config/config_loader.py`
+
+Modify the `validate_config()` method:
+
+- Ensure the `tenant_id` field exists and is required
+- Remove validation of `customer_id`
+
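+## Phase 2: Service Startup Changes
+
+### 2.1 Update API application initialization
+
+**File**: `api/app.py`
+
+Modify the `init_service()` function:
+
+- Remove the `customer_id` parameter
+- Load the unified configuration directly (`config/config.yaml`)
+- Remove the dependency on the `CUSTOMER_ID` environment variable
+- Update log output (no longer prints customer_id)
+
+Modify the `startup_event()` function:
+
+- Remove the read of the `CUSTOMER_ID` environment variable
+- Call `init_service()` directly without a customer argument
+
+### 2.2 Update main.py
+
+**File**: `main.py`
+
+Modify the `cmd_serve()` function:
+
+- Remove the `--customer` argument
+- Remove the `CUSTOMER_ID` environment variable assignment
+- Update the help text
+
+### 2.3 Update startup scripts
+
+**File**: `scripts/start_backend.sh`
+
+Changes:
+
+- Remove the `CUSTOMER_ID` environment variable
+- Remove the `--customer` argument
+- Simplify the start command
+
+**File**: `scripts/start_servers.py`
+
+Modify the `start_api_server()` method:
+
+- Remove the `customer` parameter
+- Remove the `CUSTOMER_ID` environment variable assignment
+- Simplify the start command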
+
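+## Phase 3: Script Unification
+
+### 3.1 Create the unified start script
+
+**File**: `scripts/start.sh` (NEW)
+
+Responsibilities:
+
+- Start the backend service (calls `scripts/start_backend.sh`)
+- Start the frontend service (calls `scripts/start_frontend.sh`)
+- Wait until the services are ready
+- Print service status and access URLs
+
+### 3.2 Create the unified stop script
+
+**File**: `scripts/stop.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Stop the backend service (port 6002)
+- Stop the frontend service (port 6003)
+- Clean up PID files
+- Print stop status
+
+### 3.3 Create the unified restart script
+
+**File**: `scripts/restart.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Call `scripts/stop.sh` to stop the services
+- Wait until the services have fully stopped
+- Call `scripts/start.sh` to start the services
+
+### 3.4 Create the data ingestion script
+
+**File**: `scripts/ingest.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Read data from MySQL
+- Convert the data format (handle the base and customer1 data sources uniformly)
+- Ingest into the ES index `search_products`
+- Support filtering data by a given tenant ID
+- Handle field mapping automatically: generate random values for missing fields, ignore extra fields
+
+### 3.5 Create the mock data script
+
+**File**: `scripts/mock_data.sh` (NEW)
+
+Responsibilities:
+
+- Generate test data into MySQL
+- Support a given tenant ID
+- Support a given data volume
+- Call `scripts/generate_test_data.py` and `scripts/import_test_data.py`
+
+### 3.6 Update root-level scripts
+
+**File**: `run.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Call `scripts/start.sh` to start the services
+
+**File**: `restart.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Call `scripts/restart.sh` to restart the services
+
+**File**: `setup.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Set up the environment
+- Check dependencies
+- Contain no service startup logic
+
+**File**: `test_all.sh` (exists, needs updating)
+
+Responsibilities:
+
+- Run the full test flow
+- Cover data ingestion, service startup, and API tests
+
+### 3.7 Remove obsolete scripts
+
+**Files to delete**:
+
+- `scripts/demo_base.sh`
+- `scripts/stop_base.sh`
+- `scripts/start_test_environment.sh`
+- `scripts/stop_test_environment.sh`
+- Other scripts that are no longer needed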
+
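+## Phase 4: Unified Data Ingestion
+
+### 4.1 Update the data ingestion script
+
+**File**: `scripts/ingest_shoplazza.py`
+
+Changes:
+
+- Remove the `--config` parameter (no longer needed)
+- Load the unified configuration directly (`config/config.yaml`)
+- Handle all data sources uniformly (no base/customer1 distinction)
+- Support a `--tenant-id` parameter to filter data
+- Field-mapping logic:
+  - If a field is in the configuration but missing from the data source, generate a random value
+  - If a field is in the data source but not in the configuration, ignore it
+- Ensure the `tenant_id` field is set correctly
+
+### 4.2 Update the data transformer
+
+**File**: `indexer/spu_transformer.py`
+
+Changes:
+
+- Remove the dependency on `customer_id` from the configuration
+- Handle all data sources uniformly
+- Ensure field mapping is correct (random values for missing fields, extra fields ignored)
+
+### 4.3 Unify test data generation
+
+**File**: `scripts/generate_test_data.py`
+
+Changes:
+
+- Support generating test data that matches the unified index structure
+- Support a given tenant ID
+- Ensure generated data contains all required fields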
+
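+## Phase 5: Frontend Changes
+
+### 5.1 Update the frontend HTML
+
+**File**: `frontend/index.html`
+
+Changes:
+
+- Add a tenant ID input to the left of the search box
+- Add a tenant ID label
+- Update layout styles
+
+### 5.2 Update the frontend JavaScript
+
+**File**: `frontend/static/js/app_base.js`
+
+Changes:
+
+- Remove the hard-coded `TENANT_ID = '1'`
+- Read the tenant ID from the input
+- Send the tenant ID with search requests (via the `X-Tenant-ID` header)
+- Add tenant ID validation (must not be empty)
+- Update the UI display
+
+### 5.3 Update the frontend CSS
+
+**File**: `frontend/static/css/style.css`
+
+Changes:
+
+- Add styles for the tenant ID input
+- Update the search bar layout (to accommodate the tenant ID input)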
+
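+## Phase 6: Documentation and Test Updates
+
+### 6.1 Update the README
+
+**File**: `README.md`
+
+Changes:
+
+- Update startup instructions (a tenant ID is no longer required)
+- Update configuration notes (unified configuration file)
+- Update script usage instructions
+
+### 6.2 Update the API documentation
+
+**File**: `API_DOCUMENTATION.md`
+
+Changes:
+
+- Update the tenant ID notes (must be supplied with each request)
+- Update configuration notes (unified configuration)
+
+### 6.3 Update the test script
+
+**File**: `test_all.sh`
+
+Changes:
+
+- Update the test flow (a tenant ID is no longer required at startup)
+- Update the data ingestion test (unified data source)
+- Update the API tests (include the tenant ID parameter)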
+
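+## Phase 7: Cleanup and Verification
+
+### 7.1 Remove obsolete code
+
+- Delete all references to `customer_id`
+- Delete all references to the `customer1` configuration
+- Delete all references to the `base` configuration layer
+- Clean up scripts that are no longer used
+
+### 7.2 Verify functionality
+
+- Verify service startup (without a tenant ID)
+- Verify configuration loading (unified configuration)
+- Verify data ingestion (unified data source)
+- Verify search (tenant ID supplied with the request)
+- Verify the frontend (tenant ID input)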
+
+## Key File List
+
+### Files to modify:
+
+1. `config/config_loader.py` - remove customer_id logic
+2. `config/config.yaml` - unified configuration file (moved from base)
+3. `api/app.py` - remove the customer_id parameter
+4. `main.py` - remove the customer argument
+5. `scripts/start_backend.sh` - remove CUSTOMER_ID
+6. `scripts/start_servers.py` - remove the customer parameter
+7. `scripts/ingest_shoplazza.py` - unified data ingestion
+8. `frontend/index.html` - add the tenant ID input
+9. `frontend/static/js/app_base.js` - read the tenant ID
+10. `run.sh`, `restart.sh`, `setup.sh`, `test_all.sh` - update scripts
+
+### Files to delete:
+
+1. `config/schema/customer1/config.yaml`
+2. `config/schema/customer1/` directory
+3. `scripts/demo_base.sh`
+4. `scripts/stop_base.sh`
+5. Other obsolete scripts
+
+### Files to create:
+
+1. `config/config.yaml` - unified configuration file
+2. `scripts/start.sh` - unified start script
+3. `scripts/mock_data.sh` - mock data script
\ No newline at end of file
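Review note (not part of the patch): Phase 4.1 of the plan above describes a field-mapping rule for ingestion: fields in the configuration but absent from the source get a random value, fields in the source but not in the configuration are ignored, and `tenant_id` is always set. The sketch below is one possible reading of that rule. The names `map_record` and `random_placeholder` are hypothetical and do not appear in the patch, and a random lowercase string is only an assumed stand-in for whatever the real ingestion code generates.

```python
import random
import string
from typing import Any, Dict, Iterable


def random_placeholder(length: int = 8) -> str:
    """Hypothetical stand-in for the plan's 'generate a random value' step."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def map_record(
    record: Dict[str, Any],
    configured_fields: Iterable[str],
    tenant_id: str,
) -> Dict[str, Any]:
    """Keep configured fields, fill missing ones, drop extras, stamp the tenant."""
    mapped: Dict[str, Any] = {}
    for name in configured_fields:
        if name in record:
            mapped[name] = record[name]          # field present in the source
        else:
            mapped[name] = random_placeholder()  # field missing: random value
    # Source fields that are not configured are never copied, so they are ignored.
    mapped["tenant_id"] = tenant_id
    return mapped


doc = map_record(
    {"title": "toy car", "legacy_col": "dropped"},
    configured_fields=["tenant_id", "title", "vendor"],
    tenant_id="1",
)
print(sorted(doc))
```

Random fill values make the index contents non-reproducible between runs, so a real implementation might prefer a seeded generator or typed defaults; the plan does not say which.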
diff --git a/api/app.py b/api/app.py
index fda892d..16eab02 100644
--- a/api/app.py
+++ b/api/app.py
@@ -51,28 +51,27 @@ _searcher: Optional[Searcher] = None
_query_parser: Optional[QueryParser] = None
-def init_service(customer_id: str = "customer1", es_host: str = "http://localhost:9200"):
+def init_service(es_host: str = "http://localhost:9200"):
"""
- Initialize search service with configuration.
+ Initialize search service with unified configuration.
Args:
- customer_id: Customer configuration ID
es_host: Elasticsearch host URL
"""
global _config, _es_client, _searcher, _query_parser
- print(f"Initializing search service for customer: {customer_id}")
+ print("Initializing search service (multi-tenant)")
- # Load configuration
- config_loader = ConfigLoader("config/schema")
- _config = config_loader.load_customer_config(customer_id)
+ # Load unified configuration
+ config_loader = ConfigLoader("config/config.yaml")
+ _config = config_loader.load_config()
# Validate configuration
errors = config_loader.validate_config(_config)
if errors:
raise ValueError(f"Configuration validation failed: {errors}")
- print(f"Configuration loaded: {_config.customer_name}")
+ print(f"Configuration loaded: {_config.es_index_name}")
# Get ES credentials from environment variables or .env file
es_username = os.getenv('ES_USERNAME')
@@ -113,7 +112,7 @@ def init_service(customer_id: str = "customer1", es_host: str = "http://localhos
def get_config() -> CustomerConfig:
- """Get customer configuration."""
+ """Get search engine configuration."""
if _config is None:
raise RuntimeError("Service not initialized")
return _config
@@ -184,15 +183,13 @@ app.add_middleware(
@app.on_event("startup")
async def startup_event():
"""Initialize service on startup."""
- customer_id = os.getenv("CUSTOMER_ID", "customer1")
es_host = os.getenv("ES_HOST", "http://localhost:9200")
- logger.info(f"Starting E-Commerce Search API")
- logger.info(f"Customer ID: {customer_id}")
+ logger.info("Starting E-Commerce Search API (Multi-Tenant)")
logger.info(f"Elasticsearch Host: {es_host}")
try:
- init_service(customer_id=customer_id, es_host=es_host)
+ init_service(es_host=es_host)
logger.info("Service initialized successfully")
except Exception as e:
logger.error(f"Failed to initialize service: {e}")
@@ -310,16 +307,14 @@ else:
if __name__ == "__main__":
import uvicorn
- parser = argparse.ArgumentParser(description='Start search API service')
+ parser = argparse.ArgumentParser(description='Start search API service (multi-tenant)')
parser.add_argument('--host', default='0.0.0.0', help='Host to bind to')
parser.add_argument('--port', type=int, default=6002, help='Port to bind to')
- parser.add_argument('--customer', default='customer1', help='Customer ID')
parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
parser.add_argument('--reload', action='store_true', help='Enable auto-reload')
args = parser.parse_args()
- # Set environment variables
- os.environ['CUSTOMER_ID'] = args.customer
+ # Set environment variable
os.environ['ES_HOST'] = args.es_host
# Run server
diff --git a/api/models.py b/api/models.py
index 9998da7..59de609 100644
--- a/api/models.py
+++ b/api/models.py
@@ -250,7 +250,6 @@ class HealthResponse(BaseModel):
"""Health check response model."""
status: str = Field(..., description="Service status")
elasticsearch: str = Field(..., description="Elasticsearch status")
- customer_id: str = Field(..., description="Customer configuration ID")
class ErrorResponse(BaseModel):
diff --git a/api/routes/admin.py b/api/routes/admin.py
index 8802156..1b889ac 100644
--- a/api/routes/admin.py
+++ b/api/routes/admin.py
@@ -28,15 +28,13 @@ async def health_check():
return HealthResponse(
status="healthy" if es_status == "connected" else "unhealthy",
- elasticsearch=es_status,
- customer_id=config.customer_id
+ elasticsearch=es_status
)
except Exception as e:
return HealthResponse(
status="unhealthy",
- elasticsearch="error",
- customer_id="unknown"
+ elasticsearch="error"
)
@@ -51,8 +49,6 @@ async def get_configuration():
config = get_config()
return {
- "customer_id": config.customer_id,
- "customer_name": config.customer_name,
"es_index_name": config.es_index_name,
"num_fields": len(config.fields),
"num_indexes": len(config.indexes),
diff --git a/config/config.yaml b/config/config.yaml
new file mode 100644
index 0000000..509a5e5
--- /dev/null
+++ b/config/config.yaml
@@ -0,0 +1,269 @@
+# Unified Configuration for Multi-Tenant Search Engine
+# Unified configuration file; all tenants share one index configuration
+# Note: this file contains no MySQL settings, only ES search settings
+
+# Elasticsearch Index
+es_index_name: "search_products"
+
+# ES Index Settings
+es_settings:
+ number_of_shards: 1
+ number_of_replicas: 0
+ refresh_interval: "30s"
+
+# Field Definitions (SPU level; only fields useful for search)
+fields:
+ # Tenant isolation field (required)
+ - name: "tenant_id"
+ type: "KEYWORD"
+ required: true
+ index: true
+ store: true
+
+ # Product identifier fields
+ - name: "product_id"
+ type: "KEYWORD"
+ required: true
+ index: true
+ store: true
+
+ - name: "handle"
+ type: "KEYWORD"
+ index: true
+ store: true
+
+ # Text search fields
+ - name: "title"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 3.0
+ index: true
+ store: true
+
+ - name: "brief"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+ index: true
+ store: true
+
+ - name: "description"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.0
+ index: true
+ store: true
+
+ # SEO fields (improve relevance)
+ - name: "seo_title"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 2.0
+ index: true
+ store: true
+
+ - name: "seo_description"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+ index: true
+ store: true
+
+ - name: "seo_keywords"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 2.0
+ index: true
+ store: true
+
+ # Category and tag fields (indexed as both TEXT and KEYWORD)
+ - name: "vendor"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+ index: true
+ store: true
+
+ - name: "vendor_keyword"
+ type: "KEYWORD"
+ index: true
+ store: false
+
+ - name: "product_type"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+ index: true
+ store: true
+
+ - name: "product_type_keyword"
+ type: "KEYWORD"
+ index: true
+ store: false
+
+ - name: "tags"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.0
+ index: true
+ store: true
+
+ - name: "tags_keyword"
+ type: "KEYWORD"
+ index: true
+ store: false
+
+ - name: "category"
+ type: "TEXT"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+ index: true
+ store: true
+
+ - name: "category_keyword"
+ type: "KEYWORD"
+ index: true
+ store: false
+
+ # Price fields (flattened)
+ - name: "min_price"
+ type: "FLOAT"
+ index: true
+ store: true
+
+ - name: "max_price"
+ type: "FLOAT"
+ index: true
+ store: true
+
+ - name: "compare_at_price"
+ type: "FLOAT"
+ index: true
+ store: true
+
+ # Image field (display only, not searched)
+ - name: "image_url"
+ type: "KEYWORD"
+ index: false
+ store: true
+
+ # Nested variants field
+ - name: "variants"
+ type: "JSON"
+ nested: true
+ nested_properties:
+ variant_id:
+ type: "keyword"
+ index: true
+ store: true
+ title:
+ type: "text"
+ analyzer: "chinese_ecommerce"
+ index: true
+ store: true
+ price:
+ type: "float"
+ index: true
+ store: true
+ compare_at_price:
+ type: "float"
+ index: true
+ store: true
+ sku:
+ type: "keyword"
+ index: true
+ store: true
+ stock:
+ type: "long"
+ index: true
+ store: true
+ options:
+ type: "object"
+ enabled: true
+
+# Index Structure (Query Domains)
+indexes:
+ - name: "default"
+ label: "默认索引"
+ fields:
+ - "title"
+ - "brief"
+ - "description"
+ - "seo_title"
+ - "seo_description"
+ - "seo_keywords"
+ - "vendor"
+ - "product_type"
+ - "tags"
+ - "category"
+ analyzer: "chinese_ecommerce"
+ boost: 1.0
+
+ - name: "title"
+ label: "标题索引"
+ fields:
+ - "title"
+ - "seo_title"
+ analyzer: "chinese_ecommerce"
+ boost: 2.0
+
+ - name: "vendor"
+ label: "品牌索引"
+ fields:
+ - "vendor"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+
+ - name: "category"
+ label: "类目索引"
+ fields:
+ - "category"
+ analyzer: "chinese_ecommerce"
+ boost: 1.5
+
+ - name: "tags"
+ label: "标签索引"
+ fields:
+ - "tags"
+ - "seo_keywords"
+ analyzer: "chinese_ecommerce"
+ boost: 1.0
+
+# Query Configuration
+query_config:
+ supported_languages:
+ - "zh"
+ - "en"
+ default_language: "zh"
+ enable_translation: true
+ enable_text_embedding: true
+ enable_query_rewrite: true
+
+ # Translation API (DeepL)
+ translation_service: "deepl"
+ translation_api_key: null # Set via environment variable
+
+# Ranking Configuration
+ranking:
+ expression: "bm25() + 0.2*text_embedding_relevance()"
+ description: "BM25 text relevance combined with semantic embedding similarity"
+
+# Function Score configuration (ES-level scoring rules)
+function_score:
+ score_mode: "sum"
+ boost_mode: "multiply"
+
+ functions: []
+
+# Rerank configuration (local reranking, currently disabled)
+rerank:
+ enabled: false
+ expression: ""
+ description: "Local reranking (disabled, use ES function_score instead)"
+
+# SPU configuration (enabled, uses nested variants)
+spu_config:
+ enabled: true
+ spu_field: "product_id"
+ inner_hits_size: 10
+
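Review note (not part of the patch): the plan asks `validate_config()` to ensure that `tenant_id` exists and is required, and the new `config/config.yaml` declares it with `required: true`. The body of `validate_config()` is not shown in this diff, so the following is only a sketch of such a check against the parsed YAML dictionary. The function name `check_tenant_field` and the error strings are hypothetical.

```python
from typing import Any, Dict, List


def check_tenant_field(config: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the check passed."""
    errors: List[str] = []
    # Index the field definitions by name, mirroring the `fields:` list in config.yaml.
    fields = {f.get("name"): f for f in config.get("fields", [])}
    tenant = fields.get("tenant_id")
    if tenant is None:
        errors.append("missing field: tenant_id")
    elif not tenant.get("required", False):
        errors.append("tenant_id must be required")
    return errors


sample = {
    "es_index_name": "search_products",
    "fields": [
        {"name": "tenant_id", "type": "KEYWORD", "required": True},
        {"name": "title", "type": "TEXT"},
    ],
}
print(check_tenant_field(sample))
```

Returning a list of messages rather than raising matches how `init_service()` consumes `validate_config()` in this patch: it collects `errors` and raises `ValueError` only if the list is non-empty.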
diff --git a/config/config_loader.py b/config/config_loader.py
index 9b2eb92..40c19b0 100644
--- a/config/config_loader.py
+++ b/config/config_loader.py
@@ -86,10 +86,7 @@ class RerankConfig:
@dataclass
class CustomerConfig:
- """Complete configuration for a customer."""
- customer_id: str
- customer_name: str
-
+ """Complete configuration for search engine (multi-tenant)."""
# Field definitions
fields: List[FieldConfig]
@@ -122,22 +119,20 @@ class ConfigurationError(Exception):
class ConfigLoader:
- """Loads and validates customer configurations from YAML files."""
+ """Loads and validates unified search engine configuration from YAML file."""
- def __init__(self, config_dir: str = "config/schema"):
- self.config_dir = Path(config_dir)
+ def __init__(self, config_file: str = "config/config.yaml"):
+ self.config_file = Path(config_file)
- def _load_rewrite_dictionary(self, customer_id: str) -> Dict[str, str]:
+ def _load_rewrite_dictionary(self) -> Dict[str, str]:
"""
Load query rewrite dictionary from external file.
- Args:
- customer_id: Customer identifier
-
Returns:
Dictionary mapping query terms to rewritten queries
"""
- dict_file = self.config_dir / customer_id / "query_rewrite.dict"
+ # Try config/query_rewrite.dict first
+ dict_file = self.config_file.parent / "query_rewrite.dict"
if not dict_file.exists():
# Dictionary file is optional, return empty dict if not found
@@ -166,16 +161,9 @@ class ConfigLoader:
return rewrite_dict
- def load_customer_config(self, customer_id: str) -> CustomerConfig:
+ def load_config(self) -> CustomerConfig:
"""
- Load customer configuration from YAML file.
-
- Supports two directory structures:
- 1. New structure: config/schema/{customer_id}/config.yaml
- 2. Old structure: config/schema/{customer_id}_config.yaml (for backward compatibility)
-
- Args:
- customer_id: Customer identifier (used to find config file)
+ Load unified configuration from YAML file.
Returns:
CustomerConfig object
@@ -183,25 +171,18 @@ class ConfigLoader:
Raises:
ConfigurationError: If config file not found or invalid
"""
- # Try new directory structure first
- config_file = self.config_dir / customer_id / "config.yaml"
-
- # Fall back to old structure if new one doesn't exist
- if not config_file.exists():
- config_file = self.config_dir / f"{customer_id}_config.yaml"
-
- if not config_file.exists():
- raise ConfigurationError(f"Configuration file not found: {config_file}")
+ if not self.config_file.exists():
+ raise ConfigurationError(f"Configuration file not found: {self.config_file}")
try:
- with open(config_file, 'r', encoding='utf-8') as f:
+ with open(self.config_file, 'r', encoding='utf-8') as f:
config_data = yaml.safe_load(f)
except yaml.YAMLError as e:
- raise ConfigurationError(f"Invalid YAML in {config_file}: {e}")
+ raise ConfigurationError(f"Invalid YAML in {self.config_file}: {e}")
- return self._parse_config(config_data, customer_id)
+ return self._parse_config(config_data)
- def _parse_config(self, config_data: Dict[str, Any], customer_id: str) -> CustomerConfig:
+ def _parse_config(self, config_data: Dict[str, Any]) -> CustomerConfig:
"""Parse configuration dictionary into CustomerConfig object."""
# Parse fields
@@ -218,7 +199,7 @@ class ConfigLoader:
query_config_data = config_data.get("query_config", {})
# Load rewrite dictionary from external file instead of config
- rewrite_dictionary = self._load_rewrite_dictionary(customer_id)
+ rewrite_dictionary = self._load_rewrite_dictionary()
query_config = QueryConfig(
supported_languages=query_config_data.get("supported_languages", ["zh", "en"]),
@@ -263,8 +244,6 @@ class ConfigLoader:
)
return CustomerConfig(
- customer_id=customer_id,
- customer_name=config_data.get("customer_name", customer_id),
fields=fields,
indexes=indexes,
query_config=query_config,
@@ -272,7 +251,7 @@ class ConfigLoader:
function_score=function_score,
rerank=rerank,
spu_config=spu_config,
- es_index_name=config_data.get("es_index_name", f"search_{customer_id}"),
+ es_index_name=config_data.get("es_index_name", "search_products"),
es_settings=config_data.get("es_settings", {})
)
@@ -430,23 +409,21 @@ class ConfigLoader:
def save_config(self, config: CustomerConfig, output_path: Optional[str] = None) -> None:
"""
- Save customer configuration to YAML file.
+ Save configuration to YAML file.
Note: rewrite_dictionary is saved separately to query_rewrite.dict file
Args:
config: Configuration to save
- output_path: Optional output path (defaults to new directory structure)
+ output_path: Optional output path (defaults to config/config.yaml)
"""
if output_path is None:
- # Use new directory structure by default
- customer_dir = self.config_dir / config.customer_id
- customer_dir.mkdir(parents=True, exist_ok=True)
- output_path = customer_dir / "config.yaml"
+ output_path = self.config_file
+ else:
+ output_path = Path(output_path)
# Convert config back to dictionary format
config_dict = {
- "customer_name": config.customer_name,
"es_index_name": config.es_index_name,
"es_settings": config.es_settings,
"fields": [self._field_to_dict(field) for field in config.fields],
@@ -482,23 +459,22 @@ class ConfigLoader:
}
}
+ output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
yaml.dump(config_dict, f, default_flow_style=False, allow_unicode=True)
# Save rewrite dictionary to separate file
- self._save_rewrite_dictionary(config.customer_id, config.query_config.rewrite_dictionary)
+ self._save_rewrite_dictionary(config.query_config.rewrite_dictionary)
- def _save_rewrite_dictionary(self, customer_id: str, rewrite_dict: Dict[str, str]) -> None:
+ def _save_rewrite_dictionary(self, rewrite_dict: Dict[str, str]) -> None:
"""
Save rewrite dictionary to external file.
Args:
- customer_id: Customer identifier
rewrite_dict: Dictionary to save
"""
- customer_dir = self.config_dir / customer_id
- customer_dir.mkdir(parents=True, exist_ok=True)
- dict_file = customer_dir / "query_rewrite.dict"
+ dict_file = self.config_file.parent / "query_rewrite.dict"
+ dict_file.parent.mkdir(parents=True, exist_ok=True)
with open(dict_file, 'w', encoding='utf-8') as f:
for key, value in rewrite_dict.items():
diff --git a/config/query_rewrite.dict b/config/query_rewrite.dict
new file mode 100644
index 0000000..8e5ce37
--- /dev/null
+++ b/config/query_rewrite.dict
@@ -0,0 +1,4 @@
+芭比 brand:芭比 OR name:芭比娃娃
+玩具 category:玩具
+消防 category:消防 OR name:消防
+
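Review note (not part of the patch): `config/query_rewrite.dict` now sits next to `config/config.yaml` and maps a query term to a rewritten query, one rule per line. The parsing code in `_load_rewrite_dictionary()` is not visible in this diff, so the sketch below assumes the term and its rewrite are separated by the first run of whitespace (possibly a tab in the real file) and that `#` starts a comment line. Both are assumptions, and `parse_rewrite_lines` is a hypothetical name.

```python
from typing import Dict


def parse_rewrite_lines(text: str) -> Dict[str, str]:
    """Parse 'term<whitespace>rewrite' lines into a dictionary."""
    rules: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed: blank and comment lines are skipped
            continue
        parts = line.split(None, 1)           # split on the first whitespace run only
        if len(parts) == 2:
            rules[parts[0]] = parts[1]
    return rules


sample_text = (
    "芭比 brand:芭比 OR name:芭比娃娃\n"
    "玩具 category:玩具\n"
    "消防 category:消防 OR name:消防\n"
    "\n"
)
rules = parse_rewrite_lines(sample_text)
print(len(rules))
```

Note that the sample rules reference `brand:`, `name:` and `category:`, while the unified `config.yaml` defines index domains named `default`, `title`, `vendor`, `category` and `tags`; whether `brand` and `name` still resolve after this change is worth checking.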
diff --git a/frontend/index.html b/frontend/index.html
index ba1afbc..53fe9f8 100644
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -21,6 +21,10 @@
+
+
+
+
diff --git a/frontend/static/css/style.css b/frontend/static/css/style.css
index 7c426b4..c9b9ba7 100644
--- a/frontend/static/css/style.css
+++ b/frontend/static/css/style.css
@@ -69,6 +69,32 @@ body {
padding: 20px 30px;
background: white;
border-bottom: 1px solid #e0e0e0;
+ align-items: center;
+}
+
+.tenant-input-wrapper {
+ display: flex;
+ align-items: center;
+ gap: 8px;
+}
+
+.tenant-input-wrapper label {
+ font-size: 14px;
+ color: #666;
+ white-space: nowrap;
+}
+
+#tenantInput {
+ width: 120px;
+ padding: 10px 15px;
+ font-size: 14px;
+ border: 1px solid #ddd;
+ border-radius: 4px;
+ outline: none;
+}
+
+#tenantInput:focus {
+ border-color: #e74c3c;
}
#searchInput {
diff --git a/frontend/static/js/app.js b/frontend/static/js/app.js
index 08e5cad..44fa819 100644
--- a/frontend/static/js/app.js
+++ b/frontend/static/js/app.js
@@ -1,8 +1,17 @@
-// SearchEngine Frontend - Modern UI
+// SearchEngine Frontend - Modern UI (Multi-Tenant)
-const API_BASE_URL = 'http://120.76.41.98:6002';
+const API_BASE_URL = 'http://localhost:6002';
document.getElementById('apiUrl').textContent = API_BASE_URL;
+// Get tenant ID from input
+function getTenantId() {
+ const tenantInput = document.getElementById('tenantInput');
+ if (tenantInput) {
+ return tenantInput.value.trim();
+ }
+ return '1'; // Default fallback
+}
+
// State Management
let state = {
query: '',
@@ -42,12 +51,18 @@ function toggleFilters() {
// Perform search
async function performSearch(page = 1) {
const query = document.getElementById('searchInput').value.trim();
+ const tenantId = getTenantId();
if (!query) {
alert('Please enter search keywords');
return;
}
+ if (!tenantId) {
+ alert('Please enter tenant ID');
+ return;
+ }
+
state.query = query;
state.currentPage = page;
state.pageSize = parseInt(document.getElementById('resultSize').value);
@@ -57,22 +72,22 @@ async function performSearch(page = 1) {
// Define facets (简化配置)
const facets = [
{
- "field": "categoryName_keyword",
+ "field": "category_keyword",
"size": 15,
"type": "terms"
},
{
- "field": "brandName_keyword",
+ "field": "vendor_keyword",
"size": 15,
"type": "terms"
},
{
- "field": "supplierName_keyword",
+ "field": "tags_keyword",
"size": 10,
"type": "terms"
},
{
- "field": "price",
+ "field": "min_price",
"type": "range",
"ranges": [
{"key": "0-50", "to": 50},
@@ -92,6 +107,7 @@ async function performSearch(page = 1) {
method: 'POST',
headers: {
'Content-Type': 'application/json',
+ 'X-Tenant-ID': tenantId,
},
body: JSON.stringify({
query: query,
@@ -140,7 +156,7 @@ async function performSearch(page = 1) {
function displayResults(data) {
const grid = document.getElementById('productGrid');
- if (!data.hits || data.hits.length === 0) {
+ if (!data.results || data.results.length === 0) {
grid.innerHTML = `
No Results Found
@@ -152,16 +168,20 @@ function displayResults(data) {
let html = '';
- data.hits.forEach((hit) => {
- const source = hit._source;
- const score = hit._custom_score || hit._score;
+ data.results.forEach((result) => {
+ const product = result;
+ const title = product.title || product.name || 'N/A';
+ const price = product.min_price || product.price || 'N/A';
+ const imageUrl = product.image_url || product.imageUrl || '';
+ const category = product.category || product.categoryName || '';
+ const vendor = product.vendor || product.brandName || '';
html += `
- ${source.imageUrl ? `
-
})
` : `
@@ -170,31 +190,17 @@ function displayResults(data) {
- ${source.price ? `${source.price} ₽` : 'N/A'}
-
-
-
- MOQ ${source.moq || 1} Box
-
-
-
- ${source.quantity || 'N/A'} pcs / Box
+ ${price !== 'N/A' ? `¥${price}` : 'N/A'}
- ${escapeHtml(source.name || source.enSpuName || 'N/A')}
+ ${escapeHtml(title)}
- ${source.categoryName ? escapeHtml(source.categoryName) : ''}
- ${source.brandName ? ' | ' + escapeHtml(source.brandName) : ''}
+ ${category ? escapeHtml(category) : ''}
+ ${vendor ? ' | ' + escapeHtml(vendor) : ''}
-
- ${source.create_time ? `
-
- Listed: ${formatDate(source.create_time)}
-
- ` : ''}
`;
});
@@ -211,13 +217,13 @@ function displayFacets(facets) {
let containerId = null;
let maxDisplay = 10;
- if (facet.field === 'categoryName_keyword') {
+ if (facet.field === 'category_keyword') {
containerId = 'categoryTags';
maxDisplay = 10;
- } else if (facet.field === 'brandName_keyword') {
+ } else if (facet.field === 'vendor_keyword') {
containerId = 'brandTags';
maxDisplay = 10;
- } else if (facet.field === 'supplierName_keyword') {
+ } else if (facet.field === 'tags_keyword') {
containerId = 'supplierTags';
maxDisplay = 8;
}
@@ -269,7 +275,7 @@ function toggleFilter(field, value) {
// Handle price filter (重构版 - 使用 rangeFilters)
function handlePriceFilter(value) {
if (!value) {
- delete state.rangeFilters.price;
+ delete state.rangeFilters.min_price;
} else {
const priceRanges = {
'0-50': { lt: 50 },
@@ -279,7 +285,7 @@ function handlePriceFilter(value) {
};
if (priceRanges[value]) {
- state.rangeFilters.price = priceRanges[value];
+ state.rangeFilters.min_price = priceRanges[value];
}
}
diff --git a/frontend/static/js/app_base.js b/frontend/static/js/app_base.js
index 22aadb0..715f4d4 100644
--- a/frontend/static/js/app_base.js
+++ b/frontend/static/js/app_base.js
@@ -1,9 +1,17 @@
-// SearchEngine Frontend - Modern UI
+// SearchEngine Frontend - Modern UI (Multi-Tenant)
-const TENANT_ID = '1';
const API_BASE_URL = 'http://localhost:6002';
document.getElementById('apiUrl').textContent = API_BASE_URL;
+// Get tenant ID from input
+function getTenantId() {
+ const tenantInput = document.getElementById('tenantInput');
+ if (tenantInput) {
+ return tenantInput.value.trim();
+ }
+ return '1'; // Default fallback
+}
+
// State Management
let state = {
query: '',
@@ -43,12 +51,18 @@ function toggleFilters() {
// Perform search
async function performSearch(page = 1) {
const query = document.getElementById('searchInput').value.trim();
+ const tenantId = getTenantId();
if (!query) {
alert('Please enter search keywords');
return;
}
+ if (!tenantId) {
+ alert('Please enter tenant ID');
+ return;
+ }
+
state.query = query;
state.currentPage = page;
state.pageSize = parseInt(document.getElementById('resultSize').value);
@@ -93,7 +107,7 @@ async function performSearch(page = 1) {
method: 'POST',
headers: {
'Content-Type': 'application/json',
- 'X-Tenant-ID': TENANT_ID,
+ 'X-Tenant-ID': tenantId,
},
body: JSON.stringify({
query: query,
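Review note (not part of the patch): after this change the tenant travels with each request in the `X-Tenant-ID` header instead of being fixed at service startup. A non-browser client would have to do the same. The sketch below builds such a request with the standard library and does not send it. The `/search/` path and the `size` body key are assumptions, since the request URL is outside the hunks shown here, and `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(
    base_url: str, query: str, tenant_id: str, size: int = 10
) -> urllib.request.Request:
    """Build (but do not send) a POST search request scoped to one tenant."""
    tenant_id = tenant_id.strip()
    if not tenant_id:
        # Mirrors the frontend check that rejects an empty tenant ID.
        raise ValueError("tenant ID must not be empty")
    body = json.dumps({"query": query, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search/",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Tenant-ID": tenant_id,
        },
        method="POST",
    )


req = build_search_request("http://localhost:6002", "toy", " 42 ")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; that step is left out so the sketch runs without a live service.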
diff --git a/frontend/unified.html b/frontend/unified.html
new file mode 100644
index 0000000..bc396ed
--- /dev/null
+++ b/frontend/unified.html
@@ -0,0 +1,138 @@
+
+
+
+
+
+
统一搜索界面 - Unified Search
+
+
+
+
+
+
+
+
+
+
+
+
+ 当前: Base - Tenant 1
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Results per page:
+
+
+
+
+
+
+
+
+
+
+
+
Welcome to Unified Search
+
Select a tenant and enter keywords to search for products
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/main.py b/main.py
index 817ee70..e6e4901 100755
--- a/main.py
+++ b/main.py
@@ -27,11 +27,11 @@ from search import Searcher
def cmd_ingest(args):
"""Run data ingestion."""
- print(f"Starting ingestion for customer: {args.customer}")
+ print("Starting data ingestion")
# Load config
- config_loader = ConfigLoader("config/schema")
- config = config_loader.load_customer_config(args.customer)
+ config_loader = ConfigLoader("config/config.yaml")
+ config = config_loader.load_config()
# Initialize ES
es_client = ESClient(hosts=[args.es_host])
@@ -65,11 +65,9 @@ def cmd_ingest(args):
def cmd_serve(args):
"""Start API service."""
- os.environ['CUSTOMER_ID'] = args.customer
os.environ['ES_HOST'] = args.es_host
- print(f"Starting API service...")
- print(f" Customer: {args.customer}")
+ print("Starting API service (multi-tenant)...")
print(f" Host: {args.host}:{args.port}")
print(f" Elasticsearch: {args.es_host}")
@@ -84,8 +82,8 @@ def cmd_serve(args):
def cmd_search(args):
"""Test search from command line."""
# Load config
- config_loader = ConfigLoader("config/schema")
- config = config_loader.load_customer_config(args.customer)
+ config_loader = ConfigLoader("config/config.yaml")
+ config = config_loader.load_config()
# Initialize ES and searcher
es_client = ESClient(hosts=[args.es_host])
@@ -93,15 +91,16 @@ def cmd_search(args):
print(f"ERROR: Cannot connect to Elasticsearch at {args.es_host}")
return 1
- searcher = Searcher(config, es_client)
+ from query import QueryParser
+ query_parser = QueryParser(config)
+ searcher = Searcher(config, es_client, query_parser)
# Execute search
- print(f"Searching for: '{args.query}'")
+ print(f"Searching for: '{args.query}' (tenant: {args.tenant_id})")
result = searcher.search(
query=args.query,
- size=args.size,
- enable_translation=not args.no_translation,
- enable_embedding=not args.no_embedding
+ tenant_id=args.tenant_id,
+ size=args.size
)
# Display results
@@ -136,7 +135,6 @@ def main():
# Ingest command
ingest_parser = subparsers.add_parser('ingest', help='Ingest data into Elasticsearch')
ingest_parser.add_argument('csv_file', help='Path to CSV data file')
- ingest_parser.add_argument('--customer', default='customer1', help='Customer ID')
ingest_parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
ingest_parser.add_argument('--limit', type=int, help='Limit number of documents')
ingest_parser.add_argument('--batch-size', type=int, default=100, help='Batch size')
@@ -144,8 +142,7 @@ def main():
ingest_parser.add_argument('--skip-embeddings', action='store_true', help='Skip embeddings')
# Serve command
- serve_parser = subparsers.add_parser('serve', help='Start API service')
- serve_parser.add_argument('--customer', default='customer1', help='Customer ID')
+ serve_parser = subparsers.add_parser('serve', help='Start API service (multi-tenant)')
serve_parser.add_argument('--host', default='0.0.0.0', help='Host to bind to')
serve_parser.add_argument('--port', type=int, default=6002, help='Port to bind to')
serve_parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
@@ -154,7 +151,7 @@ def main():
# Search command
search_parser = subparsers.add_parser('search', help='Test search from command line')
search_parser.add_argument('query', help='Search query')
- search_parser.add_argument('--customer', default='customer1', help='Customer ID')
+ search_parser.add_argument('--tenant-id', required=True, help='Tenant ID (required)')
search_parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
search_parser.add_argument('--size', type=int, default=10, help='Number of results')
search_parser.add_argument('--no-translation', action='store_true', help='Disable translation')
diff --git a/restart.sh b/restart.sh
index cfdc746..2092901 100755
--- a/restart.sh
+++ b/restart.sh
@@ -34,8 +34,8 @@ sleep 3
# Step 2: Start all services
echo -e "\n${YELLOW}Step 2/2: 重新启动服务${NC}"
-if [ -f "./run.sh" ]; then
- ./run.sh
+if [ -f "./scripts/start.sh" ]; then
+ ./scripts/start.sh
if [ $? -eq 0 ]; then
echo -e "${GREEN}========================================${NC}"
echo -e "${GREEN}服务重启完成!${NC}"
diff --git a/run.sh b/run.sh
index e02f64c..97fab3d 100755
--- a/run.sh
+++ b/run.sh
@@ -17,95 +17,5 @@ echo -e "${GREEN}========================================${NC}"
# Create logs directory if it doesn't exist
mkdir -p logs
-# Step 1: Start backend in background
-echo -e "\n${YELLOW}Step 1/2: 启动后端服务${NC}"
-echo -e "${YELLOW}后端服务将在后台运行...${NC}"
-
-nohup ./scripts/start_backend.sh > logs/backend.log 2>&1 &
-BACKEND_PID=$!
-echo $BACKEND_PID > logs/backend.pid
-echo -e "${GREEN}后端服务已启动 (PID: $BACKEND_PID)${NC}"
-echo -e "${GREEN}日志文件: logs/backend.log${NC}"
-
-# Wait for backend to start
-echo -e "${YELLOW}等待后端服务启动...${NC}"
-MAX_RETRIES=30
-RETRY_COUNT=0
-BACKEND_READY=false
-
-while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
- sleep 2
- if curl -s http://localhost:6002/ > /dev/null 2>&1; then
- BACKEND_READY=true
- break
- fi
- RETRY_COUNT=$((RETRY_COUNT + 1))
- echo -e "${YELLOW} 等待中... ($RETRY_COUNT/$MAX_RETRIES)${NC}"
-done
-
-# Check if backend is running
-if [ "$BACKEND_READY" = true ]; then
- echo -e "${GREEN}✓ 后端服务运行正常${NC}"
- # Try health check
- if curl -s http://localhost:6002/admin/health > /dev/null 2>&1; then
- echo -e "${GREEN}✓ 健康检查通过${NC}"
- else
- echo -e "${YELLOW}⚠ 健康检查未通过,但服务已启动${NC}"
- fi
-else
- echo -e "${RED}✗ 后端服务启动失败,请检查日志: logs/backend.log${NC}"
- echo -e "${YELLOW}提示: 后端服务可能需要更多时间启动,或者检查端口是否被占用${NC}"
- exit 1
-fi
-
-# Step 2: Start frontend in background
-echo -e "\n${YELLOW}Step 2/2: 启动前端服务${NC}"
-echo -e "${YELLOW}前端服务将在后台运行...${NC}"
-
-nohup ./scripts/start_frontend.sh > logs/frontend.log 2>&1 &
-FRONTEND_PID=$!
-echo $FRONTEND_PID > logs/frontend.pid
-echo -e "${GREEN}前端服务已启动 (PID: $FRONTEND_PID)${NC}"
-echo -e "${GREEN}日志文件: logs/frontend.log${NC}"
-
-# Wait for frontend to start
-echo -e "${YELLOW}等待前端服务启动...${NC}"
-MAX_RETRIES=15
-RETRY_COUNT=0
-FRONTEND_READY=false
-
-while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
- sleep 2
- if curl -s http://localhost:6003/ > /dev/null 2>&1; then
- FRONTEND_READY=true
- break
- fi
- RETRY_COUNT=$((RETRY_COUNT + 1))
- echo -e "${YELLOW} 等待中... ($RETRY_COUNT/$MAX_RETRIES)${NC}"
-done
-
-# Check if frontend is running
-if [ "$FRONTEND_READY" = true ]; then
- echo -e "${GREEN}✓ 前端服务运行正常${NC}"
-else
- echo -e "${YELLOW}⚠ 前端服务可能还在启动中,请稍后访问${NC}"
-fi
-
-echo -e "${GREEN}========================================${NC}"
-echo -e "${GREEN}所有服务启动完成!${NC}"
-echo -e "${GREEN}========================================${NC}"
-echo ""
-echo -e "访问地址:"
-echo -e " ${GREEN}前端界面: http://localhost:6003${NC}"
-echo -e " ${GREEN}后端API: http://localhost:6002${NC}"
-echo -e " ${GREEN}API文档: http://localhost:6002/docs${NC}"
-echo ""
-echo -e "日志文件:"
-echo -e " 后端: logs/backend.log"
-echo -e " 前端: logs/frontend.log"
-echo ""
-echo -e "停止服务:"
-echo -e " 所有服务: ./stop.sh"
-echo -e " 单独停止后端: kill \$(cat logs/backend.pid)"
-echo -e " 单独停止前端: kill \$(cat logs/frontend.pid)"
-echo ""
\ No newline at end of file
+# Call unified start script
+./scripts/start.sh
\ No newline at end of file
diff --git a/scripts/demo_base.sh b/scripts/demo_base.sh
index 6d266d1..3cf1f2a 100755
--- a/scripts/demo_base.sh
+++ b/scripts/demo_base.sh
@@ -178,7 +178,7 @@ echo -e "${GREEN}演示环境启动完成!${NC}"
echo -e "${GREEN}========================================${NC}"
echo ""
echo -e "访问地址:"
-echo -e " ${GREEN}前端界面: http://localhost:$FRONTEND_PORT/base${NC}"
+echo -e " ${GREEN}前端界面: http://localhost:$FRONTEND_PORT/base${NC} (或 http://localhost:$FRONTEND_PORT/base.html)"
echo -e " ${GREEN}后端API: http://localhost:$API_PORT${NC}"
echo -e " ${GREEN}API文档: http://localhost:$API_PORT/docs${NC}"
echo ""
diff --git a/scripts/frontend_server.py b/scripts/frontend_server.py
index 02c1cec..20ca6d1 100755
--- a/scripts/frontend_server.py
+++ b/scripts/frontend_server.py
@@ -47,13 +47,18 @@ class MyHTTPRequestHandler(http.server.SimpleHTTPRequestHandler, RateLimitingMix
def do_GET(self):
"""Handle GET requests with support for base.html."""
- # Route /base to base.html
- if self.path == '/base' or self.path == '/base/':
- self.path = '/base.html'
+ # Parse path (handle query strings)
+ path = self.path.split('?')[0] # Remove query string if present
+
+ # Route /base to base.html (handle both with and without trailing slash)
+ if path == '/base' or path == '/base/':
+ self.path = '/base.html' + ('?' + self.path.split('?', 1)[1] if '?' in self.path else '')
# Route / to index.html (default)
- elif self.path == '/':
- self.path = '/index.html'
- return super().do_GET()
+ elif path == '/' or path == '':
+ self.path = '/index.html' + ('?' + self.path.split('?', 1)[1] if '?' in self.path else '')
+
+ # Call parent do_GET with modified path
+ super().do_GET()
def setup(self):
"""Setup with error handling."""
@@ -125,6 +130,18 @@ class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
daemon_threads = True
if __name__ == '__main__':
+ # Check if port is already in use
+ import socket
+ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ try:
+ sock.bind(("", PORT))
+ sock.close()
+ except OSError:
+ print(f"ERROR: Port {PORT} is already in use.")
+ print(f"Please stop the existing server or use a different port.")
+ print(f"To stop existing server: kill $(lsof -t -i:{PORT})")
+ sys.exit(1)
+
# Create threaded server for better concurrency
with ThreadedTCPServer(("", PORT), MyHTTPRequestHandler) as httpd:
print(f"Frontend server started at http://localhost:{PORT}")
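Review note: the `do_GET` hunk for `scripts/frontend_server.py` routes `/base` and `/` to static files while carrying the query string along. A minimal sketch of the same rewrite using `urllib.parse`, which avoids manual `split('?')` handling, might look like this. The `ROUTES` table and `rewrite_path` function are hypothetical names introduced here and are not part of the patch.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical route table; the diff maps only /base, /base/ and / explicitly.
ROUTES = {
    "/base": "/base.html",
    "/base/": "/base.html",
    "/": "/index.html",
    "": "/index.html",
}


def rewrite_path(raw_path: str) -> str:
    """Map a request path to its static file, keeping any query string."""
    parts = urlsplit(raw_path)
    target = ROUTES.get(parts.path)
    if target is None:
        return raw_path  # not a routed path, leave it untouched
    # Reassemble with the new path; urlunsplit re-adds the '?' separator.
    return urlunsplit(("", "", target, parts.query, parts.fragment))


if __name__ == "__main__":
    for sample in ("/base?tenant_id=1", "/", "/static/app.js?v=3"):
        print(sample, "->", rewrite_path(sample))
```

Inside a handler this would be used as `self.path = rewrite_path(self.path)` before delegating to the parent `do_GET`.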
diff --git a/scripts/ingest.sh b/scripts/ingest.sh
index f048ed0..a81b827 100755
--- a/scripts/ingest.sh
+++ b/scripts/ingest.sh
@@ -1,8 +1,7 @@
#!/bin/bash
-# Data Ingestion Script for Customer1
-
-set -e
+# Unified data ingestion script for SearchEngine
+# Ingests data from MySQL to Elasticsearch
cd "$(dirname "$0")/.."
source /home/tw/miniconda3/etc/profile.d/conda.sh
@@ -10,41 +9,75 @@ conda activate searchengine
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
+RED='\033[0;31m'
NC='\033[0m'
echo -e "${GREEN}========================================${NC}"
-echo -e "${GREEN}Customer1 Data Ingestion${NC}"
+echo -e "${GREEN}数据灌入脚本${NC}"
echo -e "${GREEN}========================================${NC}"
-# Default values
-LIMIT=${1:-1000}
-SKIP_EMBEDDINGS=${2:-false}
+# Load config from .env file if it exists
+if [ -f .env ]; then
+ set -a
+ source .env
+ set +a
+fi
+
+# Parameters
+TENANT_ID=${1:-"1"}
+DB_HOST=${DB_HOST:-"120.79.247.228"}
+DB_PORT=${DB_PORT:-"3316"}
+DB_DATABASE=${DB_DATABASE:-"saas"}
+DB_USERNAME=${DB_USERNAME:-"saas"}
+DB_PASSWORD=${DB_PASSWORD:-}
+ES_HOST=${ES_HOST:-"http://localhost:9200"}
+BATCH_SIZE=${BATCH_SIZE:-500}
+RECREATE=${RECREATE:-false}
echo -e "\n${YELLOW}Configuration:${NC}"
-echo " Limit: $LIMIT documents"
-echo " Skip embeddings: $SKIP_EMBEDDINGS"
+echo " Tenant ID: $TENANT_ID"
+echo " MySQL: $DB_HOST:$DB_PORT/$DB_DATABASE"
+echo " Elasticsearch: $ES_HOST"
+echo " Batch Size: $BATCH_SIZE"
+echo " Recreate Index: $RECREATE"
-CSV_FILE="data/customer1/goods_with_pic.5years_congku.csv.shuf.1w"
+# Validate parameters
+if [ -z "$TENANT_ID" ]; then
+ echo -e "${RED}ERROR: Tenant ID is required${NC}"
+ echo "Usage: $0 <tenant_id> [batch_size] [recreate]"
+ exit 1
+fi
-if [ ! -f "$CSV_FILE" ]; then
- echo "Error: CSV file not found: $CSV_FILE"
+if [ -z "$DB_PASSWORD" ]; then
+ echo -e "${RED}ERROR: DB_PASSWORD未设置,请检查.env文件或环境变量${NC}"
exit 1
fi
# Build command
-CMD="python data/customer1/ingest_customer1.py \
- --csv $CSV_FILE \
- --limit $LIMIT \
- --recreate-index \
- --batch-size 100"
-
-if [ "$SKIP_EMBEDDINGS" = "true" ]; then
- CMD="$CMD --skip-embeddings"
+CMD="python scripts/ingest_shoplazza.py \
+ --db-host $DB_HOST \
+ --db-port $DB_PORT \
+ --db-database $DB_DATABASE \
+ --db-username $DB_USERNAME \
+ --db-password $DB_PASSWORD \
+ --tenant-id $TENANT_ID \
+ --es-host $ES_HOST \
+ --batch-size $BATCH_SIZE"
+
+if [ "$RECREATE" = "true" ] || [ "$RECREATE" = "1" ]; then
+ CMD="$CMD --recreate"
fi
-echo -e "\n${YELLOW}Starting ingestion...${NC}"
+echo -e "\n${YELLOW}Starting data ingestion...${NC}"
eval $CMD
-echo -e "\n${GREEN}========================================${NC}"
-echo -e "${GREEN}Ingestion Complete!${NC}"
-echo -e "${GREEN}========================================${NC}"
+if [ $? -eq 0 ]; then
+ echo -e "\n${GREEN}========================================${NC}"
+ echo -e "${GREEN}数据灌入完成!${NC}"
+ echo -e "${GREEN}========================================${NC}"
+else
+ echo -e "\n${RED}========================================${NC}"
+ echo -e "${RED}数据灌入失败!${NC}"
+ echo -e "${RED}========================================${NC}"
+ exit 1
+fi
diff --git a/scripts/ingest_shoplazza.py b/scripts/ingest_shoplazza.py
index 37b5e22..12878af 100644
--- a/scripts/ingest_shoplazza.py
+++ b/scripts/ingest_shoplazza.py
@@ -33,7 +33,6 @@ def main():
# Tenant and index
parser.add_argument('--tenant-id', required=True, help='Tenant ID (required)')
- parser.add_argument('--config', default='base', help='Configuration ID (default: base)')
parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
# Options
@@ -44,11 +43,11 @@ def main():
print(f"Starting Shoplazza data ingestion for tenant: {args.tenant_id}")
- # Load configuration
- config_loader = ConfigLoader("config/schema")
+ # Load unified configuration
+ config_loader = ConfigLoader("config/config.yaml")
try:
- config = config_loader.load_customer_config(args.config)
- print(f"Loaded configuration: {config.customer_name}")
+ config = config_loader.load_config()
+ print(f"Loaded configuration: {config.es_index_name}")
except Exception as e:
print(f"ERROR: Failed to load configuration: {e}")
return 1
diff --git a/scripts/mock_data.sh b/scripts/mock_data.sh
new file mode 100755
index 0000000..222faa4
--- /dev/null
+++ b/scripts/mock_data.sh
@@ -0,0 +1,88 @@
+#!/bin/bash
+
+# Mock data script for SearchEngine
+# Generates test data and imports to MySQL
+
+cd "$(dirname "$0")/.."
+source /home/tw/miniconda3/etc/profile.d/conda.sh
+conda activate searchengine
+
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+RED='\033[0;31m'
+NC='\033[0m'
+
+echo -e "${GREEN}========================================${NC}"
+echo -e "${GREEN}Mock Data Script${NC}"
+echo -e "${GREEN}========================================${NC}"
+
+# Load config from .env file if it exists
+if [ -f .env ]; then
+ set -a
+ source .env
+ set +a
+fi
+
+# Parameters
+TENANT_ID=${1:-"1"}
+NUM_SPUS=${2:-100}
+DB_HOST=${DB_HOST:-"120.79.247.228"}
+DB_PORT=${DB_PORT:-"3316"}
+DB_DATABASE=${DB_DATABASE:-"saas"}
+DB_USERNAME=${DB_USERNAME:-"saas"}
+DB_PASSWORD=${DB_PASSWORD:-}
+SQL_FILE="test_data.sql"
+
+echo -e "\n${YELLOW}Configuration:${NC}"
+echo " Tenant ID: $TENANT_ID"
+echo " Number of SPUs: $NUM_SPUS"
+echo " MySQL: $DB_HOST:$DB_PORT/$DB_DATABASE"
+echo " SQL File: $SQL_FILE"
+
+# Step 1: Generate test data
+echo -e "\n${YELLOW}Step 1/2: 生成测试数据${NC}"
+python scripts/generate_test_data.py \
+ --num-spus $NUM_SPUS \
+ --tenant-id "$TENANT_ID" \
+ --start-spu-id 1 \
+ --start-sku-id 1 \
+ --output "$SQL_FILE"
+
+if [ $? -ne 0 ]; then
+ echo -e "${RED}✗ 生成测试数据失败${NC}"
+ exit 1
+fi
+
+echo -e "${GREEN}✓ 测试数据已生成: $SQL_FILE${NC}"
+
+# Step 2: Import test data to MySQL
+echo -e "\n${YELLOW}Step 2/2: 导入测试数据到MySQL${NC}"
+if [ -z "$DB_PASSWORD" ]; then
+ echo -e "${RED}ERROR: DB_PASSWORD未设置,请检查.env文件或环境变量${NC}"
+ exit 1
+fi
+
+python scripts/import_test_data.py \
+ --db-host "$DB_HOST" \
+ --db-port "$DB_PORT" \
+ --db-database "$DB_DATABASE" \
+ --db-username "$DB_USERNAME" \
+ --db-password "$DB_PASSWORD" \
+ --sql-file "$SQL_FILE" \
+ --tenant-id "$TENANT_ID"
+
+if [ $? -ne 0 ]; then
+ echo -e "${RED}✗ 导入测试数据失败${NC}"
+ exit 1
+fi
+
+echo -e "${GREEN}✓ 测试数据已导入MySQL${NC}"
+
+echo -e "\n${GREEN}========================================${NC}"
+echo -e "${GREEN}Mock数据完成!${NC}"
+echo -e "${GREEN}========================================${NC}"
+echo ""
+echo -e "下一步:"
+echo -e " ${YELLOW}./scripts/ingest.sh $TENANT_ID${NC} - 从MySQL灌入数据到ES"
+echo ""
+
diff --git a/scripts/start.sh b/scripts/start.sh
new file mode 100755
index 0000000..4a40d9e
--- /dev/null
+++ b/scripts/start.sh
@@ -0,0 +1,106 @@
+#!/bin/bash
+
+# Unified startup script for SearchEngine services
+# This script starts both frontend and backend services
+
+cd "$(dirname "$0")/.."
+
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+RED='\033[0;31m'
+NC='\033[0m'
+
+echo -e "${GREEN}========================================${NC}"
+echo -e "${GREEN}SearchEngine服务启动脚本${NC}"
+echo -e "${GREEN}========================================${NC}"
+
+# Create logs directory if it doesn't exist
+mkdir -p logs
+
+# Step 1: Start backend in background
+echo -e "\n${YELLOW}Step 1/2: 启动后端服务${NC}"
+echo -e "${YELLOW}后端服务将在后台运行...${NC}"
+
+nohup ./scripts/start_backend.sh > logs/backend.log 2>&1 &
+BACKEND_PID=$!
+echo $BACKEND_PID > logs/backend.pid
+echo -e "${GREEN}后端服务已启动 (PID: $BACKEND_PID)${NC}"
+echo -e "${GREEN}日志文件: logs/backend.log${NC}"
+
+# Wait for backend to start
+echo -e "${YELLOW}等待后端服务启动...${NC}"
+MAX_RETRIES=30
+RETRY_COUNT=0
+BACKEND_READY=false
+
+while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
+ sleep 2
+ if curl -s http://localhost:6002/health > /dev/null 2>&1; then
+ BACKEND_READY=true
+ break
+ fi
+ RETRY_COUNT=$((RETRY_COUNT + 1))
+ echo -e "${YELLOW} 等待中... ($RETRY_COUNT/$MAX_RETRIES)${NC}"
+done
+
+# Check if backend is running
+if [ "$BACKEND_READY" = true ]; then
+ echo -e "${GREEN}✓ 后端服务运行正常${NC}"
+else
+ echo -e "${RED}✗ 后端服务启动失败,请检查日志: logs/backend.log${NC}"
+ echo -e "${YELLOW}提示: 后端服务可能需要更多时间启动,或者检查端口是否被占用${NC}"
+ exit 1
+fi
+
+# Step 2: Start frontend in background
+echo -e "\n${YELLOW}Step 2/2: 启动前端服务${NC}"
+echo -e "${YELLOW}前端服务将在后台运行...${NC}"
+
+nohup ./scripts/start_frontend.sh > logs/frontend.log 2>&1 &
+FRONTEND_PID=$!
+echo $FRONTEND_PID > logs/frontend.pid
+echo -e "${GREEN}前端服务已启动 (PID: $FRONTEND_PID)${NC}"
+echo -e "${GREEN}日志文件: logs/frontend.log${NC}"
+
+# Wait for frontend to start
+echo -e "${YELLOW}等待前端服务启动...${NC}"
+MAX_RETRIES=15
+RETRY_COUNT=0
+FRONTEND_READY=false
+
+while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
+ sleep 2
+ if curl -s http://localhost:6003/ > /dev/null 2>&1; then
+ FRONTEND_READY=true
+ break
+ fi
+ RETRY_COUNT=$((RETRY_COUNT + 1))
+ echo -e "${YELLOW} 等待中... ($RETRY_COUNT/$MAX_RETRIES)${NC}"
+done
+
+# Check if frontend is running
+if [ "$FRONTEND_READY" = true ]; then
+ echo -e "${GREEN}✓ 前端服务运行正常${NC}"
+else
+ echo -e "${YELLOW}⚠ 前端服务可能还在启动中,请稍后访问${NC}"
+fi
+
+echo -e "${GREEN}========================================${NC}"
+echo -e "${GREEN}所有服务启动完成!${NC}"
+echo -e "${GREEN}========================================${NC}"
+echo ""
+echo -e "访问地址:"
+echo -e " ${GREEN}前端界面: http://localhost:6003${NC}"
+echo -e " ${GREEN}后端API: http://localhost:6002${NC}"
+echo -e " ${GREEN}API文档: http://localhost:6002/docs${NC}"
+echo ""
+echo -e "日志文件:"
+echo -e " 后端: logs/backend.log"
+echo -e " 前端: logs/frontend.log"
+echo ""
+echo -e "停止服务:"
+echo -e " 所有服务: ./scripts/stop.sh"
+echo -e " 单独停止后端: kill \$(cat logs/backend.pid)"
+echo -e " 单独停止前端: kill \$(cat logs/frontend.pid)"
+echo ""
+
diff --git a/scripts/start_backend.sh b/scripts/start_backend.sh
index a7e4ac1..4a36327 100755
--- a/scripts/start_backend.sh
+++ b/scripts/start_backend.sh
@@ -24,16 +24,14 @@ if [ -f .env ]; then
fi
echo -e "\n${YELLOW}Configuration:${NC}"
-echo " Customer: ${CUSTOMER_ID:-customer1}"
echo " API Host: ${API_HOST:-0.0.0.0}"
echo " API Port: ${API_PORT:-6002}"
echo " ES Host: ${ES_HOST:-http://localhost:9200}"
echo " ES Username: ${ES_USERNAME:-not set}"
-echo -e "\n${YELLOW}Starting service...${NC}"
+echo -e "\n${YELLOW}Starting service (multi-tenant)...${NC}"
# Export environment variables for the Python process
-export CUSTOMER_ID=${CUSTOMER_ID:-customer1}
export API_HOST=${API_HOST:-0.0.0.0}
export API_PORT=${API_PORT:-6002}
export ES_HOST=${ES_HOST:-http://localhost:9200}
@@ -43,6 +41,5 @@ export ES_PASSWORD=${ES_PASSWORD:-}
python -m api.app \
--host $API_HOST \
--port $API_PORT \
- --customer $CUSTOMER_ID \
--es-host $ES_HOST
diff --git a/scripts/start_servers.py b/scripts/start_servers.py
index 140d5d6..50db5dc 100755
--- a/scripts/start_servers.py
+++ b/scripts/start_servers.py
@@ -9,6 +9,7 @@ import signal
import time
import subprocess
import logging
+import argparse
from typing import Dict, List, Optional
import multiprocessing
import threading
@@ -65,12 +66,11 @@ class ServerManager:
logger.error(f"Failed to start frontend server: {e}")
return False
- def start_api_server(self, customer: str = "customer1", es_host: str = "http://localhost:9200") -> bool:
+ def start_api_server(self, es_host: str = "http://localhost:9200") -> bool:
"""Start the API server."""
try:
cmd = [
sys.executable, 'main.py', 'serve',
- '--customer', customer,
'--es-host', es_host,
'--host', '0.0.0.0',
'--port', '6002'
@@ -78,7 +78,6 @@ class ServerManager:
env = os.environ.copy()
env['PYTHONUNBUFFERED'] = '1'
- env['CUSTOMER_ID'] = customer
env['ES_HOST'] = es_host
process = subprocess.Popen(
@@ -179,14 +178,12 @@ def main():
"""Main function to start all servers."""
global manager
- parser = argparse.ArgumentParser(description='Start SearchEngine servers')
- parser.add_argument('--customer', default='customer1', help='Customer ID')
+ parser = argparse.ArgumentParser(description='Start SearchEngine servers (multi-tenant)')
parser.add_argument('--es-host', default='http://localhost:9200', help='Elasticsearch host')
parser.add_argument('--check-dependencies', action='store_true', help='Check dependencies before starting')
args = parser.parse_args()
- logger.info("Starting SearchEngine servers...")
- logger.info(f"Customer: {args.customer}")
+ logger.info("Starting SearchEngine servers (multi-tenant)...")
logger.info(f"Elasticsearch: {args.es_host}")
# Check dependencies if requested
@@ -209,7 +206,7 @@ def main():
try:
# Start servers
- if not manager.start_api_server(args.customer, args.es_host):
+ if not manager.start_api_server(args.es_host):
logger.error("Failed to start API server")
sys.exit(1)
diff --git a/test_all.sh b/test_all.sh
index 3c62b66..8d7200d 100755
--- a/test_all.sh
+++ b/test_all.sh
@@ -43,8 +43,8 @@ try:
es_config = get_es_config()
es_client = ESClient(hosts=[es_config['host']], username=es_config.get('username'), password=es_config.get('password'))
- config_loader = ConfigLoader('config/schema')
- config = config_loader.load_customer_config('customer1')
+ config_loader = ConfigLoader('config/config.yaml')
+ config = config_loader.load_config()
if es_client.index_exists(config.es_index_name):
doc_count = es_client.count(config.es_index_name)
diff --git a/tests/conftest.py b/tests/conftest.py
index d0d1c3c..28646c8 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -15,7 +15,8 @@ from unittest.mock import Mock, MagicMock
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, project_root)
-from config import CustomerConfig, QueryConfig, IndexConfig, FieldConfig, SPUConfig, RankingConfig
+from config import CustomerConfig, QueryConfig, IndexConfig, FieldConfig, SPUConfig, RankingConfig, FunctionScoreConfig, RerankConfig
+from config.field_types import FieldType, AnalyzerType
from utils.es_client import ESClient
from search import Searcher
from query import QueryParser
@@ -39,7 +40,9 @@ def sample_index_config() -> IndexConfig:
"""样例索引配置"""
return IndexConfig(
name="default",
- match_fields=["name", "brand_name", "tags"],
+ label="默认索引",
+ fields=["name", "brand_name", "tags"],
+ analyzer=AnalyzerType.CHINESE_ECOMMERCE,
language_field_mapping={
"zh": ["name", "brand_name"],
"en": ["name_en", "brand_name_en"]
@@ -64,23 +67,29 @@ def sample_customer_config(sample_index_config) -> CustomerConfig:
)
ranking_config = RankingConfig(
- expression="static_bm25() + text_embedding_relevance() * 0.2"
+ expression="static_bm25() + text_embedding_relevance() * 0.2",
+ description="Test ranking"
)
+ function_score_config = FunctionScoreConfig()
+ rerank_config = RerankConfig()
+
return CustomerConfig(
- customer_id="test_customer",
es_index_name="test_products",
- query=query_config,
+ fields=[
+ FieldConfig(name="tenant_id", field_type=FieldType.KEYWORD, required=True),
+ FieldConfig(name="name", field_type=FieldType.TEXT, analyzer=AnalyzerType.CHINESE_ECOMMERCE),
+ FieldConfig(name="brand_name", field_type=FieldType.TEXT, analyzer=AnalyzerType.CHINESE_ECOMMERCE),
+ FieldConfig(name="tags", field_type=FieldType.TEXT, analyzer=AnalyzerType.CHINESE_ECOMMERCE),
+ FieldConfig(name="price", field_type=FieldType.DOUBLE),
+ FieldConfig(name="category_id", field_type=FieldType.INT),
+ ],
indexes=[sample_index_config],
- spu=spu_config,
+ query_config=query_config,
ranking=ranking_config,
- fields=[
- FieldConfig(name="name", type="TEXT", analyzer="ansj"),
- FieldConfig(name="brand_name", type="TEXT", analyzer="ansj"),
- FieldConfig(name="tags", type="TEXT", analyzer="ansj"),
- FieldConfig(name="price", type="DOUBLE"),
- FieldConfig(name="category_id", type="INT"),
- ]
+ function_score=function_score_config,
+ rerank=rerank_config,
+ spu_config=spu_config
)
@@ -165,31 +174,48 @@ def temp_config_file() -> Generator[str, None, None]:
import yaml
config_data = {
- "customer_id": "test_customer",
"es_index_name": "test_products",
- "query": {
+ "query_config": {
"enable_query_rewrite": True,
"enable_translation": True,
"enable_text_embedding": True,
"supported_languages": ["zh", "en"]
},
+ "fields": [
+ {"name": "tenant_id", "type": "KEYWORD", "required": True},
+ {"name": "name", "type": "TEXT", "analyzer": "ansj"},
+ {"name": "brand_name", "type": "TEXT", "analyzer": "ansj"}
+ ],
"indexes": [
{
"name": "default",
- "match_fields": ["name", "brand_name"],
+ "label": "默认索引",
+ "fields": ["name", "brand_name"],
+ "analyzer": "ansj",
"language_field_mapping": {
"zh": ["name", "brand_name"],
"en": ["name_en", "brand_name_en"]
}
}
],
- "spu": {
+ "spu_config": {
"enabled": True,
"spu_field": "spu_id",
"inner_hits_size": 3
},
"ranking": {
- "expression": "static_bm25() + text_embedding_relevance() * 0.2"
+ "expression": "static_bm25() + text_embedding_relevance() * 0.2",
+ "description": "Test ranking"
+ },
+ "function_score": {
+ "score_mode": "sum",
+ "boost_mode": "multiply",
+ "functions": []
+ },
+ "rerank": {
+ "enabled": False,
+ "expression": "",
+ "description": ""
}
}
@@ -209,7 +235,6 @@ def mock_env_variables(monkeypatch):
monkeypatch.setenv("ES_HOST", "http://localhost:9200")
monkeypatch.setenv("ES_USERNAME", "elastic")
monkeypatch.setenv("ES_PASSWORD", "changeme")
- monkeypatch.setenv("CUSTOMER_ID", "test_customer")
# 标记配置
--
libgit2 0.21.2