From 15e63baff6d2af1954047809de30dcfad04159d8 Mon Sep 17 00:00:00 2001 From: tangwang Date: Tue, 18 Nov 2025 14:03:15 +0800 Subject: [PATCH] 索引文档修改 --- .gitignore | 2 ++ docs/商品导入模板说明.md | 346 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ docs/索引字段说明.md | 357 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------------------------------------------------------------------- scripts/csv_to_excel.py | 355 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ scripts/tenant3__csv_to_shoplazza_xlsx.sh | 18 ++++++++++++++++++ 5 files changed, 956 insertions(+), 122 deletions(-) create mode 100644 docs/商品导入模板说明.md create mode 100755 scripts/csv_to_excel.py create mode 100644 scripts/tenant3__csv_to_shoplazza_xlsx.sh diff --git a/.gitignore b/.gitignore index 0803b09..4f22f21 100644 --- a/.gitignore +++ b/.gitignore @@ -67,3 +67,5 @@ data.* *.log log/ + +*.xlsx diff --git a/docs/商品导入模板说明.md b/docs/商品导入模板说明.md new file mode 100644 index 0000000..49b9512 --- /dev/null +++ b/docs/商品导入模板说明.md @@ -0,0 +1,346 @@ +# 商品导入模板说明文档 + +本文档完整描述了商品导入Excel模板的结构、字段说明和示例数据。 + +## 字段列表 + +| 列号 | 字段名称 | 字段说明 | +|------|----------|----------| +| 1 | 商品ID | 
(此行导入时不可删除 )商品 ID 是系统生成的唯一标识符,新增商品无需填写 | +| 2 | 创建时间 | 创建时间的时区为当前店铺设置的时区 | +| 3 | 商品标题* | 最多255字符(同一商品的子款式标题与商品标题需一致,且中间请勿插入其他商品) | +| 4 | 商品属性* | 1.多款式商品:
同一个商品的首行标题填入「M」(商品主体);
同一个商品的非首行标题填入「P」(子款式);
2.单一款式商品(Single):填入「S」 | +| 5 | 商品副标题 | 最多600字符 | +| 6 | 商品描述 | 上传不带样式,若需要样式,则输入html代码 | +| 7 | SEO标题 | 最多5000字符 | +| 8 | SEO描述 | 最多5000字符 | +| 9 | SEO URL Handle | 最多支持输入255字符
(SEO URL handle只对SEO URL的「URL参数」部分进行更改,即“products/”后的内容,如:products/「URL参数」
) | +| 10 | SEO URL 重定向 | 创建URL重定向,访问修改前链接可跳转到修改后的新链接页面
「Y」:TRUE
「N」:FALSE | +| 11 | SEO关键词 | 多个关键词请用「英文逗号」隔开 | +| 12 | 商品上架 | 「Y」:上架状态
「N」:下架状态
默认为「N」 | +| 13 | 需要物流 | 「Y」:需要
「N」:不需要
默认为「Y」 | +| 14 | 商品收税 | 「Y」:需要
「N」:不需要
默认为「N」 | +| 15 | 商品spu | 最多100字符 | +| 16 | 启用虚拟销量 | 「Y」:启用
「N」:不启用
默认为「N」 | +| 17 | 虚拟销量值 | 最多输入六位数的自然数 | +| 18 | 跟踪库存 | 「Y」:跟踪
「N」:不跟踪
默认为「N」 | +| 19 | 库存规则* | 若跟踪库存为Y,则该项必填:
填入「1」表示:库存为0允许购买
填入「2」表示:库存为0不允许购买
填入「3」表示:库存为0自动下架 | +| 20 | 专辑名称 | 请填入已创建的手动专辑名称;
多个专辑请用「英文逗号」隔开 | +| 21 | 标签 | 最多输入250个标签,每个不得超过500字符,多个标签请用「英文逗号」隔开 | +| 22 | 供应商名称 | 最多20字符 | +| 23 | 供应商URL | 请输入供应商URL | +| 24 | 款式1 | 最多255字符 | +| 25 | 款式2 | 最多255字符 | +| 26 | 款式3 | 最多255字符 | +| 27 | 商品售价* | 最多输入9位正整数,2位小数 | +| 28 | 商品原价 | 最多输入9位正整数,2位小数 | +| 29 | 成本价 | 最多输入9位正整数,2位小数 | +| 30 | 商品SKU | 最多255字符 | +| 31 | 商品重量 | 最多输入10位正整数,2位小数 | +| 32 | 重量单位 | 可选单位有:kg, lb, g, oz;
默认为kg | +| 33 | 商品条形码 | 最多100字符 | +| 34 | 商品库存 | 只支持导入默认地点库存,如需导入其他地点的库存,请到【库存列表】导入。最多输入9位整数 | +| 35 | 尺寸信息 | 按长宽高顺序填写,用「英文逗号」隔开,单位默认为英寸 | +| 36 | 原产地国别 | 请填写国家代码 | +| 37 | HS(协调制度)代码 | 6-12纯数字 | +| 38 | 商品图片* | 商品属性为M(商品主体)与S(单一款式商品)的行可填入商品图URL:
1.若商品主图字段有值,则填入的图片URL为商品副图(可选填)
2.若商品主图字段为空,则必须至少填入一个图片URL,且首个URL为商品主图URL,其他为商品副图
3.填多个URL可用「英文逗号」隔开

商品属性为P(子款式)的行可填入子款式图URL:
1.仅支持填入一个URL作为子款式图片,填入多个时默认导入第一个;若不填入,则默认不需要图片
2.同一商品的部分子款式上传图片,则其余款式也需要上传,否则默认填入商品主图
| +| 39 | 商品备注 | 最多输入500个字 | +| 40 | 款式备注 | 最多输入20个字 | +| 41 | 商品主图 | 1.只需商品属性为M(商品主体)与S(单一款式商品)的行填写(可选填)
2.仅能填入一个图片URL作为商品主图,填入多个时默认只导入第一个 | +| 42 | 如需导入元字段, 请在此行增添【元字段-命名空间和密钥】作为表头,例:元字段-test.111 | 1. 请确认导入的元字段在店铺后台已创建;
2. 只需商品属性为M(商品主体)与S(单一款式商品)的行填写元字段值(可选填)
3.各类型元字段填写规范详见帮助文档:https://helpcenter.shoplazza.com/hc/zh-cn/articles/37520605874457 | + +## 示例数据 + +模板中包含以下示例数据,展示了多款式商品(M+P)和单一款式商品(S)的格式: + +### 商品示例:Legendary Whitetails Men's Buck Camp Flannel Shirt + +#### 行 4 - 商品属性: M + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `M` +- 商品副标题: `100% Cotton +We recommend ordering a size down if you prefer a closer fit +Purchase from Amazon seller Legendary Whitetails to ensure you receive an authentic Legendary Whitetails branded Buck Camp F...` +- 商品描述: `

A hunter’s wardrobe is not complete without a great flannel. Our exclusive plaids are made from 100% cotton soft brushed flannel. Featuring double pleat back for ease of movement and contr...` + +**SEO信息:** +- SEO标题: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- SEO描述: `Featuring double pleat back for ease of movement and contrasting corduroy lined collar and cuffs for a great look and lasting durability.` +- SEO URL Handle: `products/maternity-lace-cardigan-dress-photo-shoot_3a09` +- SEO关键词: `新品,热卖,爆款` + +**状态设置:** +- 商品上架: `Y` +- 需要物流: `Y` +- 商品收税: `Y` +- 启用虚拟销量: `N` +- 虚拟销量值: `100` +- 跟踪库存: `Y` +- 库存规则*: `1` + +**分类信息:** +- 专辑名称: `衬衣,热卖` +- 标签: `新品,热卖,爆款` +- 供应商名称: `Amazon` +- 供应商URL: `https://www.amazon.com/Legendary-Whitetails-Buck-Flannels-Large/dp/B01KTUMBOI/ref=sr_1_1?s=fashion-mens-intl-ship&ie=UTF8&qid=1543038722&sr=1-1` + +**款式信息:** +- 款式1: `SIZE` +- 款式2: `COLOR` + +**其他信息:** +- 商品图片*: `https://m.media-amazon.com/images/S/aplus-seller-content-images-us-east-1/ATVPDKIKX0DER/A1ABYS4IVXNT9X/27b061cf-ccd1-4c4c-82fd-6519a84b60c3.jpg` + +#### 行 5 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `S` +- 款式2: `red` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-RD-S1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 6 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `S` +- 款式2: `black` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-BK-S1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 7 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `S` +- 款式2: `army` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-AM-S1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 8 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** 
+- 款式1: `L` +- 款式2: `red` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-RD-M1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 9 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `L` +- 款式2: `black` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-BK-M1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 10 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `L` +- 款式2: `army` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-AM-M1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 11 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `XL` +- 款式2: `red` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-RD-L1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 12 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `XL` +- 款式2: `black` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-BK-L1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +#### 行 13 - 商品属性: P + +**基本信息:** +- 商品标题*: `Legendary Whitetails Men's Buck Camp Flannel Shirt` +- 商品属性*: `P` + +**款式信息:** +- 款式1: `XL` +- 款式2: `army` + +**价格库存:** +- 商品售价*: `88.99` +- 商品原价: `149.99` +- 商品SKU: `LW-TS-AM-L1` +- 商品重量: `0.2` +- 重量单位: `kg` +- 商品库存: `100` + +--- + +### 商品示例:Jabra Move Wireless Stereo Headphones + +#### 行 14 - 商品属性: S + +**基本信息:** +- 商品标题*: `Jabra Move Wireless Stereo Headphones` +- 商品属性*: `S` +- 商品副标题: `Listen to your tunes and never miss a call +Ultra-lightweight and adjustable headband fits all head-types +Durable stainless steel headband and dirt resistant fabric for life on the move.40mm Dynamic...` +- 商品描述: `Clean Scandinavian design meets crisp digital sound. Get a perfect fit with the ultra-light, comfortable headband. 
Choose from four modern colors: Cayenne (Red), Cobalt (Blue), Coal (Black) and Gold.` + +**SEO信息:** +- SEO URL Handle: `products/maternity-rainbow-stripe-skinny-bodycon-dress_5027` + +**状态设置:** +- 商品上架: `Y` +- 需要物流: `Y` +- 商品收税: `N` +- 启用虚拟销量: `N` +- 虚拟销量值: `100` +- 跟踪库存: `N` +- 库存规则*: `2` + +**分类信息:** +- 专辑名称: `头戴式耳机` +- 标签: `耳机,头戴式,爆款` +- 供应商名称: `Amazon` +- 供应商URL: `https://www.amazon.com/Jabra-Move-Wireless-Stereo-Headphones/dp/B00MR8Z28S/ref=br_asw_pdt-2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=8NAPN9R973J85P2KDX4G&pf_rd_t=36701&pf_rd_p=a08731ea-e1c2-4f7f-a56e...` + +**价格库存:** +- 商品售价*: `49.99` +- 商品原价: `99.99` +- 商品SKU: `B00MR8Z28S` +- 商品重量: `0.3` +- 重量单位: `lb` +- 商品库存: `500` + +**其他信息:** +- 商品图片*: `https://images-na.ssl-images-amazon.com/images/I/416k5ZUd6lL.jpg` + +--- + + +## 重要说明 + +### 商品属性字段说明 + +商品属性字段是必填项,用于标识商品类型: + +- **M (商品主体)**: 多款式商品的首行,填写商品主体信息 + - 需要填写:商品标题、描述、SEO信息、分类等商品主体信息 + - 不需要填写:价格、库存、SKU等子款式信息 + +- **P (子款式)**: 多款式商品的非首行,填写子款式信息 + - 需要填写:商品标题(与M行一致)、款式值、价格、库存、SKU等 + - 不需要填写:商品描述、SEO信息等商品主体信息 + +- **S (单一款式商品)**: 单一款式商品,一行包含所有信息 + - 需要填写:所有商品信息,包括标题、描述、价格、库存等 + +### 字段填写规则 + +1. **商品ID**: 系统自动生成,新增商品无需填写 +2. **商品标题***: 必填,最多255字符。同一商品的子款式标题必须与商品主体标题一致 +3. **商品属性***: 必填,只能填写 M、P 或 S +4. **商品售价***: 必填,最多9位正整数,2位小数 +5. **商品SKU**: 必填,最多255字符 +6. **商品图片***: 必填(M和S类型),可填写多个图片URL,用逗号分隔 +7. **库存规则***: 当跟踪库存为Y时必填,可选值:1(库存为0允许购买)、2(库存为0不允许购买)、3(库存为0自动下架) + +### 数据格式要求 + +- 日期时间格式:`YYYY-MM-DD HH:MM:SS` +- 布尔值:使用 `Y` 或 `N` +- 多个值:使用英文逗号分隔 +- HTML内容:商品描述支持HTML代码 +- URL格式:图片URL和供应商URL需为完整URL + +### 注意事项 + +1. 同一商品的子款式行必须紧跟在商品主体行之后,中间不能插入其他商品 +2. 商品标题在所有子款式行中必须保持一致 +3. 图片URL必须是可访问的完整URL +4. 价格、库存等数值字段不能包含非数字字符(除小数点外) +5. 
导入前请确保所有必填字段都已正确填写 + diff --git a/docs/索引字段说明.md b/docs/索引字段说明.md index 807f46e..bfc5c9a 100644 --- a/docs/索引字段说明.md +++ b/docs/索引字段说明.md @@ -16,10 +16,127 @@ - 返回的结果格式约定为店匠系列的 SPU/SKU嵌套结构。 - 支撑 facet/过滤/排序业务需求:用户可以选择任何一个 keyword 或 HKText 类型的字段做筛选、聚合;也可以选择任何一个数值型字段做 Range 过滤或排序。 -## 数据源调研 +## 关键字段 -店匠的商品结构: +参考1:spu表 & sku表、数据源《商品导入模板》 +### SPU表(shoplazza_product_spu) + +主要字段: +- `id`: BIGINT - 主键ID +- `tenant_id`: BIGINT - 租户ID +- `handle`: VARCHAR(255) - URL handle +- `title`: VARCHAR(500) - 商品标题 +- `brief`: VARCHAR(1000) - 商品简介 +- `description`: TEXT - 商品描述 +- `vendor`: VARCHAR(255) - 供应商/品牌 +- `category`: VARCHAR(255) - 类目 +- `tags`: TEXT - 标签 +- `seo_title`: VARCHAR(500) - SEO标题 +- `seo_description`: TEXT - SEO描述 +- `seo_keywords`: TEXT - SEO关键词 +- `image_src`: VARCHAR(500) - 图片URL +- `create_time`: DATETIME - 创建时间 +- `update_time`: DATETIME - 更新时间 +- `shoplazza_created_at`: DATETIME - 店匠创建时间 +- `shoplazza_updated_at`: DATETIME - 店匠更新时间 + +spu表全部字段 +"Field" "Type" "Null" "Key" "Default" "Extra" +"id" "bigint(20)" "NO" "PRI" "auto_increment" +"shop_id" "bigint(20)" "NO" "MUL" "" +"shoplazza_id" "varchar(64)" "NO" "" "" +"handle" "varchar(255)" "YES" "MUL" "" +"title" "varchar(500)" "NO" "" "" +"brief" "varchar(1000)" "YES" "" "" +"description" "text" "YES" "" "" +"spu" "varchar(100)" "YES" "" "" +"vendor" "varchar(255)" "YES" "" "" +"vendor_url" "varchar(500)" "YES" "" "" +"seo_title" "varchar(500)" "YES" "" "" +"seo_description" "text" "YES" "" "" +"seo_keywords" "text" "YES" "" "" +"image_src" "varchar(500)" "YES" "" "" +"image_width" "int(11)" "YES" "" "" +"image_height" "int(11)" "YES" "" "" +"image_path" "varchar(255)" "YES" "" "" +"image_alt" "varchar(500)" "YES" "" "" +"inventory_policy" "varchar(50)" "YES" "" "" +"inventory_quantity" "int(11)" "YES" "" "0" "" +"inventory_tracking" "tinyint(1)" "YES" "" "0" "" +"published" "tinyint(1)" "YES" "" "0" "" +"published_at" "datetime" "YES" "MUL" "" +"requires_shipping" "tinyint(1)" "YES" "" 
"1" "" +"taxable" "tinyint(1)" "YES" "" "0" "" +"fake_sales" "int(11)" "YES" "" "0" "" +"display_fake_sales" "tinyint(1)" "YES" "" "0" "" +"mixed_wholesale" "tinyint(1)" "YES" "" "0" "" +"need_variant_image" "tinyint(1)" "YES" "" "0" "" +"has_only_default_variant" "tinyint(1)" "YES" "" "0" "" +"tags" "text" "YES" "" "" +"note" "text" "YES" "" "" +"category" "varchar(255)" "YES" "" "" +"shoplazza_created_at" "datetime" "YES" "" "" +"shoplazza_updated_at" "datetime" "YES" "MUL" "" +"tenant_id" "bigint(20)" "NO" "MUL" "" +"creator" "varchar(64)" "YES" "" "" "" +"create_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "" +"updater" "varchar(64)" "YES" "" "" "" +"update_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "on update CURRENT_TIMESTAMP" +"deleted" "bit(1)" "NO" "" "b'0'" "" + +### SKU表(shoplazza_product_sku) + +主要字段: +- `id`: BIGINT - 主键ID(对应variant_id) +- `spu_id`: BIGINT - SPU ID(关联字段) +- `title`: VARCHAR(500) - 变体标题 +- `price`: DECIMAL(10,2) - 价格 +- `compare_at_price`: DECIMAL(10,2) - 原价 +- `sku`: VARCHAR(100) - SKU编码 +- `inventory_quantity`: INT(11) - 库存数量 +- `option1`: VARCHAR(255) - 选项1 +- `option2`: VARCHAR(255) - 选项2 +- `option3`: VARCHAR(255) - 选项3 + +sku全部字段 +"Field" "Type" "Null" "Key" "Default" "Extra" +"id" "bigint(20)" "NO" "PRI" "auto_increment" +"spu_id" "bigint(20)" "NO" "MUL" "" +"shop_id" "bigint(20)" "NO" "MUL" "" +"shoplazza_id" "varchar(64)" "NO" "" "" +"shoplazza_product_id" "varchar(64)" "NO" "MUL" "" +"shoplazza_image_id" "varchar(64)" "YES" "" "" +"title" "varchar(500)" "YES" "" "" +"sku" "varchar(100)" "YES" "MUL" "" +"barcode" "varchar(100)" "YES" "" "" +"position" "int(11)" "YES" "" "0" "" +"price" "decimal(10,2)" "YES" "" "" +"compare_at_price" "decimal(10,2)" "YES" "" "" +"cost_price" "decimal(10,2)" "YES" "" "" +"option1" "varchar(255)" "YES" "" "" +"option2" "varchar(255)" "YES" "" "" +"option3" "varchar(255)" "YES" "" "" +"inventory_quantity" "int(11)" "YES" "" "0" "" +"weight" "decimal(10,2)" "YES" "" "" +"weight_unit" "varchar(10)" 
"YES" "" "" +"image_src" "varchar(500)" "YES" "" "" +"wholesale_price" "json" "YES" "" "" +"note" "text" "YES" "" "" +"extend" "json" "YES" "" "" +"shoplazza_created_at" "datetime" "YES" "" "" +"shoplazza_updated_at" "datetime" "YES" "" "" +"tenant_id" "bigint(20)" "NO" "MUL" "" +"creator" "varchar(64)" "YES" "" "" "" +"create_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "" +"updater" "varchar(64)" "YES" "" "" "" +"update_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "on update CURRENT_TIMESTAMP" +"deleted" "bit(1)" "NO" "" "b'0'" "" + +参考2:波哥 索引《店匠指南》 -> 商品详解 + + +参考3: 店匠的商品结构 - 民丰: 固定字段: 1、必填:品名 2、非必填:副标题、类目、专辑、标签、供应商、市场 @@ -33,7 +150,110 @@ 商品信息:品名、颜色、尺码,基本上就这三个信息 -spu/sku表: + +### 分类 +最多三级分类,商品可以指向任何级别的分类 +Field Type +category varchar(255) +category_id bigint(20) +category_google_id bigint(20) +category_level int(11) +category_path varchar(500) + + +### 属性 +1. 组织ES输入数据的时候,需要为sku拼接spu的 option1 option2 option3,作为属性名称(比如“颜色”),sku的 option1 option2 option3 作为属性值(比如“白色”) +2. 有以下方案: TODO 可以选择其中一种,或者2用于填充3用于搜索: +1)铺平展开,只支持三个 +attr1 +attr2 +attr3 +option1 +option2 +option3 +2)nested,支持多个,动态。 查询性能低 +"specifications": { + "type": "nested", + "properties": { + "name": { "type": "keyword" }, // "颜色", "容量" + "value": { "type": "keyword" } // "白色", "256GB" + } +}, +3)平铺展开。写入时从 specifications 提取并填充这些字段,查询性能高。 +"properties": { + "color": { "type": "keyword" }, + "capacity": { "type": "keyword" }, + "network": { "type": "keyword" }, + "edition": { "type": "keyword" } +} + +### status +1. 商品下架等状态 +2. 无库存,是用status记录,提升查询效率 + +### 多语言 +索引:中英文(两套),如果商品资料是中文,则我们系统使用 谷歌翻译(和平台使用的翻译工具对齐) 自动翻译为英文。 +检索:将非中英文,翻译成英文后,再检索英文。 + +## 分面 +1. 分类 +2. 标签。这个是个扁平的结构,不是像属性那样k-v-pair的 +不用考虑属性。这个 option1 2 3不能放在外面筛选。他的定位是spu内部的东西,不是外部用于筛选商品的。 即使外面要有那种 动态筛选 比如搜手机出品牌、款式 这种, 不是对应的这个字段,会是另外的字段。 + + +### 字段预处理 +需要用「英文逗号」隔开,作为list输入的字段: +SEO关键词 +专辑名称 +标签 +尺寸信息 + + +## SPU 与 SKU 的协同设计 + +以下方案: +1. sku为索引单位。使用 collapse 按 spu_id 折叠 +需要考虑大量的字段冗余 + +2. 
spu为单位。 sku的title作为 spu 的sku_titles 属性。 + 除了 title,brief、description、seo 相关字段、category、tags、vendor 等所有影响相关性的字段都在spu。 sku只有一个title。所以,可以以spu为单位,sku的title作为spu的一个字段,以list形式灌入,假设一个spu有三个sku,那么这个sku_titles字段有三个值,打分的时候按max取得打分,并且我们可以得到这三个sku的title匹配的得分,因此好决定sku的排序。 + +3. sku 作为nested + + + +参考 [](https://blog.csdn.net/csdn_tom_168/article/details/150432666) +方案一:独立索引 +spu-catalog-*:用于品牌、类目、商品介绍等宏观搜索。 +sku-catalog-*:用于具体规格搜索、下单。 +优势:职责分离,查询更高效。 + +方案二:联合查询 +GET /spu-read,sku-read/_search +适用于“模糊搜索 → 跳转详情页”的场景。 + +方案三:父子文档(不推荐) +join 类型维护 SPU-SKU 关系。 +性能差,维护复杂,不适用于高并发搜索场景。 + + +## rank - 相关性 + + +## rank - 提权 + +function_score 提升相关性 +```json +"function_score": { + "functions": [ + { "field_value_factor": { "field": "sales_count", "factor": 0.001, "modifier": "log1p" } }, + { "gauss": { "listed_at": { "scale": "30d" } } } + ], + "boost_mode": "multiply" +} +``` + + ## 索引基本信息 @@ -42,6 +262,17 @@ spu/sku表: - **索引级别**: SPU级别(商品级别) - **数据结构**: SPU文档包含嵌套的skus数组 + +## 分片与副本设置 +```json +# 分片数: 根据cpu数量和sku数量决定。 +"settings": { + "number_of_shards": 8, + "number_of_replicas": 1, + "refresh_interval": "15s" +} +``` + ## 索引类型与处理说明 ### 文本字段 @@ -186,7 +417,7 @@ spu/sku表: | 索引字段名 | ES字段类型 | 是否索引 | 数据来源表 | 表中字段名 | 表中字段类型 | Boost权重 | 是否返回 | 数据预处理 | 说明 | |-----------|-----------|---------|-----------|-----------|-------------|-----------|---------|-------------|------| | vendor | HKText | 是 | SPU表 | vendor | VARCHAR(255) | 1.5 | 是 | | 供应商/品牌,HKText字段自动提供 `vendor.keyword` 用于过滤、聚合 | -| tags | HKText | 是 | SPU表 | tags | VARCHAR(1024) | 1.0 | 是 | | 标签字段,支持模糊搜索;使用 `tags.keyword` 进行精确过滤 | +| tags | HKText | 是 | SPU表 | tags | VARCHAR(1024) | 1.0 | 是 | 按逗号分割为list | 标签字段,支持模糊搜索;使用 `tags.keyword` 进行精确过滤 | | category | HKText | 是 | SPU表 | category | VARCHAR(255) | 1.5 | 是 | | 类目字段,使用 `category.keyword` 进行过滤/分面 | ### 价格字段 @@ -306,124 +537,6 @@ spu/sku表: 4. **向量搜索**: title_embedding字段用于语义搜索,需要配合文本查询使用 5. 
**Boost权重**: 不同字段的boost权重影响搜索结果的相关性排序 -## 数据来源表结构 - -### SPU表(shoplazza_product_spu) - -主要字段: -- `id`: BIGINT - 主键ID -- `tenant_id`: BIGINT - 租户ID -- `handle`: VARCHAR(255) - URL handle -- `title`: VARCHAR(512) - 商品标题 -- `brief`: VARCHAR(512) - 商品简介 -- `description`: TEXT - 商品描述 -- `vendor`: VARCHAR(255) - 供应商/品牌 -- `category`: VARCHAR(255) - 类目 -- `tags`: VARCHAR(1024) - 标签 -- `seo_title`: VARCHAR(512) - SEO标题 -- `seo_description`: TEXT - SEO描述 -- `seo_keywords`: VARCHAR(1024) - SEO关键词 -- `image_src`: VARCHAR(500) - 图片URL -- `create_time`: DATETIME - 创建时间 -- `update_time`: DATETIME - 更新时间 -- `shoplazza_created_at`: DATETIME - 店匠创建时间 -- `shoplazza_updated_at`: DATETIME - 店匠更新时间 - -spu表全部字段 -"Field" "Type" "Null" "Key" "Default" "Extra" -"id" "bigint(20)" "NO" "PRI" "auto_increment" -"shop_id" "bigint(20)" "NO" "MUL" "" -"shoplazza_id" "varchar(64)" "NO" "" "" -"handle" "varchar(255)" "YES" "MUL" "" -"title" "varchar(500)" "NO" "" "" -"brief" "varchar(1000)" "YES" "" "" -"description" "text" "YES" "" "" -"spu" "varchar(100)" "YES" "" "" -"vendor" "varchar(255)" "YES" "" "" -"vendor_url" "varchar(500)" "YES" "" "" -"seo_title" "varchar(500)" "YES" "" "" -"seo_description" "text" "YES" "" "" -"seo_keywords" "text" "YES" "" "" -"image_src" "varchar(500)" "YES" "" "" -"image_width" "int(11)" "YES" "" "" -"image_height" "int(11)" "YES" "" "" -"image_path" "varchar(255)" "YES" "" "" -"image_alt" "varchar(500)" "YES" "" "" -"inventory_policy" "varchar(50)" "YES" "" "" -"inventory_quantity" "int(11)" "YES" "" "0" "" -"inventory_tracking" "tinyint(1)" "YES" "" "0" "" -"published" "tinyint(1)" "YES" "" "0" "" -"published_at" "datetime" "YES" "MUL" "" -"requires_shipping" "tinyint(1)" "YES" "" "1" "" -"taxable" "tinyint(1)" "YES" "" "0" "" -"fake_sales" "int(11)" "YES" "" "0" "" -"display_fake_sales" "tinyint(1)" "YES" "" "0" "" -"mixed_wholesale" "tinyint(1)" "YES" "" "0" "" -"need_variant_image" "tinyint(1)" "YES" "" "0" "" -"has_only_default_variant" "tinyint(1)" "YES" "" 
"0" "" -"tags" "text" "YES" "" "" -"note" "text" "YES" "" "" -"category" "varchar(255)" "YES" "" "" -"shoplazza_created_at" "datetime" "YES" "" "" -"shoplazza_updated_at" "datetime" "YES" "MUL" "" -"tenant_id" "bigint(20)" "NO" "MUL" "" -"creator" "varchar(64)" "YES" "" "" "" -"create_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "" -"updater" "varchar(64)" "YES" "" "" "" -"update_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "on update CURRENT_TIMESTAMP" -"deleted" "bit(1)" "NO" "" "b'0'" "" - - - - -### SKU表(shoplazza_product_sku) - -主要字段: -- `id`: BIGINT - 主键ID(对应variant_id) -- `spu_id`: BIGINT - SPU ID(关联字段) -- `title`: VARCHAR(500) - 变体标题 -- `price`: DECIMAL(10,2) - 价格 -- `compare_at_price`: DECIMAL(10,2) - 原价 -- `sku`: VARCHAR(100) - SKU编码 -- `inventory_quantity`: INT(11) - 库存数量 -- `option1`: VARCHAR(255) - 选项1 -- `option2`: VARCHAR(255) - 选项2 -- `option3`: VARCHAR(255) - 选项3 - -sku全部字段 -"Field" "Type" "Null" "Key" "Default" "Extra" -"id" "bigint(20)" "NO" "PRI" "auto_increment" -"spu_id" "bigint(20)" "NO" "MUL" "" -"shop_id" "bigint(20)" "NO" "MUL" "" -"shoplazza_id" "varchar(64)" "NO" "" "" -"shoplazza_product_id" "varchar(64)" "NO" "MUL" "" -"shoplazza_image_id" "varchar(64)" "YES" "" "" -"title" "varchar(500)" "YES" "" "" -"sku" "varchar(100)" "YES" "MUL" "" -"barcode" "varchar(100)" "YES" "" "" -"position" "int(11)" "YES" "" "0" "" -"price" "decimal(10,2)" "YES" "" "" -"compare_at_price" "decimal(10,2)" "YES" "" "" -"cost_price" "decimal(10,2)" "YES" "" "" -"option1" "varchar(255)" "YES" "" "" -"option2" "varchar(255)" "YES" "" "" -"option3" "varchar(255)" "YES" "" "" -"inventory_quantity" "int(11)" "YES" "" "0" "" -"weight" "decimal(10,2)" "YES" "" "" -"weight_unit" "varchar(10)" "YES" "" "" -"image_src" "varchar(500)" "YES" "" "" -"wholesale_price" "json" "YES" "" "" -"note" "text" "YES" "" "" -"extend" "json" "YES" "" "" -"shoplazza_created_at" "datetime" "YES" "" "" -"shoplazza_updated_at" "datetime" "YES" "" "" -"tenant_id" "bigint(20)" "NO" "MUL" "" 
-"creator" "varchar(64)" "YES" "" "" "" -"create_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "" -"updater" "varchar(64)" "YES" "" "" "" -"update_time" "datetime" "NO" "" "CURRENT_TIMESTAMP" "on update CURRENT_TIMESTAMP" -"deleted" "bit(1)" "NO" "" "b'0'" "" - ## TODO 多语言问题。 diff --git a/scripts/csv_to_excel.py b/scripts/csv_to_excel.py new file mode 100755 index 0000000..7dae590 --- /dev/null +++ b/scripts/csv_to_excel.py @@ -0,0 +1,355 @@ +#!/usr/bin/env python3 +""" +Convert CSV data to Excel import template. + +Reads CSV file (goods_with_pic.5years_congku.csv.shuf.1w) and generates Excel file +based on the template format (商品导入模板.xlsx). + +Each CSV row corresponds to 1 SPU and 1 SKU, which will be exported as a single +S (Single variant) row in the Excel template. +""" + +import sys +import os +import csv +import random +import argparse +import re +from pathlib import Path +from datetime import datetime, timedelta +import pandas as pd +from openpyxl import load_workbook +from openpyxl.styles import Font, Alignment +from openpyxl.utils import get_column_letter + +# Add parent directory to path +sys.path.insert(0, str(Path(__file__).parent.parent)) + + +def clean_value(value): + """ + Clean and normalize value. + + Args: + value: Value to clean + + Returns: + Cleaned string value + """ + if value is None: + return '' + value = str(value).strip() + # Remove surrounding quotes + if value.startswith('"') and value.endswith('"'): + value = value[1:-1] + return value + + +def parse_csv_row(row: dict) -> dict: + """ + Parse CSV row and extract fields. 
+ + Args: + row: CSV row dictionary + + Returns: + Parsed data dictionary + """ + return { + 'skuId': clean_value(row.get('skuId', '')), + 'name': clean_value(row.get('name', '')), + 'name_pinyin': clean_value(row.get('name_pinyin', '')), + 'create_time': clean_value(row.get('create_time', '')), + 'ruSkuName': clean_value(row.get('ruSkuName', '')), + 'enSpuName': clean_value(row.get('enSpuName', '')), + 'categoryName': clean_value(row.get('categoryName', '')), + 'supplierName': clean_value(row.get('supplierName', '')), + 'brandName': clean_value(row.get('brandName', '')), + 'file_id': clean_value(row.get('file_id', '')), + 'days_since_last_update': clean_value(row.get('days_since_last_update', '')), + 'id': clean_value(row.get('id', '')), + 'imageUrl': clean_value(row.get('imageUrl', '')) + } + + +def generate_handle(title: str) -> str: + """ + Generate URL-friendly handle from title. + + Args: + title: Product title + + Returns: + URL-friendly handle (ASCII only) + """ + # Convert to lowercase + handle = title.lower() + + # Remove non-ASCII characters, keep only letters, numbers, spaces, and hyphens + handle = re.sub(r'[^a-z0-9\s-]', '', handle) + + # Replace spaces and multiple hyphens with single hyphen + handle = re.sub(r'[-\s]+', '-', handle) + handle = handle.strip('-') + + # Limit length + if len(handle) > 255: + handle = handle[:255] + + return handle or 'product' + + +def read_csv_file(csv_file: str) -> list: + """ + Read CSV file and return list of parsed rows. + + Args: + csv_file: Path to CSV file + + Returns: + List of parsed CSV data dictionaries + """ + csv_data_list = [] + + with open(csv_file, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + parsed = parse_csv_row(row) + csv_data_list.append(parsed) + + return csv_data_list + + +def csv_to_excel_row(csv_data: dict) -> dict: + """ + Convert CSV data row to Excel template row. + + Each CSV row represents a single product with one variant (S type in Excel). 
+ + Args: + csv_data: Parsed CSV row data + + Returns: + Dictionary mapping Excel column names to values + """ + # Parse create_time + try: + created_at = datetime.strptime(csv_data['create_time'], '%Y-%m-%d %H:%M:%S') + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S') + except ValueError: + created_at = datetime.now() - timedelta(days=random.randint(1, 365)) + create_time_str = created_at.strftime('%Y-%m-%d %H:%M:%S') + + # Generate title - use name or enSpuName + title = csv_data['name'] or csv_data['enSpuName'] or 'Product' + + # Generate handle - prefer enSpuName, then name_pinyin, then title + handle_source = csv_data['enSpuName'] or csv_data['name_pinyin'] or title + handle = generate_handle(handle_source) + if handle and not handle.startswith('products/'): + handle = f'products/{handle}' + + # Generate SEO fields + seo_title = f"{title} - {csv_data['categoryName']}" if csv_data['categoryName'] else title + seo_description = f"购买{csv_data['brandName']}{title}" if csv_data['brandName'] else title + seo_keywords_parts = [title] + if csv_data['categoryName']: + seo_keywords_parts.append(csv_data['categoryName']) + if csv_data['brandName']: + seo_keywords_parts.append(csv_data['brandName']) + seo_keywords = ','.join(seo_keywords_parts) + + # Generate tags from category and brand + tags_parts = [] + if csv_data['categoryName']: + tags_parts.append(csv_data['categoryName']) + if csv_data['brandName']: + tags_parts.append(csv_data['brandName']) + tags = ','.join(tags_parts) if tags_parts else '' + + # Generate prices (similar to import_tenant2_csv.py) + price = round(random.uniform(50, 500), 2) + compare_at_price = round(price * random.uniform(1.2, 1.5), 2) + cost_price = round(price * 0.6, 2) + + # Generate random stock + inventory_quantity = random.randint(0, 100) + + # Generate random weight + weight = round(random.uniform(0.1, 5.0), 2) + weight_unit = 'kg' + + # Use ruSkuName as SKU title, fallback to name + sku_title = csv_data['ruSkuName'] or 
csv_data['name'] 
or 'SKU' + + # Use skuId as SKU code + sku_code = csv_data['skuId'] or '' + + # Generate barcode + try: + sku_id = int(csv_data['skuId']) + barcode = f"BAR{sku_id:08d}" + except ValueError: # skuId is empty or not numeric + barcode = '' + + # Build description + description = f"

{csv_data['name']}

" if csv_data['name'] else '' + + # Build brief (subtitle) + brief = csv_data['name'] or '' + + # Excel row data (mapping to Excel template columns) + excel_row = { + '商品ID': '', # Empty for new products + '创建时间': create_time_str, + '商品标题*': title, + '商品属性*': 'S', # Single variant product + '商品副标题': brief, + '商品描述': description, + 'SEO标题': seo_title, + 'SEO描述': seo_description, + 'SEO URL Handle': handle, + 'SEO URL 重定向': 'N', # Default to N + 'SEO关键词': seo_keywords, + '商品上架': 'Y', # Published by default + '需要物流': 'Y', # Requires shipping + '商品收税': 'N', # Not taxable by default + '商品spu': '', # Empty + '启用虚拟销量': 'N', # No fake sales + '虚拟销量值': '', # Empty + '跟踪库存': 'Y', # Track inventory + '库存规则*': '1', # Allow purchase when stock is 0 + '专辑名称': csv_data['categoryName'] or '', # Category as album + '标签': tags, + '供应商名称': csv_data['supplierName'] or '', + '供应商URL': '', # Empty + '款式1': '', # Not used for S type + '款式2': '', # Not used for S type + '款式3': '', # Not used for S type + '商品售价*': price, + '商品原价': compare_at_price, + '成本价': cost_price, + '商品SKU': sku_code, + '商品重量': weight, + '重量单位': weight_unit, + '商品条形码': barcode, + '商品库存': inventory_quantity, + '尺寸信息': '', # Empty + '原产地国别': '', # Empty + 'HS(协调制度)代码': '', # Empty + '商品图片*': csv_data['imageUrl'] or '', # Image URL + '商品备注': '', # Empty + '款式备注': '', # Empty + '商品主图': csv_data['imageUrl'] or '', # Main image URL + } + + return excel_row + + +def create_excel_from_template(template_file: str, output_file: str, csv_data_list: list): + """ + Create Excel file from template and fill with CSV data. 
+ + Args: + template_file: Path to Excel template file + output_file: Path to output Excel file + csv_data_list: List of parsed CSV data dictionaries + """ + # Load template + wb = load_workbook(template_file) + ws = wb.active # Use the active sheet (Sheet4) + + # Find header row (row 2, index 1) + header_row_idx = 2 # Row 2 in Excel (1-based, but header is at index 1 in pandas) + + # Get column mapping from header row + column_mapping = {} + for col_idx in range(1, ws.max_column + 1): + cell_value = ws.cell(row=header_row_idx, column=col_idx).value + if cell_value: + column_mapping[cell_value] = col_idx + + # Start writing data from row 4 (after header and instructions) + data_start_row = 4 # Row 4 in Excel (1-based) + + # Clear existing data rows (from row 4 onwards, but keep header and instructions) + # Find the last row with data in the template + last_template_row = ws.max_row + if last_template_row >= data_start_row: + # Clear data rows (keep header and instruction rows) + for row in range(data_start_row, last_template_row + 1): + for col in range(1, ws.max_column + 1): + ws.cell(row=row, column=col).value = None + + # Convert CSV data to Excel rows + for row_idx, csv_data in enumerate(csv_data_list): + excel_row = csv_to_excel_row(csv_data) + excel_row_num = data_start_row + row_idx + + # Write each field to corresponding column + for field_name, col_idx in column_mapping.items(): + if field_name in excel_row: + cell = ws.cell(row=excel_row_num, column=col_idx) + value = excel_row[field_name] + cell.value = value + + # Set alignment for text fields + if isinstance(value, str): + cell.alignment = Alignment(vertical='top', wrap_text=True) + elif isinstance(value, (int, float)): + cell.alignment = Alignment(vertical='top') + + # Save workbook + wb.save(output_file) + print(f"Excel file created: {output_file}") + print(f" - Total rows: {len(csv_data_list)}") + + +def main(): + parser = argparse.ArgumentParser(description='Convert CSV data to Excel import 
template') + parser.add_argument('--csv-file', + default='data/customer1/goods_with_pic.5years_congku.csv.shuf.1w', + help='CSV file path (default: data/customer1/goods_with_pic.5years_congku.csv.shuf.1w)') + parser.add_argument('--template', + default='docs/商品导入模板.xlsx', + help='Excel template file path (default: docs/商品导入模板.xlsx)') + parser.add_argument('--output', + default='商品导入数据.xlsx', + help='Output Excel file path (default: 商品导入数据.xlsx)') + parser.add_argument('--limit', + type=int, + default=None, + help='Limit number of rows to process (default: all)') + + args = parser.parse_args() + + # Check if files exist + if not os.path.exists(args.csv_file): + print(f"Error: CSV file not found: {args.csv_file}") + sys.exit(1) + + if not os.path.exists(args.template): + print(f"Error: Template file not found: {args.template}") + sys.exit(1) + + # Read CSV file + print(f"Reading CSV file: {args.csv_file}") + csv_data_list = read_csv_file(args.csv_file) + print(f"Read {len(csv_data_list)} rows from CSV") + + # Limit rows if specified + if args.limit: + csv_data_list = csv_data_list[:args.limit] + print(f"Limited to {len(csv_data_list)} rows") + + # Create Excel file + print(f"Creating Excel file from template: {args.template}") + print(f"Output file: {args.output}") + create_excel_from_template(args.template, args.output, csv_data_list) + + print(f"\nDone! 
Generated {len(csv_data_list)} product rows in Excel file.") + + +if __name__ == '__main__': + main() + diff --git a/scripts/tenant3__csv_to_shoplazza_xlsx.sh b/scripts/tenant3__csv_to_shoplazza_xlsx.sh new file mode 100644 index 0000000..bf93793 --- /dev/null +++ b/scripts/tenant3__csv_to_shoplazza_xlsx.sh @@ -0,0 +1,18 @@ +# 激活环境 +source /home/tw/miniconda3/etc/profile.d/conda.sh +conda activate searchengine + +# # 基本使用(生成所有数据) +# python scripts/csv_to_excel.py + +# # 指定输出文件 +# python scripts/csv_to_excel.py --output tenant3_imports.xlsx + +# # 限制处理行数(用于测试) +# python scripts/csv_to_excel.py --limit 100 + +# 指定CSV文件和模板文件 +python scripts/csv_to_excel.py \ + --csv-file data/customer1/goods_with_pic.5years_congku.csv.shuf.1w \ + --template docs/商品导入模板.xlsx \ + --output tenant3_imports.xlsx \ No newline at end of file -- libgit2 0.21.2
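
The 字段预处理 rule in docs/索引字段说明.md says that SEO关键词, 专辑名称, 标签 and 尺寸信息 are separated by English commas and should be fed in as lists. A minimal sketch of that split follows, assuming whitespace around items and empty items should be dropped (the docs do not say either way). The names `split_list_field` and `LIST_FIELDS` and the sample row are hypothetical and not part of this patch.

```python
# Hypothetical helper: not part of scripts/csv_to_excel.py.
LIST_FIELDS = ("SEO关键词", "专辑名称", "标签", "尺寸信息")


def split_list_field(raw, sep=","):
    """Split a comma-separated template cell into a list of non-empty values."""
    if raw is None:
        return []
    # Assumption: surrounding whitespace and empty items carry no meaning.
    parts = (part.strip() for part in str(raw).split(sep))
    return [part for part in parts if part]


# Sample row keyed by the template's column names (illustrative values only).
row = {"标签": "新品, 热卖,,爆款", "专辑名称": "衬衣,热卖", "SEO关键词": None}
parsed = {name: split_list_field(row.get(name)) for name in LIST_FIELDS}
```

Only the English comma is treated as a separator here, so a full-width `,` inside a value would stay in that item. Whether that is the desired behaviour depends on how the import data is actually written.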