Commit cd29428b86b6a3c1f2c6b3663c57e750decf660c
1 parent f3c11fef
Import Amazon data into a Shoplazza store: data processing
Showing 3 changed files with 628 additions and 23 deletions
| ... | ... | @@ -0,0 +1,136 @@ |
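| 1 | +## Amazon-format data → Shoplazza (店匠) product import template: conversion notes | |
| 2 | + | |
| 3 | +This repository can convert `data/mai_jia_jing_ling/products_data/*.xlsx` (**Amazon-format exports**) into the `docs/商品导入模板.xlsx` format that the Shoplazza admin can import. | |
| 4 | + | |
| 5 | +Scripts involved: | |
| 6 | +- **Main entry point**: `scripts/amazon_xlsx_to_shoplazza_xlsx.py` | |
| 7 | +- **Legacy compatibility**: `scripts/competitor_xlsx_to_shoplazza_xlsx.py` (only the name is outdated; the logic is the same) | |
| 8 | +- **Shared template-writing helper**: `scripts/shoplazza_excel_template.py` | |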
| 9 | + | |
| 10 | +--- | |
| 11 | + | |
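| 12 | +## 1. Input data (Amazon-format xlsx): key fields | |
| 13 | + | |
| 14 | +Using `Competitor-US-Last-30-days-363464.xlsx` as an example (the file name is irrelevant; the content uses Amazon-style fields): | |
| 15 | + | |
| 16 | +- **ASIN**: variant id (treated as `sku_id`; written to the template column `商品SKU`) | |
| 17 | +- **父ASIN** (parent ASIN): parent product id (treated as `spu_id`/`product_id`; written to the template column `商品spu` and used to group rows into M/P) | |
| 18 | +- **商品标题** (product title): written to `商品标题*`, the SEO title, and so on | |
| 19 | +- **SKU**: Amazon variant description string (key field: the multi-variant dimensions are parsed from it) | |
| 20 | + - Example: `Size: One Size | Color: Black` | |
| 21 | +- **商品主图** (main image): image URL (used for `商品图片*` / `商品主图`) | |
| 22 | +- **价格($)** / **prime价格($)**: price (used for `商品售价*` / `商品原价`) | |
| 23 | +- **详细参数** (detailed specs): spec string (used to build `商品描述`) | |
| 24 | +- **上架时间** (listing date): used for `创建时间` | |
| 25 | +- **类目路径/大类目/小类目/品牌/商品详情页链接/品牌链接** (category path, top-level category, subcategory, brand, product page URL, brand URL): used for the collection, tags, SEO, supplier URL, notes, and so on | |
| 26 | +- **商品重量(单位换算)/商品重量/商品尺寸** (converted weight, raw weight, dimensions): used for `商品重量/重量单位/尺寸信息` | |
| 27 | + | |
| 28 | +> Note: this data source usually has **no inventory data**, so the script assigns every variant a fixed default stock (currently 100) so that products are purchasable after import. | |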
| 29 | + | |
| 30 | +--- | |
| 31 | + | |
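| 32 | +## 2. Output data (Shoplazza import template): core rules (M / P / S) | |
| 33 | + | |
| 34 | +`docs/商品导入模板说明.md` defines three values for the template's product attribute column (`商品属性*`): | |
| 35 | + | |
| 36 | +- **S (single variant)**: the product has exactly one variant (a single ASIN) | |
| 37 | + - Output: **1 row** | |
| 38 | +- **M (main product) + P (child variant)**: one parent product (父ASIN) contains several variants (several ASINs) | |
| 39 | + - Output: **1 M row + N P rows** | |
| 40 | + - **The P rows of a product must immediately follow its M row** (a hard constraint of the template import) | |
| 41 | + | |
| 42 | +Conversion strategy in this repository: | |
| 43 | +- Group rows by `父ASIN`: | |
| 44 | + - **group size = 1** → emit `S` | |
| 45 | + - **group size > 1** → emit `M` plus multiple `P` rows | |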
| 46 | + | |
| 47 | +--- | |
| 48 | + | |
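| 49 | +## 3. How multi-variant products are built (the most important part) | |
| 50 | + | |
| 51 | +### 1) Why the "SKU" column matters | |
| 52 | + | |
| 53 | +In the Amazon format, variant attributes such as color and size are usually not split into separate columns; they are packed into the `SKU` string, for example: | |
| 54 | + | |
| 55 | +- `Size: One Size | Color: Black` | |
| 56 | +- `Color: Red | Style: 2-Pack` | |
| 57 | + | |
| 58 | +Shoplazza's multi-variant template requires: | |
| 59 | +- **M row**: `款式1/款式2/款式3` hold the dimension names (for example Size / Color / Material) | |
| 60 | +- **P rows**: `款式1/款式2/款式3` hold the dimension values (for example One Size / Black / Cotton) | |
| 61 | + | |
| 62 | +### 2) How the script parses dimensions (key/value) from SKU | |
| 63 | + | |
| 64 | +The script splits `SKU` on `|`, then splits each part on the first `:` into a key/value pair: | |
| 65 | + | |
| 66 | +- Input: `Size: One Size | Color: Black` | |
| 67 | +- Parsed result: `{ "Size": "One Size", "Color": "Black" }` | |
| 68 | + | |
| 69 | +### 3) How at most 3 dimensions are chosen across variants | |
| 70 | + | |
| 71 | +The Shoplazza template provides only three dimension columns (`款式1` to `款式3`), so within each `父ASIN` group the script counts how often each key occurs and picks at most 3 dimensions, ordered first by a fixed priority list and then by frequency: | |
| 72 | + | |
| 73 | +- The priority order is roughly: `Size`, `Color`, `Style`, `Pattern`, `Material`, ... | |
| 74 | +- If no key/value pair can be parsed in a group, the script falls back to a single dimension named `Variant`: | |
| 75 | + - M row: `款式1 = Variant` | |
| 76 | + - P rows: `款式1 = ASIN` | |
| 77 | + | |
| 78 | +### 4) What the M row and P rows contain (to avoid import errors) | |
| 79 | + | |
| 80 | +Following the template documentation, the script divides the fields as follows: | |
| 81 | + | |
| 82 | +- **M row (main product)**: | |
| 83 | + - Filled: title / description / SEO / collection / tags / main image / dimension names | |
| 84 | + - Left empty: price, inventory, weight, and other SKU-level fields (leaving them empty is safer) | |
| 85 | +- **P rows (child variants)**: | |
| 86 | + - Filled: dimension values, price, `商品SKU` (ASIN), inventory, weight, dimensions, (optional) variant image | |
| 87 | + - Left empty: description / SEO / collection / supplier and other SPU-level fields | |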
| 88 | + | |
| 89 | +--- | |
| 90 | + | |
| 91 | +## 4. Field mapping overview (frequently used fields) | |
| 92 | + | |
| 93 | +- **商品spu** ← `父ASIN` (falls back to ASIN when 父ASIN is missing) | |
| 94 | +- **商品SKU** ← `ASIN` | |
| 95 | +- **商品标题\*** ← `商品标题` | |
| 96 | +- **商品图片\*** / **商品主图** ← `商品主图` | |
| 97 | +- **商品售价\*** ← `prime价格($)` if present, otherwise `价格($)` (the script falls back to 9.99 when neither yields a price) | |
| 98 | +- **创建时间** ← `上架时间` (date-only values are padded to `YYYY-MM-DD 00:00:00`) | |
| 99 | +- **商品描述** ← `商品标题` + `详细参数` (assembled as HTML) | |
| 100 | +- **专辑名称** ← `大类目` (falls back to the first segment of `类目路径`) | |
| 101 | +- **标签** ← `品牌,大类目,小类目` | |
| 102 | +- **商品重量/重量单位** ← parsed from `商品重量(单位换算)` first (e.g. `68.04 g`), otherwise from `商品重量` | |
| 103 | +- **尺寸信息** ← the first three numbers of `商品尺寸` (inches), joined as `L,W,H` | |
| 104 | + | |
| 105 | +--- | |
| 106 | + | |
| 107 | +## 5. How to run (generate the import file) | |
| 108 | + | |
| 109 | +### 1) Validate with a small batch first (recommended) | |
| 110 | + | |
| 111 | +```bash | |
| 112 | +python scripts/amazon_xlsx_to_shoplazza_xlsx.py \ | |
| 113 | + --input-dir data/mai_jia_jing_ling/products_data \ | |
| 114 | + --template docs/商品导入模板.xlsx \ | |
| 115 | + --output data/mai_jia_jing_ling/amazon_shoplazza_import_SAMPLE.xlsx \ | |
| 116 | + --max-files 1 --max-rows-per-file 2000 --max-products 50 | |
| 117 | +``` | |
| 118 | + | |
| 119 | +### 2) Generate the full output | |
| 120 | + | |
| 121 | +```bash | |
| 122 | +python scripts/amazon_xlsx_to_shoplazza_xlsx.py \ | |
| 123 | + --input-dir data/mai_jia_jing_ling/products_data \ | |
| 124 | + --template docs/商品导入模板.xlsx \ | |
| 125 | + --output data/mai_jia_jing_ling/amazon_shoplazza_import_ALL.xlsx | |
| 126 | +``` | |
| 127 | + | |
| 128 | +--- | |
| 129 | + | |
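| 130 | +## 6. Extension points (common follow-up needs) | |
| 131 | + | |
| 132 | +- **Parameterize the inventory / publish / tax settings**: these are currently script defaults (Y/N/100) and could become command-line arguments to match the target store's rules. | |
| 133 | +- **More robust variant parsing**: if the Amazon-format `SKU` turns out to be irregular, `Color/Size` could additionally be extracted from `详细参数`. | |
| 134 | +- **Image strategy**: P rows currently use their own `商品主图`; alternatively the M row could merge multiple images (comma-separated). | |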
| 135 | + | |
| 136 | + | ... | ... |
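The SKU parsing and dimension selection described in the README can be sketched in isolation. This is a minimal illustration rather than the script itself: `PRIORITY` is a shortened, hypothetical priority list, and `parse_sku`, `pick_dimensions`, and `option_columns` are names chosen for this example only.

```python
from collections import Counter

# Hypothetical, shortened priority list used only for this illustration.
PRIORITY = ["Size", "Color", "Style", "Pattern", "Material"]


def parse_sku(sku_text):
    """Split 'Size: M | Color: Black' into {'Size': 'M', 'Color': 'Black'}."""
    options = {}
    for part in sku_text.split("|"):
        if ":" not in part:
            continue  # skip fragments that are not key/value pairs
        key, value = part.split(":", 1)
        key, value = key.strip(), value.strip()
        if key and value:
            options[key] = value
    return options


def pick_dimensions(option_dicts, max_keys=3):
    """Order keys by priority, then by descending frequency, and keep at most max_keys."""
    freq = Counter(k for d in option_dicts for k in d)
    rank = {k: i for i, k in enumerate(PRIORITY)}
    ordered = sorted(freq, key=lambda k: (rank.get(k, len(PRIORITY)), -freq[k], k.lower()))
    return ordered[:max_keys]


def option_columns(sku_texts):
    """Return (dimension names for the M row, one list of values per P row)."""
    parsed = [parse_sku(s) for s in sku_texts]
    names = pick_dimensions(parsed)
    values = [[d.get(name, "") for name in names] for d in parsed]
    return names, values


names, values = option_columns([
    "Size: One Size | Color: Black",
    "Color: Red | Style: 2-Pack",
])
print(names)   # dimension names that would go into 款式1/2/3 of the M row
print(values)  # per-variant values that would go into 款式1/2/3 of each P row
```

A variant that lacks one of the chosen keys ends up with an empty value in that column. Whether the Shoplazza importer accepts such rows is not stated in the source, so it is worth checking with a small sample import.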
| ... | ... | @@ -0,0 +1,480 @@ |
| 1 | +#!/usr/bin/env python3 | |
| 2 | +""" | |
| 3 | +Convert Amazon-format Excel exports (with Parent/Child ASIN structure) into | |
| 4 | +Shoplazza (店匠) product import Excel format based on `docs/商品导入模板.xlsx`. | |
| 5 | + | |
| 6 | +Data source: | |
| 7 | +- Directory with multiple `*.xlsx` files under `products_data/`. | |
| 8 | +- Each file contains a main sheet + "Notes" sheet. | |
| 9 | +- Column meanings (sample): | |
| 10 | + - ASIN: variant id (sku_id) | |
| 11 | + - 父ASIN: parent product id (spu_id) | |
| 12 | + | |
| 13 | +Output: | |
| 14 | +- For each 父ASIN group: | |
| 15 | + - If only 1 ASIN: generate one "S" row | |
| 16 | + - Else: generate one "M" row + multiple "P" rows | |
| 17 | + | |
| 18 | +Multi-variant (M/P) key point: | |
| 19 | +- Variant dimensions are parsed primarily from the `SKU` column, e.g. | |
| 20 | + "Size: One Size | Color: Black", and mapped into 款式1/2/3. | |
| 21 | +""" | |
| 22 | + | |
| 23 | +# NOTE: This file is intentionally the same implementation as | |
| 24 | +# `competitor_xlsx_to_shoplazza_xlsx.py`, but renamed to reflect the correct | |
| 25 | +# data source (Amazon-format exports). Keep the logic in sync. | |
| 26 | + | |
| 27 | +import os | |
| 28 | +import re | |
| 29 | +import sys | |
| 30 | +import argparse | |
| 31 | +from datetime import datetime | |
| 32 | +from collections import defaultdict, Counter | |
| 33 | +from pathlib import Path | |
| 34 | + | |
| 35 | +from openpyxl import load_workbook | |
| 36 | + | |
| 37 | +# Allow running as `python scripts/xxx.py` without installing as a package | |
| 38 | +sys.path.insert(0, str(Path(__file__).resolve().parent)) | |
| 39 | +from shoplazza_excel_template import create_excel_from_template | |
| 40 | + | |
| 41 | + | |
| 42 | +PREFERRED_OPTION_KEYS = [ | |
| 43 | + "Size", "Color", "Style", "Pattern", "Material", "Flavor", "Scent", | |
| 44 | + "Pack", "Pack of", "Number of Items", "Count", "Capacity", "Length", | |
| 45 | + "Width", "Height", "Model", "Configuration", | |
| 46 | +] | |
| 47 | + | |
| 48 | + | |
| 49 | +def clean_str(v): | |
| 50 | + if v is None: | |
| 51 | + return "" | |
| 52 | + return str(v).strip() | |
| 53 | + | |
| 54 | + | |
| 55 | +def html_escape(s): | |
| 56 | + s = clean_str(s) | |
| 57 | + return (s.replace("&", "&amp;") | |
| 58 | + .replace("<", "&lt;") | |
| 59 | + .replace(">", "&gt;")) | |
| 60 | + | |
| 61 | + | |
| 62 | +def generate_handle(title): | |
| 63 | + """ | |
| 64 | + Generate URL-friendly handle from title (ASCII only). | |
| 65 | + Keep consistent with existing scripts. | |
| 66 | + """ | |
| 67 | + handle = clean_str(title).lower() | |
| 68 | + handle = re.sub(r"[^a-z0-9\s-]", "", handle) | |
| 69 | + handle = re.sub(r"[-\s]+", "-", handle).strip("-") | |
| 70 | + if len(handle) > 255: | |
| 71 | + handle = handle[:255] | |
| 72 | + return handle or "product" | |
| 73 | + | |
| 74 | + | |
| 75 | +def parse_date_to_template(dt_value): | |
| 76 | + """ | |
| 77 | + Template expects: YYYY-MM-DD HH:MM:SS | |
| 78 | + Input could be "2018-05-09" or datetime/date. | |
| 79 | + """ | |
| 80 | + if dt_value is None or dt_value == "": | |
| 81 | + return "" | |
| 82 | + if isinstance(dt_value, datetime): | |
| 83 | + return dt_value.strftime("%Y-%m-%d %H:%M:%S") | |
| 84 | + s = clean_str(dt_value) | |
| 85 | + for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S"): | |
| 86 | + try: | |
| 87 | + d = datetime.strptime(s, fmt) | |
| 88 | + return d.strftime("%Y-%m-%d %H:%M:%S") | |
| 89 | + except Exception: | |
| 90 | + pass | |
| 91 | + return "" | |
| 92 | + | |
| 93 | + | |
| 94 | +def parse_weight(weight_conv, weight_raw): | |
| 95 | + """ | |
| 96 | + Return (weight_value, unit) where unit in {kg, lb, g, oz}. | |
| 97 | + Prefer '商品重量(单位换算)' like '68.04 g'. | |
| 98 | + Fallback to '商品重量' like '0.15 pounds'. | |
| 99 | + """ | |
| 100 | + s = clean_str(weight_conv) or clean_str(weight_raw) | |
| 101 | + if not s: | |
| 102 | + return ("", "") | |
| 103 | + m = re.search(r"([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)", s) | |
| 104 | + if not m: | |
| 105 | + return ("", "") | |
| 106 | + val = float(m.group(1)) | |
| 107 | + unit = m.group(2).lower() | |
| 108 | + if unit in ("g", "gram", "grams"): | |
| 109 | + return (val, "g") | |
| 110 | + if unit in ("kg", "kilogram", "kilograms"): | |
| 111 | + return (val, "kg") | |
| 112 | + if unit in ("lb", "lbs", "pound", "pounds"): | |
| 113 | + return (val, "lb") | |
| 114 | + if unit in ("oz", "ounce", "ounces"): | |
| 115 | + return (val, "oz") | |
| 116 | + return ("", "") | |
| 117 | + | |
| 118 | + | |
| 119 | +def parse_dimensions_inches(dim_raw): | |
| 120 | + """ | |
| 121 | + Template '尺寸信息': 'L,W,H' in inches. | |
| 122 | + Input example: '7.9 x 7.9 x 2 inches' | |
| 123 | + """ | |
| 124 | + s = clean_str(dim_raw) | |
| 125 | + if not s: | |
| 126 | + return "" | |
| 127 | + nums = re.findall(r"([0-9]+(?:\.[0-9]+)?)", s) | |
| 128 | + if len(nums) < 3: | |
| 129 | + return "" | |
| 130 | + return "{},{},{}".format(nums[0], nums[1], nums[2]) | |
| 131 | + | |
| 132 | + | |
| 133 | +def parse_sku_options(sku_text): | |
| 134 | + """ | |
| 135 | + Parse 'SKU' column into {key: value}. | |
| 136 | + Example: | |
| 137 | + 'Size: One Size | Color: Black' -> {'Size':'One Size','Color':'Black'} | |
| 138 | + """ | |
| 139 | + s = clean_str(sku_text) | |
| 140 | + if not s: | |
| 141 | + return {} | |
| 142 | + parts = [p.strip() for p in s.split("|") if p.strip()] | |
| 143 | + out = {} | |
| 144 | + for p in parts: | |
| 145 | + if ":" not in p: | |
| 146 | + continue | |
| 147 | + k, v = p.split(":", 1) | |
| 148 | + k = clean_str(k) | |
| 149 | + v = clean_str(v) | |
| 150 | + if k and v: | |
| 151 | + out[k] = v | |
| 152 | + return out | |
| 153 | + | |
| 154 | + | |
| 155 | +def choose_option_keys(variant_dicts, max_keys=3): | |
| 156 | + freq = Counter() | |
| 157 | + for d in variant_dicts: | |
| 158 | + for k, v in d.items(): | |
| 159 | + if v: | |
| 160 | + freq[k] += 1 | |
| 161 | + if not freq: | |
| 162 | + return [] | |
| 163 | + preferred_rank = {k: i for i, k in enumerate(PREFERRED_OPTION_KEYS)} | |
| 164 | + | |
| 165 | + def key_sort(k): | |
| 166 | + return (preferred_rank.get(k, 10 ** 6), -freq[k], k.lower()) | |
| 167 | + | |
| 168 | + keys = sorted(freq.keys(), key=key_sort) | |
| 169 | + return keys[:max_keys] | |
| 170 | + | |
| 171 | + | |
| 172 | +def build_description_html(title, details, product_url): | |
| 173 | + parts = [] | |
| 174 | + if title: | |
| 175 | + parts.append("<p>{}</p>".format(html_escape(title))) | |
| 176 | + detail_items = [x.strip() for x in clean_str(details).split("|") if x.strip()] | |
| 177 | + if detail_items: | |
| 178 | + li = "".join(["<li>{}</li>".format(html_escape(x)) for x in detail_items[:30]]) | |
| 179 | + parts.append("<ul>{}</ul>".format(li)) | |
| 180 | + if product_url: | |
| 181 | + parts.append('<p>Source: <a href="{0}">{0}</a></p>'.format(html_escape(product_url))) | |
| 182 | + return "".join(parts) | |
| 183 | + | |
| 184 | + | |
| 185 | +def amazon_sheet(ws): | |
| 186 | + headers = [] | |
| 187 | + for c in range(1, ws.max_column + 1): | |
| 188 | + v = ws.cell(1, c).value | |
| 189 | + headers.append(clean_str(v)) | |
| 190 | + return {h: i + 1 for i, h in enumerate(headers) if h} | |
| 191 | + | |
| 192 | + | |
| 193 | +def read_amazon_rows_from_file(xlsx_path, max_rows=None): | |
| 194 | + wb = load_workbook(xlsx_path, read_only=True, data_only=True) | |
| 195 | + sheet_name = None | |
| 196 | + for name in wb.sheetnames: | |
| 197 | + if str(name).lower() == "notes": | |
| 198 | + continue | |
| 199 | + sheet_name = name | |
| 200 | + break | |
| 201 | + if sheet_name is None: | |
| 202 | + return [] | |
| 203 | + ws = wb[sheet_name] | |
| 204 | + idx = amazon_sheet(ws) | |
| 205 | + | |
| 206 | + required = ["ASIN", "父ASIN", "商品标题", "商品主图", "SKU", "详细参数", "价格($)", "prime价格($)", | |
| 207 | + "上架时间", "类目路径", "大类目", "小类目", "品牌", "品牌链接", "商品详情页链接", | |
| 208 | + "商品重量(单位换算)", "商品重量", "商品尺寸"] | |
| 209 | + for k in required: | |
| 210 | + if k not in idx: | |
| 211 | + raise RuntimeError("Missing column '{}' in {} sheet {}".format(k, xlsx_path, sheet_name)) | |
| 212 | + | |
| 213 | + rows = [] | |
| 214 | + end_row = ws.max_row | |
| 215 | + if max_rows is not None: | |
| 216 | + end_row = min(end_row, 1 + int(max_rows)) | |
| 217 | + | |
| 218 | + for r in range(2, end_row + 1): | |
| 219 | + asin = clean_str(ws.cell(r, idx["ASIN"]).value) | |
| 220 | + if not asin: | |
| 221 | + continue | |
| 222 | + parent = clean_str(ws.cell(r, idx["父ASIN"]).value) or asin | |
| 223 | + rows.append({ | |
| 224 | + "ASIN": asin, | |
| 225 | + "父ASIN": parent, | |
| 226 | + "SKU": clean_str(ws.cell(r, idx["SKU"]).value), | |
| 227 | + "详细参数": clean_str(ws.cell(r, idx["详细参数"]).value), | |
| 228 | + "商品标题": clean_str(ws.cell(r, idx["商品标题"]).value), | |
| 229 | + "商品主图": clean_str(ws.cell(r, idx["商品主图"]).value), | |
| 230 | + "价格($)": ws.cell(r, idx["价格($)"]).value, | |
| 231 | + "prime价格($)": ws.cell(r, idx["prime价格($)"]).value, | |
| 232 | + "上架时间": clean_str(ws.cell(r, idx["上架时间"]).value), | |
| 233 | + "类目路径": clean_str(ws.cell(r, idx["类目路径"]).value), | |
| 234 | + "大类目": clean_str(ws.cell(r, idx["大类目"]).value), | |
| 235 | + "小类目": clean_str(ws.cell(r, idx["小类目"]).value), | |
| 236 | + "品牌": clean_str(ws.cell(r, idx["品牌"]).value), | |
| 237 | + "品牌链接": clean_str(ws.cell(r, idx["品牌链接"]).value), | |
| 238 | + "商品详情页链接": clean_str(ws.cell(r, idx["商品详情页链接"]).value), | |
| 239 | + "商品重量(单位换算)": clean_str(ws.cell(r, idx["商品重量(单位换算)"]).value), | |
| 240 | + "商品重量": clean_str(ws.cell(r, idx["商品重量"]).value), | |
| 241 | + "商品尺寸": clean_str(ws.cell(r, idx["商品尺寸"]).value), | |
| 242 | + }) | |
| 243 | + return rows | |
| 244 | + | |
| 245 | + | |
| 246 | +def to_price(v): | |
| 247 | + if v is None or v == "": | |
| 248 | + return None | |
| 249 | + try: | |
| 250 | + return float(v) | |
| 251 | + except Exception: | |
| 252 | + s = clean_str(v) | |
| 253 | + m = re.search(r"([0-9]+(?:\.[0-9]+)?)", s) | |
| 254 | + return float(m.group(1)) if m else None | |
| 255 | + | |
| 256 | + | |
| 257 | +def build_common_fields(base_row, spu_id): | |
| 258 | + title = base_row.get("商品标题") or "Product" | |
| 259 | + brand = base_row.get("品牌") or "" | |
| 260 | + big_cat = base_row.get("大类目") or "" | |
| 261 | + small_cat = base_row.get("小类目") or "" | |
| 262 | + cat_path = base_row.get("类目路径") or "" | |
| 263 | + | |
| 264 | + handle = generate_handle(title) | |
| 265 | + if handle and not handle.startswith("products/"): | |
| 266 | + handle = "products/{}".format(handle) | |
| 267 | + | |
| 268 | + seo_title = title | |
| 269 | + seo_desc_parts = [x for x in [brand, title, big_cat] if x] | |
| 270 | + seo_description = " ".join(seo_desc_parts)[:5000] | |
| 271 | + seo_keywords = ",".join([x for x in [title, brand, big_cat, small_cat] if x])[:5000] | |
| 272 | + tags = ",".join([x for x in [brand, big_cat, small_cat] if x]) | |
| 273 | + | |
| 274 | + created_at = parse_date_to_template(base_row.get("上架时间")) | |
| 275 | + description = build_description_html(title, base_row.get("详细参数"), base_row.get("商品详情页链接")) | |
| 276 | + | |
| 277 | + inventory_qty = 100 | |
| 278 | + weight_val, weight_unit = parse_weight(base_row.get("商品重量(单位换算)"), base_row.get("商品重量")) | |
| 279 | + size_info = parse_dimensions_inches(base_row.get("商品尺寸")) | |
| 280 | + | |
| 281 | + album = big_cat or (cat_path.split(":")[0] if cat_path else "") | |
| 282 | + | |
| 283 | + return { | |
| 284 | + "商品ID": "", | |
| 285 | + "创建时间": created_at, | |
| 286 | + "商品标题*": title[:255], | |
| 287 | + "商品副标题": "{} {}".format(brand, big_cat).strip()[:600], | |
| 288 | + "商品描述": description, | |
| 289 | + "SEO标题": seo_title[:5000], | |
| 290 | + "SEO描述": seo_description, | |
| 291 | + "SEO URL Handle": handle, | |
| 292 | + "SEO URL 重定向": "N", | |
| 293 | + "SEO关键词": seo_keywords, | |
| 294 | + "商品上架": "Y", | |
| 295 | + "需要物流": "Y", | |
| 296 | + "商品收税": "N", | |
| 297 | + "商品spu": spu_id[:100], | |
| 298 | + "启用虚拟销量": "N", | |
| 299 | + "虚拟销量值": "", | |
| 300 | + "跟踪库存": "Y", | |
| 301 | + "库存规则*": "1", | |
| 302 | + "专辑名称": album, | |
| 303 | + "标签": tags, | |
| 304 | + "供应商名称": "Amazon", | |
| 305 | + "供应商URL": base_row.get("商品详情页链接") or base_row.get("品牌链接") or "", | |
| 306 | + "商品重量": weight_val if weight_val != "" else "", | |
| 307 | + "重量单位": weight_unit, | |
| 308 | + "商品库存": inventory_qty, | |
| 309 | + "尺寸信息": size_info, | |
| 310 | + "原产地国别": "", | |
| 311 | + "HS(协调制度)代码": "", | |
| 312 | + "商品备注": "ASIN:{}; ParentASIN:{}; CategoryPath:{}".format( | |
| 313 | + base_row.get("ASIN", ""), spu_id, (cat_path[:200] if cat_path else "") | |
| 314 | + )[:500], | |
| 315 | + "款式备注": "", | |
| 316 | + } | |
| 317 | + | |
| 318 | + | |
| 319 | +def build_s_row(base_row): | |
| 320 | + spu_id = base_row.get("父ASIN") or base_row.get("ASIN") | |
| 321 | + common = build_common_fields(base_row, spu_id=spu_id) | |
| 322 | + price = to_price(base_row.get("prime价格($)")) or to_price(base_row.get("价格($)")) or 9.99 | |
| 323 | + image = base_row.get("商品主图") or "" | |
| 324 | + row = {} | |
| 325 | + row.update(common) | |
| 326 | + row.update({ | |
| 327 | + "商品属性*": "S", | |
| 328 | + "款式1": "", | |
| 329 | + "款式2": "", | |
| 330 | + "款式3": "", | |
| 331 | + "商品售价*": price, | |
| 332 | + "商品原价": price, | |
| 333 | + "成本价": "", | |
| 334 | + "商品SKU": base_row.get("ASIN") or "", | |
| 335 | + "商品条形码": "", | |
| 336 | + "商品图片*": image, | |
| 337 | + "商品主图": image, | |
| 338 | + }) | |
| 339 | + return row | |
| 340 | + | |
| 341 | + | |
| 342 | +def build_m_p_rows(variant_rows): | |
| 343 | + base = variant_rows[0] | |
| 344 | + spu_id = base.get("父ASIN") or base.get("ASIN") | |
| 345 | + common = build_common_fields(base, spu_id=spu_id) | |
| 346 | + | |
| 347 | + option_dicts = [parse_sku_options(v.get("SKU")) for v in variant_rows] | |
| 348 | + option_keys = choose_option_keys(option_dicts, max_keys=3) or ["Variant"] | |
| 349 | + | |
| 350 | + m = {} | |
| 351 | + m.update(common) | |
| 352 | + m.update({ | |
| 353 | + "商品属性*": "M", | |
| 354 | + "款式1": option_keys[0] if len(option_keys) > 0 else "", | |
| 355 | + "款式2": option_keys[1] if len(option_keys) > 1 else "", | |
| 356 | + "款式3": option_keys[2] if len(option_keys) > 2 else "", | |
| 357 | + "商品售价*": "", | |
| 358 | + "商品原价": "", | |
| 359 | + "成本价": "", | |
| 360 | + "商品SKU": "", | |
| 361 | + "商品条形码": "", | |
| 362 | + "商品图片*": base.get("商品主图") or "", | |
| 363 | + "商品主图": base.get("商品主图") or "", | |
| 364 | + }) | |
| 365 | + m["商品重量"] = "" | |
| 366 | + m["重量单位"] = "" | |
| 367 | + m["商品库存"] = "" | |
| 368 | + m["尺寸信息"] = "" | |
| 369 | + | |
| 370 | + rows = [m] | |
| 371 | + | |
| 372 | + for v in variant_rows: | |
| 373 | + v_common = build_common_fields(v, spu_id=spu_id) | |
| 374 | + v_common.update({ | |
| 375 | + "商品副标题": "", | |
| 376 | + "商品描述": "", | |
| 377 | + "SEO标题": "", | |
| 378 | + "SEO描述": "", | |
| 379 | + "SEO URL Handle": "", | |
| 380 | + "SEO URL 重定向": "", | |
| 381 | + "SEO关键词": "", | |
| 382 | + "专辑名称": "", | |
| 383 | + "标签": "", | |
| 384 | + "供应商名称": "", | |
| 385 | + "供应商URL": "", | |
| 386 | + "商品备注": "", | |
| 387 | + }) | |
| 388 | + | |
| 389 | + opt = parse_sku_options(v.get("SKU")) | |
| 390 | + opt_vals = [v.get("ASIN")] if option_keys == ["Variant"] else [opt.get(k, "") for k in option_keys] | |
| 391 | + | |
| 392 | + price = to_price(v.get("prime价格($)")) or to_price(v.get("价格($)")) or 9.99 | |
| 393 | + image = v.get("商品主图") or "" | |
| 394 | + | |
| 395 | + p = {} | |
| 396 | + p.update(v_common) | |
| 397 | + p.update({ | |
| 398 | + "商品属性*": "P", | |
| 399 | + "款式1": opt_vals[0] if len(opt_vals) > 0 else "", | |
| 400 | + "款式2": opt_vals[1] if len(opt_vals) > 1 else "", | |
| 401 | + "款式3": opt_vals[2] if len(opt_vals) > 2 else "", | |
| 402 | + "商品售价*": price, | |
| 403 | + "商品原价": price, | |
| 404 | + "成本价": "", | |
| 405 | + "商品SKU": v.get("ASIN") or "", | |
| 406 | + "商品条形码": "", | |
| 407 | + "商品图片*": image, | |
| 408 | + "商品主图": "", | |
| 409 | + }) | |
| 410 | + rows.append(p) | |
| 411 | + | |
| 412 | + return rows | |
| 413 | + | |
| 414 | + | |
| 415 | +def main(): | |
| 416 | + parser = argparse.ArgumentParser(description="Convert Amazon-format xlsx files to Shoplazza import xlsx") | |
| 417 | + parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing Amazon-format xlsx files") | |
| 418 | + parser.add_argument("--template", default="docs/商品导入模板.xlsx", help="Shoplazza import template xlsx") | |
| 419 | + parser.add_argument("--output", default="amazon_shoplazza_import.xlsx", help="Output xlsx file path") | |
| 420 | + parser.add_argument("--max-files", type=int, default=None, help="Limit number of xlsx files to read (for testing)") | |
| 421 | + parser.add_argument("--max-rows-per-file", type=int, default=None, help="Limit rows per xlsx file (for testing)") | |
| 422 | + parser.add_argument("--max-products", type=int, default=None, help="Limit number of SPU groups to output (for testing)") | |
| 423 | + args = parser.parse_args() | |
| 424 | + | |
| 425 | + if not os.path.isdir(args.input_dir): | |
| 426 | + raise RuntimeError("input-dir not found: {}".format(args.input_dir)) | |
| 427 | + if not os.path.exists(args.template): | |
| 428 | + raise RuntimeError("template not found: {}".format(args.template)) | |
| 429 | + | |
| 430 | + files = [os.path.join(args.input_dir, f) for f in os.listdir(args.input_dir) if f.lower().endswith(".xlsx")] | |
| 431 | + files.sort() | |
| 432 | + if args.max_files is not None: | |
| 433 | + files = files[: int(args.max_files)] | |
| 434 | + | |
| 435 | + print("Reading Amazon-format files: {} (from {})".format(len(files), args.input_dir), flush=True) | |
| 436 | + | |
| 437 | + groups = defaultdict(list) | |
| 438 | + seen_asin = set() | |
| 439 | + | |
| 440 | + for fp in files: | |
| 441 | + print(" - loading: {}".format(fp), flush=True) | |
| 442 | + try: | |
| 443 | + rows = read_amazon_rows_from_file(fp, max_rows=args.max_rows_per_file) | |
| 444 | + except Exception as e: | |
| 445 | + print("WARN: failed to read {}: {}".format(fp, e)) | |
| 446 | + continue | |
| 447 | + print(" loaded rows: {}".format(len(rows)), flush=True) | |
| 448 | + | |
| 449 | + for r in rows: | |
| 450 | + asin = r.get("ASIN") | |
| 451 | + if asin in seen_asin: | |
| 452 | + continue | |
| 453 | + seen_asin.add(asin) | |
| 454 | + spu_id = r.get("父ASIN") or asin | |
| 455 | + groups[spu_id].append(r) | |
| 456 | + | |
| 457 | + print("Collected variants: {}, SPU groups: {}".format(len(seen_asin), len(groups)), flush=True) | |
| 458 | + | |
| 459 | + excel_rows = [] | |
| 460 | + spu_count = 0 | |
| 461 | + | |
| 462 | + for spu_id, variants in groups.items(): | |
| 463 | + if not variants: | |
| 464 | + continue | |
| 465 | + spu_count += 1 | |
| 466 | + if args.max_products is not None and spu_count > int(args.max_products): | |
| 467 | + break | |
| 468 | + if len(variants) == 1: | |
| 469 | + excel_rows.append(build_s_row(variants[0])) | |
| 470 | + else: | |
| 471 | + excel_rows.extend(build_m_p_rows(variants)) | |
| 472 | + | |
| 473 | + print("Generated Excel rows: {} (SPU groups output: {})".format(len(excel_rows), len(groups) if args.max_products is None else min(int(args.max_products), len(groups))), flush=True) | |
| 474 | + create_excel_from_template(args.template, args.output, excel_rows) | |
| 475 | + | |
| 476 | + | |
| 477 | +if __name__ == "__main__": | |
| 478 | + main() | |
| 479 | + | |
| 480 | + | ... | ... |
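The weight, dimension, and price parsers in this script depend on regex escapes written inside raw strings. The sketch below shows the difference between a single and a doubled backslash in that setting: in a raw string, `\s` and `\.` are the regex escapes, while `\\` matches a literal backslash character. The names `SINGLE` and `DOUBLED` are illustrative, and this `parse_weight` is a simplified stand-in that omits the script's unit normalization.

```python
import re

# In a raw string, \. is a literal dot and \s is whitespace.
SINGLE = r"([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)"
# Doubling the backslash makes the regex look for a literal backslash instead.
DOUBLED = r"([0-9]+(?:\\.[0-9]+)?)\\s*([a-zA-Z]+)"


def parse_weight(text, pattern=SINGLE):
    """Return (value, unit_text), or None when the pattern does not match."""
    m = re.search(pattern, text)
    if not m:
        return None
    return float(m.group(1)), m.group(2).lower()


single_result = parse_weight("68.04 g")            # number and unit are captured
doubled_result = parse_weight("68.04 g", DOUBLED)  # no backslash in the input, so no match
dims = re.findall(r"[0-9]+(?:\.[0-9]+)?", "7.9 x 7.9 x 2 inches")

print(single_result, doubled_result, dims)
```

With the doubled form, a string such as `68.04 g` cannot match at all, so the weight columns would silently stay empty. A quick check of this kind on a few sample cells may be worthwhile before running a full import.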
scripts/competitor_xlsx_to_shoplazza_xlsx.py
| 1 | 1 | #!/usr/bin/env python3 |
| 2 | 2 | """ |
| 3 | -Convert competitor Excel exports (with Parent/Child ASIN structure) into | |
| 4 | -Shoplazza (店匠) product import Excel format based on `docs/商品导入模板.xlsx`. | |
| 5 | - | |
| 6 | -Data source: | |
| 7 | -- Directory with multiple `Competitor-*.xlsx` files. | |
| 8 | -- Each file contains a main sheet + "Notes" sheet. | |
| 9 | -- Column meanings (sample): | |
| 10 | - - ASIN: variant id (sku_id) | |
| 11 | - - 父ASIN: product id (spu_id) | |
| 12 | - | |
| 13 | -Output: | |
| 14 | -- For each 父ASIN group: | |
| 15 | - - If only 1 ASIN: generate one "S" row | |
| 16 | - - Else: generate one "M" row + multiple "P" rows | |
| 17 | - | |
| 18 | -Important: | |
| 19 | -- Variant dimensions are parsed primarily from the `SKU` column: | |
| 20 | - "Size: One Size | Color: Black" | |
| 21 | - and mapped into 款式1/2/3. | |
| 3 | +DEPRECATED NAME (kept for backward compatibility). | |
| 4 | + | |
| 5 | +The input `products_data/*.xlsx` files are **Amazon-format exports** (with Parent/Child ASIN), | |
| 6 | +not “competitor data”. Please use: | |
| 7 | + | |
| 8 | + - `scripts/amazon_xlsx_to_shoplazza_xlsx.py` | |
| 9 | + | |
| 10 | +This script keeps the same logic but updates user-facing naming gradually. | |
| 22 | 11 | """ |
| 23 | 12 | |
| 24 | 13 | import os |
| ... | ... | @@ -457,10 +446,10 @@ def build_m_p_rows(variant_rows): |
| 457 | 446 | |
| 458 | 447 | |
| 459 | 448 | def main(): |
| 460 | - parser = argparse.ArgumentParser(description="Convert competitor xlsx files to Shoplazza import xlsx") | |
| 461 | - parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing competitor xlsx files") | |
| 449 | + parser = argparse.ArgumentParser(description="Convert Amazon-format xlsx files to Shoplazza import xlsx (deprecated script name)") | |
| 450 | + parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing Amazon-format xlsx files") | |
| 462 | 451 | parser.add_argument("--template", default="docs/商品导入模板.xlsx", help="Shoplazza import template xlsx") |
| 463 | - parser.add_argument("--output", default="competitor_shoplazza_import.xlsx", help="Output xlsx file path") | |
| 452 | + parser.add_argument("--output", default="amazon_shoplazza_import.xlsx", help="Output xlsx file path") | |
| 464 | 453 | parser.add_argument("--max-files", type=int, default=None, help="Limit number of xlsx files to read (for testing)") |
| 465 | 454 | parser.add_argument("--max-rows-per-file", type=int, default=None, help="Limit rows per xlsx file (for testing)") |
| 466 | 455 | parser.add_argument("--max-products", type=int, default=None, help="Limit number of SPU groups to output (for testing)") |
| ... | ... | @@ -477,7 +466,7 @@ def main(): |
| 477 | 466 | if args.max_files is not None: |
| 478 | 467 | files = files[: int(args.max_files)] |
| 479 | 468 | |
| 480 | - print("Reading competitor files: {} (from {})".format(len(files), input_dir), flush=True) | |
| 469 | + print("Reading Amazon-format files: {} (from {})".format(len(files), input_dir), flush=True) | |
| 481 | 470 | |
| 482 | 471 | groups = defaultdict(list) # spu_id -> [variant rows] |
| 483 | 472 | seen_asin = set() | ... | ... |