Commit cd29428b86b6a3c1f2c6b3663c57e750decf660c

Authored by tangwang
1 parent f3c11fef

Import Amazon data into a Shoplazza store - data processing

docs/亚马逊格式数据转店匠商品导入模板.md 0 → 100644
... ... @@ -0,0 +1,136 @@
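  1 +## Amazon-format data → Shoplazza (店匠) product import template: conversion guide
  2 +
  3 +This repository converts `data/mai_jia_jing_ling/products_data/*.xlsx` (**Amazon-format exports**) into the `docs/商品导入模板.xlsx` format that the Shoplazza admin can import.
  4 +
  5 +Scripts involved:
  6 +- **Main entry point**: `scripts/amazon_xlsx_to_shoplazza_xlsx.py`
  7 +- **Legacy alias**: `scripts/competitor_xlsx_to_shoplazza_xlsx.py` (only the name is outdated; the logic is the same)
  8 +- **Shared template writer**: `scripts/shoplazza_excel_template.py`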
  10 +---
  11 +
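  12 +## 1. Key fields in the input data (Amazon-format xlsx)
  13 +
  14 +Using `Competitor-US-Last-30-days-363464.xlsx` as an example (the file name does not matter; the content uses Amazon-level fields):
  15 +
  16 +- **ASIN**: variant id (treated as `sku_id`; written to the template column `商品SKU`)
  17 +- **父ASIN**: parent product id (treated as `spu_id`/`product_id`; written to the template column `商品spu` and used to group rows into M/P)
  18 +- **商品标题**: product title (written to `商品标题*`, the SEO title, etc.)
  19 +- **SKU**: Amazon variant description string (the key input: the variant dimensions are parsed from it)
  20 +  - Example: `Size: One Size | Color: Black`
  21 +- **商品主图**: image URL (used for `商品图片*` / `商品主图`)
  22 +- **价格($)** / **prime价格($)**: prices (used for `商品售价*` / `商品原价`)
  23 +- **详细参数**: detail-parameter string (used to build `商品描述`)
  24 +- **上架时间**: used for `创建时间`
  25 +- **类目路径/大类目/小类目/品牌/商品详情页链接/品牌链接**: used for the collection, tags, SEO, supplier URL, notes, etc.
  26 +- **商品重量(单位换算)/商品重量/商品尺寸**: used for `商品重量/重量单位/尺寸信息`
  27 +
  28 +> Note: this data source usually has **no inventory data**, so the script assigns each variant a fixed stock quantity (currently 100) so that the products are sellable after import.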
  29 +
  30 +---
  31 +
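  32 +## 2. Core rules for the output data (Shoplazza import template): M / P / S
  33 +
  34 +`docs/商品导入模板说明.md` defines three product types for the template (column `商品属性*`):
  35 +
  36 +- **S (single variant)**: the product has exactly one variant (only 1 ASIN)
  37 +  - Output: **1 row**
  38 +- **M (main product) + P (sub-variant)**: one parent product (父ASIN) contains several variants (multiple ASINs)
  39 +  - Output: **1 M row + N P rows**
  40 +  - **The P rows of a product must come immediately after its M row** (a hard constraint of the template import)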
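The ordering constraint above can be checked before uploading. The following is a minimal sketch, not part of the commit: `check_row_order` is a hypothetical helper, and it assumes each output row is a dict keyed by the template columns `商品属性*` and `商品spu`, with P rows carrying the same `商品spu` as their M row.

```python
def check_row_order(rows):
    """Return a list of problems; empty when every P row follows its own M row."""
    problems = []
    current_spu = None  # spu of the M group we are currently inside, if any
    for i, row in enumerate(rows):
        kind = row.get("商品属性*")
        spu = row.get("商品spu")
        if kind == "M":
            current_spu = spu
        elif kind == "P":
            if current_spu is None or spu != current_spu:
                problems.append("row {}: P row for {} does not follow its M row".format(i, spu))
        else:
            # An S row (or anything else) closes the open M group.
            current_spu = None
    return problems


good = [
    {"商品属性*": "M", "商品spu": "P1"},
    {"商品属性*": "P", "商品spu": "P1"},
    {"商品属性*": "S", "商品spu": "B1"},
]
bad = [
    {"商品属性*": "M", "商品spu": "P1"},
    {"商品属性*": "S", "商品spu": "B1"},
    {"商品属性*": "P", "商品spu": "P1"},
]
print(check_row_order(good))
print(check_row_order(bad))
```

Running a check like this on the generated rows is cheaper than discovering the problem through a failed import.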
  41 +
  42 +Conversion strategy used in this repository:
  43 +- Group rows by `父ASIN`:
  44 +  - **group size = 1** → emit `S`
  45 +  - **group size > 1** → emit `M` + multiple `P` rows
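The grouping rule can be sketched roughly as follows. This is an illustration rather than the script itself: `classify_groups` and the sample rows are hypothetical, and only the `ASIN` / `父ASIN` keys come from the input format.

```python
from collections import defaultdict


def classify_groups(rows):
    """Group variant rows by 父ASIN and describe the rows each group would emit."""
    groups = defaultdict(list)
    for row in rows:
        # Fall back to the ASIN itself when the parent id is missing.
        spu_id = row.get("父ASIN") or row["ASIN"]
        groups[spu_id].append(row)

    layout = {}
    for spu_id, variants in groups.items():
        if len(variants) == 1:
            layout[spu_id] = ["S"]
        else:
            # One M row first, then one P row per variant, kept contiguous.
            layout[spu_id] = ["M"] + ["P"] * len(variants)
    return layout


sample = [
    {"ASIN": "A1", "父ASIN": "P1"},
    {"ASIN": "A2", "父ASIN": "P1"},
    {"ASIN": "B1", "父ASIN": ""},
]
print(classify_groups(sample))
```

Because each group's rows are emitted together, the "P rows directly after the M row" constraint holds by construction.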
  46 +
  47 +---
  48 +
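  49 +## 3. How multi-variant products are constructed (the most important part)
  50 +
  51 +### 1) Why the "SKU" column is the key
  52 +
  53 +In the Amazon format, variant attributes such as color and size are usually not split into separate columns; they are packed into the `SKU` string, for example:
  54 +
  55 +- `Size: One Size | Color: Black`
  56 +- `Color: Red | Style: 2-Pack`
  57 +
  58 +For multi-variant products, the Shoplazza template requires:
  59 +- **M row**: `款式1/款式2/款式3` hold the dimension names (e.g. Size / Color / Material)
  60 +- **P rows**: `款式1/款式2/款式3` hold the dimension values (e.g. One Size / Black / Cotton)
  61 +
  62 +### 2) How the script parses dimensions (key/value) from SKU
  63 +
  64 +The script splits `SKU` on `|`, then splits each part on the first `:` into a key/value pair:
  65 +
  66 +- Input: `Size: One Size | Color: Black`
  67 +- Parsed result: `{ "Size": "One Size", "Color": "Black" }`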
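The split-then-split parsing described here can be sketched as below. It mirrors the behaviour of the script's `parse_sku_options` in condensed form; treat it as an illustration rather than the committed implementation.

```python
def parse_sku_options(sku_text):
    """Parse an Amazon-style SKU description string into a {key: value} dict."""
    options = {}
    for part in (sku_text or "").split("|"):
        if ":" not in part:
            continue  # skip fragments that are not key/value pairs
        key, value = part.split(":", 1)  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key and value:
            options[key] = value
    return options


print(parse_sku_options("Size: One Size | Color: Black"))
```

Splitting on the first colon only keeps values that themselves contain a colon (for example an aspect ratio such as `16:9`) intact.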
  68 +
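  69 +### 3) How at most 3 dimensions are chosen across the variants
  70 +
  71 +The Shoplazza template only provides three dimension columns (`款式1` to `款式3`), so within each `父ASIN` group the script counts how often each key occurs and picks at most 3 dimensions, ordered by priority first and frequency second:
  72 +
  73 +- Priority order is roughly: `Size`, `Color`, `Style`, `Pattern`, `Material`, ...
  74 +- If no key/value can be parsed in a group, the script falls back to a single dimension, `Variant`:
  75 +  - M row: `款式1 = Variant`
  76 +  - P row: `款式1 = ASIN`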
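A compact sketch of this selection step follows. The `PRIORITY` list is shortened for illustration, and the `Variant` fallback is folded into the function here, whereas the script applies it in the caller.

```python
from collections import Counter

# Shortened priority list for illustration; the script's list is longer.
PRIORITY = ["Size", "Color", "Style", "Pattern", "Material"]


def choose_option_keys(variant_options, max_keys=3):
    """Pick up to max_keys dimension names from a group's parsed SKU dicts."""
    freq = Counter(k for opts in variant_options for k, v in opts.items() if v)
    if not freq:
        return ["Variant"]  # nothing parseable: single fallback dimension
    rank = {k: i for i, k in enumerate(PRIORITY)}
    # Known keys first (by priority), then by descending frequency, then by name.
    ordered = sorted(freq, key=lambda k: (rank.get(k, len(PRIORITY)), -freq[k], k.lower()))
    return ordered[:max_keys]


group = [
    {"Color": "Red", "Style": "2-Pack", "Edition": "A", "Size": "S"},
    {"Color": "Blue", "Size": "M"},
]
print(choose_option_keys(group))
```

Keys outside the priority list are only used when fewer than three known keys are present, which keeps the chosen columns stable across product groups.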
  77 +
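  78 +### 4) What the M row and the P rows contain (to avoid import errors)
  79 +
  80 +Following the template documentation, the script splits fields as follows:
  81 +
  82 +- **M row (main product)**:
  83 +  - Filled: title / description / SEO / collection / tags / main image / dimension names
  84 +  - Left empty: price, inventory, weight and other SKU-level fields (leaving them blank is safer)
  85 +- **P rows (sub-variants)**:
  86 +  - Filled: dimension values, price, 商品SKU (ASIN), inventory, weight, dimensions, and (optionally) a variant image
  87 +  - Left empty: description / SEO / collection / supplier and other SPU-level fields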
  88 +
  89 +---
  90 +
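  91 +## 4. Field mapping overview (most-used fields)
  92 +
  93 +- **商品spu** ← `父ASIN` (falls back to ASIN when there is no 父ASIN)
  94 +- **商品SKU** ← `ASIN`
  95 +- **商品标题\*** ← `商品标题`
  96 +- **商品图片\*** / **商品主图** ← `商品主图`
  97 +- **商品售价\*** ← `prime价格($)` if present, otherwise `价格($)`; if neither yields a price, the script falls back to a placeholder of 9.99
  98 +- **创建时间** ← `上架时间` (date-only values are padded to `YYYY-MM-DD 00:00:00`)
  99 +- **商品描述** ← `商品标题` + `详细参数` + a source link to `商品详情页链接` (assembled as HTML)
  100 +- **专辑名称** ← `大类目` (otherwise the first segment of `类目路径`)
  101 +- **标签** ← `品牌,大类目,小类目`
  102 +- **商品重量/重量单位** ← parsed from `商品重量(单位换算)` first (e.g. `68.04 g`), otherwise from `商品重量`
  103 +- **尺寸信息** ← the first three numbers of `商品尺寸` (inches), joined as `L,W,H`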
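The weight and dimension mappings rely on small regular expressions. The sketch below shows one way to write them; the `_NUMBER` and `_UNITS` names are illustrative, and `parse_weight` is simplified to take a single string. Note that inside a raw string the escapes are written with a single backslash (`\.`, `\s`).

```python
import re

_NUMBER = r"[0-9]+(?:\.[0-9]+)?"  # integer or decimal, e.g. 2 or 68.04
_UNITS = {
    "g": "g", "gram": "g", "grams": "g",
    "kg": "kg", "kilogram": "kg", "kilograms": "kg",
    "lb": "lb", "lbs": "lb", "pound": "lb", "pounds": "lb",
    "oz": "oz", "ounce": "oz", "ounces": "oz",
}


def parse_weight(text):
    """Return (value, unit) with unit in {g, kg, lb, oz}, or ("", "") if unparseable."""
    m = re.search(r"(" + _NUMBER + r")\s*([a-zA-Z]+)", text or "")
    if not m:
        return ("", "")
    unit = _UNITS.get(m.group(2).lower())
    if unit is None:
        return ("", "")
    return (float(m.group(1)), unit)


def parse_dimensions_inches(text):
    """Join the first three numbers of e.g. '7.9 x 7.9 x 2 inches' as 'L,W,H'."""
    nums = re.findall(_NUMBER, text or "")
    if len(nums) < 3:
        return ""
    return ",".join(nums[:3])


print(parse_weight("68.04 g"))
print(parse_dimensions_inches("7.9 x 7.9 x 2 inches"))
```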
  104 +
  105 +---
  106 +
  107 +## 5. How to run (generate the import file)
  108 +
  109 +### 1) Validate on a small batch first (recommended)
  110 +
  111 +```bash
  112 +python scripts/amazon_xlsx_to_shoplazza_xlsx.py \
  113 + --input-dir data/mai_jia_jing_ling/products_data \
  114 + --template docs/商品导入模板.xlsx \
  115 + --output data/mai_jia_jing_ling/amazon_shoplazza_import_SAMPLE.xlsx \
  116 + --max-files 1 --max-rows-per-file 2000 --max-products 50
  117 +```
  118 +
  119 +### 2) Generate the full output
  120 +
  121 +```bash
  122 +python scripts/amazon_xlsx_to_shoplazza_xlsx.py \
  123 + --input-dir data/mai_jia_jing_ling/products_data \
  124 + --template docs/商品导入模板.xlsx \
  125 + --output data/mai_jia_jing_ling/amazon_shoplazza_import_ALL.xlsx
  126 +```
  127 +
  128 +---
  129 +
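  130 +## 6. Extension points (common follow-up requests)
  131 +
  132 +- **Parameterize the inventory / publish / tax policy**: these are currently script defaults (`商品上架` = Y, `商品收税` = N, inventory = 100) and could become command-line arguments to match the target store's rules.
  133 +- **More robust variant parsing**: if the Amazon-format `SKU` column turns out to be irregular, `Color/Size` could additionally be extracted from `详细参数`.
  134 +- **Image strategy**: P rows currently use their own `商品主图`; an alternative is to merge multiple images into the M row (comma-separated).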
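As an illustration of the first extension point, the hard-coded defaults could be exposed roughly as follows. The flag names `--default-inventory`, `--published` and `--taxable` are hypothetical and do not exist in the current script; only the template column names and the default values (100 / Y / N) come from the code.

```python
import argparse


def build_parser():
    # Hypothetical flags; the committed script hard-codes these values.
    parser = argparse.ArgumentParser(description="Policy flags sketch (not in the current script)")
    parser.add_argument("--default-inventory", type=int, default=100)
    parser.add_argument("--published", choices=["Y", "N"], default="Y")
    parser.add_argument("--taxable", choices=["Y", "N"], default="N")
    return parser


def policy_fields(args):
    """Map parsed flags onto the template columns they would control."""
    return {
        "商品库存": args.default_inventory,
        "商品上架": args.published,
        "商品收税": args.taxable,
    }


args = build_parser().parse_args(["--default-inventory", "20", "--taxable", "Y"])
print(policy_fields(args))
```

Keeping the defaults identical to the current constants would leave existing invocations unchanged.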
  135 +
  136 +
... ...
scripts/amazon_xlsx_to_shoplazza_xlsx.py 0 → 100644
... ... @@ -0,0 +1,480 @@
  1 +#!/usr/bin/env python3
  2 +"""
  3 +Convert Amazon-format Excel exports (with Parent/Child ASIN structure) into
  4 +Shoplazza (店匠) product import Excel format based on `docs/商品导入模板.xlsx`.
  5 +
  6 +Data source:
  7 +- Directory with multiple `*.xlsx` files under `products_data/`.
  8 +- Each file contains a main sheet + "Notes" sheet.
  9 +- Column meanings (sample):
  10 + - ASIN: variant id (sku_id)
  11 + - 父ASIN: parent product id (spu_id)
  12 +
  13 +Output:
  14 +- For each 父ASIN group:
  15 + - If only 1 ASIN: generate one "S" row
  16 + - Else: generate one "M" row + multiple "P" rows
  17 +
  18 +Multi-variant (M/P) key point:
  19 +- Variant dimensions are parsed primarily from the `SKU` column, e.g.
  20 + "Size: One Size | Color: Black", and mapped into 款式1/2/3.
  21 +"""
  22 +
  23 +# NOTE: This file is intentionally the same implementation as
  24 +# `competitor_xlsx_to_shoplazza_xlsx.py`, but renamed to reflect the correct
  25 +# data source (Amazon-format exports). Keep the logic in sync.
  26 +
  27 +import os
  28 +import re
  29 +import sys
  30 +import argparse
  31 +from datetime import datetime
  32 +from collections import defaultdict, Counter
  33 +from pathlib import Path
  34 +
  35 +from openpyxl import load_workbook
  36 +
  37 +# Allow running as `python scripts/xxx.py` without installing as a package
  38 +sys.path.insert(0, str(Path(__file__).resolve().parent))
  39 +from shoplazza_excel_template import create_excel_from_template
  40 +
  41 +
  42 +PREFERRED_OPTION_KEYS = [
  43 +    "Size", "Color", "Style", "Pattern", "Material", "Flavor", "Scent",
  44 +    "Pack", "Pack of", "Number of Items", "Count", "Capacity", "Length",
  45 +    "Width", "Height", "Model", "Configuration",
  46 +]
  47 +
  48 +
  49 +def clean_str(v):
  50 +    if v is None:
  51 +        return ""
  52 +    return str(v).strip()
  53 +
  54 +
  55 +def html_escape(s):
  56 +    s = clean_str(s)
  57 +    return (s.replace("&", "&amp;")
  58 +            .replace("<", "&lt;")
  59 +            .replace(">", "&gt;"))
  60 +
  61 +
  62 +def generate_handle(title):
  63 +    """
  64 +    Generate URL-friendly handle from title (ASCII only).
  65 +    Keep consistent with existing scripts.
  66 +    """
  67 +    handle = clean_str(title).lower()
  68 +    handle = re.sub(r"[^a-z0-9\s-]", "", handle)
  69 +    handle = re.sub(r"[-\s]+", "-", handle).strip("-")
  70 +    if len(handle) > 255:
  71 +        handle = handle[:255]
  72 +    return handle or "product"
  73 +
  74 +
  75 +def parse_date_to_template(dt_value):
  76 +    """
  77 +    Template expects: YYYY-MM-DD HH:MM:SS
  78 +    Input could be "2018-05-09" or datetime/date.
  79 +    """
  80 +    if dt_value is None or dt_value == "":
  81 +        return ""
  82 +    if isinstance(dt_value, datetime):
  83 +        return dt_value.strftime("%Y-%m-%d %H:%M:%S")
  84 +    s = clean_str(dt_value)
  85 +    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S"):
  86 +        try:
  87 +            d = datetime.strptime(s, fmt)
  88 +            return d.strftime("%Y-%m-%d %H:%M:%S")
  89 +        except Exception:
  90 +            pass
  91 +    return ""
  92 +
  93 +
  94 +def parse_weight(weight_conv, weight_raw):
  95 +    """
  96 +    Return (weight_value, unit) where unit in {kg, lb, g, oz}.
  97 +    Prefer '商品重量(单位换算)' like '68.04 g'.
  98 +    Fallback to '商品重量' like '0.15 pounds'.
  99 +    """
  100 +    s = clean_str(weight_conv) or clean_str(weight_raw)
  101 +    if not s:
  102 +        return ("", "")
  103 +    m = re.search(r"([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)", s)
  104 +    if not m:
  105 +        return ("", "")
  106 +    val = float(m.group(1))
  107 +    unit = m.group(2).lower()
  108 +    if unit in ("g", "gram", "grams"):
  109 +        return (val, "g")
  110 +    if unit in ("kg", "kilogram", "kilograms"):
  111 +        return (val, "kg")
  112 +    if unit in ("lb", "lbs", "pound", "pounds"):
  113 +        return (val, "lb")
  114 +    if unit in ("oz", "ounce", "ounces"):
  115 +        return (val, "oz")
  116 +    return ("", "")
  117 +
  118 +
  119 +def parse_dimensions_inches(dim_raw):
  120 +    """
  121 +    Template '尺寸信息': 'L,W,H' in inches.
  122 +    Input example: '7.9 x 7.9 x 2 inches'
  123 +    """
  124 +    s = clean_str(dim_raw)
  125 +    if not s:
  126 +        return ""
  127 +    nums = re.findall(r"([0-9]+(?:\.[0-9]+)?)", s)
  128 +    if len(nums) < 3:
  129 +        return ""
  130 +    return "{},{},{}".format(nums[0], nums[1], nums[2])
  131 +
  132 +
  133 +def parse_sku_options(sku_text):
  134 +    """
  135 +    Parse 'SKU' column into {key: value}.
  136 +    Example:
  137 +      'Size: One Size | Color: Black' -> {'Size':'One Size','Color':'Black'}
  138 +    """
  139 +    s = clean_str(sku_text)
  140 +    if not s:
  141 +        return {}
  142 +    parts = [p.strip() for p in s.split("|") if p.strip()]
  143 +    out = {}
  144 +    for p in parts:
  145 +        if ":" not in p:
  146 +            continue
  147 +        k, v = p.split(":", 1)
  148 +        k = clean_str(k)
  149 +        v = clean_str(v)
  150 +        if k and v:
  151 +            out[k] = v
  152 +    return out
  153 +
  154 +
  155 +def choose_option_keys(variant_dicts, max_keys=3):
  156 +    freq = Counter()
  157 +    for d in variant_dicts:
  158 +        for k, v in d.items():
  159 +            if v:
  160 +                freq[k] += 1
  161 +    if not freq:
  162 +        return []
  163 +    preferred_rank = {k: i for i, k in enumerate(PREFERRED_OPTION_KEYS)}
  164 +
  165 +    def key_sort(k):
  166 +        return (preferred_rank.get(k, 10 ** 6), -freq[k], k.lower())
  167 +
  168 +    keys = sorted(freq.keys(), key=key_sort)
  169 +    return keys[:max_keys]
  170 +
  171 +
  172 +def build_description_html(title, details, product_url):
  173 +    parts = []
  174 +    if title:
  175 +        parts.append("<p>{}</p>".format(html_escape(title)))
  176 +    detail_items = [x.strip() for x in clean_str(details).split("|") if x.strip()]
  177 +    if detail_items:
  178 +        li = "".join(["<li>{}</li>".format(html_escape(x)) for x in detail_items[:30]])
  179 +        parts.append("<ul>{}</ul>".format(li))
  180 +    if product_url:
  181 +        parts.append('<p>Source: <a href="{0}">{0}</a></p>'.format(html_escape(product_url)))
  182 +    return "".join(parts)
  183 +
  184 +
  185 +def amazon_sheet(ws):
  186 +    headers = []
  187 +    for c in range(1, ws.max_column + 1):
  188 +        v = ws.cell(1, c).value
  189 +        headers.append(clean_str(v))
  190 +    return {h: i + 1 for i, h in enumerate(headers) if h}
  191 +
  192 +
  193 +def read_amazon_rows_from_file(xlsx_path, max_rows=None):
  194 +    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
  195 +    sheet_name = None
  196 +    for name in wb.sheetnames:
  197 +        if str(name).lower() == "notes":
  198 +            continue
  199 +        sheet_name = name
  200 +        break
  201 +    if sheet_name is None:
  202 +        return []
  203 +    ws = wb[sheet_name]
  204 +    idx = amazon_sheet(ws)
  205 +
  206 +    required = ["ASIN", "父ASIN", "商品标题", "商品主图", "SKU", "详细参数", "价格($)", "prime价格($)",
  207 +                "上架时间", "类目路径", "大类目", "小类目", "品牌", "品牌链接", "商品详情页链接",
  208 +                "商品重量(单位换算)", "商品重量", "商品尺寸"]
  209 +    for k in required:
  210 +        if k not in idx:
  211 +            raise RuntimeError("Missing column '{}' in {} sheet {}".format(k, xlsx_path, sheet_name))
  212 +
  213 +    rows = []
  214 +    end_row = ws.max_row
  215 +    if max_rows is not None:
  216 +        end_row = min(end_row, 1 + int(max_rows))
  217 +
  218 +    for r in range(2, end_row + 1):
  219 +        asin = clean_str(ws.cell(r, idx["ASIN"]).value)
  220 +        if not asin:
  221 +            continue
  222 +        parent = clean_str(ws.cell(r, idx["父ASIN"]).value) or asin
  223 +        rows.append({
  224 +            "ASIN": asin,
  225 +            "父ASIN": parent,
  226 +            "SKU": clean_str(ws.cell(r, idx["SKU"]).value),
  227 +            "详细参数": clean_str(ws.cell(r, idx["详细参数"]).value),
  228 +            "商品标题": clean_str(ws.cell(r, idx["商品标题"]).value),
  229 +            "商品主图": clean_str(ws.cell(r, idx["商品主图"]).value),
  230 +            "价格($)": ws.cell(r, idx["价格($)"]).value,
  231 +            "prime价格($)": ws.cell(r, idx["prime价格($)"]).value,
  232 +            "上架时间": clean_str(ws.cell(r, idx["上架时间"]).value),
  233 +            "类目路径": clean_str(ws.cell(r, idx["类目路径"]).value),
  234 +            "大类目": clean_str(ws.cell(r, idx["大类目"]).value),
  235 +            "小类目": clean_str(ws.cell(r, idx["小类目"]).value),
  236 +            "品牌": clean_str(ws.cell(r, idx["品牌"]).value),
  237 +            "品牌链接": clean_str(ws.cell(r, idx["品牌链接"]).value),
  238 +            "商品详情页链接": clean_str(ws.cell(r, idx["商品详情页链接"]).value),
  239 +            "商品重量(单位换算)": clean_str(ws.cell(r, idx["商品重量(单位换算)"]).value),
  240 +            "商品重量": clean_str(ws.cell(r, idx["商品重量"]).value),
  241 +            "商品尺寸": clean_str(ws.cell(r, idx["商品尺寸"]).value),
  242 +        })
  243 +    return rows
  244 +
  245 +
  246 +def to_price(v):
  247 +    if v is None or v == "":
  248 +        return None
  249 +    try:
  250 +        return float(v)
  251 +    except Exception:
  252 +        s = clean_str(v)
  253 +        m = re.search(r"([0-9]+(?:\.[0-9]+)?)", s)
  254 +        return float(m.group(1)) if m else None
  255 +
  256 +
  257 +def build_common_fields(base_row, spu_id):
  258 +    title = base_row.get("商品标题") or "Product"
  259 +    brand = base_row.get("品牌") or ""
  260 +    big_cat = base_row.get("大类目") or ""
  261 +    small_cat = base_row.get("小类目") or ""
  262 +    cat_path = base_row.get("类目路径") or ""
  263 +
  264 +    handle = generate_handle(title)
  265 +    if handle and not handle.startswith("products/"):
  266 +        handle = "products/{}".format(handle)
  267 +
  268 +    seo_title = title
  269 +    seo_desc_parts = [x for x in [brand, title, big_cat] if x]
  270 +    seo_description = " ".join(seo_desc_parts)[:5000]
  271 +    seo_keywords = ",".join([x for x in [title, brand, big_cat, small_cat] if x])[:5000]
  272 +    tags = ",".join([x for x in [brand, big_cat, small_cat] if x])
  273 +
  274 +    created_at = parse_date_to_template(base_row.get("上架时间"))
  275 +    description = build_description_html(title, base_row.get("详细参数"), base_row.get("商品详情页链接"))
  276 +
  277 +    inventory_qty = 100
  278 +    weight_val, weight_unit = parse_weight(base_row.get("商品重量(单位换算)"), base_row.get("商品重量"))
  279 +    size_info = parse_dimensions_inches(base_row.get("商品尺寸"))
  280 +
  281 +    album = big_cat or (cat_path.split(":")[0] if cat_path else "")
  282 +
  283 +    return {
  284 +        "商品ID": "",
  285 +        "创建时间": created_at,
  286 +        "商品标题*": title[:255],
  287 +        "商品副标题": "{} {}".format(brand, big_cat).strip()[:600],
  288 +        "商品描述": description,
  289 +        "SEO标题": seo_title[:5000],
  290 +        "SEO描述": seo_description,
  291 +        "SEO URL Handle": handle,
  292 +        "SEO URL 重定向": "N",
  293 +        "SEO关键词": seo_keywords,
  294 +        "商品上架": "Y",
  295 +        "需要物流": "Y",
  296 +        "商品收税": "N",
  297 +        "商品spu": spu_id[:100],
  298 +        "启用虚拟销量": "N",
  299 +        "虚拟销量值": "",
  300 +        "跟踪库存": "Y",
  301 +        "库存规则*": "1",
  302 +        "专辑名称": album,
  303 +        "标签": tags,
  304 +        "供应商名称": "Amazon",
  305 +        "供应商URL": base_row.get("商品详情页链接") or base_row.get("品牌链接") or "",
  306 +        "商品重量": weight_val if weight_val != "" else "",
  307 +        "重量单位": weight_unit,
  308 +        "商品库存": inventory_qty,
  309 +        "尺寸信息": size_info,
  310 +        "原产地国别": "",
  311 +        "HS(协调制度)代码": "",
  312 +        "商品备注": "ASIN:{}; ParentASIN:{}; CategoryPath:{}".format(
  313 +            base_row.get("ASIN", ""), spu_id, (cat_path[:200] if cat_path else "")
  314 +        )[:500],
  315 +        "款式备注": "",
  316 +    }
  317 +
  318 +
  319 +def build_s_row(base_row):
  320 +    spu_id = base_row.get("父ASIN") or base_row.get("ASIN")
  321 +    common = build_common_fields(base_row, spu_id=spu_id)
  322 +    price = to_price(base_row.get("prime价格($)")) or to_price(base_row.get("价格($)")) or 9.99
  323 +    image = base_row.get("商品主图") or ""
  324 +    row = {}
  325 +    row.update(common)
  326 +    row.update({
  327 +        "商品属性*": "S",
  328 +        "款式1": "",
  329 +        "款式2": "",
  330 +        "款式3": "",
  331 +        "商品售价*": price,
  332 +        "商品原价": price,
  333 +        "成本价": "",
  334 +        "商品SKU": base_row.get("ASIN") or "",
  335 +        "商品条形码": "",
  336 +        "商品图片*": image,
  337 +        "商品主图": image,
  338 +    })
  339 +    return row
  340 +
  341 +
  342 +def build_m_p_rows(variant_rows):
  343 +    base = variant_rows[0]
  344 +    spu_id = base.get("父ASIN") or base.get("ASIN")
  345 +    common = build_common_fields(base, spu_id=spu_id)
  346 +
  347 +    option_dicts = [parse_sku_options(v.get("SKU")) for v in variant_rows]
  348 +    option_keys = choose_option_keys(option_dicts, max_keys=3) or ["Variant"]
  349 +
  350 +    m = {}
  351 +    m.update(common)
  352 +    m.update({
  353 +        "商品属性*": "M",
  354 +        "款式1": option_keys[0] if len(option_keys) > 0 else "",
  355 +        "款式2": option_keys[1] if len(option_keys) > 1 else "",
  356 +        "款式3": option_keys[2] if len(option_keys) > 2 else "",
  357 +        "商品售价*": "",
  358 +        "商品原价": "",
  359 +        "成本价": "",
  360 +        "商品SKU": "",
  361 +        "商品条形码": "",
  362 +        "商品图片*": base.get("商品主图") or "",
  363 +        "商品主图": base.get("商品主图") or "",
  364 +    })
  365 +    m["商品重量"] = ""
  366 +    m["重量单位"] = ""
  367 +    m["商品库存"] = ""
  368 +    m["尺寸信息"] = ""
  369 +
  370 +    rows = [m]
  371 +
  372 +    for v in variant_rows:
  373 +        v_common = build_common_fields(v, spu_id=spu_id)
  374 +        v_common.update({
  375 +            "商品副标题": "",
  376 +            "商品描述": "",
  377 +            "SEO标题": "",
  378 +            "SEO描述": "",
  379 +            "SEO URL Handle": "",
  380 +            "SEO URL 重定向": "",
  381 +            "SEO关键词": "",
  382 +            "专辑名称": "",
  383 +            "标签": "",
  384 +            "供应商名称": "",
  385 +            "供应商URL": "",
  386 +            "商品备注": "",
  387 +        })
  388 +
  389 +        opt = parse_sku_options(v.get("SKU"))
  390 +        opt_vals = [v.get("ASIN")] if option_keys == ["Variant"] else [opt.get(k, "") for k in option_keys]
  391 +
  392 +        price = to_price(v.get("prime价格($)")) or to_price(v.get("价格($)")) or 9.99
  393 +        image = v.get("商品主图") or ""
  394 +
  395 +        p = {}
  396 +        p.update(v_common)
  397 +        p.update({
  398 +            "商品属性*": "P",
  399 +            "款式1": opt_vals[0] if len(opt_vals) > 0 else "",
  400 +            "款式2": opt_vals[1] if len(opt_vals) > 1 else "",
  401 +            "款式3": opt_vals[2] if len(opt_vals) > 2 else "",
  402 +            "商品售价*": price,
  403 +            "商品原价": price,
  404 +            "成本价": "",
  405 +            "商品SKU": v.get("ASIN") or "",
  406 +            "商品条形码": "",
  407 +            "商品图片*": image,
  408 +            "商品主图": "",
  409 +        })
  410 +        rows.append(p)
  411 +
  412 +    return rows
  413 +
  414 +
  415 +def main():
  416 +    parser = argparse.ArgumentParser(description="Convert Amazon-format xlsx files to Shoplazza import xlsx")
  417 +    parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing Amazon-format xlsx files")
  418 +    parser.add_argument("--template", default="docs/商品导入模板.xlsx", help="Shoplazza import template xlsx")
  419 +    parser.add_argument("--output", default="amazon_shoplazza_import.xlsx", help="Output xlsx file path")
  420 +    parser.add_argument("--max-files", type=int, default=None, help="Limit number of xlsx files to read (for testing)")
  421 +    parser.add_argument("--max-rows-per-file", type=int, default=None, help="Limit rows per xlsx file (for testing)")
  422 +    parser.add_argument("--max-products", type=int, default=None, help="Limit number of SPU groups to output (for testing)")
  423 +    args = parser.parse_args()
  424 +
  425 +    if not os.path.isdir(args.input_dir):
  426 +        raise RuntimeError("input-dir not found: {}".format(args.input_dir))
  427 +    if not os.path.exists(args.template):
  428 +        raise RuntimeError("template not found: {}".format(args.template))
  429 +
  430 +    files = [os.path.join(args.input_dir, f) for f in os.listdir(args.input_dir) if f.lower().endswith(".xlsx")]
  431 +    files.sort()
  432 +    if args.max_files is not None:
  433 +        files = files[: int(args.max_files)]
  434 +
  435 +    print("Reading Amazon-format files: {} (from {})".format(len(files), args.input_dir), flush=True)
  436 +
  437 +    groups = defaultdict(list)
  438 +    seen_asin = set()
  439 +
  440 +    for fp in files:
  441 +        print(" - loading: {}".format(fp), flush=True)
  442 +        try:
  443 +            rows = read_amazon_rows_from_file(fp, max_rows=args.max_rows_per_file)
  444 +        except Exception as e:
  445 +            print("WARN: failed to read {}: {}".format(fp, e))
  446 +            continue
  447 +        print(" loaded rows: {}".format(len(rows)), flush=True)
  448 +
  449 +        for r in rows:
  450 +            asin = r.get("ASIN")
  451 +            if asin in seen_asin:
  452 +                continue
  453 +            seen_asin.add(asin)
  454 +            spu_id = r.get("父ASIN") or asin
  455 +            groups[spu_id].append(r)
  456 +
  457 +    print("Collected variants: {}, SPU groups: {}".format(len(seen_asin), len(groups)), flush=True)
  458 +
  459 +    excel_rows = []
  460 +    spu_count = 0
  461 +
  462 +    for spu_id, variants in groups.items():
  463 +        if not variants:
  464 +            continue
  465 +        if args.max_products is not None and spu_count >= int(args.max_products):
  466 +            break  # limit reached; spu_count counts only emitted groups
  467 +        spu_count += 1
  468 +        if len(variants) == 1:
  469 +            excel_rows.append(build_s_row(variants[0]))
  470 +        else:
  471 +            excel_rows.extend(build_m_p_rows(variants))
  472 +
  473 +    print("Generated Excel rows: {} (SPU groups output: {})".format(len(excel_rows), min(spu_count, len(groups))), flush=True)
  474 +    create_excel_from_template(args.template, args.output, excel_rows)
  475 +
  476 +
  477 +if __name__ == "__main__":
  478 +    main()
  479 +
  480 +
... ...
scripts/competitor_xlsx_to_shoplazza_xlsx.py
1 1 #!/usr/bin/env python3
2 2 """
3   -Convert competitor Excel exports (with Parent/Child ASIN structure) into
4   -Shoplazza (店匠) product import Excel format based on `docs/商品导入模板.xlsx`.
5   -
6   -Data source:
7   -- Directory with multiple `Competitor-*.xlsx` files.
8   -- Each file contains a main sheet + "Notes" sheet.
9   -- Column meanings (sample):
10   - - ASIN: variant id (sku_id)
11   - - 父ASIN: product id (spu_id)
12   -
13   -Output:
14   -- For each 父ASIN group:
15   - - If only 1 ASIN: generate one "S" row
16   - - Else: generate one "M" row + multiple "P" rows
17   -
18   -Important:
19   -- Variant dimensions are parsed primarily from the `SKU` column:
20   - "Size: One Size | Color: Black"
21   - and mapped into 款式1/2/3.
  3 +DEPRECATED NAME (kept for backward compatibility).
  4 +
  5 +The input `products_data/*.xlsx` files are **Amazon-format exports** (with Parent/Child ASIN),
  6 +not “competitor data”. Please use:
  7 +
  8 + - `scripts/amazon_xlsx_to_shoplazza_xlsx.py`
  9 +
  10 +This script keeps the same logic but updates user-facing naming gradually.
22 11 """
23 12  
24 13 import os
... ... @@ -457,10 +446,10 @@ def build_m_p_rows(variant_rows):
457 446  
458 447  
459 448 def main():
460   - parser = argparse.ArgumentParser(description="Convert competitor xlsx files to Shoplazza import xlsx")
461   - parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing competitor xlsx files")
  449 + parser = argparse.ArgumentParser(description="Convert Amazon-format xlsx files to Shoplazza import xlsx (deprecated script name)")
  450 + parser.add_argument("--input-dir", default="data/mai_jia_jing_ling/products_data", help="Directory containing Amazon-format xlsx files")
462 451 parser.add_argument("--template", default="docs/商品导入模板.xlsx", help="Shoplazza import template xlsx")
463   - parser.add_argument("--output", default="competitor_shoplazza_import.xlsx", help="Output xlsx file path")
  452 + parser.add_argument("--output", default="amazon_shoplazza_import.xlsx", help="Output xlsx file path")
464 453 parser.add_argument("--max-files", type=int, default=None, help="Limit number of xlsx files to read (for testing)")
465 454 parser.add_argument("--max-rows-per-file", type=int, default=None, help="Limit rows per xlsx file (for testing)")
466 455 parser.add_argument("--max-products", type=int, default=None, help="Limit number of SPU groups to output (for testing)")
... ... @@ -477,7 +466,7 @@ def main():
477 466 if args.max_files is not None:
478 467 files = files[: int(args.max_files)]
479 468  
480   - print("Reading competitor files: {} (from {})".format(len(files), input_dir), flush=True)
  469 + print("Reading Amazon-format files: {} (from {})".format(len(files), input_dir), flush=True)
481 470  
482 471 groups = defaultdict(list) # spu_id -> [variant rows]
483 472 seen_asin = set()
... ...