Merging Amazon synonym and AMALYZER data into one sheet

By Episode 16 there are two working artefacts: a synonym sheet built from eleven harvest sources, and an AMALYZER export of the keywords the ASIN (or the cohort benchmark) actually ranks for. Each contains things the other doesn't. The merge step combines them into the single source of truth Module 8 (writing) reads from.

What each stream contains

Synonym stream. Wide, includes aspirational keywords the listing doesn't yet rank for. Carries source attribution, intent classification and editorial context (audience markers, use cases).
AMALYZER stream. Narrow but verified — keywords with current rank data, current volume, and current indexation status. The reality check on what's actually achievable.

The join key and the merge rules

Normalise both sides. Lowercase, strip punctuation, collapse multiple spaces, stem plurals. Both sheets need the same key format before the join.
Outer join on the normalised keyword. The output sheet should preserve rows from both sides — the keywords that match (in both), the keywords that are aspirational (synonym side only), and the keywords that are surprise rankings (AMALYZER side only).
Tag each row by source bucket. "Both" rows are the highest-confidence working keywords. "Synonym only" rows are write-in candidates needing copy placement. "AMALYZER only" rows are accidental rankings — investigate why, often a clue to a missing category-attribute fill.
Carry forward all columns. Source attribution from the synonym side; rank, volume, indexation from the AMALYZER side; the evaluation columns from Episode 14.

What the merged sheet looks like

One row per unique normalised keyword. Columns:

keyword — the search-friendly form chosen as canonical.
source_bucket — both | synonym | amalyzer.
harvest_sources — comma-separated list of Episode-03-through-13 sources that produced it.
volume_monthly — from the AMALYZER side; fall back to harvest-side volume where missing.
current_rank — null for synonym-only rows; the current organic position otherwise.
indexed — true / false / unknown.
intent_class — head | use-case | audience | comparison | misspelling.
cluster_id — assigned in Episode 14, preserved through the merge.
position_assignment — title-front | title-back | bullet | backend | watch.

The three diagnostic views the merged sheet enables

Quick wins. Filter for source_bucket = both AND current_rank between 11 and 30. These are keywords with proven indexation and proximity to the first page — copy adjustments can vault them in days.
Surprise rankings. Filter for source_bucket = amalyzer. These are keywords the listing ranks for without anyone planning it. Usually a sign of strong category fit; sometimes a sign of irrelevant traffic that hurts conversion.
Aspirational priorities. Filter for source_bucket = synonym AND volume_monthly above the category threshold. These are the working list for the writer — keywords with proven demand that the listing isn't yet capturing.

Common merge mistakes

Inner-joining instead of outer-joining. Drops the synonym-only rows that are the entire point of the harvest.
Skipping normalisation. "non-stick" vs "nonstick" vs "non stick" become three rows instead of one. The downstream writer makes inconsistent placement decisions.
Manually copying instead of joining. Loses the column-level data lineage and makes refreshes impossible. The next time the AMALYZER data refreshes, the merge has to be redone from scratch.
Letting the position_assignment column drift. Episode 14's bucketing is the input; the merge must preserve it. If the merge process re-runs the bucketing automatically, the writer's manual adjustments get overwritten.

The handoff to Episode 18

The merged sheet is data, not yet a brief. Episode 18 covers the structuring step — turning the sheet into the structured input the Module 8 writer reads as instructions: title character budget per keyword, bullet anchor assignments, backend search-term byte budget, A+ alt-text targets. The merge stops with the sheet; the structuring step turns the sheet into a write-ready brief.

Merging synonym and AMALYZER data — one sheet, one source of truth.