Methodology

How Burnish computes the AI Commerce Index

The point of publishing this page is simple: the score needs to be explainable. Burnish separates what can move immediately inside the catalog from what takes time to change across AI engines.

Hero metric

AI Commerce Index

A 0–100 index shown prominently to merchants, with the underlying Readiness and Visibility components always displayed alongside it rather than hidden.

Why two scores

Honest movement, not fake movement

A single score that jumps the moment a fix is applied would be misleading. Burnish keeps the split visible so merchants know what changed now and what will show up later.
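
As a rough sketch of why the split stays visible, the index can be modeled as a record that always carries both components. The 50/50 blend below is an assumption for illustration only; this page does not publish the exact weighting between Readiness and Visibility.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AICommerceIndex:
    """Both components stay visible; the composite never replaces them."""
    readiness: float   # 0-100, moves immediately after approved catalog fixes
    visibility: float  # 0-100, lags fixes while AI engines re-index

    @property
    def composite(self) -> float:
        # The 50/50 blend is an illustrative assumption, not a published weighting.
        return 0.5 * self.readiness + 0.5 * self.visibility

index = AICommerceIndex(readiness=82.0, visibility=41.0)
print(index.composite, index.readiness, index.visibility)  # 61.5 82.0 41.0
```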

Readiness score

Rubric-based and merchant-actionable

Readiness is calculated directly from catalog state. It moves when merchants improve fields, structure, and data quality inside Shopify.

  • Readiness is rubric-based and recalculates immediately after approved catalog fixes.
  • The rubric is weighted toward merchant-controlled fields rather than vanity metrics.
  • The goal is a score merchants can improve through real work inside Shopify.

Published weights

  Factor                                 Weight
  Metafield completeness                 20%
  Description quality                    15%
  Alt text coverage                      10%
  Structured data / schema.org           15%
  Category / taxonomy correctness        10%
  Title optimization                     10%
  Google Merchant Center readiness       10%
  Variant consistency                     5%
  llms.txt presence                       5%
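
Using the published weights above, a minimal scoring sketch might look like the following. The weights are real; the factor key names and the convention that each per-factor sub-score lands in [0, 1] are assumptions for illustration.

```python
# Weights are the published rubric weights from the table above; they sum to 100%.
# Factor key names and the 0-1 sub-score convention are illustrative.
READINESS_WEIGHTS = {
    "metafield_completeness": 0.20,
    "description_quality": 0.15,
    "alt_text_coverage": 0.10,
    "structured_data_schema_org": 0.15,
    "category_taxonomy_correctness": 0.10,
    "title_optimization": 0.10,
    "gmc_readiness": 0.10,
    "variant_consistency": 0.05,
    "llms_txt_presence": 0.05,
}

def readiness_score(factor_scores: dict[str, float]) -> float:
    """Weighted 0-100 readiness score from per-factor sub-scores in [0, 1]."""
    assert abs(sum(READINESS_WEIGHTS.values()) - 1.0) < 1e-9  # sums to 100%
    weighted = sum(weight * factor_scores.get(factor, 0.0)
                   for factor, weight in READINESS_WEIGHTS.items())
    return round(100 * weighted, 1)
```
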
Visibility score

Empirical and anchored in real sampling

Visibility is measured through sampled shopper-style prompts and model responses. It reflects how AI engines actually surface the merchant, not how good the catalog looks in isolation.

  1. Generate shopper-style prompts grounded in the merchant catalog, category, and competitors.

  2. Query official APIs across ChatGPT, Perplexity, Gemini, and Claude, depending on tier.

  3. Multi-sample responses to reduce non-deterministic noise.

  4. Detect mentions against catalog-grounded brand and product context.

  5. Normalize results into per-engine visibility measurements with confidence intervals (steps 3–5 are sketched below).
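
A minimal sketch of steps 3 and 4, assuming keyword-based detection: production mention detection is grounded in brand and product context rather than bare string matching, and the brand name and responses below are invented.

```python
import re

def mention_rate(responses: list[str], brand_terms: list[str]) -> float:
    """Fraction of sampled responses mentioning any catalog-grounded term.

    Regex matching is a deliberate simplification of the real,
    context-grounded detection step.
    """
    patterns = [re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
                for term in brand_terms]
    hits = sum(1 for response in responses
               if any(p.search(response) for p in patterns))
    return hits / len(responses) if responses else 0.0

# Multi-sampling: the same prompt goes to the same engine several times
# because responses are non-deterministic. All strings here are hypothetical.
samples = [
    "For waxed canvas totes, ExampleBrand and two competitors stand out...",
    "Popular options include Competitor A and Competitor B...",
    "ExampleBrand's tote is frequently recommended for durability...",
]
rate = mention_rate(samples, ["ExampleBrand"])  # 2/3, about 0.67
```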

Confidence

Visibility is reported with confidence intervals so the merchant sees signal quality, not just a point estimate.
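
The page does not name the interval method; a standard choice for a proportion estimated from a small number of samples is the Wilson score interval, sketched here as an assumption.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a sampled mention rate.

    Wilson is a common choice for small-sample proportions; it is used
    here as an assumption, not as the documented method.
    """
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - half_width), min(1.0, center + half_width))

low, high = wilson_interval(hits=7, n=20)  # about (0.18, 0.57) for a 0.35 rate
```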

Model anchoring

Measurements are tied to model versions so merchants know when provider changes require a fresh baseline.

Raw access

The methodology is designed to support explainability, including the ability to trace back into sampled queries and response evidence.
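
A minimal evidence record under these two commitments might look like the sketch below. The field names are hypothetical; what the page commits to is model anchoring and traceability, not this schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class VisibilityEvidence:
    """One sampled prompt/response pair, pinned to an exact model version."""
    engine: str          # e.g. "chatgpt", "perplexity", "gemini", "claude"
    model_version: str   # provider version string; a change prompts a fresh baseline
    prompt: str          # shopper-style prompt that was sampled
    response: str        # raw model response retained as evidence
    mentioned: bool      # output of catalog-grounded mention detection
    sampled_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```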

Caveats
  • Visibility changes lag behind fixes. Merchants should expect a 2–6 week delay as engines re-index.
  • Visibility is a statistical estimate, not a promise of placement.
  • Attribution is directional and operational, not a randomized controlled-trial claim.
Providers and data handling
  • Only approved model providers are used, through commercial API agreements.
  • Provider access is limited to the product and catalog data required for scoring and fix generation.
  • Merchants are given clear disclosure around data handling and provider use before the workflow expands.
Benchmark publication

F1 publication is part of the product promise

Burnish is designed to publish brand-disambiguation benchmark results and refresh them over time. The production benchmark is still being finalized, so this page intentionally sets the framework first and the measured number second.
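
For reference, F1 itself is standard: the harmonic mean of precision and recall. What counts as a true positive for brand disambiguation is defined by the labeling protocol in the benchmark still being finalized.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, and false negatives.

    For brand disambiguation, a true positive would be a response correctly
    attributed to the merchant's brand rather than a similarly named one.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```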

Join the first cohort if you want to see the methodology mature alongside the product.