Sources

Product catalog

Breezaro answers product questions with live prices, availability and images. The accuracy of those answers depends entirely on what we read from your shop. This page explains the three ways we can read your catalogue, and how we estimate products on custom e-shops that don't publish a feed.


Supported sources

Shopify Public Feed

Reads /products.json from your storefront URL — the same public feed any visitor can hit. No login or API token required.

Paste your storefront URL (e.g. shop.example.com) into the wizard.

Best for: any Shopify shop.
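As a rough illustration of what this source reads, the sketch below builds the feed URL from whatever is pasted into the wizard and pulls a few fields out of a response. It assumes the commonly seen shape of the public /products.json payload (a "products" list with "title", "variants" and "images"); the function names and the sample data are hypothetical, and this is not Breezaro's actual reader.

```python
import json
from urllib.parse import urlsplit


def products_json_url(storefront: str) -> str:
    """Turn 'shop.example.com' or a full storefront URL into the feed URL."""
    # Accept input with or without a scheme; default to https.
    if "://" not in storefront:
        storefront = "https://" + storefront
    host = urlsplit(storefront).netloc
    return f"https://{host}/products.json"


def summarise(payload: str) -> list[dict]:
    """Extract title, first-variant price, availability and first image."""
    rows = []
    for product in json.loads(payload).get("products", []):
        variants = product.get("variants") or []
        images = product.get("images") or []
        rows.append({
            "title": product.get("title"),
            "price": variants[0].get("price") if variants else None,
            # Treat the product as available if any variant is.
            "available": any(v.get("available") for v in variants),
            "image": images[0].get("src") if images else None,
        })
    return rows


# Hypothetical sample payload, shaped like a typical response.
SAMPLE = json.dumps({"products": [{
    "title": "Linen shirt",
    "handle": "linen-shirt",
    "variants": [{"price": "799.00", "available": True}],
    "images": [{"src": "https://shop.example.com/img/linen-shirt.jpg"}],
}]})

print(products_json_url("shop.example.com"))
print(summarise(SAMPLE))
```

Taking only the first variant's price is a simplification for the example; a real reader would probably keep every variant.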

XML Feed

Standard product feeds: Heureka, Google Shopping, Glami, Zboží.cz, and similar formats.

Paste the public feed URL into the wizard.

Best for: shops that already publish a marketing feed.
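To give a feel for what a marketing feed contains, here is a minimal sketch that parses one item of a Google Shopping style feed with Python's standard library. The sample feed, the choice of fields and the function name are assumptions made for illustration; other formats such as Heureka use different element names and would need their own mapping.

```python
import xml.etree.ElementTree as ET

# Namespace used by Google Shopping feeds for the g: prefix.
G = {"g": "http://base.google.com/ns/1.0"}

# Hypothetical one-item feed.
SAMPLE_FEED = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>SKU-1</g:id>
      <g:title>Linen shirt</g:title>
      <g:link>https://shop.example.com/p/linen-shirt</g:link>
      <g:image_link>https://shop.example.com/img/linen-shirt.jpg</g:image_link>
      <g:price>799.00 CZK</g:price>
      <g:availability>in_stock</g:availability>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list[dict]:
    """Read id, title, URL, image, price, currency and availability per item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        def text(tag: str) -> str:
            return (item.findtext(f"g:{tag}", default="", namespaces=G) or "").strip()

        # Prices in this format look like "799.00 CZK": amount, space, currency.
        amount, _, currency = text("price").partition(" ")
        items.append({
            "id": text("id"),
            "title": text("title"),
            "url": text("link"),
            "image": text("image_link"),
            "price": amount,
            "currency": currency,
            "availability": text("availability"),
        })
    return items


print(parse_items(SAMPLE_FEED))
```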

Crawler with product detection

Visits the public pages of your shop, reads structured product data (JSON-LD, microdata, dataLayer) and Open Graph tags, and assembles the catalogue from there.

Paste your base URL and opt in to the first crawl.

Best for: custom or platform-less e-shops without a feed.
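For readers unfamiliar with structured product data, the sketch below shows how a JSON-LD Product block can be read out of a page with Python's standard library. The class and function names and the sample page are hypothetical, and the sketch ignores cases a production crawler would have to handle, such as products nested under "@graph", microdata or dataLayer events.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is skipped, not fatal.


def find_product(html: str):
    """Return title, price, currency and image of the first Product, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        for node in block if isinstance(block, list) else [block]:
            if isinstance(node, dict) and node.get("@type") == "Product":
                offers = node.get("offers") or {}
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                return {
                    "title": node.get("name"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "image": node.get("image"),
                }
    return None


# Hypothetical product page.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Linen shirt",
 "image": "https://shop.example.com/img/linen-shirt.jpg",
 "offers": {"@type": "Offer", "price": "799.00", "priceCurrency": "CZK"}}
</script></head><body></body></html>"""

print(find_product(PAGE))
```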

Behaviour

How we estimate products on custom e-shops

When you connect the crawler, we visit every reachable page and try to recognise products from the signals each page exposes. Detection runs in tiers — the first match wins.

What the crawler visits

Every reachable page from the configured base URL.
Respects robots.txt — disallowed paths are skipped.
SSRF-safe: rejects internal/private targets before any request leaves our network.
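The two safety rules above can be sketched with Python's standard library. This is an illustration of the general technique, not the crawler's actual code: the function names and the "ExampleBot" user agent are hypothetical, and the address check assumes the hostname has already been resolved to a list of IP addresses.

```python
import ipaddress
from urllib.robotparser import RobotFileParser


def is_public_target(resolved_addresses: list[str]) -> bool:
    """True only if every resolved address is globally routable.

    Loopback, private (10.x, 192.168.x, ...), link-local and similar
    ranges are rejected, which is the core of an SSRF guard.
    """
    return bool(resolved_addresses) and all(
        ipaddress.ip_address(addr).is_global for addr in resolved_addresses
    )


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the cart.
ROBOTS = "User-agent: *\nDisallow: /cart\n"

print(is_public_target(["8.8.8.8"]))          # public address
print(is_public_target(["169.254.169.254"]))  # link-local metadata address
print(allowed_by_robots(ROBOTS, "ExampleBot", "https://shop.example.com/cart"))
```

Checking the resolved addresses rather than the hostname matters because a public-looking hostname can still resolve to an internal address.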

Detection tiers

Tier 1: HIGH confidence
JSON-LD (schema.org/Product) → Microdata (itemtype="…/Product") → dataLayer (GA4 / GTM e-commerce events). The sources are tried in that order, and the first one that yields a complete product wins.
Tier 2: MEDIUM confidence
Open Graph (og:type=product) corroborated by a partial Tier-1 signal. Tier-1 fields take precedence on conflict.
Tier 3: LOW confidence
Open Graph alone, when no Tier-1 signal exists. Many themes ship only the Open Graph product extensions, so without this fallback those shops would have no coverage at all.
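The HIGH, MEDIUM and LOW confidence rules can be expressed as a small decision function. The sketch below assumes each signal has already been extracted into a plain dict (or is None when absent), and that the Open Graph dict is only passed when og:type is product. The function names, and the choice to use the first non-empty structured signal as the corroborating one, are assumptions for illustration.

```python
REQUIRED = ("title", "price", "currency")


def is_complete(signal) -> bool:
    """A signal is complete when it carries every required field."""
    return signal is not None and all(signal.get(key) for key in REQUIRED)


def detect(jsonld=None, microdata=None, datalayer=None, opengraph=None):
    """Return the detected product with a confidence label, or None."""
    structured = [jsonld, microdata, datalayer]

    # HIGH: first complete structured signal, in priority order.
    for signal in structured:
        if is_complete(signal):
            return {**signal, "confidence": "HIGH"}

    if opengraph is None:
        return None

    # MEDIUM: Open Graph backed by a partial structured signal.
    # Structured fields overwrite Open Graph fields on conflict.
    partial = next((s for s in structured if s), None)
    if partial:
        known = {k: v for k, v in partial.items() if v}
        return {**opengraph, **known, "confidence": "MEDIUM"}

    # LOW: Open Graph is the only signal on the page.
    return {**opengraph, "confidence": "LOW"}


og = {"title": "Linen shirt (OG)", "price": "799.00", "currency": "CZK"}
print(detect(jsonld={"title": "Linen shirt"}, opengraph=og))
print(detect(opengraph=og))
```

In the first call the title comes from the structured signal and the price and currency from Open Graph, which is the "take precedence on conflict" rule in action.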

Required fields

A page is accepted as a product only if we can read a title, a price, and a currency. The image is best-effort: we try the extractor signal first, then a DOM lookup, then a live Playwright lookup against the rendered (post-JavaScript) DOM.
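A minimal sketch of these two rules, under the assumption that each image lookup can be modelled as a function returning a URL or None. The function names are hypothetical. Passing the lookups as callables means the more expensive steps, such as a browser-based lookup, only run when the cheaper ones find nothing.

```python
def has_value(value) -> bool:
    """True for anything other than None or a blank string."""
    return value is not None and str(value).strip() != ""


def accept(candidate: dict) -> bool:
    """A candidate is a product only with title, price and currency."""
    return all(has_value(candidate.get(f)) for f in ("title", "price", "currency"))


def resolve_image(lookups):
    """Try each lookup in order and return the first image URL found."""
    for lookup in lookups:
        url = lookup()
        if url:
            return url
    return None  # Image is best-effort, so a missing image is not an error.


# Stand-ins for the three lookups; the last one is never reached here.
image = resolve_image([
    lambda: None,                                      # extractor signal
    lambda: "https://shop.example.com/img/shirt.jpg",  # DOM lookup
    lambda: "https://shop.example.com/img/late.jpg",   # rendered-DOM lookup
])
print(image)
print(accept({"title": "Linen shirt", "price": "799.00"}))  # no currency
```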

Currency fallback

When the page doesn't expose a currency, we infer it from the base URL TLD (e.g. .cz → CZK).
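The fallback amounts to a lookup keyed on the last label of the hostname. In the sketch below only the .cz → CZK mapping comes from this page; the other table entries, the function name and the behaviour for unknown TLDs (returning None) are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Only "cz" is taken from the documentation; the rest are illustrative.
TLD_CURRENCY = {
    "cz": "CZK",
    "sk": "EUR",
    "pl": "PLN",
    "hu": "HUF",
    "de": "EUR",
}


def infer_currency(base_url: str):
    """Guess a currency from the base URL's TLD, or None if unknown."""
    host = urlsplit(base_url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_CURRENCY.get(tld)


print(infer_currency("https://shop.example.cz/"))   # CZK
print(infer_currency("https://shop.example.com/"))  # None
```

A generic TLD such as .com says nothing about the currency, so in this sketch such a page would stay without one.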


Limits and sync cadence

1. Each tenant can hold up to 10,000 crawler-detected products. Larger catalogues should use a feed source instead.
2. Nightly sync (and the manual Sync button) keeps the price, availability and images of already-detected products current, and removes products that disappear.
3. You can pause, resume, or abort any crawl run from the Sources screen.


Sitemap (for discovering new products)

1. We keep your catalogue up to date every night, and on a manual Sync.
2. To also pick up newly added products automatically, your site should publish a sitemap.xml that lists product URLs. With one, new products are discovered on each sync.
3. Without a sitemap, existing products are still kept current, but new products are not added on their own. To pick them up, re-add the source to run a full re-scan.
4. Most e-commerce platforms generate a product sitemap by default.
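Discovery through a sitemap boils down to comparing the URLs it lists with the ones already known. The sketch below assumes a plain sitemap.xml in the standard sitemaps.org format; the function names and sample data are hypothetical, and sitemap index files (sitemaps that point to other sitemaps) are not handled. Each newly found URL would still have to pass product detection before being added.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap with two product URLs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example.com/p/linen-shirt</loc></url>
  <url><loc>https://shop.example.com/p/wool-scarf</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> listed in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def new_urls(xml_text: str, known: set[str]) -> list[str]:
    """URLs present in the sitemap but not yet in the catalogue."""
    return [url for url in sitemap_urls(xml_text) if url not in known]


known = {"https://shop.example.com/p/linen-shirt"}
print(new_urls(SAMPLE_SITEMAP, known))
```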

Privacy