Sources

Product sources

Breezaro answers product questions with live prices, availability and images. The accuracy of those answers depends entirely on what we read from your shop. This page explains the three ways we can read your catalogue, and how we estimate products on custom e-shops that don't publish a feed.

Supported sources

Shopify Public Feed

Reads /products.json from your storefront URL — the same public feed any visitor can hit. No login or API token required.

Paste your storefront URL (e.g. shop.example.com) into the wizard.

Best for: any Shopify shop.
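As a rough illustration of what reading this feed involves, the sketch below builds the feed URL and flattens a response into one record per product. It assumes the commonly seen /products.json shape (a top-level "products" array whose items carry "title", "handle", "variants" and "images"); the helper names and the limit/page query parameters are illustrative assumptions, not a description of Breezaro's internals.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def products_url(storefront_host: str, page: int = 1, limit: int = 250) -> str:
    """Build the public feed URL for one page of products (hypothetical helper)."""
    query = urlencode({"limit": limit, "page": page})
    return urlunsplit(("https", storefront_host, "/products.json", query, ""))


def parse_products(payload: dict) -> list:
    """Flatten the feed into one record per product, priced by the first variant."""
    records = []
    for product in payload.get("products", []):
        variants = product.get("variants") or [{}]
        images = product.get("images") or []
        records.append({
            "title": product.get("title"),
            "handle": product.get("handle"),
            "price": variants[0].get("price"),
            # A product counts as available if any variant is available.
            "available": any(v.get("available") for v in variants),
            "image": images[0].get("src") if images else None,
        })
    return records


def fetch_products(storefront_host: str, page: int = 1) -> list:
    """Download and parse one page. This performs a real network request."""
    with urlopen(products_url(storefront_host, page), timeout=10) as response:
        return parse_products(json.load(response))
```

Keeping the parsing separate from the download makes the mapping easy to check against a saved response before pointing it at a live shop.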

XML Feed

Standard product feeds: Heureka, Google Shopping, Glami, Zboží.cz, and similar formats.

Paste the public feed URL into the wizard.

Best for: shops that already publish a marketing feed.
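To give a feel for what such a feed contains, here is a minimal sketch that reads a Heureka-style feed with Python's standard XML parser. It assumes the usual Heureka layout (SHOPITEM elements with ITEM_ID, PRODUCTNAME, PRICE_VAT, URL and IMGURL children); the function names are hypothetical and other feed formats would need their own field mapping.

```python
import xml.etree.ElementTree as ET


def _text(item, tag):
    """Return the stripped text of a child element, or None if it is missing or empty."""
    node = item.find(tag)
    if node is None or not node.text:
        return None
    return node.text.strip()


def parse_heureka_feed(xml_text: str) -> list:
    """Turn a Heureka-style feed into a list of plain product records."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("SHOPITEM"):
        products.append({
            "id": _text(item, "ITEM_ID"),
            "title": _text(item, "PRODUCTNAME"),
            "price": _text(item, "PRICE_VAT"),  # kept as text; no number parsing here
            "url": _text(item, "URL"),
            "image": _text(item, "IMGURL"),
        })
    return products


if __name__ == "__main__":
    sample = """<SHOP>
      <SHOPITEM>
        <ITEM_ID>A1</ITEM_ID>
        <PRODUCTNAME>Ceramic mug</PRODUCTNAME>
        <PRICE_VAT>249</PRICE_VAT>
        <URL>https://shop.example.cz/mug</URL>
      </SHOPITEM>
    </SHOP>"""
    print(parse_heureka_feed(sample))
```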

Crawler with product detection

Visits the public pages of your shop, reads structured product data (JSON-LD, microdata, dataLayer) and Open Graph tags, and assembles the catalogue from there.

Paste your base URL and opt in to the first crawl.

Best for: custom or platform-less e-shops without a feed.

Behaviour

How we estimate products on custom e-shops

When you connect the crawler, we visit every reachable page and try to recognise products from the signals each page exposes. Detection runs in tiers — the first match wins.

What the crawler visits

  • Every page reachable from the configured base URL.
  • Respects robots.txt — disallowed paths are skipped.
  • SSRF-safe: rejects internal/private targets before any request leaves our network.
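The two checks in the list above can be sketched with the Python standard library. This is an illustration under stated assumptions, not our production code: the function names and the "ExampleCrawler" user agent are hypothetical, and the caller is assumed to have already resolved the hostname to its IP addresses.

```python
import ipaddress
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_public_address(ip_text: str) -> bool:
    """True only for addresses outside private, loopback and other special ranges."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Check a URL against the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_request(url: str, resolved_ips: list, robots_txt: str) -> bool:
    """Combine both gates; every resolved address must be public."""
    if urlsplit(url).scheme not in ("http", "https"):
        return False
    if not resolved_ips or not all(is_public_address(ip) for ip in resolved_ips):
        return False
    return allowed_by_robots(robots_txt, url)
```

Running the address check on the resolved IPs rather than on the hostname string is the point of the sketch: a public-looking hostname can still resolve to an internal address.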

Detection tiers

  • Tier 1 — HIGH confidence

    JSON-LD (schema.org/Product) → Microdata (itemtype="…/Product") → dataLayer (GA4 / GTM e-commerce events). First complete match wins.

  • Tier 2 — MEDIUM confidence

    Open Graph (og:type=product) corroborated by a partial Tier-1 signal. Tier-1 fields take precedence on conflict.

  • Tier 2 — LOW confidence

    Open Graph alone, when no Tier-1 signal exists. Many themes ship only OG product extensions — without this fallback, those shops would be 0% covered.
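The tier order above can be pictured as a small decision function. The sketch below is a simplification with hypothetical names: each signal is assumed to arrive as a dict of already extracted fields (or None), "complete" is assumed to mean that title, price and currency are all present, and the Open Graph dict is assumed to be passed only when the page declares og:type=product.

```python
from typing import Optional, Tuple

REQUIRED = ("title", "price", "currency")


def is_complete(signal: Optional[dict]) -> bool:
    """A signal is complete when every required field has a value."""
    return signal is not None and all(signal.get(field) for field in REQUIRED)


def detect(json_ld: Optional[dict], microdata: Optional[dict],
           data_layer: Optional[dict], open_graph: Optional[dict]) -> Optional[Tuple[str, dict]]:
    """Return (confidence, fields) for a page, or None if nothing matched."""
    tier1 = [json_ld, microdata, data_layer]

    # Tier 1: JSON-LD, then microdata, then dataLayer; first complete match wins.
    for signal in tier1:
        if is_complete(signal):
            return ("HIGH", dict(signal))

    if open_graph is None:
        return None

    # Tier 2, MEDIUM: Open Graph backed by a partial Tier-1 signal.
    partial = next((signal for signal in tier1 if signal), None)
    if partial:
        merged = dict(open_graph)
        # Tier-1 values overwrite Open Graph values on conflict.
        merged.update({key: value for key, value in partial.items() if value})
        return ("MEDIUM", merged)

    # Tier 2, LOW: Open Graph alone.
    return ("LOW", dict(open_graph))
```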

Required fields

A page is accepted as a product only if we can read a title, a price, and a currency. The image is best-effort: we try the extractor signal first, then a DOM lookup, and finally a live Playwright lookup against the post-JS DOM.

Currency fallback

When the page doesn't expose a currency, we infer it from the base URL TLD (e.g. .cz → CZK).
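A minimal sketch of how the required-field check and the currency fallback might fit together. Only the .cz to CZK mapping comes from this page; the other table entries, the function names, and the assumption that the base URL includes a scheme are illustrative.

```python
from typing import Optional
from urllib.parse import urlsplit

# Only "cz" is documented on this page; "pl" and "hu" are illustrative assumptions.
TLD_CURRENCY = {"cz": "CZK", "pl": "PLN", "hu": "HUF"}


def currency_from_base_url(base_url: str) -> Optional[str]:
    """Infer a currency from the top-level domain of the base URL, if known."""
    host = urlsplit(base_url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return TLD_CURRENCY.get(tld)


def accept_product(candidate: dict, base_url: str) -> Optional[dict]:
    """Return the candidate with its currency filled in, or None if a required field is missing."""
    currency = candidate.get("currency") or currency_from_base_url(base_url)
    if not (candidate.get("title") and candidate.get("price") and currency):
        return None
    return {**candidate, "currency": currency}
```

A currency found on the page always wins; the TLD lookup is consulted only when the page exposes none, and a candidate with no resolvable currency is rejected rather than guessed.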

Limits and sync cadence

  • Each tenant can hold up to 5,000 crawler-detected products. Larger catalogues should use a feed source instead.
  • Nightly sync watches already-detected products (price, availability, images). New products only appear after a fresh full crawl.
  • You can pause, resume, or abort any crawl run from the Sources screen.

Privacy

Privacy & safety