Product sources
Breezaro answers product questions with live prices, availability and images. The accuracy of those answers depends entirely on what we read from your shop. This page explains the three ways we can read your catalogue, and how we estimate products on custom e-shops that don't publish a feed.
Supported sources
Shopify Public Feed
Reads /products.json from your storefront URL — the same public feed any visitor can hit. No login or API token required.
Paste your storefront URL (e.g. shop.example.com) into the wizard.
Best for: any Shopify shop.
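As a rough illustration of what reading this feed involves, the sketch below builds the feed URL from a storefront host and flattens one page of the response into simple records. The function names and the output record shape are hypothetical, and the sample payload is trimmed to the fields the sketch reads; it is not a description of Breezaro's actual reader.

```python
import json

def feed_url(storefront: str) -> str:
    """Build the public products.json URL from a storefront host or URL."""
    host = storefront.strip().removeprefix("https://").removeprefix("http://").rstrip("/")
    return f"https://{host}/products.json"

def flatten_products(payload: dict) -> list[dict]:
    """Turn a products.json payload into catalogue records (hypothetical shape)."""
    records = []
    for product in payload.get("products", []):
        variants = product.get("variants", [])
        images = product.get("images", [])
        records.append({
            "title": product.get("title"),
            "handle": product.get("handle"),
            # Lowest variant price; prices arrive as strings such as "19.90".
            "price": min((float(v["price"]) for v in variants), default=None),
            # A product counts as available if any variant is.
            "available": any(v.get("available") for v in variants),
            "image": images[0]["src"] if images else None,
        })
    return records

# Trimmed sample payload for illustration only.
sample = json.loads("""
{"products": [{
  "title": "Blue Mug",
  "handle": "blue-mug",
  "variants": [{"price": "19.90", "available": false},
               {"price": "24.90", "available": true}],
  "images": [{"src": "https://cdn.example.com/mug.jpg"}]
}]}
""")

records = flatten_products(sample)
```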
XML Feed
Standard product feeds: Heureka, Google Shopping, Glami, Zboží.cz, and similar formats.
Paste the public feed URL into the wizard.
Best for: shops that already publish a marketing feed.
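For orientation, here is a minimal sketch of reading one item from a Google Shopping style feed with Python's standard library. The sample feed, the function name and the output record shape are assumptions made for illustration; the other listed formats use different element names and would need their own mapping.

```python
import xml.etree.ElementTree as ET

# Namespace used by Google Shopping feeds for the g: prefixed elements.
G = {"g": "http://base.google.com/ns/1.0"}

SAMPLE_FEED = """<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>SKU-1</g:id>
      <title>Blue Mug</title>
      <g:price>199.00 CZK</g:price>
      <g:availability>in_stock</g:availability>
      <g:image_link>https://cdn.example.com/mug.jpg</g:image_link>
    </item>
  </channel>
</rss>"""

def parse_google_feed(xml_text: str) -> list[dict]:
    """Extract id, title, price, currency, availability and image per item."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("item"):
        # g:price carries amount and currency together, e.g. "199.00 CZK".
        price_text = item.findtext("g:price", default="", namespaces=G)
        amount, _, currency = price_text.partition(" ")
        products.append({
            "id": item.findtext("g:id", namespaces=G),
            "title": item.findtext("title"),
            "price": float(amount) if amount else None,
            "currency": currency or None,
            "available": item.findtext("g:availability", namespaces=G) == "in_stock",
            "image": item.findtext("g:image_link", namespaces=G),
        })
    return products

products = parse_google_feed(SAMPLE_FEED)
```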
Crawler with product detection
Visits the public pages of your shop, reads structured product data (JSON-LD, microdata, dataLayer) and Open Graph tags, and assembles the catalogue from there.
Paste your base URL and opt in to the first crawl.
Best for: custom or platform-less e-shops without a feed.
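To make "structured product data" concrete, the sketch below pulls a schema.org Product out of a page's JSON-LD using only the standard library. The class and function names and the sample page are hypothetical, and the sketch deliberately ignores cases a production extractor would have to handle, such as @graph containers or multiple @type values.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

def find_product(html: str):
    """Return the first JSON-LD node whose @type is Product, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") == "Product":
                return node
    return None

PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Blue Mug",
 "offers": {"@type": "Offer", "price": "199.00", "priceCurrency": "CZK"}}
</script></head><body></body></html>"""

product = find_product(PAGE)
```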
How we estimate products on custom e-shops
When you connect the crawler, we visit every reachable page and try to recognise products from the signals each page exposes. Detection runs in tiers — the first match wins.
What the crawler visits
- Every reachable page from the configured base URL.
- Respects robots.txt — disallowed paths are skipped.
- SSRF-safe: rejects internal/private targets before any request leaves our network.
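The two guards in the list above can be sketched with the standard library. This is an illustration under stated assumptions, not the crawler's actual code: the function names and the ExampleCrawler user agent are hypothetical, and the address check only covers literal IPs and "localhost". A real SSRF guard would also resolve hostnames and test every returned address.

```python
import ipaddress
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def is_internal_host(host: str) -> bool:
    """True for 'localhost' and for literal IPs in non-public ranges."""
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal; DNS resolution is out of scope here.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved

def may_fetch(url: str, robots_txt: str, user_agent: str = "ExampleCrawler") -> bool:
    """Reject internal targets first, then apply the site's robots.txt rules."""
    host = urlsplit(url).hostname or ""
    if is_internal_host(host):
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

ROBOTS = "User-agent: *\nDisallow: /cart\n"
```

With this sample robots.txt, a product page on the shop is allowed, while /cart and any address such as 10.0.0.5 or 169.254.169.254 are rejected.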
Detection tiers
- Tier 1 — HIGH confidence
JSON-LD (schema.org/Product) → Microdata (itemtype="…/Product") → dataLayer (GA4 / GTM e-commerce events). First complete match wins.
- Tier 2 — MEDIUM confidence
Open Graph (og:type=product) corroborated by a partial Tier-1 signal. Tier-1 fields take precedence on conflict.
- Tier 2 — LOW confidence
Open Graph alone, when no Tier-1 signal exists. Many themes ship only OG product extensions; without this fallback, those shops would have no coverage at all.
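One way to read the tiers above is as a small decision function. The sketch below is an interpretation, not the production logic: the signal dictionaries, their field names, and the treatment of "complete" as "has title, price and currency" are assumptions made for illustration.

```python
REQUIRED = ("title", "price", "currency")

def is_complete(fields: dict) -> bool:
    """A signal is complete when every required field has a value."""
    return all(fields.get(key) not in (None, "") for key in REQUIRED)

def classify(jsonld=None, microdata=None, datalayer=None, open_graph=None):
    """Return (confidence, fields) following the tier order described above."""
    # Tier 1 sources in priority order: JSON-LD, then microdata, then dataLayer.
    tier1 = [signal for signal in (jsonld, microdata, datalayer) if signal]
    for signal in tier1:
        if is_complete(signal):
            return "HIGH", dict(signal)

    if open_graph and open_graph.get("type") == "product":
        og_fields = {k: v for k, v in open_graph.items() if k != "type"}
        if tier1:
            # Partial Tier-1 data corroborates Open Graph and wins on conflict.
            merged = dict(og_fields)
            for signal in reversed(tier1):  # earlier sources overwrite later ones
                merged.update({k: v for k, v in signal.items() if v not in (None, "")})
            return "MEDIUM", merged
        return "LOW", og_fields

    return None, {}

og = {"type": "product", "title": "Mug | Shop", "price": "199", "currency": "CZK"}
high = classify(jsonld={"title": "Mug", "price": "199", "currency": "CZK"}, open_graph=og)
medium = classify(jsonld={"title": "Mug"}, open_graph=og)
low = classify(open_graph=og)
```

Here `high[0]` is "HIGH", `medium` keeps the JSON-LD title "Mug" over the Open Graph title, and `low[0]` is "LOW".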
Required fields
A page is accepted as a product only if we can read a title, a price, and a currency. The image is best-effort: we try the extractor signal first, then a DOM lookup, then a live Playwright lookup against the rendered (post-JavaScript) DOM.
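The acceptance rule and the image fallback chain can be sketched as follows. The function name and record shape are hypothetical, and the three lookups are stand-in callables rather than real extractor, DOM or Playwright code. Passing them as callables means a later, more expensive lookup only runs when the earlier ones return nothing.

```python
from typing import Callable, Iterable, Optional

def accept_product(fields: dict,
                   image_lookups: Iterable[Callable[[], Optional[str]]]) -> Optional[dict]:
    """Return a product record, or None when title, price or currency is missing."""
    if any(fields.get(key) in (None, "") for key in ("title", "price", "currency")):
        return None
    image = None
    for lookup in image_lookups:
        image = lookup()
        if image:
            break  # stop at the first lookup that yields an image
    # A missing image does not reject the product; it stays None.
    return {**fields, "image": image}

# Stand-ins for the three lookups, cheapest first.
def from_extractor():
    return None

def from_dom():
    return "https://cdn.example.com/mug.jpg"

def from_rendered_page():
    return "https://cdn.example.com/mug-rendered.jpg"

record = accept_product(
    {"title": "Blue Mug", "price": "199.00", "currency": "CZK"},
    [from_extractor, from_dom, from_rendered_page],
)
```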
Currency fallback
When the page doesn't expose a currency, we infer it from the base URL TLD (e.g. .cz → CZK).
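A minimal sketch of this fallback is shown below. Only the .cz to CZK pair comes from this page; the other table entries and the function name are illustrative assumptions, and returning None for an unmapped TLD such as .com is a choice made for the sketch.

```python
from urllib.parse import urlsplit

# Only "cz" is taken from the documentation; the rest are illustrative.
TLD_CURRENCY = {"cz": "CZK", "sk": "EUR", "pl": "PLN", "hu": "HUF"}

def infer_currency(base_url: str):
    """Guess a currency from the base URL's top-level domain, or return None."""
    # urlsplit only fills in the hostname when the URL has a scheme or leading "//".
    url = base_url if "//" in base_url else "//" + base_url
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_CURRENCY.get(tld)
```

For example, `infer_currency("https://shop.example.cz/")` gives "CZK", while a .com base URL gives None because the TLD says nothing about the currency.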
Limits and sync cadence
- Each tenant can hold up to 5,000 crawler-detected products. Larger catalogues should use a feed source instead.
- Nightly sync watches already-detected products (price, availability, images). New products only appear after a fresh full crawl.
- You can pause, resume, or abort any crawl run from the Sources screen.