BLOG
Notes from the team behind Runo.
Guides, deep-dives, and engineering notes on web scraping, anti-bot bypass, and structured extraction.
Schema design patterns for e-commerce extraction
Battle-tested schema patterns for product pages, category pages, reviews, and inventory. Edge cases, type choices, and the fields people forget.
Building a news aggregator with the /crawl endpoint
Walkthrough of a working news aggregator: source discovery, crawl configuration, dedup across sources, and a 24-hour ingest cadence that scales.
Headless browser fingerprinting in 2026: how detection works and what to do
A technical breakdown of the signals anti-bot services use to detect headless browsers, and the patches that close the gap.
Scraping Google SERP results in 2026: what works and what doesn't
Direct Google scraping is a losing battle in 2026. Here's the realistic landscape, the alternatives that work, and how to extract structured data from search results.
Sentiment analysis from product reviews: a practical pipeline
How to scrape product reviews at scale and turn them into actionable sentiment data. Schema design, aspect-based sentiment, and the common pitfalls to avoid.
Lead generation from public web data: a builder's guide
How to extract qualified leads from company websites, public directories, and structured registries without violating terms of service or privacy law.
Building a real-estate data pipeline with a scraping API
An end-to-end walkthrough of pulling listings from multiple real-estate portals into a normalized database. Schema design, dedup, refresh cadence, and cost.
Is web scraping legal in 2026? A practical guide for builders
What courts, regulators, and contracts actually say about scraping public web data, with the case law that shaped the current landscape and a working playbook.
Scraping JavaScript-heavy SPAs: Next.js, Nuxt, and React in 2026
Why plain HTTP fetching returns empty pages on modern frontends, which render targets work, and how to recover server-shipped data without a headless browser.
The complete guide to web scraping APIs in 2026
What a modern web scraping API actually does, how to evaluate one, and where each category (proxies, browsers, extractors) fits into a real pipeline.
Firecrawl vs Apify vs Runo: which scraping API to pick in 2026
An honest, side-by-side look at three popular scraping APIs. What each is built for, where each shines, and where each costs you time and money.
How to scrape Cloudflare-protected sites without getting blocked
A practical, layered approach to defeating Cloudflare's bot challenges in 2026. TLS fingerprints, hardened headless, cookie persistence, and when to escalate.
LLM extraction vs CSS selectors: why selector-based scraping is dead at scale
Selectors break when sites redesign. LLMs extract by semantic meaning. Here's why the tradeoff has flipped, with cost numbers from real workloads.
Extracting structured JSON from any HTML: a developer's guide
How to turn arbitrary web pages into typed JSON shaped to your schema. Covers schema design, type coercion, null handling, and edge cases.
Web scraping for AI agents: building the data layer for LLM apps
How to architect the scraping stack behind autonomous agents. Schema-typed data, low-latency tool calls, cost control, and error semantics that don't break the loop.
How to monitor competitor prices with a scraping API
A practical guide to building a competitor price monitoring pipeline. Schema design, change detection, alerting, and the legal and operational pitfalls.