Technical Spec

Scraping the Unscrapable: High-Performance Government Tender Extraction in Rust

Role: Backend Engineer
Timeline: Aug 2025 - Jan 2026
Tech Stack: Rust, Chrome DevTools Protocol, Async, Multithreading, Web Scraping

The Challenge

Clients needed structured, up-to-date tender data from Etimad Tenders (Saudi Arabia) and Bury Council (UK). Both portals were fully client-side rendered, protected against automated access, and prone to changing their DOM structure and pagination logic without notice. Throughput mattered too: each scrape had to finish quickly enough to support daily business use.

Key Constraints

  • Portals were fully client-side rendered — no static HTML to parse.
  • Aggressive bot protection on government portals.
  • DOM structure and pagination changed frequently without notice.
  • Data needed to be production-ready in CSV/JSON for downstream business use.
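One common way to tolerate unannounced DOM changes is to keep an ordered list of candidate selectors per field and take the first one that still matches. The sketch below illustrates that idea only; it is not taken from the project. The `RenderedPage` map, the function name, and the selectors are hypothetical stand-ins for querying the live DOM through the browser.

```rust
use std::collections::HashMap;

/// Toy stand-in for a rendered page: maps a CSS selector to the text it matched.
/// (Hypothetical; a real scraper would query the live DOM through the browser.)
type RenderedPage = HashMap<String, String>;

/// Try each candidate selector in priority order and return the first
/// non-empty match, together with the selector that produced it.
fn extract_with_fallback<'a>(
    page: &'a RenderedPage,
    candidates: &[&'a str],
) -> Option<(&'a str, &'a str)> {
    candidates.iter().find_map(|&selector| {
        page.get(selector)
            .map(|text| text.trim())
            // An empty node counts as a miss, so the next candidate is tried.
            .filter(|text| !text.is_empty())
            .map(|text| (selector, text))
    })
}
```

Returning the winning selector alongside the text makes it possible to log when a fallback was used, which is an early signal that the portal's markup has changed.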

Why Rust? Most scraping is done in Python. Rust was chosen deliberately for its memory safety, zero-cost abstractions, and native async and multithreading support, which are hard to match at scale in Python, where CPython's global interpreter lock limits CPU-bound work across threads.

Why Chrome DevTools Protocol? CDP gives direct programmatic control over a real Chromium browser instance, so JavaScript executes fully, dynamic content loads, and the scraper behaves much like a real user's browser. This avoided the rendering limitations of lightweight HTTP scrapers, which only see the initial HTML response.
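To make "direct programmatic control" concrete: CDP commands are JSON messages carrying an `id`, a `method`, and `params`, sent to the browser over a WebSocket. The sketch below only builds two such messages (`Page.navigate` and `Runtime.evaluate`, both standard CDP methods) using the standard library. The helper names are hypothetical, and the transport and response handling are left out, so this should be read as an illustration of the message shape rather than the project's actual client code.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a `Page.navigate` command asking the browser to load `url`.
fn navigate_command(id: u64, url: &str) -> String {
    format!(
        r#"{{"id":{},"method":"Page.navigate","params":{{"url":"{}"}}}}"#,
        id,
        json_escape(url)
    )
}

/// Build a `Runtime.evaluate` command that runs `expression` in the page
/// and asks for the result by value.
fn evaluate_command(id: u64, expression: &str) -> String {
    format!(
        r#"{{"id":{},"method":"Runtime.evaluate","params":{{"expression":"{}","returnByValue":true}}}}"#,
        id,
        json_escape(expression)
    )
}
```

The `id` lets the client pair each response with the command that triggered it, which matters once several commands are in flight at the same time.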

What Was Built

  • A CDP-based browser automation engine in Rust that controlled headless Chromium
  • Intelligent pagination handling that adapted to DOM structure changes
  • Async + multithreaded pipelines that processed multiple pages simultaneously
  • Extractors for titles, deadlines, and attached documents across both portals
  • Export pipelines delivering clean CSV and JSON datasets ready for downstream use
  • Similar scrapers for platforms like LinkedIn, TripAdvisor, and Instagram
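The fan-out idea behind processing several pages at once can be sketched with standard-library threads and a channel. This is a simplified stand-in: the project describes async plus multithreaded pipelines, whereas this sketch uses plain OS threads so that it stays dependency-free. `scrape_page` is a hypothetical placeholder for rendering one listing page and extracting its rows.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for "render one listing page and extract its rows".
fn scrape_page(page_number: u32) -> Vec<String> {
    (1..=2)
        .map(|row| format!("tender-{}-{}", page_number, row))
        .collect()
}

/// Fan page numbers out to `workers` threads and collect rows in page order.
fn scrape_all(pages: Vec<u32>, workers: usize) -> Vec<String> {
    // Shared work queue: each worker pulls the next page number when it is free.
    let queue = Arc::new(Mutex::new(pages.into_iter()));
    let (tx, rx) = mpsc::channel();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking the next page number.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(page) => tx.send((page, scrape_page(page))).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so `rx` ends once every worker has finished.
    drop(tx);
    let mut results: Vec<(u32, Vec<String>)> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }

    // Workers finish in arbitrary order; sort to restore page order.
    results.sort_by_key(|(page, _)| *page);
    results.into_iter().flat_map(|(_, rows)| rows).collect()
}
```

Pulling from a shared queue, rather than pre-splitting the page list, keeps workers busy when some pages take longer to render than others.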

Engineering Tradeoffs

Rust vs Python for Scraping
Pros

Memory safety, zero-cost abstractions, and native async/multithreading; scraping ran about 50% faster than the baseline.

Cons

Steeper learning curve and longer initial development time compared to Python.

CDP vs Lightweight HTTP Scrapers
Pros

Full JavaScript execution and real-user-like behavior, which proved effective against the portals' bot protection.

Cons

Heavier resource usage — requires running a full Chromium instance.

Impact & Outcome

Scraping speed improved by ~50% over baseline using async + multithreaded pipelines.

Successfully extracted tender data from two heavily protected government portals across two countries.

Delivered structured CSV and JSON datasets ready for immediate downstream business use.
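As an illustration of the export step, the sketch below serializes one record type to both CSV and JSON using only the standard library. The `Tender` struct, its field names, the `;` separator for documents, and the sample values in any usage are assumptions made for the example; the original schema is not described here. A production pipeline would more likely rely on established serialization crates.

```rust
/// Hypothetical record shape, based on the extracted fields named earlier
/// (titles, deadlines, attached documents).
struct Tender {
    title: String,
    deadline: String,
    documents: Vec<String>,
}

/// Quote a CSV field when it contains a comma, quote, or line break,
/// doubling any embedded quotes.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Render tenders as CSV with a header row; documents are joined with ';'
/// (an assumed convention for this sketch).
fn to_csv(tenders: &[Tender]) -> String {
    let mut out = String::from("title,deadline,documents\n");
    for t in tenders {
        let docs = t.documents.join(";");
        out.push_str(&format!(
            "{},{},{}\n",
            csv_field(&t.title),
            csv_field(&t.deadline),
            csv_field(&docs)
        ));
    }
    out
}

/// Render a string as a quoted JSON string literal.
fn json_string(value: &str) -> String {
    let mut out = String::from("\"");
    for c in value.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render tenders as a JSON array of objects.
fn to_json(tenders: &[Tender]) -> String {
    let items: Vec<String> = tenders
        .iter()
        .map(|t| {
            let docs: Vec<String> = t.documents.iter().map(|d| json_string(d)).collect();
            format!(
                "{{\"title\":{},\"deadline\":{},\"documents\":[{}]}}",
                json_string(&t.title),
                json_string(&t.deadline),
                docs.join(",")
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

Escaping is the part most easily overlooked: tender titles often contain commas and quotation marks, and an unescaped field silently shifts every column after it.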