
RealWork is a public-spending intelligence pipeline built on live web data from Bright Data. We ingested 100,000+ LA City payment records, cross-referenced 500 California nonprofit grant recipients against their IRS Form 990 filings, and scanned 50,000 state purchase orders for threshold-structuring patterns. We then built a second capability: a PDF extraction engine that turns LA County's 1,652 scanned, unqueryable homeless-service invoices into a live, public, structured dataset.

What Bright Data enables: The CA Secretary of State site blocks ordinary scrapers, ProPublica's 990 archive requires bypass, and the state's procurement dashboard is a JS-rendered SPA. Web Unlocker, SERP API, and Scraping Browser are what make cross-source verification possible at scale.

What the pipeline found: 12 Cal Fire purchase orders at exactly $49,950, which is $50 below the competitive-bidding threshold, all signed by a single procurement officer in a 17-day window; 36 nonprofit grant recipients with officer compensation anomalies flagged from public 990 filings; and a live extraction of $3M+ in homeless-service billing data that LA County publishes as scanned PDFs nobody can query.

The extraction pipeline: Gemini 2.5 Flash reads scanned PDFs natively, with no OCR stack, and returns structured JSON (vendor, date, billed amount, deliverables). The ledger is public on GitHub and growing as the pipeline runs.

Every finding cites public records. We draw no fraud conclusions; these are anomalies that warrant investigation by oversight bodies with subpoena authority.
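
To make the threshold-structuring scan concrete, here is a minimal sketch of how such a check could work. It is not the pipeline's actual code. The `PurchaseOrder` record, its field names, and the `margin`, `window_days`, and `min_count` parameters are all hypothetical; the $50,000 default is inferred only from the statement that $49,950 sits $50 below the competitive-bidding threshold.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class PurchaseOrder:
    """Hypothetical record shape; real procurement exports will differ."""
    po_id: str
    officer: str
    amount: Decimal
    signed: date


def find_structuring_clusters(
    orders,
    threshold=Decimal("50000"),  # assumed competitive-bidding threshold
    margin=Decimal("100"),       # assumed: how far below counts as "just under"
    window_days=30,              # assumed clustering window
    min_count=3,                 # assumed minimum cluster size worth flagging
):
    """Return (officer, orders) pairs where one officer signed at least
    min_count just-under-threshold orders within window_days."""
    near = defaultdict(list)
    for po in orders:
        # Keep only orders in the band [threshold - margin, threshold).
        if threshold - margin <= po.amount < threshold:
            near[po.officer].append(po)

    window = timedelta(days=window_days)
    clusters = []
    for officer, pos in near.items():
        pos.sort(key=lambda p: p.signed)
        best = []
        start = 0
        # Sliding window over signing dates: find the densest run.
        for end in range(len(pos)):
            while pos[end].signed - pos[start].signed > window:
                start += 1
            if end - start + 1 > len(best):
                best = pos[start:end + 1]
        if len(best) >= min_count:
            clusters.append((officer, best))
    return clusters
```

A cluster returned by this function is only a lead for review: it says that several near-threshold orders share an officer and a short time window, not why they do.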
31 May 2026