TenderRadar Scrapers

This directory contains scrapers for UK public procurement tender sources.

Scrapers

1. Contracts Finder (contracts-finder.js)

  • Source: https://www.contractsfinder.service.gov.uk
  • Coverage: England and non-devolved territories
  • Method: JSON API
  • Frequency: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
  • Data Range: Last 30 days
  • Status: Working

2. Find a Tender (find-tender.js)

  • Source: https://www.find-tender.service.gov.uk
  • Coverage: UK-wide above-threshold procurement notices
  • Method: HTML scraping with pagination (5 pages); see the sketch after the scraper list
  • Frequency: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
  • Status: Working

3. Public Contracts Scotland (pcs-scotland.js)

  • Source: https://www.publiccontractsscotland.gov.uk
  • Coverage: Scottish public sector tenders
  • Frequency: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
  • Status: Working

4. Sell2Wales (sell2wales.js)

  • Source: https://www.sell2wales.gov.wales
  • Coverage: Welsh public sector tenders
  • Method: HTML scraping
  • Frequency: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
  • Status: Working
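
For the HTML-based sources, a typical shape for a scraper is: fetch a results page, parse it with cheerio, repeat for a fixed number of pages, and pause between requests. The sketch below is illustrative only; the URL, query parameter, and CSS selectors are placeholders, not the ones the real scrapers use.

const axios = require('axios');
const cheerio = require('cheerio');

const BASE_URL = 'https://www.example-tenders.gov.uk/search'; // placeholder URL
const MAX_PAGES = 5; // pagination limit, as in find-tender.js
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Walk the paginated search results and collect basic notice data.
async function scrapePages() {
  const notices = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const { data: html } = await axios.get(BASE_URL, { params: { page } });
    const $ = cheerio.load(html);
    $('.search-result').each((_, el) => { // placeholder selector
      notices.push({
        title: $(el).find('.result-title').text().trim(),
        notice_url: $(el).find('a').attr('href'),
      });
    });
    await delay(3000); // stay within the 2-5 second politeness window
  }
  return notices;
}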

Database Schema

All scrapers insert into the tenders table with the following key fields (an insert sketch follows the list):

  • source: Identifier for the data source (contracts_finder, find_tender, pcs_scotland, sell2wales)
  • source_id: Unique identifier from the source (used for deduplication via UNIQUE constraint)
  • title: Tender title
  • description: Full description
  • summary: Shortened description
  • authority_name: Publishing authority
  • location: Geographic location
  • published_date: When the tender was published
  • deadline: Application deadline
  • notice_url: Link to full notice
  • status: open/closed based on deadline
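
For illustration, the sketch below shows how one notice might be mapped onto these fields, assuming a pg connection pool configured from standard PG* environment variables via dotenv; the exact query text and column handling in the real scrapers may differ.

require('dotenv').config();
const { Pool } = require('pg');

const pool = new Pool(); // connection settings read from PG* variables in the environment

// Insert one notice, deriving status from the deadline and skipping rows
// whose source_id has already been stored (UNIQUE constraint on source_id).
async function insertTender(t) {
  const status = t.deadline && new Date(t.deadline) > new Date() ? 'open' : 'closed';
  await pool.query(
    `INSERT INTO tenders
       (source, source_id, title, description, summary, authority_name,
        location, published_date, deadline, notice_url, status)
     VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
     ON CONFLICT (source_id) DO NOTHING`,
    [t.source, t.source_id, t.title, t.description, t.summary, t.authority_name,
     t.location, t.published_date, t.deadline, t.notice_url, status]
  );
}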

Running Scrapers

Individual Scraper

cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js

All Scrapers

cd /home/peter/tenderpilot
./run-all-scrapers.sh

Cron Schedule

The scrapers run automatically every 4 hours, staggered by 10 minutes:

0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1

Monitoring

Check logs:

tail -f /home/peter/tenderpilot/scraper.log

Check database:

PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"

Rate Limiting & Ethical Scraping

All scrapers implement the following safeguards (sketched after the list):

  • Proper User-Agent headers identifying the service
  • Rate limiting (2-5 second delays between requests)
  • Pagination limits where applicable
  • Respectful request patterns
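
As a concrete illustration, these rules can be wrapped in a small helper that sends an identifying User-Agent and pauses after each request. This is a minimal sketch, assuming axios; the header text and delay are example values, not the exact ones used by the scrapers.

const axios = require('axios');

const USER_AGENT = 'TenderRadar scraper (example contact address)'; // example value only
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch a URL politely: identify the service and pause afterwards so that
// consecutive calls are spaced out by at least pauseMs milliseconds.
async function politeGet(url, pauseMs = 3000) {
  const response = await axios.get(url, { headers: { 'User-Agent': USER_AGENT } });
  await sleep(pauseMs);
  return response.data;
}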

Dependencies

  • axios: HTTP client
  • cheerio: HTML parsing (for web scrapers)
  • pg: PostgreSQL client
  • dotenv: Environment variables
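
Assuming a standard Node.js setup, these are installed from the project's package.json with npm install (or added individually with npm install axios cheerio pg dotenv).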

Maintenance

  • Scrapers use ON CONFLICT (source_id) DO NOTHING to avoid duplicates
  • The insert can be switched from DO NOTHING to DO UPDATE if existing records ever need refreshing (see the sketch after this list)
  • Monitor for HTML structure changes on scraped sites
  • API endpoints (Contracts Finder) are more stable than HTML scraping
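
If refreshing existing rows ever becomes necessary, the conflict clause can be switched to an upsert. A minimal sketch of that variant (column list abbreviated; not the current behaviour of the scrapers), to pass to pool.query in place of the insert shown under Database Schema:

// Upsert variant: refresh mutable fields when a source_id is seen again.
const upsertSql = `
  INSERT INTO tenders (source, source_id, title, deadline, status)
  VALUES ($1, $2, $3, $4, $5)
  ON CONFLICT (source_id) DO UPDATE
    SET title    = EXCLUDED.title,
        deadline = EXCLUDED.deadline,
        status   = EXCLUDED.status`;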

Last Updated

2026-02-14 - Initial deployment with all four scrapers