Recent Improvements
1. Focus on stable international/regional sources
- Improved TED EU scraper (5 search strategies, 5 pages each)
- All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
- De-prioritised unreliable UK gov sites (100% removal rate)
2. Archival feature
- New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
- Cleanup script now preserves full tender snapshots before archiving
- Gradual failure handling (3 retries before archiving)
- No data loss: the historical record is preserved
3. Email alerts
- Daily digest (8am) covering all new tenders from the last 24 hours
- High-value alerts (every 4 hours) for tenders over £100k
- Professional HTML emails with full tender details
- Configurable via environment variables
Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis
Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)
All cron jobs updated for hourly scraping, daily cleanup, and alerts.
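The gradual failure handling described above (3 retries before archiving, with a snapshot preserved) can be sketched as a small pure helper. The function and field wiring here are illustrative assumptions; the real logic lives in cleanup-with-archival.mjs and may differ.

```javascript
// Sketch of the "3 validation failures before archiving" rule.
// recordValidationResult is a hypothetical helper, not the actual code.
const MAX_VALIDATION_FAILURES = 3;

function recordValidationResult(tender, ok) {
  if (ok) {
    // A successful validation resets the failure counter.
    return {
      ...tender,
      validation_failures: 0,
      last_validated: new Date().toISOString(),
      archived: false,
    };
  }
  const failures = (tender.validation_failures || 0) + 1;
  const shouldArchive = failures >= MAX_VALIDATION_FAILURES;
  return {
    ...tender,
    validation_failures: failures,
    // Archive only after the third consecutive failure,
    // keeping a full snapshot so no data is lost.
    archived: shouldArchive,
    archived_snapshot: shouldArchive ? JSON.stringify(tender) : tender.archived_snapshot,
    archived_at: shouldArchive ? new Date().toISOString() : null,
  };
}
```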
TenderRadar Scrapers
This directory contains scrapers for UK public procurement tender sources.
Scrapers
1. Contracts Finder (contracts-finder.js)
- Source: https://www.contractsfinder.service.gov.uk
- Coverage: England and non-devolved territories
- Method: JSON API
- Frequency: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
- Data Range: Last 30 days
- Status: ✅ Working
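Since this scraper uses a JSON API rather than HTML scraping, its request is just a dated search URL. The sketch below shows one way contracts-finder.js might build that URL for the last 30 days; the endpoint path and parameter names are assumptions, so check the actual scraper and the Contracts Finder API documentation before relying on them.

```javascript
// Hedged sketch: build a 30-day search URL for the Contracts Finder API.
// The base path and parameter names (publishedFrom/publishedTo) are assumptions.
function buildSearchUrl(
  daysBack = 30,
  base = 'https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search'
) {
  const to = new Date();
  const from = new Date(to.getTime() - daysBack * 24 * 60 * 60 * 1000);
  const params = new URLSearchParams({
    publishedFrom: from.toISOString().slice(0, 10), // YYYY-MM-DD
    publishedTo: to.toISOString().slice(0, 10),
  });
  return `${base}?${params}`;
}

// Usage (assumes axios is installed, as listed under Dependencies):
//   const { data } = await axios.get(buildSearchUrl(30), {
//     headers: { 'User-Agent': 'TenderRadar scraper' },
//   });
```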
2. Find a Tender (find-tender.js)
- Source: https://www.find-tender.service.gov.uk
- Coverage: UK-wide above-threshold procurement notices
- Method: HTML scraping with pagination (5 pages)
- Frequency: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
- Status: ✅ Working
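The 5-page pagination loop mentioned above can be sketched as follows. The helper name and query parameter are illustrative assumptions, not find-tender.js's actual API.

```javascript
// Illustrative sketch of the 5-page pagination cap used by the HTML scrapers.
// The 'page' query parameter name is an assumption.
const MAX_PAGES = 5;

function buildPageUrls(searchUrl, maxPages = MAX_PAGES) {
  return Array.from({ length: maxPages }, (_, i) => {
    const url = new URL(searchUrl);
    url.searchParams.set('page', String(i + 1));
    return url.toString();
  });
}

// Each page would then be fetched with a delay and parsed, e.g. with cheerio:
//   const $ = cheerio.load(html);
//   $('.search-result').each((_, el) => { /* extract title, deadline, ... */ });
```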
3. Public Contracts Scotland (pcs-scotland.js)
- Source: https://www.publiccontractsscotland.gov.uk
- Coverage: Scottish public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
- Status: ✅ Working
4. Sell2Wales (sell2wales.js)
- Source: https://www.sell2wales.gov.wales
- Coverage: Welsh public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
- Status: ✅ Working
Database Schema
All scrapers insert into the tenders table with the following key fields:
- source: Identifier for the data source (contracts_finder, find_tender, pcs_scotland, sell2wales)
- source_id: Unique identifier from the source (used for deduplication via UNIQUE constraint)
- title: Tender title
- description: Full description
- summary: Shortened description
- authority_name: Publishing authority
- location: Geographic location
- published_date: When the tender was published
- deadline: Application deadline
- notice_url: Link to the full notice
- status: open/closed, based on deadline
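Putting the schema together, a scraper's insert looks roughly like the sketch below: a parameterised statement over the fields above, with status derived from the deadline. The helper name is illustrative; with pg you would run the result as await pool.query(text, values).

```javascript
// Hedged sketch of the parameterised insert into the tenders table.
// buildTenderInsert is a hypothetical helper, not the scrapers' actual code.
function buildTenderInsert(t) {
  const text = `
    INSERT INTO tenders
      (source, source_id, title, description, summary, authority_name,
       location, published_date, deadline, notice_url, status)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
    ON CONFLICT (source_id) DO NOTHING`;
  // Derive status from the deadline: still open if it lies in the future.
  const status = t.deadline && new Date(t.deadline) > new Date() ? 'open' : 'closed';
  const values = [
    t.source, t.source_id, t.title, t.description, t.summary,
    t.authority_name, t.location, t.published_date, t.deadline,
    t.notice_url, status,
  ];
  return { text, values };
}
```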
Running Scrapers
Individual Scraper
cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js
All Scrapers
cd /home/peter/tenderpilot
./run-all-scrapers.sh
Cron Schedule
The scrapers run automatically every 4 hours, staggered by 10 minutes:
0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1
Monitoring
Check logs:
tail -f /home/peter/tenderpilot/scraper.log
Check database:
PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"
Rate Limiting & Ethical Scraping
All scrapers implement:
- Proper User-Agent headers identifying the service
- Rate limiting (2-5 second delays between requests)
- Pagination limits where applicable
- Respectful request patterns
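The 2-5 second delay between requests can be sketched with two small helpers. The names sleep and randomDelayMs are illustrative, not the scrapers' actual identifiers.

```javascript
// Minimal sketch of the randomised 2-5 second delay between requests.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pick a random delay in [minMs, maxMs], inclusive.
function randomDelayMs(minMs = 2000, maxMs = 5000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Between page fetches:
//   await sleep(randomDelayMs());
```

Randomising the delay (rather than a fixed interval) makes the request pattern less bursty and gentler on the scraped sites.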
Dependencies
- axios: HTTP client
- cheerio: HTML parsing (for web scrapers)
- pg: PostgreSQL client
- dotenv: Environment variables
Maintenance
- Scrapers use ON CONFLICT (source_id) DO NOTHING to avoid duplicates; switch to DO UPDATE if existing records should be refreshed instead
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping
Last Updated
2026-02-14 - Initial deployment with all four scrapers