commit c6b0169f3e (Peter Foster): feat: three major improvements - stable sources, archival, email alerts
1. Focus on Stable International/Regional Sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritize unreliable UK gov sites (100% removal rate)

2. Archival Feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss - historical record preserved

3. Email Alerts
   - Daily digest (8am) - all new tenders from last 24h
   - High-value alerts (every 4h) - tenders >£100k
   - Professional HTML emails with all tender details
   - Configurable via environment variables

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping + daily cleanup + alerts
2026-02-15 14:42:17 +00:00

TenderRadar Scrapers

This directory contains scrapers for UK public procurement tender sources.

Scrapers

1. Contracts Finder (contracts-finder.js)

  • Source: https://www.contractsfinder.service.gov.uk
  • Coverage: England and non-devolved UK bodies
  • Method: JSON API
  • Frequency: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
  • Data Range: Last 30 days
  • Status: Working

2. Find a Tender (find-tender.js)

  • Source: https://www.find-tender.service.gov.uk
  • Coverage: UK-wide above-threshold procurement notices
  • Method: HTML scraping with pagination (5 pages)
  • Frequency: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
  • Status: Working

3. Public Contracts Scotland (pcs-scotland.js)

  • Source: https://www.publiccontractsscotland.gov.uk
  • Coverage: Scottish public sector tenders
  • Method: HTML scraping
  • Frequency: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
  • Status: Working

4. Sell2Wales (sell2wales.js)

  • Source: https://www.sell2wales.gov.wales
  • Coverage: Welsh public sector tenders
  • Method: HTML scraping
  • Frequency: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
  • Status: Working
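
The 5-page pagination used by the HTML scrapers above can be sketched as a simple URL loop. This is a hypothetical helper, not the actual scraper code; the base URL shape and the "page" query parameter name are assumptions and may differ per site.

```javascript
// Hypothetical helper: build the list of search-result page URLs to fetch.
// The query parameter name ("page") is an assumption.
function searchPageUrls(baseUrl, pages = 5) {
  return Array.from({ length: pages }, (_, i) => `${baseUrl}?page=${i + 1}`);
}

// e.g. searchPageUrls('https://www.find-tender.service.gov.uk/Search/Results')
```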

Database Schema

All scrapers insert into the tenders table with the following key fields:

  • source: Identifier for the data source (contracts_finder, find_tender, pcs_scotland, sell2wales)
  • source_id: Unique identifier from the source (used for deduplication via UNIQUE constraint)
  • title: Tender title
  • description: Full description
  • summary: Shortened description
  • authority_name: Publishing authority
  • location: Geographic location
  • published_date: When the tender was published
  • deadline: Application deadline
  • notice_url: Link to full notice
  • status: open/closed based on deadline

Running Scrapers

Individual Scraper

cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js

All Scrapers

cd /home/peter/tenderpilot
./run-all-scrapers.sh

Cron Schedule

The scrapers run automatically every 4 hours, staggered by 10 minutes:

0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1

Monitoring

Check logs:

tail -f /home/peter/tenderpilot/scraper.log

Check database:

PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"

Rate Limiting & Ethical Scraping

All scrapers implement:

  • Proper User-Agent headers identifying the service
  • Rate limiting (2-5 second delays between requests)
  • Pagination limits where applicable
  • Respectful request patterns
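
A minimal sketch of these patterns (the function names and User-Agent string are hypothetical, and `client` stands in for any HTTP getter such as `axios.get`):

```javascript
// Hypothetical sketch of polite fetching: identify the service and wait
// a few seconds between requests. Names here are illustrative.
const USER_AGENT = 'TenderRadarScraper/1.0 (contact: admin@example.com)'; // hypothetical

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Random delay in [min, max) ms; defaults match the 2-5 second policy above.
function randomDelayMs(min = 2000, max = 5000) {
  return min + Math.floor(Math.random() * (max - min));
}

// Fetch URLs sequentially with a delay between requests.
// `client(url, opts)` is any Promise-returning HTTP getter (e.g. axios.get).
async function politeFetchAll(urls, client, { min = 2000, max = 5000 } = {}) {
  const results = [];
  for (const url of urls) {
    results.push(await client(url, { headers: { 'User-Agent': USER_AGENT } }));
    await sleep(randomDelayMs(min, max));
  }
  return results;
}
```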

Dependencies

  • axios: HTTP client
  • cheerio: HTML parsing (for web scrapers)
  • pg: PostgreSQL client
  • dotenv: Environment variables

Maintenance

  • Scrapers use ON CONFLICT (source_id) DO NOTHING to avoid duplicates
  • Scrapers can be extended to update existing records if needed
  • Monitor for HTML structure changes on scraped sites
  • API endpoints (Contracts Finder) are more stable than HTML scraping
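
If a scraper should refresh existing rows rather than skip them, the conflict clause can be switched to an update. A sketch of that variant (the column subset shown is illustrative):

```javascript
// Hypothetical upsert: update mutable fields when a known tender reappears,
// instead of DO NOTHING.
const UPSERT_TENDER_SQL = `
  INSERT INTO tenders (source, source_id, title, deadline, status)
  VALUES ($1, $2, $3, $4, $5)
  ON CONFLICT (source_id) DO UPDATE SET
    title    = EXCLUDED.title,
    deadline = EXCLUDED.deadline,
    status   = EXCLUDED.status`;
```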

Last Updated

2026-02-14 - Initial deployment with all four scrapers