Recent Improvements
1. Focus on stable international/regional sources
- Improved TED EU scraper (5 search strategies, 5 pages each)
- All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
- De-prioritised unreliable UK gov sites (100% removal rate)
2. Archival feature
- New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
- Cleanup script now preserves full tender snapshots before archiving
- Gradual failure handling (3 retries before archiving)
- No data loss: the historical record is preserved
3. Email alerts
- Daily digest (8am) covering all new tenders from the last 24 hours
- High-value alerts (every 4 hours) for tenders over £100k
- Professional HTML emails with full tender details
- Configurable via environment variables
Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis
Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)
All cron jobs updated for hourly scraping, daily cleanup, and alerts.
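The gradual failure handling described above (3 retries before archiving, with a snapshot preserved) can be sketched as a small pure helper. The function and field wiring here are illustrative assumptions; the real logic lives in cleanup-with-archival.mjs and may differ.

```javascript
// Sketch of the "3 validation failures before archiving" rule.
// recordValidationResult is a hypothetical helper, not the actual code.
const MAX_VALIDATION_FAILURES = 3;

function recordValidationResult(tender, ok) {
  if (ok) {
    // A successful validation resets the failure counter.
    return {
      ...tender,
      validation_failures: 0,
      last_validated: new Date().toISOString(),
      archived: false,
    };
  }
  const failures = (tender.validation_failures || 0) + 1;
  const shouldArchive = failures >= MAX_VALIDATION_FAILURES;
  return {
    ...tender,
    validation_failures: failures,
    // Archive only after the third consecutive failure,
    // keeping a full snapshot so no data is lost.
    archived: shouldArchive,
    archived_snapshot: shouldArchive ? JSON.stringify(tender) : tender.archived_snapshot,
    archived_at: shouldArchive ? new Date().toISOString() : null,
  };
}
```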
TenderRadar Scrapers
This directory contains scrapers for UK public procurement tender sources.
Scrapers
1. Contracts Finder (contracts-finder.js)
- Source: https://www.contractsfinder.service.gov.uk
- Coverage: England and non-devolved territories
- Method: JSON API
- Frequency: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
- Data Range: Last 30 days
- Status: ✅ Working
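Since this scraper uses a JSON API rather than HTML scraping, its request is just a dated search URL. The sketch below shows one way contracts-finder.js might build that URL for the last 30 days; the endpoint path and parameter names are assumptions, so check the actual scraper and the Contracts Finder API documentation before relying on them.

```javascript
// Hedged sketch: build a 30-day search URL for the Contracts Finder API.
// The base path and parameter names (publishedFrom/publishedTo) are assumptions.
function buildSearchUrl(
  daysBack = 30,
  base = 'https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search'
) {
  const to = new Date();
  const from = new Date(to.getTime() - daysBack * 24 * 60 * 60 * 1000);
  const params = new URLSearchParams({
    publishedFrom: from.toISOString().slice(0, 10), // YYYY-MM-DD
    publishedTo: to.toISOString().slice(0, 10),
  });
  return `${base}?${params}`;
}

// Usage (assumes axios is installed, as listed under Dependencies):
//   const { data } = await axios.get(buildSearchUrl(30), {
//     headers: { 'User-Agent': 'TenderRadar scraper' },
//   });
```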
2. Find a Tender (find-tender.js)
- Source: https://www.find-tender.service.gov.uk
- Coverage: UK-wide above-threshold procurement notices
- Method: HTML scraping with pagination (5 pages)
- Frequency: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
- Status: ✅ Working
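The 5-page pagination loop mentioned above can be sketched as follows. The helper name and query parameter are illustrative assumptions, not find-tender.js's actual API.

```javascript
// Illustrative sketch of the 5-page pagination cap used by the HTML scrapers.
// The 'page' query parameter name is an assumption.
const MAX_PAGES = 5;

function buildPageUrls(searchUrl, maxPages = MAX_PAGES) {
  return Array.from({ length: maxPages }, (_, i) => {
    const url = new URL(searchUrl);
    url.searchParams.set('page', String(i + 1));
    return url.toString();
  });
}

// Each page would then be fetched with a delay and parsed, e.g. with cheerio:
//   const $ = cheerio.load(html);
//   $('.search-result').each((_, el) => { /* extract title, deadline, ... */ });
```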
3. Public Contracts Scotland (pcs-scotland.js)
- Source: https://www.publiccontractsscotland.gov.uk
- Coverage: Scottish public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
- Status: ✅ Working
4. Sell2Wales (sell2wales.js)
- Source: https://www.sell2wales.gov.wales
- Coverage: Welsh public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
- Status: ✅ Working
Database Schema
All scrapers insert into the tenders table with the following key fields:
- source: Identifier for the data source (contracts_finder, find_tender, pcs_scotland, sell2wales)
- source_id: Unique identifier from the source (used for deduplication via UNIQUE constraint)
- title: Tender title
- description: Full description
- summary: Shortened description
- authority_name: Publishing authority
- location: Geographic location
- published_date: When the tender was published
- deadline: Application deadline
- notice_url: Link to the full notice
- status: open/closed, based on deadline
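Putting the schema together, a scraper's insert looks roughly like the sketch below: a parameterised statement over the fields above, with status derived from the deadline. The helper name is illustrative; with pg you would run the result as await pool.query(text, values).

```javascript
// Hedged sketch of the parameterised insert into the tenders table.
// buildTenderInsert is a hypothetical helper, not the scrapers' actual code.
function buildTenderInsert(t) {
  const text = `
    INSERT INTO tenders
      (source, source_id, title, description, summary, authority_name,
       location, published_date, deadline, notice_url, status)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
    ON CONFLICT (source_id) DO NOTHING`;
  // Derive status from the deadline: still open if it lies in the future.
  const status = t.deadline && new Date(t.deadline) > new Date() ? 'open' : 'closed';
  const values = [
    t.source, t.source_id, t.title, t.description, t.summary,
    t.authority_name, t.location, t.published_date, t.deadline,
    t.notice_url, status,
  ];
  return { text, values };
}
```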
Running Scrapers
Individual Scraper
cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js
All Scrapers
cd /home/peter/tenderpilot
./run-all-scrapers.sh
Cron Schedule
The scrapers run automatically every 4 hours, staggered by 10 minutes:
0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1
Monitoring
Check logs:
tail -f /home/peter/tenderpilot/scraper.log
Check database:
PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"
Rate Limiting & Ethical Scraping
All scrapers implement:
- Proper User-Agent headers identifying the service
- Rate limiting (2-5 second delays between requests)
- Pagination limits where applicable
- Respectful request patterns
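The 2-5 second delay between requests can be sketched with two small helpers. The names sleep and randomDelayMs are illustrative, not the scrapers' actual identifiers.

```javascript
// Minimal sketch of the randomised 2-5 second delay between requests.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pick a random delay in [minMs, maxMs], inclusive.
function randomDelayMs(minMs = 2000, maxMs = 5000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Between page fetches:
//   await sleep(randomDelayMs());
```

Randomising the delay (rather than a fixed interval) makes the request pattern less bursty and gentler on the scraped sites.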
Dependencies
- axios: HTTP client
- cheerio: HTML parsing (for web scrapers)
- pg: PostgreSQL client
- dotenv: Environment variables
Maintenance
- Scrapers use ON CONFLICT (source_id) DO NOTHING to avoid duplicates; switch to DO UPDATE if existing records should be refreshed instead
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping
Last Updated
2026-02-14 - Initial deployment with all four scrapers