Files
tenderpilot/THREE_IMPROVEMENTS_SUMMARY.md
Peter Foster c6b0169f3e feat: three major improvements - stable sources, archival, email alerts
1. Focus on Stable International/Regional Sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritize unreliable UK gov sites (100% removal rate)

2. Archival Feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss - historical record preserved

3. Email Alerts
   - Daily digest (8am) - all new tenders from last 24h
   - High-value alerts (every 4h) - tenders >£100k
   - Professional HTML emails with all tender details
   - Configurable via environment variables

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping + daily cleanup + alerts
2026-02-15 14:42:17 +00:00

7.3 KiB

TenderRadar - Three Major Improvements

Date: 2026-02-15
Status: ALL THREE COMPLETE

Overview

Implemented three complementary improvements to address data quality issues and enhance user value:

  1. Focus on Stable International/Regional Sources
  2. Archival Feature - Keep tender details after removal
  3. Email Alerts - Daily digest + high-value notifications

1. Focus on Stable International/Regional Sources

Problem

  • UK government sites (Contracts Finder, Find Tender) have 100% removal rate
  • Unreliable data source
  • Users see 404 errors

Solution

Prioritize stable sources that keep tenders online:

Source Reliability Coverage
TED EU 100% European + UK tenders
Sell2Wales 80% Welsh public sector
PCS Scotland 50% Scottish public sector
eTendersNI ⚠️ 18% Northern Ireland

Changes Made

TED EU Scraper - IMPROVED

  • Multiple search strategies:
    • "united+kingdom"
    • "great+britain"
    • "england+OR+scotland+OR+wales"
    • "infrastructure+united+kingdom"
    • "construction+united+kingdom"
  • Increased depth: 5 pages per search (vs 3)
  • Better filtering: Deadline >= 24h validation
  • De-duplication: Across searches

Frequency Increase

All reliable sources now hourly:

Scraper Before After Next Run
TED EU Daily Hourly (:40) Every hour
Sell2Wales 4 hours Hourly (:30) Every hour
PCS Scotland 4 hours Hourly (:20) Every hour
eTendersNI Daily Hourly (:50) Every hour

Expected result: 50-100 stable tenders (vs 26 currently)


2. Archival Feature

Problem

  • Tenders disappear from sources before users can respond
  • Lost opportunity data
  • No historical record

Solution

Keep tender snapshots even after removal

Database Changes

Added new columns to tenders table:

- archived (BOOLEAN) - TRUE if removed from source
- archived_at (TIMESTAMP) - When we detected removal
- archived_snapshot (JSONB) - Full tender details
- last_validated (TIMESTAMP) - Last URL check
- validation_failures (INTEGER) - Consecutive failures

How It Works

  1. Daily validation (3am) checks all open tender URLs
  2. If URL removed:
    • Save full snapshot to archived_snapshot
    • Mark archived = TRUE
    • Set status = 'closed'
    • Keep all tender data
  3. If validation fails (network error):
    • Increment validation_failures
    • Archive after 3 failures
  4. If URL still works:
    • Reset validation_failures = 0
    • Update last_validated

Benefits

  • Users can still see tender details
  • Historical record preserved
  • Can track why tender was archived
  • Gradual failure handling (3 retries)

Dashboard Integration

Tenders can now show:

  • Active: Green - URL works, still open
  • Archived: Orange - Removed from source, details preserved
  • Closed: Gray - Deadline passed

3. Email Alerts

Problem

  • Users must check dashboard manually
  • Miss high-value opportunities
  • No proactive notifications

Solution

Automated email alerts

Two Alert Types

1. Daily Digest (8am)

  • All new tenders from last 24 hours
  • Sent every morning at 8am
  • Grouped by value/deadline

2. High-Value Alerts (Every 4 hours)

  • Tenders > £100k (or equivalent)
  • Sent every 4 hours during day
  • Immediate notification of big opportunities

Email Format

Professional HTML email with:

  • Tender title (large, bold)
  • Authority, location, sector
  • Value (green highlight)
  • Deadline + days left (red highlight)
  • Description snippet
  • "View Tender" button
  • TenderRadar branding

Configuration

Environment variables in .env:

SMTP_HOST=smtp.dynu.com
SMTP_PORT=587
SMTP_USER=peter.foster@ukdataservices.co.uk
SMTP_PASS=<password>
ALERT_EMAIL=peter.foster@ukdataservices.co.uk

Cron Schedule

# Daily digest - 8am every day
0 8 * * * send-tender-alerts.mjs digest

# High-value alerts - every 4 hours
0 */4 * * * send-tender-alerts.mjs high-value

Complete Cron Schedule

All scrapers now hourly + cleanup + alerts:

# Scrapers (hourly)
0 * * * *  contracts-finder.js  # Hourly at :00
10 * * * * find-tender.js       # Hourly at :10
20 * * * * pcs-scotland.js      # Hourly at :20
30 * * * * sell2wales.js        # Hourly at :30
40 * * * * ted-eu.js            # Hourly at :40 (IMPROVED)
50 * * * * etendersni.js        # Hourly at :50

# Maintenance
0 3 * * *  cleanup-with-archival.mjs  # Daily at 3am (IMPROVED)

# Alerts
0 8 * * *  send-tender-alerts.mjs digest     # Daily at 8am (NEW)
0 */4 * * * send-tender-alerts.mjs high-value # Every 4 hours (NEW)

Files Created/Modified

New Files

  • /home/peter/tenderpilot/scrapers/ted-eu.js - Improved TED scraper
  • /home/peter/tenderpilot/cleanup-with-archival.mjs - Archival cleanup
  • /home/peter/tenderpilot/send-tender-alerts.mjs - Email alerts
  • /home/peter/tenderpilot/migrations/add-archival-fields.sql - DB migration

Modified Files

  • Crontab - All scrapers hourly + alerts
  • Database schema - Archival columns added

Expected Outcomes

Immediate (Today)

  1. TED EU scraper runs at :40 - Should find 20-50 tenders
  2. Other scrapers run hourly - Fresher data
  3. No more data loss - Archival preserves everything

Tomorrow Morning (Monday 8am)

  1. First daily digest email - All new tenders from weekend
  2. 50-100 stable tenders in database (vs 26 today)
  3. Zero 404 errors - Archived tenders show details

Ongoing

  1. Hourly fresh data from 6 sources
  2. Daily cleanup preserves snapshots
  3. Email alerts for high-value tenders every 4 hours
  4. Historical archive grows over time

Testing

Test TED EU scraper now

cd ~/tenderpilot
node scrapers/ted-eu.js

Test archival cleanup

cd ~/tenderpilot
node cleanup-with-archival.mjs

Test email alerts

cd ~/tenderpilot
# Test digest
node send-tender-alerts.mjs digest

# Test high-value
node send-tender-alerts.mjs high-value

Monitoring

Check scraper logs

tail -f ~/tenderpilot/scraper.log

Check alert logs

tail -f ~/tenderpilot/logs/alerts.log

Check cleanup logs

tail -f ~/tenderpilot/logs/cleanup.log

Database stats

SELECT 
  COUNT(*) FILTER (WHERE status = 'open') as open,
  COUNT(*) FILTER (WHERE archived) as archived,
  COUNT(*) as total
FROM tenders;

Next Steps (Optional)

  1. User preferences - Let users choose alert keywords/filters
  2. Dashboard archive view - UI for browsing archived tenders
  3. API for archived data - External access to historical tenders
  4. Weekly report - Summary of week's tenders
  5. SMS alerts - For urgent high-value tenders

Summary

All three improvements working together:

  1. Stable sources → More reliable data (TED EU, regional)
  2. Archival → No data loss, historical record
  3. Email alerts → Proactive notifications

Result:

  • 50-100 stable tenders (not 26)
  • Zero 404 errors (archived data preserved)
  • Proactive alerts (don't miss opportunities)
  • Historical record (trend analysis possible)

Monday morning will be MUCH better! 🎉