1. Focus on Stable International/Regional Sources - Improved TED EU scraper (5 search strategies, 5 pages each) - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI) - De-prioritize unreliable UK gov sites (100% removal rate) 2. Archival Feature - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures - Cleanup script now preserves full tender snapshots before archiving - Gradual failure handling (3 retries before archiving) - No data loss - historical record preserved 3. Email Alerts - Daily digest (8am) - all new tenders from last 24h - High-value alerts (every 4h) - tenders >£100k - Professional HTML emails with all tender details - Configurable via environment variables Expected outcomes: - 50-100 stable tenders (vs 26 currently) - Zero 404 errors (archived data preserved) - Proactive notifications (no missed opportunities) - Historical archive for trend analysis Files: - scrapers/ted-eu.js (improved) - cleanup-with-archival.mjs (new) - send-tender-alerts.mjs (new) - migrations/add-archival-fields.sql (new) - THREE_IMPROVEMENTS_SUMMARY.md (documentation) All cron jobs updated for hourly scraping + daily cleanup + alerts
20 lines
996 B
SQL
20 lines
996 B
SQL
-- Add archival feature to tenders table
|
|
-- Allows keeping tender details even after source removes them
|
|
|
|
ALTER TABLE tenders
|
|
ADD COLUMN IF NOT EXISTS archived BOOLEAN DEFAULT FALSE,
|
|
ADD COLUMN IF NOT EXISTS archived_at TIMESTAMP,
|
|
ADD COLUMN IF NOT EXISTS archived_snapshot JSONB,
|
|
ADD COLUMN IF NOT EXISTS last_validated TIMESTAMP,
|
|
ADD COLUMN IF NOT EXISTS validation_failures INTEGER DEFAULT 0;
|
|
|
|
-- Create index for archival queries
|
|
CREATE INDEX IF NOT EXISTS idx_tenders_archived ON tenders(archived) WHERE archived = TRUE;
|
|
CREATE INDEX IF NOT EXISTS idx_tenders_last_validated ON tenders(last_validated);
|
|
|
|
-- Add comment
|
|
COMMENT ON COLUMN tenders.archived IS 'TRUE if tender removed from source but we keep snapshot';
|
|
COMMENT ON COLUMN tenders.archived_snapshot IS 'Full tender details saved when first scraped';
|
|
COMMENT ON COLUMN tenders.last_validated IS 'Last time we verified URL still works';
|
|
COMMENT ON COLUMN tenders.validation_failures IS 'Number of consecutive validation failures';
|