feat: three major improvements - stable sources, archival, email alerts

1. Focus on Stable International/Regional Sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritized unreliable UK gov sites (100% removal rate for their listings)

2. Archival Feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss - historical record preserved
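
   A minimal sketch of the gradual failure handling described above, not the actual contents of cleanup-with-archival.mjs. The column names come from the list above; the helper name `recordValidation`, the reset-on-success behaviour, and archiving on exactly the third consecutive failure are assumptions for illustration.

```javascript
// Hypothetical helper: the threshold of 3 comes from the commit message,
// everything else is an assumption about how the cleanup script behaves.
const MAX_FAILURES = 3;

function recordValidation(tender, isReachable, now = new Date()) {
  // Rows that are already archived are left untouched.
  if (tender.archived) return tender;

  if (isReachable) {
    // A successful check resets the failure counter (assumed behaviour).
    return { ...tender, last_validated: now, validation_failures: 0 };
  }

  const failures = (tender.validation_failures ?? 0) + 1;
  const updated = { ...tender, last_validated: now, validation_failures: failures };
  if (failures < MAX_FAILURES) return updated;

  // Snapshot the row as it was before this check, then flag it as archived
  // instead of deleting it, so the historical record is preserved.
  return {
    ...updated,
    archived: true,
    archived_at: now,
    archived_snapshot: JSON.stringify(tender),
  };
}
```

   Under these assumptions a tender survives two failed checks and is archived, with its snapshot, on the third.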

3. Email Alerts
   - Daily digest (8am) - all new tenders from last 24h
   - High-value alerts (every 4h) - tenders >£100k
   - Professional HTML emails with all tender details
   - Configurable via environment variables
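
   A rough sketch of how the alert selection could work, not the actual contents of send-tender-alerts.mjs. The defaults (8am digest, £100k threshold, 4h interval) come from the list above; the environment variable names, the `value` and `created_at` fields, and the idea that each high-value run only looks back one interval are assumptions.

```javascript
// Hypothetical configuration loader: the variable names are illustrative,
// only the default values are taken from the commit message.
function loadAlertConfig(env = process.env) {
  return {
    digestHour: Number(env.ALERT_DIGEST_HOUR ?? 8),
    highValueThreshold: Number(env.ALERT_HIGH_VALUE_GBP ?? 100000),
    highValueIntervalHours: Number(env.ALERT_HIGH_VALUE_INTERVAL_HOURS ?? 4),
  };
}

// Tenders created within the last `hours` hours, relative to `now`.
function tendersSince(tenders, now, hours) {
  const cutoff = now.getTime() - hours * 60 * 60 * 1000;
  return tenders.filter(t => new Date(t.created_at).getTime() >= cutoff);
}

// Tenders from the last alert interval whose value strictly exceeds the threshold.
function highValueTenders(tenders, now, config) {
  return tendersSince(tenders, now, config.highValueIntervalHours)
    .filter(t => Number(t.value) > config.highValueThreshold);
}
```

   The daily digest would then be `tendersSince(tenders, now, 24)`, and the four-hourly job would send whatever `highValueTenders` returns.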

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping, daily cleanup, and alerts.
Peter Foster
2026-02-15 14:42:17 +00:00
parent 6709ec4db6
commit c6b0169f3e
20 changed files with 4095 additions and 133 deletions

fix-urls.mjs

@@ -0,0 +1,29 @@
import pg from 'pg';

// Read the connection string from the environment instead of hardcoding
// credentials in the script. DATABASE_URL is an assumed variable name.
const connectionString = process.env.DATABASE_URL;
if (!connectionString) {
  throw new Error('DATABASE_URL is not set');
}
const pool = new pg.Pool({ connectionString });

try {
  // Strip everything from the first '?' onwards in find_tender notice URLs.
  console.log('Fixing find_tender URLs (removing query params)...');
  const result = await pool.query(
    "UPDATE tenders SET notice_url = split_part(notice_url, '?', 1) WHERE source = 'find_tender' AND notice_url LIKE '%?%' RETURNING id, notice_url"
  );
  console.log(`✓ Fixed ${result.rowCount} find_tender URLs`);
  if (result.rows.length > 0) {
    console.log('Sample fixed URLs:');
    result.rows.slice(0, 3).forEach(row => {
      console.log(`  - ${row.notice_url}`);
    });
  }

  // Caution: this removes every row with source = 'ted_eu', not only demo rows.
  console.log('\nDeleting TED demo data...');
  const deleteResult = await pool.query(
    "DELETE FROM tenders WHERE source = 'ted_eu' RETURNING id"
  );
  console.log(`✓ Deleted ${deleteResult.rowCount} TED demo records`);
  console.log('\nDatabase cleanup complete!');
} finally {
  // Close the pool even if one of the queries fails.
  await pool.end();
}
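
For previewing the URL fix without touching the database, the effect of `split_part(notice_url, '?', 1)` can be approximated in plain JavaScript. This is only a sketch: `stripQuery` is a hypothetical helper that is not part of the commit, and the sample URL in the usage note is made up for illustration.

```javascript
// Hypothetical helper mirroring split_part(url, '?', 1):
// keep everything before the first '?', or the whole string if there is none.
function stripQuery(url) {
  const queryStart = url.indexOf('?');
  return queryStart === -1 ? url : url.slice(0, queryStart);
}
```

For example, `stripQuery('https://example.test/Notice/000123-2026?origin=SearchResults&p=1')` returns `'https://example.test/Notice/000123-2026'`, while a URL without a query string comes back unchanged.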