feat: three major improvements - stable sources, archival, email alerts
1. Focus on stable international/regional sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now scraped hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritized unreliable UK gov sites (100% removal rate)

2. Archival feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss: the historical record is preserved

3. Email alerts
   - Daily digest (8am) with all new tenders from the last 24h
   - High-value alerts (every 4h) for tenders >£100k
   - Professional HTML emails with full tender details
   - Configurable via environment variables

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping, daily cleanup, and alerts.
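The "gradual failure handling (3 retries before archiving)" rule can be sketched as a small state transition. This is a hypothetical illustration, not the actual cleanup-with-archival.mjs code: the function name nextArchivalState and the row shape are assumptions, though the column names (validation_failures, archived, archived_at, archived_snapshot, last_validated) come from the migration described above.

```javascript
// Hypothetical sketch of the archival retry rule described in the commit
// message; not the real cleanup-with-archival.mjs implementation.
const MAX_VALIDATION_FAILURES = 3;

// Given a tender row and whether its notice URL still validates, return the
// updated row: failures accumulate gradually, and the tender is only archived
// (with a full JSON snapshot preserved) once the threshold is reached.
function nextArchivalState(tender, urlStillValid, now = new Date()) {
  if (urlStillValid) {
    // A successful check resets the failure counter.
    return { ...tender, validation_failures: 0, last_validated: now };
  }
  const failures = (tender.validation_failures ?? 0) + 1;
  if (failures >= MAX_VALIDATION_FAILURES) {
    return {
      ...tender,
      validation_failures: failures,
      archived: true,
      archived_at: now,
      archived_snapshot: JSON.stringify(tender), // no data loss: full record kept
    };
  }
  return { ...tender, validation_failures: failures };
}
```

Keeping this as a pure function (row in, row out) would let the cleanup script batch the corresponding UPDATEs separately from the retry logic.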
check-sources.mjs (new file, 16 lines)
@@ -0,0 +1,16 @@
import pg from "pg";

const pool = new pg.Pool({
  connectionString: "postgresql://tenderpilot:jqrmilIBr6imtT0fKS01@localhost:5432/tenderpilot"
});

console.log("=== Sample URLs per source ===");
const sources = ["pcs_scotland", "sell2wales", "ted_eu"];

for (const source of sources) {
  const result = await pool.query("SELECT notice_url FROM tenders WHERE source = $1 LIMIT 2", [source]);
  console.log("\n" + source + ":");
  result.rows.forEach(row => console.log("  " + row.notice_url));
}

await pool.end();