1. Focus on Stable International/Regional Sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritize unreliable UK gov sites (100% removal rate)
2. Archival Feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss: historical record preserved
3. Email Alerts
   - Daily digest (8am): all new tenders from the last 24h
   - High-value alerts (every 4h): tenders >£100k
   - Professional HTML emails with all tender details
   - Configurable via environment variables

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping + daily cleanup + alerts.
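The "3 retries before archiving" rule could look roughly like the sketch below. This is a hypothetical illustration of the logic described for cleanup-with-archival.mjs, not the actual code; the function name `nextTenderState` is invented, and the column names come from the migration listed above.

```javascript
// Sketch of gradual failure handling: a tender is only archived after
// MAX_VALIDATION_FAILURES consecutive failed URL checks, and a full
// snapshot is preserved first so no data is lost.
const MAX_VALIDATION_FAILURES = 3;

function nextTenderState(tender, urlStillLive) {
  if (urlStillLive) {
    // A successful check resets the failure counter.
    return { ...tender, validation_failures: 0, last_validated: new Date().toISOString() };
  }
  const failures = (tender.validation_failures ?? 0) + 1;
  if (failures >= MAX_VALIDATION_FAILURES) {
    // Preserve a full snapshot of the pre-archive record before marking archived.
    return {
      ...tender,
      validation_failures: failures,
      archived: true,
      archived_at: new Date().toISOString(),
      archived_snapshot: JSON.stringify(tender),
    };
  }
  return { ...tender, validation_failures: failures };
}
```

A transient outage therefore costs nothing: one good check resets the counter, and only three failures in a row trigger archival.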
63 lines
1.9 KiB
JavaScript
import pg from 'pg';

const pool = new pg.Pool({
  connectionString: 'postgresql://tenderpilot:jqrmilIBr6imtT0fKS01@localhost:5432/tenderpilot'
});

console.log('=== TenderRadar URL Fix Verification ===\n');

// Count tenders by source
const counts = await pool.query(
  'SELECT source, COUNT(*) AS count FROM tenders GROUP BY source ORDER BY count DESC'
);

console.log('Tender counts by source:');
counts.rows.forEach(row => {
  console.log(`  ${row.source.padEnd(20)} ${row.count}`);
});

// Check for any remaining search params in notice URLs
const searchParams = await pool.query(
  "SELECT source, COUNT(*) AS count FROM tenders WHERE notice_url LIKE '%?%' GROUP BY source"
);

console.log('\nURLs with query parameters:');
if (searchParams.rows.length === 0) {
  console.log('  ✓ None found - all clean!');
} else {
  searchParams.rows.forEach(row => {
    console.log(`  ⚠️  ${row.source}: ${row.count}`);
  });
}

// Sample one URL per source (DISTINCT ON is PostgreSQL-specific)
console.log('\nSample URLs (1 per source):');
const samples = await pool.query(
  'SELECT DISTINCT ON (source) source, notice_url FROM tenders ORDER BY source, id'
);

samples.rows.forEach(row => {
  // Guard against NULL notice_url (reported separately in the issues check below)
  const full = row.notice_url || '(missing)';
  const url = full.length > 80 ? full.substring(0, 77) + '...' : full;
  console.log(`  ${row.source.padEnd(20)} ${url}`);
});

// Check for broken/invalid URLs
const issues = await pool.query(
  "SELECT source, notice_url FROM tenders WHERE notice_url IS NULL OR notice_url = '' OR notice_url LIKE '%undefined%' OR notice_url LIKE '%null%' LIMIT 5"
);

console.log('\nPotential URL issues:');
if (issues.rows.length === 0) {
  console.log('  ✓ No issues found!');
} else {
  issues.rows.forEach(row => {
    console.log(`  ⚠️  ${row.source}: ${row.notice_url}`);
  });
}

console.log('\n=== Summary ===');
console.log(`Total tenders: ${counts.rows.reduce((sum, r) => sum + parseInt(r.count, 10), 0)}`);
// Only claim success when both checks actually came back clean
if (searchParams.rows.length === 0 && issues.rows.length === 0) {
  console.log('All URLs validated and cleaned! ✅');
} else {
  console.log('Some URLs still need attention - see warnings above ⚠️');
}

await pool.end();
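The high-value alert path described in the summary (tenders >£100k, checked every 4h) could be filtered with a small helper along these lines. This is a sketch, not the code from send-tender-alerts.mjs: the function name `selectHighValueTenders` and the field names `value_gbp` and `published_at` are illustrative assumptions.

```javascript
// Select tenders above the high-value threshold that were published
// within the given window (e.g. the 4-hour alert interval).
const HIGH_VALUE_THRESHOLD_GBP = 100_000;

function selectHighValueTenders(tenders, windowMs, now = Date.now()) {
  return tenders.filter(t =>
    Number(t.value_gbp) > HIGH_VALUE_THRESHOLD_GBP &&
    now - new Date(t.published_at).getTime() <= windowMs
  );
}
```

Passing `now` as a parameter keeps the helper deterministic and easy to test; the daily-digest path would be the same call with a 24-hour window and no value threshold.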