tenderpilot/test-ted-detail.mjs
Peter Foster c6b0169f3e feat: three major improvements - stable sources, archival, email alerts
1. Focus on Stable International/Regional Sources
   - Improved TED EU scraper (5 search strategies, 5 pages each)
   - All stable sources now hourly (TED EU, Sell2Wales, PCS Scotland, eTendersNI)
   - De-prioritize unreliable UK gov sites (100% removal rate)
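
   A minimal sketch of how "5 search strategies, 5 pages each" could expand into a list of search URLs. The base URL and the `q`, `page`, and `placeOfPerformanceCountry` parameters are taken from test-ted-detail.mjs; the function name `buildSearchUrls` and the idea that each strategy is a different `q` value are assumptions for illustration, not the actual logic in scrapers/ted-eu.js.

```javascript
// Hypothetical helper: expands a list of search queries into paged TED URLs.
// Assumes each "strategy" differs only in its `q` parameter.
const TED_SEARCH_BASE = 'https://ted.europa.eu/en/search/result';

function buildSearchUrls(queries, pagesPerQuery = 5, country = 'GBR') {
  const urls = [];
  for (const query of queries) {
    for (let page = 1; page <= pagesPerQuery; page++) {
      const url = new URL(TED_SEARCH_BASE);
      url.searchParams.set('q', query);
      url.searchParams.set('page', String(page));
      url.searchParams.set('placeOfPerformanceCountry', country);
      urls.push(url.toString());
    }
  }
  return urls;
}
```

   With five queries and the default of five pages, this yields 25 URLs to visit per scraper run.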

2. Archival Feature
   - New DB columns: archived, archived_at, archived_snapshot, last_validated, validation_failures
   - Cleanup script now preserves full tender snapshots before archiving
   - Gradual failure handling (3 retries before archiving)
   - No data loss - historical record preserved
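
   The gradual failure handling above can be sketched as a pure function over a tender row. The column names come from the list above; the function name `applyValidationResult`, the choice to archive once the failure count reaches 3, and the choice to update `last_validated` only on success are assumptions, not a copy of cleanup-with-archival.mjs.

```javascript
// Assumed threshold: archive once the failure count reaches 3.
const MAX_VALIDATION_FAILURES = 3;

// Returns an updated copy of a tender row after one validation attempt.
// `ok` is true when the tender's source page could still be reached.
function applyValidationResult(tender, ok, now = new Date()) {
  if (tender.archived) return tender; // already archived, nothing to do
  const checkedAt = now.toISOString();
  if (ok) {
    // A successful check resets the failure counter.
    return { ...tender, last_validated: checkedAt, validation_failures: 0 };
  }
  const failures = (tender.validation_failures ?? 0) + 1;
  if (failures < MAX_VALIDATION_FAILURES) {
    return { ...tender, validation_failures: failures };
  }
  // Snapshot the row as it stood before archiving so no data is lost.
  return {
    ...tender,
    validation_failures: failures,
    archived: true,
    archived_at: checkedAt,
    archived_snapshot: JSON.stringify(tender),
  };
}
```

   Keeping the function free of database calls makes the retry rule easy to test; the caller would write the returned row back to the table.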

3. Email Alerts
   - Daily digest (8am) - all new tenders from last 24h
   - High-value alerts (every 4h) - tenders >£100k
   - Professional HTML emails with all tender details
   - Configurable via environment variables
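
   A rough sketch of how the two alert types could select tenders. The 24h window, 4h interval, and £100k threshold come from the list above; the field names `created_at` and `value_gbp`, the function names, and the assumption that the high-value alert only looks at tenders added since the previous 4-hourly run are hypothetical and may differ from send-tender-alerts.mjs.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Tenders whose (assumed) created_at timestamp falls within the last N hours.
function tendersSince(tenders, now, windowHours) {
  const cutoff = now.getTime() - windowHours * HOUR_MS;
  return tenders.filter((t) => new Date(t.created_at).getTime() >= cutoff);
}

// Daily digest: all new tenders from the last 24 hours.
function dailyDigest(tenders, now = new Date()) {
  return tendersSince(tenders, now, 24);
}

// High-value alert: tenders above the threshold added in the last 4 hours.
function highValueAlerts(tenders, now = new Date(), thresholdGbp = 100000) {
  return tendersSince(tenders, now, 4).filter((t) => t.value_gbp > thresholdGbp);
}
```

   The threshold is a parameter here so that it could be fed from an environment variable, in line with the "configurable" note above.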

Expected outcomes:
- 50-100 stable tenders (vs 26 currently)
- Zero 404 errors (archived data preserved)
- Proactive notifications (no missed opportunities)
- Historical archive for trend analysis

Files:
- scrapers/ted-eu.js (improved)
- cleanup-with-archival.mjs (new)
- send-tender-alerts.mjs (new)
- migrations/add-archival-fields.sql (new)
- THREE_IMPROVEMENTS_SUMMARY.md (documentation)

All cron jobs updated for hourly scraping + daily cleanup + alerts
2026-02-15 14:42:17 +00:00


import { chromium } from 'playwright';

// Smoke test: load the first page of TED search results for the UK and dump
// the raw row and cell text so the scraper's selectors can be checked by hand.
const url = 'https://ted.europa.eu/en/search/result?q=&page=1&placeOfPerformanceCountry=GBR';
const MAX_ROWS = 5; // Limit to the first 5 rows for testing

const browser = await chromium.launch({ headless: true });
try {
  const page = await browser.newPage();
  console.log('Loading:', url);
  await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
  await page.waitForTimeout(3000); // Allow late client-side rendering to settle

  // Extract full tender data
  const tenders = await page.evaluate((maxRows) => {
    const results = [];
    const rows = document.querySelectorAll('tr[data-notice-id], .notice-row, tbody tr');
    rows.forEach((row, idx) => {
      if (idx >= maxRows) return; // Only inspect the first maxRows rows
      try {
        const link = row.querySelector('a[href*="/notice/"]');
        if (!link) return; // Row has no notice link, so it is not a tender
        const cells = row.querySelectorAll('td');
        const allText = row.textContent;
        results.push({
          href: link.href,
          noticeId: link.textContent.trim(),
          rowText: allText.trim().substring(0, 500),
          cellCount: cells.length,
          cellTexts: Array.from(cells).map(c => c.textContent.trim().substring(0, 100))
        });
      } catch (e) {
        // Skip rows that cannot be parsed
      }
    });
    return results;
  }, MAX_ROWS);

  console.log('\nExtracted tenders:', JSON.stringify(tenders, null, 2));
} finally {
  // Close the browser even if navigation or extraction throws
  await browser.close();
}