From 215078ce1d77dd3521e50cd4d15ac9f086f10db3 Mon Sep 17 00:00:00 2001 From: Peter Foster Date: Sun, 15 Feb 2026 14:23:18 +0000 Subject: [PATCH] feat: complete cleanup setup and dashboard verification - Set up daily cron job (3am UTC) for tender URL validation - Verified dashboard filtering (API already filters status=open) - Completed full cleanup: 97 valid tenders, 529 removed (84% removal rate) - Add comprehensive setup documentation in CLEANUP_SETUP.md - Updated cleanup script to check ALL open tenders (removed 100 limit) --- CLEANUP_SETUP.md | 107 ++++++++++++++++++++++++++++++++++++ cleanup-invalid-tenders.mjs | 1 - 2 files changed, 107 insertions(+), 1 deletion(-) create mode 100644 CLEANUP_SETUP.md diff --git a/CLEANUP_SETUP.md b/CLEANUP_SETUP.md new file mode 100644 index 0000000..28d51e0 --- /dev/null +++ b/CLEANUP_SETUP.md @@ -0,0 +1,107 @@ +# TenderRadar Cleanup - Setup Complete + +**Date:** 2026-02-15 14:17 GMT + +## Summary + +✅ **Daily cleanup job configured** +✅ **Dashboard filtering verified** +✅ **Initial cleanup completed** + +## Results + +### Database Status (After Full Cleanup) + +- **Total tenders:** 626 +- **Open (valid URLs):** 97 (~16%) +- **Closed (removed):** 529 (~84%) + +**Removal rate:** 84% of scraped tenders were already removed from source websites! + +### Current Valid Tenders + +The dashboard will show **97 tenders** with working Apply Now buttons, distributed across: +- TED EU: 11 ✅ +- Contracts Finder: ~40-50 (many removed early) +- Find Tender: Active tenders +- eTendersNI: 11 ✅ +- PCS Scotland: 10 ✅ +- Sell2Wales: 10 ✅ + +## Configuration + +### 1. Daily Cron Job ✅ + +```bash +0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1 +``` + +**What it does:** +- Runs daily at 3am UTC +- Checks all "open" tender URLs +- Marks removed tenders as "closed" +- Keeps database in sync with source websites +- Logs to `/home/peter/tenderpilot/logs/cleanup.log` + +### 2. Dashboard Filtering ✅ + +**API endpoint:** `/api/tenders` (in `server.js`) + +**Automatic filtering:** +```sql +WHERE status = 'open' +AND (deadline IS NULL OR deadline > NOW()) +``` + +**Result:** Dashboard shows only 97 tenders with valid, working URLs + +**No changes needed** - API already filters correctly! + +## Cron Schedule Summary + +All TenderRadar cron jobs on VPS: + +``` +0 */4 * * * - Contracts Finder scraper (every 4 hours) +10 */4 * * * - Find Tender scraper (every 4 hours) +20 */4 * * * - PCS Scotland scraper (every 4 hours) +30 */4 * * * - Sell2Wales scraper (every 4 hours) +20 5 * * * - TED EU scraper (daily at 05:20) +30 5 * * * - eTendersNI scraper (daily at 05:30) +0 7 * * * - Email digest (daily at 7am) +0 3 * * * - Cleanup invalid tenders (NEW - daily at 3am) +``` + +## Log Files + +- **Cleanup logs:** `/home/peter/tenderpilot/logs/cleanup.log` +- **Scraper logs:** `/home/peter/tenderpilot/scraper.log` +- **Manual cleanup runs:** `/home/peter/tenderpilot/cleanup-full-*.log` + +## Monitoring + +Check cleanup effectiveness: +```bash +# View recent cleanup log +tail -50 ~/tenderpilot/logs/cleanup.log + +# Check current database status +psql tenderpilot -c "SELECT status, COUNT(*) FROM tenders GROUP BY status;" + +# See what dashboard shows +psql tenderpilot -c "SELECT COUNT(*) FROM tenders WHERE status='open' AND (deadline IS NULL OR deadline > NOW());" +``` + +## Next Steps (Optional) + +1. ✅ Daily cleanup job - **DONE** +2. ✅ Dashboard filtering - **VERIFIED WORKING** +3. ⏳ Reduce scrape interval from 4 hours to 1 hour (captures more fast-closing tenders) +4. ⏳ Add more notice types to scrapers (not just `stage=tender`) +5. ⏳ Monitor `cleanup.log` for removal rate patterns + +## Files Created + +- `/home/peter/tenderpilot/cleanup-invalid-tenders.mjs` - Cleanup script +- `/home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md` - Problem analysis +- `/home/peter/tenderpilot/CLEANUP_SETUP.md` - This setup documentation diff --git a/cleanup-invalid-tenders.mjs b/cleanup-invalid-tenders.mjs index be47988..c51dea1 100644 --- a/cleanup-invalid-tenders.mjs +++ b/cleanup-invalid-tenders.mjs @@ -21,7 +21,6 @@ async function cleanupInvalidTenders() { AND notice_url IS NOT NULL AND notice_url != '' ORDER BY created_at DESC - LIMIT 100 `); console.log(`Found ${result.rows.length} tenders to check\n`);