# TenderRadar Cleanup - Setup Complete

**Date:** 2026-02-15 14:17 GMT

## Summary

✅ **Daily cleanup job configured**
✅ **Dashboard filtering verified**
✅ **Initial cleanup completed**

## Results

### Database Status (After Full Cleanup)

- **Total tenders:** 626
- **Open (valid URLs):** 97 (~15.5%)
- **Closed (removed):** 529 (~84.5%)

**Removal rate:** about 84.5% of scraped tenders had already been removed from their source websites.

### Current Valid Tenders

The dashboard will show **97 tenders** with working Apply Now buttons, distributed across:

- TED EU: 11 ✅
- Contracts Finder: ~40-50 (many removed early)
- Find Tender: Active tenders
- eTendersNI: 11 ✅
- PCS Scotland: 10 ✅
- Sell2Wales: 10 ✅

## Configuration

### 1. Daily Cron Job ✅

```bash
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1
```

**What it does:**

- Runs daily at 3am UTC
- Checks all "open" tender URLs
- Marks removed tenders as "closed"
- Keeps the database in sync with source websites
- Logs to `/home/peter/tenderpilot/logs/cleanup.log`

### 2. Dashboard Filtering ✅

**API endpoint:** `/api/tenders` (in `server.js`)

**Automatic filtering:**

```sql
WHERE status = 'open' AND (deadline IS NULL OR deadline > NOW())
```

**Result:** Dashboard shows only 97 tenders with valid, working URLs

**No changes needed** - API already filters correctly!

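To make the filter rule above easier to reason about or unit-test outside the database, here is a minimal JavaScript sketch of the same predicate. It is not taken from `server.js`; the function name `isVisibleOnDashboard` and the `{ id, status, deadline }` object shape are assumptions made for illustration.

```javascript
// Sketch only: mirrors the SQL filter in JavaScript.
// The tender shape ({ id, status, deadline }) is an assumption, not the real schema.
function isVisibleOnDashboard(tender, now = new Date()) {
  // status = 'open'
  if (tender.status !== 'open') return false;
  // deadline IS NULL: no closing date recorded, so the tender stays visible
  if (tender.deadline == null) return true;
  // deadline > NOW(): strictly in the future
  return new Date(tender.deadline).getTime() > now.getTime();
}

// Hypothetical sample rows, evaluated at a fixed point in time.
const now = new Date('2026-02-15T14:17:00Z');
const sample = [
  { id: 1, status: 'open', deadline: null },
  { id: 2, status: 'open', deadline: '2026-03-01T12:00:00Z' },
  { id: 3, status: 'open', deadline: '2026-02-01T12:00:00Z' },
  { id: 4, status: 'closed', deadline: '2026-03-01T12:00:00Z' },
];

const visibleIds = sample
  .filter((tender) => isVisibleOnDashboard(tender, now))
  .map((tender) => tender.id);

console.log(visibleIds); // [ 1, 2 ]
```

As in the SQL, an open tender whose deadline has passed is hidden even if the cleanup job has not yet marked it "closed".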
## Cron Schedule Summary

All TenderRadar cron jobs on VPS:

```
0 */4 * * *   - Contracts Finder scraper (every 4 hours)
10 */4 * * *  - Find Tender scraper (every 4 hours)
20 */4 * * *  - PCS Scotland scraper (every 4 hours)
30 */4 * * *  - Sell2Wales scraper (every 4 hours)
20 5 * * *    - TED EU scraper (daily at 05:20)
30 5 * * *    - eTendersNI scraper (daily at 05:30)
0 7 * * *     - Email digest (daily at 7am)
0 3 * * *     - Cleanup invalid tenders (NEW - daily at 3am)
```

## Log Files

- **Cleanup logs:** `/home/peter/tenderpilot/logs/cleanup.log`
- **Scraper logs:** `/home/peter/tenderpilot/scraper.log`
- **Manual cleanup runs:** `/home/peter/tenderpilot/cleanup-full-*.log`

## Monitoring

Check cleanup effectiveness:

```bash
# View recent cleanup log
tail -50 ~/tenderpilot/logs/cleanup.log

# Check current database status
psql tenderpilot -c "SELECT status, COUNT(*) FROM tenders GROUP BY status;"

# See what dashboard shows
psql tenderpilot -c "SELECT COUNT(*) FROM tenders WHERE status='open' AND (deadline IS NULL OR deadline > NOW());"
```

## Next Steps (Optional)

1. ✅ Daily cleanup job - **DONE**
2. ✅ Dashboard filtering - **VERIFIED WORKING**
3. ⏳ Reduce scrape interval from 4 hours to 1 hour (captures more fast-closing tenders)
4. ⏳ Add more notice types to scrapers (not just `stage=tender`)
5. ⏳ Monitor `cleanup.log` for removal rate patterns

## Files Created

- `/home/peter/tenderpilot/cleanup-invalid-tenders.mjs` - Cleanup script
- `/home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md` - Problem analysis
- `/home/peter/tenderpilot/CLEANUP_SETUP.md` - This setup documentation
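
For the removal-rate monitoring listed under Next Steps, the per-status counts from the `GROUP BY status` query can be turned into percentage shares. The sketch below is illustrative and not part of the files listed above; the helper name `statusShares` and the `{ status, count }` row shape are assumptions about how the query results would be loaded.

```javascript
// Sketch only: converts per-status counts into percentage shares.
// The row shape ({ status, count }) is an assumption about the loaded query results.
function statusShares(rows) {
  // Number() is used in case the database driver returns counts as strings.
  const total = rows.reduce((sum, row) => sum + Number(row.count), 0);
  const shares = {};
  for (const row of rows) {
    // Percentage rounded to one decimal place; an empty table yields 0.
    shares[row.status] =
      total === 0 ? 0 : Math.round((Number(row.count) / total) * 1000) / 10;
  }
  return { total, shares };
}

// Counts taken from the Database Status section of this document.
const summary = statusShares([
  { status: 'open', count: 97 },
  { status: 'closed', count: 529 },
]);

console.log(summary.total, summary.shares); // 626 { open: 15.5, closed: 84.5 }
```

Recording the `closed` share after each daily run would give a simple time series for spotting changes in the removal rate.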