tenderpilot/CLEANUP_SETUP.md
Peter Foster 215078ce1d feat: complete cleanup setup and dashboard verification
- Set up daily cron job (3am UTC) for tender URL validation
- Verified dashboard filtering (API already filters status=open)
- Completed full cleanup: 97 valid tenders, 529 removed (84% removal rate)
- Add comprehensive setup documentation in CLEANUP_SETUP.md
- Updated cleanup script to check ALL open tenders (removed 100 limit)
2026-02-15 14:23:18 +00:00

# TenderRadar Cleanup - Setup Complete
**Date:** 2026-02-15 14:17 GMT
## Summary
- **Daily cleanup job configured**
- **Dashboard filtering verified**
- **Initial cleanup completed**
## Results
### Database Status (After Full Cleanup)
- **Total tenders:** 626
- **Open (valid URLs):** 97 (~16%)
- **Closed (removed):** 529 (~84%)
**Removal rate:** about 84% of scraped tenders (529 of 626) had already been removed from their source websites.
### Current Valid Tenders
The dashboard will show **97 tenders** with working Apply Now buttons, distributed across:
- TED EU: 11 ✅
- Contracts Finder: ~40-50 (many removed early)
- Find Tender: active tenders present (count not recorded here)
- eTendersNI: 11 ✅
- PCS Scotland: 10 ✅
- Sell2Wales: 10 ✅
## Configuration
### 1. Daily Cron Job ✅
```bash
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1
```
**What it does:**
- Runs daily at 03:00 UTC (cron uses the server's local timezone, so this assumes the VPS clock is set to UTC)
- Checks all "open" tender URLs
- Marks removed tenders as "closed"
- Keeps database in sync with source websites
- Logs to `/home/peter/tenderpilot/logs/cleanup.log`
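
The behaviour listed above could be sketched roughly as follows. This is an illustration only, not the contents of `cleanup-invalid-tenders.mjs`: the function names `nextStatus` and `checkTender`, the `{ id, url }` tender shape, and the rule that only HTTP 404/410 responses close a tender are all assumptions.

```javascript
// Hypothetical sketch: names and status rules are assumptions,
// not taken from cleanup-invalid-tenders.mjs.

// Decide the new status for a tender from the HTTP status of its source URL.
// Only "definitely gone" responses close a tender; anything else leaves it
// open so a temporary outage does not wipe valid tenders.
function nextStatus(httpStatus) {
  if (httpStatus === 404 || httpStatus === 410) return 'closed';
  return 'open';
}

// Check one tender. `fetchFn` is injected so the logic can be exercised
// without network access.
async function checkTender(tender, fetchFn) {
  try {
    const res = await fetchFn(tender.url, { method: 'HEAD', redirect: 'follow' });
    return { id: tender.id, status: nextStatus(res.status) };
  } catch (err) {
    // Network failure: keep the tender open and retry on the next daily run.
    return { id: tender.id, status: 'open', error: String(err) };
  }
}
```

With Node 18 or later, the built-in `fetch` can be passed as `fetchFn`. Treating errors and 5xx responses as "still open" is a deliberately cautious choice in this sketch; the real script may apply different rules.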
### 2. Dashboard Filtering ✅
**API endpoint:** `/api/tenders` (in `server.js`)
**Automatic filtering:**
```sql
WHERE status = 'open'
AND (deadline IS NULL OR deadline > NOW())
```
**Result:** The dashboard shows only open tenders whose deadline has not passed, currently the 97 with valid, working URLs
**No changes needed** - API already filters correctly!
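
To make the filter's behaviour concrete, the same condition can be written as a small predicate. This is a sketch for illustration, not code from `server.js`; the function name `isVisibleOnDashboard` and the `{ status, deadline }` object shape are assumptions.

```javascript
// Hypothetical JavaScript mirror of:
//   WHERE status = 'open' AND (deadline IS NULL OR deadline > NOW())
function isVisibleOnDashboard(tender, now = new Date()) {
  if (tender.status !== 'open') return false;   // closed tenders are hidden
  if (tender.deadline == null) return true;     // no deadline: always shown
  return new Date(tender.deadline).getTime() > now.getTime();
}

// Example with a fixed "now" so the result does not depend on the clock.
const now = new Date('2026-02-15T14:17:00Z');
const sampleTenders = [
  { status: 'open', deadline: null },
  { status: 'open', deadline: '2026-03-01T00:00:00Z' },
  { status: 'open', deadline: '2026-02-01T00:00:00Z' },   // deadline passed
  { status: 'closed', deadline: '2026-03-01T00:00:00Z' }, // removed at source
];
console.log(sampleTenders.filter((t) => isVisibleOnDashboard(t, now)).length);
```

Note that an open tender with a passed deadline is excluded as well, so the dashboard count can drop below the number of open rows between cleanup runs.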
## Cron Schedule Summary
All TenderRadar cron jobs on VPS:
```
0 */4 * * *   - Contracts Finder scraper (every 4 hours)
10 */4 * * *  - Find Tender scraper (every 4 hours)
20 */4 * * *  - PCS Scotland scraper (every 4 hours)
30 */4 * * *  - Sell2Wales scraper (every 4 hours)
20 5 * * *    - TED EU scraper (daily at 05:20)
30 5 * * *    - eTendersNI scraper (daily at 05:30)
0 7 * * *     - Email digest (daily at 07:00)
0 3 * * *     - Cleanup invalid tenders (NEW - daily at 03:00)
```
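
To see when these entries actually fire, the minute and hour fields can be expanded into start times. The helper below is a minimal sketch that handles only the field forms used in this schedule (`*`, `*/n`, and a single number); `expandField` and `dailyStartTimes` are hypothetical names, not part of the project.

```javascript
// Expand one cron field ("*", "*/n", or a single number) into the values it matches.
function expandField(field, min, max) {
  if (field === '*' || field.startsWith('*/')) {
    const step = field === '*' ? 1 : Number(field.slice(2));
    const values = [];
    for (let v = min; v <= max; v += step) values.push(v);
    return values;
  }
  return [Number(field)];
}

// Start times (HH:MM) for a "minute hour * * *" schedule.
function dailyStartTimes(expr) {
  const [minute, hour] = expr.trim().split(/\s+/);
  const pad = (n) => String(n).padStart(2, '0');
  const times = [];
  for (const h of expandField(hour, 0, 23)) {
    for (const m of expandField(minute, 0, 59)) {
      times.push(`${pad(h)}:${pad(m)}`);
    }
  }
  return times;
}

console.log(dailyStartTimes('10 */4 * * *')); // Find Tender scraper
console.log(dailyStartTimes('0 3 * * *'));    // cleanup job
```

The four-hourly scrapers start in hours 0, 4, 8, 12, 16 and 20, so the 03:00 cleanup does not start at the same time as any scraper in the list.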
## Log Files
- **Cleanup logs:** `/home/peter/tenderpilot/logs/cleanup.log`
- **Scraper logs:** `/home/peter/tenderpilot/scraper.log`
- **Manual cleanup runs:** `/home/peter/tenderpilot/cleanup-full-*.log`
## Monitoring
Check cleanup effectiveness:
```bash
# View recent cleanup log
tail -50 ~/tenderpilot/logs/cleanup.log
# Check current database status
psql tenderpilot -c "SELECT status, COUNT(*) FROM tenders GROUP BY status;"
# See what dashboard shows
psql tenderpilot -c "SELECT COUNT(*) FROM tenders WHERE status='open' AND (deadline IS NULL OR deadline > NOW());"
```
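
For tracking the removal rate over time, the status counts can be turned into a percentage. The sketch below assumes the first query is run with `psql -At`, which prints unaligned, tuples-only rows such as `open|97`; the helper names `parseStatusCounts` and `removalRate` are hypothetical, and the sample input uses the counts reported in this document.

```javascript
// Parse `psql -At` output ("status|count" per line) into { status: count }.
function parseStatusCounts(output) {
  const counts = {};
  for (const line of output.trim().split('\n')) {
    const [status, count] = line.split('|');
    counts[status] = Number(count);
  }
  return counts;
}

// Fraction of all tenders currently marked "closed".
function removalRate(counts) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return total === 0 ? 0 : (counts.closed || 0) / total;
}

// Sample input built from the figures in the Results section.
const sample = 'open|97\nclosed|529\n';
const counts = parseStatusCounts(sample);
console.log(counts, (removalRate(counts) * 100).toFixed(1) + '%');
```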
## Next Steps (Optional)
1. ✅ Daily cleanup job - **DONE**
2. ✅ Dashboard filtering - **VERIFIED WORKING**
3. ⏳ Reduce the scrape interval from 4 hours to 1 hour to capture more fast-closing tenders
4. ⏳ Add more notice types to scrapers (not just `stage=tender`)
5. ⏳ Monitor `cleanup.log` for removal rate patterns
## Files Created
- `/home/peter/tenderpilot/cleanup-invalid-tenders.mjs` - Cleanup script
- `/home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md` - Problem analysis
- `/home/peter/tenderpilot/CLEANUP_SETUP.md` - This setup documentation