# TenderRadar Cleanup - Setup Complete

**Date:** 2026-02-15 14:17 GMT

## Summary

- ✅ **Daily cleanup job configured**
- ✅ **Dashboard filtering verified**
- ✅ **Initial cleanup completed**

## Results

### Database Status (After Full Cleanup)

- **Total tenders:** 626
- **Open (valid URLs):** 97 (~16%)
- **Closed (removed):** 529 (~84%)

**Removal rate:** about 84% (529 of 626) of scraped tenders had already been removed from their source websites.

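The percentages above follow directly from the counts. As a quick check, here is a minimal sketch; the helper name `share` is illustrative and not part of the TenderRadar code:

```javascript
// Illustrative helper (not from the project): percentage of `count` in `total`,
// rounded to one decimal place.
function share(count, total) {
  return Math.round((count / total) * 1000) / 10;
}

const totalTenders = 626;
const openTenders = 97;
const closedTenders = 529;

// The two statuses account for every tender in the table.
console.log(openTenders + closedTenders === totalTenders); // true
console.log(share(openTenders, totalTenders));   // 15.5
console.log(share(closedTenders, totalTenders)); // 84.5
```
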
### Current Valid Tenders

The dashboard shows **97 tenders** with working "Apply Now" buttons, distributed across these sources:
- TED EU: 11 ✅
- Contracts Finder: ~40-50 (many removed early)
- Find Tender: active tenders
- eTendersNI: 11 ✅
- PCS Scotland: 10 ✅
- Sell2Wales: 10 ✅

## Configuration

### 1. Daily Cron Job ✅

```bash
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1
```

**What it does:**
- Runs daily at 3am UTC
- Checks all "open" tender URLs
- Marks removed tenders as "closed"
- Keeps database in sync with source websites
- Logs to `/home/peter/tenderpilot/logs/cleanup.log`

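The actual logic lives in `cleanup-invalid-tenders.mjs`, which is not reproduced here. The steps above could be sketched roughly as follows, assuming that a removed tender is detected by an HTTP 404 or 410 response. The function names, the tender object shape (`id`, `url`, `status`) and the injected `fetchStatus` callback are all hypothetical:

```javascript
// Assumption: 404 (Not Found) and 410 (Gone) mean the notice was removed.
// The real script may use different criteria.
function nextStatus(httpStatus) {
  const removedCodes = new Set([404, 410]);
  return removedCodes.has(httpStatus) ? 'closed' : 'open';
}

// Checks every open tender and returns the ids that should be marked "closed".
// `fetchStatus(url)` is injected and must resolve to an HTTP status code,
// so the sketch can be exercised without network access.
async function findRemovedTenders(tenders, fetchStatus) {
  const removedIds = [];
  for (const tender of tenders) {
    if (tender.status !== 'open') continue; // only open tenders are checked
    try {
      const httpStatus = await fetchStatus(tender.url);
      if (nextStatus(httpStatus) === 'closed') removedIds.push(tender.id);
    } catch {
      // A network error is inconclusive, so the tender is left open.
    }
  }
  return removedIds;
}
```

Treating network errors as inconclusive is a design choice in this sketch: it avoids closing valid tenders when a source website is temporarily unreachable.
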
### 2. Dashboard Filtering ✅

**API endpoint:** `/api/tenders` (in `server.js`)

**Automatic filtering:**
```sql
WHERE status = 'open'
AND (deadline IS NULL OR deadline > NOW())
```

**Result:** The dashboard shows only the 97 tenders with valid, working URLs.

**No changes needed** - the API already filters correctly!

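To make the filter's behaviour concrete, the same condition can be expressed in JavaScript. This is a sketch only: `isShownOnDashboard` and the tender object shape are assumptions, and the real filtering happens in the SQL query in `server.js`.

```javascript
// Mirrors: WHERE status = 'open' AND (deadline IS NULL OR deadline > NOW())
// `tender.deadline` is assumed to be null or an ISO 8601 date string.
function isShownOnDashboard(tender, now = new Date()) {
  if (tender.status !== 'open') return false;
  if (tender.deadline == null) return true; // no deadline: always shown
  return new Date(tender.deadline).getTime() > now.getTime();
}

const checkedAt = new Date('2026-02-15T14:17:00Z');
console.log(isShownOnDashboard({ status: 'open', deadline: null }, checkedAt));                     // true
console.log(isShownOnDashboard({ status: 'open', deadline: '2026-03-01T00:00:00Z' }, checkedAt));   // true
console.log(isShownOnDashboard({ status: 'open', deadline: '2026-02-01T00:00:00Z' }, checkedAt));   // false
console.log(isShownOnDashboard({ status: 'closed', deadline: '2026-03-01T00:00:00Z' }, checkedAt)); // false
```

Note that a tender whose URL still works but whose deadline has passed is hidden by the deadline condition alone, independently of the cleanup job.
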
## Cron Schedule Summary

All TenderRadar cron jobs on the VPS:

```
0 */4 * * * - Contracts Finder scraper (every 4 hours)
10 */4 * * * - Find Tender scraper (every 4 hours)
20 */4 * * * - PCS Scotland scraper (every 4 hours)
30 */4 * * * - Sell2Wales scraper (every 4 hours)
20 5 * * * - TED EU scraper (daily at 05:20)
30 5 * * * - eTendersNI scraper (daily at 05:30)
0 7 * * * - Email digest (daily at 07:00)
0 3 * * * - Cleanup invalid tenders (NEW - daily at 03:00)
```

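A quick way to see that the 03:00 cleanup never starts in the same hour as a four-hourly scraper is to expand the hour fields. The sketch below handles only the three field forms used in this schedule (`*`, `*/n` and a single number); `expandHours` is an illustrative helper, not a full cron parser.

```javascript
// Expands a cron hour field into the list of matching hours (0-23).
// Supports only "*", "*/n" and a single number.
function expandHours(field) {
  const allHours = Array.from({ length: 24 }, (_, h) => h);
  if (field === '*') return allHours;
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) return allHours.filter((h) => h % Number(step[1]) === 0);
  return [Number(field)];
}

console.log(expandHours('*/4').join(','));   // 0,4,8,12,16,20
console.log(expandHours('*/4').includes(3)); // false: no four-hourly scraper runs at 03:xx
console.log(expandHours('*').includes(3));   // true: an hourly job would run at 03:xx
```

If the scrape interval were later changed to hourly with the same minute offsets (for example `0 * * * *` for Contracts Finder, which is an assumption about how the change would be made), that scraper would start in the same minute as the cleanup at 03:00, so one of the two might need a different minute.
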
## Log Files

- **Cleanup logs:** `/home/peter/tenderpilot/logs/cleanup.log`
- **Scraper logs:** `/home/peter/tenderpilot/scraper.log`
- **Manual cleanup runs:** `/home/peter/tenderpilot/cleanup-full-*.log`

## Monitoring

Check cleanup effectiveness:
```bash
# View the most recent cleanup log entries
tail -50 ~/tenderpilot/logs/cleanup.log

# Check current database status (tender count per status)
psql tenderpilot -c "SELECT status, COUNT(*) FROM tenders GROUP BY status;"

# Count the tenders the dashboard shows (same filter as the API)
psql tenderpilot -c "SELECT COUNT(*) FROM tenders WHERE status='open' AND (deadline IS NULL OR deadline > NOW());"
```

## Next Steps (Optional)

1. ✅ Daily cleanup job - **DONE**
2. ✅ Dashboard filtering - **VERIFIED WORKING**
3. ⏳ Reduce scrape interval from 4 hours to 1 hour (captures more fast-closing tenders)
4. ⏳ Add more notice types to scrapers (not just `stage=tender`)
5. ⏳ Monitor `cleanup.log` for removal rate patterns

## Files Created

- `/home/peter/tenderpilot/cleanup-invalid-tenders.mjs` - Cleanup script
- `/home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md` - Problem analysis
- `/home/peter/tenderpilot/CLEANUP_SETUP.md` - This setup documentation