tenderpilot/CLEANUP_SETUP.md
Peter Foster 215078ce1d feat: complete cleanup setup and dashboard verification
- Set up daily cron job (3am UTC) for tender URL validation
- Verified dashboard filtering (API already filters status=open)
- Completed full cleanup: 97 valid tenders, 529 removed (84% removal rate)
- Add comprehensive setup documentation in CLEANUP_SETUP.md
- Updated cleanup script to check ALL open tenders (removed 100 limit)
2026-02-15 14:23:18 +00:00

TenderRadar Cleanup - Setup Complete

Date: 2026-02-15 14:17 GMT

Summary

Daily cleanup job configured
Dashboard filtering verified
Initial cleanup completed

Results

Database Status (After Full Cleanup)

  • Total tenders: 626
  • Open (valid URLs): 97 (~16%)
  • Closed (removed): 529 (~84%)

Removal rate: 529 of the 626 scraped tenders (~84%) had already been removed from their source websites.
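
The percentages above follow directly from the two status counts. As a quick check, here is a minimal sketch; `statusShares` is a hypothetical helper written for this document, not part of the TenderRadar codebase.

```javascript
// Hypothetical helper: computes each status's share of the total.
function statusShares(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const shares = {};
  for (const [status, n] of Object.entries(counts)) {
    // Guard against an empty table to avoid dividing by zero.
    shares[status] = total === 0 ? 0 : (n / total) * 100;
  }
  return { total, shares };
}

// Counts reported above after the full cleanup.
const { total, shares } = statusShares({ open: 97, closed: 529 });
console.log(total);                    // 626
console.log(shares.open.toFixed(1));   // 15.5
console.log(shares.closed.toFixed(1)); // 84.5
```

The one-decimal values are consistent with the approximate ~16% / ~84% split quoted above.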

Current Valid Tenders

The dashboard will show 97 tenders with working Apply Now buttons, distributed across:

  • TED EU: 11
  • Contracts Finder: ~40-50 (many removed early)
  • Find Tender: remaining active tenders (count not broken out)
  • eTendersNI: 11
  • PCS Scotland: 10
  • Sell2Wales: 10

Configuration

1. Daily Cron Job

0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1

What it does:

  • Runs daily at 3am UTC
  • Checks all "open" tender URLs
  • Marks removed tenders as "closed"
  • Keeps database in sync with source websites
  • Logs to /home/peter/tenderpilot/logs/cleanup.log
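
The steps above can be sketched roughly as follows. The actual `cleanup-invalid-tenders.mjs` is not reproduced in this document, so the tender object shape, the function names, and the rule that HTTP 404/410 means "removed" are all assumptions made for illustration.

```javascript
// Sketch only: names and the 404/410 rule are assumptions, not the real script.

// Decide whether an HTTP status suggests the notice was taken down.
function isRemoved(httpStatus) {
  return httpStatus === 404 || httpStatus === 410;
}

// Check every open tender and return the ids that should be marked "closed".
// `fetchStatus` is injected so the logic can run without network access.
async function findRemovedTenders(tenders, fetchStatus) {
  const toClose = [];
  for (const tender of tenders) {
    if (tender.status !== 'open') continue; // only open tenders are checked
    try {
      const httpStatus = await fetchStatus(tender.url);
      if (isRemoved(httpStatus)) toClose.push(tender.id);
    } catch (err) {
      // A network error is treated as "unknown", not as removal,
      // so a temporary outage does not close valid tenders.
    }
  }
  return toClose;
}

// Usage with a stubbed status checker and made-up example URLs.
const sampleTenders = [
  { id: 1, status: 'open', url: 'https://example.com/live' },
  { id: 2, status: 'open', url: 'https://example.com/gone' },
  { id: 3, status: 'closed', url: 'https://example.com/gone' },
];
const stubStatus = async (url) => (url.endsWith('/gone') ? 404 : 200);
findRemovedTenders(sampleTenders, stubStatus).then((ids) => console.log(ids)); // [ 2 ]
```

In a real run, `fetchStatus` could wrap the global `fetch` available in Node 18+, and the returned ids would feed an `UPDATE` that sets `status = 'closed'`. Whether the existing script works this way is not confirmed here.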

2. Dashboard Filtering

API endpoint: /api/tenders (in server.js)

Automatic filtering:

WHERE status = 'open' 
AND (deadline IS NULL OR deadline > NOW())

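Result: The dashboard shows only tenders that are still open and not past their deadline (97 after the full cleanup), so every listed tender has a valid, working URL.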

No changes needed - API already filters correctly!
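
To make the filter's behaviour concrete, here is the same condition as a plain JavaScript predicate. This is an illustrative mirror of the WHERE clause, not code from `server.js`; the function name and the in-memory tender objects are hypothetical.

```javascript
// Illustrative mirror of: status = 'open' AND (deadline IS NULL OR deadline > NOW())
function isShownOnDashboard(tender, now = new Date()) {
  // A missing deadline counts as "not expired", like deadline IS NULL in SQL.
  const notExpired = tender.deadline === null || new Date(tender.deadline) > now;
  return tender.status === 'open' && notExpired;
}

// Fixed reference time so the examples are reproducible.
const now = new Date('2026-02-15T14:17:00Z');

console.log(isShownOnDashboard({ status: 'open', deadline: null }, now));                     // true
console.log(isShownOnDashboard({ status: 'open', deadline: '2026-03-01T00:00:00Z' }, now));   // true
console.log(isShownOnDashboard({ status: 'open', deadline: '2026-02-01T00:00:00Z' }, now));   // false
console.log(isShownOnDashboard({ status: 'closed', deadline: '2026-03-01T00:00:00Z' }, now)); // false
```

The third case is worth noting: a tender can still be marked open yet be hidden because its deadline has passed, so the dashboard count can be lower than the number of open rows.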

Cron Schedule Summary

All TenderRadar cron jobs on VPS:

0 */4 * * *    - Contracts Finder scraper (every 4 hours)
10 */4 * * *   - Find Tender scraper (every 4 hours)
20 */4 * * *   - PCS Scotland scraper (every 4 hours)
30 */4 * * *   - Sell2Wales scraper (every 4 hours)
20 5 * * *     - TED EU scraper (daily at 05:20)
30 5 * * *     - eTendersNI scraper (daily at 05:30)
0 7 * * *      - Email digest (daily at 7am)
0 3 * * *      - Cleanup invalid tenders (NEW - daily at 3am)
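
As a sanity check on the schedule, the sketch below expands the hour fields used above and confirms that the 03:00 cleanup does not share an hour with the four-hourly scrapers. `expandHours` is a hypothetical helper that handles only the two forms appearing in this crontab (`*/N` and a single number), not full cron syntax.

```javascript
// Minimal expander for cron hour fields of the form "*/N" or a single number.
// Assumption: ranges, lists, and other cron syntax are out of scope.
function expandHours(field) {
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) {
    const n = Number(step[1]);
    if (n < 1) throw new Error(`invalid step in hour field: ${field}`);
    const hours = [];
    for (let h = 0; h < 24; h += n) hours.push(h);
    return hours;
  }
  if (/^\d+$/.test(field)) return [Number(field)];
  throw new Error(`unsupported hour field: ${field}`);
}

const scraperHours = expandHours('*/4'); // [0, 4, 8, 12, 16, 20]
const cleanupHours = expandHours('3');   // [3]
const overlap = cleanupHours.filter((h) => scraperHours.includes(h));
console.log(overlap.length === 0); // true
```

The daily TED EU and eTendersNI scrapers run in hour 5, so the cleanup job has hour 3 to itself.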

Log Files

  • Cleanup logs: /home/peter/tenderpilot/logs/cleanup.log
  • Scraper logs: /home/peter/tenderpilot/scraper.log
  • Manual cleanup runs: /home/peter/tenderpilot/cleanup-full-*.log

Monitoring

Check cleanup effectiveness:

# View recent cleanup log
tail -50 ~/tenderpilot/logs/cleanup.log

# Check current database status
psql tenderpilot -c "SELECT status, COUNT(*) FROM tenders GROUP BY status;"

# See what dashboard shows
psql tenderpilot -c "SELECT COUNT(*) FROM tenders WHERE status='open' AND (deadline IS NULL OR deadline > NOW());"

Next Steps (Optional)

  1. Daily cleanup job - DONE
  2. Dashboard filtering - VERIFIED WORKING
  3. Reduce scrape interval from 4 hours to 1 hour (captures more fast-closing tenders)
  4. Add more notice types to scrapers (not just stage=tender)
  5. Monitor cleanup.log for removal rate patterns

Files Created

  • /home/peter/tenderpilot/cleanup-invalid-tenders.mjs - Cleanup script
  • /home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md - Problem analysis
  • /home/peter/tenderpilot/CLEANUP_SETUP.md - This setup documentation