- Created cleanup-invalid-tenders.mjs script to validate tender URLs
- Detects removed tenders via redirect to /syserror/notfound
- Marks invalid tenders as closed in database
- Initial run found 277/626 tenders (~44%) already removed from sources
- Contracts Finder has highest removal rate (tenders removed before deadline)
- Add comprehensive documentation in TENDER_CLEANUP_SUMMARY.md
Tender URL Cleanup Summary
Date: 2026-02-15
Issue: "Apply Now" buttons lead to 404 errors for URLs such as:
https://www.contractsfinder.service.gov.uk/notice/24dac264-3958-4928-a1ad-675ecd5e203d
Root Cause
Tender URLs can become invalid before the tender deadline because:
- Contracting authorities close tenders early
- Contracts Finder immediately removes them from the site
- The TenderRadar database still shows them as "open"
Cleanup Results
Before cleanup:
- Total tenders: 626
- Open tenders: 626
- Closed tenders: 0
After cleanup:
- Total tenders: 626
- Open tenders: 349 (valid, working URLs)
- Closed tenders: 277 (removed from source sites)
Removal rate: ~44% (277 of 626) of tenders had already been removed from their source websites.
How Contracts Finder Handles Removals
- Does not return HTTP 404; the request ends with HTTP 200
- Redirects to:
https://www.contractsfinder.service.gov.uk/syserror/notfound
- This makes detection tricky: the status code looks like success, so we need to check the final URL after redirects
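Because the status code alone is misleading, the check has to look at where the redirect chain ends. The sketch below is a minimal illustration, not the code in cleanup-invalid-tenders.mjs: the names `looksRemoved`, `checkTenderUrl` and `REMOVAL_MARKERS` are hypothetical, it assumes Node 18 or later (global `fetch`), and it treats a plain 404/410 as removed for sources that do return real error codes. Rate-limit responses (429), 5xx errors, servers that reject HEAD (405) and network failures are classed as "unknown" so that a tender is never closed on a guess.

```javascript
// Hypothetical sketch: classify a tender URL as "open", "removed" or "unknown".
// Assumption: removed notices redirect to a path containing "/syserror/" or
// "/notfound" (as observed on Contracts Finder) and still answer HTTP 200.
const REMOVAL_MARKERS = ["/syserror/", "/notfound"];

// Pure check on the URL we ended up at after following redirects.
function looksRemoved(finalUrl) {
  const path = new URL(finalUrl).pathname.toLowerCase();
  return REMOVAL_MARKERS.some((marker) => path.includes(marker));
}

// fetchImpl defaults to Node's global fetch; it is a parameter so it can be mocked.
async function checkTenderUrl(url, fetchImpl = fetch) {
  let res;
  try {
    // HEAD keeps requests light; redirect: "follow" walks the redirect chain.
    res = await fetchImpl(url, { method: "HEAD", redirect: "follow" });
  } catch {
    return "unknown"; // network error: leave the tender's status unchanged
  }
  // With redirect: "follow", res.url is the final URL of the chain.
  if (looksRemoved(res.url || url)) return "removed";
  if (res.status === 404 || res.status === 410) return "removed";
  if (res.ok) return "open";
  return "unknown"; // e.g. 405, 429 or 5xx: try again on a later run
}
```

For example, `await checkTenderUrl("https://www.contractsfinder.service.gov.uk/notice/24dac264-3958-4928-a1ad-675ecd5e203d")` would return "removed" if that notice now redirects to /syserror/notfound.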
Solution Implemented
1. Cleanup Script: cleanup-invalid-tenders.mjs
- Checks tender URLs by making HEAD requests
- Detects removals by checking for /syserror/ or /notfound in the final URL
- Marks removed tenders as "closed" in the database
- Rate-limited to 500ms between requests (be nice to servers)
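The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the script itself: `cleanupTenders` is a hypothetical name, `tenders` is assumed to be an array of `{ id, url }` rows already loaded from the database, `checkUrl` is assumed to return "open", "removed" or "unknown", and `markClosed(id)` stands in for whatever database update the real script performs. Requests run one at a time with a fixed pause between them, which is the simplest way to honour the 500ms rate limit.

```javascript
// Hypothetical sketch of the cleanup loop with a fixed delay between requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function cleanupTenders(tenders, { checkUrl, markClosed, delayMs = 500 }) {
  const summary = { checked: 0, closed: 0, open: 0, unknown: 0 };
  for (const [i, tender] of tenders.entries()) {
    if (i > 0) await sleep(delayMs); // rate limit: pause before every request after the first
    const result = await checkUrl(tender.url);
    summary.checked += 1;
    if (result === "removed") {
      await markClosed(tender.id); // placeholder for the real database update
      summary.closed += 1;
    } else if (result === "open") {
      summary.open += 1;
    } else {
      summary.unknown += 1; // inconclusive: keep the tender open and recheck next run
    }
  }
  return summary;
}
```

Running the checks sequentially makes a full pass over 626 tenders take at least about five minutes (625 pauses of 500ms), which fits comfortably in a nightly job.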
2. Cron Job (Recommended)
Add to the crontab on the VPS:
# Run tender cleanup daily at 3am
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1
3. Dashboard Filter (NEEDS IMPLEMENTATION)
The API already supports filtering by status, but the dashboard doesn't pass a status filter. Update /public/dashboard.html:
The API query at line 1089 currently doesn't include a status filter.
It needs to request only "open" tenders so that closed tenders are not shown.
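A minimal sketch of the dashboard-side change, assuming the query is built from a base URL plus filter parameters. The endpoint path `/api/tenders`, the parameter name `status` and the helper `buildTenderQuery` are placeholders, not the existing code; match them to whatever the query at line 1089 actually uses. The status filter is set last so that it always wins over any other filter the page passes in.

```javascript
// Hypothetical sketch: build the dashboard's API URL so it only requests open tenders.
function buildTenderQuery(baseUrl, filters = {}) {
  const url = new URL("/api/tenders", baseUrl); // placeholder endpoint path
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== "") url.searchParams.set(key, String(value));
  }
  // Set last so a caller-supplied status value cannot override it.
  url.searchParams.set("status", "open");
  return url.toString();
}
```

In the page this would be used as something like `fetch(buildTenderQuery(window.location.origin, { source: "contracts_finder" }))`, where `source` is again only an illustrative filter name.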
Testing Results
Sample URLs tested from each source:
| Source | Sample URLs Tested | Working | Removed |
|---|---|---|---|
| contracts_finder | 100+ | ~56 | ~44 |
| find_tender | 3 | 3 ✅ | 0 |
| ted_eu | 10 | 9 ✅ | 1 (rate limit) |
| etendersni | 3 | 3 ✅ | 0 |
| pcs_scotland | 3 | 3 ✅ | 0 |
| sell2wales | 3 | 3 ✅ | 0 |
Contracts Finder has the highest removal rate: nearly half of scraped tenders are removed before their deadline.
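Per-source figures like those in the table can be tallied from the individual check results. The sketch below assumes a hypothetical record shape `{ source, status }` with status "open", "removed" or "unknown"; `summariseBySource` is an illustrative name. Keeping a separate "unknown" bucket stops transient failures, such as the rate-limited ted_eu request, from being counted as removals.

```javascript
// Hypothetical sketch: count results per source and compute the overall removal rate.
function summariseBySource(results) {
  const bySource = {};
  for (const { source, status } of results) {
    if (!bySource[source]) bySource[source] = { tested: 0, working: 0, removed: 0, unknown: 0 };
    const row = bySource[source];
    row.tested += 1;
    if (status === "open") row.working += 1;
    else if (status === "removed") row.removed += 1;
    else row.unknown += 1; // rate limits, timeouts and other inconclusive checks
  }
  const removed = results.filter((r) => r.status === "removed").length;
  const removalRate = results.length ? removed / results.length : 0;
  return { bySource, removed, total: results.length, removalRate };
}
```

Applied to the initial run, 277 removed out of 626 checked gives a removal rate of 277 / 626 ≈ 0.442, the ~44% reported above.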
Next Steps
- ✅ Created cleanup script - cleanup-invalid-tenders.mjs
- ✅ Ran initial cleanup - 277 invalid tenders marked as closed
- ⏳ Set up daily cron job - run cleanup automatically
- ⏳ Verify dashboard filtering - ensure closed tenders don't show
Files Created
- /home/peter/tenderpilot/cleanup-invalid-tenders.mjs - Cleanup script
- /home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md - This documentation