# Tender URL Cleanup Summary

**Date:** 2026-02-15
**Issue:** "Apply Now" buttons show 404 errors for URLs such as:
`https://www.contractsfinder.service.gov.uk/notice/24dac264-3958-4928-a1ad-675ecd5e203d`

## Root Cause

Tender URLs can become invalid before the tender deadline because:
1. Contracting authorities close tenders early.
2. Contracts Finder immediately removes them from the site.
3. The TenderRadar database still shows them as "open".

## Cleanup Results

**Before cleanup:**
- Total tenders: 626
- Open tenders: 626
- Closed tenders: 0

**After cleanup:**
- Total tenders: 626
- **Open tenders: 349** (valid, working URLs)
- **Closed tenders: 277** (removed from source sites)

**Removal rate: ~44%** (277 of 626 tenders had already been removed from their source websites).

## How Contracts Finder Handles Removals

- Returns HTTP 200 (not 404)
- Redirects to: `https://www.contractsfinder.service.gov.uk/syserror/notfound`
- This makes detection tricky: the status code alone is not enough, so we need to check the final URL after redirects
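
The final-URL check described above can be sketched as a small pure function. This is a minimal illustration, not the actual contents of `cleanup-invalid-tenders.mjs`: the function name `isRemovedUrl` is hypothetical, and the two path markers are the ones noted in this document.

```javascript
// Hypothetical helper: decides whether a final (post-redirect) URL points at
// an error page rather than a live tender notice.
function isRemovedUrl(finalUrl) {
  let path;
  try {
    path = new URL(finalUrl).pathname.toLowerCase();
  } catch {
    // Not a parseable absolute URL; treat as unknown rather than removed.
    return false;
  }
  // Markers taken from the redirect target described above.
  return path.includes('/syserror/') || path.includes('/notfound');
}
```

With the global `fetch` in Node 18 and later, the URL reached after redirects is exposed as `response.url`, so the check would be applied as `isRemovedUrl(response.url)`.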

## Solution Implemented

### 1. Cleanup Script: `cleanup-invalid-tenders.mjs`

- Checks each tender URL with a HEAD request
- Detects removals by checking for `/syserror/` or `/notfound` in the final URL
- Marks removed tenders as "closed" in the database
- Waits 500 ms between requests (to be nice to the source servers)
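
The four behaviours listed above could fit together roughly as follows. This is a sketch under assumptions, not the actual script: `runCleanup`, `looksRemoved`, and `markClosed` are hypothetical names, tender objects are assumed to have `id` and `url` fields, and the database update is left to the caller. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Pause between requests so the source servers are not hammered.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Default checker: HEAD request, follow redirects, inspect the final URL.
async function looksRemoved(url) {
  const response = await fetch(url, { method: 'HEAD', redirect: 'follow' });
  const finalUrl = response.url.toLowerCase();
  return finalUrl.includes('/syserror/') || finalUrl.includes('/notfound');
}

// tenders: array of { id, url } (assumed shape).
// markClosed: caller-supplied async function that sets the status to "closed".
async function runCleanup(tenders, markClosed, { delayMs = 500, check = looksRemoved } = {}) {
  const closedIds = [];
  for (const tender of tenders) {
    try {
      if (await check(tender.url)) {
        await markClosed(tender.id);
        closedIds.push(tender.id);
      }
    } catch (err) {
      // A network error is not proof of removal, so the tender stays open.
      console.error(`Check failed for ${tender.url}: ${err.message}`);
    }
    await sleep(delayMs);
  }
  return closedIds;
}
```

Passing the checker and the database update in as parameters keeps the loop testable without network or database access; the 500 ms default matches the delay described above.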

### 2. Cron Job (Recommended)

Add to the crontab on the VPS:

```bash
# Run tender cleanup daily at 3am
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1
```

### 3. Dashboard Filter (NEEDS IMPLEMENTATION)

The API already supports filtering by status, but the dashboard doesn't pass the filter. Update `/public/dashboard.html`:

The API query at line 1089 currently doesn't include a status filter.
It needs one so that only "open" tenders are shown.
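
One way the dashboard request could pass the filter is sketched below. The endpoint path `/api/tenders`, the query parameter name `status`, and the helper `buildTendersUrl` are all assumptions for illustration; the real names used at line 1089 of `dashboard.html` may differ.

```javascript
// Hypothetical helper: builds the tenders API URL, requesting only open
// tenders unless the caller explicitly asks for another status.
function buildTendersUrl(base, filters = {}) {
  const url = new URL('/api/tenders', base); // assumed endpoint path
  const params = new URLSearchParams({ status: 'open', ...filters });
  url.search = params.toString();
  return url.toString();
}
```

In the dashboard, `base` would typically be `window.location.origin`, and the resulting URL would be passed to the existing `fetch` call.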

## Testing Results

URLs tested from each source:

| Source | Sample URLs Tested | Working | Removed |
|--------|-------------------|---------|---------|
| contracts_finder | 100+ | ~56 | ~44 |
| find_tender | 3 | 3 ✅ | 0 |
| ted_eu | 10 | 9 ✅ | 1 (rate limit) |
| etendersni | 3 | 3 ✅ | 0 |
| pcs_scotland | 3 | 3 ✅ | 0 |
| sell2wales | 3 | 3 ✅ | 0 |

**Contracts Finder has the highest removal rate** - nearly half of the scraped tenders were removed early.

## Next Steps

1. ✅ **Created cleanup script** - `cleanup-invalid-tenders.mjs`
2. ✅ **Ran initial cleanup** - 277 invalid tenders marked as closed
3. ⏳ **Set up daily cron job** - run cleanup automatically
4. ⏳ **Verify dashboard filtering** - ensure closed tenders don't show

## Files Created

- `/home/peter/tenderpilot/cleanup-invalid-tenders.mjs` - Cleanup script
- `/home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md` - This documentation