tenderpilot/TENDER_CLEANUP_SUMMARY.md
Peter Foster 0153da89c5 feat: add tender URL validation cleanup
- Created cleanup-invalid-tenders.mjs script to validate tender URLs
- Detects removed tenders via redirect to /syserror/notfound
- Marks invalid tenders as closed in database
- Initial run found 277/626 tenders (~44%) already removed from sources
- Contracts Finder has highest removal rate (tenders removed before deadline)
- Add comprehensive documentation in TENDER_CLEANUP_SUMMARY.md
2026-02-15 14:15:59 +00:00


Tender URL Cleanup Summary

Date: 2026-02-15
Issue: Apply Now buttons showing 404 errors for URLs like:
https://www.contractsfinder.service.gov.uk/notice/24dac264-3958-4928-a1ad-675ecd5e203d

Root Cause

Tender URLs become invalid even before their deadline because:

  1. Contracting authorities close tenders early
  2. Contracts Finder immediately removes them from the site
  3. TenderRadar database still shows them as "open"

Cleanup Results

Before cleanup:

  • Total tenders: 626
  • Open tenders: 626
  • Closed tenders: 0

After cleanup:

  • Total tenders: 626
  • Open tenders: 349 (valid, working URLs)
  • Closed tenders: 277 (removed from source sites)

Removal rate: ~44% of tenders (277 of 626) had already been removed from their source websites!
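
The removal rate follows directly from the counts above. A minimal sketch of the arithmetic (the function name `removalRate` is hypothetical and not part of the cleanup script):

```javascript
// Hypothetical helper: percentage of tenders marked closed by the cleanup.
function removalRate(total, closed) {
  return (closed / total) * 100;
}

const total = 626;  // total tenders before cleanup
const closed = 277; // tenders removed from source sites
console.log(`Open after cleanup: ${total - closed}`);               // 349
console.log(`Removal rate: ${removalRate(total, closed).toFixed(1)}%`); // 44.2%
```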

How Contracts Finder Handles Removals

  • The final response is HTTP 200 (not 404!)
  • The request is redirected to: https://www.contractsfinder.service.gov.uk/syserror/notfound
  • This makes detection tricky - the status code alone is not enough, so we need to check the final redirect URL
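
The final-URL check can be sketched as a small pure function. This is an illustrative sketch, not the code in cleanup-invalid-tenders.mjs; the name `isRemovedUrl` is hypothetical, and the path markers are the ones noted in this document.

```javascript
// Hypothetical helper: decides whether a final (post-redirect) URL
// points at the Contracts Finder "not found" page.
function isRemovedUrl(finalUrl) {
  const { pathname } = new URL(finalUrl);
  // Removed notices end up at /syserror/notfound even though the
  // response status is 200, so the path is inspected instead.
  return pathname.includes('/syserror/') || pathname.includes('/notfound');
}
```

With `fetch`, the URL after redirects is available as `response.url`, so a caller would pass that value rather than the original tender URL.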

Solution Implemented

1. Cleanup Script: cleanup-invalid-tenders.mjs

  • Checks tender URLs by making HEAD requests
  • Detects removals by checking for /syserror/ or /notfound in final URL
  • Marks removed tenders as "closed" in database
  • Rate-limited to 500ms between requests (be nice to servers)
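
The behaviour listed above could look roughly like the following. This is a sketch under assumptions, not the actual script: `checkTender`, `cleanupTenders`, and the `markClosed` callback are hypothetical names, the database layer is left abstract, and the global `fetch` of Node 18+ is assumed.

```javascript
// Sketch only: names here are hypothetical, not taken from
// cleanup-invalid-tenders.mjs.
const DELAY_MS = 500; // pause between requests

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Returns true when the tender URL resolves to a "not found" page.
async function checkTender(url, fetchImpl = fetch) {
  // HEAD avoids downloading the body; redirects are followed.
  const response = await fetchImpl(url, { method: 'HEAD', redirect: 'follow' });
  const finalPath = new URL(response.url).pathname;
  return finalPath.includes('/syserror/') || finalPath.includes('/notfound');
}

// tenders: array of { id, url }.
// markClosed: async (id) => void, assumed to set status to "closed".
async function cleanupTenders(tenders, markClosed, fetchImpl = fetch, delayMs = DELAY_MS) {
  let closed = 0;
  for (const tender of tenders) {
    try {
      if (await checkTender(tender.url, fetchImpl)) {
        await markClosed(tender.id);
        closed += 1;
      }
    } catch (err) {
      // A network error is not proof of removal; leave the tender open.
      console.error(`Check failed for ${tender.url}: ${err.message}`);
    }
    await sleep(delayMs); // rate limit
  }
  return closed;
}
```

Treating request failures as "still open" is a deliberate choice in this sketch, so that a rate limit or timeout does not close a valid tender.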

2. Daily Cron Job

Add to crontab on VPS:

# Run tender cleanup daily at 3am
0 3 * * * cd /home/peter/tenderpilot && /usr/bin/node cleanup-invalid-tenders.mjs >> logs/cleanup.log 2>&1

3. Dashboard Filter (NEEDS IMPLEMENTATION)

The API already supports filtering by status, but the dashboard doesn't pass the filter. Update /public/dashboard.html:

The API query at line 1089 currently doesn't include a status filter.
It needs to request only "open" tenders so that closed ones are not shown.
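
One way the dashboard could pass the filter is sketched below. The existing query at line 1089 is not shown here, so the endpoint path `/api/tenders`, the parameter name `status`, and the helper `buildTendersUrl` are all assumptions for illustration.

```javascript
// Hypothetical helper: builds the tenders API URL with a status filter.
// '/api/tenders' and the 'status' parameter name are assumptions.
function buildTendersUrl(basePath, filters = {}) {
  // Default to open tenders; callers can add or override filters.
  const params = new URLSearchParams({ status: 'open', ...filters });
  return `${basePath}?${params.toString()}`;
}

// In the dashboard this might be used as:
//   const res = await fetch(buildTendersUrl('/api/tenders'));
```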

Testing Results

URLs tested from each source:

| Source | Sample URLs Tested | Working | Removed |
|---|---|---|---|
| contracts_finder | 100+ | ~56 | ~44 |
| find_tender | 3 | 3 | 0 |
| ted_eu | 10 | 9 | 1 (rate limit) |
| etendersni | 3 | 3 | 0 |
| pcs_scotland | 3 | 3 | 0 |
| sell2wales | 3 | 3 | 0 |

Contracts Finder has the highest removal rate - nearly half of scraped tenders get removed early.

Next Steps

  1. Created cleanup script - cleanup-invalid-tenders.mjs (done)
  2. Ran initial cleanup - 277 invalid tenders marked as closed (done)
  3. Set up daily cron job - run cleanup automatically (to do)
  4. Verify dashboard filtering - ensure closed tenders don't show (to do)

Files Created

  • /home/peter/tenderpilot/cleanup-invalid-tenders.mjs - Cleanup script
  • /home/peter/tenderpilot/TENDER_CLEANUP_SUMMARY.md - This documentation