TenderRadar Scrapers
This directory contains scrapers for UK public procurement tender sources.
Scrapers
1. Contracts Finder (contracts-finder.js)
- Source: https://www.contractsfinder.service.gov.uk
- Coverage: England and non-devolved territories
- Method: JSON API
- Frequency: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
- Data Range: Last 30 days
- Status: ✅ Working
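A minimal sketch of the JSON API approach is shown below, for orientation only. The endpoint path, query parameters, and User-Agent string are illustrative assumptions, not necessarily what contracts-finder.js actually sends; check the scraper itself for the exact request.

```js
// Sketch only: fetch recently published notices from Contracts Finder.
// The endpoint and parameters below are assumptions for illustration.
const axios = require('axios');

async function fetchRecentNotices() {
  const to = new Date();
  const from = new Date(to.getTime() - 30 * 24 * 60 * 60 * 1000); // last 30 days

  const { data } = await axios.get(
    'https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search', // assumed endpoint
    {
      params: {
        publishedFrom: from.toISOString().slice(0, 10), // YYYY-MM-DD
        publishedTo: to.toISOString().slice(0, 10),
      },
      headers: { 'User-Agent': 'TenderRadar scraper' }, // placeholder identifier
    }
  );
  return data; // assumed to be an OCDS-style release package
}
```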
2. Find a Tender (find-tender.js)
- Source: https://www.find-tender.service.gov.uk
- Coverage: UK-wide above-threshold procurement notices
- Method: HTML scraping with pagination (5 pages)
- Frequency: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
- Status: ✅ Working
3. Public Contracts Scotland (pcs-scotland.js)
- Source: https://www.publiccontractsscotland.gov.uk
- Coverage: Scottish public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
- Status: ✅ Working
4. Sell2Wales (sell2wales.js)
- Source: https://www.sell2wales.gov.wales
- Coverage: Welsh public sector tenders
- Method: HTML scraping
- Frequency: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
- Status: ✅ Working
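The three HTML scrapers above (Find a Tender, PCS Scotland, Sell2Wales) share the same basic pattern: fetch a results page with axios, parse it with cheerio, and pause between requests. The sketch below illustrates that pattern; the `page` parameter and the CSS selectors (`.search-result`, `h2 a`, `.organisation`) are hypothetical placeholders, and each scraper uses selectors specific to its own site's markup.

```js
// Sketch of the shared HTML-scraping pattern: paginated fetch, cheerio parsing,
// and a polite delay between requests. Selectors and pagination are placeholders.
const axios = require('axios');
const cheerio = require('cheerio');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeListing(baseUrl, maxPages = 5) {
  const tenders = [];
  for (let page = 1; page <= maxPages; page++) {
    const { data: html } = await axios.get(baseUrl, {
      params: { page }, // hypothetical pagination parameter
      headers: { 'User-Agent': 'TenderRadar scraper' },
    });
    const $ = cheerio.load(html);
    $('.search-result').each((_, el) => {                // hypothetical selector
      tenders.push({
        title: $(el).find('h2 a').text().trim(),          // hypothetical selector
        notice_url: $(el).find('h2 a').attr('href'),
        authority_name: $(el).find('.organisation').text().trim(), // hypothetical selector
      });
    });
    await sleep(3000); // stay within the 2-5 second delay policy
  }
  return tenders;
}
```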
Database Schema
All scrapers insert into the tenders table with the following key fields:
- source: Identifier for the data source (contracts_finder, find_tender, pcs_scotland, sell2wales)
- source_id: Unique identifier from the source (used for deduplication via UNIQUE constraint)
- title: Tender title
- description: Full description
- summary: Shortened description
- authority_name: Publishing authority
- location: Geographic location
- published_date: When the tender was published
- deadline: Application deadline
- notice_url: Link to the full notice
- status: open/closed, based on the deadline
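A minimal sketch of the insert step, assuming connection settings come from .env via dotenv and relying on the UNIQUE constraint on source_id for deduplication. The column list mirrors the schema above, but the exact statement in each scraper may differ.

```js
// Sketch only: write one scraped record into the tenders table.
require('dotenv').config();
const { Pool } = require('pg');

const pool = new Pool(); // assumes PGHOST, PGUSER, PGPASSWORD, PGDATABASE in the environment

async function insertTender(t) {
  await pool.query(
    `INSERT INTO tenders
       (source, source_id, title, description, summary, authority_name,
        location, published_date, deadline, notice_url, status)
     VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11)
     ON CONFLICT (source_id) DO NOTHING`,
    [t.source, t.source_id, t.title, t.description, t.summary, t.authority_name,
     t.location, t.published_date, t.deadline, t.notice_url, t.status]
  );
}
```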
Running Scrapers
Individual Scraper
cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js
All Scrapers
cd /home/peter/tenderpilot
./run-all-scrapers.sh
Cron Schedule
The scrapers run automatically every 4 hours, staggered by 10 minutes:
0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1
Monitoring
Check logs:
tail -f /home/peter/tenderpilot/scraper.log
Check database:
PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"
Rate Limiting & Ethical Scraping
All scrapers implement:
- Proper User-Agent headers identifying the service
- Rate limiting (2-5 second delays between requests)
- Pagination limits where applicable
- Respectful request patterns
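The delay and User-Agent pattern can be as simple as the sketch below; the header string is a placeholder and the real scrapers define their own timing and contact details.

```js
// Sketch of the rate-limiting helpers: a randomised 2-5 second pause between
// requests and an identifying User-Agent header (placeholder value).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const politeDelay = () => sleep(2000 + Math.random() * 3000); // 2-5 seconds

const headers = { 'User-Agent': 'TenderRadar scraper (tender aggregation service)' };

// usage: await politeDelay() before each axios.get(url, { headers })
```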
Dependencies
- axios: HTTP client
- cheerio: HTML parsing (for web scrapers)
- pg: PostgreSQL client
- dotenv: Environment variables
Maintenance
- Scrapers use ON CONFLICT (source_id) DO NOTHING to avoid inserting duplicates
- Scrapers can update existing records if needed
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping
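If a scraper ever needs to refresh existing rows rather than skip them, the DO NOTHING clause can be swapped for an upsert. The snippet below is a sketch of that alternative, not what the scrapers currently do, and the column subset is illustrative.

```js
// Sketch only: upsert variant that refreshes an existing row on conflict.
// Pass to pool.query() with matching parameter values.
const upsertSql = `
  INSERT INTO tenders (source, source_id, title, deadline, status, notice_url)
  VALUES ($1, $2, $3, $4, $5, $6)
  ON CONFLICT (source_id) DO UPDATE
    SET title      = EXCLUDED.title,
        deadline   = EXCLUDED.deadline,
        status     = EXCLUDED.status,
        notice_url = EXCLUDED.notice_url`;
```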
Last Updated
2026-02-14 - Initial deployment with all four scrapers