- Add Playwright browser automation for TED EU tender scraping
- Install Playwright and the Chromium browser dependency
- Scraper successfully finds UK-relevant EU tenders (~11 per run)
- Uses headless Chromium with keyword filtering
- Add SCRAPERS_STATUS.md documentation

All 6 main scrapers now operational (digital-marketplace API still down). Total active tenders: 626
TenderRadar Scrapers - All Working ✅
Date: 2026-02-15
Status: ALL MAIN SCRAPERS OPERATIONAL (digital-marketplace disabled)
Summary
✅ 6 out of 6 main scrapers working
❌ 1 scraper disabled (digital-marketplace - API down)
📊 Total tenders: 626
Active Scrapers
| Source | Count | Status | Technology |
|---|---|---|---|
| contracts_finder | 364 | ✅ Working | JSON API |
| find_tender | 220 | ✅ Working | HTML scraping |
| ted_eu | 11 | ✅ Newly implemented | Playwright browser automation |
| etendersni | 11 | ✅ Working | HTML scraping |
| pcs_scotland | 10 | ✅ Working | HTML scraping |
| sell2wales | 10 | ✅ Working | HTML scraping |
Scraper Details
contracts_finder (364 tenders)
- JSON API returning OCDS (Open Contracting Data Standard) data
- Direct notice URLs with UUIDs
- Production-ready
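The contracts_finder entry relies on OCDS-formatted JSON. The following is a minimal sketch of mapping one OCDS release package to flat tender rows. It assumes the standard OCDS field names (`releases`, `ocid`, `tender.title`, `tender.value`, `tender.tenderPeriod.endDate`). The `releasesToTenders` name, the row shape, the idea that each `ocid` embeds the notice UUID, and the `/Notice/<uuid>` URL pattern are all assumptions for illustration, not the project's code.

```javascript
// Minimal sketch, not the project's actual contracts_finder scraper.
// OCDS field names come from the Open Contracting Data Standard; the
// UUID-in-ocid convention and NOTICE_BASE are assumptions for illustration.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/i;
const NOTICE_BASE = "https://www.contractsfinder.service.gov.uk/Notice/"; // assumed URL pattern

// Convert an OCDS release package ({ releases: [...] }) into flat tender rows.
function releasesToTenders(pkg) {
  return (pkg.releases ?? []).flatMap((release) => {
    const uuid = UUID_RE.exec(release.ocid ?? "");
    // Skip releases with no recognisable notice UUID or no tender section.
    if (!uuid || !release.tender) return [];
    const { title, value, tenderPeriod } = release.tender;
    return [{
      source: "contracts_finder",
      title: title ?? null,
      amount: value?.amount ?? null,
      currency: value?.currency ?? null,
      deadline: tenderPeriod?.endDate ?? null,
      url: NOTICE_BASE + uuid[0].toLowerCase(), // direct notice URL built from the UUID
    }];
  });
}
```

Releases without a tender section are dropped rather than stored half-empty, so every row keeps a working notice link.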
find_tender (220 tenders)
- HTML scraping with cheerio
- Recent fix: strips tracking query parameters from notice URLs
- Production-ready
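The query-parameter fix can be sketched with Node's built-in WHATWG `URL` class. This sketch assumes, as the change list states, that the whole query string is dropped. `stripQueryParams`, `BASE_URL` and the example parameters are hypothetical.

```javascript
// Minimal sketch: drop the whole query string from a scraped notice link,
// matching the "Strip query params" change. BASE_URL is an assumption used
// to resolve relative hrefs found in the search-results HTML.
const BASE_URL = "https://www.find-tender.service.gov.uk";

function stripQueryParams(href, base = BASE_URL) {
  const url = new URL(href, base); // resolves relative links, leaves absolute ones alone
  url.search = "";                 // removes every ?key=value pair
  return url.toString();
}

// e.g. stripQueryParams("/Notice/000123-2026?origin=SearchResults&p=1")
//   -> "https://www.find-tender.service.gov.uk/Notice/000123-2026"
```

Dropping the entire query string is the simplest rule. If a source ever carried the notice ID in a query parameter, an allow-list of parameters to keep would be the safer choice.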
ted_eu (11 tenders) - NEWLY IMPLEMENTED
- Technology: Playwright headless browser automation
- Filtering: keyword matching to keep only UK-relevant tenders
- Coverage: scans 3 result pages per run and finds ~11 UK-relevant EU tenders
- Production-ready
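The TED flow pages through a few result pages in headless Chromium and keeps only rows that match UK keywords. It can be sketched as follows. This is not the project's scrapers/ted-eu.js. The keyword list, `buildUrl`, `rowSelector` and all function names are hypothetical, and the caller must supply TED's real search URL and row selector. Only standard Playwright calls are used (`chromium.launch`, `browser.newPage`, `page.goto`, `page.$$eval`).

```javascript
// Minimal sketch, not the project's scrapers/ted-eu.js.
// Hypothetical: the keyword list, and the buildUrl/rowSelector the caller passes in.
const UK_KEYWORDS = ["united kingdom", "uk", "england", "scotland", "wales", "northern ireland"];

// Escape regex metacharacters so keywords are matched literally.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Case-insensitive whole-word match, so "uk" does not match "Ukraine".
function buildKeywordFilter(keywords) {
  const re = new RegExp(`\\b(?:${keywords.map(escapeRe).join("|")})\\b`, "i");
  return (text) => re.test(text);
}

// Visit up to maxPages result pages with a Playwright Page; keep UK-relevant rows.
async function scrapeTedPages(page, { buildUrl, rowSelector, maxPages = 3, keywords = UK_KEYWORDS }) {
  const isRelevant = buildKeywordFilter(keywords);
  const results = [];
  for (let n = 1; n <= maxPages; n++) {
    await page.goto(buildUrl(n), { waitUntil: "domcontentloaded" });
    // $$eval runs the callback in the browser and returns serialisable data.
    const rows = await page.$$eval(rowSelector, (els) =>
      els.map((el) => ({ text: el.textContent.trim(), href: el.querySelector("a")?.href ?? null }))
    );
    if (rows.length === 0) break; // no more results
    results.push(...rows.filter((row) => isRelevant(row.text)));
  }
  return results;
}

// Launch headless Chromium and run the scrape; playwright is required lazily.
async function runTedScrape(options) {
  const { chromium } = require("playwright");
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    return await scrapeTedPages(page, options);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Passing the `page` in, rather than creating it inside `scrapeTedPages`, keeps the paging and filtering logic testable with a stub object. `runTedScrape` needs `npm install playwright` and `npx playwright install chromium` first. Whole-word matching matters here because a plain substring test for "uk" would also accept tenders mentioning Ukraine.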
etendersni, pcs_scotland, sell2wales
- All working with direct tender URLs
- Production-ready
Disabled Scrapers
digital-marketplace
- Status: ❌ API timeout
- Reason: Endpoint does not respond within the 30s timeout
- Action: Monitor for service restoration
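A restoration check for the disabled source could reuse the same 30 s limit. The following is a minimal sketch that assumes Node 18+ (global `fetch` and `AbortController`). `checkEndpoint` and its return shape are hypothetical, and no digital-marketplace URL is assumed. The existing scraper may instead rely on axios's `timeout` option.

```javascript
// Minimal sketch (hypothetical helper): probe an endpoint with a hard timeout.
// Assumes Node 18+ for the global fetch and AbortController.
async function checkEndpoint(url, { timeoutMs = 30_000, fetchImpl = globalThis.fetch } = {}) {
  const controller = new AbortController();
  // Abort the request if no response arrives within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { up: res.ok, status: res.status };
  } catch (err) {
    // Distinguish our own timeout from DNS/connection errors.
    return { up: false, reason: controller.signal.aborted ? "timeout" : String(err) };
  } finally {
    clearTimeout(timer); // never leave the timer pending
  }
}
```

The `fetchImpl` parameter exists only so the check can be exercised without network access. In production, the default global `fetch` is used.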
Recent Changes (2026-02-15)
- ✅ Fixed find_tender - Removed tracking params from 220 URLs
- ✅ Implemented ted_eu - Full Playwright browser automation
- ✅ Installed Playwright + Chromium - 167MB download complete
- ✅ Cleaned database - Removed 4 demo records
- ✅ Updated Apply Now URLs - 100% working across all sources
Dependencies
- axios, cheerio, playwright, pg, dotenv
- Chromium browser (via Playwright)
Performance
- Total scrape time: 5-10 minutes for all sources
- Database: PostgreSQL on the VPS (localhost)
- Storage: 626 active tenders
- Cron schedule: Every 4 hours
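The 4-hour schedule corresponds to the crontab expression `0 */4 * * *`, which fires at minute 0 of every fourth hour. For the job that cron triggers, the following minimal sketch runs the scrapers one after another, records per-source counts and durations, and keeps going when one source fails. `runAll` and the scraper object shape are hypothetical.

```javascript
// Minimal sketch (hypothetical): run every enabled scraper in sequence,
// record per-source counts and durations, and keep going if one fails.
async function runAll(scrapers, { now = () => Date.now() } = {}) {
  const report = [];
  for (const { name, enabled = true, run } of scrapers) {
    if (!enabled) {
      // Report disabled sources (e.g. an unreachable API) instead of skipping silently.
      report.push({ name, status: "disabled", count: 0, ms: 0 });
      continue;
    }
    const started = now();
    try {
      const tenders = await run();
      report.push({ name, status: "ok", count: tenders.length, ms: now() - started });
    } catch (err) {
      report.push({ name, status: "failed", count: 0, ms: now() - started, error: String(err) });
    }
  }
  const total = report.reduce((sum, r) => sum + r.count, 0);
  return { report, total };
}
```

Running sources sequentially is a judgment call. It stretches the total run time, but it avoids several scrapers, including the Chromium-based one, competing for the VPS at once.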
Files Modified
- scrapers/find-tender.js - Strip query params
- scrapers/ted-eu.js - Playwright implementation
- package.json - Added Playwright dependency
- Database - 220 URLs cleaned, 11 new TED tenders added
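The scrapers rerun every 4 hours, so writes need to avoid creating duplicate rows. The following is a minimal sketch of a node-postgres upsert. It assumes a hypothetical `tenders` table with a UNIQUE constraint on `url`. The table, column list and `buildUpsert` name are not taken from the project.

```javascript
// Minimal sketch with a hypothetical schema: a "tenders" table with a UNIQUE
// constraint on url. Builds a parameterised query for node-postgres so that
// re-running a scraper updates existing rows instead of inserting duplicates.
const COLUMNS = ["source", "title", "url", "deadline"];

function buildUpsert(tender) {
  const placeholders = COLUMNS.map((_, i) => `$${i + 1}`).join(", ");
  // On conflict, overwrite every column except the unique key itself.
  const updates = COLUMNS.filter((c) => c !== "url")
    .map((c) => `${c} = EXCLUDED.${c}`)
    .join(", ");
  return {
    text: `INSERT INTO tenders (${COLUMNS.join(", ")}) VALUES (${placeholders}) ` +
          `ON CONFLICT (url) DO UPDATE SET ${updates}`,
    values: COLUMNS.map((c) => tender[c] ?? null), // missing fields become NULL
  };
}
```

With pg, the result can be passed straight to `pool.query(buildUpsert(row))`, because `query` accepts a `{ text, values }` config object. `new Pool()` picks up the standard `PG*` connection variables, which dotenv can load from `.env`.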
Next Steps (Optional)
- Monitor digital-marketplace API
- Expand TED keyword search
- Consider additional UK procurement sources