tenderpilot/SCRAPERS_STATUS.md
Peter Foster 685ac00f7c feat: implement TED EU scraper with Playwright
- Add Playwright browser automation for TED EU tender scraping
- Install playwright + chromium browser dependencies
- Scraper successfully finds UK-relevant EU tenders (~11 per run)
- Uses headless Chrome with keyword filtering
- Add SCRAPERS_STATUS.md documentation

All 6 main scrapers now operational (digital-marketplace API still down).
Total active tenders: 626
2026-02-15 13:28:54 +00:00

# TenderRadar Scrapers - All Working ✅
**Date:** 2026-02-15
**Status:** All 6 main scrapers operational; digital-marketplace disabled (API down)
## Summary
- **6 out of 6 main scrapers working**
- **1 scraper disabled** (digital-marketplace - API down)
- 📊 **Total tenders:** 626
## Active Scrapers
| Source | Count | Status | Technology |
|--------|-------|--------|------------|
| contracts_finder | 364 | ✅ Working | JSON API |
| find_tender | 220 | ✅ Working | HTML scraping |
| ted_eu | 11 | ✅ **Newly implemented** | Playwright browser automation |
| etendersni | 11 | ✅ Working | HTML scraping |
| pcs_scotland | 10 | ✅ Working | HTML scraping |
| sell2wales | 10 | ✅ Working | HTML scraping |
## Scraper Details
### contracts_finder (364 tenders)
- JSON API via OCDS format
- Direct notice URLs with UUIDs
- Production-ready
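
As a rough illustration of the OCDS mapping described above, the sketch below flattens a release package into tender records. The field paths (`tender.title`, `tender.value.amount`, `tender.tenderPeriod.endDate`, `buyer.name`) follow the OCDS release schema; `NOTICE_BASE_URL`, the record shape, and the idea that `release.id` carries the notice UUID are assumptions for illustration, not taken from the actual scraper.

```javascript
// Placeholder base URL (assumption): substitute the real notice URL prefix.
const NOTICE_BASE_URL = 'https://example.invalid/notice/';
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Flatten an OCDS release package into simple tender records.
// Releases whose id is not a UUID are skipped, since no direct URL can be built for them.
function toTenderRecords(releasePackage) {
  const releases = Array.isArray(releasePackage?.releases) ? releasePackage.releases : [];
  return releases
    .filter((release) => UUID_RE.test(String(release.id ?? '')))
    .map((release) => ({
      source: 'contracts_finder',
      sourceId: release.id,
      title: release.tender?.title ?? null,
      buyer: release.buyer?.name ?? null,
      valueAmount: release.tender?.value?.amount ?? null,
      currency: release.tender?.value?.currency ?? null,
      deadline: release.tender?.tenderPeriod?.endDate ?? null,
      url: NOTICE_BASE_URL + release.id, // assumed URL pattern: base + UUID
    }));
}
```

Missing optional fields are mapped to `null` rather than dropped, so every record has the same keys regardless of how complete the source notice is.
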
### find_tender (220 tenders)
- HTML scraping with cheerio
- **Recent fix:** Strips tracking query params
- Production-ready
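
The query-param fix could look roughly like the sketch below. Which parameters `scrapers/find-tender.js` actually removes is not stated here, so the denylist (`utm_*`, `fbclid`, `gclid`) and the function name `stripTrackingParams` are hypothetical.

```javascript
// Hypothetical denylist (assumption): adjust to the parameters the source site appends.
const TRACKING_PREFIXES = ['utm_'];
const TRACKING_NAMES = new Set(['fbclid', 'gclid']);

// Return the URL with tracking parameters removed; other parameters are kept.
function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first: deleting while iterating searchParams can skip entries.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    const tracked =
      TRACKING_NAMES.has(lower) || TRACKING_PREFIXES.some((prefix) => lower.startsWith(prefix));
    if (tracked) url.searchParams.delete(key);
  }
  return url.toString();
}
```

For example, `stripTrackingParams('https://example.com/notice/1?utm_source=mail&page=2')` keeps `page=2` and drops `utm_source`. A denylist is the more conservative choice when some query parameters identify the notice itself; dropping the whole query string is simpler if none do.
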
### ted_eu (11 tenders) - NEWLY IMPLEMENTED
- **Technology:** Playwright headless browser automation
- **Search:** UK keyword filtering
- **Performance:** Scans 3 result pages per run and finds ~11 UK-relevant EU tenders
- Production-ready
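
A minimal sketch of the approach described above: load a fixed number of result pages in headless Chromium and keep only rows that mention a UK keyword. The keyword list, `pageUrl(n)`, and `rowSelector` are placeholders supplied by the caller; the real search URL, selectors, and keywords used by `scrapers/ted-eu.js` are not given in this document.

```javascript
// Hypothetical keyword list (assumption): the scraper's real keywords are not listed here.
const UK_KEYWORDS = ['united kingdom', 'northern ireland', 'scotland', 'wales', 'england'];

// Case-insensitive substring match against any keyword.
function isUkRelevant(text, keywords = UK_KEYWORDS) {
  const haystack = text.toLowerCase();
  return keywords.some((keyword) => haystack.includes(keyword.toLowerCase()));
}

// pageUrl(n) returns the URL of result page n; rowSelector matches one result row.
// Both are caller-supplied placeholders.
async function scrapeTed({ pageUrl, rowSelector, maxPages = 3 }) {
  // Loaded lazily so isUkRelevant can be used without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const matches = [];
    for (let n = 1; n <= maxPages; n += 1) {
      await page.goto(pageUrl(n), { waitUntil: 'networkidle' });
      const rows = await page.locator(rowSelector).allTextContents();
      matches.push(...rows.filter((row) => isUkRelevant(row)));
    }
    return matches;
  } finally {
    await browser.close(); // always release the browser, even if a page fails
  }
}
```

Plain substring matching is cheap but can produce false positives (for instance `wales` also matches "New South Wales"), which is one reason the keyword list may need tuning.
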
### etendersni, pcs_scotland, sell2wales
- All working with direct tender URLs
- Production-ready
## Disabled Scrapers
### digital-marketplace
- **Status:** ❌ API timeout
- **Reason:** Requests to the endpoint time out after 30s with no response
- **Action:** Monitor for service restoration
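
Monitoring for restoration could be done with a small probe like the sketch below, which reports whether an endpoint answers within the 30-second window. It assumes Node 18+ (global `fetch` and `AbortSignal.timeout`); the function name `checkEndpoint`, its return shape, and the injectable `fetchImpl` parameter are illustrative choices, not part of the existing scrapers.

```javascript
// Probe an endpoint and report whether it answered within the timeout.
// fetchImpl is injectable so the check can be exercised without network access.
async function checkEndpoint(url, { timeoutMs = 30_000, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { up: res.ok, detail: `HTTP ${res.status}` };
  } catch (err) {
    // A request aborted by AbortSignal.timeout rejects with a TimeoutError.
    const timedOut = err?.name === 'TimeoutError' || err?.name === 'AbortError';
    return {
      up: false,
      detail: timedOut ? `timeout after ${timeoutMs} ms` : String(err?.message ?? err),
    };
  }
}
```

Returning a result object instead of throwing lets a scheduled job log the outcome and re-enable the scraper only once `up` is true.
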
## Recent Changes (2026-02-15)
1. **Fixed find_tender** - Removed tracking params from 220 URLs
2. **Implemented ted_eu** - Full Playwright browser automation
3. **Installed Playwright + Chromium** - 167 MB download complete
4. **Cleaned database** - Removed 4 demo records
5. **Updated Apply Now URLs** - 100% working across all sources
## Dependencies
- axios, cheerio, playwright, pg, dotenv
- Chromium browser (via Playwright)
## Performance
- Total scrape time: 5-10 minutes for all sources
- Database: PostgreSQL on VPS localhost
- Storage: 626 active tenders
- Cron schedule: Every 4 hours
## Files Modified
1. `scrapers/find-tender.js` - Strip query params
2. `scrapers/ted-eu.js` - Playwright implementation
3. `package.json` - Added Playwright dependency
4. Database - 220 URLs cleaned, 11 new TED tenders added
## Next Steps (Optional)
1. Monitor digital-marketplace API
2. Expand TED keyword search
3. Consider additional UK procurement sources