- Add Playwright browser automation for TED EU tender scraping
- Install playwright + chromium browser dependencies
- Scraper successfully finds UK-relevant EU tenders (~11 per run)
- Uses headless Chromium with keyword filtering
- Add SCRAPERS_STATUS.md documentation

All 6 main scrapers now operational (digital-marketplace API still down). Total active tenders: 626

# TenderRadar Scrapers - All Working ✅

**Date:** 2026-02-15
**Status:** ALL 6 MAIN SCRAPERS OPERATIONAL

## Summary

✅ **6 out of 6 main scrapers working**
❌ **1 scraper disabled** (digital-marketplace - API down)
📊 **Total tenders:** 626

## Active Scrapers

| Source | Count | Status | Technology |
|--------|-------|--------|------------|
| contracts_finder | 364 | ✅ Working | JSON API |
| find_tender | 220 | ✅ Working | HTML scraping |
| ted_eu | 11 | ✅ **NEWLY FIXED** | Playwright browser automation |
| etendersni | 11 | ✅ Working | HTML scraping |
| pcs_scotland | 10 | ✅ Working | HTML scraping |
| sell2wales | 10 | ✅ Working | HTML scraping |

## Scraper Details

### contracts_finder (364 tenders)
- JSON API returning data in OCDS (Open Contracting Data Standard) format
- Direct notice URLs with UUIDs
- Production-ready

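
As a rough illustration of the OCDS handling, the sketch below flattens an OCDS release package into tender records with direct notice URLs. The `toTenderRecords` helper, the chosen field subset, the notice URL pattern, and the assumption that each release `id` is the notice UUID are all illustrative guesses, not taken from the actual scraper; the sample package is made up.

```javascript
// Assumption: notice pages live under this base URL and are keyed by UUID.
const NOTICE_BASE_URL = 'https://www.contractsfinder.service.gov.uk/notice/';

// Hypothetical helper: flatten an OCDS release package into tender records.
function toTenderRecords(releasePackage) {
  const releases = Array.isArray(releasePackage?.releases)
    ? releasePackage.releases
    : [];
  return releases
    // Skip entries that lack an id or a tender block.
    .filter((release) => release && release.id && release.tender)
    .map((release) => ({
      source: 'contracts_finder',
      sourceId: release.id,
      title: release.tender.title ?? '',
      description: release.tender.description ?? '',
      deadline: release.tender.tenderPeriod?.endDate ?? null,
      url: NOTICE_BASE_URL + encodeURIComponent(release.id),
    }));
}

// Made-up sample in the OCDS release-package shape.
const samplePackage = {
  releases: [
    {
      id: '123e4567-e89b-12d3-a456-426614174000',
      tender: {
        title: 'Example cleaning services contract',
        tenderPeriod: { endDate: '2026-03-01T12:00:00Z' },
      },
    },
    { id: 'no-tender-block' }, // skipped: no tender object
  ],
};

console.log(toTenderRecords(samplePackage));
```

Keeping the mapping as a pure function means it can be checked against saved API responses without hitting the live service.
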
### find_tender (220 tenders)
- HTML scraping with cheerio
- **Recent fix:** Strips tracking query parameters from notice URLs
- Production-ready

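
The query-parameter fix can be sketched with the standard `URL` API. This is a minimal version that assumes the whole query string (and any fragment) can be dropped; the `stripQueryParams` name and the example URL are illustrative and not copied from `scrapers/find-tender.js`.

```javascript
// Hypothetical helper: drop the query string and fragment from a notice URL.
function stripQueryParams(rawUrl) {
  try {
    const url = new URL(rawUrl);
    url.search = ''; // removes "?origin=...&p=..."
    url.hash = '';   // removes any "#fragment"
    return url.toString();
  } catch {
    // Not an absolute URL: return it unchanged rather than failing the run.
    return rawUrl;
  }
}

// Illustrative input only.
console.log(
  stripQueryParams(
    'https://www.find-tender.service.gov.uk/Notice/000001-2026?origin=SearchResults&p=1'
  )
);
```

If only specific tracking parameters should go while others are kept, `url.searchParams.delete(name)` per parameter would be the more conservative choice.
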
### ted_eu (11 tenders) - NEWLY IMPLEMENTED
- **Technology:** Playwright headless browser automation
- **Search:** UK keyword filtering
- **Performance:** Scans 3 pages, finds ~11 UK-relevant EU tenders per run
- Production-ready

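
A minimal sketch of how headless scraping with keyword filtering might fit together. The keyword list, the `isUkRelevant` and `scrapeTed` names, and the `buildPageUrl` / `rowSelector` parameters are all assumptions for illustration; the real search URL and selectors in `scrapers/ted-eu.js` are not shown in this document.

```javascript
// Assumed keyword list; the real scraper's keywords are not documented here.
const UK_KEYWORDS = [
  'united kingdom', 'great britain', 'northern ireland',
  'england', 'scotland', 'wales',
];

// Case-insensitive check of a notice's visible text against the keywords.
function isUkRelevant(text, keywords = UK_KEYWORDS) {
  const haystack = String(text ?? '').toLowerCase();
  return keywords.some((keyword) => haystack.includes(keyword));
}

// Hypothetical scrape loop: buildPageUrl and rowSelector are placeholders
// that would have to match the real TED search page.
async function scrapeTed({ buildPageUrl, rowSelector, maxPages = 3 }) {
  const { chromium } = await import('playwright'); // loaded only when scraping
  const browser = await chromium.launch({ headless: true });
  const results = [];
  try {
    const page = await browser.newPage();
    for (let pageNo = 1; pageNo <= maxPages; pageNo += 1) {
      await page.goto(buildPageUrl(pageNo), { waitUntil: 'domcontentloaded' });
      // Collect the text and first link of each result row inside the browser.
      const rows = await page.$$eval(rowSelector, (els) =>
        els.map((el) => {
          const link = el.querySelector('a');
          return { text: el.textContent || '', url: link ? link.href : null };
        })
      );
      results.push(...rows.filter((row) => isUkRelevant(row.text)));
    }
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
  return results;
}
```

Plain substring matching is deliberately simple and will produce some false positives (for example "New South Wales"), so a stricter matcher may be worth considering if the keyword list is expanded.
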
### etendersni, pcs_scotland, sell2wales
- All working with direct tender URLs
- Production-ready

## Disabled Scrapers

### digital-marketplace
- **Status:** ❌ API timeout
- **Reason:** Endpoint unreachable; requests time out after 30 s
- **Action:** Monitor for service restoration

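
One way to keep an unreachable source from blocking the rest of a run is to wrap each scraper in a timeout and record the failure instead of throwing. The sketch below is an assumption about how that could look, not a description of the current code; `withTimeout` and `runScraperSafely` are hypothetical names, and the 30-second default simply mirrors the figure above.

```javascript
// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical wrapper: run one scraper and report an outage instead of throwing.
async function runScraperSafely(name, scrapeFn, timeoutMs = 30000) {
  try {
    const tenders = await withTimeout(scrapeFn(), timeoutMs);
    return { name, ok: true, count: tenders.length };
  } catch (err) {
    return { name, ok: false, count: 0, error: err.message };
  }
}
```

A result with `ok: false` could then be logged on every scheduled run, which gives a simple record of when the service comes back.
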
## Recent Changes (2026-02-15)

1. ✅ **Fixed find_tender** - Removed tracking params from 220 URLs
2. ✅ **Implemented ted_eu** - Full Playwright browser automation
3. ✅ **Installed Playwright + Chromium** - 167 MB download
4. ✅ **Cleaned database** - Removed 4 demo records
5. ✅ **Updated Apply Now URLs** - 100% working across all sources

## Dependencies

- axios, cheerio, playwright, pg, dotenv
- Chromium browser (via Playwright)

## Performance

- Total scrape time: 5-10 minutes for all sources
- Database: PostgreSQL running on the VPS (localhost)
- Storage: 626 active tenders
- Cron schedule: Every 4 hours

## Files Modified

1. `scrapers/find-tender.js` - Strip query params
2. `scrapers/ted-eu.js` - Playwright implementation
3. `package.json` - Added Playwright dependency
4. Database - 220 URLs cleaned, 11 new TED tenders added

## Next Steps (Optional)

1. Monitor digital-marketplace API
2. Expand TED keyword search
3. Consider additional UK procurement sources