feat: visual polish, nav login link, pricing badge fix, cursor fix, button contrast
- Hero mockup: enhanced 3D perspective and shadow
- Testimonials: illustrated SVG avatars
- Growth pricing card: visual prominence (scale, gradient, badge)
- Most Popular badge: repositioned to avoid overlapping heading
- Nav: added Log In link next to Start Free Trial
- Fixed btn-primary text colour on anchor tags (white on blue)
- Fixed cursor: default on all non-interactive elements
- Disabled user-select on non-form content to prevent text caret
scrapers/README.md
@@ -0,0 +1,116 @@
# TenderRadar Scrapers

This directory contains scrapers for UK public procurement tender sources.

## Scrapers

### 1. Contracts Finder (`contracts-finder.js`)
- **Source**: https://www.contractsfinder.service.gov.uk
- **Coverage**: England and non-devolved territories
- **Method**: JSON API
- **Frequency**: Every 4 hours (0:00, 4:00, 8:00, 12:00, 16:00, 20:00)
- **Data Range**: Last 30 days
- **Status**: ✅ Working

### 2. Find a Tender (`find-tender.js`)
- **Source**: https://www.find-tender.service.gov.uk
- **Coverage**: UK-wide above-threshold procurement notices
- **Method**: HTML scraping with pagination (5 pages)
- **Frequency**: Every 4 hours (0:10, 4:10, 8:10, 12:10, 16:10, 20:10)
- **Status**: ✅ Working

### 3. Public Contracts Scotland (`pcs-scotland.js`)
- **Source**: https://www.publiccontractsscotland.gov.uk
- **Coverage**: Scottish public sector tenders
- **Method**: HTML scraping
- **Frequency**: Every 4 hours (0:20, 4:20, 8:20, 12:20, 16:20, 20:20)
- **Status**: ✅ Working

### 4. Sell2Wales (`sell2wales.js`)
- **Source**: https://www.sell2wales.gov.wales
- **Coverage**: Welsh public sector tenders
- **Method**: HTML scraping
- **Frequency**: Every 4 hours (0:30, 4:30, 8:30, 12:30, 16:30, 20:30)
- **Status**: ✅ Working

## Database Schema

All scrapers insert into the `tenders` table with the following key fields:

- `source`: Identifier for the data source (`contracts_finder`, `find_tender`, `pcs_scotland`, `sell2wales`)
- `source_id`: Unique identifier from the source (used for deduplication via UNIQUE constraint)
- `title`: Tender title
- `description`: Full description
- `summary`: Shortened description
- `authority_name`: Publishing authority
- `location`: Geographic location
- `published_date`: When the tender was published
- `deadline`: Application deadline
- `notice_url`: Link to full notice
- `status`: `open` or `closed`, derived from the deadline

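To illustrate how a scraped record might map onto these fields, here is a minimal sketch. The helper names (`deriveStatus`, `toTenderRow`), the shape of the `raw` input object, the 200-character summary length, and the rule that a missing deadline counts as open are all assumptions for illustration, not taken from the actual scrapers.

```javascript
// Hypothetical helper: derive `status` from the deadline.
// Assumption: no deadline, or an unparseable one, is treated as "open".
function deriveStatus(deadline, now = new Date()) {
  if (!deadline) return 'open';
  const parsed = new Date(deadline);
  if (Number.isNaN(parsed.getTime())) return 'open';
  return parsed.getTime() >= now.getTime() ? 'open' : 'closed';
}

// Hypothetical helper: map a scraped record onto the documented columns.
// The property names on `raw` (id, authority, published, url) are illustrative.
function toTenderRow(source, raw, now = new Date()) {
  const description = raw.description || '';
  return {
    source,
    source_id: String(raw.id),
    title: raw.title,
    description,
    summary: description.slice(0, 200), // assumed summary length
    authority_name: raw.authority || null,
    location: raw.location || null,
    published_date: raw.published || null,
    deadline: raw.deadline || null,
    notice_url: raw.url || null,
    status: deriveStatus(raw.deadline, now),
  };
}
```

Passing `now` explicitly keeps the status rule easy to test without depending on the system clock.
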
## Running Scrapers

### Individual Scraper
```bash
cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js
```

### All Scrapers
```bash
cd /home/peter/tenderpilot
./run-all-scrapers.sh
```

## Cron Schedule

The scrapers run automatically every 4 hours, staggered by 10 minutes:

```cron
0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1
```

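As a quick check of what the `*/4` hour step and the minute offsets mean in practice, the following sketch lists the daily start times for a given minute offset. `dailyRunTimes` is a hypothetical helper written for this README, not part of the scrapers.

```javascript
// Hypothetical helper: expand "<minute> */<stepHours> * * *" into daily start times.
// A step of 4 in the hour field matches hours 0, 4, 8, 12, 16 and 20.
function dailyRunTimes(minute, stepHours = 4) {
  const times = [];
  for (let hour = 0; hour < 24; hour += stepHours) {
    const hh = String(hour).padStart(2, '0');
    const mm = String(minute).padStart(2, '0');
    times.push(`${hh}:${mm}`);
  }
  return times;
}

// Minute offsets taken from the crontab entries above.
const offsets = {
  'contracts-finder.js': 0,
  'find-tender.js': 10,
  'pcs-scotland.js': 20,
  'sell2wales.js': 30,
};

for (const [script, minute] of Object.entries(offsets)) {
  console.log(script, dailyRunTimes(minute).join(', '));
}
```
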
## Monitoring

Check logs:
```bash
tail -f /home/peter/tenderpilot/scraper.log
```

Check database:
```bash
PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"
```

## Rate Limiting & Ethical Scraping

All scrapers implement:
- Proper User-Agent headers identifying the service
- Rate limiting (2-5 second delays between requests)
- Pagination limits where applicable
- Respectful request patterns

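The delay and pagination limits listed above could be sketched roughly as follows. This is an illustration under assumptions, not the scrapers' actual code: `pickDelayMs`, `sleep`, and `crawlPages` are hypothetical names, and `fetchPage` stands in for whatever function requests a single page.

```javascript
// Pick a delay within the documented 2-5 second range.
// `random` is injectable so the function can be tested deterministically.
function pickDelayMs(random = Math.random, minMs = 2000, maxMs = 5000) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical page loop: fetch at most `maxPages` pages, pausing between requests.
// `fetchPage(pageNumber)` is supplied by the caller and returns a promise.
async function crawlPages(fetchPage, maxPages, delayFn = () => sleep(pickDelayMs())) {
  const results = [];
  for (let page = 1; page <= maxPages; page++) {
    results.push(await fetchPage(page));
    if (page < maxPages) await delayFn(); // no pause needed after the last page
  }
  return results;
}
```

For a source paginated to 5 pages, such as Find a Tender, this would be called as `crawlPages(fetchPage, 5)`.
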
## Dependencies

- axios: HTTP client
- cheerio: HTML parsing (for web scrapers)
- pg: PostgreSQL client
- dotenv: Environment variables

## Maintenance

- Scrapers use `ON CONFLICT (source_id) DO NOTHING` to avoid duplicates
- Old scrapers can update existing records if needed
- Monitor for HTML structure changes on scraped sites
- API endpoints (Contracts Finder) are more stable than HTML scraping

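A minimal sketch of how the `ON CONFLICT (source_id) DO NOTHING` insert might be assembled as a parameterized query is shown below. `buildInsert` and `COLUMNS` are hypothetical names, and the column list is assumed from the key fields in the Database Schema section; the real table may have more columns.

```javascript
// Column list assumed from the "Database Schema" section of this README.
const COLUMNS = [
  'source', 'source_id', 'title', 'description', 'summary', 'authority_name',
  'location', 'published_date', 'deadline', 'notice_url', 'status',
];

// Hypothetical helper: build a parameterized INSERT that skips rows whose
// source_id already exists. Missing fields are sent as NULL.
function buildInsert(row) {
  const placeholders = COLUMNS.map((_, i) => `$${i + 1}`).join(', ');
  const text =
    `INSERT INTO tenders (${COLUMNS.join(', ')}) ` +
    `VALUES (${placeholders}) ` +
    'ON CONFLICT (source_id) DO NOTHING';
  const values = COLUMNS.map((column) => row[column] ?? null);
  return { text, values };
}
```

The resulting `text` and `values` can be passed to the `pg` client, for example `pool.query(text, values)`. Using placeholders rather than string concatenation keeps scraped text from being interpreted as SQL.
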
## Last Updated

2026-02-14 - Initial deployment with all four scrapers