feat: visual polish, nav login link, pricing badge fix, cursor fix, button contrast
- Hero mockup: enhanced 3D perspective and shadow
- Testimonials: illustrated SVG avatars
- Growth pricing card: visual prominence (scale, gradient, badge)
- Most Popular badge: repositioned to avoid overlapping the heading
- Nav: added Log In link next to Start Free Trial
- Fixed btn-primary text colour on anchor tags (white on blue)
- Fixed `cursor: default` on all non-interactive elements
- Disabled user-select on non-form content to prevent the text caret
scrapers/README.md (new file, +116 lines)
# TenderRadar Scrapers

This directory contains scrapers for UK public procurement tender sources.

## Scrapers

### 1. Contracts Finder (`contracts-finder.js`)

- **Source**: https://www.contractsfinder.service.gov.uk
- **Coverage**: England and non-devolved territories
- **Method**: JSON API
- **Frequency**: Every 4 hours (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
- **Data Range**: Last 30 days
- **Status**: ✅ Working

### 2. Find a Tender (`find-tender.js`)

- **Source**: https://www.find-tender.service.gov.uk
- **Coverage**: UK-wide above-threshold procurement notices
- **Method**: HTML scraping with pagination (5 pages)
- **Frequency**: Every 4 hours (00:10, 04:10, 08:10, 12:10, 16:10, 20:10)
- **Status**: ✅ Working

### 3. Public Contracts Scotland (`pcs-scotland.js`)

- **Source**: https://www.publiccontractsscotland.gov.uk
- **Coverage**: Scottish public sector tenders
- **Method**: HTML scraping
- **Frequency**: Every 4 hours (00:20, 04:20, 08:20, 12:20, 16:20, 20:20)
- **Status**: ✅ Working

### 4. Sell2Wales (`sell2wales.js`)

- **Source**: https://www.sell2wales.gov.wales
- **Coverage**: Welsh public sector tenders
- **Method**: HTML scraping
- **Frequency**: Every 4 hours (00:30, 04:30, 08:30, 12:30, 16:30, 20:30)
- **Status**: ✅ Working

## Database Schema

All scrapers insert into the `tenders` table with the following key fields:

- `source`: Identifier for the data source (`contracts_finder`, `find_tender`, `pcs_scotland`, `sell2wales`)
- `source_id`: Unique identifier from the source (used for deduplication via a UNIQUE constraint)
- `title`: Tender title
- `description`: Full description
- `summary`: Shortened description
- `authority_name`: Publishing authority
- `location`: Geographic location
- `published_date`: When the tender was published
- `deadline`: Application deadline
- `notice_url`: Link to the full notice
- `status`: `open` or `closed`, based on the deadline
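The `status` field is computed, not scraped. A minimal standalone sketch of the rule the HTML scrapers apply (Contracts Finder simply inserts tenders as `open`); `deriveStatus` is an illustrative name, in the scrapers this is an inline ternary:

```javascript
// A tender is `open` only when a deadline exists and is still in the future.
// (Illustrative helper; the scrapers inline this expression.)
function deriveStatus(deadline, now = new Date()) {
  return deadline && new Date(deadline) > now ? 'open' : 'closed';
}

const now = new Date('2026-02-14T00:00:00Z');
console.log(deriveStatus('2026-03-16T12:00:00Z', now)); // → open
console.log(deriveStatus('2026-01-01T12:00:00Z', now)); // → closed
console.log(deriveStatus(null, now));                   // → closed
```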
## Running Scrapers

### Individual Scrapers

```bash
cd /home/peter/tenderpilot
node scrapers/contracts-finder.js
node scrapers/find-tender.js
node scrapers/pcs-scotland.js
node scrapers/sell2wales.js
```

### All Scrapers

```bash
cd /home/peter/tenderpilot
./run-all-scrapers.sh
```

## Cron Schedule

The scrapers run automatically every 4 hours, staggered by 10 minutes:

```cron
0 */4 * * * cd /home/peter/tenderpilot && node scrapers/contracts-finder.js >> /home/peter/tenderpilot/scraper.log 2>&1
10 */4 * * * cd /home/peter/tenderpilot && node scrapers/find-tender.js >> /home/peter/tenderpilot/scraper.log 2>&1
20 */4 * * * cd /home/peter/tenderpilot && node scrapers/pcs-scotland.js >> /home/peter/tenderpilot/scraper.log 2>&1
30 */4 * * * cd /home/peter/tenderpilot && node scrapers/sell2wales.js >> /home/peter/tenderpilot/scraper.log 2>&1
```

## Monitoring

Check logs:

```bash
tail -f /home/peter/tenderpilot/scraper.log
```

Check database:

```bash
PGPASSWORD=tenderpilot123 psql -h localhost -U tenderpilot -d tenderpilot -c "SELECT source, COUNT(*) FROM tenders GROUP BY source;"
```
## Rate Limiting & Ethical Scraping

All scrapers implement:

- Proper User-Agent headers identifying the service
- Rate limiting (2-5 second delays between requests)
- Pagination limits where applicable
- Respectful request patterns
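The pacing pattern above is the one `find-tender.js` uses between pages. A self-contained sketch (`politeFetch` and `fetchOne` are illustrative names, not functions in the codebase):

```javascript
// Promise-based sleep, as used in find-tender.js.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch URLs one at a time, pausing between requests instead of hammering the site.
async function politeFetch(urls, fetchOne, gapMs = 2000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await fetchOne(urls[i]));
    if (i < urls.length - 1) await delay(gapMs); // no pause after the final request
  }
  return results;
}
```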
## Dependencies

- axios: HTTP client
- cheerio: HTML parsing (for the HTML scrapers)
- pg: PostgreSQL client
- dotenv: Environment variables

## Maintenance

- Most scrapers use `ON CONFLICT (source_id) DO NOTHING` to avoid duplicates; `pcs-scotland.js` uses `DO UPDATE` to refresh titles and descriptions on re-scrape
- Monitor for HTML structure changes on the scraped sites
- The JSON API source (Contracts Finder) is more stable than HTML scraping
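The two conflict strategies side by side. This is a sketch only: the column list is abbreviated and the values are invented for illustration; the real INSERTs cover all 17 columns:

```sql
-- Insert-once (most scrapers): re-scraped duplicates are silently skipped
INSERT INTO tenders (source, source_id, title)
VALUES ('find_tender', 'FT-EXAMPLE', 'Example notice')
ON CONFLICT (source_id) DO NOTHING;

-- Upsert (pcs-scotland.js): re-scrapes refresh the stored text
INSERT INTO tenders (source, source_id, title)
VALUES ('pcs_scotland', 'PCS-EXAMPLE', 'Example notice')
ON CONFLICT (source_id) DO UPDATE SET
  title = EXCLUDED.title;
```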
## Last Updated

2026-02-14 - Initial deployment with all four scrapers
scrapers/contracts-finder.js (new executable file, +104 lines)
import axios from 'axios';
import pg from 'pg';
import dotenv from 'dotenv';

dotenv.config();

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://tenderpilot:tenderpilot123@localhost:5432/tenderpilot'
});

async function scrapeTenders() {
  try {
    console.log(`[${new Date().toISOString()}] Starting tender scrape...`);

    // Get the date from 30 days ago
    const fromDate = new Date();
    fromDate.setDate(fromDate.getDate() - 30);
    const dateStr = fromDate.toISOString().split('T')[0];

    const url = `https://www.contractsfinder.service.gov.uk/Published/Notices/OCDS/Search?stage=tender&output=json&publishedFrom=${dateStr}`;

    console.log(`Fetching from: ${url}`);
    const response = await axios.get(url, { timeout: 30000 });

    const data = response.data;
    const releases = data.releases || [];

    console.log(`Found ${releases.length} tenders`);

    let insertedCount = 0;

    for (const release of releases) {
      try {
        const tender = release.tender || {};
        const planning = release.planning || {};
        const parties = release.parties || [];

        // Find the procuring entity
        const procurer = parties.find(p => p.roles && p.roles.includes('procurer'));

        const sourceId = release.ocid || release.id;
        const title = tender.title || 'Untitled';
        const description = tender.description || '';
        const publishedDate = release.date;
        const deadline = tender.tenderPeriod?.endDate;
        const authority = procurer?.name || 'Unknown';
        // Best-effort fallback: these releases rarely carry a clean location field
        const location = planning?.budget?.description || tender.procurementMethod || '';
        const noticeUrl = release.url || '';
        const documentsUrl = tender.documents?.length > 0 ? tender.documents[0].url : '';

        // Extract value: prefer the planning budget, fall back to the tender value
        let valueLow = null, valueHigh = null;
        if (planning?.budget?.amount?.amount) {
          valueLow = planning.budget.amount.amount;
          valueHigh = planning.budget.amount.amount;
        } else if (tender.value?.amount) {
          valueLow = tender.value.amount;
          valueHigh = tender.value.amount;
        }

        // classification.id holds the CPV code itself; .scheme is just the label "CPV"
        const cpvCodes = tender.classification?.id ? [tender.classification.id] : [];

        await pool.query(
          `INSERT INTO tenders (
            source, source_id, title, description, summary, cpv_codes,
            value_low, value_high, currency, published_date, deadline,
            authority_name, authority_type, location, documents_url, notice_url, status
          ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
          ON CONFLICT (source_id) DO NOTHING`,
          [
            'contracts_finder',
            sourceId,
            title.substring(0, 500),
            description,
            description.substring(0, 500),
            cpvCodes,
            valueLow,
            valueHigh,
            'GBP',
            publishedDate,
            deadline,
            authority,
            'government',
            location.substring(0, 255),
            documentsUrl,
            noticeUrl,
            'open'
          ]
        );
        insertedCount++;
      } catch (e) {
        console.error('Error inserting tender:', e.message);
      }
    }

    console.log(`[${new Date().toISOString()}] Scrape complete. Processed ${insertedCount} tenders`);
  } catch (error) {
    console.error('Error scraping tenders:', error.message);
  } finally {
    await pool.end();
  }
}

scrapeTenders();
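The budget-versus-value fallback above is easy to misread, so here it is condensed into a standalone sketch (`extractValue` is an illustrative name, not a function in the file):

```javascript
// Condensed sketch of the value extraction in contracts-finder.js:
// prefer the planning budget amount, otherwise fall back to the tender value.
function extractValue(release) {
  const planning = release.planning || {};
  const tender = release.tender || {};
  const amount = planning.budget?.amount?.amount || tender.value?.amount || null;
  return { valueLow: amount, valueHigh: amount };
}

console.log(extractValue({ planning: { budget: { amount: { amount: 50000 } } } })); // → { valueLow: 50000, valueHigh: 50000 }
console.log(extractValue({ tender: { value: { amount: 120000 } } }));               // → { valueLow: 120000, valueHigh: 120000 }
console.log(extractValue({}));                                                      // → { valueLow: null, valueHigh: null }
```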
scrapers/find-tender.js (new file, +127 lines)
import axios from 'axios';
import * as cheerio from 'cheerio';
import pg from 'pg';
import dotenv from 'dotenv';

dotenv.config();

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://tenderpilot:tenderpilot123@localhost:5432/tenderpilot'
});

// Rate limiting
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

async function scrapeTenders() {
  try {
    console.log(`[${new Date().toISOString()}] Starting Find a Tender scrape...`);

    let insertedCount = 0;
    const maxPages = 5; // Limit to the first 5 pages to be respectful

    for (let page = 1; page <= maxPages; page++) {
      console.log(`Fetching page ${page}...`);

      const url = `https://www.find-tender.service.gov.uk/Search/Results?page=${page}&sort=recent`;

      const response = await axios.get(url, {
        timeout: 30000,
        headers: {
          'User-Agent': 'TenderRadar/1.0 (UK Public Procurement Aggregator; contact@tenderradar.co.uk)'
        }
      });

      const $ = cheerio.load(response.data);
      const tenderElements = $('.search-result');

      if (tenderElements.length === 0) {
        console.log('No more tenders found, stopping pagination');
        break;
      }

      console.log(`Found ${tenderElements.length} tenders on page ${page}`);

      for (let i = 0; i < tenderElements.length; i++) {
        try {
          const element = tenderElements.eq(i);

          const titleLink = element.find('.search-result-header a').first();
          const title = titleLink.text().trim();
          const href = titleLink.attr('href');
          // Skip malformed results rather than storing "undefined" URLs
          if (!href || !title) continue;
          const noticeUrl = 'https://www.find-tender.service.gov.uk' + href;

          // Extract the source ID from the URL
          const urlMatch = noticeUrl.match(/\/([A-Z0-9-]+)$/);
          const sourceId = urlMatch ? urlMatch[1] : noticeUrl;

          const authority = element.find('.search-result-sub-header').text().trim();
          const description = element.find('.search-result-description').text().trim();

          // Extract dates and value from the metadata text
          const metadata = element.find('.search-result-metadata').text();
          let publishedDate = null;
          let deadline = null;
          let valueLow = null;

          const publishMatch = metadata.match(/Published:\s*(\d{1,2}\s+\w+\s+\d{4})/);
          if (publishMatch) {
            publishedDate = new Date(publishMatch[1]).toISOString();
          }

          const deadlineMatch = metadata.match(/Deadline:\s*(\d{1,2}\s+\w+\s+\d{4})/);
          if (deadlineMatch) {
            deadline = new Date(deadlineMatch[1]).toISOString();
          }

          const valueMatch = metadata.match(/£([\d,]+)/);
          if (valueMatch) {
            valueLow = parseFloat(valueMatch[1].replace(/,/g, ''));
          }

          await pool.query(
            `INSERT INTO tenders (
              source, source_id, title, description, summary, cpv_codes,
              value_low, value_high, currency, published_date, deadline,
              authority_name, authority_type, location, documents_url, notice_url, status
            ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
            ON CONFLICT (source_id) DO NOTHING`,
            [
              'find_tender',
              sourceId,
              title.substring(0, 500),
              description,
              description.substring(0, 500),
              [],
              valueLow,
              valueLow,
              'GBP',
              publishedDate,
              deadline,
              authority,
              'government',
              'UK',
              '',
              noticeUrl,
              deadline && new Date(deadline) > new Date() ? 'open' : 'closed'
            ]
          );
          insertedCount++;
        } catch (e) {
          console.error('Error inserting tender:', e.message);
        }
      }

      // Rate limiting: wait 2 seconds between pages
      if (page < maxPages) {
        await delay(2000);
      }
    }

    console.log(`[${new Date().toISOString()}] Find a Tender scrape complete. Processed ${insertedCount} tenders`);
  } catch (error) {
    console.error('Error scraping Find a Tender:', error.message);
  } finally {
    await pool.end();
  }
}

scrapeTenders();
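The regex-driven metadata extraction in this scraper can be exercised on its own. A condensed sketch (`parseMetadata` is an illustrative name; the sample string is invented):

```javascript
// Condensed sketch of the metadata parsing in find-tender.js:
// pulls two dates and a contract value out of a search-result text blob.
function parseMetadata(metadata) {
  const out = { publishedDate: null, deadline: null, valueLow: null };

  const publishMatch = metadata.match(/Published:\s*(\d{1,2}\s+\w+\s+\d{4})/);
  if (publishMatch) out.publishedDate = new Date(publishMatch[1]).toISOString();

  const deadlineMatch = metadata.match(/Deadline:\s*(\d{1,2}\s+\w+\s+\d{4})/);
  if (deadlineMatch) out.deadline = new Date(deadlineMatch[1]).toISOString();

  const valueMatch = metadata.match(/£([\d,]+)/);
  if (valueMatch) out.valueLow = parseFloat(valueMatch[1].replace(/,/g, ''));

  return out;
}

const sample = 'Published: 3 February 2026  Deadline: 16 March 2026  Value: £1,250,000';
console.log(parseMetadata(sample).valueLow); // → 1250000
```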
scrapers/pcs-scotland.js (new file, +153 lines)
import axios from 'axios';
import * as cheerio from 'cheerio';
import pg from 'pg';
import dotenv from 'dotenv';

dotenv.config();

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://tenderpilot:tenderpilot123@localhost:5432/tenderpilot'
});

function parseDate(dateStr) {
  if (!dateStr || dateStr.trim() === '') return null;

  try {
    // Handle formats like "13/02/2026"
    if (dateStr.match(/^\d{2}\/\d{2}\/\d{4}$/)) {
      const [day, month, year] = dateStr.split('/');
      const date = new Date(`${year}-${month}-${day}`);
      if (isNaN(date.getTime())) return null;
      return date.toISOString();
    }

    // Handle formats like "16-Mar-26"
    if (dateStr.match(/^\d{2}-\w+-\d{2}$/)) {
      const parts = dateStr.split('-');
      const day = parts[0];
      const month = parts[1];
      const year = '20' + parts[2];
      const date = new Date(`${day} ${month} ${year}`);
      if (isNaN(date.getTime())) return null;
      return date.toISOString();
    }

    // Fall back to general parsing
    const date = new Date(dateStr);
    if (isNaN(date.getTime())) return null;
    return date.toISOString();
  } catch (e) {
    return null;
  }
}

function cleanTitle(title) {
  // Remove common link-text artifacts
  return title
    .replace(/\s*\(Opens in new tab\)\s*/gi, '')
    .replace(/\s*\(Opens in new window\)\s*/gi, '')
    .trim();
}

async function scrapeTenders() {
  try {
    console.log(`[${new Date().toISOString()}] Starting PCS Scotland scrape...`);

    let insertedCount = 0;

    const url = 'https://www.publiccontractsscotland.gov.uk/search/Search_MainPage.aspx';

    const response = await axios.get(url, {
      timeout: 30000,
      headers: {
        'User-Agent': 'TenderRadar/1.0 (UK Public Procurement Aggregator; contact@tenderradar.co.uk)'
      }
    });

    const $ = cheerio.load(response.data);

    // Find all table rows that link to a tender detail page
    const tenderRows = $('table tr').filter((i, el) => {
      return $(el).find('a[href*="search_view.aspx"]').length > 0;
    });

    console.log(`Found ${tenderRows.length} tenders`);

    for (let i = 0; i < tenderRows.length; i++) {
      try {
        const row = tenderRows.eq(i);
        const cells = row.find('td');

        if (cells.length === 0) continue;

        const dateText = cells.eq(0).text().trim();
        const detailsCell = cells.eq(1);

        const titleLink = detailsCell.find('a').first();
        const rawTitle = titleLink.text().trim();
        const title = cleanTitle(rawTitle);

        if (!title || title.length === 0) continue;

        const noticeUrl = 'https://www.publiccontractsscotland.gov.uk' + titleLink.attr('href');

        const detailsText = detailsCell.text();

        const refMatch = detailsText.match(/Reference No:\s*([A-Z0-9]+)/);
        const sourceId = refMatch ? refMatch[1] : ('pcs_' + Date.now() + '_' + i);

        const authorityMatch = detailsText.match(/Published By:\s*([^\n]+)/);
        const authority = authorityMatch ? authorityMatch[1].trim() : 'Unknown';

        const deadlineMatch = detailsText.match(/Deadline Date:\s*(\d{2}-\w+-\d{2})/);
        const deadline = deadlineMatch ? parseDate(deadlineMatch[1]) : null;

        const noticeTypeMatch = detailsText.match(/Notice Type:\s*([^\n]+)/);
        const noticeType = noticeTypeMatch ? noticeTypeMatch[1].trim() : '';

        const publishedDate = parseDate(dateText);

        await pool.query(
          `INSERT INTO tenders (
            source, source_id, title, description, summary, cpv_codes,
            value_low, value_high, currency, published_date, deadline,
            authority_name, authority_type, location, documents_url, notice_url, status
          ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
          ON CONFLICT (source_id) DO UPDATE SET
            title = EXCLUDED.title,
            description = EXCLUDED.description,
            summary = EXCLUDED.summary`,
          [
            'pcs_scotland',
            sourceId,
            title.substring(0, 500),
            noticeType,
            noticeType.substring(0, 500),
            [],
            null,
            null,
            'GBP',
            publishedDate,
            deadline,
            authority,
            'government',
            'Scotland',
            '',
            noticeUrl,
            deadline && new Date(deadline) > new Date() ? 'open' : 'closed'
          ]
        );
        insertedCount++;
      } catch (e) {
        console.error('Error inserting tender:', e.message);
      }
    }

    console.log(`[${new Date().toISOString()}] PCS Scotland scrape complete. Inserted/updated ${insertedCount} tenders`);
  } catch (error) {
    console.error('Error scraping PCS Scotland:', error.message);
  } finally {
    await pool.end();
  }
}

scrapeTenders();
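Both `pcs-scotland.js` and `sell2wales.js` carry a site-specific `parseDate` helper. A condensed standalone version of the two formats it normalises:

```javascript
// Condensed sketch of the parseDate helper shared by pcs-scotland.js and
// sell2wales.js: normalises "13/02/2026" (DD/MM/YYYY) and "16-Mar-26"
// (DD-Mon-YY) to ISO strings, returning null for anything unparseable.
function parseDate(dateStr) {
  if (!dateStr || dateStr.trim() === '') return null;

  if (/^\d{2}\/\d{2}\/\d{4}$/.test(dateStr)) {
    const [day, month, year] = dateStr.split('/');
    const date = new Date(`${year}-${month}-${day}`);
    return isNaN(date.getTime()) ? null : date.toISOString();
  }

  if (/^\d{2}-\w+-\d{2}$/.test(dateStr)) {
    const [day, month, yy] = dateStr.split('-');
    const date = new Date(`${day} ${month} 20${yy}`);
    return isNaN(date.getTime()) ? null : date.toISOString();
  }

  const date = new Date(dateStr);
  return isNaN(date.getTime()) ? null : date.toISOString();
}

console.log(parseDate('13/02/2026')); // → 2026-02-13T00:00:00.000Z
console.log(parseDate('16-Mar-26'));  // local-time midnight, so the ISO hour varies
console.log(parseDate('not a date')); // → null
```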
scrapers/sell2wales.js (new file, +155 lines)
import axios from 'axios';
import * as cheerio from 'cheerio';
import pg from 'pg';
import dotenv from 'dotenv';

dotenv.config();

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL || 'postgresql://tenderpilot:tenderpilot123@localhost:5432/tenderpilot'
});

function parseDate(dateStr) {
  if (!dateStr || dateStr.trim() === '') return null;

  try {
    // Handle formats like "13/02/2026"
    if (dateStr.match(/^\d{2}\/\d{2}\/\d{4}$/)) {
      const [day, month, year] = dateStr.split('/');
      const date = new Date(`${year}-${month}-${day}`);
      if (isNaN(date.getTime())) return null;
      return date.toISOString();
    }

    // Fall back to general parsing
    const date = new Date(dateStr);
    if (isNaN(date.getTime())) return null;
    return date.toISOString();
  } catch (e) {
    return null;
  }
}

async function scrapeTenders() {
  try {
    console.log(`[${new Date().toISOString()}] Starting Sell2Wales scrape...`);

    let insertedCount = 0;

    const url = 'https://www.sell2wales.gov.wales/search/Search_MainPage.aspx';

    const response = await axios.get(url, {
      timeout: 30000,
      headers: {
        'User-Agent': 'TenderRadar/1.0 (UK Public Procurement Aggregator; contact@tenderradar.co.uk)'
      }
    });

    const $ = cheerio.load(response.data);

    // Find all links to tender detail pages
    const tenderLinks = $('a[href*="search_view.aspx?ID="]');

    console.log(`Found ${tenderLinks.length} potential tenders`);

    // Track processed hrefs to avoid inserting the same notice twice
    const processed = new Set();

    for (let i = 0; i < tenderLinks.length; i++) {
      try {
        const link = tenderLinks.eq(i);
        const href = link.attr('href');

        if (!href || processed.has(href)) continue;
        processed.add(href);

        const title = link.text().trim();
        if (!title || title.length === 0) continue;

        const noticeUrl = href.startsWith('http') ? href : 'https://www.sell2wales.gov.wales' + href;

        // Get the parent container for this tender
        const container = link.closest('div, li, tr');
        const containerText = container.text();

        // Extract the reference number from the URL as a fallback ID
        const idMatch = href.match(/ID=([A-Z0-9]+)/);
        const sourceId = idMatch ? idMatch[1] : ('s2w_' + Date.now() + '_' + i);

        // Prefer the published reference number when the listing shows one
        const refMatch = containerText.match(/Reference no:\s*([A-Z0-9]+)/i);
        const finalRef = refMatch ? refMatch[1] : sourceId;

        const authorityMatch = containerText.match(/Published by:\s*([^\n]+)/i);
        const authority = authorityMatch ? authorityMatch[1].trim() : 'Unknown';

        const pubDateMatch = containerText.match(/Publication date:\s*(\d{2}\/\d{2}\/\d{4})/i);
        const publishedDate = pubDateMatch ? parseDate(pubDateMatch[1]) : null;

        const deadlineMatch = containerText.match(/Deadline date:\s*(\d{2}\/\d{2}\/\d{4})/i);
        const deadline = deadlineMatch ? parseDate(deadlineMatch[1]) : null;

        const noticeTypeMatch = containerText.match(/Notice Type:\s*([^\n]+)/i);
        const noticeType = noticeTypeMatch ? noticeTypeMatch[1].trim() : '';

        const locationMatch = containerText.match(/Location:\s*([^\n#]+)/i);
        const location = locationMatch ? locationMatch[1].trim() : 'Wales';

        const valueMatch = containerText.match(/Value:\s*(\d+)/i);
        let valueLow = null;
        if (valueMatch) {
          valueLow = parseInt(valueMatch[1], 10);
        }

        // Look for a description in nearby paragraphs
        let description = '';
        const nearbyP = container.find('p').first();
        if (nearbyP.length > 0) {
          description = nearbyP.text().trim();
        }
        if (!description || description.length < 10) {
          description = noticeType || title;
        }

        await pool.query(
          `INSERT INTO tenders (
            source, source_id, title, description, summary, cpv_codes,
            value_low, value_high, currency, published_date, deadline,
            authority_name, authority_type, location, documents_url, notice_url, status
          ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)
          ON CONFLICT (source_id) DO NOTHING`,
          [
            'sell2wales',
            finalRef,
            title.substring(0, 500),
            description.substring(0, 1000),
            description.substring(0, 500),
            [],
            valueLow,
            valueLow,
            'GBP',
            publishedDate,
            deadline,
            authority.substring(0, 255),
            'government',
            location.substring(0, 255),
            '',
            noticeUrl,
            deadline && new Date(deadline) > new Date() ? 'open' : 'closed'
          ]
        );
        insertedCount++;
      } catch (e) {
        console.error('Error inserting tender:', e.message);
      }
    }

    console.log(`[${new Date().toISOString()}] Sell2Wales scrape complete. Processed ${insertedCount} tenders`);
  } catch (error) {
    console.error('Error scraping Sell2Wales:', error.message);
  } finally {
    await pool.end();
  }
}

scrapeTenders();