73dcf9367bc5d6a7ef4d487c29a09e72381d0a53
Early OFAC years use batch PDFs where one document covers many penalty
cases (e.g. 56 rows sharing the same PDF in 2003). Deriving text_id from
the PDF filename caused all rows sharing a document to overwrite each other
in the DB, reducing 1061 rows to 348.
Fix: text_id = yyyyMMdd_{slugified_name}, which is unique per table row.
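
A minimal sketch of the key change described above, not the scraper's actual code. The class and method names (`TextIdSketch`, `Slugify`, `MakeTextId`, `CountDistinctKeys`) are hypothetical, and the slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption; the commit only states the `yyyyMMdd_{slugified_name}` shape.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: names and slug rules are illustrative assumptions,
// not taken from the repository.
public static class TextIdSketch
{
    // Lowercase the name and collapse every run of non-alphanumeric
    // characters into a single hyphen, trimming hyphens at both ends.
    public static string Slugify(string name)
    {
        string lower = name.ToLowerInvariant();
        string hyphenated = Regex.Replace(lower, "[^a-z0-9]+", "-");
        return hyphenated.Trim('-');
    }

    // Build the per-row key in the yyyyMMdd_{slugified_name} shape.
    public static string MakeTextId(DateTime penaltyDate, string name)
    {
        string datePart = penaltyDate.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
        return datePart + "_" + Slugify(name);
    }

    // Count the distinct keys a set of rows produces. With an upsert keyed
    // on text_id, this is how many rows would survive in the DB.
    public static int CountDistinctKeys(IEnumerable<string> keys)
    {
        return new HashSet<string>(keys).Count;
    }
}
```

With made-up data, three rows that all point at one batch PDF collapse to a single key when keyed by filename, while `MakeTextId` gives each row its own key as long as no two rows share both date and name. For example, `TextIdSketch.MakeTextId(new DateTime(2003, 4, 25), "Acme Trading Co., Inc.")` returns `20030425_acme-trading-co-inc`.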
Also add an ofac-scrape-only command for fast, table-only scraping that
skips PDF downloads.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Description: OFAC Civil Penalties scraper
Languages: C# (100%)