Early OFAC years use batch PDFs where one document covers many penalty
cases (e.g. 56 rows sharing the same PDF in 2003). Deriving text_id from
the PDF filename caused all rows sharing a document to overwrite each other
in the DB, reducing 1061 rows to 348.
Fix: derive text_id as yyyyMMdd_{slugified_name}, which is unique per table row.
Also add an ofac-scrape-only command for fast table-only scraping without PDF downloads.
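
A minimal sketch of the text_id change described above, not the actual scraper code. The helper names (slugify, make_text_id), the row fields, and the sample rows are hypothetical; the exact slug rules in the real implementation may differ, and uniqueness assumes no two rows share both date and name.

```python
import re
from datetime import date


def slugify(name: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', and trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def make_text_id(penalty_date: date, name: str) -> str:
    """Build yyyyMMdd_{slugified_name} from a single table row."""
    return f"{penalty_date.strftime('%Y%m%d')}_{slugify(name)}"


# Hypothetical rows that point at the same batch PDF.
rows = [
    {"date": date(2003, 4, 1), "name": "Acme Corp.", "pdf": "04012003.pdf"},
    {"date": date(2003, 4, 1), "name": "Example Bank, N.A.", "pdf": "04012003.pdf"},
]

# Old behaviour: keying by the PDF filename makes later rows overwrite earlier ones.
by_pdf = {row["pdf"]: row for row in rows}

# New behaviour: keying by date + slugified name keeps one entry per table row.
by_row = {make_text_id(row["date"], row["name"]): row for row in rows}
```

Keying a dict by filename here stands in for the DB upsert: both rows collapse into one entry, while the date-plus-slug key keeps them separate.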
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scrapes https://ofac.treasury.gov/civil-penalties-and-enforcement-information
for all years 2003-present. Downloads PDF documents and exports metadata.json
per CGSH Publication spec (v3) to the S3 experimental bucket under the ofac/ prefix.
Commands: ofac-full (all years), ofac-daily (current year incremental).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>