Ticket #43
Indonesia registry — chunked pdfplumber extraction
closed
task
normal
slothful_seth
2026-05-22T03:32:51Z
2026-05-28T04:16:33.433117Z
Description
The Indonesia import currently holds 1,580 records and is only partial. The full dataset has to be extracted from a large PDF, which requires chunked pdfplumber extraction. Implement chunked extraction to get the full dataset.
Ticket Events
-
COMMENT
—
Iguana redesigned: it now parses staged files to _parsed.csv in manual_intake/<CC>/ and performs no DB insert. DB promotion is handled in ticket #9.
Seth filenames updated to the YYYY-MM-DD_HHMM_CC_raw.<ext> convention.
Indonesia PDF parser (_parse_id): uses pdfplumber, reading the PDF in chunks of 20 pages. The row regex is anchored on the 4-digit year so that types containing spaces, such as B 737-800, are captured whole. Fields captured: registration, type, MSN, year, date_reg, owner, operator, lessor, status. Coverage is ~91% (1,366 of 1,505 records); the remaining ~9% are rows where pdfplumber wraps a line mid-field.
pdfplumber>=0.11 added to requirements.txt; the container needs a rebuild to pick it up.
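For reference, the approach described in this comment could look roughly like the sketch below. It is not the actual _parse_id code: the row layout, the PK- registration pattern, the names ROW_RE, parse_row, page_chunks and extract_rows, and the sample row in the usage note are all assumptions made for illustration. It shows a row regex anchored on the 4-digit year (so the type field may contain spaces) and a 20-page chunk loop that reopens the PDF per chunk, which is one common way to keep pdfplumber's memory use bounded.

```python
import re
from typing import Iterator, Optional

# Assumed row layout: registration, type (may contain spaces), MSN, 4-digit year,
# then the remaining columns. The year acts as the anchor: the lazily matched
# type field grows until an MSN token followed by a year can be matched.
ROW_RE = re.compile(
    r"^(?P<registration>PK-[A-Z]{3})\s+"
    r"(?P<type>.+?)\s+"
    r"(?P<msn>\S+)\s+"
    r"(?P<year>(?:19|20)\d{2})\b"
    r"(?P<rest>.*)$"
)


def parse_row(line: str) -> Optional[dict]:
    """Return the captured fields for one text line, or None if it does not match.

    Lines that were wrapped mid-field never reach the year anchor and are skipped.
    """
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["rest"] = row["rest"].strip()
    return row


def page_chunks(total_pages: int, chunk_size: int = 20) -> Iterator[range]:
    """Yield consecutive ranges of 0-based page indexes, chunk_size pages at a time."""
    for start in range(0, total_pages, chunk_size):
        yield range(start, min(start + chunk_size, total_pages))


def extract_rows(pdf_path: str, chunk_size: int = 20) -> Iterator[dict]:
    """Yield parsed rows from the PDF, processing chunk_size pages per pass."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(pdf_path) as pdf:
        total_pages = len(pdf.pages)

    for chunk in page_chunks(total_pages, chunk_size):
        # Reopen the file for each chunk so page objects cached while
        # processing earlier chunks are released.
        with pdfplumber.open(pdf_path) as pdf:
            for index in chunk:
                text = pdf.pages[index].extract_text() or ""
                for line in text.splitlines():
                    row = parse_row(line)
                    if row is not None:
                        yield row
```

With a made-up line such as "PK-ABC B 737-800 30123 2005 12/03/2015 Some Airline", parse_row captures the type as "B 737-800" and the MSN as "30123". A line cut off before the year returns None, which is the kind of row that falls into the uncovered ~9%.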
2026-05-28T04:16:30.372775Z
-
CLOSE
—
Ticket closed
2026-05-28T04:16:33.494681Z