AirTrack BugTracker V2

Goblin desk online
closed
task
normal
slothful_seth
2026-05-22T03:32:51Z
2026-05-28T04:16:33.433117Z

Description

Indonesia currently has 1,580 records, but that import is partial: the full dataset requires chunked pdfplumber extraction from a large PDF. Implement chunked extraction to get the full dataset.

Ticket Events

  • COMMENT — Iguana redesigned: it now parses staged files to _parsed.csv in manual_intake/<CC>/ and does no DB insert. DB promotion is ticket #9. Seth filenames updated to the YYYY-MM-DD_HHMM_CC_raw.<ext> convention. Indonesia PDF parser (_parse_id): uses pdfplumber in chunks of 20 pages, with a regex anchored on the 4-digit year to handle types that contain spaces, such as B 737-800. Captures registration, type, MSN, year, date_reg, owner, operator, lessor, status. Coverage is ~91% (1,366 of 1,505 records); the remaining ~9% are rows where pdfplumber wraps a line mid-field. pdfplumber>=0.11 added to requirements.txt; the container needs a rebuild to pick it up.
    2026-05-28T04:16:30.372775Z
  • CLOSE — Ticket closed
    2026-05-28T04:16:33.494681Z
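
Illustrative sketch of the approach described in the COMMENT event above: chunked pdfplumber extraction plus a row regex anchored on the 4-digit year. This is not the actual _parse_id code. The names chunk_ranges, parse_line, extract_rows and ROW_RE are hypothetical, and the column order (registration, type, MSN, year, then the remaining fields) is an assumption about the PDF layout. The trailing fields (date_reg, owner, operator, lessor, status) are left unsplit in "rest" because their delimiters are not stated in this ticket.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed row layout: registration, type, MSN, year, then the remaining fields.
# The type group is lazy and may contain spaces (e.g. "B 737-800"); the first
# standalone 4-digit year (19xx/20xx) fixes where type and MSN end.
ROW_RE = re.compile(
    r"^(?P<registration>\S+)\s+"
    r"(?P<type>.+?)\s+"
    r"(?P<msn>\S+)\s+"
    r"(?P<year>(?:19|20)\d{2})"
    r"(?:\s+(?P<rest>.*))?$"
)


def chunk_ranges(total_pages: int, chunk_size: int = 20) -> List[Tuple[int, int]]:
    """Return half-open, 0-based (start, end) page ranges covering the document."""
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Parse one text line into fields, or return None if it is not a data row."""
    match = ROW_RE.match(line.strip())
    return match.groupdict() if match else None


def extract_rows(pdf_path: str, chunk_size: int = 20) -> Iterator[Dict[str, Optional[str]]]:
    """Yield parsed rows, re-opening the PDF for each chunk of pages."""
    import pdfplumber  # imported lazily so the pure helpers work without it

    with pdfplumber.open(pdf_path) as pdf:
        total_pages = len(pdf.pages)

    for start, end in chunk_ranges(total_pages, chunk_size):
        # pdfplumber's `pages` argument takes 1-based page numbers.
        page_numbers = list(range(start + 1, end + 1))
        with pdfplumber.open(pdf_path, pages=page_numbers) as pdf:
            for page in pdf.pages:
                text = page.extract_text() or ""
                for line in text.splitlines():
                    row = parse_line(line)
                    if row is not None:
                        yield row
```

Re-opening the file per chunk is one way to keep memory bounded, since each chunk's page objects are released when its context manager exits. Rows that pdfplumber wraps mid-field will not match ROW_RE and are skipped here, which is consistent with the coverage gap noted in the comment; recovering them would need a separate line-joining step.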