Every element in a Revit-exported PDF carries a tagged marked-content block that encodes its element identifier. bim-cli extracts those tags and the accompanying text, including schedule rows that standard PDF extractors miss, as structured JSON.
Copy this into your AI coding tool:
I have a Revit-exported PDF drawing set. Use bim-cli to extract structured element and text data from it.
Step 1. Install bim-cli if not already installed:
  powershell.exe -NoProfile -ExecutionPolicy Bypass -Command "iwr -useb https://bimcli.com/install.ps1 | iex"
Step 2. Refresh PATH in the current session:
  bash: export PATH="$PATH:$LOCALAPPDATA/bim-cli"
  pwsh: $env:Path += ";$env:LOCALAPPDATA\bim-cli"
Step 3. Confirm the pdf driver is ready:
  bim doctor --json
Step 4. Inspect the PDF:
  bim pdf info <path-to-drawings.pdf>
Step 5. Extract all BIM element tags (element identifier + page + bounds per element):
  bim pdf marked <path-to-drawings.pdf>
Step 6. Extract all text content (handles Revit's Type0 Identity-H fonts):
  bim pdf text <path-to-drawings.pdf>
Both extraction commands (bim pdf marked and bim pdf text) return a single JSON object on stdout. Pipe it to jq or redirect it to a file.
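If you consume the output from a script rather than jq, a minimal sketch of the parsing step could look like the following. It assumes only the envelope visible in the sample output on this page: a top-level "ok" flag and a "result" payload. The function name parse_bim_output is hypothetical, and the shape of a failure response is not documented here, so the sketch raises with the whole envelope instead of guessing at an error field. Python is used purely for illustration; any language with a JSON parser works.

```python
import json


def parse_bim_output(stdout_text: str):
    """Parse one bim-cli JSON envelope and return its "result" payload.

    Assumes the {"ok": ..., "result": ...} shape seen in the sample output.
    The failure shape is an assumption, so the whole envelope is reported.
    """
    envelope = json.loads(stdout_text)
    if not isinstance(envelope, dict) or envelope.get("ok") is not True:
        raise RuntimeError(f"bim-cli reported failure: {envelope!r}")
    return envelope.get("result")


if __name__ == "__main__":
    # Sample taken from the `bim pdf info` output shown on this page.
    sample = (
        '{"ok":true,"result":{"file":"drawings.pdf",'
        '"sizeBytes":98304000,"pages":142,"pdfVersion":"1.6"}}'
    )
    info = parse_bim_output(sample)
    print(info["pages"])  # prints 142
```

Checking "ok" before touching "result" keeps a failed run from being mistaken for an empty result.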
bim pdf info drawings.pdf
{"ok":true,"result":{"file":"drawings.pdf","sizeBytes":98304000,"pages":142,"pdfVersion":"1.6"}}
bim pdf marked drawings.pdf
{"ok":true,"result":[{"page":1,"tag":"Element_1234567","pathCount":3,"bounds":{"x0":120.4,"y0":340.2,"x1":180.1,"y1":380.5}},{"page":1,"tag":"Element_1234568","pathCount":2,"bounds":{"x0":200.0,"y0":340.2,"x1":260.0,"y1":380.5}},...]}
tag is the marked-content identifier from the Revit PDF content stream. pathCount is the number of graphic path operations inside the block — use it to filter out dimension strings and annotation elements. bounds is the bounding box in PDF user-space units.
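The pathCount filter described above could be sketched as follows, operating on the already-parsed "result" list. Several things here are assumptions rather than documented behaviour: the function name model_elements, the threshold parameter min_path_count, the idea that annotation-like blocks have fewer paths than model elements, and the tag pattern "Element_<digits>", which is inferred only from the sample output. Pick the threshold by inspecting your own drawings.

```python
import re

# Assumption: tags look like "Element_<digits>", as in the sample output.
TAG_PATTERN = re.compile(r"^Element_(\d+)$")


def model_elements(marked_items, min_path_count=2):
    """Keep marked items with at least min_path_count paths.

    Adds an integer "elementId" parsed from the tag. Items whose tag does
    not match TAG_PATTERN are skipped. min_path_count is a hypothetical
    threshold, not a value recommended by bim-cli.
    """
    kept = []
    for item in marked_items:
        match = TAG_PATTERN.match(item.get("tag", ""))
        if match is None or item.get("pathCount", 0) < min_path_count:
            continue
        # Copy the item so the caller's list is left unmodified.
        kept.append({**item, "elementId": int(match.group(1))})
    return kept


if __name__ == "__main__":
    sample = [
        {"page": 1, "tag": "Element_1234567", "pathCount": 3},
        {"page": 1, "tag": "Element_1234568", "pathCount": 2},
    ]
    for element in model_elements(sample, min_path_count=3):
        print(element["elementId"], element["page"])
```

Keeping the threshold as a parameter makes it easy to tune per drawing set instead of hard-coding a value.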
bim pdf text drawings.pdf
{"ok":true,"result":[{"page":1,"text":"Room 101","x":245.0,"y":620.3,"fontSize":10.0,"font":"ArialMT"},{"page":1,"text":"47.5 m²","x":320.0,"y":620.3,"fontSize":10.0,"font":"ArialMT"},...]}
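Revit uses Type0 fonts with Identity-H encoding, so the content stream stores font-internal character codes rather than Unicode code points. Extractors that ignore the font's ToUnicode CMap return empty strings or garbled characters. bim pdf text decodes through the ToUnicode CMap streams and recovers the original text.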
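Because each text fragment comes back with its own page and coordinates, schedule rows can be reassembled by grouping fragments that share a baseline. The sketch below is one way to do that and is not part of bim-cli. It assumes that x and y are in PDF user space with y increasing upward, that fragments on one row have nearly equal y values, and that a tolerance of 2.0 units is reasonable. The name group_into_rows and the y_tolerance parameter are hypothetical.

```python
from collections import defaultdict


def group_into_rows(text_items, y_tolerance=2.0):
    """Group text fragments into rows per page.

    Fragments whose y lies within y_tolerance of the first fragment of the
    current row are treated as the same row. Returns a list of
    (page, [text, ...]) tuples, rows top to bottom, cells left to right.
    """
    by_page = defaultdict(list)
    for item in text_items:
        by_page[item["page"]].append(item)

    rows = []
    for page in sorted(by_page):
        # Assumption: larger y means higher on the page.
        ordered = sorted(by_page[page], key=lambda it: -it["y"])
        current = []
        for item in ordered:
            if current and abs(current[0]["y"] - item["y"]) > y_tolerance:
                rows.append((page, current))
                current = []
            current.append(item)
        if current:
            rows.append((page, current))

    return [
        (page, [it["text"] for it in sorted(row, key=lambda it: it["x"])])
        for page, row in rows
    ]


if __name__ == "__main__":
    sample = [
        {"page": 1, "text": "47.5 m²", "x": 320.0, "y": 620.3},
        {"page": 1, "text": "Room 101", "x": 245.0, "y": 620.3},
    ]
    print(group_into_rows(sample))  # prints [(1, ['Room 101', '47.5 m²'])]
```

Anchoring each row to its first fragment, rather than to the previous fragment, stops a long run of slightly drifting y values from merging separate rows.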
This scenario is verified end-to-end by the bim-cli test harness. Run it yourself to confirm the installed binary matches this page:
bim scenario run tests/scenarios/pdf-extract-tables.yaml --strict
The YAML defines the prompt above, the commands, and JSONPath assertions over the actual stdout. Each of the three steps must return "ok":true and exit with code 0.
Scenario source: tests/scenarios/pdf-extract-tables.yaml
Revit sheet sets contain door schedules, room finish schedules, and window schedules embedded as tables on the sheets. Standard tools such as camelot-py extract tables but add a Python dependency and carry no BIM context. bim pdf marked and bim pdf text return structured element and text data from a single offline Windows binary with no runtime dependency.