Roadmap

What works today, and where this is going

I’m building pdf·markdown in the open, on my own, just me and a keyboard 👨‍💻. So here’s the honest status: what it already does well, what still breaks, and what I’m working on next.

No roadmap theater, no dates I can’t keep. Just where things actually stand, kept current as it changes. If a PDF comes out wrong, that’s not a footnote here. It’s the point of this page. Telling me is the fastest way to get it fixed.

In the works

A desktop app

The same converter, as an app you install. For confidential work that can’t touch the cloud (legal filings, financial statements, anything under NDA), it runs fully offline on your own machine.

What it does well right now

Text-based PDFs. Articles, docs, books with a real text layer come through as clean, structured Markdown.
Headings, lists & simple tables. The structure survives instead of collapsing into a flat wall of text. Many tables that used to fall apart, including wide ones and side-by-side ones, now hold together too.
Embedded images. Pictures stored as real images are pulled out and kept in place.
Chinese, Japanese & Korean. CJK text comes through well, not just Latin scripts. Need another language? Tell me and I’ll look into it.
Everyday math & more. Formulas come through, including more complex notation that used to go wrong. They’re rendered with KaTeX, with a clean text fallback for anything that still can’t display.
Academic papers (arXiv style). Two-column layouts now read in the right order (left column, then right) rather than jumping across both at once.
Footnotes that actually work. They come through as clickable superscripts. Hover one to peek at the note without losing your place, click to jump down to it, and hit the little return arrow to land right back where you were reading.
Side-by-side proofreading. The original PDF and the Markdown sit next to each other, so checking the result takes seconds.
No upload by default. The whole conversion runs in your browser, so a confidential filing or a client’s document never leaves your machine. There’s nothing for me to see.
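
For the curious, the two-column reading order mentioned above can be sketched in a few lines. This is an illustration only, not the converter's actual code: the `TextBlock` shape, the `orderTwoColumn` name, and the assumption that the columns split at the page's horizontal midpoint are all hypothetical.

```typescript
// Hypothetical shape for a block of text positioned on a page.
interface TextBlock {
  text: string;
  x: number; // left edge, in page units
  y: number; // top edge, growing downward
}

// Return blocks so the whole left column is read before the right one.
// Assumes exactly two columns, split at the page's horizontal midpoint.
function orderTwoColumn(blocks: TextBlock[], pageWidth: number): TextBlock[] {
  const mid = pageWidth / 2;
  const byTop = (a: TextBlock, b: TextBlock): number => a.y - b.y;
  const left = blocks.filter((b) => b.x < mid).sort(byTop);
  const right = blocks.filter((b) => b.x >= mid).sort(byTop);
  return [...left, ...right];
}

// Blocks arrive interleaved, the way a naive top-to-bottom read would see them.
const sample: TextBlock[] = [
  { text: "R1", x: 320, y: 100 },
  { text: "L1", x: 40, y: 100 },
  { text: "L2", x: 40, y: 200 },
  { text: "R2", x: 320, y: 200 },
];
const ordered = orderTwoColumn(sample, 600).map((b) => b.text);
console.log(ordered.join(" ")); // prints "L1 L2 R1 R2"
```

A real page needs more than this (full-width titles, figures that span both columns, uneven gutters), which is why a fixed midpoint is only a starting assumption.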

Where it’s still rough

The honest list of where conversions can break today. If you hit one of these, I’d rather you knew going in.

Complex formulas. Most notation now comes through, but the most unusual or dense expressions can still drift.
Complex tables. A lot more holds together now: nested, side-by-side, and tables that continue across a page break. The hardest cases, deeply merged cells or unusually dense grids, can still misalign.
Figures that aren’t real images. Diagrams drawn as vector graphics (not stored as image files) are hard to lift out cleanly.
Scanned PDFs (OCR). Scanned pages are read automatically now, but quality still varies by scan. Clean scans do well; messy or low-resolution ones less so.
Very large files. Hundreds of pages can be slow, since it all runs on your own machine.

A few of these run up against what a browser tab can realistically do. I’m still hunting for better in-browser ways to handle them, and some may ultimately depend on the server-side option below.

Hit one of these? Tell me which file broke. That’s what moves it up the list. I never see your file unless you choose to attach it.

What I’m working on next

Roughly ordered, not scheduled. What you tell me reshuffles it.

Now

Tightening table fidelity, the single most-reported rough spot.
Smoothing the proofread-and-fix pass so corrections are quicker.

Next

Better OCR, so more kinds of scans convert reliably.
The hardest table layouts, deeply merged cells and unusually dense grids.

Maybe

Converting several files in one go.
A server-side option for the heavy jobs (more below).

The bigger picture: a server-side option, maybe

Whatever a browser can do, I want this tool to do as well as it can be done, and in the browser your files will always stay on your machine. That’s the whole point.

But several of the hard cases above (better OCR, complex tables and formulas, vector-drawn figures, big batches) are genuinely tough inside a browser tab. For those, there may one day be an optional server-side path to pick up where the browser runs out of room.

Two honest caveats: it’s a direction, not a dated promise, and a server path means your file would leave your device, so it’d always be a clearly marked choice, never the default. The in-browser tool stays the free, private main event regardless. Whether I build the server side at all, and what it should do, depends on what you tell me you actually need.

Recently shipped

Jul 2026 A new Examples page: a dozen real conversions you can look through before bringing your own file.
Jul 2026 Tables got a real push, tested on the messy real-world kind: multi-column financial statements, mortgage closing disclosures, court filings. Wide tables that used to get torn into pieces now hold together, two side by side no longer get mashed into one, and nested tables with merged headers and sub-rows keep their shape.
Jul 2026 Tables that continue onto the next page now stitch back into one, instead of coming out as two half-tables with the header repeated in the middle.
Jul 2026 Math inside table cells survives now. A cell like ×10⁶ renders as a proper exponent instead of flattening to ×106, and notation like O(n) comes through as real math.
Jul 2026 Checkbox forms come through. Boxes that are ticked or empty now read as ☒ and ☐ instead of quietly vanishing.
Jul 2026 Dense financial tables in Japanese and Chinese hold their structure, not just their text. Tested against real annual-report filings.
Jun 2026 Scanned PDFs now get OCR automatically, no setting to flip. A scanned page comes back as text you can copy, and columned scans like magazine pages read in the right order. It still varies by scan, but it’s on for everyone now.
Jun 2026 Footnotes are now clickable. Hover a superscript to preview the note in place, click to jump down to it, and hit the return arrow to come right back to where you were. No more leftover [^1] markers cluttering the text.
Jun 2026 arXiv-style two-column papers now convert in reading order: left column first, then right, not interleaved.
Jun 2026 Structure holds together in more places. Tables of contents come out as clean “title, then page” lines instead of a row of dots, and bullet lists from slide decks keep their nesting instead of flattening into one paragraph.
Jun 2026 Formula rendering improved, so more complex notation comes through correctly, with cleaner fallback for the cases that still can’t render.
Jun 2026 Math holds together better. Exponents and subscripts now survive, symbols like ≈ and π that used to vanish now come through, and a formula that can’t render cleanly falls back to readable text instead of broken symbols.
Jun 2026 Images with transparent backgrounds no longer come out black. They sit on white, the way they looked in the PDF.
Jun 2026 Figure captions now stay with their image, centered underneath, instead of breaking off as a stray line of text.
Jun 2026 Markdown → PDF, with a set of clean export themes. The other direction, same page.
Jun 2026 A built-in sample PDF, so you can see the quality before bringing your own file.
Jun 2026 More reliable exports that wait for fonts to load, so text no longer comes out blank.
Jun 2026 Sample files no longer hang on slow connections (download progress + a timeout).
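
To make the page-break stitching from the list above concrete, here is a minimal sketch of the idea. It is an assumption about how such a step could work, not the converter's real implementation: `Row`, `stitchTables`, and the rule "drop the continuation's first row if it matches the header" are all hypothetical.

```typescript
// A table is modeled as rows of cell text; the first row is treated as the header.
type Row = string[];

// Append a continuation table to the one before it,
// dropping the continuation's first row when it repeats the header.
function stitchTables(first: Row[], continuation: Row[]): Row[] {
  const header = first[0];
  if (header === undefined) return continuation.slice();

  const head = continuation[0];
  const repeatsHeader =
    head !== undefined &&
    head.length === header.length &&
    head.every((cell, i) => cell.trim() === (header[i] ?? "").trim());

  return [...first, ...(repeatsHeader ? continuation.slice(1) : continuation)];
}

// Page 1 ends mid-table; page 2 repeats the header before the remaining rows.
const pageOne: Row[] = [["Item", "Amount"], ["Rent", "1,200"]];
const pageTwo: Row[] = [["Item", "Amount"], ["Utilities", "180"]];
const stitched = stitchTables(pageOne, pageTwo);
console.log(stitched.length); // prints 3: one header plus two data rows
```

Deciding that two tables belong together in the first place (matching column counts, position at the bottom and top of consecutive pages) is the harder part and is left out here.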

This page is only honest if you help keep it that way. If something broke, felt clunky, or you wish it did one more thing, that’s exactly what I want to hear.
