Image-based PDFs

Scanned PDF Password Recovery, Explained Honestly

Users often assume a scanned PDF is a different animal — that its image content needs some special OCR-aware cracker. It does not. PDF encryption treats image streams, text streams, and vector graphics identically. What changes is everything after unlocking, not the unlock itself.

The one-sentence summary

Password recovery for a scanned PDF is identical to any other PDF. OCR only becomes relevant after the file is already unlocked and you want searchable text.

Why the container matters, not the content

A PDF is a container format. Its encryption wraps the object streams that make up the document: text, images, fonts, forms, annotations. When you set a password, Acrobat or the scanner software encrypts those streams with RC4 or AES based on the PDF version you chose. The cipher does not know or care whether a particular stream holds a paragraph of Helvetica or a 300 DPI TIFF of a contract page.

From the attacker's perspective, this means the work is exactly the same. We test candidate passwords against the values stored in the file's encryption dictionary: derive a key from each candidate and check it against the stored verification data. The outcome depends only on the password and the cipher, never on whether the underlying content is text or image. Our encryption internals article walks through the exact key derivation.
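
To make this concrete, here is a minimal sketch of the key derivation used by the older RC4-era security handler (revisions 2 to 4), following Algorithm 2 of the PDF 1.7 specification as we read it. The function name and parameter names are our own, not taken from any particular tool. Note what goes in: the candidate password plus a few short values from the encryption dictionary and the document ID. No page content is involved.

```python
import hashlib
import struct

# Fixed 32-byte padding string defined by the PDF specification.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)

def derive_rc4_key(password: bytes, o_entry: bytes, p_value: int,
                   doc_id: bytes, revision: int = 3, key_bytes: int = 16,
                   encrypt_metadata: bool = True) -> bytes:
    """Sketch of the RC4-era key derivation (handler revisions 2-4)."""
    if revision == 2:
        key_bytes = 5                       # revision 2 always uses a 40-bit key
    padded = (password + PAD)[:32]          # pad or truncate to 32 bytes
    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(o_entry)                     # /O from the encryption dictionary
    md5.update(struct.pack("<I", p_value & 0xFFFFFFFF))  # /P as 4 bytes, little-endian
    md5.update(doc_id)                      # first element of the trailer /ID
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()
    if revision >= 3:
        for _ in range(50):                 # 50 extra MD5 rounds for revision 3+
            digest = hashlib.md5(digest[:key_bytes]).digest()
    return digest[:key_bytes]

# Placeholder inputs for illustration only; real values come from the file.
key = derive_rc4_key(b"secret", b"\x00" * 32, -3904, b"\x01" * 16)
```

Every input here is a few dozen bytes at most, which is why a two-page memo and a 500-page scan cost the same per candidate password.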

Where the confusion comes from

Three common misconceptions drive the belief that scanned PDFs are special:

1. "I cannot search or copy text, so recovery must be different"

Searching and copying require a text layer. A scanned PDF without OCR has no text layer, only images. That has nothing to do with encryption. Unlocking the file does not add a text layer; you need an OCR pass afterward.

2. "Image files are bigger, so it takes longer to crack"

File size has no bearing on password recovery speed. The hash we attack is derived from a few short fields in the file's encryption dictionary plus the document ID, run through a fixed key derivation function. Given the same encryption settings, a 2 KB encrypted PDF and a 200 MB scanned PDF are attacked at the same rate.

3. "Scanners use special encryption"

They use the same PDF 1.4/1.6/1.7 specifications as everyone else. In fact, many office scanners default to weaker encryption for backwards compatibility, which makes recovery easier than a modern Acrobat AES-256 file.
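
One way to see which encryption a scanner actually produced is to read the security handler revision (/R) from the encryption dictionary. The sketch below is a deliberately crude byte scan, not a PDF parser: it assumes the dictionary is stored as plain text near the /Filter /Standard entry, which holds for many files but not all. The function name and the wording of the notes are our own.

```python
import re

# Rough meaning of each standard security handler revision.
REVISION_NOTES = {
    2: "40-bit RC4",
    3: "RC4, up to 128-bit",
    4: "RC4 or AES-128 (depends on the crypt filter)",
    5: "AES-256 (Adobe extension)",
    6: "AES-256 (PDF 2.0)",
}

def describe_encryption(pdf_bytes: bytes) -> str:
    """Best-effort scan of raw PDF bytes for the security handler revision."""
    handler = re.search(rb"/Filter\s*/Standard", pdf_bytes)
    if handler is None:
        return "no standard security handler found"
    # Assumption: /R sits within a few hundred bytes of /Filter,
    # inside the same dictionary.
    window = pdf_bytes[max(0, handler.start() - 512): handler.end() + 512]
    match = re.search(rb"/R\s+(\d+)", window)
    if match is None:
        return "standard security handler found, revision unknown"
    revision = int(match.group(1))
    note = REVISION_NOTES.get(revision, "unrecognized revision")
    return f"revision {revision}: {note}"
```

For real work, a proper parser is the safer choice; this only illustrates that the answer lives in a small dictionary, not in the scanned pages.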

What encryption scanners typically use

Scanner family | Default encryption | Recovery outlook
Xerox multifunction (older firmware) | 40-bit RC4 | Guaranteed recovery
Canon imageRUNNER | 128-bit RC4 (configurable) | Very likely if password is human
Ricoh / Lanier | 128-bit RC4 | Likely with dictionary + rules
HP Enterprise scan | AES-128 | Depends on password strength
Mobile apps (iOS/Android scan) | AES-128 or AES-256 | Harder; depends on entropy

Two patterns stand out. First, office-grade scanners skew toward older encryption because they need to produce files that open cleanly in every reader a client might have, including legacy readers on systems as old as Windows XP. Second, consumer mobile scanning apps are more likely to use modern AES, because they only care about recent iOS and Android PDF viewers. The 40-bit recovery page explains why older RC4 is essentially a solved problem.
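
The gap between 40-bit and 128-bit keys is easier to appreciate as arithmetic. The sketch below computes the worst-case time to try every possible key. The rate of one billion keys per second is a hypothetical round number chosen for illustration, not a benchmark of any tool or hardware.

```python
def exhaustive_search_time(key_bits: int, keys_per_second: float) -> float:
    """Worst-case seconds needed to try every key of the given size."""
    return (2 ** key_bits) / keys_per_second

ASSUMED_RATE = 1e9  # hypothetical: one billion keys per second

for bits in (40, 128):
    seconds = exhaustive_search_time(bits, ASSUMED_RATE)
    print(f"{bits}-bit key: {seconds:.3g} seconds worst case")
```

At that assumed rate, the full 40-bit key space takes roughly 18 minutes regardless of how strong the password is, while the 128-bit key space works out to something on the order of 10^22 years. That is why 128-bit files are attacked through the password rather than the key.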

The correct order: decrypt first, OCR second

Every OCR engine — Tesseract, ABBYY, Adobe's built-in, cloud services from Google or AWS — expects to read rendered page pixels. While the PDF is encrypted, those pixels are locked inside an unreadable byte stream. No OCR engine can pull text from ciphertext. The correct pipeline is:

  1. Identify the encryption version.
  2. Recover the password using an appropriate method.
  3. Decrypt the PDF to a plain unlocked copy (qpdf or equivalent).
  4. Run OCR on the unlocked copy to add a text layer.
  5. Optionally re-encrypt the result if you need to store it protected again.

Trying to shortcut steps 1 and 2 by OCR'ing a screenshot of the password prompt is, astonishingly, a suggestion that circulates online. It does not work. The password prompt is a UI element rendered by the viewer, not a representation of the document content.
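
Steps 3 and 4 of the pipeline above can be sketched as two command invocations, assuming qpdf and OCRmyPDF are installed and on the PATH. The helper names and output file naming are our own choices for illustration.

```python
import subprocess
from pathlib import Path

def build_pipeline(src: Path, password: str, workdir: Path, lang: str = "eng"):
    """Return the qpdf and ocrmypdf command lines for steps 3 and 4."""
    unlocked = workdir / f"{src.stem}.unlocked.pdf"
    searchable = workdir / f"{src.stem}.ocr.pdf"
    # Step 3: write a decrypted copy; the encrypted original is left untouched.
    decrypt_cmd = ["qpdf", f"--password={password}", "--decrypt",
                   str(src), str(unlocked)]
    # Step 4: add a text layer to the unlocked copy.
    ocr_cmd = ["ocrmypdf", "-l", lang, str(unlocked), str(searchable)]
    return decrypt_cmd, ocr_cmd

def run_pipeline(src: Path, password: str, workdir: Path) -> Path:
    """Run both steps in order and return the path of the searchable PDF."""
    decrypt_cmd, ocr_cmd = build_pipeline(src, password, workdir)
    # check=True raises on any non-zero exit; qpdf exits 3 when it succeeds
    # with warnings, so a stricter or looser policy may suit your files.
    subprocess.run(decrypt_cmd, check=True)
    subprocess.run(ocr_cmd, check=True)
    return Path(ocr_cmd[-1])
```

Keep in mind that a password passed on the command line is visible to other local users through the process list, so this form suits a single-user machine best.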

OCR options after unlocking

Tool | Platform | Best for
OCRmyPDF (Tesseract backend) | Linux, macOS, WSL | Free, scriptable, preserves layout
ABBYY FineReader | Windows, macOS | Highest accuracy, paid
Adobe Acrobat Pro OCR | Cross-platform | Integrated into existing Adobe workflow
macOS Preview "Live Text" | macOS | Quick one-off OCR, no install
Google Drive upload | Web | Zero-install, but uploads your file

For anything sensitive, OCRmyPDF on your own machine is the best default. It keeps files local, integrates cleanly with qpdf for decryption-then-OCR pipelines, and supports dozens of languages.

Signed scanned documents

A digitally signed scan adds a wrinkle. The signature validates the byte contents of the signed revision of the PDF. Any change — including decrypting and re-encrypting, or running OCR — modifies the byte stream and invalidates the signature. If the signature is legally important, preserve an untouched original and work on copies.

Keep the sealed original

Archive a copy of the encrypted signed file. This is your evidence of the original signature.

Work on an unlocked copy

Decrypt and OCR a separate file that is clearly labeled as a working copy. Never distribute it as the signed version.

Document the chain of custody

If this is for legal or compliance purposes, record every transformation: who unlocked it, when, and with what tool.

Prefer incremental signing

If you must re-sign the unlocked copy, use an incremental signature so the new signature only covers the new revision.
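
A chain-of-custody entry can be as simple as a timestamped record with a hash of each file. The sketch below is one possible shape, not a legal standard; the field names and function names are our own assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large scans do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def custody_record(original: Path, working_copy: Path,
                   operator: str, tool: str) -> dict:
    """Describe one transformation: who, when, with what, and on which bytes."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "tool": tool,
        "original": {"name": original.name, "sha256": sha256_of(original)},
        "working_copy": {"name": working_copy.name,
                         "sha256": sha256_of(working_copy)},
    }

# Example: append json.dumps(custody_record(...)) as one line to a log file.
```

The hash of the sealed original lets anyone confirm later that the archived file is byte-for-byte the one that was signed.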

Multi-page scans and large file tips

Scanned PDFs can grow to hundreds of megabytes. A few practical notes:

  • Upload size: Recovery services only need the hash extracted from the file's encryption dictionary. For very large scans, use a tool that extracts the hash locally and uploads only the small hash string.
  • Compression mode: The image codec (JPEG2000 for color or grayscale scans, JBIG2 or CCITT for black-and-white) changes the file size considerably, but this does not affect recovery time, only transfer time.
  • Password reuse: Office scanners often use the same shared password per department. If you recover one file, test that password on the rest of your batch before attacking each one separately.

The batch case ties directly into our batch removal guide, where a single known password can unlock an entire folder in seconds.
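
Testing one recovered password across a folder might look like the sketch below, which assumes qpdf is installed. The function name is ours, and the `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess
from pathlib import Path

def unlock_batch(folder: Path, password: str, out_dir: Path,
                 run=subprocess.run):
    """Try one known password on every PDF in folder.

    Returns (unlocked, failed) as two lists of file names.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    unlocked, failed = [], []
    for pdf in sorted(folder.glob("*.pdf")):
        target = out_dir / pdf.name
        result = run(
            ["qpdf", f"--password={password}", "--decrypt",
             str(pdf), str(target)],
            capture_output=True,
        )
        # qpdf exits 0 on success and 3 on success with warnings;
        # anything else (such as a wrong password) counts as a failure here.
        if result.returncode in (0, 3):
            unlocked.append(pdf.name)
        else:
            failed.append(pdf.name)
    return unlocked, failed
```

Files that were never encrypted will also land in the unlocked list, so treat the failed list as the set that still needs a separate recovery attempt.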

The restrictions vs. open password distinction still applies

If the scanned PDF opens in any viewer without a prompt and only blocks printing or copying, you do not need recovery at all. Use a browser Print-to-PDF pipeline to export a clean copy. If it will not open, that is a true recovery job.

Read next

For encryption-specific advice, see PDF encryption types. For realistic outcomes, see success rate data.