How PDF Encryption Works
PDF encryption is more nuanced than "password-protect this file." The PDF specification defines an entire cryptographic sub-system with cipher choices, key derivation, permission flags, and crypt filters. This guide walks through the internals so you understand exactly what a password-protected PDF is doing, and where its real strength and weakness live.
What you'll learn
How the encryption dictionary is structured in the PDF itself, why RC4 versions are weak and how AES-256 closes that gap, why the owner password is mostly theater, and how the revision number (R=2 through R=6) determines everything about practical recoverability.
The encryption dictionary: where it all starts
Every encrypted PDF contains an Encrypt dictionary in its trailer. The reader uses this dictionary to figure out how to decrypt the rest of the file. A simplified example:
<<
/Filter /Standard
/V 5
/R 6
/Length 256
/CF << /StdCF << /CFM /AESV3 /Length 32 /AuthEvent /DocOpen >> >>
/StmF /StdCF
/StrF /StdCF
/P -1340
/U <...48 bytes...>
/O <...48 bytes...>
/UE <...32 bytes...>
/OE <...32 bytes...>
/Perms <...16 bytes...>
>>
Every field matters:
- V (algorithm version): declares which version of the encryption algorithm is in use. V=1 is 40-bit RC4, V=2 is RC4 with keys longer than 40 bits (up to 128), V=4 introduces crypt filters, V=5 is AES-256.
- R (revision): matches V but tracks standard security handler revisions. R=2 for 40-bit, R=3 for 128-bit RC4, R=4 for AES-128, R=5 for first AES-256, R=6 for current AES-256.
- Length: key length in bits (40, 128, or 256).
- CF, StmF, StrF: crypt filters. StmF handles streams; StrF handles strings. Both typically reference the same filter.
- P: permission bitmask. Negative numbers are normal because the top bits are typically set.
- U, O: user and owner password hashes, used for password verification.
- UE, OE, Perms: added with the AES-256 handlers (R=5 and R=6) to encrypt the key itself and bind the permissions cryptographically.
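As a rough illustration of how a tool might read these fields, the sketch below pulls /V and /R out of a dictionary given as text and maps the pair to the summaries listed above. The `HANDLER_SUMMARY` table and the function name are hypothetical, and a real parser would walk the PDF object structure rather than use a regular expression.

```python
import re

# Hypothetical lookup table built from the V/R descriptions above.
HANDLER_SUMMARY = {
    (1, 2): "40-bit RC4",
    (2, 3): "RC4, up to 128-bit",
    (4, 4): "crypt filters (RC4 or AES-128)",
    (5, 5): "AES-256 (early, Acrobat 9)",
    (5, 6): "AES-256 (current)",
}

def summarize_encrypt_dict(text: str) -> str:
    """Extract /V and /R from an Encrypt dictionary given as plain text."""
    fields = {}
    for name in ("V", "R"):
        # Require whitespace after the name so /R does not match e.g. /Resources.
        match = re.search(rf"/{name}\s+(-?\d+)", text)
        if match is None:
            raise ValueError(f"missing /{name} entry")
        fields[name] = int(match.group(1))
    key = (fields["V"], fields["R"])
    return HANDLER_SUMMARY.get(key, f"unrecognized V={key[0]} R={key[1]}")

example = "<< /Filter /Standard /V 5 /R 6 /Length 256 /P -1340 >>"
print(summarize_encrypt_dict(example))  # AES-256 (current)
```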
The key derivation chain
The password you type is never used directly as an encryption key. It is fed through a key derivation function (KDF) along with other inputs. The KDF has evolved dramatically across revisions.
R=2 (40-bit RC4)
Password is padded to 32 bytes with a fixed padding string, concatenated with the owner password hash, the permission flags, and the document's file ID. The whole blob is hashed with MD5 exactly once. The first 5 bytes of the MD5 output become the 40-bit RC4 key.
This is catastrophically weak. The key space is 2^40 regardless of password complexity. Any attacker with a few hours of GPU time can enumerate every possible key. That is why 40-bit PDFs are guaranteed-recoverable.
R=3 (128-bit RC4) and R=4 (AES-128)
Same starting recipe but the MD5 is iterated 50 times. That slows down password guessing by a factor of 50 (still fast on GPUs). Output is truncated to 128 bits. AES-128 replaces RC4 as the cipher but the KDF is nearly identical.
At this tier the key length is secure (128 bits is beyond brute force), so attacks shift to guessing the password. Dictionary attacks with rule mangling succeed on most human-chosen passwords here.
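The R=2 through R=4 recipes described above can be sketched with Python's standard library. This is a simplified reading of the key derivation, not a reference implementation: the function name and parameters are illustrative, and it leaves out the extra step that R=4 applies when metadata is left unencrypted.

```python
import hashlib

# The fixed 32-byte padding string defined by the PDF specification.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)

def derive_key(password: bytes, o_entry: bytes, p_flags: int,
               file_id: bytes, revision: int, key_bits: int = 40) -> bytes:
    """Derive the document key for R=2, R=3, or R=4 (sketch)."""
    padded = (password + PAD)[:32]            # pad or truncate to 32 bytes
    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(o_entry)                        # the stored /O value
    md5.update((p_flags & 0xFFFFFFFF).to_bytes(4, "little"))  # /P, low byte first
    md5.update(file_id)                        # first element of /ID
    digest = md5.digest()
    n = 5 if revision == 2 else key_bits // 8  # key length in bytes
    if revision >= 3:
        # R=3 and later re-hash the truncated digest 50 more times.
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]

# Placeholder inputs; a real file supplies /O, /P, and /ID.
key_r2 = derive_key(b"secret", bytes(32), -1340, bytes(16), revision=2)
key_r3 = derive_key(b"secret", bytes(32), -1340, bytes(16), revision=3, key_bits=128)
print(len(key_r2), len(key_r3))  # 5 16
```

An empty password pads to the full `PAD` string, which is why a file with only an owner password can be opened by any library that knows this constant.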
R=5 (early AES-256, Acrobat 9)
Adobe's first AES-256 implementation. Password is run through SHA-256 with a per-document salt but only once. The single SHA-256 is fast enough that this revision is significantly weaker than its successor.
R=5 was quickly replaced because researchers showed password guessing was roughly as fast as on R=4 despite the stronger cipher. The cipher strength doesn't matter when the key derivation is cheap.
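To show how cheap the R=5 check is, here is a minimal sketch of user password verification. It assumes the 48-byte /U value is laid out as a 32-byte hash followed by an 8-byte validation salt and an 8-byte key salt, which is my reading of the Adobe extension; the helper names are hypothetical.

```python
import hashlib
import hmac

def make_u_entry_r5(password: bytes, validation_salt: bytes, key_salt: bytes) -> bytes:
    """Build a 48-byte /U value: SHA-256(password + salt), then both salts."""
    digest = hashlib.sha256(password[:127] + validation_salt).digest()
    return digest + validation_salt + key_salt

def check_user_password_r5(password: bytes, u_entry: bytes) -> bool:
    """One SHA-256 call per guess: this is the whole verification cost."""
    stored_hash = u_entry[:32]
    validation_salt = u_entry[32:40]
    guess = hashlib.sha256(password[:127] + validation_salt).digest()
    return hmac.compare_digest(guess, stored_hash)

# Fixed placeholder salts for the example; real writers use random bytes.
u_entry = make_u_entry_r5(b"hunter2", b"\x01" * 8, b"\x02" * 8)
print(check_user_password_r5(b"hunter2", u_entry))  # True
print(check_user_password_r5(b"wrong", u_entry))    # False
```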
R=6 (current AES-256, PDF 2.0)
The current standard. The password is normalized with SASLprep (which handles Unicode), then run through an iterative construction based on SHA-256, SHA-384, and SHA-512. The algorithm runs at least 64 rounds, then continues for a data-dependent number of extra rounds decided by the intermediate hash output. Each round also performs an AES encryption, which makes the chain harder to accelerate on GPUs than a simple hash chain.
The net effect is that every password guess costs roughly 10,000 times what it costs on R=2 through R=4. That single change is what makes modern PDFs hard to crack.
Stream and string encryption
Once the encryption key is derived, the PDF reader must apply it to every encrypted object. In PDF, encrypted content is split between streams (large binary blobs like page content, images, embedded files) and strings (short text values like metadata).
The crypt filter system (introduced in V=4) lets a PDF specify different encryption for each category. In practice, StmF and StrF almost always point at the same filter, but the mechanism allows, for example, encrypted streams with unencrypted metadata, useful for certain signed-document workflows.
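For RC4-based revisions and for AES-128 (R=4), each object also gets a per-object key: the document key is mixed with the object number and generation number (plus a fixed four-byte constant in the AES case), hashed with MD5, and truncated. This prevents two identical plaintext strings in different parts of the file from producing the same ciphertext. The AES-256 revisions (R=5 and R=6) skip this mixing and use the document key directly. Every AES-encrypted stream or string, at either key size, also carries its own random IV.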
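For the RC4 revisions, the per-object mixing can be sketched as follows. The function name is illustrative; the byte layout (three low-order bytes of the object number, two of the generation number) follows my reading of the specification.

```python
import hashlib

def rc4_object_key(doc_key: bytes, obj_num: int, gen_num: int) -> bytes:
    """Derive the per-object RC4 key from the document key (sketch)."""
    data = (
        doc_key
        + (obj_num & 0xFFFFFF).to_bytes(3, "little")  # object number, low byte first
        + (gen_num & 0xFFFF).to_bytes(2, "little")    # generation number
    )
    # The result is truncated to key length + 5 bytes, capped at 16.
    return hashlib.md5(data).digest()[: min(len(doc_key) + 5, 16)]

doc_key = bytes(range(5))                 # placeholder 40-bit document key
key_obj_7 = rc4_object_key(doc_key, 7, 0)
key_obj_8 = rc4_object_key(doc_key, 8, 0)
print(len(key_obj_7), key_obj_7 != key_obj_8)  # 10 True
```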
User password vs owner password
Here is the surprise that trips up most users: both passwords derive the same encryption key. The difference is only in the verification step.
- User password: entered to open the document. The reader verifies it against the /U entry, then derives the key and decrypts.
- Owner password: gives administrative control. The reader verifies it against the /O entry, then either derives the same key directly (older revisions) or uses an escrow mechanism (R=6) to recover the key that would normally require the user password.
Because both paths produce the same key, a PDF encrypted with only an owner password (no user password) is effectively using a well-known default for /U. Any compliant PDF library can compute that default, decrypt the file, and then choose to honor or ignore the permission flags.
This is why restrictions (print, copy, edit) are so easy to strip: the flags live in /P and /Perms, but once the document is decrypted, they have no cryptographic teeth. Every non-Adobe viewer that chooses to ignore them can do so. For more on this, see owner vs open password.
Why permission flags are not real security
The /P field is a 32-bit bitmask declaring which operations the document "allows." The bits cover:
| Bit | Meaning |
|---|---|
| 3 | Print |
| 4 | Modify contents |
| 5 | Copy text and graphics |
| 6 | Add or modify annotations |
| 9 | Fill form fields |
| 10 | Extract for accessibility |
| 11 | Assemble (insert, rotate, delete pages) |
| 12 | High-quality print |
In R=2 through R=4, these bits are only weakly bound to the encryption key (via their inclusion in the /U hash). In R=5 and R=6, they are bound more tightly through the /Perms entry, which is an AES-encrypted blob containing the permission bits. However, the binding only means that a compliant reader can verify the flags haven't been tampered with; it does not mean a non-compliant reader cannot ignore them once the document is decrypted.
The practical takeaway: permission passwords are a courtesy request to well-behaved software, not a security boundary.
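Decoding the bitmask takes only a few lines, which is part of why the flags are so easy to inspect or ignore. The sketch below uses the bit names from the table above (bit 3 is Print in the PDF specification) and counts bits from 1 at the low-order end, as the specification does. The function name is hypothetical.

```python
PERMISSION_BITS = {
    3: "Print",
    4: "Modify contents",
    5: "Copy text and graphics",
    6: "Add or modify annotations",
    9: "Fill form fields",
    10: "Extract for accessibility",
    11: "Assemble (insert, rotate, delete pages)",
    12: "High-quality print",
}

def decode_permissions(p: int) -> list[str]:
    """Return the names of the operations that /P marks as allowed."""
    unsigned = p & 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return [
        name
        for bit, name in PERMISSION_BITS.items()
        if unsigned & (1 << (bit - 1))  # bit 1 is the lowest-order bit
    ]

# The /P value from the example dictionary earlier in this guide.
print(decode_permissions(-1340))
# ['Print', 'Extract for accessibility', 'High-quality print']
```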
What actually attacks PDF encryption
Understanding the KDF tells you what attacks work:
Key-space brute force (R=2 only)
On 40-bit RC4, the entire 2^40 key space is reachable. Attacks ignore the password and enumerate keys directly. A single modern GPU covers it in hours.
Dictionary + rule mangling (R=3, R=4, R=5)
Load a large wordlist (RockYou, CrackStation, HashesOrg). Apply transformation rules (capitalize, append digits, substitute characters). Hash each candidate through the KDF. Most human-chosen passwords fall to this approach in minutes.
Mask attacks
If you know structural hints (starts with capital, 8 characters, ends with digits), a mask attack enumerates only candidates matching the pattern. This collapses the search space by orders of magnitude.
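The sketch below shows the idea with a simplified one-letter-per-position mask syntax (`u` upper, `l` lower, `d` digit). This notation is invented for the example and is not the syntax of any particular cracker.

```python
import itertools
import math
import string
from typing import Iterator

MASK_CLASSES = {
    "u": string.ascii_uppercase,
    "l": string.ascii_lowercase,
    "d": string.digits,
}

def mask_size(mask: str) -> int:
    """Number of candidates a mask will generate."""
    return math.prod(len(MASK_CLASSES[c]) for c in mask)

def mask_candidates(mask: str) -> Iterator[str]:
    """Lazily enumerate every string that matches the mask."""
    pools = [MASK_CLASSES[c] for c in mask]
    for combo in itertools.product(*pools):
        yield "".join(combo)

# 8 characters: one capital, five lowercase letters, two digits.
hinted = mask_size("ullllldd")
# The same length over the full 62-character alphabet, for comparison.
unhinted = 62 ** 8
print(hinted, unhinted // hinted)
print(next(mask_candidates("uld")))  # Aa0
```

With these hints the space shrinks from 62^8 to 26^6 × 100 candidates, a reduction of several thousand times.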
Generic brute force (R=6)
Straight alphabet enumeration. Works on R=6 only for passwords up to about 8 characters mixed case plus digits. Above that, even a cluster of high-end GPUs cannot cover the space in a commercially reasonable time.
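The arithmetic behind that ceiling is easy to reproduce. In the sketch below the guess rate is a placeholder, not a benchmark; substitute a measured figure for your own hardware and revision.

```python
def exhaustive_days(alphabet_size: int, max_length: int,
                    guesses_per_second: float) -> float:
    """Days needed to try every password up to max_length characters."""
    total = sum(alphabet_size ** n for n in range(1, max_length + 1))
    return total / guesses_per_second / 86_400

ASSUMED_RATE = 1_000_000  # guesses per second; placeholder value only

# Mixed case plus digits gives a 62-character alphabet.
for length in range(6, 11):
    days = exhaustive_days(62, length, ASSUMED_RATE)
    print(f"up to {length} chars: {days:,.1f} days")
```

Each extra character multiplies the work by 62, so whatever rate you plug in, the feasible length moves by only one or two characters.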
For realistic outcomes broken down by encryption revision, see PDF password recovery success rates.
Unicode and the SASLprep gotcha
PDF 1.7 and earlier encoded passwords as PDFDocEncoding, which effectively limited them to Latin characters. PDF 2.0 (R=6) switched to UTF-8 after applying SASLprep normalization. SASLprep maps Unicode characters into a canonical form, so that, for example, a composed "é" and a decomposed "e + combining acute" produce the same bytes.
This matters in password recovery: attempting a candidate without SASLprep on an R=6 PDF will fail even if the literal character sequence is correct. Most modern crackers apply SASLprep automatically, but older tools do not, which is one reason they fail on modern files.
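The composed versus decomposed case can be demonstrated with Python's `unicodedata` module. This sketch applies only the NFKC normalization step that SASLprep includes; the full profile (RFC 4013) also maps and prohibits certain characters, which is omitted here.

```python
import unicodedata

def normalize_password(password: str) -> bytes:
    """Apply NFKC normalization, then encode as UTF-8 (partial SASLprep)."""
    return unicodedata.normalize("NFKC", password).encode("utf-8")

composed = "caf\u00e9"     # single code point U+00E9
decomposed = "cafe\u0301"  # "e" followed by combining acute accent

print(composed.encode("utf-8") == decomposed.encode("utf-8"))            # False
print(normalize_password(composed) == normalize_password(decomposed))    # True
```

Without normalization the two spellings hash differently, so a cracker that skips this step can reject a password that is visually correct.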
Digital signatures and encryption interaction
A PDF can be both encrypted and digitally signed. The signature covers a ByteRange of the file's bytes, including the encryption dictionary. Decrypting the file and re-saving it with a different dictionary (even if functionally equivalent) will invalidate the signature because the signed bytes have changed.
This is why stripping encryption from a signed PDF breaks the signature. Regulatory and legal contexts where both signatures and encryption apply (some invoicing and audit use cases) typically require the encryption to stay in place and the signature to be verified with the password known.
Common misconception
"256-bit encryption" does not mean a PDF is 256 times stronger than 40-bit. It means the encryption key is 256 bits long, which rules out brute-forcing the key space directly. The practical attack surface is still the password, and password strength is what decides whether the file is recoverable in practice.
The lifetime of an opened PDF
Step by step, here's what happens when you enter the password and click Open:
1. The reader parses the trailer and locates the Encrypt dictionary.
2. It reads V, R, Length, and the crypt filter map.
3. Your typed password is SASLprep-normalized (R=6 only), then padded or hashed as the revision requires.
4. The padded password is combined with /O, /P, and the first element of the /ID array.
5. MD5 or SHA is applied, iterated the required number of times, to derive the document encryption key.
6. The reader computes the /U check value using the derived key and compares to the stored /U. If it matches, the password is correct.
7. For each encrypted stream or string, the reader uses the document key (mixed with object number for RC4 revisions, or with a stored IV for AES) to decrypt on demand.
8. Permission flags are read from /P (or /Perms in R=6) and used to configure the UI (greying out Print, for example).
Nothing after step 6 is cryptographic in the sense of blocking access. Once the key is derived, the rest is mechanical decryption and UI behavior.
Frequently asked questions
Why does Adobe still allow 40-bit RC4 in new PDFs?
For compatibility with legacy readers. Adobe warns against it in Acrobat's UI but the spec retains R=2 support so that old document workflows can still function. You should never create a 40-bit RC4 PDF today; R=6 is the only realistic choice.
Is PDF encryption FIPS-compliant?
AES-256 (R=6) uses FIPS-approved cryptographic primitives (AES in CBC mode, SHA-256/384/512). The full PDF encryption handler has not been formally FIPS 140-certified as an independent module, but the underlying primitives are. For strict FIPS environments, consult your compliance team.
Why do older recovery tools fail on modern PDFs?
They typically don't implement SASLprep, miss the extended AES-256 iterations, or run only CPU-based attacks that are impractically slow against R=6's KDF. Modern tools (hashcat, John the Ripper, and commercial services built on them) handle all revisions correctly.
Can a PDF use both RC4 and AES?
In theory yes, via the CF entry which lets each crypt filter specify its own algorithm. In practice, major writers (Acrobat, LibreOffice, Word) pick one algorithm per document. Mixed-cipher PDFs are rare and not widely tested.
How do I tell which revision my PDF uses?
Use qpdf --show-encryption filename.pdf. It prints V, R, Length, and the permission summary. Our upload analyzer does the same check locally in your browser. See PDF encryption types for the full breakdown.
Put the theory to work
If your file is R=2 or R=3, recovery is practical. See 40-bit PDF recovery and the free PDF password check. If it's R=6, read the success rates page before committing time or money.
Analyze your specific file
Upload the PDF on the home page. The analyzer reads the encryption dictionary locally in your browser and reports V, R, and realistic recovery odds, with no data leaving your device until you choose to start a recovery job.