Checkpoint 10: Character Mappings

Every character in your PDF must be mappable to a valid Unicode code point. This ensures that screen readers can accurately announce text and that text can be properly searched, copied, and processed.

What This Means

When you read a PDF, you see characters like letters, numbers, and symbols. But inside the PDF file, each character is stored as a character code that selects a glyph (a shape) in a font. That code must also map to Unicode, the universal standard that defines every character used in modern computing.
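
As a rough illustration, the lookup can be pictured as a table from character codes to Unicode strings. The codes and the table below are invented for this sketch and do not come from a real font.

```python
# Hypothetical character codes as they might appear in a PDF text run.
# In a subset font these codes are often arbitrary and mean nothing on their own.
to_unicode = {
    0x01: "H",
    0x02: "i",
    0x03: "\u2713",  # CHECK MARK
}


def decode(codes, table):
    """Translate character codes to text; unmapped codes become U+FFFD."""
    return "".join(table.get(code, "\ufffd") for code in codes)


print(decode([0x01, 0x02, 0x03], to_unicode))  # every code has a mapping
print(decode([0x01, 0x02, 0x04], to_unicode))  # 0x04 is missing from the table
```

The second call shows what assistive technology is left with when a code has no mapping: the glyph can still be drawn, but the text layer has no character to report.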

When character mappings are missing or incorrect:

Screen readers may speak gibberish or skip characters entirely
Copying text produces garbled results
Search functions cannot find words
Translation tools fail to process the text

This often happens with:

Symbol fonts (Wingdings, Symbol, custom icons)
Mathematical notation using special fonts
Logos and decorative text using custom fonts
Legacy documents created with older tools
PDFs from some design applications that do not embed Unicode mappings

Why It Matters

Text extraction is fundamental to how assistive technology works with PDFs:

Screen readers extract text to speak aloud. Without proper mappings, users hear nothing or hear random characters.
Braille displays convert text to braille patterns. Unmapped characters produce meaningless output.
Search depends on matching the extracted Unicode text. Without mappings, users cannot find content that is visibly on the page.
Text-to-speech requires accurate character identification to pronounce words correctly.

A document that looks perfect visually may be completely inaccessible if character mappings are broken. The visual appearance and the underlying text layer become disconnected.

Common Violations

The Matterhorn Protocol defines one failure condition for this checkpoint.

10-001: Character Code Cannot Be Mapped to Unicode (Machine Testable)

What's Wrong: One or more characters in the PDF use character codes that cannot be translated to Unicode characters. The PDF knows which shape to draw, but not what character it represents.
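
A simplified sketch of this check, assuming the font's ToUnicode CMap text and the set of character codes used on the page have already been extracted by some other tool. The function names are hypothetical, and the parser assumes one bfchar or bfrange entry per line. A real validator also accepts fonts whose codes can be mapped through a standard encoding or glyph names, which this sketch ignores.

```python
import re

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFRANGE_BLOCK = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
HEX_TOKEN = re.compile(r"<([0-9A-Fa-f]+)>")


def mapped_codes(cmap_text):
    """Collect the character codes that a ToUnicode CMap assigns a value to."""
    codes = set()
    for block in BFCHAR_BLOCK.findall(cmap_text):
        tokens = HEX_TOKEN.findall(block)
        # bfchar entries are pairs: <source code> <UTF-16BE destination>
        codes.update(int(src, 16) for src in tokens[0::2])
    for block in BFRANGE_BLOCK.findall(cmap_text):
        for line in block.strip().splitlines():
            tokens = HEX_TOKEN.findall(line)
            # bfrange entries start with <first code> <last code>
            if len(tokens) >= 3:
                first, last = int(tokens[0], 16), int(tokens[1], 16)
                codes.update(range(first, last + 1))
    return codes


def unmapped_codes(used_codes, cmap_text):
    """Return the used character codes that the CMap does not cover."""
    return sorted(set(used_codes) - mapped_codes(cmap_text))


sample_cmap = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
1 beginbfrange
<0025> <0027> <0042>
endbfrange
"""

# Code 0x30 is used on the page but has no entry, so it is reported.
print(unmapped_codes([0x03, 0x24, 0x26, 0x30], sample_cmap))
```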

Symptoms:

Copying text produces empty results, boxes, or random characters
Screen readers skip content or announce incorrect letters
Searching the PDF does not find words that are visually present
PDF accessibility checkers report "Unicode mapping missing" or similar

Common Causes:

Symbol and Icon Fonts
- Wingdings, Webdings, Symbol font
- Custom icon fonts (for example, Font Awesome used incorrectly)
- Fonts whose characters are mapped to the Unicode Private Use Area without a ToUnicode CMap
Decorative and Logo Fonts
- Custom fonts for logos that do not include Unicode mappings
- Display fonts with incomplete character tables
Mathematical Fonts
- Some equation editors use fonts without proper Unicode mappings
- Legacy math fonts that predate Unicode math support
PDF Creation Issues
- Older PDF creation tools that did not include ToUnicode CMaps
- Print-to-PDF drivers that lose character information
- Some CAD and graphics applications

How to Identify:

Copy and Paste Test:
- Select text in the PDF and copy it
- Paste into a text editor
- If the result is garbled, empty, or shows boxes, mappings are missing
Accessibility Checker:
- Run the PDF/UA check in Adobe Acrobat or PAC
- Look for errors about "Unicode mapping" or "character code"
Font Analysis:
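- In Acrobat, go to File > Properties and open the Fonts tab
- Review fonts with names like "Symbol" or custom names
- Check whether each font is listed as "Embedded" or "Embedded Subset" and note its encoding. The Fonts tab does not show ToUnicode data directly, so confirm mappings with an accessibility checker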

How to Fix in Adobe Acrobat

Fixing character mapping issues often requires going back to the source document. However, some fixes are possible in Acrobat.

Using the Edit PDF Tool (Limited Fix)

For minor issues with embedded fonts:

Go to Tools > Edit PDF
Click on the problematic text
Retype the text using a standard font
The new text will have proper Unicode mappings

Limitation: This only works for small amounts of text and may change formatting.

Replacing Fonts with Preflight

Go to Tools > Print Production > Preflight
Search for "font" fixups
Use the "Embed missing fonts" or "Convert to outlines" fixups
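Note: Converting to outlines replaces the text with vector shapes, so it can no longer be searched, copied, or read by a screen reader unless it is tagged as a figure with alternative text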

Using the Touch Up Reading Order Tool

For symbol content that should be treated as images:

Go to Tools > Accessibility > Reading Order
Select the problematic symbol or icon
Click Figure to tag it as an image
Add alternative text that describes the symbol

Adding ToUnicode CMaps Programmatically

For advanced users with programming skills:

Use a PDF library (like iText, PDFBox, or pdf-lib)
Access the font resources in the PDF
Add ToUnicode CMap entries for unmapped characters
This requires knowledge of the font structure and intended characters
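
As a minimal sketch of the third step, the function below builds the text of a ToUnicode CMap from a mapping of character codes to Unicode strings. The function name and the sample mapping are hypothetical. Attaching the result to a font as a stream depends on the PDF library you choose and is not shown, and the mapping itself has to come from someone who knows which character each glyph is meant to represent.

```python
def build_to_unicode_cmap(mapping, code_bytes=2):
    """Return ToUnicode CMap text for a {character code: Unicode string} mapping."""
    width = code_bytes * 2  # hex digits per character code
    max_code = (1 << (8 * code_bytes)) - 1

    def destination_hex(text):
        # Destinations are written as UTF-16BE, so characters outside the
        # Basic Multilingual Plane become surrogate pairs.
        return text.encode("utf-16-be").hex().upper()

    entries = [
        f"<{code:0{width}X}> <{destination_hex(text)}>"
        for code, text in sorted(mapping.items())
    ]
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        f"<{0:0{width}X}> <{max_code:0{width}X}>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines) + "\n"


# Hypothetical codes for a space, the letter A, and a check mark.
print(build_to_unicode_cmap({0x0003: " ", 0x0024: "A", 0x0041: "\u2713"}))
```

After adding a CMap like this, repeat the copy-paste test to confirm that the characters come out as intended.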

How to Fix in Microsoft Word

Prevention is far easier than remediation. Create accessible PDFs from the start.

Use Standard Fonts

Prefer fonts known to work well:
- Arial, Calibri, Times New Roman, Verdana
- Google Fonts with good Unicode support
- Any OpenType font with complete Unicode tables
Avoid:
- Symbol fonts for decorative bullets (use Unicode symbols instead)
- Custom icon fonts unless you verify accessibility
- Fonts with incomplete character sets

Replace Symbol Characters

Instead of using Wingdings or Symbol font:

For bullets: Use Word's built-in bullet feature
For checkmarks: Type the Unicode checkmark character (U+2713: ✓)
For arrows: Use Unicode arrows (U+2192: →, U+2190: ←)
For icons: Insert the symbol from Insert > Symbol using a Unicode font

Common Unicode Replacements

Symbol Need	Wrong Way	Right Way
Checkmark	Wingdings "ü"	Unicode ✓ (U+2713)
Arrow	Symbol font	Unicode → (U+2192)
Bullet	Custom font	Unicode bullet (U+2022)
Star	Wingdings	Unicode star (U+2605)
Heart	Wingdings	Unicode heart (U+2665)
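
If you want to double-check the code points in the table, Python's standard unicodedata module reports the official Unicode name of each character. The dictionary keys below are informal labels chosen for this sketch.

```python
import unicodedata

# Replacement characters from the table above, keyed by informal labels.
replacements = {
    "checkmark": "\u2713",
    "arrow": "\u2192",
    "bullet": "\u2022",
    "star": "\u2605",
    "heart": "\u2665",
}

for need, char in replacements.items():
    # Print the label, the character, its code point, and its Unicode name.
    print(f"{need}: {char} U+{ord(char):04X} {unicodedata.name(char)}")
```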

Handling Mathematical Content

For equations and formulas:

Use Word's built-in Equation Editor (Insert > Equation)
The equation editor uses Unicode math characters
Avoid the legacy Equation Editor 3.0, which inserts equations as embedded objects that behave like images rather than text
For complex math, consider MathType, which has accessibility options

PDF Export Settings

Go to File > Save As and choose PDF
Click Options
Under "Include non-printing information," check:
- Document structure tags for accessibility
Under "PDF options," check (if applicable):
- The PDF/A compliance option
These settings help ensure that the PDF is tagged and that fonts are embedded with their mappings

Testing Your Fix

Quick Copy-Paste Test

Open the PDF
Use Ctrl+A (or Cmd+A) to select all
Copy and paste into a text editor
Review for:
- Missing text
- Garbled characters
- Boxes or question marks
- Unexpected characters where symbols should be
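
For long documents, part of this review can be automated. The sketch below scans pasted text for characters that often signal a broken mapping. The function name and the sample string are made up for illustration. It cannot detect valid but wrong letters, so a visual comparison against the PDF is still needed.

```python
import unicodedata


def suspicious_characters(text):
    """Return (index, code point, reason) for characters that hint at broken mappings."""
    findings = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        if char == "\ufffd":
            reason = "replacement character"
        elif category == "Co":
            reason = "private use code point"
        elif category == "Cn":
            reason = "unassigned code point"
        elif category == "Cc" and char not in "\t\n\r\f":
            reason = "control character"
        else:
            continue
        findings.append((index, f"U+{ord(char):04X}", reason))
    return findings


# Hypothetical paste result: a symbol-font glyph and one unmapped character.
pasted = "Total \uf0fc approved\ufffd"
for finding in suspicious_characters(pasted):
    print(finding)
```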

Accessibility Checker Test

Adobe Acrobat:

Go to Tools > Accessibility > Accessibility Check
Run the full check
Look for font-related errors, including the "Character encoding" rule under Page Content

PAC (PDF Accessibility Checker):

Open the PDF in PAC
Run the PDF/UA check
Review Checkpoint 10 results
Check the "Fonts" section in the detailed report

Screen Reader Test

Open the PDF in a screen reader
Navigate through the document
Listen for:
- Skipped content
- Garbled speech
- "Graphic" announcements where text should be
- Unexpected character names

Font Embedding Verification

In Adobe Acrobat:

Go to File > Properties
Click the Fonts tab
Verify that every font shows "Embedded" or "Embedded Subset"
Check for symbol fonts that may lack Unicode mappings
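
The same verification can be sketched in code. The example below works on plain Python dictionaries shaped like PDF font dictionaries, using key names from the PDF specification (/FontDescriptor, /FontFile, /FontFile2, /FontFile3, /ToUnicode, /DescendantFonts). Loading these dictionaries from a real file requires a PDF library and is not shown. The function name and the sample fonts are hypothetical.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def font_report(font):
    """Summarise embedding and ToUnicode presence for one font dictionary."""
    holder = font
    # Composite (Type0) fonts keep the descriptor on their descendant font.
    if font.get("/Subtype") == "/Type0" and font.get("/DescendantFonts"):
        holder = font["/DescendantFonts"][0]
    descriptor = holder.get("/FontDescriptor", {})
    return {
        "name": font.get("/BaseFont", "(unnamed)"),
        "embedded": any(key in descriptor for key in FONT_FILE_KEYS),
        "has_to_unicode": "/ToUnicode" in font,
    }


# Hypothetical fonts; stream contents are replaced by placeholder strings.
fonts = [
    {"/Subtype": "/TrueType", "/BaseFont": "/ABCDEF+Calibri",
     "/FontDescriptor": {"/FontFile2": "<stream>"}, "/ToUnicode": "<stream>"},
    {"/Subtype": "/TrueType", "/BaseFont": "/Wingdings-Regular",
     "/FontDescriptor": {}},
]
for font in fonts:
    print(font_report(font))
```

A font without /ToUnicode is not automatically a failure, because some fonts can be mapped through a standard encoding or glyph names. Treat the report as a list of fonts to inspect more closely.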

Validation Checklist

All text copies correctly to a text editor
No "Unicode mapping" errors in accessibility check
Screen reader announces all text content correctly
All fonts are embedded in the PDF
Symbol fonts are replaced with Unicode or images with alt text
Mathematical content uses proper Unicode math characters

Special Cases

Scanned Documents

OCR'd documents may have character mapping issues:

Use high-quality OCR software with Unicode support
Proofread OCR output for recognition errors
Consider Adobe Acrobat's "Enhance Scans" feature
Verify copy-paste works after OCR

Multi-language Documents

Documents with multiple scripts need fonts that support all characters:

Ensure fonts include all required Unicode ranges
Test copy-paste for all languages in the document
Verify screen readers handle language switches
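
A quick way to reason about font coverage is to compare the document text against the set of code points a font supports. In the sketch below the coverage set is invented for illustration. In practice it would come from a font inspection tool, which is not shown.

```python
import unicodedata


def missing_from_font(text, supported_code_points):
    """Map each character the font cannot cover to its Unicode name."""
    missing = {}
    for char in text:
        if char.isspace() or ord(char) in supported_code_points:
            continue
        missing[char] = unicodedata.name(char, "UNNAMED")
    return missing


# Hypothetical coverage: Basic Latin plus Latin-1 Supplement only.
latin_only = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

# The Cyrillic letters are reported because the coverage set lacks them.
for char, name in missing_from_font("Café Привет", latin_only).items():
    print(f"U+{ord(char):04X} {name}")
```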

Legacy PDFs

Older PDFs often lack Unicode mappings:

If source files exist, regenerate the PDF with modern tools
Acrobat's PDF Optimizer can change font embedding, but it may not restore missing Unicode mappings, so repeat the copy-paste test afterwards
Consider OCR as a last resort (with careful proofreading)

Additional Resources

Tools

PAC (PDF Accessibility Checker) - Free PDF/UA validation
veraPDF - Open source PDF validator
FontForge - Open source font editor for examining font tables

This documentation is based on the Matterhorn Protocol 1.02, the definitive reference for PDF/UA validation. For the most current information, consult the PDF Association and W3C WCAG guidelines.