Checkpoint 10: Character Mappings
Every character in your PDF must be mappable to a valid Unicode code point. This ensures that screen readers can accurately announce text and that text can be properly searched, copied, and processed.
What This Means
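When you read a PDF, you see characters like letters, numbers, and symbols. But inside the PDF file, each character is stored as a code that selects a shape (glyph) from a font. That code must map to Unicode, the universal standard that defines the characters used in modern computing. The mapping comes either from the font's encoding, when it uses standard character names, or from a ToUnicode CMap, a lookup table stored with the font in the PDF.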
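A minimal sketch of that lookup, in Python: the extractor reads each character code, looks it up in a code-to-Unicode table (the job a ToUnicode CMap does), and substitutes U+FFFD, the replacement character, when there is no entry. The table, the codes, and the `extract_text` function are hypothetical and assume one-byte codes. Real extractors also handle multi-byte codes and standard encodings.

```python
# Sketch: how a text extractor turns character codes into Unicode text.
# The table below is hypothetical; in a real PDF it would come from the
# font's encoding or its ToUnicode CMap.
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, often shown as a box or "?"


def extract_text(codes: bytes, to_unicode: dict[int, str]) -> tuple[str, list[int]]:
    """Decode one-byte character codes; return the text and any unmapped codes."""
    chars: list[str] = []
    unmapped: list[int] = []
    for code in codes:  # iterating over bytes yields ints 0-255
        if code in to_unicode:
            chars.append(to_unicode[code])
        else:
            chars.append(REPLACEMENT)  # the glyph is drawn, but its meaning is lost
            unmapped.append(code)
    return "".join(chars), unmapped


if __name__ == "__main__":
    table = {0x41: "A", 0x4F: "O", 0x4B: "K"}  # hypothetical, incomplete table
    text, missing = extract_text(b"OK\xfc", table)
    print(text)     # "OK" followed by the replacement character
    print(missing)  # [252]
```

The visible page would look identical whether or not code 252 is in the table; only the extracted text reveals the gap.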
When character mappings are missing or incorrect:
- Screen readers may speak gibberish or skip characters entirely
- Copying text produces garbled results
- Search functions cannot find words
- Translation tools fail to process the text
This often happens with:
- Symbol fonts (Wingdings, Symbol, custom icons)
- Mathematical notation using special fonts
- Logos and decorative text using custom fonts
- Legacy documents created with older tools
- PDFs from some design applications that do not embed Unicode mappings
Why It Matters
Text extraction is fundamental to how assistive technology works with PDFs:
- Screen readers extract text to speak aloud. Without proper mappings, users hear nothing or hear random characters
- Braille displays convert text to braille patterns. Unmapped characters produce meaningless output
- Search depends on the text produced by these mappings. When they are broken, users cannot find content
- Text-to-speech requires accurate character identification to pronounce words correctly
A document that looks perfect visually may be completely inaccessible if character mappings are broken. The visual appearance and the underlying text layer become disconnected.
Common Violations
The Matterhorn Protocol defines one failure condition for this checkpoint.
10-001: Character Code Cannot Be Mapped to Unicode (Machine Testable)
What's Wrong: One or more characters in the PDF use character codes that cannot be translated to Unicode characters. The PDF knows which shape to draw, but not which character it represents.
Symptoms:
- Copying text produces empty results, boxes, or random characters
- Screen readers skip content or announce incorrect letters
- Searching the PDF does not find words that are visually present
- PDF accessibility checkers report "Unicode mapping missing" or similar
Common Causes:
Symbol and Icon Fonts:
- Wingdings, Webdings, Symbol font
- Custom icon fonts (Font Awesome used incorrectly)
- Fonts that place characters in the Unicode Private Use Area without a ToUnicode CMap
Decorative and Logo Fonts:
- Custom fonts for logos that do not include Unicode mappings
- Display fonts with incomplete character tables
Mathematical Fonts:
- Some equation editors use fonts without proper Unicode mappings
- Legacy math fonts that predate Unicode math support
PDF Creation Issues:
- Older PDF creation tools that did not include ToUnicode CMaps
- Print-to-PDF drivers that lose character information
- Some CAD and graphics applications
How to Identify:
Copy and Paste Test:
- Select text in the PDF and copy it
- Paste into a text editor
- If the result is garbled, empty, or shows boxes, mappings are missing
Accessibility Checker:
- Run a PDF/UA check in PAC, or the Accessibility Checker in Adobe Acrobat
- Look for errors about "Unicode mapping" or "character code" (Acrobat's checker lists this under "Character encoding")
Font Analysis:
- In Acrobat, go to File > Properties > Fonts
- Review fonts with names like "Symbol" or custom names
- Check that fonts are listed as "Embedded" or "Embedded Subset"; this tab does not show whether a ToUnicode CMap is present, so rely on a checker for that
How to Fix in Adobe Acrobat
Fixing character mapping issues often requires going back to the source document. However, some fixes are possible in Acrobat.
Using the Edit PDF Tool (Limited Fix)
For minor issues with embedded fonts:
- Go to Tools > Edit PDF
- Click on the problematic text
- Retype the text using a standard font
- The new text will have proper Unicode mappings
Limitation: This only works for small amounts of text and may change formatting.
Replacing Fonts with Preflight
- Go to Tools > Print Production > Preflight
- Search for "font" fixups
- Use the "Embed missing fonts" or "Convert to outlines" fixups
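- Note: Converting to outlines turns text into vector shapes, so it can no longer be searched, copied, or read by screen readers; tag the result as a Figure with alternative text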
Using the Touch Up Reading Order Tool
For symbol content that should be treated as images:
- Go to Tools > Accessibility > Reading Order
- Select the problematic symbol or icon
- Click Figure to tag it as an image
- Add alternative text that describes the symbol
Adding ToUnicode CMaps Programmatically
For advanced users with programming skills:
- Use a PDF library (like iText, PDFBox, or pdf-lib)
- Access the font resources in the PDF
- Add ToUnicode CMap entries for unmapped characters
- This requires knowledge of the font structure and intended characters
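The central artifact in the steps above is the ToUnicode CMap itself, a small PostScript-style text stream. The sketch below builds one from a Python dict of character code to Unicode text, using the `bfchar` syntax. It does not read or modify a PDF: attaching the stream to the font's `/ToUnicode` entry is left to whichever PDF library you use, and `build_to_unicode_cmap` and its `code_bytes` parameter are hypothetical names. Deciding which Unicode character each glyph represents still has to be done by a person looking at the glyphs.

```python
def build_to_unicode_cmap(mapping: dict[int, str], code_bytes: int = 1) -> str:
    """Return ToUnicode CMap text mapping character codes to Unicode strings.

    mapping: character code -> Unicode text (usually a single character).
    code_bytes: 1 for simple fonts, 2 for two-byte codes such as Identity-H
    Type0 fonts (an assumption: check the font's actual encoding first).
    """
    width = code_bytes * 2  # hex digits per source code
    max_code = (1 << (8 * code_bytes)) - 1
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        f"<{0:0{width}X}> <{max_code:0{width}X}>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # The CMap format allows at most 100 entries per bfchar block.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            if not 0 <= code <= max_code:
                raise ValueError(f"code {code} does not fit in {code_bytes} byte(s)")
            # Destinations are UTF-16BE hex, so non-BMP characters become surrogate pairs.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:0{width}X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical font: code 0xFC draws a check mark, code 0x41 draws "A".
    print(build_to_unicode_cmap({0xFC: "\u2713", 0x41: "A"}))
```

This sketch writes one `bfchar` line per code for clarity; the format also offers `bfrange` entries, which are more compact when consecutive codes map to consecutive code points.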
How to Fix in Microsoft Word
Prevention is far easier than remediation. Create accessible PDFs from the start.
Use Standard Fonts
Prefer fonts known to work well:
- Arial, Calibri, Times New Roman, Verdana
- Google Fonts with good Unicode support
- Any OpenType font with complete Unicode tables
Avoid:
- Symbol fonts for decorative bullets (use Unicode symbols instead)
- Custom icon fonts unless you verify accessibility
- Fonts with incomplete character sets
Replace Symbol Characters
Instead of using Wingdings or Symbol font:
- For bullets: Use Word's built-in bullet feature
- For checkmarks: Type the Unicode checkmark character (U+2713: ✓)
- For arrows: Use Unicode arrows (U+2192: →, U+2190: ←)
- For icons: Insert the symbol from Insert > Symbol using a Unicode font
Common Unicode Replacements
| Symbol Need | Wrong Way | Right Way |
|---|---|---|
| Checkmark | Wingdings "ü" (character code 252) | Unicode ✓ (U+2713) |
| Arrow | Symbol font | Unicode → (U+2192) |
| Bullet | Custom font | Unicode • (U+2022) |
| Star | Wingdings | Unicode ★ (U+2605) |
| Heart | Wingdings | Unicode ♥ (U+2665) |
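If you are unsure whether a character is a real Unicode character rather than a symbol-font code, Python's standard `unicodedata` module can report its code point and official name. The `describe` helper below is a small illustration, not part of any tool mentioned above. Characters in the Private Use Area (category `Co`) have no standard meaning, so it flags them instead of naming them; `\uf0fc` in the example is simply one such private-use code point.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character, or a warning for private-use code points."""
    code = f"U+{ord(ch):04X}"
    if unicodedata.category(ch) == "Co":
        # Private Use Area: meaning depends entirely on the font, not on Unicode.
        return f"{code} PRIVATE USE (no standard meaning; avoid)"
    return f"{code} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    # The replacements from the table above, plus one private-use code point.
    for ch in "\u2713\u2192\u2022\u2605\u2665\uf0fc":
        print(describe(ch))
```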
Handling Mathematical Content
For equations and formulas:
- Use Word's built-in Equation Editor (Insert > Equation)
- The equation editor uses Unicode math characters
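- Avoid the legacy Equation Editor 3.0, which inserts OLE objects that typically export as images; Microsoft has also removed it from current versions of Office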
- For complex math, consider MathType, which has accessibility options
PDF Export Settings
- Go to File > Save As and choose PDF
- Click Options
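- Under "Include non-printing information," check "Document structure tags for accessibility"
- Under "PDF options," check "PDF/A compliant" (if applicable)
- These settings help ensure fonts are embedded, since PDF/A requires it; Unicode mappings still depend on the fonts themselves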
Testing Your Fix
Quick Copy-Paste Test
- Open the PDF
- Use Ctrl+A (or Cmd+A) to select all
- Copy and paste into a text editor
- Review for:
- Missing text
- Garbled characters
- Boxes or question marks
- Unexpected characters where symbols should be
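For long documents, part of this review can be automated once the text has been pasted or extracted to a file. The sketch below counts characters that usually indicate broken mappings: U+FFFD, private-use code points, unassigned code points, and stray control characters. It is a simple heuristic, not a PDF/UA validation. It cannot catch a code that was mapped to the wrong letter, so a read-through is still needed.

```python
import unicodedata
from collections import Counter

# Line breaks and tabs are expected in pasted text; other control characters are not.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def suspicious_characters(text: str) -> Counter:
    """Count characters that often signal missing or broken Unicode mappings."""
    found: Counter = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd":  # usually means the extractor found no mapping
            found["U+FFFD replacement character"] += 1
        elif category == "Co":  # Private Use Area: no standard meaning
            found[f"U+{ord(ch):04X} private use"] += 1
        elif category == "Cn":  # unassigned in this Python's Unicode version
            found[f"U+{ord(ch):04X} unassigned"] += 1
        elif category == "Cc" and ch not in ALLOWED_CONTROLS:
            found[f"U+{ord(ch):04X} control character"] += 1
    return found
```

For example, with the pasted text saved as a hypothetical `extracted.txt`, `suspicious_characters(open("extracted.txt", encoding="utf-8").read()).most_common()` lists the problem characters, most frequent first. An empty result means none of these patterns was found.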
Accessibility Checker Test
Adobe Acrobat:
- Go to Tools > Accessibility > Accessibility Check
- Run the full check
- Look for font-related errors
PAC (PDF Accessibility Checker):
- Open the PDF in PAC
- Run the PDF/UA check
- Review Checkpoint 10 results
- Check the "Fonts" section in the detailed report
Screen Reader Test
- Open the PDF in a PDF reader such as Adobe Acrobat Reader with a screen reader running (for example NVDA, JAWS, or VoiceOver)
- Navigate through the document
- Listen for:
- Skipped content
- Garbled speech
- "Graphic" announcements where text should be
- Unexpected character names
Font Embedding Verification
In Adobe Acrobat:
- Go to File > Properties
- Click the Fonts tab
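- Verify every font is listed as "Embedded" or "Embedded Subset"
- Check for symbol fonts that may lack Unicode mappings; embedding alone does not guarantee a mapping, so confirm with an accessibility checker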
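Acrobat's "Embedded" and "Embedded Subset" labels correspond to entries in each font's dictionary, while a ToUnicode CMap is a separate entry. That is why an embedded font can still fail this checkpoint. The sketch below shows the logic on a font dictionary already parsed into plain Python dicts, with keys written without the leading slash. This representation is an assumption, since real PDF libraries expose their own object types. It treats a six-uppercase-letter prefix followed by "+" as a subset tag, and FontFile, FontFile2, or FontFile3 in the font descriptor as an embedded font program. Simple and Type0 fonts are covered; Type 3 fonts are out of scope.

```python
import re
from dataclasses import dataclass

SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")  # e.g. "ABCDEF+Wingdings-Regular"
FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")


@dataclass
class FontReport:
    name: str
    embedded: bool
    subset: bool
    has_to_unicode: bool


def inspect_font(font: dict) -> FontReport:
    """Summarize a font dictionary parsed into plain dicts (hypothetical representation)."""
    name = font.get("BaseFont", "<unnamed>")
    descriptor = font.get("FontDescriptor", {})
    if font.get("Subtype") == "Type0":
        # Composite fonts keep the descriptor on their single descendant font.
        descendants = font.get("DescendantFonts") or [{}]
        descriptor = descendants[0].get("FontDescriptor", {})
    return FontReport(
        name=name,
        embedded=any(key in descriptor for key in FONT_FILE_KEYS),
        subset=bool(SUBSET_TAG.match(name)),
        has_to_unicode="ToUnicode" in font,
    )


if __name__ == "__main__":
    # Hypothetical embedded, subset symbol font with no ToUnicode entry.
    font = {
        "Subtype": "TrueType",
        "BaseFont": "ABCDEF+Wingdings-Regular",
        "FontDescriptor": {"FontFile2": b"..."},
    }
    print(inspect_font(font))
    # FontReport(name='ABCDEF+Wingdings-Regular', embedded=True, subset=True, has_to_unicode=False)
```

A missing ToUnicode entry is a reason to look closer rather than an automatic failure: a simple font whose encoding uses standard character names can still be mapped without one.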
Validation Checklist
- All text copies correctly to a text editor
- No "Unicode mapping" errors in accessibility check
- Screen reader announces all text content correctly
- All fonts are embedded in the PDF
- Symbol fonts are replaced with Unicode or images with alt text
- Mathematical content uses proper Unicode math characters
Special Cases
Scanned Documents
OCR'd documents may have character mapping issues:
- Use high-quality OCR software with Unicode support
- Proofread OCR output for recognition errors
- Consider Adobe Acrobat's "Enhance Scans" feature (called "Scan & OCR" in newer versions)
- Verify copy-paste works after OCR
Multi-language Documents
Documents with multiple scripts need fonts that support all characters:
- Ensure fonts include all required Unicode ranges
- Test copy-paste for all languages in the document
- Verify screen readers handle language switches
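Font coverage can be checked before export. The function below is a minimal sketch that compares the document text with the set of code points a font supports. How you obtain that set is an assumption. With the third-party fontTools package, for instance, `set(TTFont(path).getBestCmap())` returns the code points in a TrueType or OpenType font's cmap table. Characters the font lacks are typically drawn with a fallback font or as a missing-glyph box, and both are worth catching early.

```python
def unsupported_characters(text: str, supported: set[int]) -> list[str]:
    """Return distinct non-whitespace characters the font lacks, in order of first appearance."""
    missing: list[str] = []
    seen: set[str] = set()
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in supported:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Hypothetical font that covers printable ASCII only.
    latin_only = set(range(0x20, 0x7F))
    for ch in unsupported_characters("Hello, Привет", latin_only):
        print(f"U+{ord(ch):04X} {ch}")
```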
Legacy PDFs
Older PDFs often lack Unicode mappings:
- If source files exist, regenerate the PDF with modern tools
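- Acrobat's font tools (such as Preflight fixups) may help in some cases, but they cannot reliably recreate missing Unicode mappings; re-run the accessibility check afterward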
- Consider OCR as a last resort (with careful proofreading)
Additional Resources
Official Standards and Guidelines
- W3C WCAG 2.1 Success Criterion 1.3.1: Info and Relationships
- PDF Association Matterhorn Protocol 1.02
- ISO 14289-1: PDF/UA-1 Standard
Tools
- PAC (PDF Accessibility Checker) - Free PDF/UA validation
- veraPDF - Open source PDF validator
- FontForge - Open source font editor for examining font tables
This documentation is based on the Matterhorn Protocol 1.02, the PDF Association's reference set of failure conditions for PDF/UA validation. For the most current information, consult the PDF Association and W3C WCAG guidelines.