Checkpoint 10 · High Priority · 1 failure condition

Checkpoint 10: Character Mappings

All characters in the PDF must map to valid Unicode code points so that text can be accurately extracted and read by assistive technologies.

Related WCAG: 1.3.1

Every character in your PDF must be mappable to a valid Unicode code point. This ensures that screen readers can accurately announce text and that text can be properly searched, copied, and processed.

What This Means

When you read a PDF, you see characters like letters, numbers, and symbols. But inside the PDF file, each character is stored as a code. That code must map to Unicode, the universal standard that defines every character used in modern computing.
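The lookup described above can be pictured with a short Python sketch. The `to_unicode` table and the character codes are hypothetical values chosen for illustration, not taken from any real font. Real PDFs store this information in a ToUnicode CMap or derive it from the font's encoding.

```python
# Hypothetical code -> Unicode table for one font (illustrative values only).
to_unicode = {0x01: "H", 0x02: "i", 0x03: "\u2713"}  # 0x03 draws a check mark

def extract_text(codes, table):
    """Translate stored character codes to text; unmapped codes become U+FFFD."""
    return "".join(table.get(code, "\ufffd") for code in codes)

page_codes = [0x01, 0x02, 0x03, 0x04]  # 0x04 has no entry in the table
text = extract_text(page_codes, to_unicode)
print(text)
```

The last code has a shape the viewer can draw but no Unicode value, so extraction can only substitute a placeholder. That is the situation this checkpoint is meant to prevent.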

When character mappings are missing or incorrect:

  • Screen readers may speak gibberish or skip characters entirely
  • Copying text produces garbled results
  • Search functions cannot find words
  • Translation tools fail to process the text

This often happens with:

  • Symbol fonts (Wingdings, Symbol, custom icons)
  • Mathematical notation using special fonts
  • Logos and decorative text using custom fonts
  • Legacy documents created with older tools
  • PDFs from some design applications that do not embed Unicode mappings

Why It Matters

Text extraction is fundamental to how assistive technology works with PDFs:

  • Screen readers extract text to speak aloud. Without proper mappings, users hear nothing or hear random characters.
  • Braille displays convert text to braille patterns. Unmapped characters produce meaningless output.
  • Search depends on matching the query against the extracted Unicode text. Without mappings, users cannot find content.
  • Text-to-speech requires accurate character identification to pronounce words correctly.

A document that looks perfect visually may be completely inaccessible if character mappings are broken. The visual appearance and the underlying text layer become disconnected.

Common Violations

The Matterhorn Protocol defines one failure condition for this checkpoint.

10-001: Character Code Cannot Be Mapped to Unicode (Machine Testable)

What's Wrong: One or more characters in the PDF use glyph codes that cannot be translated to Unicode characters. The PDF knows which shape to draw, but not what character it represents.

Symptoms:

  • Copying text produces empty results, boxes, or random characters
  • Screen readers skip content or announce incorrect letters
  • Searching the PDF does not find words that are visually present
  • PDF accessibility checkers report "Unicode mapping missing" or similar
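The search symptom above can be checked in a rough way once you have the extracted text as a string (for example, pasted from the PDF). The sketch below is one possible approach, not a standard tool: `missing_words` and `normalize` are hypothetical helper names, and the list of expected words is something you would type in by reading the page visually.

```python
import unicodedata

def normalize(text):
    """Fold case and compatibility forms so ligatures and capitals still match."""
    return unicodedata.normalize("NFKC", text).casefold()

def missing_words(extracted_text, expected_words):
    """Return the expected words that cannot be found in the extracted text."""
    haystack = normalize(extracted_text)
    return [word for word in expected_words if normalize(word) not in haystack]

# Example: the heading extracted correctly, the body came out as private-use codes.
extracted = "Annual Report \ue001\ue002\ue003\ue004\ue005\ue006"
print(missing_words(extracted, ["Annual", "Report", "Budget"]))
```

Any word returned by `missing_words` is visible on the page but absent from the text layer, which is a strong hint that its font lacks Unicode mappings.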

Common Causes:

  1. Symbol and Icon Fonts

    • Wingdings, Webdings, Symbol font
    • Custom icon fonts (for example, Font Awesome used without Unicode mappings or alternative text)
    • Fonts whose characters are mapped to the Unicode Private Use Area without a ToUnicode CMap
  2. Decorative and Logo Fonts

    • Custom fonts for logos that do not include Unicode mappings
    • Display fonts with incomplete character tables
  3. Mathematical Fonts

    • Some equation editors use fonts without proper Unicode mappings
    • Legacy math fonts that predate Unicode math support
  4. PDF Creation Issues

    • Older PDF creation tools that did not include ToUnicode CMaps
    • Print-to-PDF drivers that lose character information
    • Some CAD and graphics applications

How to Identify:

  1. Copy and Paste Test:

    • Select text in the PDF and copy it
    • Paste into a text editor
    • If the result is garbled, empty, or shows boxes, mappings are missing
  2. Accessibility Checker:

    • Run the PDF/UA check in Adobe Acrobat or PAC
    • Look for errors about "Unicode mapping" or "character code"
  3. Font Analysis:

    • In Acrobat, go to File > Properties > Fonts
    • Review fonts with names like "Symbol" or custom names
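    • Check that each font shows "Embedded" or "Embedded Subset" and note its Encoding value. The Fonts tab does not report ToUnicode CMaps directly, so confirm suspect fonts with the copy and paste test or a PDF/UA checker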
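If you have the font names from the Fonts tab as a list, the review step can be narrowed down with a simple name filter. This is only a heuristic sketch: `fonts_to_review` is a hypothetical helper, the hint list covers just the font families named in this guide, and a matching name does not prove that mappings are missing.

```python
import re

# Font families named in this guide as frequent sources of unmapped characters.
SYMBOL_FONT_HINTS = ("wingdings", "webdings", "symbol", "fontawesome")

# Subset-embedded fonts carry a six-letter tag and a plus sign, e.g. "ABCDEF+Arial".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")

def fonts_to_review(font_names):
    """Return the font names that look like symbol or icon fonts."""
    flagged = []
    for name in font_names:
        base = SUBSET_PREFIX.sub("", name).lower().replace(" ", "")
        if any(hint in base for hint in SYMBOL_FONT_HINTS):
            flagged.append(name)
    return flagged

names = ["ABCDEF+Wingdings-Regular", "Arial", "GHIJKL+Calibri", "SymbolMT"]
print(fonts_to_review(names))
```

Fonts with custom names (logo or in-house fonts) will not match any hint, so they still need a manual look.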

How to Fix in Adobe Acrobat

Fixing character mapping issues often requires going back to the source document. However, some fixes are possible in Acrobat.

Using the Edit PDF Tool (Limited Fix)

For minor issues with embedded fonts:

  1. Go to Tools > Edit PDF
  2. Click on the problematic text
  3. Retype the text using a standard font
  4. The new text will have proper Unicode mappings

Limitation: This only works for small amounts of text and may change formatting.

Replacing Fonts with Preflight

  1. Go to Tools > Print Production > Preflight
  2. Search for "font" fixups
  3. Use the "Embed missing fonts" or "Convert to outlines" fixups
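  4. Note: Converting to outlines replaces the text with vector shapes, so it can no longer be searched, copied, or read by assistive technology unless you tag it as a figure with alternative text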

Using the Touch Up Reading Order Tool

For symbol content that should be treated as images:

  1. Go to Tools > Accessibility > Reading Order
  2. Select the problematic symbol or icon
  3. Click Figure to tag it as an image
  4. Add alternative text that describes the symbol

Adding ToUnicode CMaps Programmatically

For advanced users with programming skills:

  1. Use a PDF library (like iText, PDFBox, or pdf-lib)
  2. Access the font resources in the PDF
  3. Add ToUnicode CMap entries for unmapped characters
  4. This requires knowledge of the font structure and intended characters
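As an illustration of step 3, the sketch below generates the text of a ToUnicode CMap from a table of character codes and their intended characters. It uses only the Python standard library. `build_tounicode_cmap` is a hypothetical helper, the example code 0xFC is an assumption, and attaching the result as a stream to the font's ToUnicode entry still has to be done with the PDF library of your choice (not shown here).

```python
def build_tounicode_cmap(mapping, code_bytes=1):
    """Return ToUnicode CMap source for {character code: Unicode string}.

    code_bytes is 1 for simple fonts and 2 for two-byte composite fonts.
    """
    width = code_bytes * 2  # hex digits per character code
    entries = []
    for code, text in sorted(mapping.items()):
        dst = text.encode("utf-16-be").hex().upper()  # targets are UTF-16BE
        entries.append(f"<{code:0{width}X}> <{dst}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        f"<{'00' * code_bytes}> <{'FF' * code_bytes}>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries, so split longer tables.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines) + "\n"

# Assumed example: code 0xFC in a symbol font is meant to be a check mark.
cmap = build_tounicode_cmap({0xFC: "\u2713"})
print(cmap)
```

The mapping itself cannot be generated automatically. Someone has to decide which Unicode character each unmapped code is meant to represent, which is why step 4 matters.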

How to Fix in Microsoft Word

Prevention is far easier than remediation. Create accessible PDFs from the start.

Use Standard Fonts

  1. Prefer fonts known to work well:

    • Arial, Calibri, Times New Roman, Verdana
    • Google Fonts with good Unicode support
    • Any OpenType font with complete Unicode tables
  2. Avoid:

    • Symbol fonts for decorative bullets (use Unicode symbols instead)
    • Custom icon fonts unless you verify accessibility
    • Fonts with incomplete character sets

Replace Symbol Characters

Instead of using Wingdings or Symbol font:

  1. For bullets: Use Word's built-in bullet feature
  2. For checkmarks: Type the Unicode checkmark character (U+2713: ✓)
  3. For arrows: Use Unicode arrows (U+2192: →, U+2190: ←)
  4. For icons: Insert the symbol from Insert > Symbol using a Unicode font

Common Unicode Replacements

Symbol Need | Wrong Way     | Right Way
Checkmark   | Wingdings "ü" | Unicode ✓ (U+2713)
Arrow       | Symbol font   | Unicode → (U+2192)
Bullet      | Custom font   | Unicode bullet (U+2022)
Star        | Wingdings     | Unicode star (U+2605)
Heart       | Wingdings     | Unicode heart (U+2665)
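To confirm that a replacement character really is the Unicode symbol you intend, you can look up its code point and official name. The sketch below uses Python's standard `unicodedata` module. The `REPLACEMENTS` table simply restates the code points listed above, and `describe` is a hypothetical helper name.

```python
import unicodedata

# The Unicode replacements listed above, keyed by the symbol that is needed.
REPLACEMENTS = {
    "Checkmark": "\u2713",
    "Arrow": "\u2192",
    "Bullet": "\u2022",
    "Star": "\u2605",
    "Heart": "\u2665",
}

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for need, ch in REPLACEMENTS.items():
    print(f"{need}: {ch}  {describe(ch)}")
```

A character pasted from a symbol font usually describes itself as an ordinary letter or a private-use code point instead, which makes the difference easy to spot.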

Handling Mathematical Content

For equations and formulas:

  1. Use Word's built-in Equation Editor (Insert > Equation)
  2. The equation editor uses Unicode math characters
  3. Avoid legacy Equation Editor 3.0 (creates images)
  4. For complex math, consider MathType, which has accessibility options

PDF Export Settings

  1. Go to File > Save As and choose PDF
  2. Click Options
  3. Under "Include non-printing information," check:
    • Document structure tags for accessibility
  4. Under "PDF options," check PDF/A compliance (if applicable)
  5. These settings help ensure fonts are embedded with mappings

Testing Your Fix

Quick Copy-Paste Test

  1. Open the PDF
  2. Use Ctrl+A (or Cmd+A) to select all
  3. Copy and paste into a text editor
  4. Review for:
    • Missing text
    • Garbled characters
    • Boxes or question marks
    • Unexpected characters where symbols should be
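For long documents, the review in step 4 can be partly automated by scanning the pasted text for characters that rarely appear in correctly mapped content. The following is a minimal sketch under that assumption: `classify` and `scan` are hypothetical helpers, and the categories are a heuristic choice, not an official test.

```python
import unicodedata
from collections import Counter

def classify(ch):
    """Return a label for characters that suggest a broken mapping, else None."""
    if ch == "\ufffd":
        return "replacement character"
    if ch in "\u25a1\u25af":
        return "box"
    category = unicodedata.category(ch)
    if category == "Co":
        return "private use"
    if category == "Cn":
        return "unassigned"
    if category == "Cc" and ch not in "\t\n\r\f":
        return "control character"
    return None

def scan(text):
    """Count suspicious characters in text pasted or extracted from a PDF."""
    return Counter(label for label in map(classify, text) if label)

pasted = "Done \uf0fc\ufffd\x01 ok"
print(scan(pasted))
```

Question marks are not flagged because they also occur in normal text, so they still need a visual check. Missing text cannot be detected this way either, since the scan only sees what was actually extracted.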

Accessibility Checker Test

Adobe Acrobat:

  1. Go to Tools > Accessibility > Accessibility Check
  2. Run the full check
  3. Look for font-related errors, such as a failed "Character encoding" rule under Page Content

PAC (PDF Accessibility Checker):

  1. Open the PDF in PAC
  2. Run the PDF/UA check
  3. Review Checkpoint 10 results
  4. Check the "Fonts" section in the detailed report

Screen Reader Test

  1. Open the PDF in a screen reader
  2. Navigate through the document
  3. Listen for:
    • Skipped content
    • Garbled speech
    • "Graphic" announcements where text should be
    • Unexpected character names

Font Embedding Verification

In Adobe Acrobat:

  1. Go to File > Properties
  2. Click the Fonts tab
  3. Verify all fonts show "Embedded" or "Embedded Subset"
  4. Check for symbol fonts, which may lack Unicode mappings even when embedded

Validation Checklist

  • All text copies correctly to a text editor
  • No "Unicode mapping" errors in accessibility check
  • Screen reader announces all text content correctly
  • All fonts are embedded in the PDF
  • Symbol fonts are replaced with Unicode or images with alt text
  • Mathematical content uses proper Unicode math characters

Special Cases

Scanned Documents

OCR'd documents may have character mapping issues:

  1. Use high-quality OCR software with Unicode support
  2. Proofread OCR output for recognition errors
  3. Consider Adobe Acrobat's "Enhance Scans" feature
  4. Verify copy-paste works after OCR

Multi-language Documents

Documents with multiple scripts need fonts that support all characters:

  1. Ensure fonts include all required Unicode ranges
  2. Test copy-paste for all languages in the document
  3. Verify screen readers handle language switches
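To decide which scripts your fonts need to cover and which languages to include in the copy-paste test, it can help to list the scripts that actually occur in the text. The sketch below approximates this from Unicode character names, using only the standard library. `scripts_in` is a hypothetical helper, and taking the first word of the character name is a rough heuristic, not a real script lookup.

```python
import unicodedata

def scripts_in(text):
    """Approximate the scripts used in text from Unicode character names."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # Names start with the script, e.g. "CYRILLIC SMALL LETTER A".
                found.add(name.split()[0])
    return found

print(sorted(scripts_in("Hello Привет 你好")))
```

Each script in the result is a candidate for its own copy-paste and screen reader check.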

Legacy PDFs

Older PDFs often lack Unicode mappings:

  1. If source files exist, regenerate the PDF with modern tools
  2. Try Acrobat's font fixups in Preflight, then repeat the copy-paste test to confirm the mappings were actually repaired
  3. Consider OCR as a last resort (with careful proofreading)

This documentation is based on the Matterhorn Protocol 1.02, the definitive reference for PDF/UA validation. For the most current information, consult the PDF Association and W3C WCAG guidelines.