Converting PNG Images to Editable Documents with Abbyy FineReader 16

Introduction

Converting a PNG image into an editable document is a common task. In this post from the Accessibility Guy channel, we’ll walk through using Abbyy FineReader 16 to convert a PNG file containing a table and checkbox images into a searchable PDF or Word document.

Step 1: Prepare your PNG file

For this demonstration, we’ll use a PNG file containing a table with rows, columns, and checkbox images. This file represents a common scenario where one needs to extract the data from a static image format and convert it into an editable document.

Step 2: Open the PNG file in Abbyy FineReader 16

Launch Abbyy FineReader 16 and select the OCR Editor. Open the PNG file, and the software will automatically run Optical Character Recognition (OCR) on the image.

Step 3: Analyze the table structure

Zoom in on the image panel and check the text panel. If Abbyy doesn’t recognize the table structure correctly, delete all text areas by right-clicking in the image panel. Use the table tool to draw a box around the table and click on “Analyze Table Structure” located in the bottom corner of the window. This action will analyze the table and apply rows and columns accordingly.

Step 4: Recognize the text

Click on the “Recognize” button to OCR the file. If the document is not in English, change the recognition language accordingly, and run the recognition process again. Visually spot-check the text for accuracy.

Step 5: Adjust the table content

In case Abbyy replaces certain icons or images with text, manually type the appropriate text in the corresponding cells. For instance, if the original file had checkboxes indicating whether an item met certain specifications, replace the text with the translated phrase for “meets specifications” or “does not meet specifications.”

Step 6: Export the file as a searchable PDF

Change the export settings to “Searchable PDF” and save the file. If the image quality is blurry, go to Tools > Options > PDF, and adjust the image quality settings. Disable the MRC compression if necessary to improve the text quality.

Step 7: Save the file as a Microsoft Word document

If you prefer working with a Word document, save the file in that format. Keep in mind that the alternate texts might appear in the document, depending on how the file was zoned in Abbyy. To retain the checkboxes, apply picture zones to the cells and manually insert the icons as a table.

Conclusion

Abbyy FineReader 16 makes it simple to convert a PNG file, even one with complex table structures, into a searchable PDF or editable Word document. With this step-by-step guide, you can transform static images into editable documents for further analysis and collaboration. As always, thank you for reading, and stay tuned for more tips and tricks from the Accessibility Guy channel!

Manually Tagging Lists within PDFs for Accessibility

Introduction: Why Manually Tagging Lists in PDFs Matters for Accessibility

Ensuring the accessibility of documents is essential for creating inclusive digital experiences. In this blog post, we will guide you through the process of manually tagging lists in PDFs using Adobe Acrobat Pro, creating a well-structured and accessible document.

List tag breakdown

  1. List Parent Tag <L>
  2. List Item Child Tag <LI>
  3. Label <Lbl>
  4. List Body child Tag <LBody>
  5. Contents of First list item
  6. List item content on page

Step 1: Prepare the PDF and Create a Blank List Tag

Open your PDF document in Adobe Acrobat Pro.

  1. Right-click in the Tags panel.
  2. Select “New Tag.”
  3. Create a blank List tag (capital “L”).

Step 2: Create List Item Tags and Nest Them

Manually create list item (LI) tags and nest them within the List tag. To do this, right-click the List tag, select “New Tag,” and type “LI” (capital “L” and “I”).

Step 3: Add Label and LBody Tags

For each LI tag, create Label (Lbl) and List Body (LBody) tags. Right-click each LI tag, select “New Tag,” and type “Lbl” for the Label tag and “LBody” for the List Body tag. Drag and drop these tags into their appropriate locations within the LI tags.

Step 4: Create Tags from Selections for List Items and Nested Lists

Select the appropriate tag in the Tags panel, highlight the corresponding content in the document, and use the “Create Tag from Selection” option from the Options menu in the Tags pane. Repeat this process for each list item and nested list.

Step 5: Create a Nested List Structure

For nested lists, create a new List tag structure within the LBody tag of the parent list item. Create new LI tags for each nested list item, then add Lbl and LBody tags as before.
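
The nesting described in Steps 1 through 5 can be pictured with a small Python sketch. This is only a model of the structure, not something Acrobat runs: the `Tag` class and the `list_item` and `outline` helpers are hypothetical names made up for illustration, and the item text is sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """A hypothetical stand-in for one entry in the Tags panel."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def list_item(label: str, body: str, nested: Optional[Tag] = None) -> Tag:
    """Build an <LI> holding an <Lbl> and an <LBody>; a nested <L> goes inside the <LBody>."""
    lbody = Tag("LBody", body)
    if nested is not None:
        lbody.children.append(nested)
    return Tag("LI", children=[Tag("Lbl", label), lbody])


def outline(tag: Tag, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one line per tag."""
    line = "  " * depth + f"<{tag.name}>" + (f" {tag.text}" if tag.text else "")
    lines = [line]
    for child in tag.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Sample data: a two-item list whose first item contains a nested list.
nested = Tag("L", children=[list_item("a.", "Sub-item")])
parent = Tag("L", children=[
    list_item("1.", "First item", nested),
    list_item("2.", "Second item"),
])
tree_lines = outline(parent)
print("\n".join(tree_lines))
```

The printed outline puts the nested `<L>` one level below the first item’s `<LBody>`, which is the placement Step 5 describes.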

Step 6: Review the Tag Structure and Run the Accessibility Checker

Review the tag structure to ensure all list items and nested lists are properly tagged. Run the Accessibility Checker to identify any missing content or issues.

Step 7: Fix Missing Content and Rerun the Accessibility Checker

If the Accessibility Checker identifies missing content, use the Reading Order tool or the Tags panel to correct the issue. Rerun the Accessibility Checker to confirm that the document is now accessible.

Conclusion: Enhancing Accessibility through Manually Tagging Lists in PDFs

Manually tagging lists in PDFs using Adobe Acrobat Pro ensures your documents are well-structured and accessible for all users. By following these steps, you can create an inclusive digital experience that adheres to accessibility standards. Stay tuned to the Accessibility Guy channel for more tips and tricks on enhancing document accessibility, and don’t forget to like and subscribe!

Making Your Bilingual Microsoft Word Table Accessible in PDF

Introduction

Welcome to the Accessibility Guy channel! In today’s post, we will be discussing how to convert a bilingual table created in Microsoft Word, which uses both English and Spanish, into a PDF while ensuring that it remains accessible. If you find this helpful, don’t forget to like and subscribe for more content on accessibility.

Step 1: Saving the Word File and Creating a PDF

To begin, save your Word file, which should have an accessible table with English, Spanish, and some PNG checkboxes. Next, under the Acrobat tab, select “Create PDF” and save the file. Since the table was already accessible in Microsoft Word, it should mostly transfer over to the PDF as accessible.

Step 2: Checking the Tags Panel

After converting the table to a PDF, open the tags panel on the far left side of the page to check if the table has been tagged properly. If you see a section tag and a blank p tag, you will need to make some adjustments to ensure the document is accessible.

Step 3: Making the Document Accessible

First, change the section tag to a document tag by right-clicking the section tag, selecting “Properties,” typing in the word “Document,” and clicking “Close.” This helps the PDF meet PDF/UA and WCAG accessibility requirements. Next, change the blank p tag to an artifact by right-clicking the empty container and selecting “Change Tag to Artifact.” For the artifact type, choose “Page” and click “OK.” You can then delete the p tag.

Step 4: Cleaning Up the Table Structure

Go through the table cells to ensure proper formatting, and use the table editor to adjust table headers if necessary. Remove any blank p tags by right-clicking and changing the tag to an artifact. This process will help clean up the table structure, making it more accessible.

Step 5: Running the Accessibility Checker

Once the table structure is in place, run the accessibility checker to identify any issues that still need to be addressed. In the example provided, the nested alternate text check failed. Removing the alt text from the path tag should resolve the issue.

Step 6: Fixing Missing Alt Text

You can fix missing alt text by using the accessibility checker panel. Right-click on the issue and select “Fix” to add the alt text. Ensure that your alt text is descriptive and helpful for users.

Step 7: Verifying the Spanish Text

Make sure that the Spanish text has been properly recognized. To do this, select the Spanish text and use the “Find Tag from Selection” option. Right-click the p tag and ensure the language setting is correct.

Step 8: Final Checks

Save your file and run the accessibility checker one last time to ensure that everything is in order. If any issues remain, address them accordingly. In the example provided, the title was missing and was fixed by right-clicking and selecting “Fix.”

Conclusion

In this tutorial, we went through the process of converting a bilingual table in Microsoft Word into a PDF while ensuring its accessibility. Although there may be some challenges and bugs along the way, the final result should be a fully accessible PDF document that meets PDF/UA and WCAG standards. Thank you for joining us on this journey, and don’t forget to like and subscribe for more accessibility content!

What are PDF tags?

Tags are the basis for accessibility within a PDF. Without proper tags there is no accessibility. Tag elements provide semantic information for screen readers, control the reading order, and serve other important functions. An important first step is to determine whether your PDF has tags. Review this post to find out if your document has tags.

Why do PDF tags matter?

Assistive technology will read tags and use them as a method for navigating larger documents. A tagged PDF is essential for those with visual disabilities and anyone who is using assistive technology like JAWS or NVDA.

PDF tags make it possible to identify content like headings, lists, links, tables, forms, and other important features. Not all programs can export a tagged PDF – so make sure you are using the right tools!

Sample screenshot of tags panel

Tag Relationships

Tags come in pairs, often described as a parent-child relationship. In the example below, the Figure tag is the parent tag and the image container is the child tag.

Every parent tag will have a child tag. This is useful for moving tags around in the tags panel.

The PDF Tags breakdown

If a tag is not properly categorized, it will fail accessibility checks and confuse its users. Adding tags does not change the visual appearance of the document; it provides an invisible layer of formatting within the document that works with screen readers. PDF tags also allow the content to reflow seamlessly on devices with smaller screens, like smartphones and tablets. Here is a brief explanation of what each tag represents:

<P>

The P tag is the most basic and universal tag. It is used for body text.

<H1> <H2> <H3> <H4> <H5> <H6>

These are heading tags. Most documents will have a single H1 tag, but larger documents could contain more. Modern assistive technology can recognize up to six heading levels. Always use headings in order. Think of them like an outline.

  1. The Parent Tag <H1>
  2. The child tag (container)
  3. The content the tag is referencing (content on page)
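
To make “always use headings in order” concrete, here is a minimal Python sketch that scans a flat list of tag names and reports headings that jump more than one level deeper than the heading before them. The function name and the sample tag list are hypothetical, and the rule it applies is one common reading of “in order”, not an official checker rule.

```python
from typing import List


def skipped_heading_levels(tags: List[str]) -> List[str]:
    """Report headings that jump more than one level deeper than the previous heading."""
    problems = []
    previous = 0  # no heading seen yet, so the first heading is expected to be H1
    for tag in tags:
        # Only H1 through H6 count as headings; every other tag is ignored.
        if len(tag) == 2 and tag[0] == "H" and tag[1] in "123456":
            level = int(tag[1])
            if level > previous + 1:
                if previous:
                    problems.append(f"{tag} follows H{previous}")
                else:
                    problems.append(f"{tag} appears before any H1")
            previous = level
    return problems


# Sample tag order as it might appear when reading the Tags panel top to bottom.
print(skipped_heading_levels(["H1", "P", "H2", "H4", "P", "H2", "H3"]))
```

Moving back up the outline (for example, from H3 to H2) is treated as fine; only skipping a level on the way down is reported.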

<L> <LI> <Lbl> <LBody>

List elements contain a specific structure. These tags represent the structure of accessible lists. Some accessibility guidelines require the use of Lbl and other guidelines do not.

  1. List Parent Tag <L>
  2. List Item Child Tag <LI>
  3. Label <Lbl>
  4. List Body child Tag <LBody>
  5. Contents of First list item
  6. List item content on page

<Figure>

The figure tag represents any and all images. At this time the figure tag is used for all graphics within a PDF.

  1. <Figure> is a parent tag
  2. The Image is a child tag (container)
  3. The image as content on page

<Table> <TR> <TH> <TD>

Reading plain text is an easy task for assistive technologies. A table of data presents a more complex task. Proper PDF tag structure makes this possible by identifying essential information, including the number of rows and columns, the column (or row) headers, and which heading each data entry corresponds to. The more complex a table is, the more significant the challenge to tag it correctly.

  1. Table Parent Tag <Table>
  2. Table Row Child tag <TR>
  3. Table Header Cell <TH>
  4. Table Data Cell <TD>
  5. Table on Page
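
As a rough illustration of what the table tags let assistive technology work out, the Python sketch below takes rows of `(tag, text)` cells and reports the row count, the column count, and the column header that goes with each data cell. It assumes the simplest case, where the first row consists entirely of TH cells; the `describe_table` name and the sample table are hypothetical.

```python
from typing import Dict, List, Tuple

# Each row is a list of (tag, text) cells, mirroring a <TR> with <TH>/<TD> children.
Row = List[Tuple[str, str]]


def describe_table(rows: List[Row]) -> Dict[str, object]:
    """Summarize rows, columns, and the column header paired with each data cell."""
    # Assumption: the first row holds the column headers, all tagged <TH>.
    headers = [text for tag, text in rows[0] if tag == "TH"]
    pairs = []
    for row in rows[1:]:
        for col, (tag, text) in enumerate(row):
            if tag == "TD" and col < len(headers):
                pairs.append((headers[col], text))
    return {"rows": len(rows), "columns": max(len(r) for r in rows), "pairs": pairs}


# Sample data only.
table = [
    [("TH", "Item"), ("TH", "Status")],
    [("TD", "Ramp"), ("TD", "Meets specifications")],
]
summary = describe_table(table)
print(summary)
```

Tables with row headers, merged cells, or multiple header rows need more than this simple pairing, which is part of why complex tables are harder to tag correctly.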

<Link>, Link – OBJR

Every link tag needs a Link-OBJR tag.

  1. Parent tag <P>
  2. Link Tag <Link>
  3. Link Reference Object
  4. The link Text on screen
  5. Content on page

<Reference> & <Note>

Reference and Note tags are up for interpretation but are commonly used within PDFs to “visually” break content apart.

Reading Order

An accessible PDF gives assistive technologies such as screen readers the instructions they need to read the content properly and in the correct order. The order of tags within the tag tree determines the reading order of the document. For documents without this logical structure, the best case is that assistive technologies guess at the order in which the content should be presented. In the worst case, the content cannot be read at all and becomes useless to the user.
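
One way to picture how tag order becomes reading order is a depth-first walk of the tag tree. The Python sketch below models a tree as nested `(tag, children)` tuples, where a child is either another tuple or a string of page content. That representation and the sample document are assumptions made for illustration, not how a PDF stores its tags.

```python
from typing import List


def reading_order(node) -> List[str]:
    """Collect page content depth-first, in the order the tags appear in the tree."""
    tag, children = node
    content = []
    for child in children:
        if isinstance(child, str):
            content.append(child)  # page content attached to this tag
        else:
            content.extend(reading_order(child))  # descend into a child tag
    return content


# Sample data only.
document = ("Document", [
    ("H1", ["Annual Report"]),
    ("P", ["Summary paragraph."]),
    ("L", [
        ("LI", [("Lbl", ["1."]), ("LBody", ["First finding"])]),
    ]),
])
print(reading_order(document))
```

If the P tag sat above the H1 tag in the tree, the paragraph would come out before the heading, whatever the page looks like visually.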

How do I apply tags to a document?

There are multiple methods to apply tags to a document. The most common methods are:

Advanced Tag Breakdown

The following is a detailed breakdown of the available tag structure within a PDF. It has been adapted from https://accessible-pdf.info/basics/general/overview-of-the-pdf-tags

Grouping elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| Document | Represents a complete document | | Grouping elements, Block-level structure elements |
| Part | Division of a larger document into smaller, associated parts | Document | Art, Sect, Div, BlockQuote, Caption, TOC, Index |
| Art | Parts of content which together are conclusive, i.e. an article or part of a document | Document, Part, Sect, Div, BlockQuote | Sect, Div, BlockQuote, Caption, TOC, Index |
| Sect | Grouped related content parts, for example several paragraphs, which can be combined into a group | Document, Part, Art, Sect, Div, BlockQuote | Art, Sect, Div, BlockQuote, Caption, TOC, Index |
| Div | Generic group element without semantic meaning | Document, Part, Art, Sect, Div, BlockQuote | Art, Sect, Div, BlockQuote, Caption, TOC, Index |
| BlockQuote | One or more paragraphs that originate from another author, in other words, that have been quoted | Document, Part, Art, Sect, Div | Art, Sect, Div, Caption |
| Caption | A caption to describe, for example, a picture or a table | Document, Part, Art, Sect, Div, BlockQuote, Table, L | Sect, Div, BlockQuote |
| TOC | Container for table of contents entries. Can be used either as a flat hierarchy (all contained TOCI on one level) or as a complex hierarchy (TOC within a TOCI as a subgroup). Can be contained multiple times in a document, since it can also be used for image or table directories. | Document, Part, Art, Sect, Div | TOCI |
| TOCI | Entry within a table of contents (TOC). | TOC | TOC, P, Lbl, Reference |
| Index | Container for a subject index | Document, Part, Art, Sect, Div | L |

Block-level structure elements

Paragraph elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| P | Ordinary paragraph | Document, Part, Art, Sect, Div, BlockQuote, Caption, TOCI | Inline-level structure elements |
| H1, H2, H3, H4, H5, H6 | Hierarchical headings on levels 1 to 6 | Document, Part, Art, Sect, Div, BlockQuote | Inline-level structure elements |

List elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| L | List container; groups together all list elements that belong together | Document, Part, Art, Sect, Div, BlockQuote, Index | LI, Caption |
| LI | Container of a list entry; can contain an L to create multi-level lists | L | Lbl, LBody, L |
| Lbl | Comes from the term “label” and represents the numbering or bullet character within a list. It’s not actually a block-level structure element and can also be used in other elements such as TOCI or Caption. | LI | |
| LBody | Contains the contents of a list entry | LI | Inline-level structure elements |

Table elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| Table | Table container; combines all related table elements | Document, Part, Art, Sect, Div, BlockQuote | TR, Caption, THead, TBody, TFoot |
| TR | Groups a table row | Table, THead, TBody, TFoot | TH, TD |
| TH | Table heading cell; describes the meaning either at horizontal (line) or vertical (column) level | TR | Inline-level structure elements |
| TD | Ordinary table data cells | TR | Inline-level structure elements |
| THead | A group of table rows (TR) to mark them as table header; can be used optionally | Table | TR |
| TBody | A group of table rows (TR) to mark them as table content; can be used optionally | Table | TR |
| TFoot | A group of table rows (TR) to mark them as table footer; can be used optionally | Table | TR |
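
The parent and child relationships listed for the table elements can be turned into a simple lookup and used to sanity-check a tag tree. The Python sketch below is a rough illustration: the `ALLOWED_CHILDREN` mapping reflects one reading of the rows above, the `nesting_errors` helper and the `(tag, children)` tuples are hypothetical, and it is no replacement for running the Accessibility Checker.

```python
from typing import Dict, List, Set

# Allowed children per table tag, as read from the table elements listed above.
ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "Table": {"TR", "Caption", "THead", "TBody", "TFoot"},
    "THead": {"TR"},
    "TBody": {"TR"},
    "TFoot": {"TR"},
    "TR": {"TH", "TD"},
}


def nesting_errors(node, path: str = "") -> List[str]:
    """Walk a (tag, children) tree and list children the lookup does not allow."""
    tag, children = node
    here = f"{path}/{tag}"
    errors = []
    for child in children:
        child_tag = child[0]
        # Tags missing from the lookup (for example TH or P) are not checked here.
        if tag in ALLOWED_CHILDREN and child_tag not in ALLOWED_CHILDREN[tag]:
            errors.append(f"<{child_tag}> is not expected inside {here}")
        errors.extend(nesting_errors(child, here))
    return errors


# Sample trees only.
good = ("Table", [("TR", [("TH", []), ("TH", [])]), ("TR", [("TD", []), ("TD", [])])])
bad = ("Table", [("TD", []), ("TR", [("P", [])])])
print(nesting_errors(good))
print(nesting_errors(bad))
```

The same pattern could be extended with the list or grouping rows, at the cost of a larger lookup.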

Inline-level structure elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| Span | Generic container without semantic meaning; is used, among other things, for visual markups, language changes or for adding ActualText (e.g. for ignoring hyphens) | P, H1 to H6, LBody, TD | Quote, Note |
| Quote | Used like BlockQuote for quoted content; however, Quote is used at line level | P, H1 to H6, LBody, TD | Span |
| Note | Footnote or endnote text (not the reference character in the body text). The footnote/endnote character within Note and Reference will be placed in a Lbl. | P, H1 to H6, LBody, TD | Lbl, P, Span |
| Reference | Refers to another place in the document, e.g. footnote or directory entry | P, H1 to H6, LBody, TD | Lbl |
| Code | Marking of programming language | P, H1 to H6, LBody, TD | |
| Link | Link to a web page or to a place within the document | P, H1 to H6, LBody, TD | |
| Annot | Annotations that are not a link or a widget (form field), like comments and videos. | P, H1 to H6, LBody, TD | |

Illustration graphic elements

| PDF tag | Semantic meaning | Possible and semantically meaningful parent elements | Possible and semantically meaningful child elements |
| --- | --- | --- | --- |
| Figure | Photo or graphic | Document, Part, Art, Sect, Div, BlockQuote, P, LBody, TD | |
| Formula | Mathematical formula | Document, Part, Art, Sect, Div, BlockQuote, P, H1 to H6, LBody, TD | |
| Form | Form element | Document, Part, Art, Sect, Div, P, TD | |

Scan and OCR a PDF in Adobe Acrobat Pro DC

Do you ever have an image in your PDF that you need to convert into real text with OCR? Adobe Acrobat has a feature called Scan and OCR which can do just that. This is useful when you have a scanned image or a poor-quality PDF.

Best practice

Try to avoid using images of text as this is a direct violation of accessibility guidelines. This is why the Scan and OCR feature is useful.

Text Overview of how to use the Scan and OCR feature

  1. Select the Scan and OCR tool
  2. Select Recognize Text in This File
  3. Select Recognize Text

How do I test to make sure that it worked?

The best method to ensure that your Scan and OCR worked is to try and copy and paste the text into another program. This will tell you that the OCR took place and will allow you to check for accuracy.
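
If you want a rough number for that accuracy check, you can compare the pasted text with a short passage you typed by hand from the page. The Python sketch below uses the standard library’s `difflib` for this. The `ocr_similarity` name and the sample strings are hypothetical, the pasted text still has to be copied out of the PDF manually, and the ratio is only a loose indicator, not a formal accuracy measurement.

```python
import difflib


def _collapse_whitespace(text: str) -> str:
    """Treat runs of spaces and line breaks as a single space."""
    return " ".join(text.split())


def ocr_similarity(expected: str, pasted: str) -> float:
    """Return a similarity ratio between 0 and 1 for the two passages."""
    matcher = difflib.SequenceMatcher(
        None, _collapse_whitespace(expected), _collapse_whitespace(pasted)
    )
    return matcher.ratio()


# Sample data: a hand-typed passage and the text pasted from the PDF after OCR.
typed_by_hand = "Meets specifications"
pasted_from_pdf = "Meets specificati0ns"
print(round(ocr_similarity(typed_by_hand, pasted_from_pdf), 2))
```

A ratio well below 1 on a short, clean passage suggests the OCR output needs a closer look before you rely on it.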

Are there any better OCR programs?

Adobe Acrobat’s OCR tool is not the best when it comes to accuracy. If you are using OCR for textbooks or large volumes of documents, I’d recommend checking out Abbyy FineReader. Lucky for you, I have a bunch of videos about this program too.
