TL;DR: PDFs have limitations that make them hard for us to work with, which can create quality issues or make translation cost a little more. If you have original, editable source files, please send them to us, especially if your materials have a significant visual design aspect. If you don’t, send your PDFs to us anyway and we can get a quote going for you.

Why We Can't Translate Your PDF

PDF files are so prevalent because they preserve content in its designated format, irrespective of the operating system or software displaying the file. This matters because different browsers, applications and operating systems render fonts, colors and layouts slightly differently. A PDF functions sort of like a digital photocopy. Unfortunately, the same encoding and rendering attributes that provide this constancy and control also mean that the tools we use for translation projects can struggle (or fail) to parse text correctly, especially in design-heavy or graphically complex documents. The result can be quality issues, added cost or unpleasant surprises for the client.

The first thing to know is that there are really two kinds of PDFs - digitally created PDFs in which the text is encoded in its own layer, allowing it to be parsed and searched relatively easily, and image PDFs that are usually based on a photograph or a scanned image.

Both kinds present problems for translation projects, but in different ways:

  • Image PDFs are often not parseable at all, and must be manually re-created or analyzed through optical character recognition (OCR) tools, which can add cost or introduce errors

  • Text PDFs can be parsed through Acrobat or OCR, but images, typography, and intricate designs must be manually re-created by our staff, which duplicates careful work done by the original design team but with less control

Of the two, text PDFs that were composed digitally represent both the lion’s share of the PDFs we see and the most easily avoided problem, because they were almost always composed in-house using software that we know how to use, like Adobe InDesign and Illustrator or Microsoft Office. Lots of folks send PDFs because they are portable, easily packaged, or just out of habit, but in this case, we really need the original source file. If we have to replicate the original design specs from the PDF, we usually have to charge a fee for Desktop Publishing Services, which are billed at $65/hr. We are willing to bet your designer would prefer that we get the original too, rather than having us match fonts by eye, manually lay out images with text and reverse-engineer all the little details.

This is one of those cases in which a picture really is worth a thousand words, so here are screenshots of a paper, published with permission from Tableau, that show two pages of the original PDF side-by-side with what they look like post-OCR, in the form that our CAT tools can process.


There are lots of issues with these exports, including:

  • Poor machine parsing of text - confusion regarding word breaks, leading to poor machine readability of segments
  • Mismatched fonts
  • Mismatched text spacing
  • Color changes, both hue changes and within solid blocks of colors

Of these issues, the most time-consuming is the poor machine readability of text, because it interferes with word counts and with identifying repetitions (to learn more about translation memories and repetitions, please see our Localization 101 page). These errors require our staff to go over every piece of text in the document before translation to ensure that word counts are correct and that segments are properly identified, which is a crucial component in establishing terminological consistency.
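To make the word-count and repetition problem concrete, here is a minimal Python sketch. The sample segments, the broken word breaks and the helper names (word_count, repeated_segments) are all invented for illustration; this is not how memoQ or any other CAT tool actually counts words or matches repetitions, just a rough picture of why bad word breaks matter.

```python
from collections import Counter


def word_count(segments):
    """Count whitespace-separated tokens across all segments."""
    return sum(len(segment.split()) for segment in segments)


def repeated_segments(segments):
    """Count segments that exactly duplicate an earlier segment."""
    counts = Counter(segments)
    return sum(n - 1 for n in counts.values())


# Hypothetical text as it would come from an editable source file.
clean = [
    "Click Save to continue.",
    "Open the dashboard.",
    "Click Save to continue.",
]

# The same text after a hypothetical bad export: two words split in the wrong place.
ocr = [
    "Click Save to con tinue.",
    "Open the dash board.",
    "Click Save to continue.",
]

print(word_count(clean), repeated_segments(clean))
print(word_count(ocr), repeated_segments(ocr))
```

In this made-up sample, the broken export inflates the word count and hides the one repeated sentence, because the first and third segments no longer match exactly. That is the kind of error our staff have to find and fix by hand before translation starts.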

The visual issues in the export also require fixing, and while our support staff have serious chops, making them fix things that could have remained unbroken from the start just doesn’t make a lot of sense.

For image PDFs, clients should expect to see a small fee on a quote for document recreation, usually around $65-$130 but potentially more, depending on size and complexity. This entails one of our staff members either running OCR where possible and manually checking every word, or simply re-typing the entire source document so that we can import the strings into memoQ.
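As a rough back-of-the-envelope check, the short Python sketch below shows how a recreation fee might scale with staff time. It assumes recreation is billed hourly at the same $65/hr Desktop Publishing rate mentioned earlier, which is our simplification for illustration; the function name is hypothetical, and actual quotes depend on size and complexity.

```python
# Assumption: recreation is billed at the $65/hr Desktop Publishing rate.
DTP_RATE_PER_HOUR = 65  # USD


def recreation_fee(hours, rate=DTP_RATE_PER_HOUR):
    """Estimate a document recreation fee in USD for a given number of staff hours."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * rate


# Under this assumption, one to two hours of work gives the typical range quoted above.
print(recreation_fee(1), recreation_fee(2))
```

Under that assumption, the usual $65-$130 range corresponds to roughly one to two hours of staff time.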

So, in short, we can actually translate that PDF - but not without some hassle and/or added cost, which is almost always avoidable. If you have that source file, save yourself some time and your designer some stress and send it over, and everyone will be happier.

Want to get going on a specific project? Hit our quote page to upload files, enter parameters, and start talking specifics.

Want to learn more? Our pricing page has enough detail for you to make your own ballpark localization budget, and our Localization 101 page gives you the rundown on how projects work and what you might expect.

If you have more questions and want to chat, enter your info in the form below, or shoot a line over to sales@glyphservices.com.