Tuesday, August 09, 2022

Using OCR apps as legibility tests

There are now several apps you can use to copy text from books – Microsoft Lens or Adobe Scan, for example. They are pretty good. You take a photo, and the app runs an OCR (Optical Character Recognition) routine over it to turn it into editable text.

I've noticed that, much like people, these apps struggle with less legible typefaces. So it's possible to use them for informal legibility testing – a kind of app-based strudel test, we might say.

Here's some text from a paper I'm writing, which I scanned with Microsoft Lens in three typefaces. The first is Baskerville, a typical book typeface. The second is Caslon Italic, which most of us would think a bit less legible than Baskerville. Lastly, I've included Pixelated, which recalls the early dot matrix printers of the 1970s and sits on the threshold of legibility.

Lens read Baskerville perfectly, but here's what it managed to read in the Pixelated font:

It was much happier with Caslon Italic, although it ran a lot of words together.
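If you want to put a rough number on results like these, one option is to compare each scan's output with the original text. Here is a minimal sketch in Python. It assumes you have pasted the source passage and the app's output into two strings. `ocr_similarity` is a hypothetical helper, and the score it returns (the standard library's `difflib.SequenceMatcher` ratio) is my own choice of measure, not something Lens reports:

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0-1 similarity score between the source text and the OCR output.

    Hypothetical helper for informal comparisons: 1.0 means the OCR text
    matches the source exactly (ignoring line breaks and spacing runs).
    """
    # Collapse all whitespace so differing line breaks in the scan
    # are not counted as recognition errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    # ratio() = 2 * matching characters / total characters in both strings.
    return SequenceMatcher(None, ref, out).ratio()


if __name__ == "__main__":
    source = "the quick brown fox"
    print(ocr_similarity(source, "the quick brown fox"))  # identical text
    print(ocr_similarity(source, "thequick brown fox"))   # words run together
    print(ocr_similarity(source, "### %%%"))              # unreadable scan
```

With a measure like this, run-together words of the kind Lens produced with Caslon Italic cost only a little, while a badly garbled reading pulls the score towards zero. That ordering matches the informal impression, although it is a crude character-level measure, not a validated legibility metric.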