From Newsgroup: alt.comp.os.windows-11
On Mon, 5/18/2026 3:14 PM, Nil wrote:
> On 18 May 2026, Andy Burns <usenet@andyburns.uk> wrote in alt.comp.os.windows-11:
>> If you don't mind using an online service, there's Google Docs,
>> but be wary if the docs are private, I think OneNote can do it but
>> have never used it.
> You're right about the privacy concern, but otherwise, Google Docs does
> a *FAR* better job at OCR than any OCR program I've ever used. I
> suppose it's an AI thing.
There is the Tesseract level of OCR, with the 0/o/O problem.
What LLM-AI potentially adds to that, is the ability to use
the grammar and syntax to aid conversion and decide which of
multiple probabilities makes sense.
And they were doing that before LLM-AI, using regular
procedural code (using syntax and grammar to buttress bad conversion).
The LLM-AI just has a larger database to work with.
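To make the 0/o/O point concrete, here is a minimal sketch of the older "procedural" approach: take the per-character guesses from the recognizer and let a word list break the tie. The candidate probabilities, the vocabulary, and the bonus factor are all invented for illustration; no particular OCR engine is claimed to work this way.

```python
from itertools import product

# Hypothetical per-character candidates, as (character, probability) pairs.
# On shape alone, the recognizer slightly prefers the digit 0 over the letter o.
CANDIDATES = [
    [("B", 0.9), ("8", 0.1)],
    [("0", 0.5), ("o", 0.4), ("O", 0.1)],
    [("0", 0.5), ("o", 0.4), ("O", 0.1)],
    [("t", 0.8), ("l", 0.2)],
]

# Hypothetical vocabulary standing in for "grammar and syntax" knowledge.
VOCABULARY = {"secure", "boot", "error"}


def best_reading(candidates, vocabulary, bonus=10.0):
    """Return the most plausible string, boosting readings that are known words."""
    best_text, best_score = "", -1.0
    for combo in product(*candidates):
        text = "".join(ch for ch, _ in combo)
        score = 1.0
        for _, prob in combo:
            score *= prob
        if text.lower() in vocabulary:
            score *= bonus  # language knowledge outweighs a small shape preference
        if score > best_score:
            best_text, best_score = text, score
    return best_text


print(best_reading(CANDIDATES, VOCABULARY))  # with a word list
print(best_reading(CANDIDATES, set()))       # shapes only
```

With the word list, the reading "Boot" wins; with an empty word list, the shape-only reading "B00t" wins. An LLM-AI does the same kind of thing, only with far more context than a flat word list.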
The most impressive demo I've had of this so far, involved
UEFI Secure Boot messages on an LCD screen. The Break/Pause key
does not work to halt the screen when a Secure Boot error appears.
Only shooting video of it offers a chance to take note of
some PCR issue.
So I shot the video. My camera was at an angle, so pitch and
yaw and so on were not perfect. The contrast ratio was not good.
The OCR of that message? Perfect. Even though no regular
(Tesseract style) conversion would have coughed up anything.
You could not follow the edge of the characters when the
image is degraded that badly. But the PCR issue, is something
the LLM-AI has seen as text before, so it can take a stab
at the message. And visually comparing the picture to the
OCR, it was perfect.
There is a claim that some of the LLM-AI can do OCR as
a native function. What's unclear, is how a person figures
out what input modes exist. The LLM-AI "needs to be a little
bit agentic" to be instructed to grab a file from the
file system, if you have an OCR task in a PNG for it.
I have seen some screenshots of LLM-AI that no longer
blow up when you ask them "what are your capabilities?" :-)
Previously, that was not a question you could ask.
If you "scan to PDF", you should be able to use mutool (mupdf)
to extract the pixmap from each page if you want to
process it as a PNG or JPG.
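A rough sketch of that step, assuming mutool is installed and on the PATH. It uses "mutool draw" to render each page to a PNG ("mutool extract" is the other route, dumping the images embedded in the PDF instead). The function names, the file names, and the 300 DPI default are my own choices for illustration.

```python
import subprocess
from pathlib import Path


def mutool_draw_command(pdf_path, out_dir, dpi=300):
    """Build the mutool command line; %d is replaced by the page number."""
    out_pattern = str(Path(out_dir) / "page-%d.png")
    return ["mutool", "draw", "-r", str(dpi), "-o", out_pattern, str(pdf_path)]


def render_pages(pdf_path, out_dir, dpi=300):
    """Render every page of the PDF to a PNG and return the files in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(mutool_draw_command(pdf_path, out_dir, dpi), check=True)
    # Sort numerically so that page-10 does not land before page-2.
    return sorted(Path(out_dir).glob("page-*.png"),
                  key=lambda p: int(p.stem.split("-")[1]))


if __name__ == "__main__":
    # "scan.pdf" is a placeholder file name.
    for png in render_pages("scan.pdf", "pages"):
        print(png)
```

Each resulting PNG can then be handed to whatever OCR you prefer.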
But as for being easy to do, the SnippingTool function
(which accepts an image file as input), at least the
interface works. The processing of the columnar
appearance of a document, is pretty basic and needs work.
You either get a "reasonable" conversion, or you end up
with words "sprayed all over the place", depending on
the white-space pattern of the doc page. It still cannot
"process a table" properly. I have artificially created
tables for it, to "help it", and it made no difference
to the botched output. So it's "not an input quality" problem.
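For what "processing a table" would have to involve, here is a toy sketch: given word boxes with coordinates, group them into rows by vertical position, then order each row left to right. This is not a claim about how SnippingTool works internally; the word positions and the tolerance value are made up.

```python
def group_into_rows(words, y_tolerance=5):
    """Group (text, x, y) word boxes into rows, each row ordered left to right."""
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        # A word joins the current row if its y is close to that row's first word.
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Hypothetical OCR output for a two-column, two-row table, slightly skewed.
WORDS = [("Name", 10, 100), ("PCR", 10, 131), ("Value", 200, 102), ("7", 200, 130)]

for row in group_into_rows(WORDS):
    print(row)
```

A converter that reads strictly top to bottom by y, with no row tolerance, would emit Name, Value, 7, PCR for this input, which is the "sprayed all over the place" effect.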
Paul