[lug] OT: Scanning and OCRing

Craig Talbert craig.talbert at gmail.com
Mon Feb 26 15:21:45 MST 2007


FYI - I ended up getting a copy of Abbyy FineReader 8.0, and it works
pretty well -- though sometimes it's a little too sure of itself (e.g.
in its check process after it's done the basic OCR, for some things that
seem to be problematic for it, such as &'s, I's, 1's, -'s, etc., it keeps
its original interpretation rather than asking you if it was right).

While it's pretty good at identifying columns and the like, it works
much better if you tell it exactly which areas are text that you want
read.  A lot of times if you let it "read" your document without
directions it will miss text that is far away from other large text
blocks (e.g. page numbers in corners).

It also has some other oddities that make it difficult to add in text
that it missed and change the font properties (underlines, italics,
etc) when it got them wrong.

As for scanners, I decided not to spend the money on a new one and I'm
using my old-school Canon 2010-BJC with scanner cartridge. It takes
about a minute to scan a page, but I can get some reading done while
it works.

- Craig

On 2/17/07, Sean Reifschneider <jafo at tummy.com> wrote:
> On Wed, Feb 14, 2007 at 07:34:41PM -0700, Craig Talbert wrote:
> >afford it anyway :), but I'm wondering if there's anyone here who has
> >ever digitized on this kind of scale who can point me in the right
>
> Bell and Howell has some scanners that have a 500 page input hopper, and
> will scan both sides of the page as it passes through the device.  As I
> recall, it'll scan a page in just a few seconds, less than 10 I'm fairly
> sure.  As far as I know, it has open source drivers available for it, or at
> least when I wrote the driver for it the company that was paying me for it
> said they were going to try to get it included in SANE.  I didn't really
> follow it after I built it.
>
> The problem is that these devices are not cheap.  I got the impression they
> were in the $10k range.
>
> I've only played passingly with the different OCR programs, but I haven't
> had much luck.
>
> Sean
> --
>  When I do good, I feel good; when I do bad, I feel bad, and that is my
>  religion.  -- Abraham Lincoln
> Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
> tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
>
> _______________________________________________
> Web Page:  http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> Join us on IRC: lug.boulder.co.us port=6667 channel=#colug
>


