[lug] [LUG] Organize .pdf docs

Mark Donald malleable at gmail.com
Wed Jun 10 05:42:21 MDT 2015


Steve -

Depends a bit on the type of .pdf docs.

If the PDFs you want to search are structured-text PDFs (e.g.
generated output from LibreOffice) or image-over-text PDFs (e.g.
scanned docs that have already been OCRed), then the self index+grep
approach already recommended would work.  For that type of PDF, I
organize and store them on an Alfresco Community server VM that I
replicate and travel with.  It is my overall document management
system for much, much more than just PDFs, and it provides Lucene
search.  OTOH, there is a learning curve, as Alfresco is an
enterprise-level platform.  I use it because I have done enterprise
deployments of it and it is a champ at email integration in that
space.  OpenKM might be a lighter server-style solution.  For
something less 'server-ish' you might take a look at Mendeley or
Zotero.
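
For what it's worth, here is a minimal sketch of the index+grep idea
for PDFs that already carry a text layer.  It assumes the pdftotext
command from poppler-utils is installed; the function names
(pdf_to_text, build_index, search, index_directory) are made up for
illustration and are not from any of the tools mentioned above.  It is
a toy in-memory word index, not a substitute for a real search engine
like Lucene.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9']+")


def pdf_to_text(pdf_path):
    """Extract text with the pdftotext CLI (assumed installed from
    poppler-utils); the '-' argument sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_index(texts):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in texts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


def index_directory(root):
    """Walk a directory tree and index every .pdf found (hypothetical helper)."""
    texts = {str(p): pdf_to_text(p) for p in Path(root).rglob("*.pdf")}
    return build_index(texts)
```

Something like search(index_directory("/path/to/pdfs"), "gregory's
theorem") would then list the files containing both words.  Image-only
scans come back empty from pdftotext, so they would need OCR first.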

If you are starting with image-only PDFs and need to OCR them to
convert them into a searchable image+text format, I don't know of
anything FOSS/Linux that handles that particularly well, though some
of the other recommendations already made will accomplish that with
varying degrees of success (e.g. Evernote Premium, Google Drive).  If
anyone knows of complete implementations of high-quality FOSS OCR
applications (not just engines), I'd love to learn about them!

Cheers,

-Mark

On Mon, Jun 8, 2015 at 2:00 PM, <lug-request at lug.boulder.co.us> wrote:
>
> Message: 1
> Date: Mon, 8 Jun 2015 09:28:06 -0600
> From: "Jeffrey S. Haemer" <jeffrey.haemer at gmail.com>
> To: "Boulder (Colorado) Linux Users Group -- General Mailing List"
>         <lug at lug.boulder.co.us>
> Subject: Re: [lug] Organize .pdf Docs
> Message-ID:
>         <CAABvdFwXMezwdDqLDU6i+jm04E0O18L0kso9tjOqV_85NVwf-Q at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Steve,
>
> I save most of my stuff to Google Drive. I just searched it for "Gregory's
> Theorem," which I knew was in one of the PDF documents there, and Google
> found it without breaking a sweat. That's OS-agnostic, but might not be
> open-source enough for your needs.
>
> On Fri, Jun 5, 2015 at 10:59 AM, Will <will.sterling at gmail.com> wrote:
>
> > I save them to Evernote.  Their contents along with my notes are then
> > searchable.
> >
> > On Fri, Jun 5, 2015 at 9:42 AM, Davide Del Vento <
> > davide.del.vento at gmail.com> wrote:
> >
> >> Create your own text index and search that, and/or convert the pdfs to
> >> text and grep into them.
> >>
> >> On Fri, Jun 5, 2015 at 9:26 AM, Stephen Queen <svqueen at gmail.com> wrote:
> >>
> >>> I am constantly referring to various .pdf documents. At times I'll read
> >>> some fact that is important but fairly obscure. Several weeks, months, or even
> >>> years later I'll want to refer back to this fact. I know I've saved the
> >>> .pdf, but I won't know which one it is in a big mess of .pdf's. I know I'm
> >>> not the only person who needs to deal with this. How do others organize and
> >>> find information in their pdf repositories? I'm looking for an open source
> >>> and Linux solution.
> >>>
> >>> Steve
> >>>
> >>> _______________________________________________
> >>> Web Page:  http://lug.boulder.co.us
> >>> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
> >>> Join us on IRC: irc.hackingsociety.org port=6667 channel=#hackingsociety
> >>>
>
>
>
> --
> Jeffrey Haemer <jeffrey.haemer at gmail.com>
> 720-837-8908 [cell], http://seejeffrun.blogspot.com [blog],
> http://www.youtube.com/user/goyishekop [vlog]