[lug] Grep question

Matt Thompson thompsma at colorado.edu
Fri Jul 23 13:08:19 MDT 2004


On Fri, 2004-07-23 at 11:31, David Morris wrote:
> On Fri, Jul 23, 2004 at 03:46:23AM -0600, Daniel Webb wrote:
> 
> > Come to think of it, what I really want is to be able to grep inside any
> > kind of file or archive that can possibly be converted to text (for
> > example, something.pdf.gz could be gunzipped, then pdftotext used to
> > convert to text).  Does THAT exist?
> 
> No that does not exist, but it would be relatively simple to
> create a script which does that for known file types.  Just
> a list of if-then-else statements to handle various file
> types and convert them to text before doing the grep
> (piping everything through stdout/stdin when possible to
> avoid unneeded temporary files).  The 'file' program can be
> of particular use here in determining what type each input
> file really is.

If someone here is bored enough to try this, I think a newer lesspipe.sh
would get you at least halfway there.  Its core is a long elif chain,
and you can already do a "less something.tar.gz:contained_file" to see
a file inside a tarball.
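
For anyone who wants a starting point, here is a minimal sketch of that
elif idea in Python.  It is not taken from lesspipe.sh; the names
to_text and grep_any are made up for illustration, and only the gzip
layer is actually handled.  Other types would be further branches that
call whatever converter you have installed.

```python
import gzip
import re
import sys


def to_text(data: bytes) -> str:
    """Peel known wrappers off `data` and return text (hypothetical helper)."""
    # gzip streams start with the magic bytes 1f 8b; unwrap and retry,
    # so a doubly-gzipped file works too.
    if data[:2] == b"\x1f\x8b":
        return to_text(gzip.decompress(data))
    # Further branches (PDF, PostScript, tar, ...) would go here, each
    # one calling an external converter.
    # Fallback: treat whatever is left as text.
    return data.decode("utf-8", errors="replace")


def grep_any(pattern: str, path: str) -> list:
    """Return the lines of the converted file that match `pattern`."""
    with open(path, "rb") as handle:
        text = to_text(handle.read())
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


if __name__ == "__main__":
    # Usage: python grepany.py PATTERN FILE...
    for name in sys.argv[2:]:
        for line in grep_any(sys.argv[1], name):
            print(f"{name}:{line}")
```

Checking the magic bytes rather than the file extension is the same
idea as asking 'file' what the input really is, just done in-process.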

Oddly, it handles .ps.gz but not .pdf.gz.  I'm guessing that's a bug in
how pdftotext, gzip, and the "-" filename argument interact, but I'm not
bored enough to track it down.
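
One possible way around the .pdf.gz case, whatever the actual cause
turns out to be, is to skip the pipe and gunzip into a temporary file
first.  A rough Python sketch follows; it assumes pdftotext is on your
PATH, and the function names are hypothetical.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def gunzip_to_tempfile(path: str, suffix: str = ".pdf") -> str:
    """Decompress `path` into a named temp file and return its name."""
    with gzip.open(path, "rb") as src, tempfile.NamedTemporaryFile(
        suffix=suffix, delete=False
    ) as dst:
        shutil.copyfileobj(src, dst)
        return dst.name


def pdf_gz_to_text(path: str) -> str:
    """Convert a gzipped PDF to text; needs pdftotext installed."""
    tmp = gunzip_to_tempfile(path)
    try:
        # "pdftotext FILE -" writes the extracted text to stdout.
        result = subprocess.run(
            ["pdftotext", tmp, "-"], capture_output=True, check=True
        )
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        # The caller never sees the temp file, so clean it up here.
        os.remove(tmp)
```

This gives up the "no temporary files" goal for this one type, which
seems like a fair trade if the piped version does not work.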
-- 
Learning just means you were wrong and they were right. - Aram
   Matt Thompson -- http://ucsub.colorado.edu/~thompsma/
   440 UCB, Boulder, CO  80309-0440
   JILA A510, 303-492-4662