[lug] Stupid WGET question

George Sexton gsexton at mhsoftware.com
Thu Feb 17 09:30:53 MST 2005


-E just tacks .html onto the end of the file name, after any query string.

-A doesn't work either.

It seems that wget has a default set of file types that it will download,
and -A can only narrow that set.

I tried adding the file extension to the mime.types file entry for
text/html, but that didn't work either.
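
One workaround, offered as an untested sketch rather than a confirmed fix,
is to skip wget's recursive accept rules and pick the links out of the page
by hand. The helper name extract_ics_links and the ".ics" match pattern are
assumptions made for illustration; if the calendar links carry a query
string or use relative paths, the pattern and the download step would need
adjusting.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print every double-quoted
# href value that contains ".ics" (case-insensitive), one per line.
extract_ics_links() {
    grep -io 'href="[^"]*"' |
        sed 's/^[Hh][Rr][Ee][Ff]="//; s/"$//' |
        grep -i '\.ics'
}

# Possible usage (needs network access; assumes the hrefs are absolute URLs):
#   wget -q -O - http://www.mhsoftware.com/caldemo/iCal.html |
#       extract_ics_links |
#       wget -i -
```

Because the filtering happens outside wget, this does not depend on how -A
behaves, at the cost of only following links one page deep.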

George Sexton
MH Software, Inc.
http://www.mhsoftware.com/
Voice: 303 438 9585
  

> -----Original Message-----
> From: lug-bounces at lug.boulder.co.us 
> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of Matt Thompson
> Sent: Thursday, February 17, 2005 8:23 AM
> To: Boulder LUG
> Subject: Re: [lug] Stupid WGET question
> 
> On Wed, 2005-02-16 at 17:11 -0700, George Sexton wrote:
> > Anyone know how to have wget retrieve non-HTML files when it traverses an
> > HTML page?
> > 
> > For example, I have an HTML page that has links to iCal files on it. I want
> > WGET to retrieve the .HTML file, and all .ICS files referenced from that
> > page.
> > 
> > Here's the URL:
> > 
> > http://www.mhsoftware.com/caldemo/iCal.html
> 
> Hmm...there might be a way to do this, but you'll probably have to
> fiddle around to get what you need.  For example, if you need to get
> the .css files for the .html page, you'd have to grab those too.
> 
> The way I might do it is:
> 
> wget --mirror -A{.ics,.html} http://...
> 
> This should go through the entire tree and grab every .ics and .html
> file, preserving directory structure.  I think.  Try it and see, I guess.
> You can use -A and -R to accept/reject file extensions if you need
> more/less.
> 
> HTH,
> Matt
> 
> -- 
> Learning just means you were wrong and they were right. - Aram
>    Matt Thompson -- http://ucsub.colorado.edu/~thompsma/
>    440 UCB, Boulder, CO  80309-0440
>    JILA A510, 303-492-4662
> 



