[lug] wget page-requisites

Chris McDermott csmcdermott at gmail.com
Wed Jan 12 13:57:09 MST 2011


On Wed, Jan 12, 2011 at 1:05 PM, Davide Del Vento <
davide.del.vento at gmail.com> wrote:

>
> Thanks. This solves the simple single-page example, but of course life
> is always harder than simple examples. My actual wget is doing a
> --mirror of the whole domain, and adding --span-hosts messes that
> up.
> What I want is a --span-hosts that works only for the --page-requisites
> and not for the recursion. It doesn't seem like a weird request at
> all, I want the pages that I am downloading to be complete with their
> requisites (images) even if they are hosted somewhere else, but I
> don't want to recurse the whole web (as happens if I use
> --span-hosts). Any ideas?
>
> I guess I could count the deepest level of the domain I am mirroring,
> and use that as recursion level instead of the infinite that mirror
> uses. But if I get that wrong, I don't mirror the whole site. And then
> I have to continuously maintain that number, which is a pain. And
> then, even if it is not the whole internet, I am still downloading
> the world and his dog. This must be possible, right?
>
> Using curl or anything else instead of wget is an option, if they are
> more flexible than wget.
>
> Thanks,
> Dav
>

Well, your other option is to use "--domains=" to restrict recursion to just
the comma-separated list of domains specified.  Or "--exclude-domains=" if
that's easier.  But that's not a huge improvement either.  I agree it's
annoying.  For what it's worth, this is from the man page:

> Actually, to download a single page and all its requisites (even if they
> exist on separate websites), and make sure the lot displays properly
> locally, this author likes to use a few options in addition to *-p*:
>
> wget -E -H -k -K -p http://<site>/<document>
>
>
Not sure if that gets you closer to where you want to be...
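
For the mirror case, here is a rough sketch of the "--domains=" idea
combined with --span-hosts. The host names are made up (www.example.com
for the site, static.example.net and img.example.org for wherever its
images happen to live), so substitute your own. The script only prints
the command so it can be checked before running. Note that wget will
also recurse into the listed asset hosts, not just fetch requisites from
them, so this narrows the problem rather than solving it.

```shell
#!/bin/sh
# Hypothetical values: the site to mirror and the extra hosts that
# serve its images/CSS. None of these names come from the thread.
SITE="www.example.com"
ASSET_HOSTS="static.example.net,img.example.org"

# Build the wget command line as text instead of running it.
#   --mirror           recursive download with infinite depth and timestamping
#   --page-requisites  also fetch images, CSS, etc. needed to display each page
#   --span-hosts       allow wget to leave the starting host
#   --domains=LIST     ...but only to follow hosts in this comma-separated list
build_cmd() {
    printf 'wget --mirror --page-requisites --span-hosts --domains=%s,%s http://%s/\n' \
        "$SITE" "$ASSET_HOSTS" "$SITE"
}

# Print the command for review; pipe the output to sh to actually run it.
build_cmd
```

If the list of asset hosts is long or changes often, "--exclude-domains="
with the few hosts you know you do not want may be less work to maintain.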

Chris