wget plus regex

 
Piotr Kirklewski
Super Advisor


Hi there
I need to know why the following command does not work; it fails with "ERROR 404: Not Found".

wget -q -O- http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/install-x86-minimal-[1-9].iso

Jesus is the King
Goran Koruga
Honored Contributor

Re: wget plus regex

Hello.

You want to use the "{..}" construct if you want the shell to generate the arguments (note that this doesn't work in all shells: bash and zsh support it, but a plain POSIX sh such as dash does not):

wget -q -O- http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/install-x86-minimal-{1..9}.iso
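To see what the shell actually hands to wget, you can expand the URLs into an array and print them before downloading anything. A minimal sketch, assuming bash; "base" and "urls" are just illustrative names:

```shell
#!/usr/bin/env bash
# Brace expansion is done by the shell (bash here), before wget runs, so
# printing the expanded words shows exactly which URLs wget would receive.
base='http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso'
urls=( "$base"/install-x86-minimal-{1..9}.iso )
# One URL per line: install-x86-minimal-1.iso ... install-x86-minimal-9.iso
printf '%s\n' "${urls[@]}"
```

This yields literally install-x86-minimal-1.iso through install-x86-minimal-9.iso, so it only helps if the files on the server are really named that way.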

Regards,
Goran
Steven Schweda
Honored Contributor

Re: wget plus regex

> wget [...]

wget --version
uname -a

> I need to know why the following command
> does not work.
> [...]

I'd need to know what you expected it to do.

Whom were you expecting to expand your
regular expression? Your shell? Wget? The
(remote) HTTP server?

If you're interested in learning what wget
tried to do, then you might try adding "-d"
to your wget command.

Did you look at what's available in:

http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/

???


> wget -q -O- [...]

Opinion:

The "-O" option in wget may be particularly
unwise if you were planning to fetch multiple
files.

What, exactly, are you trying to do?
Piotr Kirklewski
Super Advisor

Re: wget plus regex

GNU Wget 1.11.4
Linux pxe001bri 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

wget -O- http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/install-x86-minimal-20101123.iso

The 20101123 part of the URL changes frequently, and I do not want to edit it every time I need to update my PXE setup.
Piotr Kirklewski
Super Advisor

Re: wget plus regex

I'm trying to download the most current minimal installation iso for gentoo.

{1..9} does not work either.

Matti_Kurkela
Honored Contributor
Solution

Re: wget plus regex

Goran assumed you wanted to retrieve these 9 files:
install-x86-minimal-1.iso
install-x86-minimal-2.iso
install-x86-minimal-3.iso
install-x86-minimal-4.iso
install-x86-minimal-5.iso
install-x86-minimal-6.iso
install-x86-minimal-7.iso
install-x86-minimal-8.iso
install-x86-minimal-9.iso

From your reply, I see that is not correct - but that was not obvious from your original post.

Wget does not use regexps in download URLs, only shell-style wildcards. Even those are available only when wget can download a directory listing, i.e. only with the FTP protocol.

(When you browse to "http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/", what you see is an autogenerated index in HTML format. There is no standard way to programmatically identify it as such, and thus it cannot be processed as a directory listing.)

You'll also want the wildcards interpreted by wget, not the shell, so you must quote the wildcard-containing parameters.

You also forgot to include 0 in your wildcard expression: [0-9] instead of [1-9].

Because the filename format is "install-x86-minimal-YYYYMMDD.iso", you must repeat the [0-9] part the appropriate number of times. (Or you might use * instead.)
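To check a pattern before pointing wget at the server, you can test it against a sample file name locally with the shell's own "case" matching. This is a sketch under the assumption that wget's FTP globbing treats "*", "?" and "[...]" the same way the shell does; "matches" is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if file name $1 matches wildcard pattern $2.
# $2 is left unquoted in the case label so the shell treats it as a pattern.
matches() {
    case $1 in
        $2) return 0 ;;
    esac
    return 1
}

name='install-x86-minimal-20101123.iso'
for pat in 'install-x86-minimal-[1-9].iso' \
           'install-x86-minimal-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].iso' \
           'install-x86-minimal-*.iso'; do
    if matches "$name" "$pat"; then
        echo "match:    $pat"
    else
        echo "no match: $pat"
    fi
done
```

Only the eight-digit pattern and the "*" pattern match the dated name; "[1-9]" matches exactly one character.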

This will dump the latest ISO to standard output:

wget -q -O- 'ftp://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/install-x86-minimal-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].iso'

But you might want to do something like this instead:

wget -O gentoo.iso 'ftp://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/install-x86-minimal-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].iso'

This will cause the ISO to be downloaded and saved as "gentoo.iso" in the current directory.
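For the HTTP mirror, where wget can't treat the autogenerated index as a directory listing, another sketch is to fetch the index page yourself and pull the newest file name out of it. This assumes the index mentions the ISO as "install-x86-minimal-YYYYMMDD.iso"; "latest_iso_from_index" is a hypothetical helper, and "grep -o" is a GNU/BSD extension rather than POSIX:

```shell
#!/bin/sh
# Hypothetical helper: read an HTML index page on stdin and print the newest
# install-x86-minimal-YYYYMMDD.iso name it mentions (empty if none).
# Assumes that naming scheme; "grep -o" is a GNU/BSD extension, not POSIX.
latest_iso_from_index() {
    # Extract every dated ISO name, drop duplicates (href + link text),
    # and keep the last one: YYYYMMDD sorts chronologically as plain text.
    grep -o -E 'install-x86-minimal-[0-9]{8}\.iso' | sort -u | tail -n 1
}
```

Used together with wget, something like:

base=http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso
iso=$(wget -q -O- "$base/" | latest_iso_from_index)
[ -n "$iso" ] && wget -O gentoo.iso "$base/$iso"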

MK
Goran Koruga
Honored Contributor

Re: wget plus regex

Hello.

I did not want to download anything; it was the original poster who did.

I merely pointed out how he should use the {..} construct to generate URLs with the help of the shell.

But since he failed to tell us how the actual files are named, the proposed solution obviously failed.

Regards,
Goran
Steven Schweda
Honored Contributor

Re: wget plus regex

Another HTTP possibility to explore might
look something like:

wget -r -e robots=off -A .html,.iso \
http://mirrors.kernel.org/gentoo/releases/x86/autobuilds/current-iso/

That is, a recursive download on the
".../current-iso/" index page, accepting only
(the original) ".html" listing and the one
".iso" file to be found there. ("-e robots=off"
makes wget ignore the "robots.txt" file, which
doesn't want you doing things like this.)

Again, use of "-O" could wreck this whole
plan, and if anyone ever puts another ".iso"
file in there, then you'd get that, too.
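A recursive download recreates the host and path as local directories (wget's default unless you pass "-nH" or "-nd"), so you still need to find the ISO afterwards. A minimal sketch that copies it to a fixed name and refuses to guess if more than one ISO was fetched; "pick_downloaded_iso" is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: after "wget -r ..." has mirrored the index into
# ./mirrors.kernel.org/..., copy the single ISO it fetched to a fixed name.
# Refuses to guess (returns 1) unless exactly one matching ISO was found.
pick_downloaded_iso() {
    # $1 = top directory wget created, $2 = destination file name
    found=$(find "$1" -type f -name 'install-x86-minimal-*.iso')
    count=$(printf '%s\n' "$found" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected one ISO under $1, found $count" >&2
        return 1
    fi
    cp "$found" "$2"
}
```

After the wget command above, that would be run as: pick_downloaded_iso mirrors.kernel.org gentoo.iso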