NAME


urlgrabber - a high-level cross-protocol url-grabber.

SYNOPSIS


urlgrabber [OPTIONS] URL [FILE]

DESCRIPTION


urlgrabber is a binary program and python module for fetching files. It is designed to be
used in programs that need common (but not necessarily simple) url-fetching features.

OPTIONS


--help, -h
help page specifying available options to the binary program.

--copy-local
ignored except for file:// urls, in which case it specifies whether urlgrab should
still make a copy of the file, or simply point to the existing copy.

--throttle=NUMBER
if it's an int, it's the bytes/second throttle limit. If it's a float, it is first
multiplied by bandwidth. If throttle == 0, throttling is disabled. If None, the
module-level default (which can be set with set_throttle) is used.

--bandwidth=NUMBER
the nominal max bandwidth in bytes/second. If throttle is a float and bandwidth == 0,
throttling is disabled. If None, the module-level default (which can be set with
set_bandwidth) is used.

--range=RANGE
a tuple of the form first_byte,last_byte describing a byte range to retrieve. Either
or both of the values may be specified. If first_byte is None, byte offset 0 is
assumed. If last_byte is None, the last byte available is assumed. Note that both
first and last_byte values are inclusive so a range of (10,11) would return the 10th
and 11th bytes of the resource.

--user-agent=STR
the user-agent string to provide if the URL is HTTP.

--retry=NUMBER
the number of times to retry the grab before bailing. If this is zero, it will retry
forever. This was intentional... really, it was :). If this value is not supplied, or
is supplied but is None, retrying does not occur.

--retrycodes
a sequence of errorcodes (values of e.errno) for which it should retry. See the doc on
URLGrabError for more details on this. retrycodes defaults to -1,2,4,5,6,7 if not
specified explicitly.
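
The numeric rules for --throttle, --bandwidth and --range can be sketched in plain Python.
This is a minimal illustration, not urlgrabber's own code: the helper names
effective_throttle and range_to_header are hypothetical, and the sketch assumes any None
values have already been replaced by the module-level defaults.

```python
def effective_throttle(throttle, bandwidth):
    """Return the bytes/second limit implied by the rules above.

    Returns None when throttling is disabled. Hypothetical helper,
    written only to illustrate the documented rules.
    """
    if isinstance(throttle, float):
        # A float throttle is a fraction of the nominal bandwidth.
        limit = throttle * bandwidth
    else:
        # An int throttle is already a bytes/second limit.
        limit = throttle
    # A result of zero (throttle == 0, or a float with bandwidth == 0)
    # means throttling is disabled.
    return limit if limit > 0 else None


def range_to_header(first_byte=None, last_byte=None):
    """Format an inclusive byte range as an HTTP Range header value.

    Hypothetical helper: a missing first_byte means offset 0, and a
    missing last_byte means "up to the last byte available".
    """
    start = 0 if first_byte is None else first_byte
    end = "" if last_byte is None else last_byte
    return f"bytes={start}-{end}"
```

Because both ends of the range are inclusive, range_to_header(10, 11) asks for two bytes,
and effective_throttle(0.5, 100000) limits the transfer to half the nominal bandwidth.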

MODULE USE EXAMPLES


In its simplest form, urlgrabber can be a replacement for urllib2's urlopen, or even
python's file if you're just reading:

from urlgrabber import urlopen
fo = urlopen(url)
data = fo.read()
fo.close()

Here, the url can be http, https, ftp, or file. It's also pretty smart so if you just give
it something like /tmp/foo, it will figure it out. For even more fun, you can also do:

from urlgrabber import urlgrab, urlread
local_filename = urlgrab(url) # grab a local copy of the file
data = urlread(url) # just read the data into a string

Now, like urllib2, what's really happening here is that you're using a module-level object
(called a grabber) that kind of serves as a default. That's just fine, but you might want
to get your own private version for a couple of reasons:

* it's a little ugly to modify the default grabber because you have to
reach into the module to do it
* you could run into conflicts if different parts of the code
modify the default grabber and therefore expect different
behavior

Therefore, you're probably better off making your own. This also gives you lots of
flexibility for later, as you'll see:

from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)

This is nice because you can specify options when you create the grabber. For example,
let's turn on simple reget mode so that if we have part of a file, we only need to fetch
the rest:

from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)
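
The idea behind "only fetch the rest" can be sketched with the standard library alone. This
is an illustration of the concept rather than urlgrabber's implementation: simple_reget and
the read_from callable are hypothetical names, and read_from(offset) stands in for whatever
transfer returns the resource's bytes starting at a given offset.

```python
import os


def simple_reget(read_from, local_path):
    """Append the missing tail of a resource to local_path.

    read_from(offset) is a hypothetical callable returning the
    resource's bytes from `offset` to the end. Returns the offset
    the transfer resumed from.
    """
    # Resume from however many bytes are already on disk.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    # Append mode keeps the existing partial content and creates the
    # file if it does not exist yet.
    with open(local_path, "ab") as out:
        out.write(read_from(offset))
    return offset
```

With a partial file that already holds the first 6 bytes of an 11-byte resource, the sketch
requests only the remaining 5 bytes and appends them.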

The available options are listed in the module documentation, and can usually be specified
as a default at the grabber-level or as options to the method:

from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url, filename=None, reget=None)
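
One way to picture the layering of module-level defaults, grabber-level defaults and
per-call options is a chain of dictionaries. The sketch below is a toy model under that
assumption, not urlgrabber's actual option handling: ToyGrabber, MODULE_DEFAULTS and
effective_options are hypothetical names.

```python
from collections import ChainMap

# Hypothetical module-level defaults, shared by every grabber.
MODULE_DEFAULTS = {"reget": None, "retry": None}


class ToyGrabber:
    """Toy model of a grabber that layers its options over the defaults."""

    def __init__(self, **options):
        # Grabber-level options shadow the module-level defaults.
        self.options = ChainMap(options, MODULE_DEFAULTS)

    def effective_options(self, **overrides):
        # Per-call options shadow the grabber-level ones for this call
        # only; neither the grabber nor the module defaults are modified.
        return dict(self.options.new_child(overrides))
```

In this toy model, ToyGrabber(reget='simple').effective_options(reget=None) yields
reget=None for that one call, while later calls without an override still see 'simple'.
Whether urlgrabber treats an explicit None the same way is worth checking in the module
documentation.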

AUTHORS


Written by: Michael D. Stenner <[email protected]> Ryan Tomayko
<[email protected]>

This manual page was written by Kevin Coyner <[email protected]> for the Debian system
(but may be used by others). It borrows heavily on the documentation included in the
urlgrabber module. Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU General Public License, Version 2 or any later version
published by the Free Software Foundation.

RESOURCES


Main web site: http://linux.duke.edu/projects/urlgrabber/

04/09/2007 URLGRABBER(1)
