NAME


pullnews - Pull news from multiple news servers and feed it to another

SYNOPSIS


pullnews [-BhnOqRx] [-a hashfeed] [-b fraction] [-c config] [-C width] [-d level] [-f
fraction] [-F fakehop] [-g groups] [-G newsgroups] [-H headers] [-k checkpt] [-l logfile]
[-m header_pats] [-M num] [-N timeout] [-p port] [-P hop_limit] [-Q level] [-r file] [-s
to-server[:port]] [-S max-run] [-t retries] [-T connect-pause] [-w num] [-z article-pause]
[-Z group-pause] [from-server ...]

REQUIREMENTS


The "Net::NNTP" module must be installed. This module is available as part of the libnet
distribution and comes with recent versions of Perl. For older versions of Perl, you can
download it from <http://www.cpan.org/>.

DESCRIPTION


pullnews reads a config file named pullnews.marks, and connects to the upstream servers
given there as a reader client. This file is looked for in pathdb when pullnews is run as
the user set in runasuser in inn.conf (which is by default the "news" user); otherwise,
this file is looked for in the running user's home directory.

By default, pullnews connects to all servers listed in the configuration file. To limit it
to specific servers, list them on the command line as whitespace-separated server names
(the from-server arguments in the synopsis). For each server it connects to, pullnews
pulls over articles and feeds them to the destination server via the IHAVE or POST
commands. This means that the system pullnews is run on must have feeding access to the
destination news server.

pullnews is designed for very small sites that do not want to bother setting up
traditional peering and is not meant for handling large feeds.

OPTIONS


-a hashfeed
This option is a deterministic way to control the flow of articles and to split a
feed. The hashfeed parameter must be in the form "value/mod" or "start-end/mod". The
Message-ID of each article is hashed using MD5, which results in a 128-bit hash. The
lowest 32 bits are then taken by default as the hashfeed value (which is an integer).
If the hashfeed value modulo "mod", plus one, equals "value" or lies between "start" and
"end" (inclusive), pullnews will feed the article. All these numbers must be integers.

For instance:

    pullnews -a 1/2       Feeds about 50% of all articles.
    pullnews -a 2/2       Feeds the other 50% of all articles.

Another example:

    pullnews -a 1-3/10    Feeds about 30% of all articles.
    pullnews -a 4-5/10    Feeds about 20% of all articles.
    pullnews -a 6-10/10   Feeds about 50% of all articles.

You can use an extended syntax of the form "value/mod:offset" or
"start-end/mod:offset" (using an underscore "_" instead of a colon ":" is also
recognized). As MD5 generates a 128-bit return value, it is possible to specify from
which byte-offset the 32-bit integer used by hashfeed starts. The default value for
"offset" is ":0" and thirteen overlapping values from ":0" to ":12" can be used. Only
up to four totally independent values exist: ":0", ":4", ":8" and ":12".

This makes it possible to generate a second level of deterministic distribution. For
example, if pullnews feeds "1/2", that half can be split further with "1-3/9:4". Up to
four levels of deterministic distribution can be used.

The algorithm is compatible with the one used by Diablo 5.1 and up.
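
The selection rule described above can be sketched in Python. This is an illustration of the text, not the code pullnews runs: the function names are hypothetical, and the sketch assumes that the MD5 digest is read as a big-endian 128-bit integer and that "offset" counts bytes up from its low-order end.

```python
import hashlib

def parse_hashfeed(spec):
    """Parse 'value/mod' or 'start-end/mod', with an optional ':offset' or '_offset'."""
    spec = spec.replace("_", ":")
    range_text, _, rest = spec.partition("/")
    mod_text, _, offset_text = rest.partition(":")
    start_text, dash, end_text = range_text.partition("-")
    start = int(start_text)
    end = int(end_text) if dash else start
    offset = int(offset_text) if offset_text else 0
    if not 0 <= offset <= 12:
        raise ValueError("offset must be between 0 and 12")
    return start, end, int(mod_text), offset

def hashfeed_value(message_id, offset=0):
    """Return a 32-bit integer taken from the MD5 hash of the Message-ID."""
    digest = hashlib.md5(message_id.encode("utf-8")).digest()
    # Assumption: big-endian digest, offset counted in bytes from the low end.
    as_int = int.from_bytes(digest, "big")
    return (as_int >> (8 * offset)) & 0xFFFFFFFF

def would_feed(message_id, spec):
    """Apply the rule: (hashfeed value modulo mod) + 1 must lie in start..end."""
    start, end, mod, offset = parse_hashfeed(spec)
    slot = hashfeed_value(message_id, offset) % mod + 1
    return start <= slot <= end
```

With this rule, every Message-ID falls in exactly one of "1/2" and "2/2", which is what lets several pullnews instances split a feed without coordinating with one another.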

-b fraction
Backtrack on server numbering reset. Specify the proportion (0.0 to 1.0) of a group's
articles to pull when the server's highest article number is lower than the high water
mark pullnews has recorded for that group, which indicates that the server has been
renumbered. When fraction is 1.0, pull all the articles on a renumbered server. The
default is to do nothing.

-B Feed is header-only, that is to say pullnews only feeds the headers of the articles,
plus one blank line. It adds the Bytes: header field if the article does not already
have one, and keeps the body only if the article is a control article.

-c config
Normally, the config file is stored in pullnews.marks in pathdb when pullnews is run
as the news user, or otherwise in the running user's home directory. If -c is given,
config will be used as the config file instead. This is useful if you're running
pullnews as a system user on an automated basis out of cron or as an individual user,
rather than the news user.

See "CONFIG FILE" below for the format of this file.

-C width
Use width characters per line for the progress table. The default value is 50.

-d level
Set the debugging level to the integer level; more debugging output will be logged as
this increases. The default value is 0.

-f fraction
Change the proportion of articles to get from each group to fraction, which should be
in the range 0.0 to 1.0 (1.0 being the default).

-F fakehop
Prepend fakehop as a host to the Path: header of articles fed.

-g groups
Specify a collection of groups to get. groups is a list of newsgroups separated by
commas (only commas, no spaces). Each group must be defined in the config file, and
only the remote hosts that carry those groups will be contacted. Note that this is a
simple list of groups, not a wildmat expression, and wildcards are not supported.

-G newsgroups
Add the comma-separated list of groups newsgroups to each server in the configuration
file (see also -g and -w).

-h Print a usage message and exit.

-H headers
Remove these named headers (colon-separated list) from fed articles.

-k checkpt
Checkpoint (save) the config file every checkpt articles. The default is 0, which means
the file is saved only at the end of the session.

-l logfile
Log progress/stats to logfile (default is "stdout").

-m header_pats
Feed an article based on header matching. The argument is a number of whitespace-
separated tuples (each tuple being a colon-separated header and regular expression).
For instance:

-m "Hdr1:regexp1 !Hdr2:regexp2 #Hdr3:regexp3 !#Hdr4:regexp4"

specifies that the article will be passed only if the "Hdr1:" header matches "regexp1"
and the "Hdr2:" header does not match "regexp2". Besides, if the "Hdr3:" header
matches "regexp3", that header is removed; and if the "Hdr4:" header does not match
"regexp4", that header is removed.
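
The four kinds of tuple can be sketched in Python as follows. The function names are hypothetical, and the sketch makes assumptions the text does not settle: header names are compared case-insensitively, the regular expression is searched anywhere in the header value, and a missing header counts as "does not match".

```python
import re

def parse_header_pats(arg):
    """Turn a -m argument into (negate, strip, header, regex) rules."""
    rules = []
    for item in arg.split():
        name, _, pattern = item.partition(":")
        negate = name.startswith("!")   # "!" inverts the match
        name = name.lstrip("!")
        strip = name.startswith("#")    # "#" removes the header instead of filtering
        name = name.lstrip("#")
        rules.append((negate, strip, name.lower(), re.compile(pattern)))
    return rules

def apply_header_pats(headers, rules):
    """Return (feed, kept_headers) for a dict of lower-cased header names."""
    kept = dict(headers)
    feed = True
    for negate, strip, name, regex in rules:
        value = headers.get(name)
        # Assumption: an absent header is treated as a non-match.
        matched = value is not None and regex.search(value) is not None
        hit = matched != negate
        if strip:
            if hit:
                kept.pop(name, None)
        elif not hit:
            feed = False
    return feed, kept
```

Because tuples are separated by whitespace, a regular expression given this way cannot itself contain a space.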

-M num
Specify the maximum number of articles (per group) to process. The default is to
process all new articles. See also -f.

-n Do nothing but read articles: pullnews does not feed articles downstream, writes no
rnews file, and does not update the config file.

-N timeout
Specify the timeout length, as timeout seconds, when establishing an NNTP connection.

-O Use an optimized mode: pullnews checks whether the article already exists on the
downstream server, before downloading it. It may help for huge articles or a slow
link to upstream hosts.

-p port
Connect to the destination news server on a port other than the default of 119. This
option does not change the port used to connect to the source news servers.

-P hop_limit
Restrict feeding an article based on the number of hops it has already made. Count
the hops in the Path: header (hop_count), feeding the article only when hop_limit is
"+num" and hop_count is more than num; or hop_limit is "-num" and hop_count is less
than num.
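
As a rough Python sketch of this test, with hypothetical function names: it assumes that every non-empty "!"-separated entry of the Path: header counts as one hop, which may differ from how pullnews itself counts them.

```python
def hop_count(path_header):
    """Count the entries of a Path: header body (assumed: one hop per entry)."""
    return len([hop for hop in path_header.split("!") if hop.strip()])

def passes_hop_limit(path_header, hop_limit):
    """Apply a '+num' (more than num hops) or '-num' (fewer than num hops) limit."""
    num = int(hop_limit[1:])
    count = hop_count(path_header)
    if hop_limit.startswith("+"):
        return count > num
    if hop_limit.startswith("-"):
        return count < num
    raise ValueError("hop_limit must start with '+' or '-'")
```

Both comparisons are strict, so an article with exactly num hops is rejected by "+num" and by "-num" alike.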

-q Print out less status information while running.

-Q level
Set the quietness level ("-Q 2" is equivalent to "-q"). The higher this value, the
less gets logged. The default is 0.

-r file
Rather than feeding the downloaded articles to a destination server, instead create a
batch file that can later be fed to a server using rnews. See rnews(1) for more
information about the batch file format.

-R Be a reader (use MODE READER and POST commands) to the downstream server. The default
is to use the IHAVE command.

-s to-server[:port]
Normally, pullnews will feed the articles it retrieves to the news server running on
localhost. To connect to a different host, specify a server with the -s flag. You
can also specify the port with this same flag or use -p.

-S max-run
Specify the maximum time max-run in seconds for pullnews to run.

-t retries
The maximum number (retries) of attempts to connect to a server (see also -T). The
default is 0.

-T connect-pause
Pause connect-pause seconds between connection retries (see also -t). The default is
1.

-w num
Set each group's high water mark (last received article number) to num. If num is
negative, calculate Current+num instead (i.e. get the last num articles). Therefore,
a num of 0 will re-get all articles on the server; whereas a num of "-0" will get no
old articles, setting the water mark to Current (the most recent article on the
server).
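
The difference between "0" and "-0" is easier to see in code. In this minimal Python sketch, the function name is hypothetical and the argument is kept as a string so that the leading minus sign of "-0" is not lost:

```python
def new_high_water_mark(num_arg, current):
    """Return the high water mark that -w num would set.

    num_arg is the option value as typed; current is the most recent
    article number on the server for the group.
    """
    text = num_arg.strip()
    value = int(text)
    if text.startswith("-"):
        # Negative (including "-0"): relative to the newest article.
        return current + value
    # Non-negative: used as an absolute article number.
    return value
```

For a group whose newest article is 5000, "-100" gives 4900 (fetch the last 100 articles), "-0" gives 5000 (fetch nothing old), and "0" gives 0 (fetch everything again).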

-x If the -x flag is used, an Xref: header is added to any article that lacks one. It
can be useful for instance if articles are fed to a news server which has xrefslave
set in inn.conf.

-z article-pause
Sleep article-pause seconds between articles. The default is 0.

-Z group-pause
Sleep group-pause seconds between groups. The default is 0.

CONFIG FILE


The config file for pullnews is divided into blocks, one block for each remote server to
connect to. A block begins with the host line (which must have no leading whitespace) and
contains just the hostname of the remote server, optionally followed by authentication
details (username and password for that server). Note that authentication details can
also be provided for the downstream server (a host line could be added for it in the
configuration file, with no newsgroup to fetch).

Following the host line should be one or more newsgroup lines which start with whitespace
followed by the name of a newsgroup to retrieve. Only one newsgroup should be listed on
each line.

pullnews will update the config file to include the time the group was last checked and
the highest numbered article successfully retrieved and transferred to the destination
server. It uses this data to avoid doing duplicate work the next time it runs.

The full syntax is:

<host> [<username> <password>]
        <group> [<time> <high>]
        <group> [<time> <high>]

where the <host> line must not have leading whitespace and the <group> lines must.

A typical configuration file would be:

# Format group date high
data.pa.vix.com
        rec.bicycles.racing 908086612 783
        rec.humor.funny 908086613 18
        comp.programming.threads
nnrp.vix.com pull sekret
        comp.std.lisp

Note that an earlier run of pullnews has filled in details about the last articles
downloaded from the two rec.* groups. The two comp.* groups were just added by the user
and have not yet been checked.

The nnrp.vix.com server requires authentication, and pullnews will use the username "pull"
and the password "sekret".
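
The block structure described in this section can be read with a few lines of Python. This is a sketch of the format as documented here, not the parser pullnews uses: the function name and the dictionary layout are hypothetical, and treating lines that start with "#" as comments is an assumption based on the example above.

```python
SAMPLE = (
    "# Format group date high\n"
    "data.pa.vix.com\n"
    "        rec.bicycles.racing 908086612 783\n"
    "        rec.humor.funny 908086613 18\n"
    "        comp.programming.threads\n"
    "nnrp.vix.com pull sekret\n"
    "        comp.std.lisp\n"
)

def parse_marks(text):
    """Parse pullnews.marks-style text into a list of server blocks."""
    servers = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or (assumed) comment
        fields = line.split()
        if line[0].isspace():
            # Group line: <group> [<time> <high>]
            if not servers:
                raise ValueError("group line before any host line")
            group = {"name": fields[0], "time": None, "high": None}
            if len(fields) >= 3:
                group["time"], group["high"] = int(fields[1]), int(fields[2])
            servers[-1]["groups"].append(group)
        else:
            # Host line: <host> [<username> <password>]
            server = {"host": fields[0], "username": None,
                      "password": None, "groups": []}
            if len(fields) >= 3:
                server["username"], server["password"] = fields[1], fields[2]
            servers.append(server)
    return servers
```

Only the leading whitespace distinguishes a group line from a host line, so an editor that strips indentation will turn every group into a server entry.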
