NAME
wwwstat - summarize WWW server (httpd) access statistics
SYNOPSIS
wwwstat [-F system_config] [-f user_config] [options...] [--] [ summary | logfile | + | - ]...
DESCRIPTION
wwwstat reads a sequence of httpd common logfile format (CLF) access_log files and/or
prior wwwstat output summary files and/or the standard input and outputs a summary of the
access statistics in HTML.
Since wwwstat does not make any changes to the input files or write any files in the
server directories, it can be run by any user with read access to the input logfile(s) and
summary file(s). This allows people other than the webmaster to run specialized analyses
of just the things they are interested in summarizing.
wwwstat provides World Wide Web (WWW) access statistics, which do not necessarily
correspond to statistics on individual users. It counts the number of HTTP requests
received by the server and the number of bytes transmitted in response to those requests,
according to what is in the logfile(s), and outputs those counts as tables broken down by
category of request.
wwwstat output summaries can be read by gwstat to produce fancy graphs of the summarized
statistics. The splitlog program can be used to split a large logfile into separate files
by entry prefix or URL path.
wwwstat is a perl script, which means you need to have a perl interpreter to run the
program. It has been tested with perl versions 4.036 and 5.002.
Output Sections
wwwstat's output consists of a set of cross-reference links, the sum totals and averages
for the processed data, and a sequence of amount-by-category tables partitioned into
sections. The section categories are based on the characteristics evident from the access
request, as provided by the common logfile format (see NOTES). These include:
Request Date        e.g., "Feb 2 1996"
Request Hour        e.g., "00" through "23"
Client Domain       The Fully-Qualified Domain Name (FQDN) suffix that corresponds to an
                    organization type or country name.
Reversed Subdomain  The FQDN, usually minus the first (machine name) component, and
                    reversed so that it is easier to read when sorted.
URL/Archive         Grouping based on Request-URI or non-success status code.
Identity            The user identity based on IdentityCheck token or Authorization field.
Each section can be enabled/disabled using the configuration files or command-line options
(see Section Display Options).
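The way a single log entry maps onto these section categories can be sketched in Python (wwwstat itself is a perl script, so this is an illustration, not its code). The sketch assumes the usual CLF layout `host ident authuser [date] "request" status bytes`; the function name `section_keys`, the sample entry in the test, and the rule of always dropping exactly one leading hostname component are assumptions made for illustration.

```python
import re

# Common logfile format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4})'
    r':(?P<hour>\d{2}):\d{2}:\d{2} [^\]]*\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def section_keys(line):
    """Return the category value per section for one entry, or None if invalid."""
    m = CLF.match(line)
    if m is None:
        return None  # the kind of entry the -e option would report
    host = m.group("host").lower()
    labels = host.split(".")
    if re.fullmatch(r"[\d.]+", host):
        # Bare IP address: grouped as "Unresolved" (the -U default).
        domain = subdomain = "Unresolved"
    else:
        # Client Domain: the last FQDN component.
        domain = labels[-1]
        # Reversed Subdomain: drop the machine name, reverse the rest.
        rest = labels[1:] if len(labels) > 2 else labels
        subdomain = ".".join(reversed(rest))
    parts = m.group("request").split()
    sent = m.group("bytes")
    return {
        "date": "%s %d %s" % (m.group("mon"), int(m.group("day")), m.group("year")),
        "hour": m.group("hour"),
        "domain": domain,
        "subdomain": subdomain,
        "url": parts[1] if len(parts) > 1 else "",
        "status": m.group("status"),
        "bytes": 0 if sent == "-" else int(sent),
    }
```

Each returned value is the key under which the request and its byte count would be tallied in the corresponding section.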
Output Table Format
Inside each section, the statistics are presented as a preformatted table.
%Reqs %Byte  Bytes Sent  Requests   category-type
----- ----- ------------ -------- |---------------
NN.NN NN.NN NNNNNNNNNNNN NNNNNNNN | category-value
100.0 100.0 NNNNNNNNNNNN NNNNNNNN | category-value
Requests    Requests received for this category-value.
Bytes Sent  Bytes transmitted for this category-value.
%Reqs       (<Requests>/<Total Requests>)*100.
%Byte       (<Bytes Sent>/<Total Bytes>)*100.
The table can be sorted by category-value (-sort key), number of requests received (-sort
req), or number of bytes transmitted (-sort byte). It can also be limited to the -top N
entries.
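The column arithmetic and the effect of -sort and -top can be sketched as follows. This is a minimal Python illustration, not wwwstat's perl code; the names `summarize` and `format_row` are hypothetical, and it assumes that sorting by requests or bytes lists the largest values first.

```python
def summarize(counts, sort="key", top=None):
    """Build table rows from counts: category-value -> (requests, bytes_sent).

    Each row is (pct_reqs, pct_bytes, bytes_sent, requests, category_value).
    """
    total_reqs = sum(r for r, _ in counts.values())
    total_bytes = sum(b for _, b in counts.values())
    rows = [
        (100.0 * r / total_reqs if total_reqs else 0.0,    # %Reqs
         100.0 * b / total_bytes if total_bytes else 0.0,  # %Byte
         b, r, value)
        for value, (r, b) in counts.items()
    ]
    if sort == "req":
        rows.sort(key=lambda row: row[3], reverse=True)  # most requests first (assumed)
    elif sort == "byte":
        rows.sort(key=lambda row: row[2], reverse=True)  # most bytes first (assumed)
    else:
        rows.sort(key=lambda row: row[4])                # by category-value
    return rows[:top] if top else rows

def format_row(row):
    """Render one row in the column widths of the table layout above."""
    return "%5.2f %5.2f %12d %8d | %s" % row
```

For example, `summarize(counts, sort="req", top=10)` corresponds roughly to giving `-sort req -top 10` for one section.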
OPTIONS
Configuration Options
These options define how wwwstat should establish defaults and interpret the command-line.
-F filename
Get system configuration defaults from the given file. If used, this must be the
first argument on the command-line, since it needs to be interpreted before the
other command options. The file wwwstat.rc is included with the distribution as an
example of this file; it contains perl source code which directly sets the control
and display options provided by wwwstat. If filename is not a pathname, the
include path (see FILES) is searched for filename. An empty string as filename
will disable this feature. [-F "wwwstat.rc"]
-f filename
Get user configuration defaults from the given file. If used, this must be the
first argument on the command-line after -F (if any). The file is the same format
as for the -F option (see wwwstat.rc). If filename is not a pathname, the include
path (see FILES) is searched for filename. An empty string as filename will
disable this feature. [-f ".wwwstatrc"]
-- Last option (the remaining arguments are treated as input files).
Diagnostic Options
These options provide information about wwwstat usage or about some unusual aspects of the
logfile(s) being processed.
-h Help - display usage information to STDERR and then exit.
-v Verbose display to STDERR of each log entry processed.
-x Display to STDERR all requests resulting in HTTP error responses.
-e Display to STDERR all invalid log entries. Invalid log entries can occur if the
server is miswriting or overwriting its own log, if the request is made by a broken
client or proxy, or if a malicious attacker is trying to gain privileged access to
your system. For the latter reason, the webmaster should run wwwstat with this
option on a regular basis.
Display Options
These options modify the output format.
-H string
Use the given string as the HTML title and heading for output.
-X string
Use the given string as the cross-reference URL to the last summary output. Any
occurrence of the characters "%M" or "%Y" is replaced by the month or year,
respectively, of the month prior to the first log entry date. The empty string
will exclude any cross-reference.
-R Display the daily stats table sorted in reverse. This option is primarily for use
with the gwstat program for producing graphs of the output.
-l
-L Do (-l) or don't (-L) display the full DNS hostname of clients in your local domain
(which is determined by the configured value of $AppendToLocalhost) in the section
on subdomain statistics. The default [-L] is to strip the machine name from local
addresses.
-o
-O Do (-o) or don't (-O) display the full DNS hostname of clients outside your local
domain in the section on subdomain statistics. The default [-O] is to strip the
machine name from outside addresses.
-u
-U Do (-u) or don't (-U) display the IP address of clients with unresolved domain
names in the section on subdomain statistics. The -dns option can be used to
resolve some names, but not all IP hosts have a DNS name (e.g., SLIP/PPP connections)
and sometimes a host's DNS service is inaccessible. The default [-U] is to group all
such addresses under the category "Unresolved".
-dns
-nodns Do (-dns) or don't (-nodns) use the system's hostname lookup facilities to find the
DNS hostname associated with any unresolved IP addresses. Looking up a DNS name may
be very slow, particularly when the results are negative (no DNS name), which is
why a caching capability is included as well. [-nodns]
-cache filename
Use the given DBM database as the read/write persistent DNS cache (the .dir and
.pag extensions are appended automatically). Cached entries (including negative
results) are removed after the time configured for $DNSexpires [two months]. No
caching is performed if filename is the empty string, which may be needed if your
system does not support DBM or NDBM functionality. Running -dns without a
persistent cache is not recommended. [-cache "dnscache"]
-trunc N
Truncate the URLs listed in the archive section after the Nth hierarchy level. This
option is commonly used to reduce the output size and memory requirements of
wwwstat by grouping the requests by directory tree instead of listing every URL.
The default [-trunc 0] is to display every requested URL.
-files
-nofiles
Do (-files) or don't (-nofiles) include the last component of a URL (usually the
filename) in the archive section. This option is commonly used to reduce the output
size and memory requirements of wwwstat by grouping the requests by directory
instead of listing every URL. The default [-files] is to display the entire
requested URL.
-link
-nolink
Do (-link) or don't (-nolink) add a hypertext link around each archive URL. This
option is useful for local maintenance, but it is not recommended for publication
of the HTML results (it often results in links to temporary or nonexistent
resources, and leads people/robots to resources that might not be publicly
available). [-nolink]
-cgi
-nocgi Do (-cgi) or don't (-nocgi) prefix the summary output with CGI header fields
appropriate for use with the HTTP common gateway interface. Using wwwstat as a CGI
script is not recommended - it is usually better to simply run the wwwstat program
periodically and serve the static output file. [-nocgi]
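The grouping effect of -trunc and -nofiles on archive URLs can be sketched in Python. This is an illustration under stated assumptions, not wwwstat's implementation: the function name `group_url` is hypothetical, a "hierarchy level" is taken to mean one slash-separated path component, and truncated keys are assumed to end with a trailing "/".

```python
def group_url(url, trunc=0, files=True):
    """Return the archive-section key under which url would be counted.

    trunc=0 keeps every level (like -trunc 0); files=False drops the
    last path component (like -nofiles).
    """
    parts = url.split("/")        # "/a/b/c.html" -> ["", "a", "b", "c.html"]
    if not files and parts[-1] != "":
        parts[-1] = ""            # "/a/b/c.html" -> "/a/b/"
    if trunc > 0 and len(parts) - 1 > trunc:
        # Keep the first `trunc` components and mark the rest as a directory.
        parts = parts[:trunc + 1] + [""]
    return "/".join(parts)
```

With these assumptions, `group_url("/a/b/c.html", trunc=1)` gives `/a/`, so every request below `/a/` is tallied under one table entry instead of one entry per URL.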
Section Display Options
These options change the display of entire sections (as opposed to the entries within
those sections). They allow the user to enable or disable an entire section, set the
sorting method for that section, and limit the number of displayed entries for that
section. These options are context-sensitive and processed in the order given.
-all
-noall Include (-all) or exclude (-noall) all of the display sections. The -noall option
is commonly used just prior to one or more of the other section options, such that
only the listed sections are displayed.
-daily
-nodaily
Include (-daily) or exclude (-nodaily) the section of statistics by request date
and set the scope for later -sort and -top options to this section.
-hourly
-nohourly
Include (-hourly) or exclude (-nohourly) the section of statistics by request hour
and set the scope for later -sort and -top options to this section.
-domain
-nodomain
Include (-domain) or exclude (-nodomain) the section of statistics by the client's
Internet domain and set the scope for later -sort and -top options to this section.
-subdomain
-nosubdomain
Include (-subdomain) or exclude (-nosubdomain) the section of statistics by the
client's Internet subdomain (reversed for display) and set the scope for later
-sort and -top options to this section.
-archive
-noarchive
Include (-archive) or exclude (-noarchive) the section of statistics by requested
URL/archive and set the scope for later -sort and -top options to this section.
-r
-ident
-noident
Include (-r or -ident) or exclude (-noident) the section of statistics by the
identity of the user (if IdentityCheck is ON) or the authentication userid (if
supplied) and set the scope for later -sort and -top options to this section. DO
NOT PUBLISH this information, as that would reveal security-related identities and
be a violation of privacy. This option is provided for administrative purposes
only.
-sort (key|byte|req)
Sort this section by its primary key, the number of bytes transmitted, or the
number of requests received. [-sort key]
-top N Display only the top N entries for this section. This option assumes that the -sort
option has been set to either byte or req.
-both Display both the top N entries for this section [10, sorted by requests], and then
the full section (all entries) sorted by key.
Search Options
These options are used to limit the analysis to requests matching a pattern. The pattern
is supplied in the form of a perl regular expression, except that the characters "+" and
"." are escaped automatically unless the -noescape option is given. Enclose the pattern
in single-quotes to prevent the command shell from interpreting some special characters.
Multiple occurrences of the same option result in an OR-ing of the regular expressions.
Search options are only applied to logfile entries; any summary files input must have been
created with the same search options.
-a regexp
-A regexp
Include (-a) or exclude (-A) all requests containing a hostname/IP address matching
the given perl regular expression.
-c regexp
-C regexp
Include (-c) or exclude (-C) all requests resulting in an HTTP status code matching
the given perl regular expression.
-d regexp
-D regexp
Include (-d) or exclude (-D) all requests occurring on a date (e.g., "Feb 2 1994")
matching the given perl regular expression.
-t regexp
-T regexp
Include (-t) or exclude (-T) all requests occurring during the hour (e.g., "23" is
11pm - midnight) matching the given perl regular expression.
-m regexp
-M regexp
Include (-m) or exclude (-M) all requests using an HTTP method (e.g., "HEAD")
matching the given perl regular expression.
-n regexp
-N regexp
Include (-n) or exclude (-N) all requests on a URL (archive name) matching the
given perl regular expression.
-noescape
Do not escape the special characters ("+" and ".") in the remaining search options.
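The escaping and OR-ing rules described for the search options can be sketched in Python, whose regular expression syntax matches perl's for these simple patterns. The helper names `compile_patterns` and `keep` are hypothetical, and the sketch is not taken from wwwstat's source.

```python
import re

def compile_patterns(patterns, escape=True):
    """OR together all patterns given for one option (e.g. several -C values)."""
    if not patterns:
        return None
    if escape:
        # Make "+" and "." literal, as is done unless -noescape is given.
        patterns = [re.sub(r"([+.])", r"\\\1", p) for p in patterns]
    return re.compile("|".join("(?:%s)" % p for p in patterns))

def keep(value, include=None, exclude=None):
    """Apply one include/exclude option pair (e.g. -a / -A) to a field value."""
    if include is not None and not include.search(value):
        return False
    if exclude is not None and exclude.search(value):
        return False
    return True
```

With escaping on, the pattern `.com$` only matches names ending in a literal ".com"; with `escape=False` the "." matches any character, so a name such as "telecom" would also match.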
INPUT
After parsing the options, the remaining arguments on the command-line are treated as
input arguments and are read in the order given. If no input arguments are given, the
configured default logfile is read [+].
- Read from standard input (STDIN).
+ Read the default logfile. [as configured]
filename...
Read the given file and determine from the first line whether it is a previous
output summary or a CLF logfile. If the filename's extension indicates that it is
compressed (gz|z|Z), then pipe it through the configured decompression program
[gunzip -c] first. Summary files must have been created with the same (or similar)
configuration and command-line options as the currently running program; if not,
weird things will happen.
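How each input argument is dispatched can be sketched in Python. The function name `input_plan` and the default logfile name `access_log` are placeholders (the real default is whatever the configuration sets), and the first-line check that tells a summary from a CLF logfile is left out because its details are not described here.

```python
import re

# Extensions treated as compressed, per the description above.
COMPRESSED = re.compile(r"\.(gz|z|Z)$")

def input_plan(arg, default_log="access_log", zcat=("gunzip", "-c")):
    """Describe how one input argument would be read.

    Returns (kind, target): kind is "stdin", "file" or "pipe"; target is
    None, a filename, or the decompression command as an argument list.
    """
    if arg == "-":
        return ("stdin", None)
    if arg == "+":
        return ("file", default_log)      # placeholder for the configured default
    if COMPRESSED.search(arg):
        return ("pipe", list(zcat) + [arg])
    return ("file", arg)
```

Arguments are handled one at a time in the order given, so compressed archives, the current logfile, and standard input can be mixed in a single run.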
USAGE
wwwstat is used for many purposes:
o as a diagnostic utility for measuring server activity, finding incorrect URL
references, and detecting attempted misuse of the server;
o as a public relations tool for measuring technology or information transfer (i.e.,
Is the message getting out? To the right people?);
o as an archival tool for tracking web usage over time without storing the entire
logfile; and,
o most often, as an easy mechanism for justifying all the hard work that went into
creating the web content that people out there are requesting.
In most cases, wwwstat is run on a periodic basis (nightly, weekly, and/or monthly) by a
wrapper program as a crontab entry shortly after midnight, typically in conjunction with
rotating the current logfile. The output is usually directed to a temporary file which
can later be moved to a published location. The temporary file is necessary to avoid
erasing your published file during wwwstat's processing (which would look very odd if
someone tried to GET it from your web).
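The temporary-file step can be sketched as a small Python wrapper. The function name `publish`, the file mode, and any paths passed to it are assumptions for illustration; the command list would be whatever wwwstat invocation the site uses.

```python
import os
import subprocess
import tempfile

def publish(command, dest):
    """Run command, capture its output in a temporary file, then move it to dest."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temporary file next to dest so the final rename stays on
    # one filesystem and readers never see a half-written summary.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            subprocess.run(command, stdout=out, check=True)
        os.chmod(tmp, 0o644)      # mkstemp creates the file readable by owner only
        os.replace(tmp, dest)     # swap the finished file into place
    except BaseException:
        os.unlink(tmp)            # leave the previously published file untouched
        raise
```

A nightly job might call something like `publish(["wwwstat", "access_log"], "stats.html")`, with both names adjusted to the local setup. If the command fails, the previously published file is left as it was.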
wwwstat can be run as a CGI script (-cgi), but that is not recommended unless the input
logfile is very small.
All of the command-line options, and a few options that are not available from the
command-line, can be changed within the user and system configuration files (see
wwwstat.rc). These files are actually perl library modules which are executed as part of
the program's initialization. The example provided with the distribution includes
complete documentation on what variables can be set and their range of values.
Perl Regular Expressions
The Search Options and many of the configuration file settings allow for full use of perl
regular expressions (with the exception that the -a, -A, -n and -N options treat '+' and
'.' characters as literal characters unless they are preceded by the -noescape
option). Most people only need to know the following special characters:
^ at start of pattern, means "starts with pattern".
$ at end of pattern, means "ends with pattern".
(...) groups pattern elements as a single element.
? matches preceding element zero or one times.
* matches preceding element zero or more times.
+ matches preceding element one or more times.
. matches any single character.
[...] denotes a class of characters to match. [^...] negates the class. Inside a class,
'-' indicates a range of characters.
(A|B|C) matches if A or B or C matches.
Depending on your command shell, some special characters may need to be escaped on the
command line or enclosed in single-quotes to avoid shell interpretation.
EXAMPLES
Summarize requests from commercial domains.
wwwstat -a '.com$'
Summarize requests from the host kiwi.ics.uci.edu
wwwstat -a '^kiwi.ics.uci.edu$'
Summarize requests not from kiwi.ics.uci.edu
wwwstat -A '^kiwi.ics.uci.edu$'
Summarize requests resulting in temporary redirects
wwwstat -c '302'
Summarize requests resulting in server errors
wwwstat -c '^5'
Summarize unsuccessful requests
wwwstat -C '^2' -C '304'
Summarize requests in first week of the month
wwwstat -d ' [1-7] '
Summarize requests in second week of the month
wwwstat -d ' ([89]|1[0-4]) '
Summarize requests in third week of the month
wwwstat -d ' (1[5-9]|2[01]) '
Summarize requests in fourth week of the month
wwwstat -d ' 2[2-8] '
Summarize requests in leftover days of the month
wwwstat -d ' (29|30|31) '
Summarize requests in February
wwwstat -d 'Feb'
Summarize requests in year 1994
wwwstat -d '1994'
Summarize requests not in April
wwwstat -D 'Apr'
Summarize requests between midnight and 1am
wwwstat -t '00'
Summarize requests not received between noon and 1pm
wwwstat -T '12'
Summarize requests with a gif extension
wwwstat -n '.gif$'
Summarize requests under user's URL
wwwstat -n '^/~user/'
Summarize requests not under "hidden" paths
wwwstat -N '/hidden/'
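The week-of-month date patterns above rely on the day number being surrounded by spaces in the date string. The following Python sketch checks that each day of the month matches exactly one of them. It assumes dates are formatted like "Feb 2 1996"; the names `WEEK_PATTERNS` and `week_of` are illustrative only.

```python
import re

# The -d patterns from the examples above, keyed by a descriptive label.
WEEK_PATTERNS = {
    "first": r" [1-7] ",
    "second": r" ([89]|1[0-4]) ",
    "third": r" (1[5-9]|2[01]) ",
    "fourth": r" 2[2-8] ",
    "leftover": r" (29|30|31) ",
}

def week_of(day, month="Feb", year=1996):
    """Return the labels of all patterns matching the date string for this day."""
    date = "%s %d %d" % (month, day, year)
    return [name for name, pat in WEEK_PATTERNS.items() if re.search(pat, date)]
```

Because the patterns include the surrounding spaces, ' [1-7] ' does not match the digits inside "17" or inside the year.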
ENVIRONMENT
HOME Location of user's home directory, placed on INC path.
LOGDIR Used instead of HOME if latter is undefined.
PERLLIB A colon-separated list of directories in which to look for include and
configuration files.