NAME


colmux - multiplex communications to multiple systems running collectl from a single
system

SYNOPSIS


colmux [-command "collectl-switches... [-p filespec]"] [-address addr1[,addr2,...]|-addr
filename] [-cols col1[,col2...]] | [-column num]

DESCRIPTION


This utility gathers data generated by collectl on multiple systems and multiplexes it
into a single consolidated format. It runs in essentially 2 distinct modes. The first is
known as real-time mode, because data is retrieved and displayed in real time. The second
is playback mode, because data is played back from existing collectl data files.

There are also 2 general formats for the data being displayed. The first is a multi-line
display in which the data is shown in collectl's native form, except that it is sorted by
a distinct column, essentially allowing one to see the TOP producers of that data. The
second format is a single-line display in which one or more distinct data elements from
each source are displayed on the same line. This latter format is never sorted, but rather
positionally organized by the name of the system that generated it.

Collectl will then be executed, using any optional switches specified by -command, on
each of the systems specified by -address, OR on the systems whose addresses are read from
a file if the target of that switch is a filename rather than a list of hosts, OR on the
local system if -address is not specified. See collectl for details of the various
switches. Certain collectl switches do not make sense in a colmux environment and will
generate an error if chosen. Further, if hosts are specified with -address, they should be
individual addresses or hostnames separated by commas. In turn, any of them can be in
what those familiar with pdsh would recognize as -w format.
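
As a rough illustration of what a pdsh -w style address list expands to, here is a minimal
Python sketch. The function name expand_hosts is hypothetical and this is not colmux's own
parser. It assumes at most one bracket group per name and does not handle zero-padded
ranges or other pdsh features.

```python
import re


def expand_hosts(spec):
    """Expand a comma-separated host list where a name may carry one
    bracketed range group, e.g. n[1-10,15]. Hypothetical helper."""
    hosts = []
    # Split only on commas that are outside of a [...] group.
    for item in re.split(r",(?![^\[]*\])", spec):
        match = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", item)
        if not match:
            hosts.append(item)  # plain hostname or address
            continue
        prefix, ranges, suffix = match.groups()
        for part in ranges.split(","):
            low, _, high = part.partition("-")
            high = high or low  # a single number such as "15"
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number}{suffix}")
    return hosts


print(expand_hosts("n[1-3,15],abc"))
```

Under these assumptions, n[1-3,15],abc expands to n1, n2, n3, n15 and abc.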

Colmux will then execute the collectl command, gather the results from all sources for a
particular interval and display them one result per line, sorted by the specified column
OR all on the same line in groups specified by -cols. The number of lines displayed is
set to the size of the terminal window by default, but can be changed using -lines. The
one exception is the use of -nosort which only applies to the playback of existing
collectl raw files. In this mode all records for a particular interval will be displayed
and the sorting bypassed, making this a speedy and convenient mechanism for gathering all
data from all systems in one place for potential further processing.

Colmux will never modify the size of the terminal window, so to see more or wider lines
either expand the window or override the number of display lines and run it again. If the
number of display lines is set to 0 or to a value greater than the terminal height, colmux
will no longer overlay the previous display and will simply run in a continuous scrolling
mode.

Common Switches

-address list|pdsh|filename
Specify any combination of addresses as hostnames OR in pdsh -w format OR a
filename containing a list of hostnames/addresses, 1 per line. You MUST have
passwordless ssh access to these nodes. If a different username is required, be
sure to specify addresses in username@host format noting you do not have to have
the same username on each host. If specified, these usernames will override those
specified with the -username switch. rsh access is not supported.

-command switches
One can specify virtually any collectl command here, for either real-time or playback
mode. Some switches may only be used in one mode or the other, and colmux will
usually let you know if you specify an invalid combination or an otherwise
restricted switch. Only those directly affecting colmux are listed below:

--from, --thru
Limit the timeframe for data being played back, noting you can include both
the from and thru times with the --from switch if you separate them with a
hyphen.

-o time-format
This is a "magic" switch in that it not only tells collectl how to display
dates/times (no other options are permitted using -o other than those from
the set [dDTm]), it also tells colmux how to display them.

In single line mode, the timestamp will either come from the host system in
real-time mode OR the first host when run in playback mode. This is the
most common use/need for this switch. But be careful in choosing column
numbers with -cols as the position of the data shifts by 1 when time is
included and by 2 if date and time are. Using -test will correctly show the
shifted positions but only if you include -o with the command at the same
time you use -test.

In real-time/top mode this switch is not allowed since colmux simply reports
the current time of the system it is running on.

When playing back multi-line formatted data from one or more files, a
timestamp for each interval is reported, consisting of the time of that
interval. When this switch is included, each line will be tagged with an
appropriate timestamp, since on rare occasions they may not all be
identical.

-p playback-file
This switch tells colmux to run in playback mode. The filename should
include the directory location and is usually specified with wild cards,
limiting the selected file(s) to a specific date. When those files are on
the same host as colmux (-address is not specified), they may be for
multiple hosts, but when the files are on remote hosts, the files on each
host must all be for that host only. If the file specification includes the
string TODAY or YESTERDAY, it will be replaced with *yyyymmdd* for that date.
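
The TODAY/YESTERDAY substitution can be pictured with the following Python sketch. The
function name expand_date_keywords is hypothetical, and the code only mirrors the
behavior described above. It is not taken from colmux itself.

```python
from datetime import date, timedelta


def expand_date_keywords(filespec, today=None):
    """Replace TODAY/YESTERDAY with *yyyymmdd* wildcards (illustrative)."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    # Handle YESTERDAY first so the two keywords never interfere.
    filespec = filespec.replace("YESTERDAY", f"*{yesterday:%Y%m%d}*")
    return filespec.replace("TODAY", f"*{today:%Y%m%d}*")


# Using a fixed date so the result is reproducible.
print(expand_date_keywords("/var/log/collectl/YESTERDAY", date(2024, 3, 1)))
```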

-P
Run collectl in plot-format. This allows one to specify just about any
combination of subsystems since all data is always displayed on a single
line. However, due to the lack of formatting, this also makes no sense for
multi-line displays and is therefore only supported in single-line format.

-help
Show a brief help message and exit.

-hostwidth n
By default, colmux sets the hostwidth to 8 unless it sees something wider, and for
most situations this is sufficient. However, if one specifies hostnames that are
aliases of longer hostnames, colmux has no way of knowing the real hostname lengths
until after it starts receiving data from collectl, and the formatting will be off
if the hostnames are longer than the default. To overcome this problem, use this
switch to force the hostname field to be wider.

-lines
Change the number of lines that are displayed for each interval in multi-line mode.
The default is determined by the terminal size returned by the Linux resize
command, if present. If that command is not present, the size is initially set
to 24. If -lines is 0 or greater than the terminal size, top-like behavior will
not be used when in real-time mode.

In single-line format, this switch controls the number of lines displayed between
headers. A value of 0 will only display the header one time.
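
The interaction between -lines and the terminal height in multi-line mode can be sketched
as follows. This is an illustration only: colmux uses the resize command, whereas the
sketch substitutes Python's shutil.get_terminal_size with the same fallback of 24, and the
function name display_plan is hypothetical.

```python
import shutil


def display_plan(requested=None, terminal_height=None):
    """Return (lines, top_like) for multi-line mode (illustrative)."""
    if terminal_height is None:
        # Falls back to 24 lines when the size cannot be determined.
        terminal_height = shutil.get_terminal_size(fallback=(80, 24)).lines
    lines = terminal_height if requested is None else requested
    # Top-like overlay only when the output fits in the window and is not 0.
    top_like = 0 < lines <= terminal_height
    return lines, top_like


print(display_plan(requested=50, terminal_height=24))
```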

-noescape
Colmux uses brute-force screen formatting, that is, it generates its own VT100
escape sequences to clear lines and/or move the cursor. On some occasions you may
want to disable these sequences, for example if you wish to record the output and
do your own post-processing of it. This switch will do just that.

-port
Sometimes a remote version of collectl is already using the default socket. This
allows one to start another instance and override that value.

-test
This tells colmux to execute the specified collectl command either locally or on
the first remote system specified by -address, print the associated header with the
selected column(s) highlighted and also include each column name along with its
ordinal number, making it fairly easy to make sure you've selected the right
column(s).

-username name
Use this username for ALL ssh commands. It can be overridden for specific hosts by
specifying them in username@host format with the -address switch.

-version
Display the version and exit. It will also report if Term::ReadKey is installed
and if so what its version number is.

Playback Mode Specific

The following additional switches only apply to playback mode. There are no real-time
mode specific switches.

-delay seconds
Introduce a delay between intervals in seconds. You can specify fractional values.
Not using this switch will cause the output to be displayed as fast as it can be
rendered.

-home
Move the cursor to the home position (upper left-hand corner) of the display to use
a top-like display format. This ONLY applies to multi-line mode when in playback
mode and provides a mechanism for displaying recorded data in a top-like fashion.

-hostfilter addr[,addr]
When playing back files for multiple hosts on the local system, sometimes you do
not want to play back ALL the host files. This filter allows you to specify only
those hosts which you want to process. The format of the list of addresses is
specified in the same way as -address except that you cannot specify a filename.

-nosort
Do not sort or include any escape sequences in the output. This is intended
primarily for output that will be redirected to a file.

Multi-Line Format

When there is more output than will fit on the screen, colmux includes the text:
Displaying: lines xx thru yy out of zz
on the right side of the top line of the display, where xx is typically 1.

However, once colmux is running, one might want to look at subsequent lines, i.e.
those below the bottom of the screen and therefore invisible. If the ReadKey
module is installed, one can simply use the PageDown key to move down the display
and the PageUp key to move in the other direction. If ReadKey is not installed,
typing the multi-key sequences pd<ENTER> or pu<ENTER> will cause the same thing to
happen.

-colhelp
When you wish to change the sort column and the arrow keys aren't available to you,
it may be cumbersome to identify the number of the column to type in followed by
RETURN. This tells colmux to display the numbers over each column eliminating the
need to manually count them and find the one you want.

-column num
Set the sort column to this number. The column numbering is determined by the
columns returned by collectl for the requested command. Since date/time columns
are optional for non-plot data, their inclusion will change the numbering of the
columns so if you are not sure you selected the correct column, you should first
execute your command with -test included.

You can also change the column number interactively with the RIGHT/LEFT arrow keys
IF the ReadKey module is installed (see colmux -version) OR simply type it in
followed by the <ENTER> key.

-finalcr
There is a really odd case in which you might want to pipe colmux real-time output
to a script for further processing. However, if you do this you can't read the final
line with a routine that expects a terminating CR, like python's readline().
Rather, that last line and the one that follows will be returned as one long
string. This switch tells colmux to insert that final CR, which WILL mess up the
screen under normal operations, so be forewarned.

-hostformat char:pos
There are times one has long hostnames which can either take up valuable screen
real estate or are simply painful to look at. This switch may evolve over time and
is currently targeted at hostnames that have repeating parts along with a unique
part, separated by a character such as a hyphen. This switch allows you to specify
a single separator character followed by the position of the piece of the hostname
you'd like to see displayed. For example, if you have a hostname like
aaa-bbbb-cccc-dddd, -hostformat -:3 will cause the cccc piece to be displayed.
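
The char:pos rule can be sketched in a few lines of Python. The function name format_host
is hypothetical, and returning the full hostname when the position is out of range is an
assumption, not documented colmux behavior.

```python
def format_host(hostname, spec):
    """Apply a char:pos style spec such as '-:3' (illustrative)."""
    separator, position = spec[0], int(spec[2:])
    pieces = hostname.split(separator)
    index = position - 1  # positions are counted from 1
    if 0 <= index < len(pieces):
        return pieces[index]
    return hostname  # assumption: fall back to the full name


print(format_host("aaa-bbbb-cccc-dddd", "-:3"))  # cccc
```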

-nobold
Do not highlight the selected column. This may be useful when redirecting output
to a file and you do not want the associated escape sequences to be written to it.

-reverse
Reverse the default sort order. You can also change the direction of the sort
interactively with the UP/DOWN arrow keys IF the ReadKey module is installed (see
colmux -version) OR simply type the r key and <ENTER>.

-zero
Do not display any rows with 0 in the sort column. You can also type
z<ENTER> interactively.
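
How -column, -lines, -reverse and -zero combine in multi-line format can be sketched in
Python as below. The function name top_rows and the row layout are hypothetical. The
sketch also assumes that the default order is largest value first and that column numbers
count data fields only, neither of which is stated above.

```python
def top_rows(rows, column, lines, reverse=False, zero=False):
    """rows is a list of (host, [values]); column is counted from 1."""
    index = column - 1
    if zero:
        # Mimic -zero: drop rows whose sort column is 0.
        rows = [row for row in rows if row[1][index] != 0]
    # Assumption: default order is descending, -reverse flips it.
    ordered = sorted(rows, key=lambda row: row[1][index], reverse=not reverse)
    return ordered[:lines]


sample = [("n1", [5, 0]), ("n2", [9, 3]), ("n3", [1, 7])]
for host, values in top_rows(sample, column=1, lines=2):
    print(host, values)
```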

Single-Line Format

-col1000
Divide each column by 1000 before display

-colk
Divide each column by 1024 before display

-collog10
Remap large numbers to a smaller number of values by taking the log10 of them and
further transforming by the following mapping: 0,1 to 0, 10 to 10, 100 to 20, 1000
to 30, 10000 to 40, ... 1e9 to 90.
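
The listed mapping is consistent with 10 times the base-10 logarithm, which the following
Python sketch reproduces for the documented points. The function name collog10 is reused
here only for illustration, and how colmux rounds values that are not exact powers of 10
is an assumption.

```python
import math


def collog10(value):
    """Map 0 and 1 to 0, 10 to 10, 100 to 20, ... 1e9 to 90 (illustrative)."""
    if value <= 1:
        return 0
    # Assumption: intermediate values are rounded to the nearest integer.
    return round(10 * math.log10(value))


print([collog10(v) for v in (0, 1, 10, 100, 1000, 10000, 10**9)])
```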

-cols num,...
Group all data together for each host by column number(s). As with -column, you
can confirm the correct column(s) have been selected by first running with -test.

-colnodet
Do not show data for individual hosts, just display the totals.

-colnodiv num,...
Do not divide the specified column numbers by 1000 or 1024 when -col1000 or -colk is
specified, and do not apply the log10 transformation when -collog10 is specified. A
typical usage is if you want to look at cpu loads as well as network or disk stats,
in which case you may want to divide the latter by 1024 but not the cpu.

-colnoinst
Do not include the instance portion (and surrounding brackets) in totals column headers.

-coltotal
Include the totals for each column to the right.

-colwidth
Set the output columns to this width, typically used in conjunction with -col1000
or -colk to allow more hosts to fit onto the same line. It can also be used if the
host names are too narrow for column headers and you have room to display wider
names.
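
To make the single-line switches more concrete, here is a Python sketch of how one output
line could be assembled from -cols, -colk (or -col1000), -colnodiv and -coltotal. The
function name single_line and the data layout are hypothetical. Placing each total directly
after its column group and summing the already divided values are assumptions about
details not spelled out above.

```python
def single_line(data, cols, divisor=1, nodiv=(), total=False):
    """data maps host -> list of values; cols are counted from 1."""
    hosts = sorted(data)  # positionally organized by host name
    line = []
    for col in sorted(cols):  # columns always appear in ascending order
        scale = 1 if col in nodiv else divisor
        values = [data[host][col - 1] // scale for host in hosts]
        line.extend(values)
        if total:
            line.append(sum(values))  # assumption: total follows its group
    return line


sample = {"n2": [10, 2048, 4096], "n1": [20, 1024, 8192]}
print(single_line(sample, cols=[3, 2], divisor=1024, total=True))
```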

Exception Reporting Specific

In single-line format, rather than wait for all hosts to report their data, colmux simply
reports the last data seen when the time to generate a line of output has come. In most
cases these values do reflect the most recent data, but in times of load the data may be
late getting to colmux and so a previous value may be reported. If the age of that data
exceeds a defined number of intervals (the default is currently 2), an exception value of
-1 will be reported. At other times, kernel/driver bugs have been seen to cause incorrect
values to be reported as negative numbers, and those values are also reported as -1. Both
the age and exception values can be changed with the following switches.

-age number
When initially starting up and all hosts have not yet reported any data, colmux
will display a -1 to indicate no data has been seen yet. If during processing a
host fails to report in -age intervals, the default is 2, colmux will also report a
-1 indicating the data is stale.

-negdataval val
In some cases, there could be erroneous data reported as negative numbers (though
sometimes negative numbers are valid). When specified, replace any negative
numbers with this value.

-nodataval val
This switch allows you to change the -1 that is normally reported for missing or
stale data to the specified value, most commonly 0.
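
The exception rules can be summarized with the following Python sketch. The function name
reported_value and its parameters are hypothetical stand-ins for -age, -nodataval and
-negdataval. Treating data as stale only when its age is strictly greater than the limit,
and defaulting the negative replacement to -1, are assumptions based on the description
above.

```python
def reported_value(value, age, max_age=2, nodataval=-1, negdataval=-1):
    """value is None until a host has reported; age is in intervals."""
    if value is None or age > max_age:
        return nodataval  # missing or stale data
    if value < 0:
        return negdataval  # suspicious negative reading
    return value


print(reported_value(5, age=1))               # fresh data is passed through
print(reported_value(5, age=3))               # stale data becomes -1
print(reported_value(5, age=3, nodataval=0))  # stale data becomes 0
```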

Diagnostics

The following switches are intended more for diagnostic purposes than normal operation,
though are also worth using on appropriate occasions.

-debug val
This switch is for generating diagnostic information at various levels. It is
actually a bit mask, whose values are listed at the beginning of colmux itself.
Perhaps the most useful value is 1, as it will cause colmux to display all the
remote commands issued to each host in the address list and can often reveal
problems when things don't seem to be working correctly.

-nocheck
This switch was initially included in an earlier version when remote host checking
was causing problems in some cases and, by skipping those checks, colmux would run
more reliably. While it is felt that as of V3.2.0 these reachability checks are
now reliable and should not be skipped, this switch has been left in place.

-quiet
By default, and when -nocheck is not specified, colmux checks the versions of all
collectl instances against that of the first node found to be running collectl and,
if different, reports the mismatch. This switch suppresses that warning.

When a connection is received from an unexpected address, a warning is also
reported and the request promptly ignored. This switch suppresses those
messages as well. For more information on problems connecting, see CONNECTION
PROBLEMS.

-reachable
By default, when a node is found to not be reachable, colmux will remove it from
its list of hosts and continue execution. This switch will tell colmux to exit
unless all hosts are reachable.

Miscellaneous

There are a few switches whose descriptions don't really fit anywhere else:

-colbin path
On rare occasions, such as testing a patch to collectl in a copy NOT in /usr/bin,
you may want to tell colmux to use that copy instead of the standard one. Use this
switch to point to that copy. Naturally that copy must exist in that location on
all systems.

-keepalive secs
Colmux uses ssh to start collectl on each remote machine and then communications
between collectl and colmux occur over a socket. Normally, ssh is configured to
time out after an interval of inactivity, such as 30 minutes, which means a
long-running colmux session will begin to lose connections when this interval is
reached. By specifying a keepalive interval, you're telling ssh to send a
periodic keepalive to the other end so that the connection doesn't get dropped.

-retaddr addr
Tell remote collectls to open a socket on this address instead of the preselected
one. For more details on this, see CONNECTION PROBLEMS.

-timeout secs
By default, colmux waits up to 10 seconds for remote instances of collectl to
connect back. On slower networks or when a very large number of instances have
been started, they may fail to connect back in time. This switch will extend that
timeout, but it also requires collectl V3.6.4 or later because earlier versions do
not support this feature.

-timerange secs
When colmux starts up and checks the connectivity to all the machines specified by
-addr, it also gets their current date/time and uses that to compute the range of
system times across all nodes. If that range is found to be more than -timerange
seconds, colmux generates a warning, as this difference could cause reporting
problems. One can increase the range to get rid of the message (not recommended
unless other factors are preventing nodes from responding quickly enough to the
date command) OR suppress the warning with -quiet.
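
The -timerange check amounts to comparing the spread of the collected clock readings with
a limit, as in this Python sketch. The function names, the use of epoch seconds and the
warning text are all hypothetical. The limit is passed in explicitly because the default
is not stated above.

```python
def clock_spread(node_times):
    """node_times maps host -> system time in seconds since the epoch."""
    readings = node_times.values()
    return max(readings) - min(readings)


def timerange_warning(node_times, timerange):
    """Return a warning string when the spread exceeds timerange seconds."""
    spread = clock_spread(node_times)
    if spread > timerange:
        return f"time range across nodes is {spread} seconds, limit is {timerange}"
    return None


sample = {"n1": 100, "n2": 103, "n3": 101}
print(clock_spread(sample))
print(timerange_warning(sample, timerange=2))
```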

PLAYBACK MODE RESTRICTIONS


All logs being played back must have been collected using the same interval, as colmux
only looks at the first file/host to determine the appropriate value.

It is assumed all clocks are reasonably well synchronized, as colmux uses time to determine
which data is to be displayed as a set.

All files must be in the same directory on all systems and that directory must be included
in the playback file specification.

All files on a remote host must be for that host only.

EXAMPLES


Run collectl on 3 nodes, showing CPU, Disk and Network statistics once a second and sorted
by column 1, which happens to be total cpu.

colmux -addr abc,def,xyz

Dynamically display top processes on nodes n1-n10 of a cluster once a second, sorted by
column 5.

colmux -addr n[1-10] -command "-sZ -i:1" -column 5

Do the same for yesterday, between the hours of 5AM and 6AM, being sure to stall for 1/2
second between intervals. Note, if you leave off -addr you could put all the logs into
/var/log/collectl on the local host and play them back from there.

colmux -addr n[1-10] -command "-sZ -p/var/log/collectl/YESTERDAY -from 05:00-06:00"
-column 5 -delay .5

Look at the amount of mapped and slab memory consumed on nodes n1-n10 and n15 in
real-time, every 2 seconds, using single-line format. Include totals and preface each line
with the time. Since memory sizes tend to be rather large, divide each by 1024 so we see MB
rather than KB. Note that the column numbers are always displayed in ascending order
regardless of their order in -cols. To be sure, first test the column numbers.

colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk -test
colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk

Display most active disks, based on KB written, on nodes n1, n4 and n5.

colmux -addr n1,n4,n5 -command "-sD" -column 6

Here is a cool trick. Collectl currently lets you look at top processes with the --top
switch and even choose a sort column by name. However, if you want to change the column
you need to exit, then rerun collectl with a different sort column name. But if you run
it like this example, you get the power of colmux to dynamically change the sort columns
with the arrow keys! You can also use this technique to have collectl dynamically sort
any local multi-line data such as slabs or even detail data like CPU, Disk, Lustre and
Networks too! Naturally this technique works just as well when playing back data.

colmux -command "-sZ -i:1"

RESTRICTIONS


colmux requires passwordless ssh between the node it is running on and those it is
monitoring. Also be sure the port you are using for communications (the default is 2655)
is open.

CONNECTION PROBLEMS


The way colmux works is to choose an address it wants to communicate over and start up
one or more remote copies of collectl, telling them to connect back to colmux using that
address. The easiest way to see this is to run colmux with -noesc, which tells it NOT to
issue any escape sequences and therefore not to run in full screen mode. The additional
switch of -debug 1 tells it to show the remote collectl startup command. When there is a
communications problem you will typically see 'connection timed out' messages displayed.

There are actually a couple of possibilities here. One is that a firewall is preventing
connections, and the easiest way to test this is to run collectl on the local machine
like this: collectl -Aserver. This tells collectl to run as a server, listening for
connections just like colmux. Then log into a remote machine and run
/usr/share/collectl/util/client.pl addr-of-server, which tells client.pl to open a socket
to that copy of collectl. It should fail just as it did when run via colmux, so try
opening the firewall and running it again. If that fixes the problem, it was indeed the
firewall blocking things and colmux should now work just fine.

Sometimes there are multiple interfaces defined on the machine hosting colmux and in some
cases only some addresses will allow socket connections. Again, using client.pl on the
remote machine try connecting back to collectl over different addresses and when you find
one that works, tell colmux to use that address for communication via the -retaddr switch.
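
If client.pl is not at hand, a plain TCP connection attempt gives a similar reachability
check. The Python sketch below is a substitute for client.pl, not a description of it. The
function name can_connect is hypothetical, and the default port of 2655 is the colmux
default mentioned under RESTRICTIONS. It only tests that a socket can be opened and says
nothing about whether collectl is the listener.

```python
import socket


def can_connect(host, port=2655, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical address; replace with the address colmux should use.
    print(can_connect("127.0.0.1"))
```

Trying each address of the machine hosting colmux in turn from a remote node should show
which one is a candidate for -retaddr.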
