groff
groff is a suite of programs containing the GNU implementation of troff. It also includes a script that is used to emulate nroff and the rest of the roff family.
While roff and its descendants are used to make formatted documents, they do it in a way that is rather foreign to modern users. Most documents today are produced using word processors that are able to perform both the composition and layout of a document in a single step. Prior to the advent of the graphical word processor, documents were often produced in a two-step process involving the use of a text editor to perform composition, and a processor, such as troff, to apply the formatting. Instructions for the formatting program were embedded into the composed text through the use of a markup language. The modern analog for such a process is the web page, which is composed using a text editor of some kind and then rendered by a web browser using HTML as the markup language to describe the final page layout.
We’re not going to cover groff in its entirety, as many elements of its markup language deal with rather arcane details of typography. Instead we will concentrate on one of its macro packages that remains in wide use. These macro packages condense many of its low-level commands into a smaller set of high-level commands that make using groff much easier.
For a moment, let’s consider the humble man page. It lives in the /usr/share/man directory as a gzip-compressed text file. If we were to examine its uncompressed contents, we would see the following (the man page for ls in section 1 is shown):
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | head
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.35.
.TH LS "1" "April 2008" "GNU coreutils 6.10" "User Commands"
.SH NAME
ls \- list directory contents
.SH SYNOPSIS
.B ls
[\fIOPTION\fR]... [\fIFILE\fR]...
.SH DESCRIPTION
.\" Add any additional description here
.PP
Compared to the man page in its normal presentation, we can begin to see a correlation between the markup language and its results:
[me@linuxbox ~]$ man ls | head
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
The reason this is of interest is that man pages are rendered by groff, using the mandoc macro package. In fact, we can simulate the man command with the following pipeline:
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc -T ascii | head
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
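To see the correspondence between markup and output on a smaller scale, we can feed the same kind of pipeline a hand-written page. What follows is only a minimal sketch: the hello command, its version string, and the render_page function are invented for illustration and do not come from any real package.

```shell
#!/bin/bash
# Render a tiny, hypothetical man page with the mandoc macro package.
# "hello" is an invented command used only for illustration.

page=$(mktemp)

cat > "$page" <<'EOF'
.TH HELLO "1" "January 2024" "hello 1.0" "User Commands"
.SH NAME
hello \- print a friendly greeting
.SH SYNOPSIS
.B hello
[\fIOPTION\fR]...
EOF

# Same options as the ls example: mandoc macros, ASCII output driver.
render_page() {
    groff -mandoc -T ascii "$1"
}

if command -v groff >/dev/null 2>&1; then
    render_page "$page" | head
else
    echo "groff is not installed" >&2
fi
```

Each .SH request in the source should show up as a section heading in the output, and the text after \- should appear on the NAME line, just as it does for ls.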
Here we use the groff program with the options set to specify the mandoc macro package and the output driver for ASCII. groff can produce output in several formats. If no format is specified, PostScript is output by default:
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc | head
%!PS-Adobe-3.0
%%Creator: groff version 1.18.1
%%CreationDate: Thu Feb 5 13:44:37 2009
%%DocumentNeededResources: font Times-Roman
%%+ font Times-Bold
%%+ font Times-Italic
%%DocumentSuppliedResources: procset grops 1.18 1
%%Pages: 4
%%PageOrder: Ascend
%%Orientation: Portrait
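The output format is selected by the device name given to the -T option. As a rough sketch, the loop below renders one line of plain text for three devices and prints the first non-empty line each driver produces. The sample text and the render_with function are made up for this example; ascii, utf8, and ps are standard groff devices, though the set actually installed can vary between systems.

```shell
#!/bin/bash
# Render the same input with several groff output drivers and show
# the first non-empty line that each one produces.

sample='A single line of sample text.'

render_with() {
    # $1 is the device name passed to -T
    printf '%s\n' "$sample" | groff -T "$1"
}

if command -v groff >/dev/null 2>&1; then
    for device in ascii utf8 ps; do
        printf '== %s ==\n' "$device"
        render_with "$device" | grep -m 1 .
    done
fi
```

The two terminal devices should echo the text back, while the ps device should begin with a PostScript header comment like the one shown above.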
We briefly mentioned PostScript in the previous chapter, and will again in the next chapter. PostScript is a page description language that is used to describe the contents of a printed page to a typesetter-like device. If we take the output of our command and store it to a file (assuming that we are using a graphical desktop with a Desktop directory):
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc > ~/Desktop/foo.ps
An icon for the output file should appear on the desktop. By double-clicking the icon, a page viewer should start up and reveal the file in its rendered form:
Figure 4: Viewing PostScript Output With A Page Viewer In GNOME
What we see is a nicely typeset man page for ls! In fact, it’s possible to convert the PostScript file into a PDF (Portable Document Format) file with this command:
[me@linuxbox ~]$ ps2pdf ~/Desktop/foo.ps ~/Desktop/ls.pdf
The ps2pdf program is part of the ghostscript package, which is installed on most Linux systems that support printing.
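The intermediate PostScript file is not strictly required. ps2pdf accepts - as its input name to read from standard input, so the two steps can be joined into one pipeline. The following is a sketch that assumes both groff and ghostscript are installed; the man_to_pdf function name and the output location are our own choices for illustration.

```shell
#!/bin/bash
# Convert a man page straight to PDF without an intermediate .ps file.

man_to_pdf() {
    # $1: gzip-compressed man page source, $2: PDF file to create
    zcat "$1" | groff -mandoc | ps2pdf - "$2"
}

src=/usr/share/man/man1/ls.1.gz
if [ -r "$src" ] && command -v groff >/dev/null 2>&1 \
        && command -v ps2pdf >/dev/null 2>&1; then
    man_to_pdf "$src" "${TMPDIR:-/tmp}/ls.pdf"
fi
```

Keeping the .ps file, as we did above, is still handy when we want to inspect the PostScript itself.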
Tip: Linux systems often include many command line programs for file format conversion. They are often named using the convention of format2format. Try using the command ls /usr/bin/*[[:alpha:]]2[[:alpha:]]* to identify them. Also try searching for programs named formattoformat.
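The same search can be wrapped in a small function so that it prints bare program names and can be pointed at other directories. This is only a sketch; list_converters is a name we made up, and the pattern is a heuristic, so not every match will really be a format converter.

```shell
#!/bin/bash
# List program names in a directory that follow the format2format
# naming convention (a letter, then "2", then another letter).

list_converters() {
    local dir=${1:-/usr/bin} path
    for path in "$dir"/*[[:alpha:]]2[[:alpha:]]*; do
        [ -e "$path" ] || continue    # the glob matched nothing
        basename "$path"
    done
}

list_converters /usr/bin
```

Changing the 2 in the pattern to the letters to gives the formattoformat variant of the search.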
For our last exercise with groff, we will revisit our old friend distros.txt once more. This time, we will use tbl, a program for formatting tables, to typeset our list of Linux distributions. To do this, we are going to use our earlier sed script to add markup to a text stream that we will feed to groff.
First, we need to modify our sed script to add the necessary markup elements (called requests in groff) that tbl requires. Using a text editor, we will change distros.sed to the following:
# sed script to produce Linux distributions report
1 i\
.TS\
center box;\
cb s s\
cb cb cb\
l n c.\
Linux Distributions Report\
=\
Name	Version	Released\
_
s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
$ a\
.TE
Note that for the script to work properly, care must be taken to see that the words “Name Version Released” are separated by tabs, not spaces. We’ll save the resulting file as distros-tbl.sed. tbl uses the .TS and .TE requests to start and end the table. The rows following the .TS request define global properties of the table, which, for our example, is centered horizontally on the page and surrounded by a box. The remaining lines of the definition describe the layout of each table row. Now, if we run our report-generating pipeline again with the new sed script, we’ll get the following:
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-tbl.sed | groff -t -T ascii 2>/dev/null
+------------------------------+
| Linux Distributions Report   |
+------------------------------+
| Name    Version   Released   |
+------------------------------+
|Fedora     5       2006-03-20 |
|Fedora     6       2006-10-24 |
|Fedora     7       2007-05-31 |
|Fedora     8       2007-11-08 |
|Fedora     9       2008-05-13 |
|Fedora    10       2008-11-25 |
|SUSE      10.1     2006-05-11 |
|SUSE      10.2     2006-12-07 |
|SUSE      10.3     2007-10-04 |
|SUSE      11.0     2008-06-19 |
|Ubuntu     6.06    2006-06-01 |
|Ubuntu     6.10    2006-10-26 |
|Ubuntu     7.04    2007-04-19 |
|Ubuntu     7.10    2007-10-18 |
|Ubuntu     8.04    2008-04-24 |
|Ubuntu     8.10    2008-10-30 |
+------------------------------+
Adding the -t option to groff instructs it to pre-process the text stream with tbl. Likewise, the -T option is used to output to ASCII rather than the default output medium, PostScript.
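To make the pre-processing step visible, we can run tbl ourselves and pipe its result into groff without -t. The sketch below does this with a small two-column table invented for the purpose (it is not distros.txt), and the two function names are ours. Both forms should give the same rendering.

```shell
#!/bin/bash
# Compare groff -t with running tbl by hand on a small, made-up table.

table=$(mktemp)

# The first four lines are the table start, global options, and row
# layout. printf expands \t so the data columns are separated by tabs.
printf '%s\n' '.TS' 'center box;' 'cb cb' 'l n.' > "$table"
printf 'Name\tVersion\n_\nFedora\t10\nUbuntu\t8.10\n.TE\n' >> "$table"

render_builtin() { groff -t -T ascii "$1" 2>/dev/null; }
render_by_hand() { tbl "$1" | groff -T ascii 2>/dev/null; }

if command -v groff >/dev/null 2>&1 && command -v tbl >/dev/null 2>&1; then
    render_builtin "$table"
    render_by_hand "$table"
fi
```

Running tbl on the file by itself, with no groff after it, shows the low-level troff requests that the table description is translated into.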
The format of the output is the best we can expect if we are limited to the capabilities of a terminal screen or typewriter-style printer. If we specify PostScript output and graphically view the output, we get a much more satisfying result:
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-tbl.sed | groff -t > ~/Desktop/foo.ps
Figure 5: Viewing The Finished Table