aspell
The last tool we will look at is aspell, an interactive spelling checker. The aspell program is the successor to an earlier program named ispell, and can be used, for the most part, as a drop-in replacement. While the aspell program is mostly used by other programs that require spell-checking capability, it can also be used very effectively as a stand-alone tool from the command line. It has the ability to intelligently check various types of text files, including HTML documents, C/C++ programs, email messages, and other kinds of specialized texts.
To spell check a text file containing simple prose, it could be used like this:
aspell check textfile
where textfile is the name of the file to check. As a practical example, let’s create a simple text file named foo.txt containing some deliberate spelling errors:
[me@linuxbox ~]$ cat > foo.txt
The quick brown fox jimped over the laxy dog.
Next we’ll check the file using aspell:
[me@linuxbox ~]$ aspell check foo.txt
As aspell is interactive in the check mode, we will see a screen like this:
The quick brown fox jimped over the laxy dog.

1) jumped                6) wimped
2) gimped                7) camped
3) comped                8) humped
4) limped                9) impede
5) pimped                0) umped
i) Ignore                I) Ignore all
r) Replace               R) Replace all
a) Add                   l) Add Lower
b) Abort                 x) Exit

?
At the top of the display, we see our text with a suspiciously spelled word highlighted. In the middle, we see ten spelling suggestions numbered zero through nine, followed by a list of other possible actions. Finally, at the very bottom, we see a prompt ready to accept our choice.
If we press the 1 key, aspell replaces the offending word with the word “jumped” and moves on to the next misspelled word, which is “laxy.” If we select the replacement “lazy,” aspell replaces it and terminates. Once aspell has finished, we can examine our file and see that the misspellings have been corrected:
[me@linuxbox ~]$ cat foo.txt
The quick brown fox jumped over the lazy dog.
Unless told otherwise via the command line option --dont-backup, aspell creates a backup file containing the original text by appending the extension .bak to the filename.
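To preview which words aspell would flag before it rewrites a file and leaves a backup behind, aspell also offers a non-interactive list command, which reads text from standard input and prints each word it does not recognize, one per line. The following is a minimal sketch rather than a required step: the scratch directory and the variable names workdir and misspelled are chosen only for illustration, and the result depends on aspell and an English dictionary being installed.

```shell
# Work in a scratch directory so the real foo.txt is left alone.
workdir=$(mktemp -d)
printf 'The quick brown fox jimped over the laxy dog.\n' > "$workdir/foo.txt"

misspelled=""
if command -v aspell >/dev/null 2>&1; then
    # "list" reads standard input and prints each unrecognized word.
    # The file itself is never modified, so no backup file is created.
    # "|| true" lets the sketch continue if no dictionary is installed.
    misspelled=$(aspell list < "$workdir/foo.txt") || true
    printf '%s\n' "$misspelled"
else
    echo "aspell is not installed" >&2
fi
```

Since list only reports words, the interactive check command is still needed to choose replacements.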
Showing off our sed editing prowess, we’ll put our spelling mistakes back in so we can reuse our file:
[me@linuxbox ~]$ sed -i 's/lazy/laxy/; s/jumped/jimped/' foo.txt
The sed option -i tells sed to edit the file “in-place,” meaning that rather than sending the edited output to standard output, it will rewrite the file with the changes applied. We also see the ability to place more than one editing command on the line by separating them with a semicolon.
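Because -i overwrites the file, it can be worth keeping a copy of the original. As a minimal sketch, assuming a sed that accepts a suffix attached directly to -i (GNU sed does), the following keeps the unedited text in a second file, much as aspell keeps its .bak copy. The scratch directory, the workdir variable, and the .orig suffix are arbitrary choices for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Start from the corrected sentence.
printf 'The quick brown fox jumped over the lazy dog.\n' > "$workdir/foo.txt"

# Two substitutions separated by a semicolon. The suffix attached to -i
# tells sed to save the untouched original as foo.txt.orig before
# rewriting foo.txt in place.
sed -i.orig 's/lazy/laxy/; s/jumped/jimped/' "$workdir/foo.txt"

cat "$workdir/foo.txt"        # the edited text
cat "$workdir/foo.txt.orig"   # the original text
```

Copying foo.txt.orig back over foo.txt would undo the edit.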
Next, we’ll look at how aspell can handle different kinds of text files. Using a text editor such as vim (the adventurous may want to try sed), we will add some HTML markup to our file:
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>
Now, if we try to spell check our modified file, we run into a problem. If we do it this way:
[me@linuxbox ~]$ aspell check foo.txt
we’ll get this:
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>

1) HTML                  4) Hamel
2) ht ml                 5) Hamil
3) ht-ml                 6) hotel
i) Ignore                I) Ignore all
r) Replace               R) Replace all
a) Add                   l) Add Lower
b) Abort                 x) Exit

?
aspell will see the contents of the HTML tags as misspelled. This problem can be overcome by including the -H (HTML) checking-mode option, like this:
[me@linuxbox ~]$ aspell -H check foo.txt
which will result in this:
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>

1) Mi spelled            6) Misapplied
2) Mi-spelled            7) Miscalled
3) Misspelled            8) Respelled
4) Dispelled             9) Misspell
5) Spelled               0) Misled
i) Ignore                I) Ignore all
r) Replace               R) Replace all
a) Add                   l) Add Lower
b) Abort                 x) Exit

?
In this mode, the HTML markup is ignored and only the non-markup portions of the file are checked. The one exception is the text of alt attributes (commonly called ALT tags), which benefits from checking and is therefore still checked.
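The effect of -H can also be seen without the interactive screen by using aspell's list command, which prints unrecognized words from standard input. This is a minimal sketch under a few assumptions: the scratch directory and the variable names workdir, plain, and html are made up for illustration, and the words reported depend on the installed dictionary.

```shell
# Recreate the HTML version of the file in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/foo.txt" <<'EOF'
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>
EOF

plain=""
html=""
if command -v aspell >/dev/null 2>&1; then
    # Without -H the markup is treated as ordinary text, so tag names
    # may be reported alongside the real misspellings.
    plain=$(aspell list < "$workdir/foo.txt") || true
    # With -H the tags are skipped and only the text between them is checked.
    html=$(aspell -H list < "$workdir/foo.txt") || true
    printf 'Without -H:\n%s\n\n' "$plain"
    printf 'With -H:\n%s\n' "$html"
else
    echo "aspell is not installed" >&2
fi
```

Comparing the two lists shows which reported words come from markup rather than from the text itself.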
Note: By default, aspell will ignore URLs and email addresses in text. This behavior can be overridden with command line options. It is also possible to specify which markup tags are checked and skipped. See the aspell man page for details.