NAME

pretzel - the universal prettyprinter generator

SYNOPSIS

pretzel [-qtgdh] [-o outfile] fileprefix

pretzel [-qtgdh] [-o outfile] file1 file2

DESCRIPTION

Pretzel is a program that generates a prettyprinter module from a formal description of
the way a certain language should be prettyprinted. A prettyprinter is a function or
program that rearranges source code to enhance its readability. Prettyprinters generated
by pretzel output LaTeX source code that can be used within your own documents. Note that
pretzel produces modules, not programs!

You have to provide two input files to pretzel that specify the way given source code
should be prettyprinted. These two files are called the formatted token file (suffix .ft)
and the formatted grammar file (suffix .fg).

From this input, pretzel generates two things: a valid flex(1) file that forms the
prettyprinting scanner and a valid bison(1) input file that can be used to build the
prettyprinting parser (which is the actual prettyprinter). There is a shell script
pretzel-it that facilitates using pretzel (see pretzel-it(1)). This man page is only meant
as a quick reference to pretzel usage. Look into the main documentation of pretzel if you
are new to all this.

Invoking pretzel
Invoking pretzel can take two forms: either specify only the common prefix of the two
input files, or specify both files separately on the command line. If you specify
both files, the formatted token file comes first.

Examples
Say your input files are called foo.ft and foo.fg. Then you can say

pretzel foo

to invoke pretzel properly. If your files are called foo.ft and bar.fg then you would have
to say

pretzel foo.ft bar.fg

to do the job.
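
The two invocation forms can be summarized as a small mapping from command-line arguments to input file names. The following is only a sketch of that mapping, not pretzel's own code; the function name resolve_input_files is hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of pretzel): maps the non-option arguments to
// the names of the formatted token file (.ft) and formatted grammar file (.fg).
std::pair<std::string, std::string>
resolve_input_files(const std::vector<std::string>& args) {
    if (args.size() == 1) {
        // Prefix form: "pretzel foo" reads foo.ft and foo.fg.
        return {args[0] + ".ft", args[0] + ".fg"};
    }
    if (args.size() == 2) {
        // Two-file form: the formatted token file comes first.
        return {args[0], args[1]};
    }
    throw std::invalid_argument("expected a file prefix or two file names");
}
```

With this sketch, the argument list {"foo"} yields foo.ft and foo.fg, while {"foo.ft", "bar.fg"} is passed through unchanged.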

OPTIONS

Pretzel recognizes the following options:

-q Run quietly.

-t Process formatted token file only.

-g Process formatted grammar file only (options -t and -g are mutually
exclusive).

-d Print debug information to the screen.

-h Print full usage message.

-o name
Use name as prefix of the generated output files.

THE INPUT FILES

This section summarizes the format of the input files and the format command primitives
that pretzel supports.

The formatted token file
The formatted token file contains a list of token definitions with their corresponding
"prettyprinted" form. The prettyprinted form of a token will be called an attribute or a
translation.

The general outline of the formatted token file is

declarations

%%

token definitions

Normally, the declarations part is empty. You can put a general description of the file
here (as a C comment) and redefinitions of the default interface go here as well.

The token definitions section of the formatted token file contains a series of token
definitions of the form:

pattern token attribute

The pattern must be a valid regular expression (in terms of flex(1)) and must be
unindented. The token specifies the symbolic name of the token for the pattern and begins
at the first non-whitespace character after the pattern. The token name must be a legal
name for an identifier in Pascal notation and must be all in upper case. (Underscores are
allowed but not at the beginning of a word.)

The attribute for this token, that is, its prettyprinted form, consists of all text
between the two curly braces { and }. Attributes can be either simple strings
(surrounded by double quotes), format commands (see below), your own C++ code (enclosed in
square brackets [ and ], see below) or a combination of these joined together by an
optional + sign. Attribute definitions can span several lines, and the opening { need not
stand on the same line as the token definition; however, subsequent lines must be indented
by at least one blank or one tab.

If you define strings as part of an attribute definition, you have to specify them in a C
kind of fashion, i.e. you can insert newlines and tabs with \n and \t. But if you want to
insert a backslash into a string, you mustn't forget to put two backslashes \\ into the
input file. This is especially noteworthy if you are using TeX as typesetter.
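
As an illustration of the escaping rule, here is a minimal sketch that turns literal output text (for example a TeX fragment) into the double-quoted form that has to be written in the input file. The helper to_attribute_string is hypothetical and not part of pretzel. Embedded double quotes are deliberately not handled, because this page does not describe how they are written.

```cpp
#include <string>

// Hypothetical helper (not part of pretzel): writes literal text as a
// double-quoted string in the C-like notation described above.
// Assumption: only backslash, newline and tab need special treatment.
std::string to_attribute_string(const std::string& text) {
    std::string out = "\"";
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // one backslash becomes two
            case '\n': out += "\\n";  break;  // newline becomes \n
            case '\t': out += "\\t";  break;  // tab becomes \t
            default:   out += c;      break;
        }
    }
    out += '"';
    return out;
}
```

For the TeX fragment {\it followed by a space, the sketch produces "{\\it ", which is the form used in the token definition examples in this page.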

If the definition of the attribute is omitted, pretzel creates an attribute for this
pattern by default. The default attribute consists of the string containing the text
matched by the corresponding pattern.

You may also refer to the matched text by using the sequence **. Thus

"foo" BAR

"foo" BAR { ** }

"foo" BAR { "foo" }

all have the same meaning.

You can use a | sign as a token name; this signals that the current regular expression has
the same token name (and also the same attribute) as the token specified in the following
line (empty lines are ignored). An attribute definition behind a | is illegal. However
you may specify regular expressions with neither a token name nor an attribute to give a
default rule or to eat up whitespace.
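
The effect of | as a token name can be pictured as a backward pass over the list of definitions, in which each | entry takes over the token name of the definition that follows it. This is only a sketch of the rule as described here, not pretzel's implementation. The TokenDef structure and the function name are hypothetical, and an empty token name stands for a pattern without a token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one token definition; token == "" means "no token".
struct TokenDef {
    std::string pattern;
    std::string token;
};

// Replaces every "|" token name by the token name of the following definition.
// Walking backwards lets a chain of "|" lines resolve in a single pass.
std::vector<TokenDef> resolve_shared_names(std::vector<TokenDef> defs) {
    std::string next;  // token name of the definition after the current one
    for (std::size_t i = defs.size(); i-- > 0;) {
        if (defs[i].token == "|") defs[i].token = next;
        next = defs[i].token;
    }
    return defs;
}
```

Applied to the pair "function" | and "procedure" PROC_INTRO, both patterns end up with the token name PROC_INTRO.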

The declarations and the token definitions must be separated by a line containing only the
two characters %%.

Examples
The following examples are all legal token definitions:

[0-9] DIGIT

"{" OPEN { "\\{" indent force }

[a-z][a-z0-9]* ID { "{\\it " ** "}" }

"function" |

"procedure" PROC_INTRO { big_force + ** }

[\t\ \n] |

.

The formatted grammar file
In the formatted grammar file the user encodes the general prettyprinting grammar for the
programming language. This is done by specifying a context free grammar of the language
and by adding information about the creation of new attributes in every rule. Its general
outline looks like this:

token declarations

%%

grammar rules

The token declarations section may be empty and the separator between the two parts of the
file %% must appear unindented on a single line by itself.

The grammar rules section contains the collection of rules of the context free grammar
that can be accompanied by an attribute definition. A rule is specified by stating the
resulting token, a colon and then the series of tokens which will be reduced by this rule.
The rule is ended by a semicolon. A block definition in Pascal for example might look like
this:

block : BEGIN stmt_list END ;

Following the token list on the right side of the colon, there can be an attribute
definition; this definition states how the translation of the produced symbol is obtained
from the tokens on the right side of the rule.

An attribute definition is enclosed in curly braces { and } and can again consist
of strings (in double quotes), format commands or C++ code (enclosed in square brackets [
and ], see below) joined together by an optional +. But here you can also refer to the
attributes of the tokens on the right side of the rule. This is done in a slightly awkward
notation with a number preceded by a $ dollar sign. The numbers refer to the
order of appearance of the symbols on the right side of the rule. So $1 refers to the
first token of the rule, $2 to the second, and so on.

Again attribute definitions are allowed to span several lines and strings must be
specified in C manner.

The attribute definition may be omitted. If this is so, pretzel will by default form the
attribute of the produced symbol from the simple concatenation of the attributes on the
right side of the rule. Of course you may also have empty right sides of a rule (to
produce things out of nothing) or simply concatenate two or more rules resulting in the
same symbol with a |.
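
The $n notation and the default attribute can be modelled in a few lines. The sketch below uses plain strings as a stand-in for pretzel's Attribute objects, and the names Attr, dollar and default_attribute are hypothetical; it only illustrates the 1-based numbering and the default concatenation described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in: real attributes are Attribute objects; plain strings are used here.
using Attr = std::string;

// $n in an attribute definition: 1-based index into the right side of the rule.
const Attr& dollar(const std::vector<Attr>& rhs, std::size_t n) {
    return rhs.at(n - 1);  // throws std::out_of_range for an invalid n
}

// Default attribute when the definition is omitted: the attributes of the
// right side are concatenated in order. An empty right side gives "".
Attr default_attribute(const std::vector<Attr>& rhs) {
    Attr result;
    for (const Attr& a : rhs) result += a;
    return result;
}
```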

For every terminal token that appears in the grammar rules a special line has to be
written into the declarations section of the file. These definitions are of the form

%token tokenname

It is very important not to forget this.

Examples
For example, here again is the possible definition of a block in Pascal, now with an
example attribute definition:

block : BEGIN stmt_list END { $1 $2 force $3 } ;

The attribute of a block will therefore consist of the attributes of the BEGIN and
stmt_list tokens, joined together with a force command and the translation of the END
token.

These two lines mean the same:

stmt : block SEMI ;

stmt : block SEMI { $1 $2 } ;

These are legal rules too:

stmt_list : { force }
| stmt_list stmt SEMI { $1 $2 $3 force };

Comments and Code
There is a very simple way of putting comments into the formatted token and formatted
grammar files. This is done in a C++ kind of manner by preceding the comment with a double
slash //. All characters between this sign and the end of the line are ignored by
pretzel.

In both files you can put additional C/C++ code before and after the definitions/grammar
sections. If you want to insert code at the end of your file, you have to put a second %%
on a line by itself and put the code after it. C/C++ code before the definitions/rules
section has to be enclosed in a %{, %} pair. Inserting extra code is interesting for
people who want to access it from within the attribute definition.

Code within attribute definitions
From version 2.0 onwards, pretzel allows you to insert C++ code into attribute definitions.
This is how pretzel expects you to write code inside your pretzel input files:

Code fragments are enclosed in square brackets. Any square brackets that appear
within the C++ code must be escaped with a backslash. There can be blocks of code before and
after the attribute definition, which are called starting code and ending code. Only one
starting code block and one ending code block are allowed. Both are optional, but if you
want to specify either of them, you need an attribute definition. Starting code is executed
before the attribute of the new token is built; ending code is executed after building the
attribute and before returning to the calling function (in the scanner).

Code parts within attribute definitions must return a pointer to an Attribute class object
(see file attr/attr.nw in the pretzel distribution for details). Within the formatted
token file, the matched text is visible to you in the form of a char* yytext variable. The
symbolic names of the tokens are available by the same name that pretzel gives them.
Starting code, code within attribute definitions and ending code are all optional. But
at any place where they are allowed, only one bracketed code fragment may be placed. Here's
an example from the formatted grammar file:

id : ID { [lookup($1) ? create("{\\bf ") :

create("{\\it ")] $1 "}" };

This example shows how to format an identifier depending on whether it is in a lookup
table or not. Identifiers could be installed in the table for example like this:

typedef : TYPEDEF_LIKE INT_LIKE ID

[ install($3); ]

{ $1 $2 "{\\bf " $3 "}" };

More examples can be found in the Pretzelbook. Common routines to escape identifiers, to
build and manage lookup tables, to convert to and from Attribute* or to output debug
information can be found in the files belonging to the C prettyprinter in the directory
languages/cee of the pretzel distribution.
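
To make the lookup-table idea concrete, here is a minimal, self-contained sketch. It is not the code from languages/cee: the class TypeNameTable and the function format_identifier are hypothetical, and they work on plain strings rather than on Attribute objects.

```cpp
#include <set>
#include <string>

// Hypothetical lookup table for identifiers that should be set in bold,
// for example names introduced by a typedef.
class TypeNameTable {
public:
    void install(const std::string& name) { names_.insert(name); }
    bool lookup(const std::string& name) const { return names_.count(name) != 0; }

private:
    std::set<std::string> names_;
};

// Mirrors the id rule above: bold for installed names, italic otherwise.
std::string format_identifier(const TypeNameTable& table, const std::string& id) {
    return std::string(table.lookup(id) ? "{\\bf " : "{\\it ") + id + "}";
}
```

After install("size_t"), the identifier size_t is wrapped in {\bf ...}, while an unknown identifier such as count is wrapped in {\it ...}.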

The set of format commands
Here's a list of the format commands supported by pretzel and their meaning:
null        empty command.
indent      indents the next line a little more.
outdent     takes back the last indentation (de-indent).
force       forces a line break.
break_space denotes a possible space for a line break.
opt1...opt9 denotes an optional line break with the continuation line indented a little
            with respect to the normal starting position.
backup      denotes a small backspace.
big_force   forces a line break and inserts a little extra space.
no_indent   causes the current line to be output flush left.
cancel      obliterates any break_space, opt, force or big_force command that immediately
            precedes or follows it and also cancels any backup command that follows it.

For a complete reference on how to write pretzel input, look into the Pretzelbook
which is included in the pretzel distribution.

Format command preprocessing
The format commands are preprocessed according to the following two rules:

1. A sequence of consecutive break_space, force, and/or big_force commands is replaced
by a single command (the maximum of the given ones).

2. The cancel command cancels any break_space, opt, force or big_force command that
immediately precedes or follows it and also cancels any backup command that follows
it.
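
The two rules can be sketched as a pass over a list of command names. This is an illustration only and rests on several assumptions that this page does not settle: commands are modelled as strings, the cancel rule is applied before the merge rule, cancel removes a whole run of adjacent commands on either side, and the cancel command itself is dropped from the result. The function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Rank of the three break commands merged by rule 1; 0 means "not one of them".
static int break_rank(const std::string& c) {
    if (c == "break_space") return 1;
    if (c == "force") return 2;
    if (c == "big_force") return 3;
    return 0;
}

// True for commands that a neighbouring cancel removes: break_space, force,
// big_force and opt1...opt9.
static bool cancellable(const std::string& c) {
    if (break_rank(c) > 0) return true;
    return c.size() == 4 && c.compare(0, 3, "opt") == 0 && c[3] >= '1' && c[3] <= '9';
}

std::vector<std::string> preprocess(const std::vector<std::string>& in) {
    // Rule 2 (assumed to run first): apply and drop every cancel command.
    std::vector<std::string> kept;
    bool after_cancel = false;
    for (const std::string& c : in) {
        if (c == "cancel") {
            while (!kept.empty() && cancellable(kept.back())) kept.pop_back();
            after_cancel = true;
            continue;
        }
        if (after_cancel && (cancellable(c) || c == "backup")) continue;
        after_cancel = false;
        kept.push_back(c);
    }
    // Rule 1: merge consecutive break commands into the strongest one.
    std::vector<std::string> out;
    for (const std::string& c : kept) {
        if (break_rank(c) > 0 && !out.empty() && break_rank(out.back()) > 0) {
            if (break_rank(c) > break_rank(out.back())) out.back() = c;
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Under these assumptions, the sequence break_space big_force force collapses to a single big_force, and force cancel backup disappears entirely.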

THE OUTPUT FILES

If pretzel runs without error, you will obtain the definition of a C++ prettyprinter class
in the form of two files. The first file is a valid bison(1) file from which the actual
prettyprinting parser class can be obtained. The second file (generated from the formatted
token file) can be processed with the flex(1) scanner generator to form the prettyprinting
scanner class used by the parser.

The bison file
The generated bison file contains the definitions for a prettyprinting parser class that
is a subclass of the following abstract base class (contained in the file Pparse.h within
the pretzel include directory):

#include<iostream>

#include"attr.h"

#include"output.h"

class Pparse {

public:
Pparse() {};

~Pparse() {};

virtual int prettyprint(istream*, ostream*) = 0;

virtual int prettyprint(istream*, Output*) = 0;
};

The prettyprinter generated by pretzel will be a subclass of the following form:

#include "Pparse.h" // include abstract base class

class PPARSE_NAME : public Pparse {

public:
PPARSE_NAME(); ~PPARSE_NAME();

int prettyprint(istream*, ostream*);

int prettyprint(istream*, Output*);

void debug_on(); void debug_off();
};

The name of the class may be changed by redefining the preprocessor macro PPARSE_NAME
within the formatted grammar file. The actual prettyprinting function is prettyprint that
reads text from an input stream (i.e. a C++ istream object) and outputs the results to an
output stream (i.e. a C++ ostream object, see ios(3C++)). The second overloaded version
of prettyprint takes an Output object (see the file output/output.nw and the Pretzelbook
in the pretzel distribution for details) and uses this to output the prettyprinted code.
The debug functions can be used to turn debugging output to cerr on and off.
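
A caller only needs the prettyprint interface, so code that drives a prettyprinter can be written against the abstract base class. The sketch below is self-contained and therefore does not include the pretzel headers: its Pparse class is a reduced stand-in with only the ostream overload, EchoPparse is a hypothetical placeholder for a generated class, and the return value 0 is merely assumed to signal success.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for the abstract base class from Pparse.h, reduced to the ostream
// overload so that the sketch compiles without the pretzel headers.
class Pparse {
public:
    virtual ~Pparse() {}
    virtual int prettyprint(std::istream*, std::ostream*) = 0;
};

// Hypothetical placeholder for a generated PPARSE_NAME class: it copies the
// input unchanged instead of prettyprinting it.
class EchoPparse : public Pparse {
public:
    int prettyprint(std::istream* in, std::ostream* out) override {
        char c;
        while (in->get(c)) out->put(c);
        return 0;  // assumed to signal success; this page does not say
    }
};

// Runs any prettyprinter on an in-memory string and returns what it wrote.
std::string prettyprint_string(Pparse& pp, const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    pp.prettyprint(&in, &out);
    return out.str();
}
```

With a real generated class, the same helper would return the LaTeX code for the given source text instead of an unchanged copy.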

The flex file
The prettyprinting parser class relies on the service of a prettyprinting scanner that can
be produced using the second pretzel file. It contains a complete definition of a scanner
subclass of this abstract base class (see file Pscan.h in the pretzel include directory):

#include<iostream>

#include"attr.h"

class Pscan {

public:
Pscan(istream*) {}; ~Pscan() {};

virtual int scan(Attribute**) = 0;
};

The scanner must be initialized with a C++ istream pointer from which it takes its input.
A call to the actual scan function returns an integer (the token code of the token just
scanned or 0 on end-of-file) plus a call by reference attribute containing the contents of
the token (see file attr/attr.nw from the pretzel distribution).
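
The calling convention of scan can be illustrated with a small loop that reads tokens until 0 is returned. The sketch is self-contained and uses stand-ins throughout: Attribute is reduced to a struct holding the token text, WordScan is a hypothetical scanner that treats every whitespace-separated word as a token with code 1, and the caller is assumed to own the returned attribute.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for pretzel's Attribute class (see attr/attr.nw for the real one).
struct Attribute {
    std::string text;
};

// Stand-in for the abstract base class from Pscan.h.
class Pscan {
public:
    explicit Pscan(std::istream*) {}
    virtual ~Pscan() {}
    virtual int scan(Attribute**) = 0;
};

// Hypothetical scanner: every whitespace-separated word is a token with code 1.
class WordScan : public Pscan {
public:
    explicit WordScan(std::istream* in) : Pscan(in), in_(in) {}
    int scan(Attribute** attr) override {
        std::string word;
        if (!(*in_ >> word)) return 0;  // 0 signals end-of-file
        *attr = new Attribute{word};    // attribute handed back by reference
        return 1;
    }

private:
    std::istream* in_;
};

// Collects the text of all tokens; assumes the caller owns each attribute.
std::vector<std::string> scan_words(const std::string& source) {
    std::istringstream in(source);
    WordScan scanner(&in);
    std::vector<std::string> texts;
    Attribute* attr = nullptr;
    while (scanner.scan(&attr) != 0) {
        texts.push_back(attr->text);
        delete attr;
        attr = nullptr;
    }
    return texts;
}
```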

The produced prettyprinting scanner class is a subclass and looks like this:

#include "Pscan.h" // include abstract base class

class PSCAN_NAME : public Pscan {

public:
PSCAN_NAME(istream*);

~PSCAN_NAME();

int scan(Attribute**);
};

The name of the scanner can be changed within the formatted token file by redefining the
PSCAN_NAME macro within the declarations section. The scanner class expects to find token
definitions common to the scanner and the parser in a file called ptokdefs.h and will try
to include this file. You either have to provide this file yourself or use the -d option
of Bison to create one that fits a formatted grammar (see bison(1)). You may change the
name of the file that the scanner expects by redefining the PTOKDEFS_NAME macro in the
declarations section of the formatted token file. Common header files for the abstract
base classes and the default subclasses reside in the pretzel include directory.
