

NAME


tagsoup - convert nasty, ugly HTML to clean XHTML

SYNOPSIS


java -jar /usr/share/java/tagsoup.jar [ options ] [ files ]

DESCRIPTION


Rectify arbitrary HTML into clean XHTML, using a tailored description of HTML. The output
will be well-formed XML, but not necessarily valid XHTML.

--files
multiple input files should be processed into corresponding output files

--encoding=encoding
specifies the encoding of input files

--output-encoding=encoding
specifies the encoding of the output (if the encoding name begins with "utf", the
output will not contain character entities; otherwise, all non-ASCII characters are
represented as entities)

--html
output rectified HTML rather than XML, omitting the XML declaration and any
namespace declarations

--method=html
output rectified HTML rather than XML (end-tags are omitted for empty elements, and
no character escaping is done in script and style elements)

--omit-xml-declaration
omit the XML declaration

--lexical
output lexical features (specifically comments and any DOCTYPE declaration)

--nons
suppress namespaces in output

--nobogons
suppress unknown non-HTML elements in output

--nodefaults
suppress default attribute values

--nocolons
change explicit colons in element and attribute names to underscores

--norestart
don't restart any restartable elements

--ignorable
pass through ignorable whitespace (whitespace in element-only content) via SAX
method handler ignorableWhitespace

--any
treat unknown non-HTML elements as allowing any content (default)

--emptybogons
treat unknown non-HTML elements as empty elements

--norootbogons
don't allow unknown non-HTML elements to be root elements

--doctype-system=system-id
force DOCTYPE declaration to be output with specified system identifier

--doctype-public=public-id
force DOCTYPE declaration to be output with specified public identifier

--standalone=[yes|no]
specify standalone pseudo-attribute in output XML declaration

--version=version
specify version pseudo-attribute in output XML declaration (does not affect actual
version of XML output)

--nocdata
treat the CDATA-content elements script and style as ordinary elements (mostly for
testing)

--pyx
output PYX format rather than XML (mostly for testing)

--pyxin
input is PYX-format HTML (mostly for testing)

--reuse
reuse the same Parser object internally (for testing only)

--help
output basic help

--version
output version number
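The --output-encoding rule above decides whether non-ASCII characters are written directly or replaced by references. The following is a minimal sketch of that decision, not TagSoup's own code. It rests on three assumptions that the man page does not state. First, references are written in numeric form (&#N;). Second, the "utf" prefix test ignores case. Third, the class and method names (OutputEscaper, escapeText) are hypothetical.

```java
// Sketch of the --output-encoding escaping rule (hypothetical names).
// Assumptions: numeric character references (&#N;) are used for
// non-ASCII characters, and the "utf" prefix check is case-insensitive.
import java.util.Locale;

class OutputEscaper {
    static String escapeText(String text, String outputEncoding) {
        // A "utf..." encoding can represent every character directly.
        boolean unicodeOutput =
            outputEncoding.toLowerCase(Locale.ROOT).startsWith("utf");
        StringBuilder out = new StringBuilder();
        // Iterate by code point so characters outside the BMP stay intact.
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp > 0x7F && !unicodeOutput) {
                        // Non-ASCII in a non-UTF encoding: emit a reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("caf\u00e9 <b>", "US-ASCII")); // caf&#233; &lt;b&gt;
        System.out.println(escapeText("caf\u00e9 <b>", "UTF-8"));
    }
}
```

Under these assumptions, ISO-8859-1 output still turns é into a reference, even though that encoding could represent é directly. The rule as described depends only on the "utf" prefix.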

TagSoup is a parser and reformatter for nasty, ugly HTML. Its normal processing mode is
to accept HTML files on the command line, or from the standard input if none are given,
and output them as clean XML to the standard output. By default, input is assumed to be in
the platform-local encoding and output is UTF-8; the --encoding and --output-encoding
options override these defaults.

When the --files option is given, each input file is processed into an output file of the
corresponding name, with the extension changed to xhtml. If the extension is already
xhtml, it is changed to xhtml_.
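The --files naming rule can be expressed as a small function. This is a sketch of the rule as described, not TagSoup's implementation. The man page does not say what happens to a name with no extension, so appending ".xhtml" in that case is an assumption. Ignoring dots in directory names is also this sketch's own choice. The class name OutputName is hypothetical.

```java
// Sketch of the --files output-naming rule (hypothetical class name).
// Assumption: a name with no extension simply gets ".xhtml" appended.
class OutputName {
    static String outputName(String input) {
        if (input.endsWith(".xhtml")) {
            return input + "_";                 // already .xhtml -> .xhtml_
        }
        int dot = input.lastIndexOf('.');
        int slash = Math.max(input.lastIndexOf('/'), input.lastIndexOf('\\'));
        if (dot <= slash) {
            // No '.' in the final path component: treat as "no extension".
            return input + ".xhtml";
        }
        return input.substring(0, dot) + ".xhtml"; // replace the extension
    }

    public static void main(String[] args) {
        for (String name : new String[] {"page.html", "notes.xhtml", "README"}) {
            System.out.println(name + " -> " + outputName(name));
        }
    }
}
```

With this sketch, page.html maps to page.xhtml and notes.xhtml maps to notes.xhtml_. The second case keeps an existing XHTML input from being overwritten by its own output.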

TagSoup will repair, by whatever means necessary, violations of XML well-formedness. In
particular, it will fix up malformed attribute names and supply missing attribute-value
quotation marks. More significantly, it supplies end-tags where HTML allows them to be
omitted, and sometimes where it doesn't. It will even supply start-tags where necessary;
for example, if a document begins with a <li> tag, TagSoup will automatically prefix it
with <html><body><ul>.
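Supplying missing start-tags needs some knowledge of which element may contain which. The toy model below illustrates the <li> example above using a tiny, hypothetical "default parent" table: li goes in ul, ul in body, body in html. It walks the table upward from the first element to find the tags to insert. This is a simplified stand-in for the "tailored description of HTML" mentioned in the DESCRIPTION. It is not TagSoup's actual schema or algorithm, and the names ImpliedAncestors and impliedStartTags are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Toy model of start-tag supply (hypothetical table and names).
class ImpliedAncestors {
    // Hypothetical parent table: a tiny stand-in for a real HTML schema.
    private static final Map<String, String> DEFAULT_PARENT = Map.of(
        "li", "ul",
        "ul", "body",
        "p", "body",
        "body", "html");

    // Returns the start-tags to supply, outermost first, so that `first`
    // ends up nested under <html>.
    static List<String> impliedStartTags(String first) {
        Deque<String> chain = new ArrayDeque<>();
        String cur = DEFAULT_PARENT.get(first);
        while (cur != null) {
            chain.addFirst(cur);               // prepend: outermost ends up first
            cur = DEFAULT_PARENT.get(cur);
        }
        return new ArrayList<>(chain);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (String tag : impliedStartTags("li")) {
            sb.append('<').append(tag).append('>');
        }
        System.out.println(sb); // <html><body><ul>
    }
}
```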
