Home
HTML to LaTeX (htmltolatex) is a Java
program for converting HTML (XHTML is also supported) pages into LaTeX
format (or possibly into other markup formats, depending only
on the configuration). The program is distributed under the GNU GPL licence.
Java 1.5 is required to run it.
News
2010-03-01 - looking for project contributors
2006-09-30 - version 1.0 released (a jar file is now provided and two packages can be downloaded: the
'all' package contains the source, javadoc and samples; the 'bin' package contains only the executables)
2005-09-10 - version 0.1 released
Features
CSS is supported, but only properties that define formatting (e.g. color, text-align, font-weight)
only basic selectors are supported (e.g. .class, p.class, #id, p#id), as are grouped selectors (e.g. h1, h2 { ... }); descendant and child selectors such as ul li li or p>span are not supported
mappings between HTML tags and LaTeX commands can be defined
mappings between HTML entities (both named and numerical) and LaTeX commands can be defined
mappings between CSS properties and LaTeX commands can be defined
all configuration options are stored in an XML file
hyperlinks can be converted to footnotes, bibliography items or links using the hyperref package, or they can be ignored
HTML comments are converted to LaTeX comments
HTML comments starting with "LaTeX:" (case-insensitive) are copied to the output file like ordinary content, so LaTeX commands can be embedded in an HTML file
the program tries to recognize badly formed HTML documents (e.g. foo will be converted as foo), but converting valid (or at least completely well-formed) documents is still strongly recommended
title and cite attributes are converted to footnotes
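The selector restrictions listed above can be made concrete with a small check. This is a hypothetical helper, not code from htmltolatex: the class `SelectorSupport`, its `isSupported` method and the regular expression are assumptions chosen to mirror the list (a bare element, `.class`, `#id`, `element.class` or `element#id`, plus comma-separated groups). Descendant and child selectors fail because they contain whitespace or `>`.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not part of htmltolatex) of the CSS selector
 * restrictions described in the Features list.
 */
class SelectorSupport {
    // One simple selector: "p", "p.class", "p#id", ".class" or "#id".
    private static final Pattern SIMPLE = Pattern.compile(
        "[A-Za-z][A-Za-z0-9]*(?:[.#][A-Za-z_][A-Za-z0-9_-]*)?"
        + "|[.#][A-Za-z_][A-Za-z0-9_-]*");

    /** Returns true if every member of a (possibly grouped) selector is simple. */
    static boolean isSupported(String selectorGroup) {
        // "h1, h2" is treated as a group: each comma-separated part must match.
        for (String part : selectorGroup.split(",")) {
            // Descendant ("ul li li") and child ("p>span") selectors contain
            // whitespace or '>', so they never match SIMPLE.
            if (!SIMPLE.matcher(part.trim()).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `isSupported("h1, h2")` and `isSupported("p#id")` return true, while `isSupported("ul li li")` and `isSupported("p>span")` return false.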
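Entity mappings accept both named and numerical forms. As a rough illustration, assuming each entity is first resolved to a Unicode code point and then looked up in a code-point-to-LaTeX table, the hypothetical `EntityMapping` class below treats `&copy;`, `&#169;` and `&#xA9;` alike. htmltolatex reads its mappings from its XML configuration file; the small hard-coded table here is only an assumption for the example and says nothing about that file's format.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of mapping HTML entities (named and numerical) to
 * LaTeX. The tables are illustrative, not htmltolatex's configuration.
 */
class EntityMapping {
    // Named entity -> Unicode code point.
    private static final Map<String, Integer> NAMED = new HashMap<String, Integer>();
    // Unicode code point -> LaTeX replacement.
    private static final Map<Integer, String> TO_LATEX = new HashMap<Integer, String>();

    static {
        NAMED.put("nbsp", 160);
        NAMED.put("copy", 169);
        NAMED.put("amp", 38);
        TO_LATEX.put(160, "~");
        TO_LATEX.put(169, "\\copyright{}");
        TO_LATEX.put(38, "\\&");
    }

    /**
     * Converts "&name;", "&#NNN;" or "&#xHH;" to LaTeX; returns null if the
     * entity is unknown. Malformed numeric entities throw NumberFormatException.
     */
    static String convert(String entity) {
        if (!entity.startsWith("&") || !entity.endsWith(";")) {
            return null;
        }
        String body = entity.substring(1, entity.length() - 1);
        Integer codePoint;
        if (body.startsWith("#x") || body.startsWith("#X")) {
            codePoint = Integer.valueOf(body.substring(2), 16); // hexadecimal form
        } else if (body.startsWith("#")) {
            codePoint = Integer.valueOf(body.substring(1));     // decimal form
        } else {
            codePoint = NAMED.get(body);                        // named form
        }
        return codePoint == null ? null : TO_LATEX.get(codePoint);
    }
}
```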
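The HTML comment rule from the Features list can be sketched as follows. `CommentConverter.convert` is a hypothetical name, and two details are assumptions not stated above: whitespace around the comment text is trimmed before checking for the `LaTeX:` prefix, and each line of an ordinary comment is prefixed with `% `.

```java
/**
 * Hypothetical sketch (not htmltolatex's actual code): comments starting
 * with "LaTeX:" (any case) are passed through, others become LaTeX comments.
 */
class CommentConverter {
    private static final String PREFIX = "LaTeX:";

    /** Converts the text between "<!--" and "-->" into LaTeX output. */
    static String convert(String commentBody) {
        // Assumption: surrounding whitespace inside the comment is ignored.
        String trimmed = commentBody.trim();
        // Case-insensitive check for the "LaTeX:" prefix.
        if (trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            // Copy the remainder to the output as raw LaTeX.
            return trimmed.substring(PREFIX.length()).trim();
        }
        // Ordinary comment: turn every line into a LaTeX comment line.
        String[] lines = trimmed.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append("% ").append(lines[i]);
        }
        return out.toString();
    }
}
```

Under these assumptions, `<!-- LaTeX: \newpage -->` would produce `\newpage` in the output, while `<!-- a note -->` would produce `% a note`.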
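The four hyperlink treatments can be illustrated with a minimal sketch. The `LinkConverter` class, its `Mode` enum and the `citeKey` parameter are hypothetical. The output uses the standard LaTeX `\footnote` and `\cite` commands and hyperref's `\href`; LaTeX special characters in the link text or URL (such as `_`, `%` or `#`) are not escaped here.

```java
/**
 * Hypothetical sketch of the hyperlink options from the Features list;
 * names are illustrative, not htmltolatex's API.
 */
class LinkConverter {
    enum Mode { FOOTNOTE, BIBLIOGRAPHY, HYPERREF, IGNORE }

    /** Renders <a href="url">text</a> as LaTeX according to the chosen mode. */
    static String convert(String text, String url, Mode mode, String citeKey) {
        switch (mode) {
            case FOOTNOTE:
                // Keep the text and move the URL into a footnote.
                return text + "\\footnote{" + url + "}";
            case BIBLIOGRAPHY:
                // Caller is assumed to emit a matching \bibitem{citeKey} entry.
                return text + "~\\cite{" + citeKey + "}";
            case HYPERREF:
                // Requires \usepackage{hyperref} in the preamble.
                return "\\href{" + url + "}{" + text + "}";
            default:
                // IGNORE: keep only the link text.
                return text;
        }
    }
}
```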
Samples
Download
Feedback
If you find a bug or have a feature request, please use
SourceForge.net or email me.
Related Links
Word-to-LaTeX (Word-to-XML) - converts Microsoft Word documents to LaTeX or XML format
Author
Michal Kebrt's personal page (contains the Word-to-LaTeX (Word-to-XML) converter, mkrss (an RSS reader) and
a couple of other programs)
michal.kebrt __AT__ gmail __DOT__ com