textidote

授权协议 GPL-3.0 License
开发语言
所属分类 企业应用、 LaTeX排版系统
软件类型 开源软件
地区 不详
投 递 者 吴刚毅
操作系统 跨平台
开源组织
适用人群 未知
 软件概览

TeXtidote: a correction tool for LaTeX documents and other formats

Downloads

Have you ever thought of using a grammar checker on LaTeX files?

If so, you probably know that the process is far from simple. Since LaTeXdocuments contain special commands and keywords (the so-called "markup") thatare not part of the "real" text, you cannot run a grammar checker directly onthese files: it cannot tell the difference between markup and text. The otheroption is to remove all this markup, leaving only the "clear" text; however,when a grammar tool points to a problem at a specific line in this clear text,it becomes hard to retrace that location in the original LaTeX file.

TeXtidote solves this problem; it can read your original LaTeX file andperform various sanity checks on it: for example, making sure that everyfigure is referenced in the text, enforcing the correct capitalization oftitles, etc. In addition, TeXtidote can remove markup from the file and sendit to the Language Tool library, whichperforms a verification of both spelling and grammar in a dozen languages.What is unique to TeXtidote is that it keeps track of the relative position ofwords between the original and the "clean" text. This means that it cantranslate the messages from Language Tool back to their proper locationdirectly in your source file.

You can see the list of all the rules checked by TeXtidote at the end of thisfile.

TeXtidote also supports spelling and grammar checking of files in the Markdown format.

Getting TeXtidote

You can either install TeXtidote by downloading it manually, or by installingit using a package.

Under Debian systems: install package

Under Debian systems (Ubuntu and derivatives), you can install TeXtidote usingdpkg. Download the latest .deb file in theReleases page; supposeit is called textidote_X.Y.Z_all.deb. You can install TeXtidote by typing:

$ sudo apt-get install ./textidote_X.Y.Z_all.deb

The ./ is mandatory; otherwise the command won't work.

Manual download

You can also download the TeXtidote executable manually: this works on alloperating systems. Simply make sure you have Java version 8 or later installedon your system. Then, download the latestrelease ofTeXtidote; put the JAR in the folder of your choice.

Using TeXtidote

TeXtidote is run from the command line. The TeXtidote repository contains asample LaTeX file calledexample.tex.Download this file and save it to the folder where TeXtidote resides. You thenhave the choice of producing two types of "reports" on the contents of yourfile: an "HTML" report (viewable in a web browser) and a "console" report.

HTML report

To run TeXtidote and perform a basic verification of the file, run:

java -jar textidote.jar --output html example.tex > report.html

In Linux, if you installed TeXtidote using apt-get, you can also call itdirectly by typing:

textidote --output html example.tex > report.html

Here, the --output html option tells TeXtidote to produce a report in HTML format;the > symbol indicates that the output should be saved to a file, whose nameis report.html. TeXtidote will run for some time, and print:

TeXtidote v0.8 - A linter for LaTeX documents
(C) 2018-2019 Sylvain Hallé - All rights reserved

Found 23 warnings(s)
Total analysis time: 2 second(s)

Once the process is over, switch to your favorite web browser, and open thefile report.html (using the File/Open menu). You should see something like this:

Screenshot

As you can see, the page shows your original LaTeX source file, where someportions have been highlighted in various colors. These correspond to regionsin the file where an issue was found. You can hover your mouse over thesecolored regions; a tooltip will show a message that describes the problem.

If you don't write any filename (or write -- as the filename), TeXtidotewill attempt to read one from the standard input.

Plain report

To run TeXtidote and display the results directly in the console, simply omitthe --output html option (you can also use --output plain), and do not redirect the output to a file:

java -jar textidote.jar example.tex

TeXtidote will analyze the file like before, but produce a report that lookslike this:

* L25C1-L25C25 A section title should start with a capital letter. [sh:001]
  \section{a first section}
  ^^^^^^^^^^^^^^^^^^^^^^^^^
* L38C1-L38C29 A section title should not end with a punctuation symbol.
  [sh:002]
  \subsection{ My subsection. }
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* L15C94-L15C99 Add a space before citation or reference. [sh:c:001]
   things, like a citation\cite{my:paper} .The text

Each element of the list corresponds to a "warning", indicating thatsomething in the text requires your attention. For each warning, the positionin the original source file is given: LxxCyy indicates line xx, column yy. Thewarning is followed by a short comment describing the issue, and an excerptfrom the line in question is displayed. The range of characters where theproblem occurs is marked by the "^^^^" symbols below the text. Each of thesewarnings results from the evaluation of some "rule" on the text; an identifierof the rule in question is also shown between brackets.

Single line report

Another option to display the results directly in the console is the single line report:

java -jar textidote.jar --output singleline example.tex

Textidote will analyze the file like before, but this time the report looks like this:

example.tex(L25C1-L25C25): A section title should start with a capital letter. "\section{a first section}"
example.tex(L38C1-L38C29): A section title should not end with a punctuation symbol. "\subsection{ My subsection. }"
example.tex(L15C94-L15C99): Add a space before citation or reference. "things, like a citation\cite{my:paper} .The text"

Each line corresponds to a warning, and is parseable by regular expressionseasily, e.g., for further processing in another tool. The file is given at thebeginning of the line, followed by the position in parentheses. Then, thewarning message is given, and the excerpt causing the warning is printed indouble quotes (""). Note, that sometimes it may happen that a position cannotbe determined. In this case, instead of LxxCyy, ? is printed.

You can disable the use of color in any form of command-line output using the--no-color switch.

Spelling, grammar and style

You can perform further checks on spelling and grammar, by passing the--check option at the command line. For example, to check text in English,you run:

java -jar textidote.jar --check en example.tex

The --check parameter must be accompanied by a two-letter code indicatingthe language to be used. Language Tool is a powerful library that can verifyspelling, grammar, and even provide suggestions regarding style. TeXtidotesimply passes a cleaned-up version of the LaTeX file to Language Tool,retrieves the messages it generates, and coverts the line and column numbersassociated to each message back into line/column numbers of the originalsource file. For more information about the kind of verifications made byLanguage Tool, please refer to its website.

Additionally, the --firstlang lang option can be used to make Language Tool check for false friends in your first language.For example, to check a text in english, when your first language is german, you may run:

java -jar textidote.jar --check en --firstlang de example.tex

The language codes you can use are:

  • de: (Germany) German, and the variants de_AT (Austrian) and de_CH(Swiss)
  • en: (US) English, and the variants en_CA (Canadian) and en_UK(British)
  • es: Spanish
  • fr: French
  • nl: Dutch
  • pt: Portuguese
  • pl: Polish

Using a dictionary

If you have a list of words that you want TeXtidote to ignore when checkingspelling, you can use the --dict parameter to specify the location of atext file:

java -jar textidote.jar --check en --dict dico.txt example.tex

The file dico.txt must be a plain text file contain a list of words to beignored, with each word on a separate line. (The list is case sensitive.)

If you already spell checked you file using Aspell andsaved a local dictionary(as is done for example by thePaperShell environment),TeXtidote can automatically load this dictionary when invoked. Morespecifically, it will look for a file called .aspell.XX.pws in the folderwhere TeXtidote is started (this is the filename Aspell gives to localdictionaries). The characters XX are to be replaced with the two-letterlanguage code. If such a file exists, TeXtidote will load it and mention it atthe console:

Found local Aspell dictionary

Ignoring rules

You may want to ignore some of TeXtidote's advice. You can do so by specifyingrule IDs to ignore with the --ignore command line parameter. For example,the ID of the rule "A section title should start with a capital letter" issh:001 (rule IDs are shown between brackets in the reports given byTeXtidote); to ignore warnings triggered by this rule, you call TeXtidote asfollows:

java -jar textidote.jar --ignore sh:001 myfile.tex

If you want to ignore multiple rules, separate their IDs with a comma (but nospace).

Ignoring environments

TeXtidote can be instructed to remove user-specified environments using the --remove command line parameter. For example:

$ java -jar textidote.jar --remove itemize myfile.tex

This command will remove all text lines between \begin{itemize} and \end{itemize} before further processing the file.

Ignoring macros

The same can be done with macros:

$ java -jar textidote.jar --remove-macros foo myfile.tex

This command will remove all occurrences of use-defined command \foo in the text. Alternate syntaxes like \foo{bar} and \foo[x=y]{bar} are also recognized and deleted.

Replacing strings

Before TeXtidote analyses a file, you can ask it to apply a set offind/replace operations (for example, to replace a macro by some predefinedcharacter string). You can write these patterns into a text file and pass themto the program at the command line:

$ java -jar textidote.jar --replace replacements.txt

Here, replacements.txt is the file that contains the find/replace patterns,fomatted as follows:

# Empty lines beginning with a pound sign are ignored
# Search and replace patterns are separated by a tab
find	replace
foo		bar
# Patterns can also be regular expressions
abc\d+[^x]	123

Reading a sub-file

By default, TeXtidote ignores everything before the \begin{document}command. If you have a large document that consists of multiple included LaTeX"sub-files", and you want to check one such file that does not contain a\begin{document}, you must tell TeXtidote to read all the file using the--read-all command line option. Otherwise, TeXtidote will ignore the wholefile and give you no advice.

TeXtidote also automatically follows sub-files that are embedded from a main document using \input{filename} and \include{filename} (braces are mandatory). Any such non-commented instruction will add the corresponding filename to the running queue. If you want to exclude an \input from being processed, you must surround the line with ignore begin/end comments (see below, Helping TeXtidote).

Removing markup

You can also use TeXtidote just to remove the markup from your original LaTeXfile. This is done with the option --clean:

java -jar textidote.jar --clean example.tex

By default, the resulting "clean" file is printed directly at the console. Tosave it to a file, use a redirection:

java -jar textidote.jar --clean example.tex > clean.txt

You will see that TeXtidote performs a very aggressive deletion of LaTeXmarkup:

  • All figure, table and tabular environments are removed
  • All equations are removed
  • All inline math expressions ($...$) are replaced by "X"
  • All \cite commands are replaced by "0"
  • All \ref commands are replaced by "[0]"
  • Commands that alter text (\textbf, \emph, \uline, \footnote)are removed (but the text is kept)
  • Virtually all other commands are simply deleted

Surprisingly, the result of applying these modifications is a text that isclean and legible enough for a spelling or grammar checker to providesensible advice.

As was mentioned earlier, TeXtidote keeps a mapping between character rangesin the "cleaned" file, and the same character ranges in the original LaTeXdocument. You can get this mapping by using the --map option:

java -jar textidote.jar --clean --map map.txt example.tex > clean.txt

The --map parameter is given the name of a file. TeXtidote will put in thisfile the list of correspondences between character ranges. This file is madeof lines that look like this:

L1C1-L1C24=L1C5-L128
L1C26-L1C28=L1C29-L1C31
L2C1-L2C10=L3C1-L3C10
...

The first entry indicates that characters 1 to 24 in the first line of theclean file correspond to characters 5 to 28 in the first line of the originalLaTeX file --and so on. This mapping can have "holes": for example, character25 line 1 does not correspond to anything in the original file (this happenswhen the "cleaner" inserts new characters, or replaces characters from theoriginal file by something else). Conversely, it is also possible thatcharacters in the original file do not correspond to anything in the cleanfile (this happens when the cleaner deletes characters from the original).

Character encodings

TeXtidote uses the OS default encoding when reading files (e.g. utf-8 in Linux, cp1252 in Windows). You can override this setting using the --encoding command line option:

java -jar textidote.jar --encoding cp1252 example.tex

Using a configuration file

If you need to run TeXtidote with many command line arguments (for example:you load a local dictionary, ignore a few rules, apply replacements, etc.), itmay become tedious to invoke the program with a long list of arguments everytime. TeXtidote can be "configured" by putting those arguments in a textfile called .textidote in the directory from which it is called. Here is anexample of what such a file could contain:

--output html --read-all
--replace replacements.txt
--dict mydict.txt
--ignore sh:001,sh:d:001
--check en mytext.tex

As you can see, arguments can be split across multiple lines. You can thencall TeXtidote without any arguments like this:

textidote > report.html

If you call TeXtidote with command line arguments, they will be merged withwhatever was found in .textidote. You can also tell TeXtidote to explicitlyignore that file and only take into account the command line arguments usingthe --no-config argument.

Markdown input

TeXtidote also supports files in the Markdown format. The only difference is that rules specific to LaTeX (references to figures, citations) are not evaluated.

Simply call TeXtidote with a Markdown input file instead of a LaTeX file. The format is auto-detected by looking at the file extension. However, if you pass a file through the standard input, you must tell TeXtidote that the input file is Markdown by using the command line parameter --type md. Otherwise, TeXtidote assumes by default that the input file is LaTeX.

Helping TeXtidote

It order to get the best results when using TeXtidote, it is advisable thatyou follow a few formatting conventions when writing your LaTeX file:

  • Avoid putting multiple \begin{environment} and/or \end{environment} onthe same line
  • Keep the arguments of a command on a single line. Commands (such as\title{}) that have their opening and closing braces on different linesare not recognized by TeXtidote and will result in garbled output andnonsensical warnings.
  • Do not hard-wrap your paragraphs. It is easier for TeXtidote to detectparagraphs if they have no hard carriage returns inside. (If you need wordwrapping, it is preferable to enable it in your text editor.)
  • Put headings like \section or \paragraph alone on their line andseparate them from the text below by a blank line.

As a rule, it is advisable to first see what your text looks like using the--clean option, to make sure that TeXtidote is performing checks onsomething that makes sense.

If you realize that a portion of LaTeX markup is not handled properly andmesses up the rest of the file, you can tell TeXtidote to ignore a regionusing a special LaTeX comment:

% textidote: ignore begin
Some weird LaTeX markup that TeXtidote does not
understand...
% textidote: ignore end

The lines between textidote: ignore begin and textidote: ignore end willbe handled by TeXtidote as if they were comment lines.
When you are using markdown you can also selectively ignore parts of the document:

<!-- textidote: ignore begin -->
This should be ignored
<!-- textidote: ignore end -->

Linux shortcuts

To make using TeXtidote easier, you can create shortcuts on your system. Hereare a few recommended tips.

First, we recommend you create a folder called /opt/textidote and put thebig textidote.jar file there (this requires root privileges). This step isalready taken care of if you installed the TeXtidote package using apt-get.

Command line shortcut

(This step is not necessary if TeXtidote has been installed with apt-get.)In/usr/local/bin, create a file called textidote with the followingcontents:

#! /bin/bash
java -jar /opt/textidote/textidote.jar "$@"

Make this file executable by typing at the command line:

sudo chmod +x /usr/local/bin/textidote

(These two operations also require root privileges.) From then on, you caninvoke TeXtidote on the command line from any folder by simply typingtextidote, e.g.:

textidote somefile.tex

Desktop shortcut

If you use a desktop environment such as Gnome or Xfce, you can automatethis even further by creating a TeXtidote icon on your desktop. First,create a file called /opt/textidote/textidote-desktop.sh with the followingcontents, and make this file executable:

#!/bin/bash
if [ -x /usr/bin/notify-send ]; then
  err() { notify-send -a TeXtidote -i /opt/textidote/textidote-icon.svg "$*"; }
else
  err() { printf "%s\n" "$*" >&2; }
fi

[ $# -lt 1 ] && err "At least one file should be provided as input" && exit
dir=$(dirname "$1")

pushd "$dir" || err "$dir does not exist" && exit
java -jar /opt/textidote/textidote.jar --check en --output html "$@" > /tmp/textidote.html
popd || exit

xdg-open /tmp/textidote.html &

This script enters into the directory of the file passed as an argument,calls TeXtidote, sends the HTML report to a temporary file, and opens thedefault web browser to show that report.

Then, on your desktop (typically in your ~/Desktop folder), create anotherfile called TeXtidote.desktop with the following contents:

[Desktop Entry]
Version=1.0
Type=Application
Name=TeXtidote
Comment=Check text with TeXtidote
Exec=/opt/textidote/textidote-desktop.sh %F
Icon=/opt/textidote/textidote-icon.svg
Path=
Terminal=false
StartupNotify=false

This will create a new desktop shortcut; make this file executable. From thenon, you can drag LaTeX files from your file manager with your mouse and dropthem on the TeXtidote icon. After the analysis, the report will automaticallypop up in your web browser. Voilà!

Tab completions

You can auto-complete the commands you type at the command-line using the TABkey (as you are probably used to). If you installed TeXtidote using apt-get,auto-completion for Bash comes built-in.You can also enable auto-completion for other shells as follows.

Zsh

Users of Zsh can also enable auto-completion; in your~/.zshrc file, add the line

source /opt/textidote/textidote.zsh

(Create the file if it does not exist.) You must then restart your Zsh shellfor the changes to take effect.

Visual Studio Code integration

Users of Visual Studio Code can integrate TeXtidote by calling it with the --output singleline and --no-color options and parse its results. Moreover, user cphyc also wrote a nice build task.

Emacs integration

Emacs users can benefit from TeXtidote through flycheck.
A dedicated flycheck-checker can be defined as in the following init.el/.emacs snippet (by user soli).

(flycheck-define-checker tex-textidote
    "A LaTeX grammar/spelling checker using textidote.

    See https://github.com/sylvainhalle/textidote"
    :modes (latex-mode plain-tex-mode)
    :command ("java" "-jar" (eval (expand-file-name "~/PATH/TO/textidote.jar")) "--read-all"
              "--check" (eval (if ispell-current-dictionary (substring ispell-current-dictionary 0 2) "en"))
              "--no-color" source-inplace)
    :error-patterns (
                     (warning line-start "* L" line "C" column "-" (one-or-more alphanumeric) " "
                              (message (one-or-more (not (any "]"))) "]")))
							  
(add-to-list 'flycheck-checkers   'tex-textidote)

Rules checked by TeXtidote

Here is a list of the rules that are checked on your LaTeX file by TeXtidote.Each rule has a unique identifier, written between square brackets.

Language Tool

In addition to all the rules below, the --check xx option activates all therules verified by Language Tool(more than 2,000 grammar and spelling errors). Note that the verification timeis considerably longer when using that option.

If the --check option is used, you can add the --languagemodel xx option to find errors using n-gram data. In order to do so, xx must be a path pointing to an n-gram-index directory. Please refer to the LanguageTool page (link above) on how to use n-grams and what this directory should contain.

Style

  • A section title should start with a capital letter. [sh:001]
  • A section title should not end with a punctuation symbol. [sh:002]
  • A section title should not be written in all caps. The LaTeX stylesheettakes care of rendering titles in caps if needed. [sh:003]
  • Use a capital letter when referring to a specific section, chapteror table: 'Section X'. [sh:secmag, sh:chamag, sh:tabmag]
  • A (figure, table) caption should end with a period. [sh:capperiod]

Citations and references

  • There should be one space before a \cite or \ref command [sh:c:001], andno space after [sh:c:002].
  • Do not use 'in [X]' or 'from [X]': the syntax of a sentence should not bechanged by the removal of a citation. [sh:c:noin]
  • Do not mix \cite and \citep or \citet in the same document.[sh:c:mix]
  • When citing more than one reference, do not use multiple \cite commands;put all references in the same \cite. [sh:c:mul, sh:c:mulp]

Figures

  • Every figure should have a label, and every figure should be referenced atleast once in the text. [sh:figref]
  • Use a capital letter when referring to a specific figure: 'Figure X'.[sh:figmag]

Structure

  • A section should not contain a single sub-section. More generally, adivision of level n should not contain a single division of level n+1.[sh:nsubdiv]
  • The first heading of a document should be the one with the highest level.For example, if a document contains sections, the first section cannot bepreceded by a sub-section. [sh:secorder]
  • There should not be a jump down between two non-successive sectionlevels (e.g. a \section followed by a \subsubsection without a\subsection in between). [sh:secskip]
  • You should avoid stacked headings, i.e. consecutive headings withouttext in between. [sh:stacked]

Hard-coding

  • Figures should not refer to hard-coded local paths. [sh:relpath]
  • Do not refer to sections, figures and tables using a hard-coded number.Use \ref instead. [sh:hcfig, sh:hctab, sh:hcsec, sh:hccha]
  • You should not break lines manually in a paragraph with \\. Either start anew paragraph or stay in the current one. [sh:nobreak]
  • If you are writing a research paper, do not hard-code page breaks with\newpage. [sh:nonp]

LaTeX subtleties

  • Use a backslash or a comma after the last period in "i.e.", "e.g." and "et al.";otherwise LaTeX will think it is a full stop ending a sentence. [sh:010, sh:011]
  • There should not be a space before a semicolon or a colon. If in yourlanguage, typographic rules require a space here, LaTeX takes care ofinserting it without your intervention. [sh:d:005, sh:d:006]

Potentially suspicious

  • There should be at least N words between two section headings (currentlyN=100). [sh:seclen]

Building TeXtidote

First make sure you have the following installed:

  • The Java Development Kit (JDK) to compile. TeXtidote requires version 8of the JDK (and probably works with later versions).
  • Ant to automate the compilation and build process

Download the sources for TeXtidote fromGitHub or clone the repositoryusing Git:

git clone git@github.com:sylvainhalle/textidote.git

Compiling

First, download the dependencies by typing:

ant download-deps

Then, compile the sources by simply typing:

ant

This will produce a file called textidote.jar in the folder. Thisfile is runnable and stand-alone, or can be used as a library, so it can bemoved around to the location of your choice.

In addition, the script generates in the docs/doc folder the Javadocdocumentation for using TeXtidote.

Testing

TeXtidote can test itself by running:

ant test

Unit tests are run with jUnit; a detailed report ofthese tests in HTML format is available in the folder tests/junit, whichis automatically created. Code coverage is also computed withJaCoCo; a detailed report is availablein the folder tests/coverage.

About the author

TeXtidote was written by Sylvain Hallé, FullProfessor in the Department of Computer Science and Mathematics atUniversité du Québec à Chicoutimi, Canada.

Like TeXtidote?

TeXtidote is free software licensed under the GNU General Public License3. It is released aspostcardware: if you use andlike the software, please tell the author by sending a postcard of your townat the following address:

Sylvain Hallé
Department of Computer Science and Mathematics
Univerité du Québec à Chicoutimi
555, boulevard de l'Université
Chicoutimi, QC
G7H 2B1 Canada

If you like TeXtidote, you might also want to look atPaperShell, a templateenvironment for writing scientific papers in LaTeX.

Why is it called TeXtidote?

TeXtidote is a play on Antidote, which is a spelling/grammar checker wellknown to French-speaking users and works with word processors. So TeXtidote islike a version of Antidote for TeX.

相关阅读

相关文章

相关问答

相关文档