arxiv-latex-cleaner

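License: Apache-2.0 License
Development language:
Category: Enterprise applications, LaTeX typesetting systems
Software type: Open source software
Region: Unknown
Submitted by: 盖昊东
Operating system: Cross-platform
Open source organization:
Target audience: Unknown
 Software overview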

arxiv_latex_cleaner

This tool allows you to easily clean the LaTeX code of your paper to submit to arXiv. From a folder containing all your code, e.g. /path/to/latex/, it creates a new folder /path/to/latex_arXiv/ that is ready to ZIP and upload to arXiv.

Example call:

arxiv_latex_cleaner /path/to/latex --im_size 500 --images_whitelist='{"images/im.png":2000}'

Or simply from a config file:

arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml

Installation:

pip install arxiv-latex-cleaner
arxiv_latex_cleaner is only compatible with Python >=3

Alternatively, you can download the source code:

git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help

And install as a command-line program directly from the source code:

python setup.py install

Main features:

Privacy-oriented

  • Removes all auxiliary files (.aux, .log, .out, etc.).
  • Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be). These also include \begin{comment}\end{comment} and \iffalse\fi environments.
  • Optionally removes user-defined commands entered with commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
  • Optionally allows you to define custom regex replacement rules through a cleaner_config.yaml file.
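
The comment-removal idea in the bullets above could be sketched roughly as follows. This is an illustration only, not the tool's actual implementation: the function name `strip_comments` is hypothetical, and the regexes deliberately ignore edge cases such as `\\%`, verbatim environments, and `\iffalse ... \fi` blocks.

```python
import re

# \begin{comment} ... \end{comment}, possibly spanning several lines.
_COMMENT_ENV = re.compile(r"\\begin\{comment\}.*?\\end\{comment\}", re.DOTALL)
# An unescaped % and the rest of the line; (?<!\\) leaves \% (a literal percent) alone.
_LINE_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    """Drop comment environments, then cut every %-comment down to a bare %."""
    tex = _COMMENT_ENV.sub("", tex)
    # Keeping the bare % preserves LaTeX's "ignore the line break" behaviour.
    return "\n".join(_LINE_COMMENT.sub("%", line) for line in tex.split("\n"))


sample = "Accuracy is 50\\% % TODO: double-check this number\nDone."
print(strip_comments(sample))
```

The sketch keeps a bare `%` rather than deleting the whole comment, on the assumption that removing it could introduce unwanted whitespace where a comment was used to suppress a line break.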

Size-oriented

There is a 50MB limit on arXiv submissions, so to make it fit:

  • Removes all unused .tex files (those that are not in the root and not included in any other .tex file).
  • Removes all unused images that take up space (those that are not actually included in any used .tex file).
  • Optionally resizes all images to im_size pixels, to reduce the size of the submission. You can whitelist some images to skip the global size using images_whitelist.
  • Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can whitelist some PDFs to skip the global size using images_whitelist.
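
To make the interaction between im_size and images_whitelist concrete, here is a small sketch of how a per-image target size might be chosen. The helper names `size_for` and `target_size` are hypothetical, and two behaviours are assumptions rather than documented facts: the aspect ratio is preserved, and images already smaller than the limit are left untouched.

```python
import json


def size_for(path, im_size, whitelist):
    """Return the whitelist entry for this image, else the global im_size."""
    return whitelist.get(path, im_size)


def target_size(width, height, longest_side):
    """Scale (width, height) so the longest side is at most longest_side."""
    current = max(width, height)
    if current <= longest_side:
        return width, height  # assumption: never upscale
    scale = longest_side / current
    return max(1, round(width * scale)), max(1, round(height * scale))


# Same JSON shape as the --images_whitelist argument in the example call.
whitelist = json.loads('{"images/im.png": 2000}')

for path in ("images/im.png", "images/other.png"):
    limit = size_for(path, 500, whitelist)
    print(path, target_size(4000, 3000, limit))
```

Actual pixel resampling would need an imaging library; only the size arithmetic is shown here.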

TikZ picture source code concealment

To prevent the upload of tikzpicture source code or raw simulation data, this feature:

  • Replaces the tikzpicture environment \begin{tikzpicture} ... \end{tikzpicture} with the respective \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
  • Requires externally compiled TikZ pictures as .pdf files in folder EXTERNAL_TIKZ_FOLDER. See section 53 in the PGF/TikZ manual on TikZ picture externalization.
  • Only replaces environments with a preceding \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}) where the externalized picture_name.pdf filename matches picture_name.
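
The replacement rule described in these bullets can be approximated with a single regular expression. The sketch below is illustrative and not taken from the tool's source: `replace_tikz` is a hypothetical name, and unlike the real feature it does not check that the matching .pdf file actually exists in the external folder.

```python
import re

# \tikzsetnextfilename{name} immediately followed by a tikzpicture environment.
_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{(?P<name>[^}]*)\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,
)


def replace_tikz(tex, external_folder):
    """Swap named tikzpicture environments for \\includegraphics of their PDF."""

    def _sub(match):
        # Build the replacement in a function so backslashes are not
        # interpreted as regex escapes in the replacement string.
        return "\\includegraphics{%s/%s.pdf}" % (external_folder, match.group("name"))

    return _TIKZ.sub(_sub, tex)


source = (
    "\\tikzsetnextfilename{plot}\n"
    "\\begin{tikzpicture}\\draw (0,0) -- (1,1);\\end{tikzpicture}"
)
print(replace_tikz(source, "ext_tikz"))
```

A tikzpicture without a preceding \tikzsetnextfilename command does not match the pattern and is left as it is, which mirrors the last bullet.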

More sophisticated pattern replacement based on regex group captures

Sometimes it is useful to work with a set of custom LaTeX commands when writing a paper. To get rid of them upon arXiv submission, one can simply revert them to plain LaTeX with a regular expression insertion.

{
    "pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
    "insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
    "description" : "Replace figcomp"
}

The pattern above will find all \figcomp{path}{w1}{w2} commands and replace them with \parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}. Note that the insertion template is filled with the named group captures from the pattern. Also note that the replacement is applied before the \includegraphics commands are processed and the corresponding file paths are copied, which ensures that all figure files are copied to the cleaned version. See also cleaner_config.yaml for details on how to specify the patterns.
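
The way named group captures fill the insertion template can be tried out in a few lines of Python. This sketch assumes the template is filled with `str.format`-style substitution, which the doubled braces in the insertion string suggest but which is not confirmed here; `apply_rule` is a hypothetical helper, not a function of the tool.

```python
import re

# The same rule as above, written with Python raw strings.
rule = {
    "pattern": r"(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}",
    "insertion": r"\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}",
}


def apply_rule(tex, rule):
    """Replace every pattern match with the insertion template, filled per match."""

    def _fill(match):
        # {{ and }} in the template become literal braces; {first}, {second}
        # and {third} are replaced by the corresponding named groups.
        return rule["insertion"].format(**match.groupdict())

    return re.sub(rule["pattern"], _fill, tex)


print(apply_rule(r"\figcomp{img.png}{0.5}{0.9}", rule))
```

The spaces inside the template carry over into the result, so the printed command contains extra spaces around the substituted values.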

Usage:

usage: arxiv_latex_cleaner@v0.1.22 [-h] [--resize_images] [--im_size IM_SIZE]
                                   [--compress_pdf]
                                   [--pdf_im_resolution PDF_IM_RESOLUTION]
                                   [--images_whitelist IMAGES_WHITELIST]
                                   [--keep_bib]
                                   [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                                   [--use_external_tikz USE_EXTERNAL_TIKZ]
                                   [--config CONFIG] [--verbose]
                                   input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_whitelist IMAGES_WHITELIST
                        Images (and PDFs) that won't be resized to the default
                        resolution, but the one provided here. Value is pixel
                        for images, and dpi for PDFs, as in --im_size and
                        --pdf_im_resolution, respectively. Format is a
                        dictionary as: '{"path/to/im.jpg": 1000}'
  --keep_bib            Avoid deleting the *.bib files.
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands. For example, to delete
                        all occurrences of \todo1{} and \todo2{}, run the tool
                        with `--commands_to_delete todo1 todo2`. Please note
                        that the positional argument `input_folder` cannot
                        come immediately after `commands_to_delete`, as the
                        parser does not have any way to know if it's another
                        command to delete.
  --use_external_tikz USE_EXTERNAL_TIKZ
                        Folder (relative to input folder) containing
                        externalized tikz figures in PDF format.
  --config CONFIG       Read settings from `.yaml` config file. If command
                        line arguments are provided additionally, the config
                        file parameters are updated with the command line
                        parameters.
  --verbose             Enable detailed output.
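
The precedence described for --config (config file values updated by command-line values) amounts to a dictionary merge. The sketch below illustrates that rule only: `merge_settings` is a hypothetical name, and treating unset command-line options as None is an assumption of this example, not a statement about the tool's parser.

```python
def merge_settings(config_file_params, cli_params):
    """Start from the config file and let explicitly given CLI values win."""
    merged = dict(config_file_params)
    # Assumption: options the user did not pass on the command line are None.
    merged.update({key: value for key, value in cli_params.items() if value is not None})
    return merged


from_file = {"im_size": 500, "keep_bib": True}
from_cli = {"im_size": 800, "keep_bib": None}
print(merge_settings(from_file, from_cli))
```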

Testing:

python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test

Note

This is not an officially supported Google product.
