docx2tex

Converts Microsoft Word docx to LaTeX
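License: BSD-2-Clause License
Category: Enterprise applications, LaTeX typesetting systems
Software type: Open source
Region: Unknown
Submitted by: 连厉刚
Operating system: Cross-platform
Intended audience: Unknown
Software overview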

docx2tex

Converts Microsoft Word's DOCX to LaTeX. Developed by le-tex and based on the transpect framework. The main author of docx2tex and the underlying xml2tex is @mkraetke.

get docx2tex

download the latest release

Download the latest docx2tex release

…or get the source via Git. Note that you have to add the --recursive option in order to clone docx2tex together with its submodules.

git clone https://github.com/transpect/docx2tex --recursive

requirements

  • Java 1.7 up to 1.15 (more recent versions not yet tested). Java 11 has a bug with file URIs and should be avoided; Java 13 is safe again.
  • works on Windows, Linux and Mac OS X

run docx2tex

You can run docx2tex with a Bash script (Linux, Mac OS X, Cygwin) or with the Windows batch script, whose options are somewhat limited compared to the Bash script.

Linux/MacOSX

./d2t [options ...] myfile.docx

Option  Description
-o      path to custom output directory
-c      path to custom docx2tex configuration file
-m      choose MathType source (ole|wmf|ole+wmf)
-f      path to custom fontmaps directory
-p      generate PDF with pdflatex
-t      draw table grid lines
-e      custom XSLT stylesheet for evolve-hub overrides
-x      custom XSLT stylesheet for postprocessing the evolve-hub results
-d      debug mode
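
As a rough illustration of how these options combine, the Python sketch below assembles a d2t command line. The helper names build_d2t_command and run_d2t and their parameters are hypothetical and not part of docx2tex; only flags from the option list above are used, and the path ./d2t assumes the script is started from the docx2tex directory.

```python
import subprocess
from typing import List, Optional


def build_d2t_command(docx: str,
                      out_dir: Optional[str] = None,
                      conf: Optional[str] = None,
                      pdf: bool = False,
                      table_grid: bool = False,
                      debug: bool = False) -> List[str]:
    """Assemble an argument list for the d2t Bash script (hypothetical helper)."""
    cmd = ["./d2t"]
    if out_dir:
        cmd += ["-o", out_dir]   # custom output directory
    if conf:
        cmd += ["-c", conf]      # custom docx2tex configuration file
    if pdf:
        cmd.append("-p")         # also generate a PDF with pdflatex
    if table_grid:
        cmd.append("-t")         # draw table grid lines
    if debug:
        cmd.append("-d")         # debug mode
    cmd.append(docx)             # the input file comes last
    return cmd


def run_d2t(docx: str, **options) -> int:
    """Run d2t and return its exit code; assumes ./d2t exists in the current directory."""
    return subprocess.run(build_d2t_command(docx, **options)).returncode


if __name__ == "__main__":
    # Prints: ./d2t -o out -p myfile.docx
    print(" ".join(build_d2t_command("myfile.docx", out_dir="out", pdf=True)))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.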

Windows

d2t.bat myfile.docx

via XML Calabash

Linux/Mac OSX

calabash/calabash.sh -o result=myfile.tex -o hub=myfile.xml xpl/docx2tex.xpl docx=myfile.docx conf=conf/conf.xml

Windows

calabash\calabash.bat -o result=myfile.tex -o hub=myfile.xml xpl/docx2tex.xpl docx=myfile.docx conf=conf/conf.xml
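
The two Calabash invocations differ only in the launcher script. A minimal Python sketch that builds either variant could look like this; build_calabash_command and its parameters are hypothetical names, and the port and option names (result, hub, docx, conf) are taken from the commands above.

```python
from typing import Dict, List, Optional


def build_calabash_command(docx: str, tex: str, hub: str,
                           conf: str = "conf/conf.xml",
                           windows: bool = False,
                           extra_options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble the XML Calabash call for xpl/docx2tex.xpl (hypothetical helper)."""
    launcher = r"calabash\calabash.bat" if windows else "calabash/calabash.sh"
    cmd = [
        launcher,
        "-o", f"result={tex}",   # output port: the generated LaTeX file
        "-o", f"hub={hub}",      # output port: the intermediate Hub XML
        "xpl/docx2tex.xpl",
        f"docx={docx}",          # pipeline option: input file
        f"conf={conf}",          # pipeline option: configuration file
    ]
    # Further pipeline options are appended in the same name=value form.
    for name, value in (extra_options or {}).items():
        cmd.append(f"{name}={value}")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_calabash_command("myfile.docx", "myfile.tex", "myfile.xml")))
```

Other pipeline options named in this README, such as custom-xsl or custom-font-maps-dir, would presumably be passed through extra_options in the same way.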

configure

The docx2tex pipeline consists of 3 macroscopic steps:

  • docx2hub. This step is hardly configurable. It transforms a docx file to a Hub XML representation.
  • evolve-hub. This is a bag of XSLT modes that, among other things, transform paragraphs with list markers and hanging indentation to proper nested lists, create a nested section hierarchy, group images with their figure titles, etc. Only some of the modes are used by docx2tex, orchestrated by evolve-hub.xpl and configured in detail by evolve-hub-driver.xsl.
  • xml2tex

There are five major hooks for adding your own processing: a CSV configuration; an xml2tex configuration; XSLT that is applied between evolve-hub and xml2tex; XSLT that modifies what happens in evolve-hub; and fontmaps.


You can specify a custom configuration file for docx2tex. There are two different formats to write a configuration.

  • The CSV-based configuration format permits a simple way to map from MS Word styles to LaTeX commands.
  • The xml2tex configuration format is recommended for a deeper level of configuration but requires basic knowledge of XML and XPath.

CSV

For each MS Word style name, create a line with three semicolon-separated values:

  • MS Word style name
  • LaTeX start statement
  • LaTeX end statement

Just follow this example:

Heading 1   ; \chapter{     ; }
Heading 2   ; \section{     ; }
Heading 3   ; \subsection{  ; }
Quote       ; \begin{quote} ; \end{quote}

You can edit CSV files either with a simple text editor or with a spreadsheet application.
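
If the style mapping is maintained elsewhere, the file can also be generated. The following Python sketch writes lines in the three-column layout described above; the names render_style_csv and write_style_csv are hypothetical, and both the UTF-8 encoding and the rejection of semicolons inside a value are assumptions, since the escaping rules are not described here.

```python
from pathlib import Path
from typing import Iterable, Tuple

Row = Tuple[str, str, str]  # (Word style name, LaTeX start, LaTeX end)

STYLE_MAP = [
    ("Heading 1", r"\chapter{", "}"),
    ("Heading 2", r"\section{", "}"),
    ("Heading 3", r"\subsection{", "}"),
    ("Quote", r"\begin{quote}", r"\end{quote}"),
]


def render_style_csv(rows: Iterable[Row]) -> str:
    """Return the CSV text, one 'style ; start ; end' line per row."""
    lines = []
    for style, start, end in rows:
        for field in (style, start, end):
            # Assumption: values must not contain the separator or a line break.
            if ";" in field or "\n" in field:
                raise ValueError(f"unsupported character in field: {field!r}")
        lines.append(f"{style} ; {start} ; {end}")
    return "\n".join(lines) + "\n"


def write_style_csv(path: str, rows: Iterable[Row]) -> None:
    """Write the configuration to disk (UTF-8 is an assumption)."""
    Path(path).write_text(render_style_csv(rows), encoding="utf-8")


if __name__ == "__main__":
    print(render_style_csv(STYLE_MAP), end="")
```

The resulting file would then be passed to d2t with the -c option.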

xml2tex

docx2tex can also be configured by means of an xml2tex configuration file. docx2tex applies the configuration to the intermediate Hub XML file and generates the LaTeX output.

The configuration in conf/conf.xml is used by default and works with the styles defined in Microsoft Word's normal.dot. If you want to configure docx2tex for other styles, you can edit this file or pass a custom configuration file with the conf option.

Learn how to edit this file here.

XSLT between evolve-hub and xml2tex

You can provide an XSLT that works on the result of evolve-hub (with debugging enabled, on the file [basename].debug/evolve-hub/70.docx2tex-postprocess.xml). The location of this XSLT file (an absolute URI, or a path relative to the main directory in which d2t and d2t.bat reside) can be passed to d2t via the -x option. d2t.bat does not support all the flags; if you are confined to Windows and don’t have Cygwin, WSL, or MinGW, you can invoke calabash/calabash.bat yourself, as described under “via XML Calabash”. When invoking the pipeline directly, the additional XSLT’s URI is provided by the custom-xsl option. This processing is applied before the xml2tex configuration, so your XSLT should transform Hub (DocBook namespace) to Hub.

During evolve-hub

If you need to influence what evolve-hub does, you can provide a custom stylesheet for it. Unlike custom-xsl, which is passed as an option, this stylesheet is passed to the pipeline on the input port custom-evolve-hub-driver, or via the -e option of d2t. There is an example of such an XSLT that retains empty paragraphs, which are otherwise removed by default in one of the XSLT passes that comprise evolve-hub. This example was created in response to a user request. If you want to create \chapter, \section, etc. headings from arbitrary docx paragraphs, add a template that sets the paragraph’s @role attribute to Heading1, Heading2, etc. (For paragraphs that are not removed during evolve-hub, this can also be done in the -x stylesheet.) It is strongly advised to xsl:import the default evolve-hub customization (see example).

fontmaps

The docx conversion supports individual fontmaps for mapping non-Unicode characters to Unicode. Note that this is only needed for fonts that are not Unicode-compatible. If you want to map characters from Unicode to LaTeX, use the character map in the xml2tex configuration instead.

Please find further documentation on how to create a fontmap here.

After you have created your fontmap, store it in a directory and pass the path of that directory to docx2tex with the -f option.

If you invoke the docx2tex XProc pipeline (xpl/docx2tex.xpl), you can specify the fontmap directory with the option custom-font-maps-dir.

language tagging

You may have noticed some obscure \foreignlanguage{} or \selectlanguage{} code that doesn't match the actual language used in your TeX document. We have no fancy AI™-based natural language algorithms at work. Instead, docx2tex evaluates the original document language, which typically corresponds to your system settings, and the language setting of the paragraph or character style, which Word uses for auto-correction and hyphenation. docx2tex then filters redundant markup, e.g. by detecting the main language from the character count of each style and its respective language setting. However, when you copy and paste from the World Wide Web, Microsoft Word usually copies the language of the original website as well. This causes most of the weird language markup you may have noticed. We therefore recommend copying and pasting as plain text, and creating new paragraph and character styles when you intentionally want to change the language of a text fragment.
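
The character-count idea can be illustrated with a small Python sketch. This is not the docx2tex implementation (which is written in XSLT); the functions main_language and strip_redundant_language and the (language, text) input shape are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

Run = Tuple[str, str]  # (language tag of the style, text of the run)


def main_language(runs: List[Run]) -> str:
    """Return the language that covers the most characters."""
    counts: Counter = Counter()
    for lang, text in runs:
        counts[lang] += len(text)
    if not counts:
        raise ValueError("no text runs given")
    return counts.most_common(1)[0][0]


def strip_redundant_language(runs: List[Run]) -> List[Tuple[Optional[str], str]]:
    """Keep a language tag only where it differs from the main language."""
    main = main_language(runs)
    return [(None if lang == main else lang, text) for lang, text in runs]


if __name__ == "__main__":
    sample = [
        ("de", "Dies ist ein längerer deutscher Absatz."),
        ("en", "short quote"),
        ("de", "Noch ein Satz."),
    ]
    print(main_language(sample))
    print(strip_redundant_language(sample))
```

In this picture, only the runs that keep a tag would need explicit language markup, while text in the main language is covered by the document-wide language setting.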
