This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.
MITIE is built on top of dlib, a high-performance machine-learning library[1], and makes use of several state-of-the-art techniques including distributional word embeddings[2] and Structural Support Vector Machines[3]. MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other programming languages including Python, R, Java, C, and MATLAB allow users to quickly integrate MITIE into their own applications.
Outside projects have created API bindings for OCaml, .NET, .NET Core, and Ruby. There is also an interactive tool for labeling data and training MITIE.
MITIE's primary API is a C API which is documented in the mitie.h header file. Beyond this, there are many example programs showing how to use MITIE from C, C++, Java, R, or Python 2.7.
Before you can run the provided examples you will need to download the trained model files, which you can do by running:
make MITIE-models
or by simply downloading the MITIE-models-v0.2.tar.bz2 file and extracting it in your MITIE folder. Note that the Spanish and German models are supplied in separate downloads. So if you want to use the Spanish NER model then download MITIE-models-v0.2-Spanish.zip and extract it into your MITIE folder. Similarly for the German model: MITIE-models-v0.2-German.tar.bz2
MITIE comes with a basic streaming NER tool. So you can tell MITIE to process each line of a text file independently and output marked up text with the command:
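After extracting the archive, a script can confirm that the model file is where the examples expect it. The following is a minimal sketch, not part of MITIE: `find_ner_model` is a hypothetical helper, and the only layout it relies on is the `MITIE-models/english/ner_model.dat` path used by the commands in this README. That other languages follow the same `<language>/ner_model.dat` layout is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_ner_model(mitie_root: str, language: str = "english") -> Optional[Path]:
    """Return <mitie_root>/MITIE-models/<language>/ner_model.dat if it exists, else None.

    The <language>/ner_model.dat layout is taken from the English example;
    that other languages use the same layout is an assumption.
    """
    candidate = Path(mitie_root) / "MITIE-models" / language / "ner_model.dat"
    return candidate if candidate.is_file() else None
```

Calling `find_ner_model(".")` from the top level MITIE folder returns `None` when the models have not been downloaded yet, which is a cheap check to run before starting any of the examples.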
cat sample_text.txt | ./ner_stream MITIE-models/english/ner_model.dat
The ner_stream executable can be compiled by running `make` in the top level MITIE folder, or by navigating to the tools/ner_stream folder and running `make`, or by using CMake to build it, which can be done with the following commands:
cd tools/ner_stream
mkdir build
cd build
cmake ..
cmake --build . --config Release
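Once ner_stream is built, the same pipe can be driven from a script instead of the shell. The sketch below is one possible wrapper, not something MITIE ships: `ner_stream_command` and `tag_lines` are hypothetical names, and the default `./ner_stream` executable path simply mirrors the command shown earlier.

```python
import subprocess
from typing import List


def ner_stream_command(model_path: str, executable: str = "./ner_stream") -> List[str]:
    """Build the argument list for the ner_stream tool (executable, then model file)."""
    return [executable, model_path]


def tag_lines(lines: List[str], model_path: str, executable: str = "./ner_stream") -> List[str]:
    """Send each line to the tool on stdin and return the lines it writes to stdout."""
    result = subprocess.run(
        ner_stream_command(model_path, executable),
        input="\n".join(lines) + "\n",  # one input line per text line, as with `cat file |`
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits with a non-zero status
    )
    return result.stdout.splitlines()
```

Passing all lines in a single call avoids starting the tool once per line.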
On a UNIX-like system, the MITIE shared library can be built by running `make` in the top level MITIE folder or by running:
cd mitielib
make
This produces shared and static library files in the mitielib folder. Or you can use CMake to compile a shared library by typing:
cd mitielib
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
Either of these methods will create a MITIE shared library in the mitielib folder.
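To check that a build actually left a library behind, the mitielib folder can be scanned from a script. This is a rough sketch under stated assumptions: `find_shared_libraries` is a hypothetical helper, and matching on "mitie" in the file name plus a common shared-library suffix is a guess at the output naming, not something this README specifies.

```python
from pathlib import Path
from typing import List

# Common shared-library suffixes on Linux, macOS, and Windows.
SHARED_LIB_SUFFIXES = {".so", ".dylib", ".dll"}


def find_shared_libraries(mitielib_dir: str) -> List[Path]:
    """List files in the folder whose name contains 'mitie' and ends in a shared-library suffix."""
    folder = Path(mitielib_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and "mitie" in p.name.lower()
        and p.suffix.lower() in SHARED_LIB_SUFFIXES
    )
```

An empty result after a build suggests the library was written somewhere else or the build did not finish.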
If you compile MITIE using cmake then it will automatically find and use any optimized BLAS libraries on your machine. However, if you compile using regular make then you have to manually locate your BLAS libraries or dlib will default to its built-in, but slower, BLAS implementation. Therefore, to use OpenBLAS when compiling without cmake, locate `libopenblas.a` and `libgfortran.a`, then run `make` as follows:
cd mitielib
make BLAS_PATH=/path/to/libopenblas.a LIBGFORTRAN_PATH=/path/to/libgfortran.a
Note that if your BLAS libraries are not in standard locations cmake will fail to find them. However, you can tell it what folder to look in by replacing `cmake ..` with a statement such as:
cmake -DCMAKE_LIBRARY_PATH=/home/me/place/i/put/blas/lib ..
Once you have built the MITIE shared library, you can go to the examples/python folder and simply run any of the Python scripts. Each script is a tutorial explaining some aspect of MITIE: named entity recognition and relation extraction, training a custom NER tool, or training a custom relation extractor.
You can also install mitie directly from GitHub with this command: `pip install git+https://github.com/mit-nlp/MITIE.git`.
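For orientation, here is a condensed sketch of the kind of named entity recognition call the tutorial scripts in examples/python walk through. It assumes the `mitie` module is importable and exposes `named_entity_extractor` and `tokenize`, and that each extracted entity starts with a token-index range followed by a tag; check these against the tutorial scripts before relying on them. `entity_text` and `extract_entities` are hypothetical helper names.

```python
from typing import Iterable, List, Sequence, Tuple, Union

Token = Union[str, bytes]


def entity_text(tokens: Sequence[Token], token_range: Iterable[int]) -> str:
    """Join the tokens covered by an entity's index range into one string."""
    words = []
    for i in token_range:
        tok = tokens[i]
        # Tokens may be returned as bytes; decode them for display.
        words.append(tok.decode("utf-8") if isinstance(tok, bytes) else tok)
    return " ".join(words)


def extract_entities(text: str, model_path: str) -> List[Tuple[str, str]]:
    """Return (tag, entity text) pairs found in `text` using the given NER model file."""
    # Imported lazily so this file still loads where MITIE is not installed.
    from mitie import named_entity_extractor, tokenize

    ner = named_entity_extractor(model_path)
    tokens = tokenize(text)
    results = []
    for entity in ner.extract_entities(tokens):
        # Assumed layout: entity[0] is the token-index range, entity[1] is the tag.
        token_range, tag = entity[0], entity[1]
        results.append((tag, entity_text(tokens, token_range)))
    return results
```

With the English models downloaded, `extract_entities(some_text, "MITIE-models/english/ner_model.dat")` would be the typical call. Loading the model is the expensive step, so a longer-running program would create the extractor once and reuse it.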
MITIE can be installed as an R package. See the README for more details.
There are example C programs in the examples/C folder. To compile any of them, simply go into its folder and run `make`. Or use CMake like so:
cd examples/C/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release
There are example C++ programs in the examples/cpp folder. To compile any of them, simply go into its folder and run `make`. Or use CMake like so:
cd examples/cpp/ner
mkdir build
cd build
cmake ..
cmake --build . --config Release
There is an example Java program in the examples/java folder. Before you can run it you must compile MITIE's Java interface, which you can do like so:
cd mitielib/java
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
That will place a javamitie shared library and jar file into the mitielib folder. Once you have those two files you can run the example program in examples/java by running run_ner.bat if you are on Windows or run_ner.sh if you are on a POSIX system like Linux or OS X.
Also note that you must have Swig 1.3.40 or newer, CMake 2.8.4 or newer, and the Java JDK installed to compile the MITIE interface. Finally, note that if you are using 64bit Java on Windows then you will need to use a command like:
cmake -G "Visual Studio 10 Win64" ..
instead of `cmake ..` so that Visual Studio knows to make a 64bit library.
You can run a simple regression test to validate your build. Do this by running the following command from the top level MITIE folder:
make test
`make test` builds the example programs and downloads the required example models. If you require a non-standard C++ compiler, change `CC` in `examples/C/makefile` and in `tools/ner_stream/makefile`.
We have built Python 2.7 binaries packaged with sample models for 64bit Linux and Windows (both 32 and 64 bit versions of Python). You can download the precompiled package here: Precompiled MITIE 0.2
We have built Java binaries for the 64bit JVM which work on Linux and Windows. You can download the precompiled package here: Precompiled Java MITIE 0.3. In the file is an examples/java folder. You can run the example by executing the provided .bat or .sh file.
There isn't any paper specifically about MITIE. However, since MITIE is basically just a thin wrapper around dlib, please cite dlib's JMLR paper if you use MITIE in your research:
Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009
@Article{dlib09,
author = {Davis E. King},
title = {Dlib-ml: A Machine Learning Toolkit},
journal = {Journal of Machine Learning Research},
year = {2009},
volume = {10},
pages = {1755-1758},
}
MITIE is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.
Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:
The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
[1] Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009.
[2] Paramveer Dhillon, Dean Foster and Lyle Ungar, Eigenwords: Spectral Word Embeddings, Journal of Machine Learning Research (JMLR), 16, 2015.
[3] T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning, 77(1):27-59, 2009.