
Word2Vec.Net

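A tool for converting words to vector form

License: MIT

Language: .NET

Category: Software Development, Mathematical Computing

Software type: Open source

Region: Unknown

Submitted by: 暴奕

Operating system: Windows

Open-source organization: None

Target audience: Unknown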


Software Overview

Word2Vec.Net is the .NET version of Word2Vec, a tool for converting words to vector form.

Example usage:

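            // ArgPos is assumed to be a helper that returns the index of a flag in args, or -1 if absent
            int i;
            string outputFile = null;
            var builder = Word2VecBuilder.Create();

            if ((i = ArgPos("-train",  args)) > -1)
                builder.WithTrainFile(args[i + 1]);
            if ((i = ArgPos("-output", args)) > -1)
            {
                outputFile = args[i + 1];
                builder.WithOutputFile(outputFile);
            }
            // All other parameters keep their default values
            var word2Vec = builder.Build();
            word2Vec.TrainModel();
            // Load the saved vectors and look up the words closest to a query word
            var distance = new Distance(outputFile);
            BestWord[] bestwords = distance.Search("some_word");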
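
The example above calls `ArgPos` without defining it. Below is a minimal sketch of such a helper, assuming it only needs to find a flag that is followed by a value. The class name `ArgHelper` and the choice to return -1 when the value is missing are illustrative assumptions, not part of Word2Vec.Net.

```csharp
static class ArgHelper
{
    // Hypothetical helper: returns the index of flag in args,
    // or -1 if the flag is absent or has no value after it.
    public static int ArgPos(string flag, string[] args)
    {
        for (int i = 0; i < args.Length; i++)
        {
            if (args[i] == flag)
                return i + 1 < args.Length ? i : -1;
        }
        return -1;
    }
}
```

With this sketch, `args[i + 1]` is only read when a value actually follows the flag, so a trailing `-output` with nothing after it is treated as if the flag were not given.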

Or:

        // More explicit option
        string trainfile = "C:/data.txt";
        string outputFileName = "C:/output.bin";
        var word2Vec = Word2VecBuilder.Create()
            .WithTrainFile(trainfile) // Use text data to train the model
            .WithOutputFile(outputFileName) // Use to save the resulting word vectors / word clusters
            .WithSize(200) // Set size of word vectors; default is 100
            .WithSaveVocubFile() // The vocabulary will be saved to <file>
            .WithDebug(2) // Set the debug mode (default = 2 = more info during training)
            .WithBinary(1) // Save the resulting vectors in binary mode; default is 0 (off)
            .WithCBow(1) // Use the continuous bag of words model; default is 1 (use 0 for skip-gram model)
            .WithAlpha(0.05) // Set the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW
            .WithWindow(7) // Set max skip length between words; default is 5
            .WithSample((float) 1e-3) // Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)
            .WithHs(0) // Use Hierarchical Softmax; default is 0 (not used)
            .WithNegative(5) // Number of negative examples; default is 5, common values are 3 - 10 (0 = not used)
            .WithThreads(5) // Use <int> threads (default 12)
            .WithIter(5) // Run more training iterations (default 5)
            .WithMinCount(5) // This will discard words that appear less than <int> times; default is 5
            .WithClasses(0) // Output word classes rather than word vectors; default number of classes is 0 (vectors are written)
            .Build();

        word2Vec.TrainModel();

        // Load the vectors written by training and look up the words closest to a query word
        var distance = new Distance(outputFileName);
        BestWord[] bestwords = distance.Search("some_word");
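
To give a sense of what a nearest-word lookup such as `distance.Search` involves, here is a small self-contained sketch that ranks words by cosine similarity over an in-memory dictionary of vectors. It assumes cosine similarity as the ranking measure, which is the usual choice for word vectors. It is not the Word2Vec.Net implementation, and the class and method names (`NearestWordsSketch`, `Cosine`, `Nearest`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NearestWordsSketch
{
    // Cosine similarity between two vectors of equal length.
    // Returns NaN if either vector is all zeros.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns up to topN words ranked by similarity to the query word,
    // excluding the query itself. Throws KeyNotFoundException if the
    // query word is not in the dictionary.
    public static List<KeyValuePair<string, double>> Nearest(
        Dictionary<string, float[]> vectors, string query, int topN)
    {
        float[] q = vectors[query];
        return vectors
            .Where(kv => kv.Key != query)
            .Select(kv => new KeyValuePair<string, double>(kv.Key, Cosine(q, kv.Value)))
            .OrderByDescending(kv => kv.Value)
            .Take(topN)
            .ToList();
    }
}
```

Calling `NearestWordsSketch.Nearest(vectors, "some_word", 10)` on a dictionary of word vectors returns the ten entries whose vectors point in the direction most similar to that of `some_word`. This scan is linear in the vocabulary size, which is fine for a sketch but may be slow for very large vocabularies.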

Use Cases

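NLP and Word2Vec: a detailed guide to the Word2Vec algorithm (CBOW and Skip-Gram and how they compare), installation, and usage

Contents: Introduction to the Word2Vec algorithm. 1. Overview of the Word2Vec algorithm: a better text representation (dimensionality reduction), but with the problem of word ambiguity. (1) A worked case on how to find similar words with the Word2Vec algorithm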
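Training Word2Vec with a tool

Contents: Preface; Importing the required packages; Preparing the corpus; Training the model; Using the word vector library. Preface: After covering the basic meaning of text representation, I studied the bag-of-words model, the simplest text representation method, and then Word2Vec. This article contains hands-on code for training Word2Vec with a tool. The code all comes from the book listed in the references, but while studying I found that the book's code appears to target gensim 3, whereas gensim has since been updated to v4, so much of the book's code raises errors. To address this problem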
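Chinese corpus processing for word2vec word vectors (a summary of python gensim word2vec)

Contents: Chinese corpus processing; Method 1: process the corpus into a list; Method 2: the corpus is a file (processed as an iterator); Applying to all files in a directory (approach 1); Applying to all files in a directory (approach 2); class: gensim.models.word2vec.PathLineSentences; For a single-file corpus, use LineSentence; Obtaining a corpus; word2vec Chinese corpus processing and model training in practice; python gensim training word2v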
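How the Word2vec model works, with keras and tensorflow implementations of word2vec

Contents: 1. Introduction to the Word2vec model, with examples; 1.1 Skip-Gram in detail; 1.2 Advantages of word vectors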
A news text classification journey: Word2Vec_Corpus

News text classification: pre-training a Word2vec corpus. Importing the required libraries: import numpy as np import pandas as pd from gensim.models import word2vec Reading the data: train_df = pd.read_csv('../data/train_set.csv', sep='\t') test_df = pd.read_csv('../data/test
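Speeding up Word2Vec training

Anyone studying ML/NLP knows that word2vec is an important application in NLP. Word2Vec is a tool open-sourced by Google that converts the words of a language into vector representations. It obtains word vectors through efficient training on large amounts of data, and these vectors give a good measure of the similarity between words. The models Word2Vec uses include the Continuous Bag of Words model (CBOW) and the Skip-Gram model, where CBOW goes from the original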
