Configuration 1:
(1) GPU: GTX 1050 Ti
(2) System environment: Windows 10, CUDA 9.2
(3) pom dependencies: cuda=9.2, nd4j=1.0.0-beta6
Configuration 2:
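(1) GPU: RTX 3080
(2) System environment: Windows 10, CUDA 11.2 or CUDA 11.6
(3) pom dependencies: cuda=11.2, nd4j=1.0.0-M1.1. Do not use 1.0.0-M1 here: it fails with an error (see below) caused by a bug that no longer occurs in M1.1. Do not use 1.0.0-M2 either: although nd4j-cuda-11.2-platform is published up to 1.0.0-M2, deeplearning4j-cuda-11.2 only goes up to 1.0.0-M1.1.
Note: this shows that as long as the CUDA major version (the number before the first dot of the version) is the same, the CUDA minor version installed on the system and the one used in pom.xml may differ.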
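The rule of thumb in the note above can be written down as a small check. This is a minimal sketch, assuming version strings of the form `major.minor`; the class `CudaMajorCheck` is hypothetical and not part of ND4J or DL4J. A matching major version is only a necessary condition observed in these experiments: case (1) below fails even though the major versions match.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of ND4J/DL4J) encoding the observation above:
 * the system CUDA major version should equal the major version used in
 * nd4j-cuda-${cuda.version}; minor versions may differ.
 */
final class CudaMajorCheck {

    /** Returns the major part of a version string, e.g. "11.6" -> 11. */
    static int major(String cudaVersion) {
        Objects.requireNonNull(cudaVersion, "cudaVersion");
        String v = cudaVersion.trim();
        int dot = v.indexOf('.');
        String head = dot < 0 ? v : v.substring(0, dot);
        return Integer.parseInt(head);
    }

    /** True when the system CUDA and the pom.xml cuda.version share a major version. */
    static boolean sameMajor(String systemCuda, String pomCuda) {
        return major(systemCuda) == major(pomCuda);
    }

    public static void main(String[] args) {
        // Configuration 2: system CUDA 11.6, pom.xml cuda=11.2
        System.out.println(sameMajor("11.6", "11.2")); // true
        // Failing case (2) below: system CUDA 11.6, pom.xml cuda=10.2
        System.out.println(sameMajor("11.6", "10.2")); // false
    }
}
```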
Configurations that failed:
(1) System CUDA 11.2 or 11.6, pom.xml with cuda=11.2 and nd4j=1.0.0-M1
Error log (system environment: laptop with CUDA 11.2; pom dependencies: cuda=11.2, nd4j=1.0.0-M1; the same error occurs with system CUDA 11.6 and the same pom.xml settings):
[main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
[main] ERROR org.deeplearning4j.common.config.DL4JClassLoading - Cannot create instance of class 'org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper'.
java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:103)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:89)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:74)
at org.deeplearning4j.nn.layers.HelperUtils.createHelper(HelperUtils.java:57)
at org.deeplearning4j.nn.layers.recurrent.LSTM.initializeHelper(LSTM.java:53)
at org.deeplearning4j.nn.layers.recurrent.LSTM.<init>(LSTM.java:49)
at org.deeplearning4j.nn.conf.layers.LSTM.instantiate(LSTM.java:78)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:714)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
at zj.rnn.effectiveness.train.wordvector.TestWordVector.main(TestWordVector.java:89)
Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:108)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:89)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:74)
at org.deeplearning4j.nn.layers.HelperUtils.createHelper(HelperUtils.java:57)
at org.deeplearning4j.nn.layers.recurrent.LSTM.initializeHelper(LSTM.java:53)
at org.deeplearning4j.nn.layers.recurrent.LSTM.<init>(LSTM.java:49)
at org.deeplearning4j.nn.conf.layers.LSTM.instantiate(LSTM.java:78)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:714)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
at zj.rnn.effectiveness.train.wordvector.TestWordVector.main(TestWordVector.java:89)
Caused by: java.lang.NoSuchMethodException: org.deeplearning4j.cuda.recurrent.CudnnLSTMHelper.<init>(java.lang.Class, [Ljava.lang.Object;)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.deeplearning4j.common.config.DL4JClassLoading.createNewInstance(DL4JClassLoading.java:103)
... 9 more
Process finished with exit code 1
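The log shows that DL4J creates the cuDNN LSTM helper reflectively and asks for a constructor with the signature `(java.lang.Class, java.lang.Object[])`, which `CudnnLSTMHelper` in 1.0.0-M1 does not declare. The plain-Java sketch below reproduces that failure mode with a hypothetical stand-in class `DemoHelper`; it uses no DL4J code and only illustrates why `getDeclaredConstructor` throws `NoSuchMethodException`.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for a helper class: it only declares a String constructor. */
class DemoHelper {
    private final String dataType;

    DemoHelper(String dataType) {
        this.dataType = dataType;
    }

    String dataType() {
        return dataType;
    }
}

final class ConstructorLookupDemo {

    /** True if `type` declares a constructor with exactly these parameter types. */
    static boolean hasConstructor(Class<?> type, Class<?>... parameterTypes) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor(parameterTypes);
            return ctor.getParameterCount() == parameterTypes.length;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the log: no constructor matches the signature.
            return false;
        }
    }

    public static void main(String[] args) {
        // Signature requested in the log: <init>(java.lang.Class, [Ljava.lang.Object;)
        System.out.println(hasConstructor(DemoHelper.class, Class.class, Object[].class)); // false
        // Signature the stand-in class actually declares
        System.out.println(hasConstructor(DemoHelper.class, String.class)); // true
    }
}
```

Upgrading to 1.0.0-M1.1, as recommended above, is the fix the author used; the sketch only explains the shape of the error.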
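(2) System CUDA 11.6, pom.xml with cuda=10.2 and nd4j=1.0.0-beta7
This error is caused by a mismatch between the CUDA/cuDNN version installed on the system and the one specified in pom.xml. Others attribute it to the RTX 3080's high compute capability, which CUDA 10.2 does not match.
Solution: upgrade to cuda=11.2 and nd4j=1.0.0-M1.1.
Error log (system CUDA 11.6; pom.xml: cuda=10.2, nd4j=1.0.0-beta7):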
[main] WARN org.nd4j.linalg.factory.Nd4jBackend - Skipped [JCublasBackend] backend (unavailable): java.lang.UnsatisfiedLinkError: C:\Users\A\.javacpp\cache\rnn-effective-0.0.1-bin.jar\org\bytedeco\cuda\windows-x86_64\jnicudart.dll: Can't find dependent libraries
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.models.embeddings.inmemory.InMemoryLookupTable$Builder.<init>(InMemoryLookupTable.java:637)
at org.deeplearning4j.models.sequencevectors.SequenceVectors$Builder.presetTables(SequenceVectors.java:941)
at org.deeplearning4j.models.word2vec.Word2Vec$Builder.build(Word2Vec.java:615)
at zj.rnn.effectiveness.util.PrepareWordVector.trainWordVector(PrepareWordVector.java:133)
at zj.rnn.effectiveness.train.wordvector.RnnClassifyWithTrainWordVector.main(RnnClassifyWithTrainWordVector.java:64)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5094)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:270)
... 5 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:221)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5091)
... 6 more
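"Can't find dependent libraries" means that `jnicudart.dll` itself was found, but a DLL it depends on was not. One plausible reading, not confirmed by the log, is that the CUDA 10.2 runtime was not on `PATH` while the system environment pointed at CUDA 11.6. The sketch below checks which `PATH` directories contain a given runtime DLL. The class `CudaRuntimeOnPath` is hypothetical, and the DLL names (`cudart64_102.dll` for 10.2, `cudart64_92.dll` for 9.2, a single `cudart64_110.dll` for all 11.x) are an assumption based on the usual Windows naming.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic (not part of ND4J): where on PATH is a CUDA runtime DLL? */
final class CudaRuntimeOnPath {

    /** Assumed Windows runtime DLL name for CUDA 9.x, 10.x or 11.x. */
    static String cudartDllName(String cudaVersion) {
        String[] parts = cudaVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        if (major == 11) {
            return "cudart64_110.dll"; // assumed: one DLL name shared by all 11.x releases
        }
        if (major < 11) {
            return "cudart64_" + major + minor + ".dll"; // assumed: e.g. 10.2 -> cudart64_102.dll
        }
        throw new IllegalArgumentException("CUDA " + cudaVersion + " is not covered by this sketch");
    }

    /** Returns the directories listed in pathEnv that contain dllName. */
    static List<Path> directoriesContaining(String pathEnv, String dllName) {
        List<Path> hits = new ArrayList<>();
        if (pathEnv == null) {
            return hits;
        }
        for (String entry : pathEnv.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(entry, dllName);
                if (Files.isRegularFile(candidate)) {
                    hits.add(candidate.getParent());
                }
            } catch (InvalidPathException ignored) {
                // Skip malformed PATH entries (e.g. stray quotes).
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String dll = cudartDllName("10.2"); // CUDA version used in the failing pom.xml
        List<Path> dirs = directoriesContaining(System.getenv("PATH"), dll);
        System.out.println(dirs.isEmpty() ? dll + " not found on PATH" : dll + " found in " + dirs);
    }
}
```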
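(3) System CUDA 10.2, pom.xml with cuda=10.2 and nd4j=1.0.0-beta7
Although the word vectors were saved and read back with a matching pair of methods (writeWord2VecModel / readWord2VecModel), reading still failed. In the end, switching to the newer cuda=11.2 and nd4j=1.0.0-M1.1 resolved all of these problems.
With system CUDA 10.2 and cuda=10.2, nd4j=1.0.0-beta7 in pom.xml, the error occurs when the word vectors are read back.
The code that trains, saves and reloads the word vectors: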
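import java.io.File;
import java.io.FileNotFoundException;
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

// hanLpFilePath, wordVectorSize and vectorPath are defined elsewhere in the original program
// 1. Train the word vectors
SentenceIterator iter = null;
try {
iter = new BasicLineIterator(hanLpFilePath);
TokenizerFactory t = new DefaultTokenizerFactory();
Word2Vec vec = new Word2Vec.Builder().minWordFrequency(3) // minimum number of times a word must occur in the corpus (counted over whole training sentences, independent of the window size); for short texts, 1 keeps every word that appears at least once
.epochs(5) // number of training epochs
.layerSize(wordVectorSize) // dimensionality of each word vector
.seed(42).windowSize(8) // context window size: each word considers up to 8 words before and 8 words after it; unrelated to the minimum word frequency
.iterate(iter).tokenizerFactory(t).build();
vec.fit();
// Save the word vectors
WordVectorSerializer.writeWord2VecModel(vec, vectorPath);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
// 2. Read the word vectors back
WordVectors wordVectors = WordVectorSerializer.readWord2VecModel(new File(vectorPath));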
[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
[main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 10.2.89
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [NVIDIA GeForce RTX 3080]; cc: [8.6]; Total memory: [10736893952]
[main] ERROR org.deeplearning4j.models.embeddings.loader.WordVectorSerializer - Cannot read binary model
(unreadable binary content of the model file omitted; the only legible text in it is "syn0.txt")
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readBinaryModel(WordVectorSerializer.java:278)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2444)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
[main] ERROR org.deeplearning4j.models.embeddings.loader.WordVectorSerializer - Unable to guess input file format
java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2447)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readAsBinaryNoLineBreaks(WordVectorSerializer.java:2426)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2413)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
Exception in thread "main" java.lang.RuntimeException: Unable to guess input file format. Please use corresponding loader directly
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2416)
at org.deeplearning4j.models.embeddings.loader.WordVectorSerializer.readWord2VecModel(WordVectorSerializer.java:2372)
at maotiao.train.wordvector.rnn.RnnClassifyWordVector.main(RnnClassifyWordVector.java:79)
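In the log above, the binary loader chokes on content whose only legible text is the entry name `syn0.txt`, which suggests the file on disk is a ZIP archive (the format `writeWord2VecModel` appears to produce), while `readWord2VecModel` ended up trying the binary and text loaders and gave up. Before changing versions, one can confirm what kind of file was actually written. This is a JDK-only sketch; the class `ModelFileInspector` is hypothetical, and the ZIP interpretation is an inference from the log, not a documented guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical diagnostic: is the saved word-vector file a ZIP archive, and what does it hold? */
final class ModelFileInspector {

    /** True if the file starts with the ZIP local-file-header signature "PK\3\4". */
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            int filled = 0;
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    return false; // shorter than 4 bytes: cannot be a ZIP archive
                }
                filled += n;
            }
        }
        return head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    /** Lists the entry names stored in a ZIP archive. */
    static List<String> entryNames(Path file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path model = Paths.get(args[0]); // pass the same path as vectorPath in the code above
        if (looksLikeZip(model)) {
            System.out.println("ZIP archive with entries " + entryNames(model));
        } else {
            System.out.println("not a ZIP archive (plain text or binary word2vec format?)");
        }
    }
}
```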
For the correspondence between GPUs and CUDA versions, see the article "NVIDIA GPU, CUDA, cuDNN, tensorflow-gpu and torch-gpu version correspondence".
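Note: the mappings on the official site give the highest supported version. For example, the RTX 3080 is listed with CUDA 11.7 as its highest version, so in principle any CUDA <= 11.7 works. However, a version below 11 may not match the card's compute capability (CC, the capability level NVIDIA assigns to each GPU; the RTX 3080 has CC 8.6, as the log above also shows, and native support for CC 8.6 was only added in CUDA 11.1), which can also cause errors during model training.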
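The compute-capability constraint can be expressed as a lower bound on the CUDA version. This is a minimal sketch whose table covers only the two cards in this article (GTX 1050 Ti: CC 6.1, first supported in CUDA 8.0; RTX 3080: CC 8.6, first supported in CUDA 11.1); the class `ComputeCapabilityCheck` is hypothetical. A `false` result means no native support, not necessarily that the backend fails to load: the CUDA 10.2 log above still loads the backend on the RTX 3080.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check: does a CUDA version natively support a GPU's compute capability? */
final class ComputeCapabilityCheck {

    // Minimum CUDA toolkit with native support, encoded as major * 100 + minor.
    // Only the compute capabilities of the two cards in this article are listed.
    private static final Map<String, Integer> MIN_CUDA = new HashMap<>();

    static {
        MIN_CUDA.put("6.1", 800);  // Pascal, e.g. GTX 1050 Ti: CUDA 8.0 or later
        MIN_CUDA.put("8.6", 1101); // Ampere GA10x, e.g. RTX 3080: CUDA 11.1 or later
    }

    /** Encodes "11.2" as 1102 so versions compare numerically. */
    static int encode(String version) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major * 100 + minor;
    }

    static boolean nativelySupports(String computeCapability, String cudaVersion) {
        Integer min = MIN_CUDA.get(computeCapability);
        if (min == null) {
            throw new IllegalArgumentException("compute capability " + computeCapability + " is not in this sketch's table");
        }
        return encode(cudaVersion) >= min;
    }

    public static void main(String[] args) {
        System.out.println(nativelySupports("8.6", "10.2")); // false: RTX 3080 with CUDA 10.2
        System.out.println(nativelySupports("8.6", "11.2")); // true: RTX 3080 with CUDA 11.2
        System.out.println(nativelySupports("6.1", "9.2"));  // true: GTX 1050 Ti with CUDA 9.2
    }
}
```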
On the RTX 3080 desktop I installed CUDA 11.6, 11.2 and 10.2 side by side; on the GTX 1050 Ti machine, CUDA 9.2 and 9.0.
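With several toolkits installed side by side, it helps to list which ones Windows knows about. A minimal sketch, assuming the Windows CUDA installer's convention of setting one `CUDA_PATH_V<major>_<minor>` environment variable per installed toolkit (for example `CUDA_PATH_V11_2`); the class `InstalledCudaVersions` is hypothetical. Which runtime ND4J actually uses depends on the DLLs the JVM can find at load time, so treat the output as an inventory only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical inventory of CUDA toolkits registered via CUDA_PATH_Vx_y variables. */
final class InstalledCudaVersions {

    // Assumed installer naming, e.g. CUDA_PATH_V11_2 -> "11.2".
    private static final Pattern VAR = Pattern.compile("CUDA_PATH_V(\\d+)_(\\d+)");

    /** Returns versions such as "10.2", "11.2", "11.6", sorted numerically. */
    static List<String> fromEnvironment(Map<String, String> env) {
        List<String> versions = new ArrayList<>();
        for (String name : env.keySet()) {
            Matcher m = VAR.matcher(name.toUpperCase(Locale.ROOT));
            if (m.matches()) {
                versions.add(m.group(1) + "." + m.group(2));
            }
        }
        versions.sort(Comparator
                .comparingInt((String v) -> Integer.parseInt(v.substring(0, v.indexOf('.'))))
                .thenComparingInt(v -> Integer.parseInt(v.substring(v.indexOf('.') + 1))));
        return versions;
    }

    public static void main(String[] args) {
        System.out.println("Registered toolkits: " + fromEnvironment(System.getenv()));
        // CUDA_PATH without a version suffix names a single toolkit; which one depends on install order or manual edits.
        System.out.println("CUDA_PATH = " + System.getenv("CUDA_PATH"));
    }
}
```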
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>maotiao-classify-gpu</artifactId>
<properties>
<!-- GPU environment 3 -->
<!-- <nd4j.version>1.0.0-beta6</nd4j.version>
<dl4j.version>1.0.0-beta6</dl4j.version>
<cuda.version>9.2</cuda.version>-->
<!-- GPU environment 2 -->
<!-- <nd4j.version>1.0.0-beta6</nd4j.version>
<dl4j.version>1.0.0-beta6</dl4j.version>
<cuda.version>10.2</cuda.version>-->
<!-- GPU environment 1 -->
<nd4j.version>1.0.0-M1.1</nd4j.version>
<dl4j.version>1.0.0-M1.1</dl4j.version>
<cuda.version>11.2</cuda.version>
</properties>
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.25</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.7.1</version>
</dependency>
<!-- Read .xls Excel files -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.13</version>
</dependency>
<!-- Read .xlsx Excel files -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.13</version>
</dependency>
<!-- Excel reading: end -->
<!-- CPU dependencies: start -->
<!--<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native-platform</artifactId>
<version>${nd4j.version}</version>
</dependency>-->
<!-- CPU dependencies: end -->
<!-- GPU dependencies: start -->
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-cuda-${cuda.version}-platform</artifactId>
<version>${nd4j.version}</version>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-cuda-${cuda.version}</artifactId>
<version>${dl4j.version}</version>
</dependency>
<!-- GPU dependencies: end -->
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-core</artifactId>
<version>${dl4j.version}</version>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-nlp</artifactId>
<version>${dl4j.version}</version>
</dependency>
</dependencies>
<version>0.0.1</version>
<groupId>com.tianque</groupId>
<build>
<finalName>${project.artifactId}</finalName>
<plugins>
<!-- Resource copy plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>2.7</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<!-- Java compiler plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.4.0</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>bin</shadedClassifierName>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>org/datanucleus/**</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>