Question:

Error when using sqlContext.read to read a .csv file in Spark

何灼光
2023-03-14


  • I start the Spark shell as follows:

    spark-shell --jars .\spark-csv2.11-1.4.0.jar;.\commons-csv-1.2.jar  (I cannot download these dependencies directly, which is why I use --jars)

    I then read the csv file with the following command:

    scala> val df_1 = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("2008.csv")
    java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
            at com.databricks.spark.csv.package$.<init>(package.scala:27)
            at com.databricks.spark.csv.package$.<clinit>(package.scala)
            at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:235)
            at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:73)
            at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:162)
            at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44)
            at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
            at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
            at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
            at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
            at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
            at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
            at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
            at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
            at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
            at $iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
            at $iwC$$iwC$$iwC.<init>(<console>:47)
            at $iwC$$iwC.<init>(<console>:49)
            at $iwC.<init>(<console>:51)
            at <init>(<console>:53)
            at .<init>(<console>:57)
            at .<clinit>(<console>)
            at .<init>(<console>:7)
            at .<clinit>(<console>)
            at $print(<console>)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
            at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
            at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
            at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
            at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
            at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
            at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
            at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
            at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
            at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
            at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
            at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
            at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
            at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
            at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
            at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
            at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
            at org.apache.spark.repl.Main$.main(Main.scala:31)
            at org.apache.spark.repl.Main.main(Main.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.csv.CSVFormat
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 57 more
    

    After trying the first suggested solution:

    PS C:\Users\319413696\Desktop\graphX> spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
    Ivy Default Cache set to: C:\Users\319413696\.ivy2\cache
    The jars for the packages stored in: C:\Users\319413696\.ivy2\jars
    :: loading settings :: url = jar:file:/C:/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    com.databricks#spark-csv_2.11 added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
            confs: [default]
            found com.databricks#spark-csv_2.11;1.4.0 in local-m2-cache
            found org.apache.commons#commons-csv;1.1 in local-m2-cache
            found com.univocity#univocity-parsers;1.5.1 in local-m2-cache
    downloading file:/C:/Users/319413696/.m2/repository/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0.jar ...
            [SUCCESSFUL ] com.databricks#spark-csv_2.11;1.4.0!spark-csv_2.11.jar (0ms)
    downloading file:/C:/Users/319413696/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar ...
            [SUCCESSFUL ] org.apache.commons#commons-csv;1.1!commons-csv.jar (0ms)
    downloading file:/C:/Users/319413696/.m2/repository/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1.jar ...
            [SUCCESSFUL ] com.univocity#univocity-parsers;1.5.1!univocity-parsers.jar (15ms)
    :: resolution report :: resolve 671ms :: artifacts dl 31ms
            :: modules in use:
            com.databricks#spark-csv_2.11;1.4.0 from local-m2-cache in [default]
            com.univocity#univocity-parsers;1.5.1 from local-m2-cache in [default]
            org.apache.commons#commons-csv;1.1 from local-m2-cache in [default]
            ---------------------------------------------------------------------
            |                  |            modules            ||   artifacts   |
            |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
            ---------------------------------------------------------------------
            |      default     |   3   |   3   |   3   |   0   ||   3   |   3   |
            ---------------------------------------------------------------------
    
    :: problems summary ::
    :::: ERRORS
            Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-javadoc.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-javadoc.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/org/apache/apache/15/apache-15.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/apache/15/apache-15.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/35/commons-parent-35.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-sources.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-src.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

            Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-javadoc.jar (java.net.SocketException: Permission denied: connect)
    
  • 1 answer

    禹正阳
    2023-03-14


  • Give the full path to the jars, and separate them with `,` rather than `;`:

    spark-shell --jars fullpath\spark-csv2.11-1.4.0.jar,fullpath\commons-csv-1.2.jar

    Also make sure you have permission to write temporary files in that folder (DFS).
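    Concretely, the corrected invocation might look like the sketch below. The `C:\jars` directory is an assumption; substitute wherever you actually saved the jars. The key point is that `--jars` expects a single comma-separated list of paths; with `;` as the separator, the second jar never reaches the classpath, which is why `org.apache.commons.csv.CSVFormat` could not be found.

    ```shell
    # Pass both jars to --jars as one comma-separated argument.
    # A semicolon is not a valid --jars separator, so with `;` the
    # commons-csv jar is silently dropped and CSVFormat cannot load.
    spark-shell --jars C:\jars\spark-csv2.11-1.4.0.jar,C:\jars\commons-csv-1.2.jar

    # Inside the shell, the original read should then work:
    #   scala> val df_1 = sqlContext.read.format("com.databricks.spark.csv")
    #                                .option("header", "true").load("2008.csv")
    ```

    Note also that in the `--packages` log above, all three runtime artifacts were resolved from the local m2 cache (`3 | 3 | 3` in the resolution report); the `Server access error` entries concern only optional `-sources`/`-javadoc` jars, so they should not prevent the shell from starting.
    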
