Question:

I get an error when I write PySpark code to connect to Snowflake.

通骁
2023-03-14

Here is my code:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *

sc = SparkContext.getOrCreate()

spark = SparkSession.builder \
    .master("local") \
    .appName("Test") \
    .config('spark.jars','/Users/zhao/Downloads/snowflake-jdbc-3.5.4.jar,/Users/zhao/Downloads/spark-snowflake_2.11-2.3.2.jar') \
    .getOrCreate()

sfOptions = {
  "sfURL" : "xxx",
  "sfUser" : "xxx",
  "sfPassword" : "xxx",
  "sfDatabase" : "xxx",
  "sfSchema" : "xxx",
  "sfWarehouse" : "xxx",
  "sfRole": "xxx"
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("query",  "select * from CustomerInfo limit 10") \
  .load()

I would really appreciate it if someone could give me some ideas :)

1 Answer

任繁
2023-03-14

How are you starting the Jupyter notebook server instance? Have you made sure the PYTHONPATH and SPARK_HOME variables are set correctly, and that there is no Spark instance already running? Also, is your Snowflake Spark connector jar built for the correct Spark and Scala version variant?
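One quick way to see which Spark and Scala versions you are actually running (and therefore which connector jar variant to download) is a check like the sketch below. It is not part of the original answer; it assumes spark-submit is on your PATH and that a local SparkSession can be created, and the "VersionCheck" app name is arbitrary.

import subprocess
from pyspark.sql import SparkSession

# Version reported by the running PySpark session
spark = SparkSession.builder.master("local").appName("VersionCheck").getOrCreate()
print("Spark version reported by the session:", spark.version)

# spark-submit --version also prints the Scala version the build was compiled
# against; the connector jar name must match it, e.g. spark-snowflake_2.11
# for a Scala 2.11 build (Spark 2.x prints this banner to stderr).
result = subprocess.run(["spark-submit", "--version"], capture_output=True, text=True)
print(result.stdout or result.stderr)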

Below is a fully bootstrapped and tested run on a macOS machine for reference (using Homebrew):

# Install JDK8
~> brew tap adoptopenjdk/openjdk
~> brew cask install adoptopenjdk8

# Install Apache Spark (v2.4.5 as of post date)
~> brew install apache-spark

# Install Jupyter Notebooks (incl. optional CLI notebooks)
~> pip3 install --user jupyter notebook

# Ensure we use JDK8 (using very recent JDKs will cause class version issues)
~> export JAVA_HOME="/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"

# Setup environment to allow discovery of PySpark libs and the Spark binaries
# (Uses homebrew features to set the paths dynamically)
~> export SPARK_HOME="$(brew --prefix apache-spark)/libexec"
~> export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/build:${PYTHONPATH}"
~> export PYTHONPATH="$(brew list apache-spark | grep 'py4j-.*-src.zip$' | head -1):${PYTHONPATH}"

# Download jars for dependencies in notebook code into /tmp

# Snowflake JDBC (v3.12.8 used here):
~> curl --silent --location \
'https://search.maven.org/classic/remotecontent?filepath=net/snowflake/snowflake-jdbc/3.12.8/snowflake-jdbc-3.12.8.jar' \
> /tmp/snowflake-jdbc-3.12.8.jar

# Snowflake Spark Connector (v2.7.2 used here)
# But more importantly, a Scala 2.11 and Spark 2.4.x compliant one is fetched
~> curl --silent --location \
'https://search.maven.org/classic/remotecontent?filepath=net/snowflake/spark-snowflake_2.11/2.7.2-spark_2.4/spark-snowflake_2.11-2.7.2-spark_2.4.jar' \
> /tmp/spark-snowflake_2.11-2.7.2-spark_2.4.jar

# Run the jupyter notebook service (opens up in a web browser)
~> jupyter notebook
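Once the notebook is open, a short cell like the following (not part of the original answer) can confirm that the variables exported above actually reached the kernel; the expected values assume the Homebrew layout used in this walkthrough.

import os, sys

# These should point at the Homebrew-installed Spark and the AdoptOpenJDK 8 home
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))
print("JAVA_HOME :", os.environ.get("JAVA_HOME"))

# The PySpark libs and the py4j zip should both appear on the Python path
print([p for p in sys.path if "spark" in p.lower() or "py4j" in p.lower()])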

The code, run in a new Python 3 notebook:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *

sfOptions = {
    "sfURL": "account.region.snowflakecomputing.com",
    "sfUser": "username",
    "sfPassword": "password",
    "sfDatabase": "db_name",
    "sfSchema": "schema_name",
    "sfWarehouse": "warehouse_name",
    "sfRole": "role_name",
}

spark = SparkSession.builder \
    .master("local") \
    .appName("Test") \
    .config('spark.jars','/tmp/snowflake-jdbc-3.12.8.jar,/tmp/spark-snowflake_2.11-2.7.2-spark_2.4.jar') \
    .getOrCreate()

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query",  "select * from CustomerInfo limit 10") \
    .load()

df.show()
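As a follow-up (not in the original answer), the same options can be used to read a whole table via "dbtable" instead of an ad-hoc "query", and to write a DataFrame back to Snowflake; the table names below are hypothetical.

# Read an entire table instead of a query (hypothetical table name)
df_table = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("dbtable", "CustomerInfo") \
    .load()

# Write results back through the same connector (hypothetical target table)
df_table.write.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("dbtable", "CustomerInfo_copy") \
    .mode("overwrite") \
    .save()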