Configuration sources are merged with a fixed precedence: properties set directly on a SparkConf take the highest priority, then flags passed to bin/spark-submit or spark-shell, then entries in the spark-defaults.conf file. The effective configuration is the merge of all three.
Spark properties control application settings and are configured separately for each application. For example, to run in local mode with two threads:
val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("CountingSheep")
val sc = new SparkContext(conf)
Time and byte-size properties must be given with a unit suffix, for example:
Time units:
25ms (milliseconds)
5s (seconds)
10m or 10min (minutes)
3h (hours)
5d (days)
1y (years)
Byte-size units:
1b (bytes)
1k or 1kb (kibibytes = 1024 bytes)
1m or 1mb (mebibytes = 1024 kibibytes)
1g or 1gb (gibibytes = 1024 mebibytes)
1t or 1tb (tebibytes = 1024 gibibytes)
1p or 1pb (pebibytes = 1024 tebibytes)
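A minimal sketch of passing unit-suffixed values on a SparkConf (both property names are standard Spark settings; the values here are arbitrary):
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.network.timeout", "120s") // time property, seconds suffix
  .set("spark.executor.memory", "4g")   // size property, gibibytes suffix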
Alternatively, you can create an empty conf:
val sc = new SparkContext(new SparkConf())
and supply the properties at launch time:
./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
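Running ./bin/spark-submit --help shows the full list of these options.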
bin/spark-submit also reads configuration options from the conf/spark-defaults.conf file, in which each line consists of a key and a value separated by whitespace:
spark.master spark://5.6.7.8:7077
spark.executor.memory 4g
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
You can check the "Environment" tab of the web UI at http://<driver>:4040 to verify that the submitted properties took effect as intended. Only values explicitly set through spark-defaults.conf, SparkConf, or the command line appear there; all other properties fall back to their defaults.
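The same check can be done programmatically; a minimal sketch using the standard SparkConf API (the master and app name here are arbitrary):
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ConfCheck"))
// toDebugString lists every explicitly-set property as key=value, one per line
println(sc.getConf.toDebugString)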
Most of the properties that control internal settings have reasonable default values.
If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that should be included on Spark’s classpath:
hdfs-site.xml, which provides default behaviors for the HDFS client.
core-site.xml, which sets the default filesystem name.
The location of these configuration files varies across CDH and HDP versions, but a common location is inside of /etc/hadoop/conf. Some tools, such as Cloudera Manager, create configurations on-the-fly, but offer a mechanism to download copies of them.
To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/spark-env.sh to a location containing the configuration files.
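For example, using the common /etc/hadoop/conf location mentioned above, spark-env.sh would contain:
# point Spark at the directory holding hdfs-site.xml and core-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf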