Spark-shell
Log level
If the amount of log output in the shell feels distracting, you can adjust the log level to control how much information is printed. Create a file named log4j.properties in the conf directory to manage the logging settings. The Spark developers ship a template for this file, called log4j.properties.template.
To make the logs less verbose, first copy the template to conf/log4j.properties, then find this line:
log4j.rootCategory=INFO, console
and lower the log level so that only warnings and more severe messages are shown:
log4j.rootCategory=WARN, console
Alternatively, you can change the level at runtime from inside the shell with sc.setLogLevel("WARN").
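Putting it together, the edited conf/log4j.properties might look like the sketch below. Only the rootCategory line controls verbosity; the console appender lines follow the shipped template:

```
# Log only warnings and errors to the console
log4j.rootCategory=WARN, console

# Console appender, as defined in log4j.properties.template
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```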
When you open the shell again, you will see far less output:
[hadoop@hadoop000 ~]$ spark-shell --master local[2]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/20 17:30:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/20 17:30:37 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.43.174:4040
Spark context available as 'sc' (master = local[2], app id = local-1574242233951).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Spark-shell and spark-submit
Submitting a Spark Streaming program, starting from word-frequency counting. Based on https://blog.csdn.net/qq_35885488/article/details/102667468
spark-submit execution (production)
./spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.2.0.jar hadoop000 9999
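NetworkWordCount reads text from a TCP socket, so before submitting, start a server on hadoop000 listening on port 9999, for example with netcat:

```
nc -lk 9999
```

Whatever you then type into the netcat session is streamed to the application and counted batch by batch.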
spark-shell execution (testing)
./spark-shell --master local[2]
Paste the following code:
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the shell's SparkContext (sc); each batch covers 3 seconds of data
val ssc = new StreamingContext(sc, Seconds(3))
// Receive lines of text from the socket server on hadoop000:9999
val lines = ssc.socketTextStream("hadoop000", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
// Print each batch's counts, then start the computation
wordCounts.print()
ssc.start()
ssc.awaitTermination()
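To see what each transformation contributes, here is the same pipeline sketched on plain Scala collections, with no Spark required; reduceByKey is modeled with groupBy plus a sum, which gives the same per-batch result:

```scala
// Plain-Scala sketch of the streaming pipeline applied to one batch of lines:
// flatMap splits lines into words, map pairs each word with 1, and the
// groupBy + sum step totals the counts (the reduceByKey analogue).
def countWords(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .groupBy(_._1)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

println(countWords(Seq("hello spark", "hello streaming")))
```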