
Changing the Spark-shell log level and submitting a Spark Streaming program with spark-shell and spark-submit

管景天
2023-12-01

1. Changing the Spark-shell log level

If the log output in the shell is too noisy and distracting, you can reduce the amount of information by adjusting the log level. Logging is configured through a file named log4j.properties in the conf directory, and the Spark developers ship a template for it, called log4j.properties.template.
To make the logs less verbose, first copy this template to conf/log4j.properties, then find the line

log4j.rootCategory=INFO, console

and lower the log level so that only warnings and more severe messages are printed:

log4j.rootCategory=WARN, console

If you now open the shell again, you will see far less output:
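Concretely, the setup amounts to the following commands (a minimal sketch, assuming SPARK_HOME points at your Spark installation directory):

cd $SPARK_HOME/conf
cp log4j.properties.template log4j.properties
# edit log4j.properties and change
#   log4j.rootCategory=INFO, console
# to
#   log4j.rootCategory=WARN, console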

[hadoop@hadoop000 ~]$ spark-shell --master local[2]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/20 17:30:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/20 17:30:37 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.43.174:4040
Spark context available as 'sc' (master = local[2], app id = local-1574242233951).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
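As the startup message above points out, you can also change the level for the current session only, without editing any file, by calling setLogLevel on the SparkContext from inside the shell:

scala> sc.setLogLevel("WARN")

Unlike the log4j.properties change, this setting is lost when the shell exits.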

2. Submitting a Spark Streaming program with spark-shell and spark-submit, starting with word count

Based on https://blog.csdn.net/qq_35885488/article/details/102667468
Running with spark-submit (production). Note that local[2] is not arbitrary: the socket receiver permanently occupies one thread, so a streaming job run locally needs at least two threads for the batches to actually be processed:

./spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/examples/jars/spark-examples_2.11-2.2.0.jar hadoop000 9999
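The last two arguments, hadoop000 and 9999, are the hostname and port the example reads from: NetworkWordCount consumes text from a TCP socket, so something must already be listening on hadoop000:9999 when the job starts. As in the Spark Streaming documentation, the easiest test source is netcat, run in a separate terminal on hadoop000:

nc -lk 9999

Type lines of text into the netcat session, and the word counts for each batch are printed on the driver's console.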

Running with spark-shell (testing):

./spark-shell --master local[2]

With netcat still listening on hadoop000:9999, paste the following code:

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build a StreamingContext with a 3-second batch interval,
// reusing the SparkContext (sc) that spark-shell already provides
val ssc = new StreamingContext(sc, Seconds(3))

// Read lines from the socket, split them into words, and count each word per batch
val lines = ssc.socketTextStream("hadoop000", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()

// Start the computation and block until it is terminated
ssc.start()
ssc.awaitTermination()
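One caveat when testing in the shell: ssc.awaitTermination() blocks until the streaming job ends, and a stopped StreamingContext cannot be restarted. If you leave out awaitTermination() and later want to stop the stream while keeping the shell's SparkContext alive for further work, stop can be told not to tear down the underlying context (a sketch; stopSparkContext is a parameter of StreamingContext.stop):

ssc.stop(stopSparkContext = false)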