Question:

Spark hangs on authentication with a Docker Mesos cluster

孟英锐
2023-03-14

I'm trying to simulate a multi-node Mesos cluster using Docker and ZooKeeper, and to run a simple (py)Spark job on top of it. The Docker containers and the pyspark script all run on the same machine. However, when I execute the Spark script, it hangs at:

No credentials provided. Attempting to register without authentication

The Mesos slave repeatedly outputs:

I0929 14:59:32.925915    62 slave.cpp:1959] Asked to shut down framework 20150929-143802-1224741292-5050-33-0060 by master@172.17.0.73:5050
W0929 14:59:32.926035    62 slave.cpp:1974] Cannot shut down unknown framework 20150929-143802-1224741292-5050-33-0060

while the Mesos master repeatedly outputs:

I0929 14:38:15.169683    39 master.cpp:2094] Received SUBSCRIBE call for framework 'test' at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693
I0929 14:38:15.169845    39 master.cpp:2164] Subscribing framework test with checkpointing disabled and capabilities [  ]
E0929 14:38:15.170361    42 socket.hpp:174] Shutdown failed on fd=15: Transport endpoint is not connected [107]
I0929 14:38:15.170409    36 hierarchical.hpp:391] Added framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.170534    39 master.cpp:1051] Framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693 disconnected
I0929 14:38:15.170549    39 master.cpp:2370] Disconnecting framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693
I0929 14:38:15.170555    39 master.cpp:2394] Deactivating framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693
E0929 14:38:15.170560    42 socket.hpp:174] Shutdown failed on fd=16: Transport endpoint is not connected [107]
I0929 14:38:15.170593    39 master.cpp:1075] Giving framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693 0ns to failover
W0929 14:38:15.170835    41 master.cpp:4482] Master returning resources offered to framework 20150929-143802-1224741292-5050-33-0000 because the framework has terminated or is inactive
I0929 14:38:15.170855    36 hierarchical.hpp:474] Deactivated framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.170990    37 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000] (total: cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000], allocated: ) on slave 20150929-051336-1224741292-5050-19-S0 from framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.171820    41 master.cpp:4469] Framework failover timeout, removing framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693
I0929 14:38:15.171835    41 master.cpp:5112] Removing framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b@127.0.1.1:46693
I0929 14:38:15.172130    41 hierarchical.hpp:428] Removed framework 20150929-143802-1224741292-5050-33-0000
I build the Mesos images with the following Dockerfile:

FROM ubuntu:14.04

ENV MESOS_V 0.24.0

# update
RUN apt-get update
RUN apt-get upgrade -y

# dependencies
RUN apt-get install -y wget openjdk-7-jdk build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev

# mesos
RUN wget http://www.apache.org/dist/mesos/${MESOS_V}/mesos-${MESOS_V}.tar.gz
RUN tar -zxf mesos-*.tar.gz
RUN rm mesos-*.tar.gz
RUN mv mesos-* mesos
WORKDIR mesos
RUN mkdir build
RUN ./configure
RUN make
RUN make install

RUN ldconfig

EXPOSE 5050

ENTRYPOINT ["/bin/bash"]

Inside the master container I start the master with:

LIBPROCESS_IP=${MASTER_IP} mesos-master --registry=in_memory --ip=${MASTER_IP} --zk=zk://172.17.0.75:2181/mesos --advertise_ip=${MASTER_IP}

and inside the slave container I start the slave with:

LIBPROCESS_IP=172.17.0.72 mesos-slave --master=zk://172.17.0.75:2181/mesos
My pyspark script, test_spark.py, is:

import os
import pyspark

src = 'file:///{}/README.md'.format(os.environ['SPARK_HOME'])

leader_ip = '172.17.0.75'
conf = pyspark.SparkConf()
conf.setMaster('mesos://zk://{}:2181/mesos'.format(leader_ip))
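# each Mesos slave downloads and unpacks this Spark distribution for its executors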
conf.set('spark.executor.uri', 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz')
conf.setAppName('my_test_app')

sc = pyspark.SparkContext(conf=conf)

lines = sc.textFile(src)
words = lines.flatMap(lambda x: x.split(' '))
word_count = (words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y))
print(word_count.collect())
When I run the script with python test_spark.py, the driver output is:

15/09/29 11:07:59 INFO SparkContext: Running Spark version 1.5.0
15/09/29 11:07:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/29 11:07:59 WARN Utils: Your hostname, hubble resolves to a loopback address: 127.0.1.1; using 192.168.1.2 instead (on interface em1)
15/09/29 11:07:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/09/29 11:07:59 INFO SecurityManager: Changing view acls to: ftseng
15/09/29 11:07:59 INFO SecurityManager: Changing modify acls to: ftseng
15/09/29 11:07:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ftseng); users with modify permissions: Set(ftseng)
15/09/29 11:08:00 INFO Slf4jLogger: Slf4jLogger started
15/09/29 11:08:00 INFO Remoting: Starting remoting
15/09/29 11:08:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.2:38860]
15/09/29 11:08:00 INFO Utils: Successfully started service 'sparkDriver' on port 38860.
15/09/29 11:08:00 INFO SparkEnv: Registering MapOutputTracker
15/09/29 11:08:00 INFO SparkEnv: Registering BlockManagerMaster
15/09/29 11:08:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-28695bd2-fc83-45f4-b0a0-eefcfb80a3b5
15/09/29 11:08:00 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
15/09/29 11:08:00 INFO HttpFileServer: HTTP File server directory is /tmp/spark-89444c7a-725a-4454-87db-8873f4134580/httpd-341c3da9-16d5-43a4-93ee-0e8b47389fdb
15/09/29 11:08:00 INFO HttpServer: Starting HTTP Server
15/09/29 11:08:00 INFO Utils: Successfully started service 'HTTP file server' on port 51405.
15/09/29 11:08:00 INFO SparkEnv: Registering OutputCommitCoordinator
15/09/29 11:08:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/09/29 11:08:00 INFO SparkUI: Started SparkUI at http://192.168.1.2:4040
15/09/29 11:08:00 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@716: Client environment:host.name=hubble
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@724: Client environment:os.arch=3.19.0-25-generic
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@725: Client environment:os.version=#26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@733: Client environment:user.name=ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=172.17.0.75:2181 sessionTimeout=10000 watcher=0x7fc0962b7176 sessionId=0 sessionPasswd=<null> context=0x7fc078001860 flags=0
I0929 11:08:00.651923 32328 sched.cpp:164] Version: 0.24.0
2015-09-29 11:08:00,652:32221(0x7fc06bfff700):ZOO_INFO@check_events@1703: initiated connection to server [172.17.0.75:2181]
2015-09-29 11:08:00,657:32221(0x7fc06bfff700):ZOO_INFO@check_events@1750: session establishment complete on server [172.17.0.75:2181], sessionId=0x150177fcfc40014, negotiated timeout=10000
I0929 11:08:00.658051 32322 group.cpp:331] Group process (group(1)@127.0.1.1:48692) connected to ZooKeeper
I0929 11:08:00.658083 32322 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0929 11:08:00.658100 32322 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0929 11:08:00.659600 32326 detector.cpp:156] Detected a new leader: (id='2')
I0929 11:08:00.659904 32325 group.cpp:674] Trying to get '/mesos/json.info_0000000002' in ZooKeeper
I0929 11:08:00.661052 32326 detector.cpp:481] A new leading master (UPID=master@172.17.0.73:5050) is detected
I0929 11:08:00.661201 32320 sched.cpp:262] New master detected at master@172.17.0.73:5050
I0929 11:08:00.661798 32320 sched.cpp:272] No credentials provided. Attempting to register without authentication

1 Answer

燕智
2023-03-14

After more experimentation, it looks like it was an issue with the host machine's IP address: the scheduler was using the host's local network address (192.168.xx.xx) when it should have been using the host's address on the Docker network (172.17.xx.xx).

I managed to get things working with:

LIBPROCESS_IP=172.17.xx.xx python test_spark.py

I'm now running into a different error, but it seems unrelated, so I think this command solved my problem.
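As an alternative to prefixing the command, the address can be exported from inside the script before the SparkContext is created, since the JVM that hosts the Mesos scheduler driver inherits the Python process's environment. A minimal sketch, assuming 172.17.0.1 is the host's address on the Docker bridge (substitute the address your host actually has on docker0):

import os

# libmesos reads LIBPROCESS_IP when the scheduler driver starts, so it
# must be in the environment before the SparkContext spawns the JVM.
# 172.17.0.1 is an assumed docker0 bridge address; substitute your own.
os.environ['LIBPROCESS_IP'] = '172.17.0.1'

import pyspark

conf = pyspark.SparkConf()
conf.setMaster('mesos://zk://172.17.0.75:2181/mesos')
conf.set('spark.executor.uri', 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz')
conf.setAppName('my_test_app')

sc = pyspark.SparkContext(conf=conf)  # now registers from a container-reachable address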
