Question:

Strings are equal but behave differently: strings read from an HDFS file do not work with Spark

方绪

2023-03-14

This problem is easier to explain inline with code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

// I have a file (fileToProcess) in HDFS that contains the name of another HDFS file:
val fs = FileSystem.get(new Configuration())
val fileToProcess = "hdfs://production/user/robin/fileToProcess"
// The contents of fileToProcess is just the name of another file.  In this case
// hdfs://production/user/robin/people
// I want to read hdfs://production/user/robin/people and use it in the data frame as coded below.
// However, if I do this by reading the HDFS file (fileToProcess) to get the file name like this:
val file = fs.open(new Path(fileToProcess)).readLine().toString

// I will get a Task not serializable exception on the last line of the script

// If I hardcode the file name like this:
val file2 = "hdfs://production/user/robin/people"
// It works great; however, I can't do this as I don't know the file I need to read in reality
// file2 and file are both Strings and seem equal in every way, so I am really perplexed!

// Here is what I am doing with the file to get the exception

// The contents of people:
// { "person" : "John"}
// { "person" : "Sue"}
// { "person" : "Jean"}
// { "person" : "Jane"}
// { "person" : "John"}

val df = sqlcontext.read.json(file)
val peopleList = df.map(r => r(0).toString).distinct.collect
val anotherList = sc.parallelize(Array("Jean", "Sue", "Bill"))

val peopleListBroadcast = sc.broadcast(peopleList)

// everything works great up to this point

val filteredPeople = anotherList.filter(x=> peopleListBroadcast.value contains x)
// here I get a Task not serializable exception if I use the file name read from the HDFS file but it works fine if I hardcode it (like with file2)

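I have been stuck on this issue for days, and I can't seem to find a workaround either. Is there a difference between the strings that I cannot see? How can two equal strings behave so differently? Please help, I am going nuts trying to figure this out!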

The specific exception I get is:

Caused by: java.io.NotSerializableException: org.apache.hadoop.hdfs.DistributedFileSystem
Serialization stack:
- object not serializable (class: org.apache.hadoop.hdfs.DistributedFileSystem, value: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1011603383_1, ugi=robin (auth:SIMPLE)]])
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$iwC$$iwC$iwC, name: fs, type: class org.apache.hadoop.fs.FileSystem)

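By the way, I am using Spark 1.6.1 and Scala 2.10.5. I think anyone should be able to reproduce this by creating the two files in HDFS and then pasting the code above into spark-shell.

Thanks, Robin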

2 answers

危寒

2023-03-14

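Another option is to read the file name through Spark itself, so that no FileSystem handle is needed:

val file = sc.textFile(fileToProcess, 1).first()

This gives you the file name as a String.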

穆季萌

2023-03-14

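It has nothing to do with the strings. You have put an instance of org.apache.hadoop.fs.FileSystem into scope, and it is not serializable. In spark-shell, a closure that refers to a value defined in the REPL (here peopleListBroadcast) captures the enclosing wrapper object, which also holds fs as a field; that is why the serialization stack names the field fs. Marking it as transient should solve this particular problem: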

@transient val fs = FileSystem.get(new Configuration())
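
To see the mechanism without a cluster, here is a minimal sketch that uses plain Java serialization instead of Spark. All names in it (FakeFileSystem, ReplWrapper, FixedWrapper, SerializationDemo) are hypothetical stand-ins: FakeFileSystem plays the role of the non-serializable FileSystem, and the wrapper classes loosely imitate the object that spark-shell generates around REPL lines. It is a simplification of what the shell really does, not a copy of it.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.hadoop.fs.FileSystem: NOT Serializable.
class FakeFileSystem

// Hypothetical stand-in for the wrapper that spark-shell builds around REPL lines.
class ReplWrapper extends Serializable {
  val fs = new FakeFileSystem            // non-serializable field, like `fs` in the question
  val names = Array("John", "Sue", "Jean", "Jane")
  // The function reads the field `names`, so it captures `this`,
  // and serializing `this` drags the `fs` field along with it.
  def predicate: String => Boolean = x => names.contains(x)
}

// Same wrapper, but `fs` is marked @transient and is skipped by serialization.
class FixedWrapper extends Serializable {
  @transient val fs = new FakeFileSystem
  val names = Array("John", "Sue", "Jean", "Jane")
  def predicate: String => Boolean = x => names.contains(x)
}

object SerializationDemo {
  // Returns true if Java serialization of `obj` succeeds,
  // false if it hits a non-serializable object on the way.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(serializes(new ReplWrapper().predicate))  // closure drags `fs` along
    println(serializes(new FixedWrapper().predicate)) // `fs` is skipped
  }
}
```

Note that a transient field is not restored on deserialization, so on the executors fs would be null. That is harmless here because the filter closure never uses fs; it only needs peopleListBroadcast.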
