val splitSeed = 5043
val Array(trainingData, testData) = df3.randomSplit(Array(0.7, 0.3), splitSeed)
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
trainingData.show(20)
// Fit the model
val model = lr.fit(trainingData)
// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
// Run the model on the test features to get predictions
val predictions = model.transform(testData)
testData.show()
// The model's transform produced new columns: rawPrediction, probability and prediction
predictions.show()
// Use MLlib to evaluate: convert the DataFrame to an RDD
val myRdd = predictions.select("rawPrediction", "label").rdd
val predictionAndLabels = myRdd.map(x => (x(0).asInstanceOf[DenseVector](1), x(1).asInstanceOf[Double]))
// Instantiate metrics object
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
println("area under the precision-recall curve: " + metrics.areaUnderPR)
println("area under the receiver operating characteristic (ROC) curve : " + metrics.areaUnderROC)
// A precision-recall curve plots (precision, recall) points for different threshold values, while a
// receiver operating characteristic (ROC) curve plots (recall, false positive rate) points.
// The closer the area under the ROC curve is to 1, the better the model's predictions.
When I try to inspect the areaUnderPR property, I run into the following error. This is the output of predictions.show():
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
| id|thickness|size|shape|madh|epsize|bnuc|bchrom|nNuc|mit|clas|clasLogistic| features|label| rawPrediction| probability|prediction|
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
| 63375| 9.0| 1.0| 2.0| 6.0| 4.0|10.0| 7.0| 7.0|2.0| 4| 1|[9.0,1.0,2.0,6.0,...| 1.0|[0.36391634252951...|[0.58998813846052...| 0.0|
|128059| 1.0| 1.0| 1.0| 1.0| 2.0| 5.0| 5.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[0.81179252636135...|[0.69249134920886...| 0.0|
|145447| 8.0| 4.0| 4.0| 1.0| 2.0| 9.0| 3.0| 3.0|1.0| 4| 1|[8.0,4.0,4.0,1.0,...| 1.0|[0.06964047482828...|[0.51740308582457...| 0.0|
|183913| 1.0| 2.0| 2.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,2.0,2.0,1.0,...| 0.0|[0.96139876234944...|[0.72340177322811...| 0.0|
|342245| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,3.0,1.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|434518| 3.0| 1.0| 1.0| 1.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[3.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|493452| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,3.0,1.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|508234| 7.0| 4.0| 5.0|10.0| 2.0|10.0| 3.0| 8.0|2.0| 4| 1|[7.0,4.0,5.0,10.0...| 1.0|[-0.0809133769755...|[0.47978268474014...| 1.0|
|521441| 5.0| 1.0| 1.0| 2.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[5.0,1.0,1.0,2.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|527337| 4.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|534555| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|535331| 3.0| 1.0| 1.0| 1.0| 3.0| 1.0| 2.0| 1.0|1.0| 2| 0|[3.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|558538| 4.0| 1.0| 3.0| 3.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,3.0,3.0,...| 0.0|[0.95750903648839...|[0.72262279564412...| 0.0|
|560680| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|601265| 10.0| 4.0| 4.0| 6.0| 2.0|10.0| 2.0| 3.0|1.0| 4| 1|[10.0,4.0,4.0,6.0...| 1.0|[-0.0034290346398...|[0.49914274218002...| 1.0|
|603148| 4.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|1.0| 2| 0|[4.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
|606722| 5.0| 5.0| 7.0| 8.0| 6.0|10.0| 7.0| 4.0|1.0| 4| 1|[5.0,5.0,7.0,8.0,...| 1.0|[-0.3103173938140...|[0.42303726852941...| 1.0|
|616240| 5.0| 3.0| 4.0| 3.0| 4.0| 5.0| 4.0| 7.0|1.0| 2| 0|[5.0,3.0,4.0,3.0,...| 0.0|[0.43719456056061...|[0.60759034803682...| 0.0|
|640712| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 2.0| 1.0|1.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.10995557408198...|[0.75212082898242...| 0.0|
|654546| 1.0| 1.0| 1.0| 1.0| 2.0| 1.0| 1.0| 1.0|8.0| 2| 0|[1.0,1.0,1.0,1.0,...| 0.0|[1.11079628977456...|[0.75227753466134...| 0.0|
+------+---------+----+-----+----+------+----+------+----+---+----+------------+--------------------+-----+--------------------+--------------------+----------+
only showing top 20 rows
One mistake I can see here is that you are passing the rawPrediction column to the BinaryClassificationMetrics object instead of the prediction column. rawPrediction holds a vector with a kind of "score" for each class, whereas BinaryClassificationMetrics expects a plain double value, as its signature specifies:
new BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)])
You can see the details here.
I ran a quick test with this change and it seems to work; here is the code snippet:
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
case class Obs(id: Int, thickness: Double, size: Double, shape: Double, madh: Double,
epsize: Double, bnuc: Double, bchrom: Double, nNuc: Double, mit: Double, clas: Double)
val obsSchema = Encoders.product[Obs].schema
val spark = SparkSession.builder
.appName("StackoverflowQuestions")
.master("local[*]")
.getOrCreate()
// Implicits necessary to transform DataFrame to Dataset using .as[] method
import spark.implicits._
val df = spark.read
.schema(obsSchema)
.csv("breast-cancer-wisconsin.data")
.drop("id")
.withColumn("clas", when(col("clas").equalTo(4.0), 1.0).otherwise(0.0))
.na.drop() // Make sure to drop nulls, or the feature assembler will fail
// Define the feature columns to put in the feature vector
val featureCols = Array("thickness", "size", "shape", "madh", "epsize", "bnuc", "bchrom", "nNuc", "mit")
// Set the input and output column names
val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
// Return a DataFrame with all of the feature columns in a vector column
val df2 = assembler.transform(df)
// Create a label column with the StringIndexer
val labelIndexer = new StringIndexer().setInputCol("clas").setOutputCol("label")
val df3 = labelIndexer.fit(df2).transform(df2)
val splitSeed = 5043
val Array(trainingData, testData) = df3.randomSplit(Array(0.7, 0.3), splitSeed)
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
trainingData.show(20)
// Fit the model
val model = lr.fit(trainingData)
// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
// Run the model on the test features to get predictions
val predictions = model.transform(testData)
// The transform adds new columns: rawPrediction, probability and prediction
predictions.show(truncate = false)
// Use MLlib to evaluate: convert the DataFrame to an RDD
val predictionAndLabels = predictions.select("prediction", "label").as[(Double, Double)].rdd
// Instantiate metrics object
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
println("area under the precision-recall curve: " + metrics.areaUnderPR)
println("area under the receiver operating characteristic (ROC) curve : " + metrics.areaUnderROC)
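As a side note, scoring with the hard 0/1 prediction column gives the metrics only a single operating point, so the reported areas are coarse. If you want the full curves, you can pass the probability of the positive class as the score instead. A minimal sketch under that assumption (reusing the predictions DataFrame from above; probMetrics is just a name I chose):

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Use the probability of the positive class (index 1) as the score,
// so the metrics can sweep over all thresholds instead of just one.
val scoreAndLabels = predictions
  .select("probability", "label")
  .rdd
  .map(row => (row.getAs[Vector](0)(1), row.getDouble(1)))

val probMetrics = new BinaryClassificationMetrics(scoreAndLabels)
println("area under the precision-recall curve: " + probMetrics.areaUnderPR)
println("area under the ROC curve: " + probMetrics.areaUnderROC)
```

This still satisfies the RDD[(Double, Double)] signature, since the element at index 1 of the probability vector is a plain double.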