问题：

Spark ml模型保存到hdfs

束志业

2023-03-14

我正在尝试将我的模型保存为从spark ml库创建的对象。

但是,它给了我一个错误：

以下是我的依赖项：

    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_2.10</artifactId>
        <version>2.1.7</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <type>maven-plugin</type>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-parser-combinators</artifactId>
        <version>2.11.0-M4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.2</version>
    </dependency>

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>

我还想将从模型生成的dataframe保存为CSV。

model.transform(df).select("features","label","prediction").show()



import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

import org.apache.spark.SparkConf

import org.apache.spark.sql.hive.HiveContext


import org.apache.spark.ml.feature.OneHotEncoder
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.PipelineModel._
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
import org.apache.spark.ml.util.MLWritable

object prediction {
  def main(args: Array[String]): Unit = {

     val conf = new SparkConf()
             .setMaster("local[2]")
             .setAppName("conversion")
    val sc = new SparkContext(conf)

    val hiveContext = new HiveContext(sc)

    val df = hiveContext.sql("select * from prediction_test")
    df.show()
    val credit_indexer = new StringIndexer().setInputCol("transaction_credit_card").setOutputCol("creditCardIndex").fit(df)
    val category_indexer = new StringIndexer().setInputCol("transaction_category").setOutputCol("categoryIndex").fit(df)
    val location_flag_indexer = new StringIndexer().setInputCol("location_flag").setOutputCol("locationIndex").fit(df)
    val label_indexer = new StringIndexer().setInputCol("fraud").setOutputCol("label").fit(df)

    val assembler =  new VectorAssembler().setInputCols(Array("transaction_amount", "creditCardIndex","categoryIndex","locationIndex")).setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val pipeline = new Pipeline().setStages(Array(credit_indexer, category_indexer, location_flag_indexer, label_indexer, assembler, lr))

    val model = pipeline.fit(df)

    pipeline.save("/user/f42h/prediction/pipeline")
    model.save("/user/f42h/prediction/model")
 //   val sameModel = PipelineModel.load("/user/bob/prediction/model")
    model.transform(df).select("features","label","prediction")

  }
}

共有1个答案

席宜修

2023-03-14

您使用的是Spark1.6.0和afaik，ml模型的保存/加载仅从2.0开始可用。您可以使用2.0.0-preview版本的项目进行预览:http://search.maven.org/#search%7cga%7c1%7cg%3aorg.apache.spark%20v%3a2.0.0-preview

类似资料：

Hibernate将用户模型保存到Postgres

问题内容：我正在通过Hibernate（注释）使用Postgres，但似乎已经无法处理User对象了：如果我手动运行SQL，则必须在表名两边加上引号，因为用户似乎是postgres关键字，但是如何说服hibernate自己做呢？提前致谢。问题答案：使用保留关键字时，需要转义表名。在JPA 1.0中，没有标准化的方法，而Hibernate特定的解决方案是使用反引号：在JPA 2.0中，标
保存和加载模型

译者 bruce1408 作者: Matthew Inkawhich 本文提供有关Pytorch模型保存和加载的各种用例的解决方案。您可以随意阅读整个文档，或者只是跳转到所需用例的代码部分。当保存和加载模型时，有三个核心功能需要熟悉： torch.save: 将序列化对象保存到磁盘。此函数使用 Python 的pickle模块进行序列化。使用此函数可以保存如模型、tensor、字典等各种对象。
SparkML（Scala）中的并行训练独立模型

假设我有3个简单的SparkML模型，它们将使用相同的数据帧作为输入，但彼此完全独立（在运行序列和使用的数据列中）。我想到的第一件事是，只需使用阶段数组中的3个模型创建一个管道数组，然后运行总体拟合/变换来获得完整的预测等等。但是，我的理解是，因为我们将这些模型作为序列堆叠在单个管道中，Spark不一定会并行运行这些模型，即使它们彼此完全独立。也就是说，有没有办法并行拟合/转换3个独立模型？
解决pytorch 保存模型遇到的问题

本文向大家介绍解决pytorch 保存模型遇到的问题，包括了解决pytorch 保存模型遇到的问题的使用技巧和注意事项，需要的朋友参考一下今天用pytorch保存模型时遇到bug Can't pickle <class 'torch._C._VariableFunctions'> 在google上查找原因，发现是保存时保存了整个模型的原因，而模型中有一些自定义的参数将 torch.save(m
保存后，保存前等的laravel模型回调

问题内容： Laravel中是否有回调，例如：我搜索了但什么也没找到。如果没有这样的东西-实施它的最佳方法是什么？谢谢！问题答案：实际上，Laravel在保存|更新|创建某些模型之前具有真实的回调。检查一下： https://github.com/laravel/laravel/blob/3.0/laravel/database/eloquent/model.php#L362 像保存和保存
Django的。覆盖模型保存

问题内容：在保存模型之前，我需要重新调整图片大小。但是，如何检查是否添加了新图片或仅更新了说明，以便每次保存模型时都可以跳过重新缩放？我只想在加载新图像或更新图像时重新缩放，而在更新说明时不想要。问题答案：一些想法：不确定是否可以在所有伪自动django工具中正常运行（例如：ModelForm，contrib.admin等）。

Spark ml模型保存到hdfs

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档