Question:

Error when dumping a dataset loaded from Hive into Pig

黄德明

2023-03-14

The Hive table retail_db.categories has 58 rows.

$ pig -useHCatalog
grunt> pcategories = LOAD 'retail_db.categories' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> b = limit pcategories 100;
grunt> dump b;

With this I get all the records. But when I try to dump the original dataset

grunt> dump pcategories;

I get an error:

Failed!

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_1523787662857_0004    pcategories    MAP_ONLY    Message: Job failed!    hdfs:/localhost:9000/tmp/temp-1113251818/tmp-83503168,

Input(s):
Failed to read data from "RETAIL_DB.categories"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1523787662857_0004

2018-04-15 16:28:27,828 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2018-04-15 16:28:27,836 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias pcategories
Details at logfile: /home/jay/pig_1523787729987.log

AM Container for appattempt_1523799060075_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-04-15 19:02:58.344]Exception from container-launch.
Container id: container_1523799060075_0001_02_000001
Exit code: 1
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://jay-Lenovo-Z50-70:8088/cluster/app/application_1523799060075_0001 Then click on links to logs of each attempt.

This is what I get after clicking the link.

1 answer

班凌

2023-03-14

This works for me. I ran the following commands:

$ pig -useHCatalog
grunt> pcategories = LOAD 'hive_testing.address' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> dump pcategories;

Here I created a dummy address table in the hive_testing database.

Output: [not preserved]
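Since the same LOAD and dump work against another table, the Pig statements themselves are probably fine. The diagnostics in the question only show log4j warnings and exit code 1, so the actual cause is more likely in the ApplicationMaster container logs. A minimal sketch of fetching those logs from the command line instead of the web UI, assuming YARN log aggregation is enabled on the cluster (the question does not say whether it is):

```shell
# Tracking URL copied from the diagnostics in the question.
tracking_url='http://jay-Lenovo-Z50-70:8088/cluster/app/application_1523799060075_0001'

# The application ID is the last path segment of the URL.
app_id="${tracking_url##*/}"

# Build the log-fetch command. `yarn logs -applicationId <id>` only returns
# output when log aggregation is enabled (an assumption about this cluster).
cmd="yarn logs -applicationId $app_id"

# Print the command; run it on a node where the Hadoop yarn CLI is installed.
echo "$cmd"
```

The stack trace near the end of that output usually says more than ERROR 1066, which is only Pig's generic "job failed" message.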
