HDP-2.4.2.0-258安装使用Ambari2.2.2.0
我必须导入几个SQL Server模式,这些模式应该可以通过Hive、Pig、MR和任何第三方(将来)访问。我决定在HCatalog中导入。
Sqoop提供了导入到Hive或HCatalog的方法,我想如果我导入到HCatalog,同一个表将可以从Hive CLI、MR和Pig访问(请评估我的假设)。
问题:
问题:我希望一步到位:
我执行了以下命令:
-bash-4.2$ sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password --table SettingAttribute -- --schema Administration --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table SettingAttribute --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
源表包含109条记录,这些记录被提取:
16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
16/08/10 15:02:41 INFO mapreduce.Job: map 0% reduce 0%
16/08/10 15:02:47 INFO mapreduce.Job: map 100% reduce 0%
16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=616636
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=540
HDFS: Number of bytes written=10079
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=16132
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=16132
Total vcore-seconds taken by all map tasks=16132
Total megabyte-seconds taken by all map tasks=66076672
Map-Reduce Framework
Map input records=109
Map output records=109
Input split bytes=540
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=320
CPU time spent (ms)=6340
Physical memory (bytes) snapshot=999870464
Virtual memory (bytes) snapshot=21872697344
Total committed heap usage (bytes)=943194112
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10079
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.
文件是在我的用户下创建的:
hdfs dfs -ls /user/ojoqcu/SettingAttribute
Found 5 items
-rw------- 3 ojoqcu hdfs 0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
-rw------- 3 ojoqcu hdfs 8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
-rw------- 3 ojoqcu hdfs 144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
-rw------- 3 ojoqcu hdfs 1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
-rw------- 3 ojoqcu hdfs 434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003
我在HCatalog(或蜂巢)中看不到任何东西
-bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
WARNING: Use "yarn jar" to launch YARN applications.
16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
OK
Time taken: 2.007 seconds
是否存在授权问题?
我检查了var/log,但Sqoop、Hive-Hcatalog和Hive都不存在,我如何查看授权问题并修复它?
嗯,我不确定这是授权问题还是仅仅是解析问题,或者两者兼而有之。我做了以下工作,它起作用了:
>
su蜂巢
执行了以下命令(可能,----schema
应该是最后一个arg,Sqoop只是忽略/中断之后!)
sqoop导入--hcatalog-home /usr/hdp/current/hive-webhcat--hcatalog-数据库FleetManagement_Ape--hcatalog-table数据库日志--create-hcatalog-table--hcatalog-存储-节存储为orcfile--link'jdbc: sqlserver://
已成功执行作业E=但找不到数据。此外,检查了在hcatalog中创建的表的位置,检查该位置后发现没有为此创建任何目录,并且只找到了一个0字节的文件_$folder$。
主要内容:1.Mysql到HDFS,2.Mysql到Hbase,3.Mysql到HiveSqoop是Hadoop生态圈里很重要的一员。很重要的工具 工具就是为了解决某些问题的。 主要解决的场景问题是数据从关系型数据库到Hadoop生态系统的过程。 因为要推广Hadoop所以要有数据,Sqoop就是数据的来源。 1.Mysql到HDFS sqoop import 导入 -connect 连接mysql -username mysql的username -password mysql的密
Im试图使用SQOOP将一个小表从mysql导出到HDFS,该表有两列id(主键)和名称。我可以通过SQOOP列出数据库和表。但在将表导入到HDFS时出现异常。好心的帮助。下面是错误日志。
主要内容:1.HDFS到Mysql,2.Hbase到Mysql,3.Job1.HDFS到Mysql export为导入 export-dir为连接的HDFS是数据目录 table为数据库Mysql的表名 2.Hbase到Mysql 由于没有提供直接导出的接口 所以采取迂回战术 可以将Hbase先导入到HDFS中然后导入到Mysql 3.Job 作业:可以看做一个任务或者是Mysql中存储的过程 我们可以先存储然后不执行。 当想要执行的时候可以直接执行名称避免执行全部的任
我不知道它为什么要在HDFS中寻找avro jar文件。 和设置为Hadoop目录。我使用的是Hadoop 2.6.0和sqoop 1.4.5版本
null 通过查看错误,我试图解决它,因此我将sqoop(本地机器/usr/local/sqoop)文件夹复制到hdfs目录(hdfs://localhost:54310/usr/local/sqoop)。这解决了我的问题。我想从这里知道一些事情:- 在将sqoop转换为hdfs之前,我的安装是否正确? 是否需要将sqoop目录从ext文件系统复制到hdfs文件系统。 16/07/02 13:22