当前位置: 首页 > 工具软件 > cl-bzip2 > 使用案例 >

sqoop2 mysql to hdfs_Sqoop2同步mysql至HDFS

方斌
2023-12-01

开启客户端

sqoop2-shell

配置sqoop server参数

sqoop:000> set server --host luhuijundeMacBook-Pro.local --port 12000 --webapp

sqoop #luhuijundeMacBook-Pro.local 一般为HDFS主机名

–webapp官方文档说是指定的sqoop jetty服务器名称,

大概是一个自己能识别的用于标示这个服务器的名字吧。

我们在使用的过程中可能会遇到错误,使用此配置打印错误

sqoop:000> set option --name verbose --value true

验证是否已经连上

sqoop:000> show version --all #如果server version:能显示代表能正常连接

使用help命令可以查看sqoop支持的所有命令

sqoop:000> help

For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:

exit (\x ) Exit the shell

history (\H ) Display, manage and recall edit-line history

help (\h ) Display this help message

set (\st ) Configure various client options and settings

show (\sh ) Display various objects and configuration options

create (\cr ) Create new object in Sqoop repository

delete (\d ) Delete existing object in Sqoop repository

update (\up ) Update objects in Sqoop repository

clone (\cl ) Create new object based on existing one

start (\sta) Start job

stop (\stp) Stop job

status (\stu) Display status of a job

enable (\en ) Enable object in Sqoop repository

disable (\di ) Disable object in Sqoop repository

检查Sqoop server支持的连接

sqoop:000> show connector

创建数据源头link

sqoop:000> create link -connector generic-jdbc-connector

Name: First Link

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver

JDBC Connection String: jdbc:mysql://mysql.server/databaseName

Username:dba

Password: *****

Fetch Size:(回车)

entry#protocol=tcp

entry#(回车)

Identifier enclose:(空格) #这里是指定SQL中标识符的定界符,也就是说,

有的SQL标示符是一个引号:select * from "table_name",这种定界符在MySQL中是会报错的。

这个属性默认值就是双引号,所以不能使用回车,必须将之覆盖,我使用空格覆盖了这个值。

官方文档这里有坑!

New link was successfully created with validation status OK and name third link

看到这样的字符这个link算是创建成功

创建目标link

sqoop:000> create link -connector hdfs-connector

Creating link for connector with name hdfs-connector

Please fill following values to create new link object

Name: Second Link

Link configuration

HDFS URI: hdfs://localhost:9000

Conf directory:/usr/local/hadoop/etc/hadoop

entry#(回车)

New link was successfully created with validation status OK and name Second Link

看到这样的字符串代表创建成功

使用两个link名字 from 和 to 来创建job

sqoop:000> create job -f "First Link" -t "Second Link"

Creating job for links with from name First Link and to name Second Link

Please fill following values to create new job object

Name: Sqoopy #job 名称

Schema name: test #mysql数据库名称

Table name: test #表名称

SQL statement:

Column names:

There are currently 0 values in the list:

element#(回车)

Partition column:(回车)

Partition column nullable:(回车)

Boundary query:(回车)

Incremental read

Check column:(回车)

Last value:(回车)

Target configuration #配置目标

Override null value: null

Null value: null

File format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

2 : PARQUET_FILE

Choose: 0 #选择0最简单的文本文件

Compression codec:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 0 #选择0,不压缩

Custom codec:(回车)

Output directory: hdfs://localhost:9000/user/luhuijun/sqoop #最好是完全没有这个目录: sqoop,如果有目录里面又有文件, 又是一堆权限问题.

Append mode:(回车)

Throttling resources

Extractors: 2

Loaders: 2

Classpath configuration

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK and name Sqoopy

开启job 并打印job执行详情

sqoop:000> start job -n Sqoopy -s

Submission details

Job Name: Sqoopy

Server URL: http://luhuijundeMacBook-Pro.local:12000/sqoop/

Created by: luhuijun

Creation date: 2016-12-06 21:26:24 CST

Lastly updated by: luhuijun

External ID: job_1481030429951_0001

http://luhuijundeMacBook-Pro.local:8088/proxy/application_1481030429951_0001/

Source Connector schema: Schema{name= test . test ,columns=[

FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},

Text{name=name,nullable=true,type=TEXT,charSize=null}]}

2016-12-06 21:26:24 CST: BOOTING - Progress is not available

2016-12-06 21:26:36 CST: RUNNING - 0.00 %

2016-12-06 21:26:46 CST: RUNNING - 50.00 %

2016-12-06 21:26:57 CST: SUCCEEDED

查看执行结果

➜ /usr/local/hadoop/etc/hadoop git:(master) >hdfs dfs -cat 'sqoop/*'

#这是zsh *直接匹配不到,在正常的shell里应该不需要引号

1,'1'

2,'xinle'

3,'huijun'

4,'hongna'

总结

看上去简单的几步,其实踩了很多坑,在学习过程中通常会犯两类错误:

第一 类错误是在知之不多的情况下就盲目开始,即行动太快;

第二类错误是在行动之前准备过多,即行动太晚。要想在这 二者之间取得平衡,

你掌握的知识要恰到好处,足以能让你开始学习, 但又不会多到让你无力探索,

这样学习效果最佳。<>

 类似资料: