开启客户端
sqoop2-shell
配置sqoop server参数
sqoop:000> set server --host luhuijundeMacBook-Pro.local --port 12000 --webapp
sqoop #luhuijundeMacBook-Pro.local 一般为HDFS主机名
–webapp官方文档说是指定的sqoop jetty服务器名称,
大概是一个自己能识别的用于标示这个服务器的名字吧。
我们在使用的过程中可能会遇到错误,使用此配置打印错误
sqoop:000> set option --name verbose --value true
验证是否已经连上
sqoop:000> show version --all #如果server version:能显示代表能正常连接
使用help命令可以查看sqoop支持的所有命令
sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/
Available commands:
exit (\x ) Exit the shell
history (\H ) Display, manage and recall edit-line history
help (\h ) Display this help message
set (\st ) Configure various client options and settings
show (\sh ) Display various objects and configuration options
create (\cr ) Create new object in Sqoop repository
delete (\d ) Delete existing object in Sqoop repository
update (\up ) Update objects in Sqoop repository
clone (\cl ) Create new object based on existing one
start (\sta) Start job
stop (\stp) Stop job
status (\stu) Display status of a job
enable (\en ) Enable object in Sqoop repository
disable (\di ) Disable object in Sqoop repository
检查Sqoop server支持的连接
sqoop:000> show connector
创建数据源头link
sqoop:000> create link -connector generic-jdbc-connector
Name: First Link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://mysql.server/databaseName
Username:dba
Password: *****
Fetch Size:(回车)
entry#protocol=tcp
entry#(回车)
Identifier enclose:(空格) #这里是指定SQL中标识符的定界符,也就是说,
有的SQL标示符是一个引号:select * from "table_name",这种定界符在MySQL中是会报错的。
这个属性默认值就是双引号,所以不能使用回车,必须将之覆盖,我使用空格覆盖了这个值。
官方文档这里有坑!
New link was successfully created with validation status OK and name third link
看到这样的字符这个link算是创建成功
创建目标link
sqoop:000> create link -connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: Second Link
Link configuration
HDFS URI: hdfs://localhost:9000
Conf directory:/usr/local/hadoop/etc/hadoop
entry#(回车)
New link was successfully created with validation status OK and name Second Link
看到这样的字符串代表创建成功
使用两个link名字 from 和 to 来创建job
sqoop:000> create job -f "First Link" -t "Second Link"
Creating job for links with from name First Link and to name Second Link
Please fill following values to create new job object
Name: Sqoopy #job 名称
Schema name: test #mysql数据库名称
Table name: test #表名称
SQL statement:
Column names:
There are currently 0 values in the list:
element#(回车)
Partition column:(回车)
Partition column nullable:(回车)
Boundary query:(回车)
Incremental read
Check column:(回车)
Last value:(回车)
Target configuration #配置目标
Override null value: null
Null value: null
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0 #选择0最简单的文本文件
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0 #选择0,不压缩
Custom codec:(回车)
Output directory: hdfs://localhost:9000/user/luhuijun/sqoop #最好是完全没有这个目录: sqoop,如果有目录里面又有文件, 又是一堆权限问题.
Append mode:(回车)
Throttling resources
Extractors: 2
Loaders: 2
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name Sqoopy
开启job 并打印job执行详情
sqoop:000> start job -n Sqoopy -s
Submission details
Job Name: Sqoopy
Server URL: http://luhuijundeMacBook-Pro.local:12000/sqoop/
Created by: luhuijun
Creation date: 2016-12-06 21:26:24 CST
Lastly updated by: luhuijun
External ID: job_1481030429951_0001
http://luhuijundeMacBook-Pro.local:8088/proxy/application_1481030429951_0001/
Source Connector schema: Schema{name= test . test ,columns=[
FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
Text{name=name,nullable=true,type=TEXT,charSize=null}]}
2016-12-06 21:26:24 CST: BOOTING - Progress is not available
2016-12-06 21:26:36 CST: RUNNING - 0.00 %
2016-12-06 21:26:46 CST: RUNNING - 50.00 %
2016-12-06 21:26:57 CST: SUCCEEDED
查看执行结果
➜ /usr/local/hadoop/etc/hadoop git:(master) >hdfs dfs -cat 'sqoop/*'
#这是zsh *直接匹配不到,在正常的shell里应该不需要引号
1,'1'
2,'xinle'
3,'huijun'
4,'hongna'
总结
看上去简单的几步,其实踩了很多坑,在学习过程中通常会犯两类错误:
第一 类错误是在知之不多的情况下就盲目开始,即行动太快;
第二类错误是在行动之前准备过多,即行动太晚。要想在这 二者之间取得平衡,
你掌握的知识要恰到好处,足以能让你开始学习, 但又不会多到让你无力探索,
这样学习效果最佳。<>