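Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. These notes cover Sqoop 2 (1.99.7).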
#Official documentation
http://sqoop.apache.org/docs/1.99.7/index.html
#Download mirror
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/
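I. Administrator Guide
1. Sqoop Server and Client Installation
Sqoop 2 ships as a single binary package with two parts: the server, installed on one node of the cluster as the entry point for all clients, and the client, which can be installed on any number of machines.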
#Download the binary package and extract it
wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz
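#Hadoop dependencies
Allow the Sqoop server user to act on behalf of other users by adding these properties to core-site.xml ("sqoop2" is the OS user that runs the Sqoop server; change it if the server runs as another user, e.g. root):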
<property>
<name>hadoop.proxyuser.sqoop2.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sqoop2.groups</name>
<value>*</value>
</property>
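Hadoop reads these properties as hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups, so the user name must match the account running the server. As a minimal sketch, the hypothetical helper below (proxyuser_properties is not part of Sqoop or Hadoop) prints both entries for a given user, defaulting to the current one, ready to paste into core-site.xml:

```shell
#!/usr/bin/env bash
# Print the two core-site.xml proxy-user properties for a user.
# "*" allows impersonation from any host and for any group.
# Usage: proxyuser_properties [user]   (defaults to the current user)
proxyuser_properties() {
  local user="${1:-$(id -un)}" key
  for key in hosts groups; do
    printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>*</value>\n</property>\n' \
      "$user" "$key"
  done
}
```

For example, proxyuser_properties root prints the entries needed when the server runs as root. After editing core-site.xml, restart HDFS and YARN, or reload the proxy-user settings with hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration.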
#Directory for third-party jars (e.g. JDBC drivers) loaded by the server
export SQOOP_SERVER_EXTRA_LIB=/opt/sqoop-1.99.7-bin-hadoop200/lib
#Environment variables
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
export SQOOP_HOME=/opt/sqoop-1.99.7-bin-hadoop200
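The variables can be kept together in one profile snippet (e.g. appended to ~/.bashrc). This sketch assumes Hadoop is installed in /opt/hadoop-2.7.7 and Sqoop in /opt/sqoop-1.99.7-bin-hadoop200, the paths used elsewhere in these notes; it also adds $SQOOP_HOME/bin to PATH so that sqoop2-server, sqoop2-shell and sqoop2-tool resolve, and only warns about directories that do not exist:

```shell
#!/usr/bin/env bash
# Assumed install locations; existing values take precedence
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.7.7}"
SQOOP_HOME="${SQOOP_HOME:-/opt/sqoop-1.99.7-bin-hadoop200}"
export HADOOP_HOME SQOOP_HOME

# Hadoop component directories the Sqoop server takes its jars from
export HADOOP_COMMON_HOME="$HADOOP_HOME/share/hadoop/common"
export HADOOP_HDFS_HOME="$HADOOP_HOME/share/hadoop/hdfs"
export HADOOP_MAPRED_HOME="$HADOOP_HOME/share/hadoop/mapreduce"
export HADOOP_YARN_HOME="$HADOOP_HOME/share/hadoop/yarn"

# Extra jars such as JDBC drivers
export SQOOP_SERVER_EXTRA_LIB="$SQOOP_HOME/lib"

# Make the sqoop2-* scripts available on the PATH
export PATH="$SQOOP_HOME/bin:$PATH"

# Warn (without failing) about any configured directory that is missing
sqoop_missing_dirs=0
for dir in "$HADOOP_COMMON_HOME" "$HADOOP_HDFS_HOME" "$HADOOP_MAPRED_HOME" \
           "$HADOOP_YARN_HOME" "$SQOOP_SERVER_EXTRA_LIB"; do
  if [ ! -d "$dir" ]; then
    echo "warning: $dir does not exist" >&2
    sqoop_missing_dirs=$((sqoop_missing_dirs + 1))
  fi
done
```

Reload the profile (e.g. source ~/.bashrc) before starting the server so it inherits these variables.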
#In conf/sqoop.properties, point the MapReduce submission engine at the Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.7.7/etc/hadoop/
#Start/stop the server
sqoop2-server start
sqoop2-server stop
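Because the server can take a few seconds to come up, a script that starts it may want to wait until it answers before sending commands. This sketch assumes curl is installed and that the server exposes the /version resource of the Sqoop 2 REST API at http://<host>:<port>/sqoop/version; wait_for_sqoop is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Poll <base-url>/version once per second until the server answers
# or the attempt limit is reached. Assumes curl is installed.
# Usage: wait_for_sqoop [base-url] [attempts]
wait_for_sqoop() {
  local url="${1:-http://localhost:12000/sqoop}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS -o /dev/null --max-time 5 "$url/version"; then
      echo "Sqoop server is up at $url"
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  echo "Sqoop server did not answer at $url after $attempts attempt(s)" >&2
  return 1
}
```

For example: sqoop2-server start && wait_for_sqoop http://localhost:12000/sqoop 60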
#Change the default port (12000)
vi conf/sqoop.properties
org.apache.sqoop.jetty.port=12000
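Instead of editing conf/sqoop.properties in vi, the two settings above (the MapReduce configuration directory and the Jetty port) can be applied from a script. set_prop is a hypothetical helper, not part of Sqoop: it rewrites every line starting with KEY= and appends KEY=VALUE if there is no such line. It assumes the plain key=value form that sqoop.properties uses:

```shell
#!/usr/bin/env bash
# Set KEY=VALUE in a .properties file (hypothetical helper).
# Lines starting with "KEY=" are replaced; if none exist, the pair is appended.
# Commented-out lines such as "#KEY=..." are left untouched.
set_prop() {
  local file="$1" key="$2" value="$3" tmp rc
  tmp="$(mktemp)" || return 1
  # Pass key/value through the environment so awk does not interpret escapes
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage, with SQOOP_HOME set as in the environment variables above:
set_prop "$SQOOP_HOME/conf/sqoop.properties" org.apache.sqoop.submission.engine.mapreduce.configuration.directory /opt/hadoop-2.7.7/etc/hadoop/
set_prop "$SQOOP_HOME/conf/sqoop.properties" org.apache.sqoop.jetty.port 12000
The same call works for the autoupgrade keys. The server reads sqoop.properties at startup, so restart it after any change.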
#Start the client
sqoop2-shell
2. Sqoop Tools
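#Upgrade the repository and stored metadata
sqoop2-tool upgrade
#Verify the server configuration
sqoop2-tool verify
#Dump the repository to JSON (--include-sensitive also exports sensitive values such as link passwords)
sqoop2-tool repositorydump -o repository.json --include-sensitive
#Load the repository from JSON
sqoop2-tool repositoryload -i repository.json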
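A repository dump makes a convenient backup. Because --include-sensitive writes sensitive values such as link passwords into the JSON file, the sketch below runs the dump under umask 077 so the file is readable by its owner only. backup_repository and repo_dump_path are hypothetical helpers, /var/backups/sqoop is an assumed default directory, and it is safest to run the tools while the Sqoop server is stopped:

```shell
#!/usr/bin/env bash
# Print a timestamped dump path: DIR/sqoop-repository-YYYYMMDD-HHMMSS.json
repo_dump_path() {
  printf '%s/sqoop-repository-%s.json\n' "${1:-.}" "$(date +%Y%m%d-%H%M%S)"
}

# Dump the repository (including sensitive values) into DIR and print the path
backup_repository() {
  local dir="${1:-/var/backups/sqoop}" out
  mkdir -p "$dir" || return 1
  out="$(repo_dump_path "$dir")"
  # umask 077 in a subshell: the new file gets owner-only permissions
  ( umask 077 && sqoop2-tool repositorydump -o "$out" --include-sensitive ) || return 1
  echo "$out"
}
```

To restore, pass the printed path to sqoop2-tool repositoryload -i.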
3. Sqoop Server Upgrade
#Upgrade connectors and the driver automatically when the server starts
vi conf/sqoop.properties
org.apache.sqoop.connector.autoupgrade=true
org.apache.sqoop.driver.autoupgrade=true
II. User Guide
1. Command Line Shell Usage Guide
1) Script mode: run the commands stored in a script file
sqoop2-shell /path/to/your/script.sqoop
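A script file holds the same commands as the interactive shell, one per line. As a hypothetical example, the helper below (write_sqoop_script is not part of Sqoop) writes a script that connects to a server and lists the repository contents; the host and port defaults (localhost, 12000) are assumptions:

```shell
#!/usr/bin/env bash
# Write a non-interactive sqoop2-shell script to PATH.
# Usage: write_sqoop_script path [host] [port]
write_sqoop_script() {
  local path="$1" host="${2:-localhost}" port="${3:-12000}"
  cat > "$path" <<EOF
set server --host $host --port $port --webapp sqoop
show version --all
show connector
show link
show job
EOF
}
```

For example: write_sqoop_script /tmp/list.sqoop 172.16.1.181 12000 && sqoop2-shell /tmp/list.sqoop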
2) Interactive commands
#Available commands (name, shortcut, description)
exit (\x ) Exit the shell
history (\H ) Display, manage and recall edit-line history
help (\h ) Display this help message
set (\st ) Configure various client options and settings
show (\sh ) Display various objects and configuration options
create (\cr ) Create new object in Sqoop repository
delete (\d ) Delete existing object in Sqoop repository
update (\up ) Update objects in Sqoop repository
clone (\cl ) Create new object based on existing one
start (\sta) Start job
stop (\stp) Stop job
status (\stu) Display status of a job
enable (\en ) Enable object in Sqoop repository
disable (\di ) Disable object in Sqoop repository
#set: point the client at the server (the port in the URL must match org.apache.sqoop.jetty.port)
set server --url http://172.16.1.181:8090/sqoop
2. Connectors
#Built-in connectors (as listed by show connector): FTP, JDBC, HDFS, Kafka, Kite, SFTP
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
3. Example
http://sqoop.apache.org/docs/1.99.7/user/Sqoop5MinutesDemo.html
1) Start the client
sqoop2-shell
2) Connect to the server
sqoop:000> set server --host 172.16.1.181 --port 12000 --webapp sqoop
3) Show versions
sqoop:000> show version --all
4) Create the links (MySQL source, HDFS target)
sqoop:000> create link -connector generic-jdbc-connector
Name: First Link
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://172.16.1.241:3306/crawl
Username: root
Password: ******
Fetch Size: 1
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
sqoop:000> create link -connector hdfs-connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: Second Link
HDFS cluster
URI: hdfs://master:8020/
Conf directory: /opt/hadoop-2.7.7/etc/hadoop/
Additional configs::
There are currently 0 values in the map:
entry# protocol=tcp
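The JDBC link above uses com.mysql.jdbc.Driver, and the MySQL Connector/J jar is not bundled with Sqoop, so it must be placed in the SQOOP_SERVER_EXTRA_LIB directory and the server restarted before the link can be used. A minimal sketch; install_jdbc_driver is a hypothetical helper and the jar name in the usage line is only an example:

```shell
#!/usr/bin/env bash
# Copy a JDBC driver jar into $SQOOP_SERVER_EXTRA_LIB (restart the server afterwards)
install_jdbc_driver() {
  local jar="$1" dest="$SQOOP_SERVER_EXTRA_LIB"
  if [ -z "$dest" ]; then
    echo "SQOOP_SERVER_EXTRA_LIB is not set" >&2
    return 1
  fi
  if [ ! -f "$jar" ]; then
    echo "no such jar: $jar" >&2
    return 1
  fi
  mkdir -p "$dest" && cp "$jar" "$dest/"
}
```

For example: install_jdbc_driver mysql-connector-java-5.1.47.jar && sqoop2-server stop && sqoop2-server start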
5) Create a job
sqoop:000> create job -f "First Link" -t "Second Link"
Creating job for links with from name First Link and to name Second Link
Please fill following values to create new job object
Name: Sqoopy
FromJob configuration
Schema name:(Required)sqoop
Table name:(Required)sqoop
Table SQL statement:(Optional)
Table column names:(Optional)
Partition column name:(Optional) id
Null value allowed for the partition column:(Optional)
Boundary query:(Optional)
ToJob configuration
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:(Optional)
Output directory:(Required)/root/projects/sqoop
Driver Config
Extractors:(Optional) 2
Loaders:(Optional) 2
6) Start the job
sqoop:000> start job -name Sqoopy
Submission details
Job Name: Sqoopy
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-11-04 19:43:29 PST
Lastly updated by: root
External ID: job_1412137947693_0001
http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/
2014-11-04 19:43:29 PST: BOOTING - Progress is not available
7) Check the job status
sqoop:000> status job -n Sqoopy
8) Stop the job
sqoop:000> stop job -n Sqoopy
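The job can also be run unattended through script mode. In this sketch, job_script and run_job are hypothetical helpers, SQOOP_HOST and SQOOP_PORT are assumed variables (defaulting to localhost and 12000), and the -s (--synchronous) option of start job, which keeps the shell attached until the job finishes, is taken from the 1.99.7 command-line client documentation:

```shell
#!/usr/bin/env bash
# Print a sqoop2-shell script that connects and starts job NAME synchronously
job_script() {
  printf 'set server --host %s --port %s --webapp sqoop\n' \
    "${SQOOP_HOST:-localhost}" "${SQOOP_PORT:-12000}"
  printf 'start job -name "%s" -s\n' "$1"
}

# Write the script to a temporary file, run it with sqoop2-shell, then clean up
run_job() {
  local script rc
  script="$(mktemp)" || return 1
  job_script "$1" > "$script"
  sqoop2-shell "$script"
  rc=$?
  rm -f "$script"
  return "$rc"
}
```

For example: SQOOP_HOST=172.16.1.181 run_job Sqoopy. Once the job has finished, the imported files can be listed with hdfs dfs -ls /root/projects/sqoop.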
III. Developer Guide
Building Sqoop 2
Sqoop Development Environment Setup
Developing a Sqoop Connector with Connector API
Developing Sqoop application with REST API
Developing Sqoop application using Sqoop Java Client API
Repository
IV. Security Guide
Security Guide