
Sqoop2 Installation

楚弘益
2023-12-01

Apache Sqoop(TM) is a tool for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.

#Official documentation
http://sqoop.apache.org/docs/1.99.7/index.html

#Download location
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/

I. Administrator Guide
1. Sqoop Server and Client Installation
The Sqoop package has two parts: the server and the client.
#Download the binary package and extract it
wget https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz
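#Hadoop dependency
Add proxy-user permissions to core-site.xml (replace sqoop2 with the system user that runs the Sqoop server):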
<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
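A quick way to confirm that both entries made it into the file is a pair of grep checks. This is only a sketch: the function name `has_proxyuser` and the default user `sqoop2` are illustrative assumptions, and it tests for the presence of the property names, not for their values.

```shell
# Sketch: succeed only if core-site.xml names both proxyuser properties for the user.
# has_proxyuser is a hypothetical helper, not part of Hadoop or Sqoop.
has_proxyuser() {
  local file="$1" user="${2:-sqoop2}"
  grep -q "<name>hadoop.proxyuser.${user}.hosts</name>" "$file" &&
    grep -q "<name>hadoop.proxyuser.${user}.groups</name>" "$file"
}
```

Usage, assuming the Hadoop configuration lives under $HADOOP_HOME/etc/hadoop: `has_proxyuser "$HADOOP_HOME/etc/hadoop/core-site.xml" sqoop2 && echo "proxyuser entries found"`.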
#Configure the directory for third-party JARs (for example JDBC drivers)
export SQOOP_SERVER_EXTRA_LIB=/opt/sqoop-1.99.7-bin-hadoop200/lib
#Configure environment variables
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
export SQOOP_HOME=/opt/sqoop-1.99.7-bin-hadoop200
export SQOOP_SERVER_EXTRA_LIB=/opt/sqoop-1.99.7-bin-hadoop200/lib
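Before starting the server it can help to check that each exported variable points at a real directory, since a wrong path tends to surface only later as a class-loading error. The helper below is a sketch; `missing_sqoop_dirs` is a hypothetical name, and the list of variables is simply the one exported above.

```shell
# Sketch: print the name of every variable that is unset or not a directory.
# missing_sqoop_dirs is a hypothetical helper, not shipped with Sqoop.
missing_sqoop_dirs() {
  local name dir
  for name in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME HADOOP_YARN_HOME SQOOP_HOME; do
    # Look up the value of the variable whose name is stored in $name
    eval "dir=\${$name:-}"
    [ -d "$dir" ] || echo "$name"
  done
}
```

Running `missing_sqoop_dirs` in the shell where the exports were made prints nothing when every path exists.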
#Configuration (in conf/sqoop.properties): point Sqoop at the Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.7.7/etc/hadoop/
#Start and stop the server
sqoop2-server start
sqoop2-server stop
#Change the default port (12000)
vi conf/sqoop.properties
org.apache.sqoop.jetty.port=12000
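The same change can be scripted instead of made in vi. The following is a minimal sketch, assuming a plain key=value properties file; `set_property` is a hypothetical helper, and it moves the key to the end of the file instead of editing it in place.

```shell
# Sketch: set key=value in a properties file.
# Any existing line for the key is dropped and the new line is appended at the end.
# Note: dots in the key are treated as regex wildcards by grep, which is loose but harmless here.
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp="${file}.tmp.$$"
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

Usage: `set_property "$SQOOP_HOME/conf/sqoop.properties" org.apache.sqoop.jetty.port 12000`. The server presumably has to be restarted before a new port takes effect.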
#Start the client
sqoop2-shell

2. Sqoop Tools
#Upgrade the repository
sqoop2-tool upgrade
#Verify the configuration
sqoop2-tool verify
#Export the repository
sqoop2-tool repositorydump -o repository.json --include-sensitive
#Import the repository
sqoop2-tool repositoryload -i repository.json

3. Sqoop Server Upgrade
#Upgrade connectors and driver automatically at server startup
vi conf/sqoop.properties
org.apache.sqoop.connector.autoupgrade=true
org.apache.sqoop.driver.autoupgrade=true


II. User Guide
1. Command Line Shell Usage Guide
1) Script file
sqoop2-shell /path/to/your/script.sqoop
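A script file is just a list of client commands, one per line. The sketch below writes a small one and runs it only if the client is installed; the file name demo.sqoop is an arbitrary choice, and the host and port are the ones used elsewhere in these notes, so adjust them to your server.

```shell
# Sketch: create a client script and run it non-interactively.
script="${TMPDIR:-/tmp}/demo.sqoop"
cat > "$script" <<'EOF'
set server --host 172.16.1.181 --port 12000 --webapp sqoop
show version --all
show connector
EOF

# Run the script only when sqoop2-shell is on PATH, so the sketch is harmless elsewhere.
if command -v sqoop2-shell >/dev/null 2>&1; then
  sqoop2-shell "$script"
fi
```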
2) Command line
#Available shell commands
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

#set
set server --url http://172.16.1.181:8090/sqoop
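The --url form bundles the three values that `set server` also accepts separately as --host, --port and --webapp. As a sketch of how they fit together, the hypothetical helper below assembles the URL from its parts, assuming the defaults of port 12000 and web application name sqoop.

```shell
# Sketch: build the value for `set server --url` from host, port and webapp.
# sqoop_server_url is a hypothetical helper; port and webapp fall back to assumed defaults.
sqoop_server_url() {
  local host="$1" port="${2:-12000}" webapp="${3:-sqoop}"
  printf 'http://%s:%s/%s\n' "$host" "$port" "$webapp"
}
```

For example, `sqoop_server_url 172.16.1.181 8090 sqoop` yields the URL used in the command above.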

2. Connectors
#FTP, JDBC, HDFS, Kafka, Kite, SFTP
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+


3. Example
http://sqoop.apache.org/docs/1.99.7/user/Sqoop5MinutesDemo.html
1) Start the client
    sqoop2-shell
2) Connect to the server
    sqoop:000> set server --host 172.16.1.181 --port 12000 --webapp sqoop
3) Show the version
    sqoop:000> show version --all
4) Create links
    sqoop:000> create link -connector generic-jdbc-connector
    Name: First Link
    Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://172.16.1.241:3306/crawl
    Username: root
    Password: ******
    Fetch Size: 1
    Connection Properties:
    There are currently 0 values in the map:
    entry# protocol=tcp
    ///
    sqoop:000>  create link -connector hdfs-connector
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Creating link for connector with name hdfs-connector
    Please fill following values to create new link object
    Name: Second Link
    HDFS cluster
    URI: hdfs://master:8020/
    Conf directory: /opt/hadoop-2.7.7/etc/hadoop/
    Additional configs::
    There are currently 0 values in the map:
    entry# protocol=tcp
5) Create a job
    sqoop:000> create job -f "First Link" -t "Second Link"
    Creating job for links with from name First Link and to name Second Link
     Please fill following values to create new job object
     Name: Sqoopy

     FromJob configuration

      Schema name:(Required)sqoop
      Table name:(Required)sqoop
      Table SQL statement:(Optional)
      Table column names:(Optional)
      Partition column name:(Optional) id
      Null value allowed for the partition column:(Optional)
      Boundary query:(Optional)

    ToJob configuration

      Output format:
       0 : TEXT_FILE
       1 : SEQUENCE_FILE
      Choose: 0
      Compression format:
       0 : NONE
       1 : DEFAULT
       2 : DEFLATE
       3 : GZIP
       4 : BZIP2
       5 : LZO
       6 : LZ4
       7 : SNAPPY
       8 : CUSTOM
      Choose: 0
      Custom compression format:(Optional)
      Output directory:(Required)/root/projects/sqoop

      Driver Config
      Extractors:(Optional) 2
      Loaders:(Optional) 2
6) Start the job
    sqoop:000> start job -n Sqoopy
    Submission details
    Job Name: Sqoopy
    Server URL: http://localhost:12000/sqoop/
    Created by: root
    Creation date: 2014-11-04 19:43:29 PST
    Lastly updated by: root
    External ID: job_1412137947693_0001
      http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/
    2014-11-04 19:43:29 PST: BOOTING  - Progress is not available
7) Check the job status
    sqoop:000> status job -n Sqoopy
8) Stop the job
    sqoop:000> stop job -n Sqoopy

III. Developer Guide
Building Sqoop 2
Sqoop Development Environment Setup
Developing a Sqoop Connector with Connector API
Developing Sqoop application with REST API
Developing Sqoop application using Sqoop Java Client API
Repository


IV. Security Guide
Security Guide


 
