当前位置: 首页 > 工具软件 > cl-bzip2 > 使用案例 >

sqoop2 mysql to hdfs_Sqoop2学习(二)—将MySQL数据库中数据导入到HDFS中

曹成双
2023-12-01

问题导读:

1、Sqoop客户端如何创建服务端连接?

Sqoop的安装此处不再介绍,此处接我上一篇博文:Sqoop2学习(一)—Sqoop1.99.3介绍与安装。

一、配置服务端地址及端口

要想和服务端连接,需要在客户端配置服务端的信息,如下:

set server --host secondmgt --port 12000 --webapp sqoop        secondmgt:是服务端的主机名,12000是Sqoop的端口号。

二、验证连接是否正常

sqoop:000> show version -all

client version:

Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f

Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014

server version:

Sqoop 1.99.4 source revision 2475a76ef70a0660f381c75c3d47d0d24f00b57f

Compiled by gshapira on Sun Nov 16 02:50:00 PST 2014

API versions:

[v1]        结果显示客户端和服务端版本信息,说明,客户端已经连上服务端。

三、查看sqoop shell支持的命令

sqoop:000> help

For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:

exit (\x ) Exit the shell

history (\H ) Display, manage and recall edit-line history

help (\h ) Display this help message

set (\st ) Configure various client options and settings

show (\sh ) Display various objects and configuration options

create (\cr ) Create new object in Sqoop repository

delete (\d ) Delete existing object in Sqoop repository

update (\up ) Update objects in Sqoop repository

clone (\cl ) Create new object based on existing one

start (\sta) Start job

stop (\stp) Stop job

status (\stu) Display status of a job

enable (\en ) Enable object in Sqoop repository

disable (\di ) Disable object in Sqoop repository

四、查看Sqoop服务器上注册的连接器

sqoop:000> show connector

+----+------------------------+---------+------------------------------------------------------+

| Id | Name | Version | Class |

+----+------------------------+---------+------------------------------------------------------+

| 1 | generic-jdbc-connector | 1.99.3 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector |

+----+------------------------+---------+------------------------------------------------------+

五、创建连接

sqoop:000> create connection --cid 1

Creating connection for connector with id 1

Please fill following values to create new connection object

Name: My First

Connection configuration

JDBC Driver Class: com.mysql.jdbc.Driver

JDBC Connection String: jdbc:mysql://secondmgt:3306/sqoopdb

Username: hive

Password: ****

JDBC Connection Properties:

There are currently 0 values in the map:

entry# //此处直接敲回车

Security related configuration options

Max connections: 100

New connection was successfully created with validation status FINE and persistent id 1

sqoop:000>

六、查看创建的连接

sqoop:000> show connection

+----+----------+-----------+---------+

| Id | Name | Connector | Enabled |

+----+----------+-----------+---------+

| 1 | My First | 1 | true |

+----+----------+-----------+---------+

七、创建Job

sqoop:000> create job --xid 1 --type import

Creating job for connection with id 1

Please fill following values to create new job object

Name: Second Job

Database configuration

Schema name: sqoopdb

Table name: stu

Table SQL statement:

Table column names: id,name

Partition column name: name

Nulls in partition column:

Boundary query:

Output configuration

Storage type:

0 : HDFS

Choose: 0

Output format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

Choose: 1

Compression format:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

Choose: 0

Output directory: /output/sqoopsecond

Throttling resources

Extractors:

Loaders:

New job was successfully created with validation status FINE and persistent id 3

八、提交Job

sqoop:000> start job --jid 3

Submission details

Job ID: 3

Server URL: http://localhost:12000/sqoop/

Created by: hadoopUser

Creation date: 2015-01-16 10:30:17 CST

Lastly updated by: hadoopUser

External ID: job_1421373857783_0001

http://secondmgt:8088/proxy/application_1421373857783_0001/

2015-01-16 10:30:17 CST: BOOTING - Progress is not available

九、查看Job执行状态

sqoop:000> status job --jid 3

Submission details

Job ID: 3

Server URL: http://localhost:12000/sqoop/

Created by: hadoopUser

Creation date: 2015-01-16 10:30:17 CST

Lastly updated by: hadoopUser

External ID: job_1421373857783_0001

http://secondmgt:8088/proxy/application_1421373857783_0001/

2015-01-16 10:31:18 CST: RUNNING - 45.00 %

sqoop:000> status job --jid 3

Submission details

Job ID: 3

Server URL: http://localhost:12000/sqoop/

Created by: hadoopUser

Creation date: 2015-01-16 10:30:17 CST

Lastly updated by: hadoopUser

External ID: job_1421373857783_0001

http://secondmgt:8088/proxy/application_1421373857783_0001/

2015-01-16 10:31:24 CST: SUCCEEDED

Counters:

org.apache.hadoop.mapreduce.JobCounter

SLOTS_MILLIS_MAPS: 448084

TOTAL_LAUNCHED_MAPS: 10

SLOTS_MILLIS_REDUCES: 0

OTHER_LOCAL_MAPS: 10

org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter

BYTES_READ: 0

org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter

BYTES_WRITTEN: 0

org.apache.hadoop.mapreduce.TaskCounter

MAP_INPUT_RECORDS: 0

MERGED_MAP_OUTPUTS: 0

PHYSICAL_MEMORY_BYTES: 1562419200

SPILLED_RECORDS: 0

FAILED_SHUFFLE: 0

CPU_MILLISECONDS: 28060

COMMITTED_HEAP_BYTES: 846725120

VIRTUAL_MEMORY_BYTES: 8838815744

MAP_OUTPUT_RECORDS: 4

SPLIT_RAW_BYTES: 1255

GC_TIME_MILLIS: 596

org.apache.hadoop.mapreduce.FileSystemCounter

FILE_READ_OPS: 0

FILE_WRITE_OPS: 0

FILE_BYTES_READ: 0

FILE_LARGE_READ_OPS: 0

HDFS_BYTES_READ: 1255

FILE_BYTES_WRITTEN: 919590

HDFS_LARGE_READ_OPS: 0

HDFS_BYTES_WRITTEN: 930

HDFS_READ_OPS: 40

HDFS_WRITE_OPS: 20

org.apache.sqoop.submission.counter.SqoopCounters

ROWS_READ: 4

Job executed successfully

十、显示创建的Job

sqoop:000> show job

+----+------------+--------+-----------+---------+

| Id | Name | Type | Connector | Enabled |

+----+------------+--------+-----------+---------+

| 2 | First Job | IMPORT | 1 | true |

| 3 | Second Job | IMPORT | 1 | true |

+----+------------+--------+-----------+---------+

十一、删除创建的Job

sqoop:000> delete job --jid 3

sqoop:000> show job

+----+-----------+--------+-----------+---------+

| Id | Name | Type | Connector | Enabled |

+----+-----------+--------+-----------+---------+

| 2 | First Job | IMPORT | 1 | true |

+----+-----------+--------+-----------+---------+

十二、查看HDFS,是否将MySQL中数据已导入

[hadoopUser@secondmgt ~]$ hadoop fs -ls /output/sqoopsecond/*

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 0 2015-01-16 10:31 /output/sqoopsecond/_SUCCESS

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 104 2015-01-16 10:30 /output/sqoopsecond/part-m-00000.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00001.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00002.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:30 /output/sqoopsecond/part-m-00003.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00004.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00005.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00006.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 105 2015-01-16 10:31 /output/sqoopsecond/part-m-00007.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 86 2015-01-16 10:31 /output/sqoopsecond/part-m-00008.seq

Found 1 items

-rw-r--r-- 2 hadoopUser supergroup 119 2015-01-16 10:31 /output/sqoopsecond/part-m-00009.seq

 类似资料: