<property>
  <name>hive.server2.webui.host</name>
  <value>your hostname</value>
</property>
<property>
  <name>hive.server2.webui.port</name>
  <value>19990</value>
</property>
$HIVE_HOME/bin/hiveserver2
OR
$HIVE_HOME/bin/hive --service hiveserver2
[work@bigdatatest01 ~]$ $HIVE_HOME/bin/hiveserver2 -H
usage: hiveserver2
    --deregister <versionNumber>   Deregister all instances of given
                                   version from dynamic service discovery
 -H,--help                         Print help information
    --hiveconf <property=value>    Use value for given property
$HIVE_HOME/bin/hiveserver2 --hiveconf hive.server2.thrift.port=14000
[work@bigdatatest01 ~]$ $HIVE_HOME/bin/beeline -u jdbc:hive2://hostname:port/default -n hadoop
OR
[work@bigdatatest01 ~]$ $HIVE_HOME/bin/beeline
beeline> !connect jdbc:hive2://hostname:port/default username password
1.4.1 Requirement
Query all databases from code.
1.4.2 Code
1.4.2.1 pom dependency
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0-cdh5.16.2</version>
</dependency>
1.4.2.2 HiveJDBCClient Code
package com.xk.bigdata.hive.jdbc;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJDBCClient {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws Exception {
        Class.forName(driverName);
        // replace "hive" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection("jdbc:hive2://bigdatatest03:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        // show databases
        String sql = "show databases";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
        // release JDBC resources
        res.close();
        stmt.close();
        con.close();
    }
}
1.4.3 Result
1.4.3.1 Console output
Running: show databases
bigdata
default
ods_bigdata
test
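Beyond show databases, the same connection can run ordinary queries. The sketch below is an added illustration rather than part of the original code: it reads the bigdata.emp demo table that is created later in this article and relies on try-with-resources to release the JDBC objects (the class name HiveJDBCQueryDemo is made up for this example).

package com.xk.bigdata.hive.jdbc;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJDBCQueryDemo {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // same HiveServer2 endpoint and user as in HiveJDBCClient above
        try (Connection con = DriverManager.getConnection("jdbc:hive2://bigdatatest03:10000/default", "hive", "");
             Statement stmt = con.createStatement();
             ResultSet res = stmt.executeQuery("select emp_no, emp_name, dept_no from bigdata.emp")) {
            while (res.next()) {
                // print one employee per line, tab separated
                System.out.println(res.getString(1) + "\t" + res.getString(2) + "\t" + res.getString(3));
            }
        }
    }
}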
usage: hive
 -d,--define <key=value>          Variable substitution to apply to Hive
                                  commands. e.g. -d A=B or --define A=B
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
 -h <hostname>                    Connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        Connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
[work@bigdatatest02 ~]$ ${HIVE_HOME}/bin/hive -e "show databases"
bigdata
default
ods_bigdata
test
[work@bigdatatest02 hive]$ vim demo.sql
show databases;
[work@bigdatatest02 hive]$ ${HIVE_HOME}/bin/hive -f demo.sql
bigdata
default
ods_bigdata
test
quit
exit
Use quit or exit to leave the interactive shell.
hive> exit;
[work@bigdatatest02 hive]$
set <key>=<value>
Sets the value of a particular configuration variable (key).
Note: If you misspell the variable name, the CLI will not show an error.
# Check the hive.cli.print.current.db parameter
hive> set hive.cli.print.current.db;
hive.cli.print.current.db=false
# Set hive.cli.print.current.db to true
hive> set hive.cli.print.current.db=true;
# Check the hive.cli.print.current.db parameter again
hive (default)> set hive.cli.print.current.db;
hive.cli.print.current.db=true
add FILE[S] <filepath> <filepath>*
add JAR[S] <filepath> <filepath>*
add ARCHIVE[S] <filepath> <filepath>*
Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources below for more information.
list FILE[S]
list JAR[S]
list ARCHIVE[S]
Lists the resources already added to the distributed cache. See Hive Resources below for more information.
delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*
Removes the resource(s) from the distributed cache.
hive (default)> add FILE /home/work/data/demo.txt;
Added resources: [/home/work/data/demo.txt]
hive (default)> list FILE;
/home/work/data/demo.txt
hive (default)> delete FILE /home/work/data/demo.txt;
hive (default)> list FILE;
hive (default)> add FILE hdfs://nameservice1/data/wc/demo.txt;
Added resources: [hdfs://nameservice1/data/wc/demo.txt]
hive (default)> list FILE;
/tmp/c26afbf8-601a-4590-8d21-57ba20ce5f58_resources/demo.txt
hive (default)> delete FILEs;
hive (default)> list FILE;
A User-defined function (UDF) for use with Hive.

New UDF classes need to inherit from this UDF class (or from org.apache.hadoop.hive.ql.udf.generic.GenericUDF, which provides more flexibility at the cost of more complexity).

Requirements for all classes extending this UDF are:
- Implement one or more methods named evaluate which will be called by Hive (the exact way in which Hive resolves the method to call can be configured by setting a custom UDFMethodResolver). The following are some examples:
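(The javadoc's own example list is not reproduced here.) As an illustration only, with a hypothetical class written for this write-up rather than taken from Hive, overloads such as these all satisfy the requirement; Hive resolves the one whose parameter types match the call:

package com.xk.bigdata.hive.udf.demo;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical class meant only to illustrate evaluate() overloading.
public class MyOverloadDemoUDF extends UDF {

    // e.g. my_demo(int_col)
    public int evaluate(int a) {
        return a + 1;
    }

    // e.g. my_demo(int_col, double_col)
    public double evaluate(int a, double b) {
        return a + b;
    }

    // e.g. my_demo(string_col)
    public Text evaluate(Text s) {
        return s == null ? null : new Text(s.toString().trim());
    }
}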
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.16.2</version>
</dependency>
Custom UDF: convert uppercase letters to lowercase
package com.xk.bigdata.hive.udf.lower;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * MyLower
 */
@Description(name = "my_lower",
        value = "_FUNC_(str) - returns lower str",
        extended = "Example:\n"
                + "  > SELECT _FUNC_('HADOOP') FROM src LIMIT 1;\n"
                + "  'hadoop'")
public class MyLower extends UDF {

    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}
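Before packaging the jar, evaluate can be exercised from a plain main method. The quick check below is an added illustration (MyLowerLocalTest is not part of the original project):

package com.xk.bigdata.hive.udf.lower;

import org.apache.hadoop.io.Text;

// Local sanity check for MyLower, run without a Hive session.
public class MyLowerLocalTest {

    public static void main(String[] args) {
        MyLower udf = new MyLower();
        System.out.println(udf.evaluate(new Text("HADOOP"))); // expected: hadoop
        System.out.println(udf.evaluate(null));               // expected: null
    }
}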
3.4.1 Build the jar and put it in a directory on the Linux host
[work@bigdatatest02 hive]$ pwd
/home/work/lib/hive
[work@bigdatatest02 hive]$ ll
total 8
-rw-r--r-- 1 work work 4257 Dec 15 16:40 hive-basic-1.0.jar
3.4.2 Register a temporary UDF
hive (default)> add JAR /home/work/lib/hive/hive-basic-1.0.jar;
Added [/home/work/lib/hive/hive-basic-1.0.jar] to class path
Added resources: [/home/work/lib/hive/hive-basic-1.0.jar]
hive (default)> create temporary function my_lower as 'com.xk.bigdata.hive.udf.lower.MyLower';
OK
Time taken: 0.328 seconds
hive (default)> select my_lower("HADOOP");
OK
hadoop
Time taken: 0.167 seconds, Fetched: 1 row(s)
hive> CREATE FUNCTION sayhello AS "com.xk.bigdata.hive.udf.lower.MyLower" USING JAR "/home/work/lib/hive/hive-basic-1.0.jar";
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask. Hive warehouse is non-local, but /home/work/lib/hive/hive-basic-1.0.jar specifies file on local filesystem. Resources on non-local warehouse should specify a non-local scheme/path
hive> CREATE FUNCTION my_lower AS "com.xk.bigdata.hive.udf.lower.MyLower" USING JAR "hdfs://nameservice1/lib/hive-basic-1.0.jar";
Added [/tmp/893136b4-bc59-44b0-8caa-a99961f8bd34_resources/hive-basic-1.0.jar] to class path
Added resources: [hdfs://nameservice1/lib/hive-basic-1.0.jar]
OK
Time taken: 0.414 seconds
hive> select my_lower("HADOOP");
OK
hadoop
Time taken: 0.675 seconds, Fetched: 1 row(s)
hive> drop function default.my_lower;
OK
Time taken: 0.122 seconds
[work@bigdatatest02 hive]$ vim hive-init.sql
add JAR /home/work/lib/hive/hive-basic-1.0.jar;
create temporary function my_lower as 'com.xk.bigdata.hive.udf.lower.MyLower';
[work@bigdatatest02 hive]$ hive -i hive-init.sql
A further option is to register the UDF as a Hive built-in by adding a line like the following to the static registration block of Hive's FunctionRegistry class and rebuilding Hive:
system.registerUDF("my_lower", MyLower.class, false);
3.8.1 Code
package com.xk.bigdata.hive.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class MySplitUDTF extends GenericUDTF {

    // reused buffer holding the single output column of each forwarded row
    private final List<String> list = new ArrayList<>();

    /**
     * Declare the schema of the output rows: one string column named "word".
     *
     * @param argOIs object inspectors of the input arguments
     * @return struct object inspector describing the output rows
     * @throws UDFArgumentException on invalid arguments
     */
    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        List<String> structFieldNames = new ArrayList<>();
        List<ObjectInspector> structFieldObjectInspectors = new ArrayList<>();
        structFieldNames.add("word");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames, structFieldObjectInspectors);
    }

    /**
     * args[0] : the input string to split
     * args[1] : the second argument, i.e. the separator
     *
     * @param args arguments of one call
     * @throws HiveException if the number of arguments is wrong
     */
    @Override
    public void process(Object[] args) throws HiveException {
        if (args.length == 2) {
            String words = args[0].toString();
            String separator = args[1].toString();
            String[] splits = words.split(separator);
            for (String word : splits) {
                // emit one output row per split token
                list.clear();
                list.add(word);
                forward(list);
            }
        } else {
            throw new HiveException("parameter error");
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
3.8.2 Register the temporary function
hive> add JAR /home/work/lib/hive/hive-basic-1.0.jar;
Added [/home/work/lib/hive/hive-basic-1.0.jar] to class path
Added resources: [/home/work/lib/hive/hive-basic-1.0.jar]
hive> create temporary function my_split as 'com.xk.bigdata.hive.udtf.MySplitUDTF';
OK
Time taken: 0.038 seconds
3.8.3 Results
hive> select * from bigdata.wc;
OK
hadoop,spark,flink
hbase,hadoop,spark,flink
spark
hadoop
hadoop,spark,flink
hbase,hadoop,spark,flink
spark
hadoop
hbase,hadoop,spark,flink
hive> select my_split(words,',') from bigdata.wc;
hadoop
spark
flink
hbase
hadoop
spark
flink
spark
hadoop
hadoop
spark
flink
hbase
hadoop
spark
flink
spark
hadoop
hbase
hadoop
spark
flink
CREATE TABLE IF NOT EXISTS bigdata.emp (
  emp_no   String,
  emp_name String,
  dept_no  String
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
hive> load data local inpath '/home/work/data/hive/emp.txt' overwrite into table bigdata.emp;
Loading data to table bigdata.emp
OK
Time taken: 3.162 seconds
hive> select * from bigdata.emp;
OK
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
Time taken: 0.641 seconds, Fetched: 14 row(s)
hive> set mapred.reduce.tasks;
mapred.reduce.tasks=-1
hive> set mapred.reduce.tasks=3;
hive> set mapred.reduce.tasks;
mapred.reduce.tasks=3
The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out what should be the number of reducers.
4.3.1 order by
hive> insert overwrite local directory '/tmp/hive/sort' select * from bigdata.emp order by emp_no desc;
Query ID = work_20201215180311_b4845985-f372-4112-8c9a-9d19236a3910
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0004, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0004/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-12-15 18:03:28,726 Stage-1 map = 0%, reduce = 0%
2020-12-15 18:03:36,970 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.39 sec
2020-12-15 18:03:43,169 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.53 sec
MapReduce Total cumulative CPU time: 6 seconds 530 msec
Ended Job = job_1608016084001_0004
Moving data to local directory /tmp/hive/sort
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.53 sec HDFS Read: 7440 HDFS Write: 196 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 530 msec
OK
Time taken: 34.088 seconds
hive> set hive.mapred.mode=strict;
hive> set hive.mapred.mode;
hive.mapred.mode=strict
hive> select * from bigdata.emp order by emp_no desc;
FAILED: SemanticException 1:35 Order by-s without limit are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.orderby.no.limit to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features.. Error encountered near token 'emp_no'
hive> select * from bigdata.emp order by emp_no desc limit 1;
Query ID = work_20201215181124_d047171a-8fd9-4b83-b68b-6f0675aae83a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0005, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0005/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-12-15 18:11:43,143 Stage-1 map = 0%, reduce = 0%
2020-12-15 18:11:52,553 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.04 sec
2020-12-15 18:11:58,794 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.01 sec
MapReduce Total cumulative CPU time: 6 seconds 10 msec
Ended Job = job_1608016084001_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.01 sec HDFS Read: 7913 HDFS Write: 114 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 10 msec
OK
7934 MILLER 10
Time taken: 35.612 seconds, Fetched: 1 row(s)
4.3.2 sort by
hive> insert overwrite local directory '/tmp/hive/sort' select * from bigdata.emp sort by emp_no desc;
Query ID = work_20201215181405_80904231-c85b-481b-8ee3-ce14a09bb4d9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0006, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0006/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2020-12-15 18:14:24,793 Stage-1 map = 0%, reduce = 0%
2020-12-15 18:14:33,072 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.26 sec
2020-12-15 18:14:42,335 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.21 sec
MapReduce Total cumulative CPU time: 15 seconds 210 msec
Ended Job = job_1608016084001_0006
Moving data to local directory /tmp/hive/sort
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 15.21 sec HDFS Read: 14162 HDFS Write: 196 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 210 msec
OK
Time taken: 39.026 seconds
[work@bigdatatest02 sort]$ cat 000000_0
7844TURNER30
7839KING10
7788SCOTT20
7782CLARK10
7698BLAKE30
7654MARTIN30
[work@bigdatatest02 sort]$ cat 000001_0
7934MILLER10
7900JAMES30
7876ADAMS20
7566JONES20
7521WARD30
7499ALLEN30
[work@bigdatatest02 sort]$ cat 000002_0
7902FORD20
7369SMITH20
4.3.3 distribute by
hive> insert overwrite local directory '/tmp/hive/distribute' select * from bigdata.emp distribute by length(emp_name);
Query ID = work_20201215194310_ac24b005-4385-44b6-b785-b5beb3f67227
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0007, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0007/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2020-12-15 19:43:26,280 Stage-1 map = 0%, reduce = 0%
2020-12-15 19:43:34,496 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.61 sec
2020-12-15 19:43:40,687 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.91 sec
MapReduce Total cumulative CPU time: 13 seconds 910 msec
Ended Job = job_1608016084001_0007
Moving data to local directory /tmp/hive/distribute
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 13.91 sec HDFS Read: 14729 HDFS Write: 196 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 910 msec
OK
Time taken: 32.279 seconds
[work@bigdatatest02 distribute]$ ls
000000_0 000001_0 000002_0
[work@bigdatatest02 distribute]$ cat 000000_0
7934MILLER10
7844TURNER30
7654MARTIN30
[work@bigdatatest02 distribute]$ cat 000001_0
7902FORD20
7839KING10
7521WARD30
[work@bigdatatest02 distribute]$ cat 000002_0
7782CLARK10
7900JAMES30
7876ADAMS20
7788SCOTT20
7698BLAKE30
7566JONES20
7499ALLEN30
7369SMITH20
4.3.4 cluster by
hive> insert overwrite local directory '/tmp/hive/distributeandsort' select * from bigdata.emp distribute by emp_no sort by emp_no;
Query ID = work_20201215194756_b5359ad0-df8b-4523-9ca7-51b9cbbad376
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0008, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0008/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2020-12-15 19:48:12,418 Stage-1 map = 0%, reduce = 0%
2020-12-15 19:48:21,652 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.39 sec
2020-12-15 19:48:29,858 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.81 sec
MapReduce Total cumulative CPU time: 12 seconds 810 msec
Ended Job = job_1608016084001_0008
Moving data to local directory /tmp/hive/distributeandsort
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 12.81 sec HDFS Read: 14280 HDFS Write: 196 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 810 msec
OK
Time taken: 34.157 seconds
hive> insert overwrite local directory '/tmp/hive/cluster' select * from bigdata.emp cluster by emp_no;
Query ID = work_20201215194933_6479ba1e-d9e9-41b4-991d-a9e599e45410
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1608016084001_0009, Tracking URL = http://bigdatatest02:8088/proxy/application_1608016084001_0009/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill job_1608016084001_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2020-12-15 19:49:53,537 Stage-1 map = 0%, reduce = 0%
2020-12-15 19:50:03,407 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.16 sec
2020-12-15 19:50:12,660 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.76 sec
MapReduce Total cumulative CPU time: 15 seconds 760 msec
Ended Job = job_1608016084001_0009
Moving data to local directory /tmp/hive/cluster
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 15.76 sec HDFS Read: 14220 HDFS Write: 196 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 760 msec
OK
Time taken: 39.878 seconds
[work@bigdatatest02 hive]$ cd distributeandsort/
[work@bigdatatest02 distributeandsort]$ ls
000000_0 000001_0 000002_0
[work@bigdatatest02 distributeandsort]$ cat 000000_0
7521WARD30
7566JONES20
7698BLAKE30
7782CLARK10
7788SCOTT20
7839KING10
7902FORD20
[work@bigdatatest02 distributeandsort]$ cat 000001_0
7369SMITH20
7654MARTIN30
7876ADAMS20
7900JAMES30
[work@bigdatatest02 distributeandsort]$ cat 000002_0
7499ALLEN30
7844TURNER30
7934MILLER10
[work@bigdatatest02 distributeandsort]$ cd ../cluster/
[work@bigdatatest02 cluster]$ ls
000000_0 000001_0 000002_0
[work@bigdatatest02 cluster]$ cat 000000_0
7521WARD30
7566JONES20
7698BLAKE30
7782CLARK10
7788SCOTT20
7839KING10
7902FORD20
[work@bigdatatest02 cluster]$ cat 000001_0
7369SMITH20
7654MARTIN30
7876ADAMS20
7900JAMES30
[work@bigdatatest02 cluster]$ cat 000002_0
7499ALLEN30
7844TURNER30
7934MILLER10