Kerberos authentication itself is not covered in depth here; readers who need the background can look it up on their own. This section focuses on how to handle the Kerberos authentication issues that come up when using DataX to extract data from MySQL into Hive.
DataX job file (mysql2hive.json) (example):
writer.parameter settings:
"haveKerberos":true,
"kerberosKeytabFilePath":"/xxx/kerberos/app_prd.keytab",
"defaultFS":"hdfs://nameservice1",
"kerberosPrincipal":"app_prd@FAYSON.COM"
The steps are as follows (example):
Place the cluster's core-site.xml, hdfs-site.xml, hive-site.xml and yarn-site.xml files into both datax/hdfsreader/src/main/resources and datax/hdfswriter/src/main/resources (a copy sketch is given after the build commands below), then rebuild DataX:
source ~/.bashrc
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
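For reference, a rough sketch of the copy step described above, assuming the cluster client configuration lives under /etc/hadoop/conf and /etc/hive/conf and the DataX source tree sits in the current directory (adjust both to your environment):

# copy the cluster client configs into both HDFS plugins before rebuilding
for f in /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml \
         /etc/hadoop/conf/yarn-site.xml /etc/hive/conf/hive-site.xml; do
  cp "$f" datax/hdfsreader/src/main/resources/
  cp "$f" datax/hdfswriter/src/main/resources/
done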
Here the cluster configuration is attached directly in the JSON job file, which is more intuitive, although it makes the file rather long (note that writeMode accepts append or overwrite):
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "xxxx",
            "password": "xxxx",
            "column": [
              "id",
              "les_id",
              "grade_id",
              "edition_id",
              "subject_id",
              "course_system_first_id",
              "course_system_second_id",
              "course_system_third_id",
              "course_system_four_id",
              "custom_points",
              "deleted",
              "created_at",
              "tea_id",
              "stu_id",
              "les_uid",
              "updated_at",
              "pt"
            ],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://xxxx:3306/test?useUnicode=true&characterEncoding=utf8"],
                "table": ["ods_lesson_course_content_rt_df_tmp"]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name": "id", "type": "int"},
              {"name": "les_id", "type": "int"},
              {"name": "grade_id", "type": "int"},
              {"name": "edition_id", "type": "int"},
              {"name": "subject_id", "type": "int"},
              {"name": "course_system_first_id", "type": "int"},
              {"name": "course_system_second_id", "type": "int"},
              {"name": "course_system_third_id", "type": "int"},
              {"name": "course_system_four_id", "type": "int"},
              {"name": "custom_points", "type": "string"},
              {"name": "deleted", "type": "tinyint"},
              {"name": "created_at", "type": "string"},
              {"name": "tea_id", "type": "int"},
              {"name": "stu_id", "type": "int"},
              {"name": "les_uid", "type": "string"},
              {"name": "updated_at", "type": "string"}
            ],
            "defaultFS": "hdfs://nameservice1",
            "hadoopConfig": {
              "dfs.nameservices": "nameservice1",
              "dfs.ha.namenodes.nameservice1": "namenode286,namenode36",
              "dfs.namenode.rpc-address.nameservice1.namenode286": "xxxx:8020",
              "dfs.namenode.rpc-address.nameservice1.namenode36": "xxxx:8020",
              "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            },
            "haveKerberos": "true",
            "kerberosKeytabFilePath": "/home/xx/kerberos/xxx.keytab",
            "kerberosPrincipal": "xxx@FAYSON.COM",
            "encoding": "UTF-8",
            "fileType": "orc",
            "fileName": "ods_lesson_course_content_rt_df_orc_2",
            "path": "/user/hive/warehouse/ods.db/ods_lesson_course_content_rt_df_orc_2/pt=2020-01-20",
            "writeMode": "append",
            "fieldDelimiter": "\u0001"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": "5"
      },
      "errorLimit": {
        "record": 0
      }
    }
  }
}
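Once the rebuilt plugins are deployed, the job is launched the usual way; as an example, assuming DataX is installed under /opt/datax (adjust the path to your installation):

python /opt/datax/bin/datax.py mysql2hive.json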
This covers the solution to the Kerberos authentication problem when extracting data with DataX. One caveat: if the Hadoop cluster configuration changes, for example a NameNode host or port, the plugins have to be repackaged. If you would rather avoid that, the configuration can also be specified directly in the job file; that solution is attached at the end, and which one to use is up to you. We chose to package the configuration files into the hdfsreader and hdfswriter plugins because we did not want the JSON to get too long, and cluster configuration files rarely change much. Of course, a cluster upgrade would still require repackaging, whereas the second approach would not.