
DataX from Beginner to Expert 03 -- Handling Kerberos Authentication Issues

尤飞尘
2023-12-01


Preface

If your Hadoop cluster does not use Kerberos authentication, the official documentation is all you need. If the cluster does enforce Kerberos security, the approach below is worth recommending.

I. The Kerberos authentication problem

Kerberos authentication itself is not covered in depth here; readers who need the background can look it up on their own. This post focuses on how to handle the Kerberos authentication problem when using DataX to extract data from MySQL into Hive.

II. Solution 1

1. Modify the JSON job file

DataX job file (mysql2hive.json), example:

writer.parameter settings:

 "haveKerberos":true,
 "kerberosKeytabFilePath":"/xxx/kerberos/app_prd.keytab",
 "defaultFS":"hdfs://nameservice1",
 "kerberosPrincipal":"app_prd@FAYSON.COM"

2. Configuration files

Do the following (example):

Copy the cluster configuration files core-site.xml, hdfs-site.xml, hive-site.xml and yarn-site.xml into both the datax/hdfsreader/src/main/resources and datax/hdfswriter/src/main/resources directories.
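
A sketch of that copy step, assuming the cluster configuration files live under /etc/hadoop/conf and /etc/hive/conf and that the current directory is the DataX source root (all of these locations are assumptions, adjust to your environment):

# Copy the cluster config files into both HDFS plugin resource directories.
for plugin in hdfsreader hdfswriter; do
  cp /etc/hadoop/conf/core-site.xml \
     /etc/hadoop/conf/hdfs-site.xml \
     /etc/hadoop/conf/yarn-site.xml \
     /etc/hive/conf/hive-site.xml \
     "$plugin/src/main/resources/"
done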

3. Rebuild and repackage

source ~/.bashrc
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
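
After a successful build the packaged DataX sits under target/datax/datax (the standard DataX assembly layout). If you already have a deployed DataX and only want to refresh the two HDFS plugins, something like the following works, where /opt/datax is an assumed install path:

# Copy only the rebuilt HDFS plugins into an existing deployment (assumed path /opt/datax).
cp -r target/datax/datax/plugin/reader/hdfsreader /opt/datax/plugin/reader/
cp -r target/datax/datax/plugin/writer/hdfswriter /opt/datax/plugin/writer/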

III. Solution 2

Here the Hadoop configuration is embedded directly in the JSON job file, which is more self-contained, although it makes the file somewhat verbose. (Note that writeMode accepts either append or overwrite; the example below uses append.)

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "xxxx",
                        "password": "xxxx",
                        "column" : [
                                       "id"                   
                                      ,"les_id"               
                                      ,"grade_id"             
                                      ,"edition_id"           
                                      ,"subject_id"           
                                      ,"course_system_first_id"
                                      ,"course_system_second_id"
                                      ,"course_system_third_id"
                                      ,"course_system_four_id"
                                      ,"custom_points"        
                                      ,"deleted"              
                                      ,"created_at"           
                                      ,"tea_id"               
                                      ,"stu_id"               
                                      ,"les_uid"              
                                      ,"updated_at"           
                                      ,"pt"
                                ],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://xxxx:3306/test?useUnicode=true&characterEncoding=utf8"],
                                "table": ["ods_lesson_course_content_rt_df_tmp"]
                                  
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {"name":"id"        , "type":"int"},
                            {"name":"les_id"    , "type":"int"},
                            {"name":"grade_id"  , "type":"int"},
                            {"name":"edition_id", "type":"int"},
                            {"name":"subject_id", "type":"int"},
                            {"name":"course_system_first_id" , "type":"int"},
                            {"name":"course_system_second_id", "type":"int"},
                            {"name":"course_system_third_id" , "type":"int"},
                            {"name":"course_system_four_id"  , "type":"int"},
                            {"name":"custom_points", "type":"string"},
                            {"name":"deleted"      ,"type":"TINYINT"},
                            {"name":"created_at"   ,"type":"string"},
                            {"name":"tea_id"       ,"type":"int"},
                            {"name":"stu_id",       "type":"int"},
                            {"name":"les_uid"      ,"type":"string"},
                            {"name":"updated_at"   ,"type":"string"}
  
                        ],
                        "defaultFS": "hdfs://nameservice1",
                        "hadoopConfig":{
                                 "dfs.nameservices": "nameservice1",
                                 "dfs.ha.namenodes.nameservice1": "namenode286,namenode36",
                                 "dfs.namenode.rpc-address.nameservice1.namenode286": "xxxx:8020",
                                 "dfs.namenode.rpc-address.nameservice1.namenode36": "xxxx:8020",
                                 "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                         },
                         "haveKerberos": "true",
                         "kerberosKeytabFilePath": "/home/xx/kerberos/xxx.keytab",
                         "kerberosPrincipal":"xxx@FAYSON.COM",
                         "encoding": "UTF-8",
                         "fileType": "orc",
                         "fileName": "ods_lesson_course_content_rt_df_orc_2",
                         "path": "/user/hive/warehouse/ods.db/ods_lesson_course_content_rt_df_orc_2/pt=2020-01-20",
                         "writeMode": "append", // append & overwrite
                         "fieldDelimiter" :"\u0001"               
                       }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            },
             "errorLimit": {
                "record": 0
            }
        }
    }
}
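
With either solution in place, the job is launched in the usual way. A minimal sketch, assuming DataX is deployed under /opt/datax and the job above is saved as /opt/datax/job/mysql2hive.json (both are assumptions):

cd /opt/datax
python bin/datax.py job/mysql2hive.json

If the Kerberos settings are wrong, the run typically fails with a GSS/SASL authentication error from the Hadoop client rather than writing any data.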

Summary

This post has provided two solutions to the Kerberos authentication problem when extracting data with DataX. With the first approach, any change to the cluster configuration files, for example a NameNode host or port change, requires repackaging the plugins. If you would rather avoid that, the configuration can be specified directly in the job file, as in the second solution above; which one to use is up to you. We chose to package the configuration files into the hdfsreader and hdfswriter plugins because we did not want the JSON to grow too long, and the cluster configuration files rarely change much. A cluster upgrade would of course require repackaging, whereas the second approach would not.
