
Greenplum fails to start: "Failed to start Master instance in admin mode"

艾昊明
2023-12-01

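A developer colleague told me that the Greenplum instance in the test environment had suddenly become unreachable. I logged in to the server and found no Greenplum processes running. I asked the developers whether they had changed anything in Greenplum, and they said they had not touched it. Strange. So what happened?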



I tried starting it manually with gpstart, and it failed:
[gpadmin@00_mdw ~]$ gpstart
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/home/gpadmin/gpdata/gpmaster/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
[gpadmin@00_mdw ~]$


The log output is terse and shows nothing useful. What now?
2017-05-16 11:18:20.666964 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 11:18:20.692596 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 11:18:20.693209 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-16 13:27:17.059691 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 13:27:17.062897 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 13:27:17.063528 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-17 10:53:59.610428 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-17 10:53:59.643630 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-17 10:53:59.644220 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,


I went to the log directory to look through all the log files and noticed that the newest one was a .csv file, gpdb-2017-05-17_112454.csv.

Original blog post: http://blog.csdn.net/mchdba/article/details/72383684, by mchdba (黄杉). Please do not repost.

[gpadmin@00_mdw pg_log]$ ll -t
total 740
-rw-------. 1 gpadmin gpadmin    386 May 17 11:24 gpdb-2017-05-17_112454.csv
-rw-------. 1 gpadmin gpadmin   3951 May 17 11:24 startup.log
-rw-------. 1 gpadmin gpadmin    384 May 17 10:53 gpdb-2017-05-17_105359.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 13:27 gpdb-2017-05-16_132717.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 11:18 gpdb-2017-05-16_111820.csv
-rw-------. 1 gpadmin gpadmin  30004 May 16 11:17 gpdb-2017-05-16_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 15 00:00 gpdb-2017-05-15_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 14 00:00 gpdb-2017-05-14_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 13 00:00 gpdb-2017-05-13_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 12 00:00 gpdb-2017-05-12_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 11 00:00 gpdb-2017-05-11_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 10 00:00 gpdb-2017-05-10_000000.csv
-rw-------. 1 gpadmin gpadmin  13073 May  9 21:14 gpdb-2017-05-09_000000.csv
-rw-------. 1 gpadmin gpadmin  18458 May  8 11:38 gpdb-2017-05-08_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May  7 00:00 gpdb-2017-05-07_000000.csv
[gpadmin@00_mdw pg_log]$ more gpdb-2017-05-17_112454.csv
2017-05-17 11:24:54.936656 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"LOG","F0000","invalid authentication method ""127.0.0.1/28""",,,,,"line 87 of configuration file ""/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf""",,0,,"hba.c",1095,
2017-05-17 11:24:54.936871 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"FATAL","XX000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1529,
[gpadmin@00_mdw pg_log]$ 

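The file gpdb-2017-05-17_112454.csv states the problem clearly: the pg_hba.conf configuration file is invalid. I opened /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf and commented out the offending line, the one reported as "line 87 of configuration file" with the invalid value "127.0.0.1/28":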

#local all all 127.0.0.1/28 trust



Then I started the Greenplum cluster again, and this time it came up:
[gpadmin@00_mdw pg_log]$ gpstart
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Setting new master era
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Started...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Shutting down master
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg0 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg1 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg4 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg5 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master instance parameters
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Database                 = template1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Port              = 5432
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master directory         = /home/gpadmin/gpdata/gpmaster/gpseg-1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Timeout                  = 600 seconds
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master standby           = Off 
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Segment instances that will be started
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Host      Datadir                                Port    Role
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg0   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg1   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg2   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam1/gpseg2   50000   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg3   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam2/gpseg3   50001   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap1/gpseg4   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap2/gpseg5   40001   Primary

Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20170517:11:28:25:017745 gpstart:00_mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
... 
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Process results...
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Successful segment starts                                            = 8
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 4   <<<<<<<<
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances, skipped 4 other segments 
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-There are 4 segment(s) marked down in the database
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance 00_mdw directory /home/gpadmin/gpdata/gpmaster/gpseg-1 
20170517:11:28:29:017745 gpstart:00_mdw:gpadmin-[INFO]:-Command pg_ctl reports Master 00_mdw instance active
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-No standby master configured.  skipping...
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 4
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
[gpadmin@00_mdw pg_log]$

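By the way, the interesting part is that Greenplum's key error message was not in the .log file at all, but in a csv file in the same directory. That really surprised me.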
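Since the decisive message sat in a gpdb-*.csv file, a small script can pull the suspicious rows out of those files. The following is a minimal sketch, not a Greenplum tool: the column positions (severity at index 16, SQLSTATE at 17, message at 18) are inferred from the sample lines in this post and are an assumption, and `find_problems` and `scan_log_dir` are hypothetical helper names.

```python
import csv
import glob
import os
from typing import Iterable, Iterator, List, NamedTuple

# Column positions inferred from the csv lines shown in this post
# (an assumption, not taken from the Greenplum documentation).
COL_TIME, COL_SEVERITY, COL_SQLSTATE, COL_MESSAGE = 0, 16, 17, 18


class LogProblem(NamedTuple):
    time: str
    severity: str
    sqlstate: str
    message: str


def find_problems(lines: Iterable[str]) -> Iterator[LogProblem]:
    """Yield rows with a serious severity or a SQLSTATE other than 00000."""
    for row in csv.reader(lines):
        if len(row) <= COL_MESSAGE:
            continue  # too short to be a full log record
        severity, sqlstate = row[COL_SEVERITY], row[COL_SQLSTATE]
        if severity in {"ERROR", "FATAL", "PANIC"} or sqlstate != "00000":
            yield LogProblem(row[COL_TIME], severity, sqlstate, row[COL_MESSAGE])


def scan_log_dir(log_dir: str, newest: int = 5) -> List[LogProblem]:
    """Scan the most recently modified gpdb-*.csv files in log_dir."""
    paths = sorted(
        glob.glob(os.path.join(log_dir, "gpdb-*.csv")),
        key=os.path.getmtime,
        reverse=True,
    )
    problems: List[LogProblem] = []
    for path in paths[:newest]:
        # newline="" lets the csv module handle quoted multi-line messages.
        with open(path, newline="") as fh:
            problems.extend(find_problems(fh))
    return problems
```

For the directory in this post it could be called as `scan_log_dir("/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log")`. Filtering on the SQLSTATE as well as the severity matters here, because the pg_hba.conf complaint was logged at severity LOG with state F0000.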

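Finally, the root cause: why did this one 127 entry keep Greenplum from starting? Looking at pg_hba.conf, my guesses were:

(1) A 127.0.0.1/28 entry already existed, so the two conflicted:
```
[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep 127
host all gpadmin 127.0.0.1/28 trust
#local all all 127.0.0.1/28 trust
[gpadmin@00_mdw ~]$
```
(2) A local line may only be followed by an authentication method such as ident, not by 127.....trust:
```
[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep local |grep -v "#"
local all gpadmin ident
local replication gpadmin ident
#local all all 127.0.0.1/28 trust
[gpadmin@00_mdw ~]$
```

Of the two, the error message supports (2). A local record describes Unix-domain socket connections and takes no address, so the server read 127.0.0.1/28 as the authentication method and rejected it. Overlapping records as in (1) are allowed; the first matching record is the one used.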
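A check along these lines can catch such a line before the next restart. It is a minimal sketch under stated assumptions: a `local` record has the layout `local database user auth-method`, fields are separated by whitespace, and anything after `#` is a comment (quoted names containing spaces or `#` are not handled). `check_local_record` and `looks_like_address` are hypothetical helper names, not part of Greenplum or PostgreSQL.

```python
import ipaddress
from typing import Optional


def looks_like_address(field: str) -> bool:
    """Return True if the field parses as an IP address or CIDR network."""
    try:
        # strict=False accepts host bits in the network, e.g. 127.0.0.1/28.
        ipaddress.ip_network(field, strict=False)
        return True
    except ValueError:
        return False


def check_local_record(line: str) -> Optional[str]:
    """Return a complaint for a `local` record that carries an address, else None."""
    fields = line.split("#", 1)[0].split()
    if not fields or fields[0] != "local":
        return None  # blank, comment-only, or not a local record
    if len(fields) < 4:
        return "local record needs: local database user auth-method"
    if looks_like_address(fields[3]):
        return f"local record has an address ({fields[3]}) where the auth method belongs"
    return None
```

Running `check_local_record` over each line of pg_hba.conf would flag `local all all 127.0.0.1/28 trust` and leave the `host` record with the same address alone, since `host` records do take an address.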