早期手机平台上通常是在设备中增加一个硬件看门狗(WatchDog), 软件系统必须定时的向看门狗硬件中写值来表示自己没出故障(俗称“喂狗”), 否则超过了规定的时间看门狗就会重新启动设备. 大体原理是, 在系统运行以后启动了看门狗的计数器, 看门狗就开始自动计数,如果到了一定的时间还不去清看门狗,那么看门狗计数器就会溢出从而引起看门狗中断,造成系统复位。
而手机, 其实是一个超强超强的单片机, 其运行速度比单片机快N倍, 存储空间比单片机大N倍, 里面运行了若干个线程, 各种软硬件协同工作, Android 的 SystemServer 是一个非常复杂的进程,里面运行的服务超过五十种,是最可能出问题的进程,因此有必要对 SystemServer 中运行的各种线程实施监控。
但是如果使用硬件看门狗的工作方式,每个线程隔一段时间去喂狗,不但非常浪费CPU,而且会导致程序设计更加复杂。因此 Android 开发了 Watchdog 类作为软件看门狗来监控 SystemServer 中的线程。一旦发现问题,Watchdog 会杀死 SystemServer 进程。
Watchdog主要有两个作用
判断线程是否卡住的方法
MessageQueue.isPolling
Monitor.monitor
---
HandlerChecker 检查looper是否阻塞
monitor 检查是否死锁
Watchdog的工作机制 https://img-blog.csdnimg.cn/img_convert/e5c8133c7f86583251c775de4ceae9c0.jpeg
Watchdog 是在 SystemServer 进程中被初始化和启动的,在 SystemServer 的 run 方法中,各种Android 服务被注册和启动,其中也包括了Watchdog 的初始化和启动,代码如下:
final Watchdog watchdog = Watchdog.getInstance();//line: 864
watchdog.init(context, mActivityManagerService);
在 SystemServer 中 startOtherServices()
的后半段,在 AMS(ActivityManagerService) 的 SystemReady 接口的 CallBack 函数中实现 Watchdog 的启动:
Watchdog.getInstance().start();//line: 1852
super("watchdog");
//初始化每一个我们希望检查的线程
//这里没有检查后台线程
//共享的前台线程是主检查器, 还有分配其monitor检查其它线程
mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
"foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// 为主线程添加检查器
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
"main thread", DEFAULT_TIMEOUT));
// 为共享UI线程添加检查器
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
"ui thread", DEFAULT_TIMEOUT));
// 为共享IO线程添加检查器
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
"i/o thread", DEFAULT_TIMEOUT));
// 为共享display线程添加检查器.
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
"display thread", DEFAULT_TIMEOUT));
// 初始化检查器 binder线程.
addMonitor(new BinderThreadMonitor());
mOpenFdMonitor = OpenFdMonitor.create();
// See the notes on DEFAULT_TIMEOUT.
assert DB ||
DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
Watchdog的构造方法中创建了一些HandlerChecker对象, 并添加到自己的监听队列中.
线程名 | 对应handler | 说明 | Timeout |
---|---|---|---|
foreground thread | FgThread.getHandler() | 前台线程 | 60s |
main thread | new Handler(Looper.getMainLooper()) | 主线程 | 60s |
ui thread | UiThread.getHandler() | UI线程 | 60s |
i/o thread | IoThread.getHandler() | IO线程 | 60s |
display thread | DisplayThread.getHandler() | Display线程 | 60s |
PackageManager | addThread(mHandler, time) | PackageManagerService主动add的线程 | 10min |
PackageManager | addThread(mHandler, time) | PermissionManagerService主动add的线程 | 60s |
PowerManagerService | addThread(mHandler, time) | PowerManagerService主动add的线程 | 60s |
ActivityManagerService | addThread(mHandler, time) | ActivityManagerService主动add的线程 | 60s |
monitor程名 | 说明 | Timeout |
---|---|---|
BinderThreadMonitor | 检查Binder线程 | 60s |
OpenFdMonitor | 检查fd线程 | 60s |
TvRemoteService | addMonitor(this) mLock | |
ActivityManagerService | addMonitor(this) this | |
MediaProjectionManagerService | addMonitor(this) mLock | |
MediaRouterService | addMonitor(this) mLock | |
MediaSessionService | addMonitor(this) mLock | |
InputManagerService | addMonitor(this) mInputFilterLock nativeMonitor(mPtr); | |
PowerManagerService | addMonitor(this) mLock | |
NetworkManagementService | addMonitor(this) mConnector | |
StorageManagerService | addMonitor(this) mVold | |
WindowManagerService | addMonitor(this) mWindowMap |
public final class HandlerChecker implements Runnable
HandlerChecker用于检查句柄线程的状态和调度监视器回调, 其原理就是通过各个Handler的looper的MessageQueue来判断该线程是否卡住了。当然,该线程是运行在SystemServer进程中的线程。
Watchdog中会构建很多的HandlerChecker, 可以分为两类
两类HandlerChecker的侧重点不同
public final class HandlerChecker implements Runnable {
private final Handler mHandler;
private final String mName;
private final long mWaitMax;
private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
private boolean mCompleted;
private Monitor mCurrentMonitor;
private long mStartTime;
HandlerChecker(Handler handler, String name, long waitMaxMillis) {
mHandler = handler; //线程handler
mName = name; //名称
mWaitMax = waitMaxMillis; //等待超时时间
mCompleted = true; //线程状态
}
}
这个方法是在Watchdog中的run方法会调用, 是HandlerChecker的核心方法, 用来检查HandlerChecker是否发生了死锁.
public void scheduleCheckLocked() {
if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
// If the target looper has recently been polling, then
// there is no reason to enqueue our checker on it since that
// is as good as it not being deadlocked. This avoid having
// to do a context switch to check the thread. Note that we
// only do this if mCheckReboot is false and we have no
// monitors, since those would need to be executed at this point.
mCompleted = true;
return;
}
if (!mCompleted) {
// we already have a check in flight, so no need
return;
}
mCompleted = false;
mCurrentMonitor = null;
mStartTime = SystemClock.uptimeMillis();
mHandler.postAtFrontOfQueue(this);
}
在scheduleCheckLocked 中,其实主要是处理mMonitorChecker 的情况,对于其他的没有monitor 注册进来的且处于polling 状态的 HandlerChecker 是不去检查的,例如,UiThread,肯定一直处于polling 状态。
mHandler.getLooper().getQueue().isPolling()
这个方法可以判断当前线程是否被卡住.
true: 表示looper当前正在轮询事件,
这个方法的实现在MessageQueue中,可以看到上面的注释写到:返回当前的looper线程是否在polling工作来做,这个是个很好的用于检测loop是否存活的方法。
frameworks/base/core/java/android/os/MessageQueue.java
/**
* Returns whether this looper's thread is currently polling for more work to do.
* This is a good signal that the loop is still alive rather than being stuck
* handling a callback. Note that this method is intrinsically racy, since the
* state of the loop can change before you get the result back.
*
* <p>This method is safe to call from any thread.
*
* @return True if the looper is currently polling for events.
* @hide
*/
public boolean isPolling() {
synchronized (this) {
return isPollingLocked();
}
}
@Override
public void run() {
final int size = mMonitors.size();
for (int i = 0 ; i < size ; i++) {
synchronized (Watchdog.this) {
mCurrentMonitor = mMonitors.get(i);
}
mCurrentMonitor.monitor();
}
synchronized (Watchdog.this) {
mCompleted = true;
mCurrentMonitor = null;
}
}
public int getCompletionStateLocked() {
if (mCompleted) {
return COMPLETED;
} else {
long latency = SystemClock.uptimeMillis() - mStartTime;
if (latency < mWaitMax/2) {
return WAITING;
} else if (latency < mWaitMax) {
return WAITED_HALF;
}
}
return OVERDUE;
}
状态 | 描述 |
---|---|
COMPLETED | 对应消息已处理完毕线程无阻塞 |
WAITING | 对应消息处理花费0~29秒,继续运行 |
WAITED_HALF | 对应消息处理花费30~59秒,线程可能已经被阻塞,需要保存当前AMS堆栈状态, 继续监听 |
OVERDUE | 对应消息处理已经花费超过60, 准备 kill 当前进程. 能够走到这里,说明已经发生了超时60秒了。那么下面接下来全是应对超时的情况 |
这里的HandlerChecker使用的传入参数都是创建的HandlerThread线程的Handler
java.lang.Object
↳ Thread implements Runnable
↳ HandlerThread extends Thread
↳ ServiceThread extends HandlerThread
↳ FgThread extends ServiceThread
public ServiceThread(String name, int priority, boolean allowIo)
private FgThread() {
super("android.fg", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
}
private UiThread() {
super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
}
private IoThread() {
super("android.io", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
}
private DisplayThread() {
//DisplayThread运行重要的东西,但这些东西不如AnimationThread中运行的东西重要。
//因此,将优先级设置为较低的一个。
super("android.display", Process.THREAD_PRIORITY_DISPLAY + 1, false /*allowIo*/);
}
frameworks/base/core/java/android/os/Process.java
public static final int THREAD_PRIORITY_DEFAULT = 0; //默认的线程优先级
public static final int THREAD_PRIORITY_LOWEST = 19; //最低的线程级别
public static final int THREAD_PRIORITY_BACKGROUND = 10; //后台线程建议设置这个优先级
public static final int THREAD_PRIORITY_FOREGROUND = -2; //用户正在交互的UI线程,代码中无法设置该优先级,系统会按照情况调整到该优先级
public static final int THREAD_PRIORITY_DISPLAY = -4; //也是与UI交互相关的优先级界别,但是要比THREAD_PRIORITY_FOREGROUND优先
public static final int THREAD_PRIORITY_URGENT_DISPLAY = -8; //显示线程的最高级别,用来处理绘制画面和检索输入事件
public static final int THREAD_PRIORITY_AUDIO = -16; //声音线程的标准级别
public static final int THREAD_PRIORITY_URGENT_AUDIO = -19; //声音线程的最高级别,优先程度较THREAD_PRIORITY_AUDIO要高。
public static final int THREAD_PRIORITY_MORE_FAVORABLE = -1; //相对THREAD_PRIORITY_DEFAULT稍微优先
public static final int THREAD_PRIORITY_LESS_FAVORABLE = 1; // 相对THREAD_PRIORITY_DEFAULT稍微落后一些
应用设置线程优先级的方法如下, 但是有一些级别是不允许应用设置的, 是由系统进行分配的.
Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND +
Process.THREAD_PRIORITY_LESS_FAVORABLE)
public String describeBlockedStateLocked() {
if (mCurrentMonitor == null) {
return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
} else {
return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
+ " on " + mName + " (" + getThread().getName() + ")";
}
}
打印Monitor信息
Monitor是一个接口, 用来
public interface Monitor {
void monitor();
}
ActivityManagerService
WindowManagerService
PowerManagerService
InputManagerService
MediaSessionService
MediaRouterService
StorageManagerService
NetworkManagementService
NativeDaemonConnector
MediaProjectionManagerService
TvRemoteService
BinderThreadMonitor
OpenFdMonitor
Monitor是一个接口,实现这个接口的类有好几个。比如:如下是android9.0搜出来的结果
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-QpJfi2aa-1666612570217)(/home/jun/Desktop/Plane3/CoreSystemServer/watchdog/WatchdogImplClass.png)]
这么多的类实现了该接口, 他们都注册到了Watchdog中, 如AMS中
public class ActivityManagerService extends IActivityManager.Stub
implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {
......
public ActivityManagerService(Context systemContext) {
......
Watchdog.getInstance().addMonitor(this);
Watchdog.getInstance().addThread(mHandler);
......
}
......
/** In this method we try to acquire our lock to make sure that we have not deadlocked */
public void monitor() {
synchronized (this) { }
}
......
}
public void addThread(Handler thread) {
addThread(thread, DEFAULT_TIMEOUT); //60s
}
public void addThread(Handler thread, long timeoutMillis) {
synchronized (this) {
if (isAlive()) {
throw new RuntimeException("Threads can't be added once the Watchdog is running");
}
final String name = thread.getLooper().getThread().getName();
mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
}
}
public void addMonitor(Monitor monitor) {
synchronized (this) {
if (isAlive()) {
throw new RuntimeException("Monitors can't be added once the Watchdog is running");
}
mMonitorChecker.addMonitor(monitor);
}
}
Watchdog的核心方法, 检查线程死锁, looper阻塞, 收集信息和kill掉system_server进程, 重启
@Override
public void run() {
boolean waitedHalf = false;
while (true) {
final List<HandlerChecker> blockedCheckers;
final String subject;
final boolean allowRestart;
int debuggerWasConnected = 0;
synchronized (this) {
long timeout = CHECK_INTERVAL;
// Make sure we (re)spin the checkers that have become idle within
// this wait-and-check interval
for (int i=0; i<mHandlerCheckers.size(); i++) {
//调用每个HandlerChecker的scheduleCheckLocked() 方法
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();
}
if (debuggerWasConnected > 0) {
debuggerWasConnected--;
}
// NOTE: We use uptimeMillis() here because we do not want to increment the time we
// wait while asleep. If the device is asleep then the thing that we are waiting
// to timeout on is asleep as well and won't have a chance to run, causing a false
// positive on when to kill things.
long start = SystemClock.uptimeMillis();
while (timeout > 0) {
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
try {
wait(timeout);
} catch (InterruptedException e) {
Log.wtf(TAG, e);
}
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
}
boolean fdLimitTriggered = false;
if (mOpenFdMonitor != null) {
fdLimitTriggered = mOpenFdMonitor.monitor();
}
if (!fdLimitTriggered) {
final int waitState = evaluateCheckerCompletionLocked();
if (waitState == COMPLETED) { //线程状态正常,重新轮询
// The monitors have returned; reset
waitedHalf = false;
continue;
} else if (waitState == WAITING) {//处于阻塞状态,但监测时间小于30s,继续监测
// still waiting but within their configured intervals; back off and recheck
continue;
} else if (waitState == WAITED_HALF) {//处于阻塞状态,监测时间已经超过30s,开始dump一些系统信息,然后继续监测30s
if (!waitedHalf) {
// We've waited half the deadlock-detection interval. Pull a stack
// trace and wait another half.
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
ActivityManagerService.dumpStackTraces(true, pids, null, null,
getInterestingNativePids());
waitedHalf = true;
}
continue;
}
// something is overdue!
blockedCheckers = getBlockedCheckersLocked();
subject = describeCheckersLocked(blockedCheckers);
} else {
blockedCheckers = Collections.emptyList();
subject = "Open FD high water mark reached";
}
allowRestart = mAllowRestart;
}
// If we got here, that means that the system is most likely hung.
// First collect stack traces from all threads of the system process.
// Then kill this process so that the system will restart.
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
final File stack = ActivityManagerService.dumpStackTraces(
!waitedHalf, pids, null, null, getInterestingNativePids());
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked. (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
subject, null, stack, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
IActivityController controller;
synchronized (this) {
controller = mController;
}
if (controller != null) {
Slog.i(TAG, "Reporting stuck state to activity controller");
try {
Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
// 1 = keep waiting, -1 = kill system
int res = controller.systemNotResponding(subject);
if (res >= 0) {
Slog.i(TAG, "Activity controller requested to coninue to wait");
waitedHalf = false;
continue;
}
} catch (RemoteException e) {
}
}
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
}
waitedHalf = false;
}
}
run() 方法就是死循环, 不断的去遍历所有HandlerChecker,并调其监控方法,等待三十秒,评估状态。
遍历所有的HandlerChecker, 并调用其scheduleCheckLocked方法, 记录开始时间
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();
}
等待 30 秒
// 等待30秒
//使用uptimeMills是为了不把手机睡眠时间算进入,手机睡眠时系统服务同样睡眠
long start = SystemClock.uptimeMillis();
while (timeout > 0) {
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
try {
wait(timeout);
} catch (InterruptedException e) {
Log.wtf(TAG, e);
}
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
}
评估Checker的状态,里面会遍历所有的HandlerChecker,并获取最大的返回值。
最大的返回值有四种情况:
boolean fdLimitTriggered = false;
if (mOpenFdMonitor != null) {
fdLimitTriggered = mOpenFdMonitor.monitor();
}
if (!fdLimitTriggered) {
final int waitState = evaluateCheckerCompletionLocked();
if (waitState == COMPLETED) {
// The monitors have returned; reset
waitedHalf = false;
continue;
} else if (waitState == WAITING) {
// still waiting but within their configured intervals; back off and recheck
continue;
} else if (waitState == WAITED_HALF) {
if (!waitedHalf) {
// We've waited half the deadlock-detection interval. Pull a stack
// trace and wait another half.
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
ActivityManagerService.dumpStackTraces(true, pids, null, null,
getInterestingNativePids());
waitedHalf = true;
}
continue;
}
// something is overdue!
blockedCheckers = getBlockedCheckersLocked();
subject = describeCheckersLocked(blockedCheckers);
} else {
blockedCheckers = Collections.emptyList();
subject = "Open FD high water mark reached";
}
fdMonitor
public boolean monitor() {
if (mFdHighWaterMark.exists()) {
dumpOpenDescriptors();
return true;
}
return false;
}
收集信息
杀死系统进程
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
评估Checker的状态,里面会遍历所有的HandlerChecker,并获取最大的返回值。
private int evaluateCheckerCompletionLocked() {
int state = COMPLETED;// COMPLETED = 0
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
state = Math.max(state, hc.getCompletionStateLocked());
}
return state;
}
private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
if (hc.isOverdueLocked()) {
checkers.add(hc);
}
}
return checkers;
}
private String describeCheckersLocked(List<HandlerChecker> checkers) {
StringBuilder builder = new StringBuilder(128);
for (int i=0; i<checkers.size(); i++) {
if (builder.length() > 0) {
builder.append(", ");
}
builder.append(checkers.get(i).describeBlockedStateLocked());
}
return builder.toString();
}
通过 monitor() 方法检查死锁针对不同线程之间的,而服务主线程是否阻塞是针对主线程,所以通过 sendMessage() 方式是只能检测主线程是否阻塞,而不能检测是否死锁,因为如果服务主线程和另外一个线程发生死锁(如另外一个线程synchronized 关键字长时间持有某个锁,不释放),此时向主线程发送 Message,主线程的Handler是可以继续处理的。
常见Log有下面两种,一种是Blocked in handler 、另外一种是: Blocked in monitor
11-15 06:56:39.696 24203 24902 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui)
11-15 06:56:39.696 24203 24902 W Watchdog: main thread stack trace:
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.nativePollOnce(Native Method)
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.next(MessageQueue.java:323)
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.Looper.loop(Looper.java:142)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.run(SystemServer.java:377)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.main(SystemServer.java:239)
11-15 06:56:39.696 24203 24902 W Watchdog: at java.lang.reflect.Method.invoke(Native Method)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:901)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:791)
11-15 06:56:39.696 24203 24902 W Watchdog: ui thread stack trace:
......
10-26 00:07:00.884 1000 17132 17312 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)
10-26 00:07:00.884 1000 17132 17312 W Watchdog: foreground thread stack trace:
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Binder.blockUntilThreadAvailable(Native Method)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$BinderThreadMonitor.monitor(Watchdog.java:381)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:353)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Handler.handleCallback(Handler.java:873)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Looper.loop(Looper.java:193)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:65)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: *** GOODBYE!
Android SystemServer 中 WatchDog 机制介绍
Watchdog识别到SystemServer线程死锁后, 会收集打印信息, 代码在run函数中
while (true) {
//如果发生了死锁或者消息队列阻塞就会走到下面
// If we got here, that means that the system is most likely hung.
// First collect stack traces from all threads of the system process.
// Then kill this process so that the system will restart.
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
final File stack = ActivityManagerService.dumpStackTraces(
!waitedHalf, pids, null, null, getInterestingNativePids());
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked. (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
subject, null, stack, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
IActivityController controller;
synchronized (this) {
controller = mController;
}
if (controller != null) {
Slog.i(TAG, "Reporting stuck state to activity controller");
try {
Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
// 1 = keep waiting, -1 = kill system
int res = controller.systemNotResponding(subject);
if (res >= 0) {
Slog.i(TAG, "Activity controller requested to coninue to wait");
waitedHalf = false;
continue;
}
} catch (RemoteException e) {
}
}
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
}
waitedHalf = false;
}
输出event log
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
dump 堆栈信息
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
final File stack = ActivityManagerService.dumpStackTraces(
!waitedHalf, pids, null, null, getInterestingNativePids());
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
dump kerner info
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
收集dropbox信息
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked. (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
subject, null, stack, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
kill 掉系统进程, 如果不在debug模式, 就kill掉自己
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
}
指的是 /data/anr
final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");