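Background: our project uses BoringSSL. A user (on a Meizu phone) reported a bug: the app crashes when it uses the arm64 library, but not when it uses the armv7 library.
I went through the logs and traced the crash to this code: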
https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/rand/urandom.c#171
// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());

#if defined(USE_NR_getrandom)
  int have_getrandom;
  uint8_t dummy;
  ssize_t getrandom_ret =
      boringssl_getrandom(&dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *getrandom_ready_bss_get() = 1;
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    // We have getrandom, but the entropy pool has not been initialized yet.
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == ENOSYS) {
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
  } else {
    // NOTICE: this fatal error will crash the app.
    // Other errors are fatal.
    perror("getrandom");
    abort();
  }
  ...
}
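This code is part of how BoringSSL obtains "true" random numbers: it probes whether the getrandom system call is usable.
Judging from the logic, the call ret = syscall(__NR_getrandom, buf, buf_len, flags); failed with an error other than ENOSYS or EAGAIN.
Had the error been ENOSYS, the program would have fallen back to reading random bytes from /dev/urandom.
BoringSSL's handling here is blunt: any unexpected error from __NR_getrandom terminates the process. Note that the value of __NR_getrandom differs between armv7 and arm64: 384 and 278 respectively.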
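To find out which error the device actually returns, a small probe along these lines could be run on the affected phone. This is only a sketch: probe_getrandom_errno is a hypothetical helper, not a BoringSSL function, and it assumes a Linux or Android toolchain whose headers may or may not define __NR_getrandom.

#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001  // Value from the Linux UAPI header linux/random.h.
#endif

// Hypothetical helper: issues the same one-byte, non-blocking getrandom
// probe that init_once performs. Returns 0 if the call succeeded, otherwise
// the errno value it failed with (for example ENOSYS or EAGAIN).
static int probe_getrandom_errno(void) {
#if defined(__NR_getrandom)
  uint8_t dummy;
  long ret;
  do {
    ret = syscall(__NR_getrandom, &dummy, sizeof(dummy), GRND_NONBLOCK);
  } while (ret == -1 && errno == EINTR);  // Retry if interrupted by a signal.
  if (ret == 1) {
    return 0;
  }
  // A return value other than 1 or -1 is unexpected; report it as EIO.
  return ret == -1 ? errno : EIO;
#else
  // The build headers predate getrandom, so treat it as unavailable.
  return ENOSYS;
#endif
}

Any result other than 0, EAGAIN, or ENOSYS is a case that takes the abort() branch shown above. Passing the value to strerror gives a readable message for the log.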
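Most likely the system calls on the user's phone have been modified, producing a situation BoringSSL did not anticipate. I checked other versions of BoringSSL: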
https://boringssl.googlesource.com/boringssl/+/refs/heads/3112/crypto/fipsmodule/rand/urandom.c
https://boringssl.googlesource.com/boringssl/+/refs/heads/3538/crypto/fipsmodule/rand/urandom.c
// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());

#if defined(USE_NR_getrandom)
  uint8_t dummy;
  long getrandom_ret =
      syscall(__NR_getrandom, &dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *urandom_fd_bss_get() = kHaveGetrandom;
    return;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    fprintf(
        stderr,
        "getrandom indicates that the entropy pool has not been initialized. "
        "Rather than continue with poor entropy, this process will block until "
        "entropy is available.\n");
    do {
      getrandom_ret =
          syscall(__NR_getrandom, &dummy, sizeof(dummy), 0 /* no flags */);
    } while (getrandom_ret == -1 && errno == EINTR);
    if (getrandom_ret == 1) {
      *urandom_fd_bss_get() = kHaveGetrandom;
      return;
    }
  }
#endif  // USE_NR_getrandom

  if (fd == kUnset) {
    do {
      fd = open("/dev/urandom", O_RDONLY);
    } while (fd == -1 && errno == EINTR);
  }
  if (fd < 0) {
    perror("failed to open /dev/urandom");
    abort();
  }
  ...
https://boringssl.googlesource.com/boringssl/+/refs/heads/3945/crypto/fipsmodule/rand/urandom.c
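From 3112 through 3538, the implementation simply falls back to /dev/urandom whenever getrandom does not succeed. From 3945 through the latest version, it deliberately aborts instead. In the end, this is how I changed it: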
// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());

#if defined(USE_NR_getrandom)
  int have_getrandom;
  uint8_t dummy;
  ssize_t getrandom_ret =
      boringssl_getrandom(&dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *getrandom_ready_bss_get() = 1;
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    // We have getrandom, but the entropy pool has not been initialized yet.
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == ENOSYS) {
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
  } else {
    // Modified by Yeshen at 05/19/2020
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
    // Modified by Yeshen at 05/19/2020
  }
  ...
}
A fallback mechanism already exists and the program can still generate random numbers, so there is really no need to kill the process here.
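The same idea can be sketched outside BoringSSL as a standalone routine that tries getrandom first and reads /dev/urandom on any failure. The names fill_random and read_urandom are hypothetical, and this is a simplified illustration under those assumptions rather than BoringSSL's actual code path.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: fills |buf| with |len| bytes read from /dev/urandom.
// Returns 0 on success and -1 on failure.
static int read_urandom(uint8_t *buf, size_t len) {
  int fd;
  do {
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
  } while (fd == -1 && errno == EINTR);
  if (fd < 0) {
    return -1;
  }
  while (len > 0) {
    ssize_t n = read(fd, buf, len);
    if (n == -1 && errno == EINTR) {
      continue;  // Interrupted by a signal; retry the read.
    }
    if (n <= 0) {
      close(fd);
      return -1;
    }
    buf += n;
    len -= (size_t)n;
  }
  close(fd);
  return 0;
}

// Hypothetical helper: tries the getrandom syscall first. On any error
// other than EINTR (ENOSYS, EPERM, or anything else), it falls back to
// /dev/urandom instead of aborting. Returns 0 on success, -1 on failure.
static int fill_random(uint8_t *buf, size_t len) {
#if defined(__NR_getrandom)
  uint8_t *p = buf;
  size_t left = len;
  while (left > 0) {
    // No flags: this blocks until the entropy pool has been initialized.
    long n = syscall(__NR_getrandom, p, left, 0);
    if (n == -1 && errno == EINTR) {
      continue;
    }
    if (n <= 0) {
      break;  // Unexpected failure: give up on getrandom.
    }
    p += n;
    left -= (size_t)n;
  }
  if (left == 0) {
    return 0;
  }
#endif
  // Refill the whole buffer from /dev/urandom.
  return read_urandom(buf, len);
}

A caller would write something like uint8_t key[32]; and treat a nonzero return from fill_random(key, sizeof(key)) as the only fatal case, since at that point both sources have failed.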