Notes on a BoringSSL crash

许沛
2023-12-01

Background: our project uses BoringSSL. A user on a Meizu phone reported a bug. The app crashed with the arm64 build of the library, but not with the armv7 build.

From the logs, I traced the crash to this code:

https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/rand/urandom.c#171

// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());
#if defined(USE_NR_getrandom)
  int have_getrandom;
  uint8_t dummy;
  ssize_t getrandom_ret =
      boringssl_getrandom(&dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *getrandom_ready_bss_get() = 1;
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    // We have getrandom, but the entropy pool has not been initialized yet.
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == ENOSYS) {
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
  } else {
    // NOTICE: this fatal error path will crash the app.
    // Other errors are fatal.
    perror("getrandom");
    abort();
  }

  ...
}

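The purpose of this code is to obtain "true" random numbers. It probes the getrandom syscall with a one-byte, non-blocking (GRND_NONBLOCK) request to decide whether getrandom can be used.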

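From the code logic, the call ret = syscall(__NR_getrandom, buf, buf_len, flags); (made inside boringssl_getrandom) must have failed with an error other than ENOSYS or EAGAIN.
If the error had been ENOSYS, the program would have fallen back to reading random numbers from /dev/urandom.
BoringSSL handles this case bluntly: any unexpected error from __NR_getrandom terminates the process. The value of __NR_getrandom also differs between the two architectures. It is 384 on armv7 and 278 on arm64.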
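To see what a given device actually returns, the probe can be reproduced outside BoringSSL. The sketch below is my own, not BoringSSL code. `classify_probe`, `probe_getrandom` and the `probe_result` values are hypothetical names, and it assumes a Linux/Android build where <sys/syscall.h> defines __NR_getrandom. It issues the same one-byte GRND_NONBLOCK call as init_once and sorts the result into the same four cases.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 1 /* value from <linux/random.h> */
#endif

/* Hypothetical names, mirroring the branches of init_once above. */
enum probe_result {
  PROBE_READY,        /* returned 1 byte: getrandom works and is seeded */
  PROBE_NOT_READY,    /* EAGAIN: getrandom exists, entropy pool not initialized */
  PROBE_NO_GETRANDOM, /* ENOSYS: caller should fall back to /dev/urandom */
  PROBE_UNEXPECTED    /* anything else: upstream BoringSSL calls abort() here */
};

/* Pure decision table: maps (return value, errno) to one of the cases. */
static enum probe_result classify_probe(long ret, int err) {
  if (ret == 1) return PROBE_READY;
  if (ret == -1 && err == EAGAIN) return PROBE_NOT_READY;
  if (ret == -1 && err == ENOSYS) return PROBE_NO_GETRANDOM;
  return PROBE_UNEXPECTED;
}

/* Issues the same probe as init_once; stores errno (or 0) in *err_out. */
static enum probe_result probe_getrandom(int *err_out) {
#if defined(__NR_getrandom)
  uint8_t dummy;
  long ret = syscall(__NR_getrandom, &dummy, sizeof(dummy), GRND_NONBLOCK);
  *err_out = (ret == -1) ? errno : 0;
  return classify_probe(ret, *err_out);
#else
  *err_out = ENOSYS; /* the headers do not know this syscall at all */
  return PROBE_NO_GETRANDOM;
#endif
}
```

On an affected device, logging the numeric value of __NR_getrandom (278 on arm64, 384 on armv7) together with the errno stored for the PROBE_UNEXPECTED case would show exactly which error the modified system returns.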

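Most likely the system call had been modified on the user's phone, which produced a case the code did not anticipate. I then checked other versions of BoringSSL: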

https://boringssl.googlesource.com/boringssl/+/refs/heads/3112/crypto/fipsmodule/rand/urandom.c
https://boringssl.googlesource.com/boringssl/+/refs/heads/3538/crypto/fipsmodule/rand/urandom.c

// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());
#if defined(USE_NR_getrandom)
  uint8_t dummy;
  long getrandom_ret =
      syscall(__NR_getrandom, &dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *urandom_fd_bss_get() = kHaveGetrandom;
    return;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    fprintf(
        stderr,
        "getrandom indicates that the entropy pool has not been initialized. "
        "Rather than continue with poor entropy, this process will block until "
        "entropy is available.\n");
    do {
      getrandom_ret =
          syscall(__NR_getrandom, &dummy, sizeof(dummy), 0 /* no flags */);
    } while (getrandom_ret == -1 && errno == EINTR);
    if (getrandom_ret == 1) {
      *urandom_fd_bss_get() = kHaveGetrandom;
      return;
    }
  }
#endif  // USE_NR_getrandom
  if (fd == kUnset) {
    do {
      fd = open("/dev/urandom", O_RDONLY);
    } while (fd == -1 && errno == EINTR);
  }
  if (fd < 0) {
    perror("failed to open /dev/urandom");
    abort();
  }
...

https://boringssl.googlesource.com/boringssl/+/refs/heads/3945/crypto/fipsmodule/rand/urandom.c

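In branches 3112 through 3538, if getrandom cannot be used, the code simply falls back to /dev/urandom. From 3945 up to the latest version, however, the code deliberately aborts the process on an unexpected error. In the end, I changed it like this: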

// init_once initializes the state of this module to values previously
// requested. This is the only function that modifies |urandom_fd| and
// |urandom_buffering|, whose values may be read safely after calling the
// once.
static void init_once(void) {
  CRYPTO_STATIC_MUTEX_lock_read(rand_lock_bss_get());
  int fd = *urandom_fd_requested_bss_get();
  CRYPTO_STATIC_MUTEX_unlock_read(rand_lock_bss_get());
#if defined(USE_NR_getrandom)
  int have_getrandom;
  uint8_t dummy;
  ssize_t getrandom_ret =
      boringssl_getrandom(&dummy, sizeof(dummy), GRND_NONBLOCK);
  if (getrandom_ret == 1) {
    *getrandom_ready_bss_get() = 1;
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == EAGAIN) {
    // We have getrandom, but the entropy pool has not been initialized yet.
    have_getrandom = 1;
  } else if (getrandom_ret == -1 && errno == ENOSYS) {
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
  } else {
    // Modified by Yeshen at 05/19/2020
    // Fallthrough to using /dev/urandom, below.
    have_getrandom = 0;
    // Modified by Yeshen at 05/19/2020
  }

  ...
}

A fallback mechanism already exists, so the program can still generate random numbers. There is really no need to terminate the process here.
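The same "fall back instead of abort" policy can also be written as a standalone function. This is a minimal sketch under my own assumptions, not the patch above and not BoringSSL code. `fill_random` and `read_dev_urandom` are hypothetical helpers, and it assumes Linux/Android. It reads with a blocking getrandom call (flags 0, which waits until the entropy pool is initialized) and retries on EINTR. On any other error, it reads the remaining bytes from /dev/urandom instead of calling abort().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fills buf from /dev/urandom; 0 on success, -1 on failure. */
static int read_dev_urandom(uint8_t *buf, size_t len) {
  int fd;
  do {
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
  } while (fd == -1 && errno == EINTR);
  if (fd < 0) return -1;
  while (len > 0) {
    ssize_t n = read(fd, buf, len);
    if (n == -1 && errno == EINTR) continue; /* interrupted: retry */
    if (n <= 0) {                            /* real error or EOF */
      close(fd);
      return -1;
    }
    buf += n;
    len -= (size_t)n;
  }
  close(fd);
  return 0;
}

/* Hypothetical helper: prefers getrandom, falls back on any unexpected error. */
static int fill_random(uint8_t *buf, size_t len) {
#if defined(__NR_getrandom)
  size_t done = 0;
  while (done < len) {
    long n = syscall(__NR_getrandom, buf + done, len - done, 0 /* blocking */);
    if (n == -1 && errno == EINTR) continue; /* interrupted: retry */
    if (n <= 0) break; /* ENOSYS or any other error: stop using getrandom */
    done += (size_t)n;
  }
  if (done == len) return 0;
  /* Read the remaining bytes from /dev/urandom instead of calling abort(). */
  return read_dev_urandom(buf + done, len - done);
#else
  return read_dev_urandom(buf, len);
#endif
}
```

A caller only has to check the return value, for example `uint8_t key[32]; if (fill_random(key, sizeof(key)) != 0) { /* both sources failed */ }`. The process is left to the application to terminate, and only when neither source works.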
