While working through the Naive Bayes chapter of Machine Learning in Action recently, I ran into the following problem.
```python
from numpy import zeros, log

def train_native_bayes(train_matrix, train_category):
    num_train_docs = len(train_matrix)
    num_words = len(train_matrix[0])
    # Prior probability of class 1
    p = sum(train_category) / float(num_train_docs)
    # Per-word counts and total word counts for each class
    p_0_num = zeros(num_words)
    p_1_num = zeros(num_words)
    p_0_denom = 0.0
    p_1_denom = 0.0
    for i in range(num_train_docs):
        if train_category[i] == 1:
            p_1_num += train_matrix[i]
            p_1_denom += sum(train_matrix[i])
        else:
            p_0_num += train_matrix[i]
            p_0_denom += sum(train_matrix[i])
    # Log conditional probabilities; a word never seen in a class gives log(0) = -inf
    p_1_vector = log(p_1_num / p_1_denom)
    p_0_vector = log(p_0_num / p_0_denom)
    return p_0_vector, p_1_vector, p
```
Running it then produced the following warnings:
```
F:\PycharmProject\bayes_practice_1.py:74: RuntimeWarning: divide by zero encountered in log
  p_1_vector=log(p_1_num/p_1_denom)
F:\PycharmProject\bayes_practice_1.py:75: RuntimeWarning: divide by zero encountered in log
  p_0_vector=log(p_0_num/p_0_denom)
F:\PycharmProject\bayes_practice_1.py:84: RuntimeWarning: invalid value encountered in multiply
  p_1 = sum(need_to_classify_vector * p_1_vector) + log(p_class) #element-wise mult
F:\PycharmProject\bayes_practice_1.py:85: RuntimeWarning: invalid value encountered in multiply
  p_0 = sum(need_to_classify_vector * p_0_vector) + log(1.0 - p_class)
```
Although this did not affect the final result, the warnings are unpleasant to look at.
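Tracing the cause, the problem is not overflow. Some words never appear in the documents of one class, so their entry in p_0_num or p_1_num is 0, and log(0) evaluates to -inf (the "divide by zero encountered in log" warning). When the classifier later multiplies a document vector by these log-probabilities, every position where the document has a 0 and the vector has -inf gives 0 * -inf = nan (the "invalid value encountered in multiply" warning), and that nan carries through the subsequent sum.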
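The mechanism can be reproduced in isolation. This is a minimal sketch, assuming a made-up three-word vocabulary in which the second word never occurs in the class. `word_counts` and `doc_vector` are hypothetical stand-ins for `p_0_num` and `need_to_classify_vector`. Running it should emit the same two kinds of RuntimeWarning shown above.

```python
import numpy as np

# Hypothetical per-class word counts: the second word never appears in this class.
word_counts = np.array([2.0, 0.0, 1.0])
total_words = word_counts.sum()

# log(0) -> -inf, NumPy warns "divide by zero encountered in log"
log_probs = np.log(word_counts / total_words)

# Hypothetical document vector with a 0 where log_probs is -inf:
# 0 * -inf -> nan, NumPy warns "invalid value encountered in multiply"
doc_vector = np.array([1.0, 0.0, 1.0])
products = doc_vector * log_probs

# nan propagates through the sum, so the class score itself becomes nan
score = products.sum()

print(log_probs)
print(products)
print(score)
```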
For example, let's print the trained vector:
```python
train_mat = []
for i in dataset:
    train_mat.append(set_of_words_vector(my_vacab_set, i))
p_0_vector, p_1_vector, p = train_native_bayes(train_mat, class_vector)
print(p_0_vector)
```
The output is:
```
[-3.17805383 -3.17805383 -3.17805383 -inf -3.17805383 -2.48490665
 -3.17805383 -3.17805383 -inf -3.17805383 -3.17805383 -3.17805383
 -inf -inf -inf -inf -3.17805383 -inf
 -3.17805383 -3.17805383 -inf -inf -3.17805383 -2.07944154
 -3.17805383 -3.17805383 -inf -3.17805383 -3.17805383 -inf
 -3.17805383 -3.17805383]
```
Whenever a word's estimated probability is exactly 0, taking the logarithm yields negative infinity.
To avoid this, we add a small constant, 1e-5, inside the logarithm:
```python
p_1_vector = log(p_1_num / p_1_denom + 1e-5)
p_0_vector = log(p_0_num / p_0_denom + 1e-5)
```
With this change the warnings no longer appear and the result contains no -inf values:
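A small sketch of the same fix on hypothetical counts shows where the numbers in the output come from. A word never seen in the class now maps to log(1e-5) ≈ -11.5129, which matches the -11.51292546 entries. Seen words shift only slightly: for example, log(1/24) ≈ -3.17805 becomes log(1/24 + 1e-5) ≈ -3.17781. The 1/24 reading is an assumption inferred from the printed values, namely that the class-0 denominator was 24 words.

```python
import numpy as np

EPS = 1e-5  # the small constant added inside the log in the fix

# Hypothetical per-class word counts; the second word never occurs in this class.
word_counts = np.array([2.0, 0.0, 1.0])
log_probs = np.log(word_counts / word_counts.sum() + EPS)
print(log_probs)  # every entry is finite, no RuntimeWarning

# Where the values in the printed vector come from:
print(np.log(EPS))                           # a word never seen in the class
print(np.log(1 / 24), np.log(1 / 24 + EPS))  # a word seen once, before vs after the fix
```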
```
[ -3.17781386 -3.17781386 -3.17781386 -3.17781386 -3.17781386
  -3.17781386 -3.17781386 -3.17781386 -2.07936154 -3.17781386
 -11.51292546 -3.17781386 -11.51292546 -11.51292546 -3.17781386
  -3.17781386 -11.51292546 -3.17781386 -3.17781386 -11.51292546
 -11.51292546 -11.51292546 -11.51292546 -11.51292546 -3.17781386
  -3.17781386 -11.51292546 -2.48478666 -3.17781386 -3.17781386
 -11.51292546 -3.17781386]
```
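Adding a constant inside the log removes the warnings, but the per-class probabilities then sum to slightly more than 1, and 1e-5 is an arbitrary choice. A common alternative is additive (Laplace) smoothing, shown here as a substitute technique rather than the code above. It gives every word a pseudo-count alpha and adds alpha times the vocabulary size to each denominator. This is a minimal sketch, assuming the same input shapes as `train_native_bayes`; the function name `train_naive_bayes_smoothed` and the toy data are hypothetical. A frequently seen simpler variant starts the counts at 1 and the denominators at 2.0. That variant also removes the zeros, but its per-class probabilities no longer sum exactly to 1.

```python
import numpy as np

def train_naive_bayes_smoothed(train_matrix, train_category, alpha=1.0):
    # Hypothetical smoothed variant of train_native_bayes (not the original code).
    train_matrix = np.asarray(train_matrix, dtype=float)
    train_category = np.asarray(train_category)
    num_docs, num_words = train_matrix.shape
    p = train_category.sum() / float(num_docs)  # prior P(class = 1)

    # Every word starts with a pseudo-count of alpha, so no probability is exactly 0.
    p_0_num = np.full(num_words, alpha)
    p_1_num = np.full(num_words, alpha)
    # alpha * vocabulary size in the denominators keeps each class's probabilities summing to 1.
    p_0_denom = alpha * num_words
    p_1_denom = alpha * num_words

    for doc, label in zip(train_matrix, train_category):
        if label == 1:
            p_1_num += doc
            p_1_denom += doc.sum()
        else:
            p_0_num += doc
            p_0_denom += doc.sum()

    return np.log(p_0_num / p_0_denom), np.log(p_1_num / p_1_denom), p

# Toy data (hypothetical): 3 documents over a 4-word vocabulary.
train_mat = [[1, 1, 0, 0],
             [0, 0, 1, 1],
             [1, 0, 0, 0]]
labels = [0, 1, 0]
p_0_vector, p_1_vector, p = train_naive_bayes_smoothed(train_mat, labels)
print(p_0_vector)
print(p_1_vector)
print(p)
```

With alpha=1.0, class 0 ends up with counts [3, 2, 1, 1] over a denominator of 7. Every log-probability is therefore finite, and no epsilon is needed.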