Keras accuracy not changing

窦宏旷
2023-03-14
Question

I have thousands of audio files and I want to classify them with Keras and Theano. So far I have generated a 28x28 spectrogram for each audio file (it could probably be bigger, but for now I just want to get the algorithm working) and read the images into a matrix. In the end I get one big image matrix to feed into the network for image classification.

In a tutorial I found this MNIST classification code:

import numpy as np

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense
from keras.utils import np_utils

batch_size = 128
nb_classes = 10
nb_epochs = 2

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255

print(X_train.shape[0], "train samples")
print(X_test.shape[0], "test samples")

y_train = np_utils.to_categorical(y_train, nb_classes)
y_test =  np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Dense(output_dim = 100, input_dim = 784, activation= "relu"))
model.add(Dense(output_dim = 200, activation = "relu"))
model.add(Dense(output_dim = 200, activation = "relu"))
model.add(Dense(output_dim = nb_classes, activation = "softmax"))

model.compile(optimizer = "adam", loss = "categorical_crossentropy")

model.fit(X_train, y_train, batch_size = batch_size, nb_epoch = nb_epochs, show_accuracy = True, verbose = 2, validation_data = (X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy = True, verbose = 0)
print("Test score: ", score[0])
print("Test accuracy: ", score[1])

This code runs and I get the expected result:

(60000L, 'train samples')
(10000L, 'test samples')
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
2s - loss: 0.2988 - acc: 0.9131 - val_loss: 0.1314 - val_acc: 0.9607
Epoch 2/2
2s - loss: 0.1144 - acc: 0.9651 - val_loss: 0.0995 - val_acc: 0.9673
('Test score: ', 0.099454972004890438)
('Test accuracy: ', 0.96730000000000005)

So far, so good. However, when I apply the algorithm above to my own dataset, the accuracy gets stuck.

My code is as follows:

import os

import pandas as pd

from sklearn.cross_validation import train_test_split

from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.core import Dense, Activation, Dropout, Flatten
from keras.utils import np_utils

import AudioProcessing as ap
import ImageTools as it

batch_size = 128
nb_classes = 2
nb_epoch = 10


for i in range(20):
    print "\n"
# Generate spectrograms if necessary
if(len(os.listdir("./AudioNormalPathalogicClassification/Image")) > 0):
    print "Audio files are already processed. Skipping..."
else:
    print "Generating spectrograms for the audio files..."
    ap.audio_2_image("./AudioNormalPathalogicClassification/Audio/","./AudioNormalPathalogicClassification/Image/",".wav",".png",(28,28))

# Read the result csv
df = pd.read_csv('./AudioNormalPathalogicClassification/Result/result.csv', header = None)

df.columns = ["RegionName","IsNormal"]

bool_mapping = {True : 1, False : 0}

nb_classes = 2

for col in df:
    if(col == "RegionName"):
        a = 3      
    else:
        df[col] = df[col].map(bool_mapping)

y = df.iloc[:,1:].values

y = np_utils.to_categorical(y, nb_classes)

# Load images into memory
print "Loading images into memory..."
X = it.load_images("./AudioNormalPathalogicClassification/Image/",".png")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255

print("X_train shape: " + str(X_train.shape))
print(str(X_train.shape[0]) + " train samples")
print(str(X_test.shape[0]) + " test samples")

model = Sequential()


model.add(Dense(output_dim = 100, input_dim = 784, activation= "relu"))
model.add(Dense(output_dim = 200, activation = "relu"))
model.add(Dense(output_dim = 200, activation = "relu"))
model.add(Dense(output_dim = nb_classes, activation = "softmax"))

model.compile(loss = "categorical_crossentropy", optimizer = "adam")

print model.summary()

model.fit(X_train, y_train, batch_size = batch_size, nb_epoch = nb_epoch, show_accuracy = True, verbose = 1, validation_data = (X_test, y_test))
score = model.evaluate(X_test, y_test, show_accuracy = True, verbose = 1)
print("Test score: ", score[0])
print("Test accuracy: ", score[1])

AudioProcessing.py

import os
import scipy as sp
import scipy.io.wavfile as wav
import matplotlib.pylab as pylab
import Image

def save_spectrogram_scipy(source_filename, destination_filename, size):
    dt = 0.0005
    NFFT = 1024       
    Fs = int(1.0/dt)  
    fs, audio = wav.read(source_filename)
    if(len(audio.shape) >= 2):
        audio = sp.mean(audio, axis = 1)
    fig = pylab.figure()    
    ax = pylab.Axes(fig, [0,0,1,1])    
    ax.set_axis_off()
    fig.add_axes(ax) 
    pylab.specgram(audio, NFFT = NFFT, Fs = Fs, noverlap = 900, cmap="gray")
    pylab.savefig(destination_filename)
    img = Image.open(destination_filename).convert("L")
    img = img.resize(size)
    img.save(destination_filename)
    pylab.clf()
    del img

def audio_2_image(source_directory, destination_directory, audio_extension, image_extension, size):
    nb_files = len(os.listdir(source_directory));
    count = 0
    for file in os.listdir(source_directory):
        if file.endswith(audio_extension):        
            destinationName = file[:-4]
            save_spectrogram_scipy(source_directory + file, destination_directory + destinationName + image_extension, size)
            count += 1
            print ("Generating spectrogram for files " + str(count) + " / " + str(nb_files) + ".")

ImageTools.py

import os
import numpy as np
import matplotlib.image as mpimg
def load_images(source_directory, image_extension):
    image_matrix = []
    nb_files = len(os.listdir(source_directory));
    count = 0
    for file in os.listdir(source_directory):
        if file.endswith(image_extension):
            with open(source_directory + file,"r+b") as f:
                img = mpimg.imread(f)
                img = img.flatten()                
                image_matrix.append(img)
                del img
                count += 1
                #print ("File " + str(count) + " / " + str(nb_files) + " loaded.")
    return np.asarray(image_matrix)

So I run the code above and get:

Audio files are already processed. Skipping...
Loading images into memory...
X_train shape: (2394L, 784L)
2394 train samples
1027 test samples
--------------------------------------------------------------------------------
Initial input shape: (None, 784)
--------------------------------------------------------------------------------
Layer (name)                  Output Shape                  Param #
--------------------------------------------------------------------------------
Dense (dense)                 (None, 100)                   78500
Dense (dense)                 (None, 200)                   20200
Dense (dense)                 (None, 200)                   40200
Dense (dense)                 (None, 2)                     402
--------------------------------------------------------------------------------
Total params: 139302
--------------------------------------------------------------------------------
None
Train on 2394 samples, validate on 1027 samples
Epoch 1/10
2394/2394 [==============================] - 0s - loss: 0.6898 - acc: 0.5455 - val_loss: 0.6835 - val_acc: 0.5716
Epoch 2/10
2394/2394 [==============================] - 0s - loss: 0.6879 - acc: 0.5522 - val_loss: 0.6901 - val_acc: 0.5716
Epoch 3/10
2394/2394 [==============================] - 0s - loss: 0.6880 - acc: 0.5522 - val_loss: 0.6842 - val_acc: 0.5716
Epoch 4/10
2394/2394 [==============================] - 0s - loss: 0.6883 - acc: 0.5522 - val_loss: 0.6829 - val_acc: 0.5716
Epoch 5/10
2394/2394 [==============================] - 0s - loss: 0.6885 - acc: 0.5522 - val_loss: 0.6836 - val_acc: 0.5716
Epoch 6/10
2394/2394 [==============================] - 0s - loss: 0.6887 - acc: 0.5522 - val_loss: 0.6832 - val_acc: 0.5716
Epoch 7/10
2394/2394 [==============================] - 0s - loss: 0.6882 - acc: 0.5522 - val_loss: 0.6859 - val_acc: 0.5716
Epoch 8/10
2394/2394 [==============================] - 0s - loss: 0.6882 - acc: 0.5522 - val_loss: 0.6849 - val_acc: 0.5716
Epoch 9/10
2394/2394 [==============================] - 0s - loss: 0.6885 - acc: 0.5522 - val_loss: 0.6836 - val_acc: 0.5716
Epoch 10/10
2394/2394 [==============================] - 0s - loss: 0.6877 - acc: 0.5522 - val_loss: 0.6849 - val_acc: 0.5716
1027/1027 [==============================] - 0s
('Test score: ', 0.68490593621422047)
('Test accuracy: ', 0.57156767283349563)

I have tried changing the network and adding more epochs, but no matter what I do I always get the same result. I don't understand why.

Any help would be greatly appreciated. Thanks.

EDIT: I found a bug where the pixel values were not being read correctly. I fixed ImageTools.py as follows:

import os
import numpy as np
from scipy.misc import imread

def load_images(source_directory, image_extension):
    image_matrix = []
    nb_files = len(os.listdir(source_directory));
    count = 0
    for file in os.listdir(source_directory):
        if file.endswith(image_extension):
            with open(source_directory + file,"r+b") as f:
                img = imread(f)                
                img = img.flatten()                        
                image_matrix.append(img)
                del img
                count += 1
                #print ("File " + str(count) + " / " + str(nb_files) + " loaded.")
    return np.asarray(image_matrix)

Now I actually get grayscale pixel values from 0 to 255, so dividing by 255 makes sense. However, I still get the same result.
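(For reference, a quick sanity check along the lines below — only a sketch, assuming the X and y arrays built in the script above and NumPy — confirms the input value range and whether the classes are roughly balanced; a validation accuracy frozen at 0.5716 is often just the majority-class proportion.)

import numpy as np

# Input range: should be 0..255 before scaling, 0..1 after dividing by 255
print("X dtype: " + str(X.dtype))
print("X min / max: " + str(X.min()) + " / " + str(X.max()))

# Class balance: a stuck accuracy frequently equals the majority-class share
class_counts = y.sum(axis = 0)  # y is one-hot encoded, so column sums give per-class counts
print("Samples per class: " + str(class_counts))
print("Majority-class baseline: " + str(class_counts.max() / float(class_counts.sum())))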


Answer:

The most likely reason is that the optimizer is not suited to your dataset. The Keras documentation lists the available optimizers.

I suggest you first try SGD with its default parameter values. If that still doesn't work, divide the learning rate by 10. Repeat a few times if necessary. If your learning rate gets down to 1e-6 and it still doesn't work, then you have a different problem.

In short, replace this line:

model.compile(loss = "categorical_crossentropy", optimizer = "adam")

with this:

from keras.optimizers import SGD
opt = SGD(lr=0.01)
model.compile(loss = "categorical_crossentropy", optimizer = opt)

If that doesn't work, change the learning rate a few more times.

If this is indeed the problem, you should see the loss start to drop after only a few epochs.
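To avoid editing the learning rate by hand each time, the search can be scripted. The snippet below is only a sketch that reuses the same (older) Keras API and the variable names from the question (X_train, y_train, X_test, y_test, nb_classes, batch_size); it rebuilds the model for each candidate learning rate and trains for a few epochs so you can see whether the loss starts to move:

from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD

# Try successively smaller learning rates until the training loss starts to decrease
for lr in [0.01, 0.001, 0.0001, 1e-5, 1e-6]:
    print("Trying SGD with lr = " + str(lr))
    model = Sequential()
    model.add(Dense(output_dim = 100, input_dim = 784, activation = "relu"))
    model.add(Dense(output_dim = 200, activation = "relu"))
    model.add(Dense(output_dim = 200, activation = "relu"))
    model.add(Dense(output_dim = nb_classes, activation = "softmax"))
    model.compile(loss = "categorical_crossentropy", optimizer = SGD(lr = lr))
    model.fit(X_train, y_train, batch_size = batch_size, nb_epoch = 5,
              show_accuracy = True, verbose = 2, validation_data = (X_test, y_test))

If none of these values moves the loss, that matches the point above: the problem most likely lies somewhere other than the optimizer.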


