Question:

Keras: splitting into train and test sets when using ImageDataGenerator

马泓

2023-03-14

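I have a single directory that contains subfolders of images, one subfolder per label. I want to split this data into a training set and a test set while using ImageDataGenerator in Keras. Although model.fit() in Keras has the validation_split argument for specifying the split, I could not find an equivalent for model.fit_generator(). How can I do this?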

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=32,
    class_mode='binary')

model.fit_generator(
    train_generator,
    samples_per_epoch=nb_train_samples,
    nb_epoch=nb_epoch,
    validation_data=??,
    nb_val_samples=nb_validation_samples)

I don't have a separate directory for validation data; I need to split it off from the training data.

3 answers

司马狐若

2023-03-14

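If we use subset with a single ImageDataGenerator, the same augmentation is applied to both training and validation data. If you want to apply augmentation only to the training set, you can split the folders on disk with the split-folders package, which can be installed directly with pip.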

https://pypi.org/project/split-folders/

This splits the dataset into train, val and test directories, and you can then create a separate generator for each of them.
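To make the idea concrete, here is a minimal standard-library sketch of this kind of on-disk split. It is not the split-folders implementation; the helper name `split_class_folders`, the `train`/`val` output layout, and the default fraction and seed are assumptions chosen for illustration.

```python
import random
import shutil
from pathlib import Path


def split_class_folders(src, dst, val_fraction=0.2, seed=42):
    """Copy src/<class>/* into dst/train/<class> and dst/val/<class>.

    Hypothetical helper for illustration; not part of split-folders or Keras.
    Returns {class_name: (n_train, n_val)}.
    """
    src, dst = Path(src), Path(dst)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        parts = {"val": files[:n_val], "train": files[n_val:]}
        for part, part_files in parts.items():
            out_dir = dst / part / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in part_files:
                shutil.copy2(f, out_dir / f.name)  # copy, leave the source intact
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts
```

After such a split, an augmenting ImageDataGenerator can be pointed at the train directory and a rescale-only one at the val directory, so the validation images are never augmented.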

冷夜洛

2023-03-14

For example, suppose you have a folder like this:

full_dataset
|--horse (40 images)
|--donkey (30 images)
|--cow (50 images)
|--zebra (70 images)

First way

image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)    

train_dataset = image_generator.flow_from_directory(batch_size=32,
                                                 directory='full_dataset',
                                                 shuffle=True,
                                                 target_size=(280, 280), 
                                                 subset="training",
                                                 class_mode='categorical')

validation_dataset = image_generator.flow_from_directory(batch_size=32,
                                                 directory='full_dataset',
                                                 shuffle=True,
                                                 target_size=(280, 280), 
                                                 subset="validation",
                                                 class_mode='categorical')
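As a rough check of what validation_split=0.2 yields for the example folders, the sketch below computes the per-class counts. It assumes the fraction is applied to each class folder separately and truncated to an integer, which is my understanding of how flow_from_directory partitions files; the helper name `split_counts` is hypothetical.

```python
def split_counts(class_sizes, validation_split):
    """Return {class: (n_train, n_validation)} for a per-class fractional split."""
    counts = {}
    for name, n in class_sizes.items():
        n_val = int(validation_split * n)  # truncated share of this class
        counts[name] = (n - n_val, n_val)
    return counts


# Folder sizes taken from the example directory tree
sizes = {"horse": 40, "donkey": 30, "cow": 50, "zebra": 70}
counts = split_counts(sizes, 0.2)
total_train = sum(t for t, _ in counts.values())
total_val = sum(v for _, v in counts.values())
print(counts)
print(total_train, total_val)
```

Under that assumption, the 190 images end up as 152 training and 38 validation images, with every class represented in both subsets.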

Second way

import glob
import numpy as np
import tensorflow as tf

# Class folders in label order: horse=0, donkey=1, cow=2, zebra=3
class_names = ['horse', 'donkey', 'cow', 'zebra']

data = []
labels = []

for label, name in enumerate(class_names):
    for path in glob.glob('full_dataset/' + name + '/*.*'):
        # color_mode must be lowercase: 'grayscale', 'rgb' or 'rgba'
        image = tf.keras.preprocessing.image.load_img(path, color_mode='rgb',
                                                      target_size=(280, 280))
        data.append(np.array(image))
        labels.append(label)

data = np.array(data)
labels = np.array(labels)

from sklearn.model_selection import train_test_split
X_train, X_test, ytrain, ytest = train_test_split(data, labels, test_size=0.2,
                                                random_state=42)

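The main drawback of the first way is that you cannot use it directly to display a picture: validation_dataset[1] is not a single image (indexing the generator gives a whole batch), so trying to show it fails. With the second way, X_test[1] is a single image array and works.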

丁和歌

2023-03-14

Keras has now added a train/validation split from a single directory using ImageDataGenerator:

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation') # set as validation data

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)
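One detail of the call above is that `samples // batch_size` rounds down, so a final partial batch is not counted in an epoch. The sketch below compares floor and ceiling division; the sample counts (152 and 38) and the helper name `steps_for` are hypothetical and only serve as an illustration.

```python
import math


def steps_for(samples, batch_size, drop_partial=True):
    """Number of batches per epoch, with or without the last partial batch."""
    if drop_partial:
        return samples // batch_size  # floor: ignores a trailing partial batch
    return math.ceil(samples / batch_size)  # ceiling: includes it


batch_size = 32
train_steps_floor = steps_for(152, batch_size)
train_steps_ceil = steps_for(152, batch_size, drop_partial=False)
val_steps_floor = steps_for(38, batch_size)
val_steps_ceil = steps_for(38, batch_size, drop_partial=False)
print(train_steps_floor, train_steps_ceil, val_steps_floor, val_steps_ceil)
```

With a small validation subset the difference matters more: in this illustration floor division evaluates on 32 of the 38 validation images per pass.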

https://keras.io/preprocessing/image/

