Question:

Numpy to TFrecords: Is there a simpler way to handle batch inputs from tfrecords?

怀浩大

2023-03-14

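My question is about how to get batch inputs from multiple (or sharded) tfrecords. I have read the example at https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L410. Taking the training set as an example, the basic pipeline is: (1) first generate a series of tfrecords (e.g., train-000-of-005, train-001-of-005, ...), (2) build a list from these filenames and feed it into tf.train.string_input_producer to get a queue, (3) at the same time create another tf op to do other work, and (4) use tf.train.batch_join to generate the batch inputs.

I think this is complicated, and I am not sure I follow the logic of the procedure. In my case, I have a list of .npy files, and I want to generate sharded tfrecords (multiple separate tfrecords, not just one large file). Each .npy file contains a different number of positive and negative samples (2 classes). A basic approach would be to generate one large tfrecord file, but that file is too large (~20 GB), so I resorted to sharded tfrecords. Is there a simpler way to do this? Thanks.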

1 answer

陈项禹

2023-03-14

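The whole process is simpler with the Dataset API. Here are the two parts: (1) converting the numpy arrays to tfrecords, and (2, 3, 4) reading the tfrecords to generate batches.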

    import tensorflow as tf

    # The parameter names are illustrative; the original signature was elided.
    def npy_to_tfrecords(samples, labels, output_file):
        # Write records to a tfrecords file
        writer = tf.python_io.TFRecordWriter(output_file)

        # Loop through all the samples you want to write
        for X, y in zip(samples, labels):
            # Say X is an np.array([[...], [...]]) of floats
            # and y is np.array([0]) or np.array([1])

            # feature maps string keys to Feature proto objects
            feature = {}
            feature['X'] = tf.train.Feature(float_list=tf.train.FloatList(value=X.flatten()))
            feature['y'] = tf.train.Feature(int64_list=tf.train.Int64List(value=y))

            # Construct the Example proto object
            example = tf.train.Example(features=tf.train.Features(feature=feature))

            # Serialize the example to a string
            serialized = example.SerializeToString()

            # Write the serialized object to disk
            writer.write(serialized)
        writer.close()
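Since the question asks for sharded output rather than one ~20 GB file, one possible way to split the work is sketched below. It only plans which .npy files go into which shard, using the train-000-of-005 naming pattern from the question. `shard_name` and `plan_shards` are hypothetical helper names (not TensorFlow functions), and the round-robin assignment is an assumption rather than anything the answer prescribes.

```python
def shard_name(prefix, index, num_shards):
    # Zero-padded name in the style of train-000-of-005.
    return "{}-{:03d}-of-{:03d}".format(prefix, index, num_shards)


def plan_shards(npy_paths, num_shards, prefix="train"):
    # Map each output shard name to the list of .npy files it should hold.
    names = [shard_name(prefix, i, num_shards) for i in range(num_shards)]
    plan = {name: [] for name in names}
    for i, path in enumerate(npy_paths):
        # Round-robin: file i goes to shard i modulo num_shards.
        plan[names[i % num_shards]].append(path)
    return plan


# Hypothetical input file names, for illustration only.
paths = ["part{}.npy".format(i) for i in range(7)]
for name, members in plan_shards(paths, 3).items():
    print(name, members)
```

Each entry of the plan could then be written with one call to a writer such as npy_to_tfrecords, and the resulting shard names used as the filenames list when reading.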

    # Create a dataset that reads all of the examples from the listed files.
    # List every tfrecords file (shard) here.
    filenames = ["file1.tfrecord", "file2.tfrecord", "fileN.tfrecord"]
    dataset = tf.contrib.data.TFRecordDataset(filenames)
    # For version 1.5 and above, use tf.data.TFRecordDataset instead.

    # Decode one serialized Example proto
    def _parse_function(example_proto):
        # shape_of_npy_array is a placeholder for the shape X had before it was flattened
        keys_to_features = {'X': tf.FixedLenFeature(shape_of_npy_array, tf.float32),
                            'y': tf.FixedLenFeature((), tf.int64, default_value=0)}
        parsed_features = tf.parse_single_example(example_proto, keys_to_features)
        return parsed_features['X'], parsed_features['y']

    # Parse each record into tensors.
    dataset = dataset.map(_parse_function)

    # Shuffle the dataset
    dataset = dataset.shuffle(buffer_size=10000)

    # Repeat the input indefinitely
    dataset = dataset.repeat()

    # Generate batches (batch_size is a placeholder for your batch size)
    dataset = dataset.batch(batch_size)

    # Create a one-shot iterator
    iterator = dataset.make_one_shot_iterator()

    # Get batch X and y
    X, y = iterator.get_next()
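The effect of `buffer_size` in the shuffle step is easy to misread, so here is a pure-Python sketch of how a fixed-size shuffle buffer followed by batching behaves, as far as I understand the documented behaviour of the Dataset API. `shuffle_buffer` and `batch` are hypothetical stand-ins written for illustration, not TensorFlow functions.

```python
import random


def shuffle_buffer(items, buffer_size, seed=None):
    # Assumes buffer_size >= 1.
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit one random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    for item in buffer:
        yield item


def batch(items, batch_size):
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        # Final partial batch, kept rather than dropped.
        yield current


# 100 records in their original order, shuffled with a small buffer.
shuffled = list(shuffle_buffer(range(100), buffer_size=10, seed=0))
for b in batch(shuffled[:12], 4):
    print(b)
```

With a buffer much smaller than the dataset, records stay close to their original position. If each shard holds mostly one class, it may therefore help to use a buffer comparable to the dataset size, or to shuffle the filenames list as well.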
