COVID-Net Project Source Code Walkthrough (Part 2): COVIDx.md Explained

袁单鹗
2023-12-01

The contents of docs/COVIDx.md are as follows:

# COVIDx Dataset
**Update 06/26/2020: Released new dataset with over 14000 CXR images containing 473 COVID-19 train samples. Test dataset remains the same for consistency.**\
**Update 05/13/2020: Released new dataset with 258 COVID-19 train and 100 COVID-19 test samples. There are constantly new xray images being added to covid-chestxray-dataset, Figure1, Actualmed and COVID-19 radiography database so we included train_COVIDx3.txt and test_COVIDx3.txt, which are the xray images we used for training and testing of the CovidNet-CXR3 models.**

The current COVIDx dataset is constructed by the following open source chest radiography datasets:
* https://github.com/ieee8023/covid-chestxray-dataset
* https://github.com/agchung/Figure1-COVID-chestxray-dataset
* https://github.com/agchung/Actualmed-COVID-chestxray-dataset
* https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
* https://www.kaggle.com/c/rsna-pneumonia-detection-challenge (which came from: https://nihcc.app.box.com/v/ChestXray-NIHCC)

<!--We especially thank the Radiological Society of North America, National Institutes of Health, Figure1, Actualmed, M.E.H. Chowdhury et al., Dr. Joseph Paul Cohen and the team at MILA involved in the COVID-19 image data collection project for making data available to the global community.-->

## Steps to generate the dataset

1. Download the datasets listed above
 * `git clone https://github.com/ieee8023/covid-chestxray-dataset.git`
 * `git clone https://github.com/agchung/Figure1-COVID-chestxray-dataset.git`
 * `git clone https://github.com/agchung/Actualmed-COVID-chestxray-dataset.git`
 * go to this [link](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database) to download the COVID-19 Radiography database. Only the COVID-19 image folder and metadata file is required. The overlaps between covid-chestxray-dataset are handled
 * go to this [link](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data) to download the RSNA pneumonia dataset
2. Create a `data` directory and within the data directory, create a `train` and `test` directory
3. Use [create\_COVIDx\_v3.ipynb](../create_COVIDx_v3.ipynb) to combine the three dataset to create COVIDx. Make sure to remember to change the file paths.
4. We provide the train and test txt files with patientId, image path and label (normal, pneumonia or COVID-19). The description for each file is explained below:
 * [train\_COVIDx2.txt](../train_COVIDx3.txt): This file contains the samples used for training COVIDNet-CXR.
 * [test\_COVIDx2.txt](../test_COVIDx3.txt): This file contains the samples used for testing COVIDNet-CXR.

## COVIDx data distribution

Chest radiography images distribution

|  Type | Normal | Pneumonia | COVID-19 | Total |
|:-----:|:------:|:---------:|:--------:|:-----:|
| train |  7966  |    5459   |   473    | 13898 |
|  test |   100  |     100   |   100    |   300 |

Patients distribution

|  Type | Normal | Pneumonia | COVID-19 |  Total |
|:-----:|:------:|:---------:|:--------:|:------:|
| train |  7966  |    5444   |    320   |  13730 |
|  test |   100  |      98   |     74   |    272 |

# COVIDx Dataset


**Update 06/26/2020: Released new dataset with over 14000 CXR images containing 473 COVID-19 train samples. Test dataset remains the same for consistency.**

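Update of June 26, 2020: a new dataset was released with more than 14,000 chest X-ray (CXR) images, including 473 COVID-19 training samples. The test set is unchanged from the previous release, for consistency.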


**Update 05/13/2020: Released new dataset with 258 COVID-19 train and 100 COVID-19 test samples. There are constantly new xray images being added to covid-chestxray-dataset, Figure1, Actualmed and COVID-19 radiography database so we included train_COVIDx3.txt and test_COVIDx3.txt, which are the xray images we used for training and testing of the CovidNet-CXR3 models.**

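Update of May 13, 2020: a new dataset was released with 258 COVID-19 training samples and 100 COVID-19 test samples. Because new X-ray images are constantly being added to covid-chestxray-dataset, Figure1, Actualmed, and the COVID-19 radiography database, the authors included train_COVIDx3.txt and test_COVIDx3.txt, which list the X-ray images they used to train and test the CovidNet-CXR3 models.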

 

The current COVIDx dataset is constructed from the following open source chest radiography datasets:


* https://github.com/ieee8023/covid-chestxray-dataset
* https://github.com/agchung/Figure1-COVID-chestxray-dataset
* https://github.com/agchung/Actualmed-COVID-chestxray-dataset
* https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
* https://www.kaggle.com/c/rsna-pneumonia-detection-challenge (which came from: https://nihcc.app.box.com/v/ChestXray-NIHCC)

 

<!--We especially thank the Radiological Society of North America, National Institutes of Health, Figure1, Actualmed, M.E.H. Chowdhury et al., Dr. Joseph Paul Cohen and the team at MILA involved in the COVID-19 image data collection project for making data available to the global community.-->

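We especially thank the Radiological Society of North America, the National Institutes of Health, Figure1, Actualmed, M.E.H. Chowdhury et al., and Dr. Joseph Paul Cohen and the team at MILA involved in the COVID-19 image data collection project for making the data available to the global community.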

 

## Steps to generate the dataset

1. Download the datasets listed above. (The GitHub repositories can also be downloaded as ZIP archives from their project pages, instead of using the `git clone` commands below.)
 * `git clone https://github.com/ieee8023/covid-chestxray-dataset.git`
 * `git clone https://github.com/agchung/Figure1-COVID-chestxray-dataset.git`
 * `git clone https://github.com/agchung/Actualmed-COVID-chestxray-dataset.git`
 * go to this [link](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database) to download the COVID-19 Radiography database. Only the COVID-19 image folder and the metadata file are required. Overlaps with covid-chestxray-dataset are handled.
 * go to this [link](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data) to download the RSNA pneumonia dataset


2. Create a `data` directory, and inside it create a `train` and a `test` directory.


3. Use [create\_COVIDx\_v3.ipynb](../create_COVIDx_v3.ipynb) to combine the three datasets to create COVIDx. Remember to change the file paths. (In practice this step no longer matches the latest project source: the current notebook is create_COVIDx.ipynb, and the number of source datasets has grown to five.)


4. We provide train and test txt files that list the patientId, image path, and label (normal, pneumonia, or COVID-19) of each sample. Each file is described below:
 * [train\_COVIDx2.txt](../train_COVIDx3.txt): This file contains the samples used for training COVIDNet-CXR.
 * [test\_COVIDx2.txt](../test_COVIDx3.txt): This file contains the samples used for testing COVIDNet-CXR.

(Note that the link text says COVIDx2 while the link targets are the COVIDx3 files. The project itself has since moved on to train_COVIDx4.txt and test_COVIDx4.txt.)
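Steps 2 and 4 above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's own code: the helper names `make_data_dirs`, `parse_split_line`, and `count_labels` are hypothetical, and the sketch assumes each line of a split file holds whitespace-separated fields in the order patientId, image path, label. Check the actual txt files before relying on that layout.

```python
from collections import Counter
from pathlib import Path


def make_data_dirs(root):
    """Create <root>/data/train and <root>/data/test (step 2)."""
    data_dir = Path(root) / "data"
    for split in ("train", "test"):
        # exist_ok=True makes the call safe to repeat.
        (data_dir / split).mkdir(parents=True, exist_ok=True)
    return data_dir


def parse_split_line(line):
    """Split one line into (patient_id, image_path, label).

    Assumption: fields are whitespace-separated and ordered as
    patientId, image path, label; extra trailing fields are ignored.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    return fields[0], fields[1], fields[2]


def count_labels(lines):
    """Count samples per label, skipping blank lines."""
    return Counter(parse_split_line(line)[2] for line in lines if line.strip())


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real split files.
    sample = [
        "patient001 img_0001.png normal",
        "patient002 img_0002.png pneumonia",
        "patient003 img_0003.jpg COVID-19",
        "patient003 img_0004.jpg COVID-19",
    ]
    print(count_labels(sample))
```

To count labels in a real split file, pass an open file handle to `count_labels`, for example `count_labels(open("train_COVIDx3.txt"))`, adjusting the path to wherever the file lives.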

 

## COVIDx data distribution

Chest radiography images distribution

|  Type | Normal | Pneumonia | COVID-19 | Total |
|:-----:|:------:|:---------:|:--------:|:-----:|
| train |  7966  |    5459   |   473    | 13898 |
|  test |   100  |     100   |   100    |   300 |

Patients distribution

|  Type | Normal | Pneumonia | COVID-19 |  Total |
|:-----:|:------:|:---------:|:--------:|:------:|
| train |  7966  |    5444   |    320   |  13730 |
|  test |   100  |      98   |     74   |    272 |
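The totals in the two tables can be checked with a few lines of Python, which also makes the class imbalance explicit: COVID-19 accounts for roughly 3.4% of the training images. The counts below are copied from the tables; the names `IMAGE_COUNTS`, `PATIENT_COUNTS`, and `class_shares` are chosen only for this sketch.

```python
# Counts copied from the "Chest radiography images distribution" table.
IMAGE_COUNTS = {
    "train": {"Normal": 7966, "Pneumonia": 5459, "COVID-19": 473},
    "test": {"Normal": 100, "Pneumonia": 100, "COVID-19": 100},
}

# Counts copied from the "Patients distribution" table.
PATIENT_COUNTS = {
    "train": {"Normal": 7966, "Pneumonia": 5444, "COVID-19": 320},
    "test": {"Normal": 100, "Pneumonia": 98, "COVID-19": 74},
}


def class_shares(counts):
    """Return each class's fraction of the split total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    for name, table in (("images", IMAGE_COUNTS), ("patients", PATIENT_COUNTS)):
        for split, counts in table.items():
            total = sum(counts.values())
            shares = class_shares(counts)
            summary = ", ".join(f"{k}: {v:.1%}" for k, v in shares.items())
            print(f"{name}/{split}: total={total} ({summary})")
```

The training split is heavily skewed toward the Normal and Pneumonia classes, while the test split is balanced across the three classes at the image level.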
