Paper reading (41): Deep Learning in Label-free Cell Classification

贾越
2023-12-01

Paper title: Deep Learning in Label-free Cell Classification

Google Scholar citations: 190

Pages: 16

Published: 2016.03

Journal: Scientific Reports (Nature)

Authors: Claire Lifan Chen, Ata Mahjoubfar, ..., Bahram Jalali (University of California)

Abstract:

Label-free cell analysis is essential to personalized genomics, cancer diagnostics, and drug development as it avoids adverse effects of staining reagents on cellular viability and cell signaling. However, currently available label-free cell assays mostly rely only on a single feature and lack sufficient differentiation. Also, the sample size analyzed by these assays is limited due to their low throughput. Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification. Our system captures quantitative optical phase and intensity images and extracts multiple biophysical features of individual cells. These biophysical measurements form a hyperdimensional feature space in which supervised learning is performed for cell classification. We compare various learning algorithms including artificial neural network, support vector machine, logistic regression, and a novel deep learning pipeline, which adopts global optimization of receiver operating characteristics. As a validation of the enhanced sensitivity and specificity of our system, we show classification of white blood T-cells against colon cancer cells, as well as lipid accumulating algal strains for biofuel production. This system opens up a new path to data-driven phenotypic diagnosis and better understanding of the heterogeneous gene expressions in cells.

Conclusion:

  • TS-QPI relies on spectral multiplexing to capture simultaneously both phase and intensity quantitative images in a single measurement, generating a wealth of information of each individual cell and eliminating the need for labeling with biomarkers. 
  • We summarized the information content of these images in a set of 16 features for each cell, and performed classification in the hyperdimensional space composed of these features.
  • We demonstrated application of various learning algorithms including deep neural networks, support vector machine, logistic regression, naive Bayes, as well as a new training method based on area under the ROC curve. (Note: five methods were used.)
  • Classification accuracy by using the TS-QPI hyperdimensional space is more than 17% better than the conventional size-based techniques.
  • Our system paves the way to cellular phenotypic analysis as well as data-driven diagnostics.
  • It is a valuable tool for high-throughput label-free cell screening in medical, biotechnological, and research applications.

Introduction:

  • Deep learning extracts patterns and knowledge from rich multidimensional datasets.
  • Flow cytometry is a powerful tool for large-scale cell analysis due to its ability to measure anisotropic elastic light scattering of millions of individual cells as well as emission of fluorescent labels conjugated to cells.
  • In addition to classification accuracy, the throughput is another critical specification of a flow cytometer.
  • There is a fundamental trade-off between throughput and accuracy in any measurement system.
  • Our group has developed a label-free imaging flow-cytometry technique based on coherent optical implementation of the photonic time stretch concept.
  • This instrument overcomes the trade-off between sensitivity and speed by using Amplified Time-stretch Dispersive Fourier Transform
  • On another note, surface markers used to label cells, such as EpCAM, are unavailable in some applications
  • The multiplexed biophysical features thus lead to information-rich hyper-dimensional representation of the cells for label-free classification with high statistical precision.
  • We further improved the accuracy, repeatability, and the balance between sensitivity and specificity of our label-free cell classification by a novel machine learning pipeline, which harnesses the advantages of multivariate supervised learning, as well as unique training by evolutionary global optimization of receiver operating characteristics (ROC). (Question: is the last of the five methods mentioned earlier the one that performs best?)
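
A note on the last bullet: AUC depends only on how the scores rank the cells, so it is piecewise constant in the model parameters and gives no useful gradient. That is one reason a derivative-free (evolutionary) search is a natural fit for it. The sketch below is only a toy illustration of that idea, not the authors' algorithm: it uses a (1+1) evolution strategy, which is a much simpler relative of an evolutionary global optimizer, on a linear scorer. The toy Gaussian data, the linear scorer, and every function name here are assumptions made for illustration.

```python
import random

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def score(weights, x):
    """Linear scorer (a stand-in for the network in the paper)."""
    return sum(w * v for w, v in zip(weights, x))

def fitness(weights, positives, negatives):
    return auc([score(weights, x) for x in positives],
               [score(weights, x) for x in negatives])

def evolve(positives, negatives, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: mutate the parent, keep the child if it is not worse."""
    rng = random.Random(seed)
    dim = len(positives[0])
    best = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best_fit = fitness(best, positives, negatives)
    history = [best_fit]
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in best]
        child_fit = fitness(child, positives, negatives)
        if child_fit >= best_fit:  # plus-selection: AUC never decreases
            best, best_fit = child, child_fit
        history.append(best_fit)
    return best, history

# Made-up two-feature data for two overlapping classes.
data_rng = random.Random(1)
positives = [[data_rng.gauss(1.0, 1.0), data_rng.gauss(0.5, 1.0)] for _ in range(40)]
negatives = [[data_rng.gauss(-1.0, 1.0), data_rng.gauss(-0.5, 1.0)] for _ in range(40)]
weights, history = evolve(positives, negatives)
```

Because a child replaces the parent only when its AUC is at least as high, the recorded AUC history is non-decreasing. A population-based optimizer would explore more broadly, but the selection principle is the same.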

Paper structure:

1. Introduction

2. Results

2.1 Time Stretch Quantitative Phase Imaging

2.2 Feature Extraction

2.3 Machine Learning

2.4 Demonstration in Classification of Cancerous Cells

2.5 Demonstration in Algal Lipid Content Classification

3. Discussion

4. Conclusion

5. Methods

5.1 Time Stretch Quantitative Phase Imaging (TS-QPI) System

5.2 Coherent Detection and Phase Extraction

5.3 Cell Transmittance Extraction

5.4 Image Reconstruction

5.5 Big Data Analytics Pipeline

5.6 Data Cleaning

Excerpts from the main text:

1. Biological Problem: What biological problems have been solved in this paper?

  • label-free cell classification

2. Main discoveries: What are the main discoveries in this paper?

  • time stretch quantitative phase imaging (TS-QPI)
  • classification accuracy by using the TS-QPI hyperdimensional space is more than 17% better than the conventional size-based techniques.
  • Our system paves the way to cellular phenotypic analysis as well as data-driven diagnostics
  • Our AUC-based deep learning model (DNN + AUC) has both the highest accuracy and consistency against support vector machine (SVM) with Gaussian kernel, logistic regression (LR), naive Bayes, and conventional deep neural network trained by cross entropy and backpropagation (DNN).

3. ML(Machine Learning) Methods: What are the ML methods applied in this paper?

  • Input: The multiplexed biophysical features thus lead to information-rich hyper-dimensional representation of the cells for label-free classification with high statistical precision.
  • Expert-designed features: so these are hand-crafted features. Is this traditional machine learning?
  • A total of 16 features are chosen among the features extracted from fusion of optical phase and loss images of each cell. 
  • Machine learning pipeline. Information of quantitative optical phase and loss images are fused to extract multivariate biophysical features of each cell, which are fed into a fully-connected neural network
  • The network is composed of multiple hidden layers, which automatically learn representations of the data at different levels of abstraction, and thus, is considered a form of deep learning
  • The deep neural networks in our experiments have 3 hidden layers with 48 ReLUs in each.
  • For a trained classifier in hyperspace, receiver operating characteristics (ROC) curve describes the sensitivity and specificity of a classifier collection that includes nonlinear classifiers scaled in the direction of their normal vector field.
  • ROC highlights the trade-off between sensitivity and specificity and the area under ROC (AUC) provides a quantitative robust measure of classifier performance
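
The architecture described above (16 input features, 3 hidden layers with 48 ReLUs each) can be sketched as a plain forward pass. This is a minimal sketch under stated assumptions, not the authors' code: the single sigmoid output unit, the He-style random initialization, and the made-up feature values are all illustrative choices, and the weights are untrained.

```python
import math
import random

# 16 features in, 3 hidden layers of 48 ReLUs (from the notes above);
# the single output unit is an assumption for binary classification.
LAYER_SIZES = [16, 48, 48, 48, 1]

def init_params(sizes, seed=0):
    """Random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, an illustrative choice
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params

def forward(params, features):
    """Return a score between 0 and 1 for one cell's feature vector."""
    activations = list(features)
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index < last:
            activations = [max(0.0, z) for z in pre]                 # ReLU
        else:
            activations = [1.0 / (1.0 + math.exp(-z)) for z in pre]  # sigmoid
    return activations[0]

def count_params(params):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

params = init_params(LAYER_SIZES)
cell = [0.1 * k for k in range(16)]  # made-up values standing in for the 16 features
score = forward(params, cell)
n_params = count_params(params)
```

With these layer sizes the network has (16·48 + 48) + 2·(48·48 + 48) + (48 + 1) = 5569 parameters, which is small by deep learning standards and consistent with working on 16 extracted features rather than raw images.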

4. ML Advantages: Why are these ML methods better than the traditional methods in these biological problems?

  • Deep learning extracts patterns and knowledge from rich multidimensional datasets.
  • The mean accuracies of all learning models are beyond 85%, reflecting the advantages of simultaneous hyperdimensional biophysical features that TS-QPI provides for label-free cell classification. Furthermore, the interquartile range of the balanced accuracy (shown with box plot) is the smallest for the regularized AUC-based deep learning model, which confirms its consistency and repeatability are the best among learning methods.
  • The AUC parameter serves as an effective analysis metric for finding the best classifier collection and has been proven to be advantageous over the mean square error for evaluating learning algorithms
  • It is clear that additional dimensions improve distinguishment among different cell types compared to individual features.
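
The metrics mentioned in this list (AUC, balanced accuracy, and the interquartile range used to compare consistency across runs) can be computed in a few lines. The sketch below assumes the standard definitions: AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, and balanced accuracy as the mean of sensitivity and specificity. All scores and accuracy values are made up for illustration and are not results from the paper.

```python
import statistics

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def balanced_accuracy(pos_scores, neg_scores, threshold):
    """Mean of sensitivity and specificity at a given decision threshold."""
    sensitivity = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    specificity = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return 0.5 * (sensitivity + specificity)

def interquartile_range(values):
    """Q3 - Q1, the box height in a box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Made-up classifier scores for two classes of cells.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
example_auc = auc(pos, neg)                     # 8 of 9 pairs ranked correctly
example_ba = balanced_accuracy(pos, neg, 0.5)   # sensitivity 2/3, specificity 2/3

# Made-up balanced accuracies over repeated runs of two hypothetical models.
steady_model = [0.950, 0.952, 0.951, 0.953, 0.949]
noisy_model = [0.930, 0.960, 0.915, 0.955, 0.940]
steady_iqr = interquartile_range(steady_model)
noisy_iqr = interquartile_range(noisy_model)
```

Unlike a single accuracy value, AUC does not depend on the choice of threshold, and a smaller interquartile range over repeated runs indicates a more repeatable model.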

5. Biological Significance: What is the biological significance of these ML methods’ results?

  • To visualize the multivariate classification results, data points are depicted in the space of the first two PCA components.
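
Projecting feature vectors onto the first two PCA components, as in the visualization described above, can be sketched with power iteration on the covariance matrix. This is a minimal standard-library sketch, assuming well-separated eigenvalues and data of rank two or more. The three-feature toy data and all function names are assumptions, and in practice the 16 features would usually be standardized first because they have different units.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(mat, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return value, v

def pca_2d(rows):
    """Project rows onto the first two principal components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    lam1, v1 = top_eigenpair(cov)
    # Deflation: remove the first component so the next one becomes dominant.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    lam2, v2 = top_eigenpair(deflated, seed=1)
    projected = [[sum(r[j] * v1[j] for j in range(d)),
                  sum(r[j] * v2[j] for j in range(d))] for r in x]
    return projected, (v1, v2), (lam1, lam2)

# Made-up data: three features with very different spreads.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 0.01)]
        for _ in range(200)]
projected, axes, variances = pca_2d(rows)
```

Each entry of `projected` is a pair of coordinates that can be scatter-plotted and colored by class label. The two values in `variances` show how much of the spread each plotted axis carries.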

6. Prospect: What are the potential applications of these machine learning methods in biological science?

  • Our system paves the way to cellular phenotypic analysis as well as data-driven diagnostics
  • is a valuable tool for high-throughput label-free cell screening in medical, biotechnological, and research applications.

7. My Questions (Optional)

  • DNN + AUC vs. DNN in Figure 8: the best performance looks about the same (see the range and quartiles). Perhaps there are other ways, besides DNN + AUC, to improve performance.