DATA-SCIENCE-BOWL-2018

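License: GPL-3.0
Language: Python
Category: Neural networks / artificial intelligence, machine learning / deep learning
Software type: Open source
Region: Unknown
Submitted by: 游高杰
Operating system: Cross-platform
Target audience: Unknown
Software overview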


Find the nuclei in divergent images to advance medical discovery

Spot Nuclei. Speed Cures.

Imagine speeding up research for almost every disease, from lung cancer and heart disease to rare disorders. The 2018 Data Science Bowl offers our most ambitious mission yet: create an algorithm to automate nucleus detection.

We’ve all seen people suffer from diseases like cancer, heart disease, chronic obstructive pulmonary disease, Alzheimer’s, and diabetes. Many have seen their loved ones pass away. Think how many lives would be transformed if cures came faster.

By automating nucleus detection, you could help unlock cures faster—from rare disorders to the common cold.

Deep learning tutorial for the Kaggle competition "Find the nuclei in divergent images to advance medical discovery", using Keras

This tutorial shows how to use the Keras library to build a deep neural network for the "Find the nuclei in divergent images to advance medical discovery" competition. More information on this Kaggle competition can be found at https://www.kaggle.com/c/data-science-bowl-2018.

This deep neural network achieves a score of about 0.302 on the leaderboard (computed on the test images), and can be a good starting point for further, more serious approaches.

The architecture was inspired by U-Net: Convolutional Networks for Biomedical Image Segmentation.

Overview

Data

  cd data
  mkdir stage1_train stage1_test
  unzip stage1_train.zip -d stage1_train/
  unzip stage1_test.zip -d stage1_test/

Data for the competition is available in the data folder. data_util.py just loads the images and saves them as NumPy binary files (.npy) for faster loading later.
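The source of data_util.py is not reproduced here. The sketch below illustrates the caching idea under a few assumptions. In the stage1 training data, each nucleus has its own mask file, so the sketch first merges those masks into one binary mask per image. It then stacks the equally sized arrays and stores them with np.save so they can be reloaded with np.load. The helper names merge_masks, save_arrays and load_arrays, and the file naming scheme, are hypothetical.

```python
import os
import tempfile

import numpy as np


def merge_masks(masks):
    """Hypothetical helper: combine per-nucleus masks into one binary mask.

    masks: non-empty iterable of 2-D arrays of equal shape, non-zero on a nucleus.
    """
    masks = list(masks)
    merged = np.zeros(masks[0].shape, dtype=np.uint8)
    for m in masks:
        # Pixel-wise union of all nuclei.
        merged = np.maximum(merged, (m > 0).astype(np.uint8))
    return merged


def save_arrays(images, masks, out_dir, prefix="train"):
    """Hypothetical helper: stack equally sized images/masks and cache as .npy."""
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, prefix + "_images.npy"), np.stack(images))
    np.save(os.path.join(out_dir, prefix + "_masks.npy"), np.stack(masks))


def load_arrays(out_dir, prefix="train"):
    """Hypothetical helper: reload the cached arrays instead of re-reading image files."""
    images = np.load(os.path.join(out_dir, prefix + "_images.npy"))
    masks = np.load(os.path.join(out_dir, prefix + "_masks.npy"))
    return images, masks


if __name__ == "__main__":
    # Usage example with dummy data in a temporary directory.
    tmp = tempfile.mkdtemp()
    nuclei = [np.eye(4), np.fliplr(np.eye(4))]  # two single-nucleus masks
    save_arrays([np.zeros((4, 4, 3), np.uint8)], [merge_masks(nuclei)], tmp)
    images, masks = load_arrays(tmp)
    print(images.shape, masks.shape)  # (1, 4, 4, 3) (1, 4, 4)
```

np.stack only works when all arrays share a shape, which is one reason to resize every image to a common size before caching.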

Pre-processing

The images are not pre-processed in any way, except for resizing to 256 x 256.
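The resizing step might look like the sketch below. skimage is listed as a dependency, so the repository most likely uses skimage.transform.resize. This version swaps in plain NumPy nearest-neighbour sampling so it runs on its own. The function name resize_nearest is hypothetical, and the 256 x 256 default comes from the sentence above.

```python
import numpy as np


def resize_nearest(image, size=(256, 256)):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to `size`.

    A dependency-free stand-in for a library resize; nearest-neighbour
    sampling also keeps binary masks binary.
    """
    in_h, in_w = image.shape[:2]
    out_h, out_w = size
    # Map every output row/column back to a source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    # Fancy indexing on the first two axes keeps any channel axis intact.
    return image[rows[:, None], cols]
```

The same function can resize the training masks. Predicted masks would also need to be mapped back to each test image's original size before they are encoded for submission.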

Run the model

python main.py

This trains the model, runs prediction, and generates the submission file.
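The text does not describe how main.py writes the submission. The competition's submission format has a header line ImageId,EncodedPixels and one line per nucleus. Each line holds a run-length encoding of pixel positions, numbered from top to bottom and then left to right, starting at 1. The minimal sketch below implements such an encoder. It assumes each predicted nucleus is already available as its own binary mask; splitting a predicted mask into individual nuclei, for example by connected-component labelling, is not shown. The function names rle_encode and submission_rows are hypothetical.

```python
import numpy as np


def rle_encode(mask):
    """Run-length encode a binary 2-D mask as a flat [start, length, ...] list.

    Pixels are numbered column by column (top to bottom, then left to right),
    starting from 1.
    """
    # order="F" flattens column-major, matching the top-to-bottom numbering.
    pixels = np.asarray(mask).flatten(order="F") > 0
    # Pad with False so runs touching either end still produce both edges.
    padded = np.concatenate([[False], pixels, [False]])
    # Positions where the value flips are run starts/ends (1-based).
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    starts, ends = changes[0::2], changes[1::2]
    return [int(v) for pair in zip(starts, ends - starts) for v in pair]


def submission_rows(image_id, nucleus_masks):
    """Yield 'ImageId,EncodedPixels' lines, one per nucleus mask."""
    for m in nucleus_masks:
        yield image_id + "," + " ".join(map(str, rle_encode(m)))
```

For example, the mask [[0, 1], [1, 1]] becomes the column-major sequence 0, 1, 1, 1. That is one run starting at pixel 2 with length 3, encoded as "2 3".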

Run the Data-science-bowl-2018 notebook on Google Colab

  1. Download the Data-science-bowl-2018.ipynb notebook from this repo.
  2. Go to Colab.
  3. Go to File --> Upload notebook and upload the notebook.
  4. Go to Runtime --> Change runtime type and select GPU as the hardware accelerator (Google provides a free NVIDIA K80 GPU, which can run continuously for up to 12 hours).
  5. Execute all cells and download the submission files from Colab (the end of the notebook includes code for downloading files from Colab to the local system).
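Step 5 mentions that the notebook ends with code for downloading files from Colab. That code is not shown here. Below is a minimal sketch built on Colab's google.colab.files.download helper. It is wrapped so that it simply returns False when run outside Colab. The default file name submission.csv and the function name download_from_colab are assumptions.

```python
import os


def download_from_colab(path="submission.csv"):
    """Send `path` to the browser when running inside Colab.

    Returns True if a download was triggered, False otherwise (for example
    when the file is missing or the google.colab package is unavailable).
    """
    if not os.path.exists(path):
        return False
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True
```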


Training

The model is trained for 150 epochs; each epoch took 8 seconds on an NVIDIA K80 GPU.

The loss function is Keras binary cross-entropy (binary_crossentropy).

The weights are updated by the Adam optimizer with a learning rate of 1e-5.
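To make these training settings concrete, the NumPy sketch below does two things. First, it spells out what binary cross-entropy computes over the pixels of a predicted mask. Second, it shows what a single Adam update with the 1e-5 learning rate does to one weight. It is an illustration, not Keras' own implementation. The values beta1=0.9, beta2=0.999 and epsilon=1e-7 are assumed defaults.

```python
import numpy as np

EPS = 1e-7  # clipping constant for probabilities (assumed value)


def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over all pixels of a predicted mask."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so log() never sees exactly 0 or 1.
    p = np.clip(np.asarray(y_pred, dtype=float), EPS, 1 - EPS)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def adam_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; returns the new (w, m, v). `t` is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

On the first step, m_hat equals the gradient and the square root of v_hat equals its magnitude. Each weight therefore moves by roughly lr against the gradient's sign, largely independent of the gradient's size. In Keras this setup corresponds to compiling the model with loss='binary_crossentropy' and an Adam optimizer. Keras 2.1.x spelled the learning-rate argument lr, while newer releases use learning_rate.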

Dependencies

  • scikit-image (skimage)
  • TensorFlow
  • Keras >= 2.1.2
  • Pandas

Python 3

Model

The provided model is basically a convolutional auto-encoder, but with a twist: it has skip connections from encoder layers to decoder layers that are on the same "level". See the picture below (note that the image size and the numbers of convolutional filters in this tutorial differ from the original U-Net architecture).

This deep neural network is implemented with Keras functional API, which makes it extremely easy to experiment with different interesting architectures.
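The data flow of a skip connection can be shown without any learned layers. The toy NumPy sketch below has no convolutions, so it is not a U-Net by itself. It downsamples a feature map with 2 x 2 max pooling, then upsamples it back with nearest-neighbour repetition. Finally, it concatenates the result with the encoder feature map from the same level along the channel axis, which is the role a concatenate layer typically plays in a Keras functional-API U-Net. The function names are hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2 x 2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample_2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def toy_unet_level(x):
    """Encoder -> one level down -> decoder, joined by a skip connection."""
    skip = x                  # encoder feature map at this level
    bottom = max_pool_2x2(x)  # go down one level (half the resolution)
    up = upsample_2x(bottom)  # come back up to the same resolution
    # Skip connection: stack decoder and encoder features channel-wise.
    return np.concatenate([up, skip], axis=-1)
```

In the real network, convolutions at each level change the channel counts. The concatenated encoder features give the decoder back the fine spatial detail that pooling discarded.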

Output from the network is a 128 x 128 array that represents the mask to be learned. The sigmoid activation function makes sure that mask pixels are in the [0, 1] range.
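Here is a small sketch of that last step. The sigmoid squashes raw network outputs (logits) into the [0, 1] range, and a threshold then turns those probabilities into a binary mask. The text does not say how predictions are binarised, so the 0.5 threshold and the function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: maps any real input into the [0, 1] range."""
    return 1.0 / (1.0 + np.exp(-z))


def to_binary_mask(logits, threshold=0.5):
    """Convert a 2-D array of logits (e.g. 128 x 128) into a 0/1 uint8 mask."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return (probs > threshold).astype(np.uint8)
```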

[Figure: Model constructed using the Keras API]
