Kaggle-EEG


EEG Seizure Prediction

Gareth Paul Jones
3rd place Melbourne University AES/MathWorks/NIH Seizure Prediction
2016

Description

This code is designed to process the raw data from Melbourne University AES/MathWorks/NIH Seizure Prediction, train a seizureModel (train.m), then predict seizure occurrence from a new test set (predict.m).

Data

The raw data contains 16-channel intracranial EEG recordings from 3 patients. It is split into interictal (background) periods and preictal (before-seizure) periods.

Features

Various features are extracted from the raw data, including:

  • Frequency power in EEG bands
  • Summary statistics in the temporal domain
  • Correlation between channels in the frequency and temporal domains

These features are extracted with various window sizes (240, 160, and 80 s in the 3rd place submission) and are combined into a single data set before training the models. Processed features are saved to disk for faster subsequent loading.
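As a rough illustration of window-based feature extraction, the sketch below splits one channel into fixed-length windows and computes summary statistics and band power for each. It is written in Python with the standard library only, not in the project's MATLAB, and it is not the author's implementation: the function names, the band edges, the toy sampling rate, and the naive DFT are all assumptions made for the example.

```python
import cmath
import math
import statistics

def split_windows(signal, fs, window_s):
    """Cut a 1-D signal into non-overlapping windows of window_s seconds."""
    n = int(fs * window_s)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def band_power(window, fs, low_hz, high_hz):
    """Mean power of the DFT bins with low_hz <= f < high_hz (naive O(n^2) DFT)."""
    n = len(window)
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if low_hz <= freq < high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(window))
            powers.append(abs(coeff) ** 2 / n ** 2)
    return sum(powers) / len(powers) if powers else 0.0

def window_features(window, fs, bands):
    """Temporal summary statistics plus one power value per frequency band."""
    feats = {"mean": statistics.fmean(window), "std": statistics.pstdev(window)}
    for name, (low_hz, high_hz) in bands.items():
        feats["power_" + name] = band_power(window, fs, low_hz, high_hz)
    return feats

# Toy input (hypothetical values): a 6 Hz sine sampled at 32 Hz for 4 seconds.
fs = 32
signal = [math.sin(2 * math.pi * 6 * t / fs) for t in range(fs * 4)]
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12)}  # assumed band edges

# One feature dict per 2-second window; the README uses 240, 160, and 80 s windows.
features = [window_features(w, fs, bands) for w in split_windows(signal, fs, 2)]
```

Running the same extraction with several window lengths and concatenating the results per segment would give a combined data set of the kind described above.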

Models

Two models are fit to the processed data:

  • An RUS Boosted tree ensemble
  • A Quadratic SVM

These models are handled by the seizureModel object and are fit to all the data, rather than individual models being trained for each subject. The predictions of each model are ensembled with a simple mean, which produces a considerably better score than either model alone.
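The simple-mean ensemble can be sketched as follows. This is a minimal Python illustration rather than the project's MATLAB code; `ensemble_mean` is a hypothetical helper, and it assumes both models score the same samples in the same order and on comparable scales.

```python
def ensemble_mean(*score_lists):
    """Average per-sample scores from several models, sample by sample."""
    lengths = {len(scores) for scores in score_lists}
    if len(lengths) != 1:
        raise ValueError("all models must score the same samples")
    return [sum(values) / len(values) for values in zip(*score_lists)]

# Hypothetical scores for three samples from each model.
svm_scores = [0.25, 0.75, 0.5]
tree_scores = [0.75, 0.25, 1.0]

combined = ensemble_mean(svm_scores, tree_scores)
print(combined)  # [0.5, 0.5, 0.75]
```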

Running

Training and prediction stages can be run independently from their respective scripts, or together from testRun.m. If running from testRun.m, paths need to be set in predict.m and train.m first. Warning: testRun.m is designed to run entirely from scratch and deletes all .mat files from the working directory when it starts!

Both predict.m and train.m expect the same directory structure as provided for the competition, and training is specifically written to handle the temporal relationships in this dataset - it would need modification to work correctly with new data.

  • Extract the original Kaggle data to a folder, e.g. R:\EEG Data\Original\

  • Extract the second test set released on Kaggle into a folder named New, e.g. R:\EEG Data\New\

  • Set the paths params.paths.dataDir and params.paths.or in predict.m and train.m

    • params.paths.or should be the path to the "Original" folder created above.
    • params.paths.dataDir should be the "New" folder from above. Data from the original training and test sets will be copied here to create a new training set.
  • Run train.m

    • The first function copyTestLeakToTrain.m creates a new training/test set in params.paths.dataDir. This set will be used for training and the folder structure should look like this:
  • Run predict.m

    • params.paths.dataDir should be the "New" directory, e.g. R:\EEG Data\New\

Processed features and the final submission file are saved to the working directory to save time on subsequent runs.
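The save-then-reload pattern for processed features can be sketched as below. This is a generic Python illustration under stated assumptions, not the project's MATLAB code (which stores .mat files): `load_or_compute`, the pickle format, and the file name are all hypothetical.

```python
import os
import pickle
import tempfile

def load_or_compute(cache_path, compute):
    """Return cached results if the file exists, otherwise compute and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []  # records how often the expensive step actually runs

def expensive_features():
    """Stand-in for feature extraction; returns made-up values."""
    calls.append(1)
    return {"window_s": 240, "values": [0.1, 0.2]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "featuresTrain.pkl")  # hypothetical file name
    first = load_or_compute(path, expensive_features)   # computes and saves
    second = load_or_compute(path, expensive_features)  # loads from disk
```

Deleting the cached file forces a fresh extraction, which is what a from-scratch run relies on.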

Scripts

train.m script:

  • Processes raw data
    • Creates new test set from original test and training sets
  • Extracts features and saves in featuresObject (featuresTrain)
  • Trains an SVM and RUS boosted tree ensemble, saves the compact version of these.

predict.m script:

  • Loads trained models (SVM and tree ensemble saved as seizureModel objects)
  • Loads new data
    • Extracts features and saves in a featuresObject (featuresTest)
  • Predicts new data
    • Reduces epoch predictions to segment predictions
    • Ensembles SVM and tree ensemble
  • Saves predictions to a .csv submission file as per the Kaggle specification
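The last steps of the prediction stage can be sketched as follows, in Python rather than the project's MATLAB. The README does not say how epoch predictions are reduced to segment predictions, so the per-segment mean used here is an assumption, as are the helper names, the file names, and the `File`/`Class` header.

```python
import csv
import io
from collections import defaultdict

def reduce_to_segments(epoch_scores):
    """Collapse (segment_name, score) pairs, one per epoch, to one mean score per segment."""
    grouped = defaultdict(list)
    for segment, score in epoch_scores:
        grouped[segment].append(score)
    return {segment: sum(scores) / len(scores) for segment, scores in grouped.items()}

def to_csv(segment_scores, header=("File", "Class")):
    """Render segment scores as CSV text, sorted by segment name (assumed header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for segment in sorted(segment_scores):
        writer.writerow([segment, f"{segment_scores[segment]:.4f}"])
    return buffer.getvalue()

# Hypothetical epoch-level scores: two epochs from one segment, one from another.
epochs = [("new_1_1.mat", 0.25), ("new_1_1.mat", 0.75), ("new_1_2.mat", 0.125)]

segments = reduce_to_segments(epochs)
csv_text = to_csv(segments)
```

The reduction would be applied to each model's epoch scores, or to the ensembled scores, before the file is written.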

Classes

featuresObject

  • Handles extraction of features and combination of features generated using different window lengths.

seizureModel

  • Handles training of SVM or RBT.

cvPart

  • Used instead of MATLAB's cvpartition object to handle cross-validation. Allows grouping of subject data from consecutive time periods in the training set, preventing the data leak that otherwise leads to over-optimistic local scoring of the model's performance.
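The idea behind grouped cross-validation can be sketched as below. This Python example is not the cvPart class itself: `grouped_folds` and the group labels are hypothetical, and the only property it demonstrates is that all epochs sharing a group label (for example, the same run of consecutive recordings) land on the same side of every train/test split.

```python
def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs so that no group is split across train and test."""
    unique = sorted(set(groups))
    for fold in range(n_folds):
        held_out = set(unique[fold::n_folds])  # every n_folds-th group is held out
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test

# Hypothetical labels: each epoch is tagged with the recording run it came from.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

folds = list(grouped_folds(groups, 2))
for train, test in folds:
    # Temporally adjacent epochs never appear in both sets of the same fold.
    assert {groups[i] for i in train}.isdisjoint({groups[i] for i in test})
```

A plain random partition would scatter epochs from one run across both sets, which is the leak described above.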

Requirements

  • Original Kaggle data or trained models
  • MATLAB 2016b:
    • Statistics and Machine Learning Toolbox

Notes

  • If the random seeds are now being set correctly, the model should score ~0.8059 (equivalent to 2nd place)
  • Uses new version of featuresObject that holds only one dataset, rather than both train and test sets
  • All parallel processing has been removed for hold out testing
  • All figures should be suppressed in prediction stage

To do

  • Save use structure and params.divS to each seizureModel
  • Add feature descriptions