pytorch-kaggle-starter

Pytorch starter kit for Kaggle competitions
License: MIT License
Language: Python
Category: Neural Networks / Artificial Intelligence, Machine Learning / Deep Learning
Type: Open source software
Operating system: Cross-platform

Summary

Pytorch Kaggle starter is a framework for managing experiments in Kaggle competitions. It reduces time to first submission by providing a suite of helper functions for model training, data loading, adjusting learning rates, making predictions, ensembling models, and formatting submissions.

Inside are example Jupyter notebooks walking through how to get strong scores on popular competitions.

These notebooks outline basic, single-model submissions. Scores can be improved significantly by ensembling models and using test-time augmentation.

Features

  1. Experiments - Launch experiments from Python dictionaries inside Jupyter notebooks or Python scripts (see the sketch after this list). Attach Visualizers (Visdom, Kibana), Metrics (Accuracy, F2, Loss), or external datastores (S3, Elasticsearch)
  2. Monitoring - Track experiments from your phone or web-browser in real-time with Visdom, a lightweight visualization framework from Facebook
  3. Notifications - Receive email notifications when experiments complete or fail
  4. Sharing - Upload experiments, predictions and ensembles to S3 for other users to download
  5. Analysis - Compare experiments across users with Kibana. Design custom dashboards for specific competitions
  6. Helpers - Reduce time to submission with helper code for common tasks: custom datasets, metrics, storing predictions, ensembling models, making submissions, and more.
  7. Torchsample - Includes the latest release of ncullen93's torchsample project for additional trainer helpers and data augmentations.
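
As a rough illustration of the dictionary-driven workflow in item 1, an experiment definition might look like the sketch below. The key names are illustrative assumptions, not the project's actual schema.

# Hypothetical experiment config -- key names are assumptions for
# illustration, not the project's actual schema.
experiment_config = {
    'name': 'resnet18_baseline',
    'model': 'resnet18',
    'optimizer': 'adam',
    'lr': 1e-3,
    'epochs': 20,
    'metrics': ['accuracy', 'f2', 'loss'],
    'visualizers': ['visdom', 'kibana'],
    'datastores': ['s3', 'elasticsearch'],
}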

Requirements

  1. Anaconda with Python 3
  2. Pytorch
  3. Other requirements: pip install -r requirements.txt
  4. OpenCV: conda install -c menpo opencv
  5. Server with a GPU and CUDA installed

Datasets

To get started you'll need to move all training and test images into the project_root/datasets/inputs directory (into the trn_jpg and tst_jpg subdirectories, respectively). Running the first cell of each notebook creates the directory structure outlined in the config.py file.
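
If you want to create the layout by hand instead of running that cell, a minimal sketch (covering only the input directories named above; config.py may define more) is:

from pathlib import Path

# Create the input directories described above.
project_root = Path('.')
for sub in ('trn_jpg', 'tst_jpg'):
    (project_root / 'datasets' / 'inputs' / sub).mkdir(parents=True, exist_ok=True)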

There is no need to create separate directories for classes or validation sets. This is handled by the data_fold.py module and the FileDataset, which expects a list of filepaths and targets. After trying out a lot of approaches, I found this to be the easiest and most extensible. You'll sometimes need to generate a metadata.csv file separately if Kaggle didn't provide one. This sort of competition-specific code can live in the competitions/ directory.
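
As a rough sketch of the filepath/target pattern (an illustrative stand-in, not the project's actual FileDataset implementation):

import torch
from PIL import Image
from torch.utils.data import Dataset

class FilepathDataset(Dataset):
    # Illustrative stand-in for FileDataset: takes parallel lists of
    # filepaths and targets and loads images lazily in __getitem__.
    def __init__(self, filepaths, targets, transform=None):
        self.filepaths = filepaths
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        img = Image.open(self.filepaths[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, torch.tensor(self.targets[idx])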

Visdom

Visualize experiment progress on your phone with Facebook's new Visdom framework.
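
A minimal example of streaming a metric to a running Visdom server (start one with python -m visdom.server; it listens on http://localhost:8097 by default):

import numpy as np
import visdom

vis = visdom.Visdom()  # connects to localhost:8097 by default

# Plot one point of training loss; call again with update='append'
# inside the training loop to stream values in real time.
vis.line(X=np.array([1]), Y=np.array([0.693]),
         win='train_loss', opts=dict(title='Training loss'))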

Kibana

Spin up an Elasticsearch cluster locally or on AWS to start visualizing or tracking experiments. Create custom dashboards with Kibana's easy-to-use drag and drop chart creation tools.

Filter and sort experiments, zoom to a specific time period, or aggregate metrics across experiments and see updates in real time.
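
For example, each completed experiment can be indexed as a document that Kibana then picks up. This is a sketch using the official elasticsearch Python client (8.x signature); the index name and fields are placeholders.

from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')

# Index one experiment record so it appears in Kibana dashboards.
es.index(index='experiments', document={
    'name': 'resnet18_baseline',
    'val_loss': 0.21,
    'val_acc': 0.93,
    'timestamp': datetime.now(timezone.utc),
})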

Emails

Receive emails when experiments complete or fail using the AWS SES service.
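
A minimal sketch of such a notification with boto3 (addresses must be verified in SES; the region, addresses, and message text are placeholders):

import boto3

ses = boto3.client('ses', region_name='us-east-1')

# Notify yourself that an experiment finished.
ses.send_email(
    Source='experiments@example.com',
    Destination={'ToAddresses': ['you@example.com']},
    Message={
        'Subject': {'Data': 'Experiment resnet18_baseline finished'},
        'Body': {'Text': {'Data': 'Best val_acc: 0.93 after 20 epochs.'}},
    },
)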

Kaggle CLI

Quickly download and submit with the kaggle cli tool.

kg download -c dogs-vs-cats-redux-kernels-edition -v -u USERNAME -p PASSWORD
kg submit -m 'my sub' -c dogs-vs-cats-redux-kernels-edition -v -u USERNAME -p PASSWORD my_exp_tst.csv

Best practices

  • Use systemd to keep the Visdom and Jupyter servers running at all times

Unit Tests

Run tests with:

python -m pytest tests/

Other run commands:

python -m pytest tests/ (all tests)
python -m pytest -k filenamekeyword (tests matching keyword)
python -m pytest tests/utils/test_sample.py (single test file)
python -m pytest tests/utils/test_sample.py::test_answer_correct (single test method)
python -m pytest --resultlog=testlog.log tests/ (log output to file)
python -m pytest -s tests/ (print output to console)

TODO

  • Add TTA (test time augmentation) example
  • Add Pseudolabeling example
  • Add Knowledge Distillation example
  • Add Multi-input/Multi-target examples
  • Add stacking helper functions