scikit-learn-videos

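License: Readme
Development language: Python
Category: Neural networks / artificial intelligence, machine learning / deep learning
Software type: Open source
Region: Unknown
Submitted by: 乐城
Operating system: Cross-platform
Open-source organization:
Target audience: Unknown
 Software overview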

Introduction to Machine Learning with scikit-learn

This video series will teach you how to solve Machine Learning problems using Python's popular scikit-learn library. There are 10 video tutorials totaling 4.5 hours, each with a corresponding Jupyter notebook. Each notebook contains everything you see in its video: code, output, images, and comments.

Note: The notebooks in this repository have been updated to use Python 3.9.1 and scikit-learn 0.23.2. The original notebooks (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the archive branch. You can read about how I updated the code in this blog post.
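
To compare a local environment against the versions mentioned in this note before running the notebooks, a minimal sketch such as the following may help. It assumes scikit-learn is already installed; the variable names are illustrative only.

```python
import sys

import sklearn

# Build a "major.minor.micro" string for the running interpreter.
python_version = ".".join(str(part) for part in sys.version_info[:3])

# scikit-learn exposes its installed version as a string.
sklearn_version = sklearn.__version__

print(f"Python {python_version}, scikit-learn {sklearn_version}")
```

If the printed versions differ from those listed above, some notebook output may not match exactly.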

You can watch the entire series on YouTube, and view all of the notebooks using nbviewer.

Once you complete this video series, I recommend enrolling in my online course, Machine Learning with Text in Python, to gain a deeper understanding of scikit-learn and Natural Language Processing.

Table of Contents

  1. What is Machine Learning, and how does it work? (video, notebook)

    • What is Machine Learning?
    • What are the two main categories of Machine Learning?
    • What are some examples of Machine Learning?
    • How does Machine Learning "work"?
  2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook (video, notebook)

    • What are the benefits and drawbacks of scikit-learn?
    • How do I install scikit-learn?
    • How do I use the Jupyter Notebook?
    • What are some good resources for learning Python?
  3. Getting started in scikit-learn with the famous iris dataset (video, notebook)

    • What is the famous iris dataset, and how does it relate to Machine Learning?
    • How do we load the iris dataset into scikit-learn?
    • How do we describe a dataset using Machine Learning terminology?
    • What are scikit-learn's four key requirements for working with data?
  4. Training a Machine Learning model with scikit-learn (video, notebook)

    • What is the K-nearest neighbors classification model?
    • What are the four steps for model training and prediction in scikit-learn?
    • How can I apply this pattern to other Machine Learning models?
  5. Comparing Machine Learning models in scikit-learn (video, notebook)

    • How do I choose which model to use for my supervised learning task?
    • How do I choose the best tuning parameters for that model?
    • How do I estimate the likely performance of my model on out-of-sample data?
  6. Data science pipeline: pandas, seaborn, scikit-learn (video, notebook)

    • How do I use the pandas library to read data into Python?
    • How do I use the seaborn library to visualize data?
    • What is linear regression, and how does it work?
    • How do I train and interpret a linear regression model in scikit-learn?
    • What are some evaluation metrics for regression problems?
    • How do I choose which features to include in my model?
  7. Cross-validation for parameter tuning, model selection, and feature selection (video, notebook)

    • What is the drawback of using the train/test split procedure for model evaluation?
    • How does K-fold cross-validation overcome this limitation?
    • How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features?
    • What are some possible improvements to cross-validation?
  8. Efficiently searching for optimal tuning parameters (video, notebook)

    • How can K-fold cross-validation be used to search for an optimal tuning parameter?
    • How can this process be made more efficient?
    • How do you search for multiple tuning parameters at once?
    • What do you do with those tuning parameters before making real predictions?
    • How can the computational expense of this process be reduced?
  9. Evaluating a classification model (video, notebook)

    • What is the purpose of model evaluation, and what are some common evaluation procedures?
    • What is the usage of classification accuracy, and what are its limitations?
    • How does a confusion matrix describe the performance of a classifier?
    • What metrics can be computed from a confusion matrix?
    • How can you adjust classifier performance by changing the classification threshold?
    • What is the purpose of an ROC curve?
    • How does Area Under the Curve (AUC) differ from classification accuracy?
  10. Building a Machine Learning workflow (video, notebook)

    • Why should you use a Pipeline?
    • How do you encode categorical features with OneHotEncoder?
    • How do you apply OneHotEncoder to selected columns with ColumnTransformer?
    • How do you build and cross-validate a Pipeline?
    • How do you make predictions on new data using a Pipeline?
    • Why should you use scikit-learn (rather than pandas) for preprocessing?
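
As a rough illustration of the ideas listed for videos 3 through 7, the sketch below loads the iris dataset, applies one common reading of the four-step modeling pattern (import, instantiate, fit, predict) with K-nearest neighbors, and estimates out-of-sample accuracy with both a train/test split and 10-fold cross-validation. The values `n_neighbors=5`, `test_size=0.4`, and `random_state=4` are illustrative assumptions, not settings taken from this page.

```python
# Step 1: import the classes and helpers that will be used.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data: X is the feature matrix, y is the response vector.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out part of the data so accuracy is measured on unseen samples.
# test_size and random_state are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

# Step 2: instantiate the estimator with a chosen tuning parameter.
knn = KNeighborsClassifier(n_neighbors=5)

# Step 3: fit the model on the training data.
knn.fit(X_train, y_train)

# Step 4: predict the response for the held-out observations.
y_pred = knn.predict(X_test)
test_accuracy = knn.score(X_test, y_test)

# K-fold cross-validation averages accuracy over 10 different splits,
# which gives a less split-dependent estimate than a single train/test split.
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=10, scoring="accuracy"
)
mean_cv_accuracy = cv_scores.mean()

print(f"Train/test split accuracy: {test_accuracy:.3f}")
print(f"Mean 10-fold CV accuracy:  {mean_cv_accuracy:.3f}")
```

The same four steps should carry over to other scikit-learn estimators by swapping the imported class, which is why the pattern is worth learning once.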

Bonus Video

At the PyCon 2016 conference, I taught a 3-hour tutorial that builds upon this video series and focuses on text-based data. You can watch the tutorial video on YouTube.

Here are the topics I covered:

  1. Model building in scikit-learn (refresher)
  2. Representing text as numerical data
  3. Reading a text-based dataset into pandas
  4. Vectorizing our dataset
  5. Building and evaluating a model
  6. Comparing models
  7. Examining a model for further insight
  8. Practicing this workflow on another dataset
  9. Tuning the vectorizer (discussion)
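
To make topics 2 through 5 concrete, here is a minimal sketch of turning text into a document-term matrix and training a classifier on it. The four training messages, their labels, and the choice of `CountVectorizer` with `MultinomialNB` are assumptions made for illustration; the tutorial's own dataset and code are not shown on this page.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy dataset: 1 = spam, 0 = not spam.
train_texts = [
    "win a free prize now",
    "free cash offer, claim now",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
]
train_labels = [1, 1, 0, 0]

# Learn the vocabulary from the training text and build a
# document-term matrix (one row per message, one column per token).
vect = CountVectorizer()
X_train_dtm = vect.fit_transform(train_texts)

# Train a Naive Bayes classifier on the token counts.
nb = MultinomialNB()
nb.fit(X_train_dtm, train_labels)

# New text must be transformed with the already-fitted vocabulary
# (transform, not fit_transform); unseen tokens are ignored.
new_texts = ["claim your free prize", "lunch meeting tomorrow"]
X_new_dtm = vect.transform(new_texts)
predictions = nb.predict(X_new_dtm)

print(predictions)
```

Fitting the vectorizer only on training text keeps the columns of the training and new-data matrices aligned, which the classifier relies on.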

Visit this GitHub repository to access the tutorial notebooks and many other recommended resources.
