Hi there! This guide is for you:
I learned Python by hacking first, and getting serious later. I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself.
I suggest you get your feet wet to start. You'll boost your confidence.
You can install Python 3 and all of the packages you'll need in a few clicks with the Anaconda Python distribution. Anaconda is popular in the Data Science and Machine Learning communities. You can use whichever tool you want; if you're wondering how the options compare, see conda vs. pip vs. virtualenv.
Some options:
For other options, see markusschanta/awesome-jupyter → Hosted Notebook Solutions or ml-tooling/best-of-jupyter → Notebook Environments
Learn how to use Jupyter Notebook (5-10 minutes). (You can learn by screencast instead.)
Now, follow along with this brief exercise (10 minutes): An introduction to machine learning with scikit-learn. Do it in IPython or a Jupyter Notebook. It'll really boost your confidence.
You just classified some hand-written digits using scikit-learn. Neat huh?
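If you'd like a reference point after working through the tutorial, here's a minimal sketch of that kind of exercise (the exact tutorial code may differ slightly; this assumes scikit-learn is installed, which it is if you used Anaconda):

```python
# Classify hand-written digits with scikit-learn: load the bundled
# digits dataset, hold out a test set, fit a support vector classifier,
# and measure accuracy on the held-out digits.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = datasets.load_digits()  # 8x8 grayscale images of digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001)           # a small gamma works well on this dataset
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The whole thing is a handful of lines - that's the point of scikit-learn's uniform fit/predict/score API.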
scikit-learn is the go-to library for machine learning in Python. It's used widely. Machine learning is hard. You'll be glad your tools are easy to work with.
I encourage you to look at the scikit-learn homepage and spend about 5 minutes looking over the names of the strategies (Classification, Regression, etc.), and their applications. Don't click through yet! Just get a glimpse of the vocabulary.
Let's learn a bit more about Machine Learning, and a couple of common ideas and concerns. Read "A Visual Introduction to Machine Learning, Part 1" by Stephanie Yee and Tony Chu.
It won't take long. It's a beautiful introduction ... Try not to drool too much!
OK. Let's dive deeper.
Read "A Few Useful Things to Know about Machine Learning" by Prof. Pedro Domingos. It's densely packed with valuable information, but not opaque. The author understands that there's a lot of "black art" and folk wisdom, and they invite you in.
Take your time with this one. Take notes. Don't worry if you don't understand it all yet.
The whole paper is packed with value, but I want to call out two points:
When you work on a real Machine Learning problem, focus your efforts on your domain knowledge and data before optimizing your choice of algorithms. Prefer to Do Simple Things until you have to increase complexity. Don't rush into neural networks because they sound cool. To improve your model, get more data, then use your knowledge of the problem to explore and process it. Optimize the choice of algorithms only after you have gathered enough data and processed it well.
(Chart inspired by a slide from Alex Pinto's talk, "Secure Because Math: A Deep-Dive on ML-Based Monitoring".)
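To make "Do Simple Things" concrete, here's a small sketch (assuming scikit-learn) of establishing baselines before reaching for anything fancy: a trivial majority-class baseline and a simple linear model. Anything more complex has to beat these to justify itself.

```python
# "Do simple things first": score a trivial baseline and a simple model
# with cross-validation, so you know what added complexity must beat.
from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

baseline = DummyClassifier(strategy="most_frequent")  # predicts the majority class
simple = LogisticRegression(max_iter=2000)            # a plain linear model

base_score = cross_val_score(baseline, X, y, cv=5).mean()
simple_score = cross_val_score(simple, X, y, cv=5).mean()
print(f"baseline: {base_score:.2f}, logistic regression: {simple_score:.2f}")
```

If your fancy model can't clearly beat the simple one, the effort is better spent on data.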
Before you take a break, grab some podcasts.
First, download an interview with Prof. Domingos on the Data Skeptic podcast (2018); he wrote the paper we read earlier. You might also start reading his book, The Master Algorithm, a clear and accessible overview of machine learning.
Next, subscribe to more machine learning and data science podcasts! These are great, low-effort resources that you can casually learn more from. To learn effectively, listen over time, with plenty of headspace. Do not speed up your podcasts!
Subscribe to Talking Machines.
I suggest this listening order:
Want to subscribe to more podcasts? Here's a good listicle of suggestions, and another.
OK! Take a break, come back refreshed.
Next, pick one or two of these IPython Notebooks and play along.
There are more places to find great IPython Notebooks:
Know another great notebook? Please submit a PR!
Now you should be hooked, and hungry to learn more. Pick one of the courses below and start on your way.
Prof. Andrew Ng's Machine Learning is a popular and esteemed free online course. I've seen it recommended often. And emphatically.
It's helpful if you decide on a pet project to play around with, as you go, so you have a way to apply your knowledge. You could use one of these Awesome Public Datasets. And remember, IPython Notebook is your friend.
Also, you should grab an in-depth textbook to use as a reference. The two best options are Understanding Machine Learning and Elements of Statistical Learning; you'll see both recommended widely as reference textbooks. You only need one of the two as your main reference; here's some context/comparison to help you pick the one that's right for you. You can download each book free as a PDF at those links - so grab them!
Here are some other free online courses I've seen recommended. (Machine Learning, Data Science, and related topics.)
Start with the support forums and chats related to the course(s) you're taking.
Check out datascience.stackexchange.com and stats.stackexchange.com - for example, the machine-learning tag. There are also subreddits like /r/machinelearning.
There are also many relevant discussions on Quora, for example: What is the difference between Data Analytics, Data Analysis, Data Mining, Data Science, Machine Learning, and Big Data?
For help and community in meatspace, seek out meetups. Data Science Weekly's Big List of Data Science Resources may help you.
You'll want to get more familiar with Pandas.
odo: a library for converting data between many formats.
dask: a Pandas-like interface for larger-than-memory data, with "under the hood" parallelism. Very interesting, but only needed when you're getting advanced.

Some good cheat sheets I've come across. (Please submit a Pull Request to add other useful cheat sheets.)
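Since you'll spend a lot of time in Pandas, here's a tiny sketch of the operations you'll reach for constantly: building a DataFrame, boolean filtering, and group-by aggregation. (The data and column names here are made up for illustration; this assumes pandas is installed.)

```python
# Everyday Pandas: construct, filter, group, summarize.
import pandas as pd

df = pd.DataFrame({
    "city":  ["Oslo", "Oslo", "Lima", "Lima", "Lima"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar"],
    "temp":  [-4.0, -3.0, 23.0, 24.0, 23.5],
})

cold = df[df["temp"] < 0]                         # boolean filtering
mean_by_city = df.groupby("city")["temp"].mean()  # split-apply-combine
print(cold)
print(mean_by_city)
```

Most real analyses are variations on these few moves, chained together.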
I'm not repeating the materials mentioned above, but here are some other Data Science resources:
From the "Bayesian Machine Learning" overview on Metacademy:
... Bayesian ideas have had a big impact in machine learning in the past 20 years or so because of the flexibility they provide in building structured models of real world phenomena. Algorithmic advances and increasing computational resources have made it possible to fit rich, highly structured models which were previously considered intractable.
You can learn more by studying one of the following resources. Both resources use Python, PyMC, and Jupyter Notebooks.
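Before diving into those resources, here's the core Bayesian idea in miniature, as a pure-Python sketch (no PyMC needed for this simple case - PyMC earns its keep once models get richer): with a conjugate Beta prior on a coin's bias, observing flips just updates two counts.

```python
# Bayesian updating with a conjugate prior: if the prior on a coin's
# bias is Beta(a, b) and we observe Binomial data, the posterior is
# simply Beta(a + heads, b + tails).

def update_beta(a, b, heads, tails):
    """Conjugate update: prior Beta(a, b) + observed flips -> posterior Beta."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform prior, Beta(1, 1), then observe 7 heads, 3 tails.
a, b = update_beta(1, 1, heads=7, tails=3)
posterior_mean = beta_mean(a, b)
print(f"posterior: Beta({a}, {b}), mean = {posterior_mean:.3f}")
```

The resources above go far beyond this - that flexibility in "building structured models of real world phenomena" is exactly why samplers like PyMC exist.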
"Machine learning systems automatically learn programs from data." - Pedro Domingos, in "A Few Useful Things to Know about Machine Learning." The programs you generate will require maintenance. Like any way of creating programs faster, you can rack up technical debt.
Here is the abstract of Machine Learning: The High-Interest Credit Card of Technical Debt:
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
If you're following this guide, you should read that paper. You can also listen to a podcast episode interviewing one of the authors of this paper.
A few more articles on the challenges running ML-powered systems in Production:
So you are dabbling with Machine Learning. You've got Hacking Skills. Maybe you've got some "knowledge" in Domingos' sense (some "Substantive Expertise" or "Domain Knowledge"). This diagram is modified slightly from Drew Conway's "Data Science Venn Diagram." It isn't a perfect fit for us, but it may get the point across:
Please don't sell yourself as a Machine Learning expert while you're still in the Danger Zone. Don't build bad products or publish junk science. (Also please don't be evil.) This guide can't tell you how you'll know you've "made it" into Machine Learning competence ... let alone expertise. It's hard to evaluate proficiency without schools or other institutions. This is a common problem for self-taught people.
You need practice. On Hacker News, user olympus commented to say you could use competitions to practice and evaluate yourself. Kaggle and ChaLearn are hubs for Machine Learning competitions. You can find some examples of code for popular Kaggle competitions here. For smaller exercises, try HackerRank.
You also need understanding. Review what Kaggle competition winners say about their solutions, for example on the "No Free Hunch" blog. These write-ups might be over your head at first, but once you start to understand and appreciate them, you'll know you're getting somewhere.
Competitions and challenges are just one way to practice. You shouldn't limit yourself, though - and you should also understand that Machine Learning isn't all about Kaggle competitions.
Here's a complementary way to practice: do practice studies.
And repeat. Re-phrasing this, it fits with the scientific method: formulate a question (or problem statement), create a hypothesis, gather data, analyze the data, and communicate results. (You should watch this video about the scientific method in data science, and/or read this article.)
How can you come up with interesting questions? Here's one way. Every Sunday, browse datasets and write down some questions. Also, sign up for Data is Plural, a newsletter of interesting datasets; look at those datasets and write down questions. Stay curious. When a question inspires you, start a study.
This advice, to do practice studies and learn from peer review, is based on a conversation with Dr. Randal S. Olson. Here's more advice from Olson, quoted with permission:
I think the best advice is to tell people to always present their methods clearly and to avoid over-interpreting their results. Part of being an expert is knowing that there's rarely a clear answer, especially when you're working with real data.
As you repeat this process, your practice studies will become more scientific, interesting, and focused. The most important part of this process is peer review.
Here are some communities where you can reach out for peer review:
Post to any of those and ask for feedback. You'll get feedback, and as experts review your work you'll learn a ton about the field. You'll also be practicing a crucial skill: accepting critical feedback.
When I read the feedback on my Pull Requests, first I repeat to myself, "I will not get defensive, I will not get defensive, I will not get defensive." You may want to do that before you read reviews of your Machine Learning work too.
Machine Learning can be powerful, but it is not magic.
Whenever you apply Machine Learning to solve a problem, you are going to be working in some specific problem domain. To get good results, you or your team will need "substantive expertise" AKA "domain knowledge." Learn what you can, for yourself... But you should also collaborate. You'll have better results if you collaborate with domain experts. (What's a domain expert? See the Wikipedia entry, or c2 wiki's rather subjective but useful blurb.)
I couldn't say it better:
Machine learning won’t figure out what problems to solve. If you aren’t aligned with a human need, you’re just going to build a very powerful system to address a very small—or perhaps nonexistent—problem.
The quote is from "The UX of AI" by Josh Lovejoy; the whole article is a great read!
In other words, You Are Not The User.
Today we are surrounded by software that utilizes Machine Learning. Often, the results are directly user-facing, and intended to enhance UX.
Before you start working ML into your software, you should get a better understanding of UX, as well as how ML and UX can relate. As an informal way to get into this subject, start with this:
Then, if you know a coworker or friend who works in UX, take them out for coffee or lunch and pick their brain. I think they'll have words of encouragement as well as caution. You won't be an expert by any means, but maybe it'll help you know if/when to reach out for help, review, or guidance.
Spoiler: you should work with UX specialists whenever you can!
There was a great BlackHat webcast on this topic, Secure Because Math: Understanding Machine Learning-Based Security Products. Slides are here, video recording is here. If you're using ML to recommend some media, overfitting could be harmless. If you're relying on ML to protect from threats, overfitting could be downright dangerous. Check out the full presentation if you are interested in this space.
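To see what overfitting looks like in practice, here's a small sketch (assuming scikit-learn): an unconstrained decision tree memorizes its training data perfectly, yet does noticeably worse on held-out data. The train/test gap is the overfit.

```python
# Overfitting made visible: a decision tree with no depth limit fits the
# training set (near-)perfectly, but its test accuracy is lower. In a
# security setting, trusting that training-set accuracy would be dangerous.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)  # memorized the training set
test_acc = tree.score(X_test, y_test)     # the gap below is overfitting
print(f"train: {train_acc:.3f}  test: {test_acc:.3f}")
```

Always judge a model on data it never saw during training - cross-validation makes this routine.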
If you want to explore this space more deeply, there is a lot of reading material in the below links:
In early editions of this guide, there was no specific "Deep Learning" section. I omitted it intentionally. I think it is not effective for us to jump too far ahead. I also know that if you become an expert in traditional Machine Learning, you'll be capable of moving onto advanced subjects like Deep Learning, whether or not I've put that in this guide. We're just trying to get you started here!
Maybe this is a way to check your progress: ask yourself, does Deep Learning seem like magic? If so, take that as a sign that you aren't ready to work with it professionally. Let the fascination motivate you to learn more. I've read some argue you can learn Deep Learning in isolation; I've read others recommend mastering traditional Machine Learning first. Why not start with traditional Machine Learning, and develop your reasoning and intuition there? You'll have an easier time learning Deep Learning afterward, and after all of it, you'll be able to tackle all sorts of interesting problems.
In any case, when you decide you're ready to dive into Deep Learning, here are some helpful resources.
If you are working with data-intensive applications at all, I recommend this book:
Lastly, here are some other useful links regarding Big Data and ML.
Here are some other guides to Machine Learning. They can be alternatives or complements to this guide.
pattern_classification: a GitHub repository maintained by the author, containing IPython notebooks about various machine learning algorithms and data-science resources.
Dive into Deep Learning: an interactive introduction to the principles most neural networks share - layers of linear and nonlinear units, trained with gradient descent.