
Machine Learning Resources, Maintained by Harvard Ph.D. Student Sam

黄伟
2023-12-01

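This post is reposted from the machine learning materials compiled by Sam, a Ph.D. student at Harvard University. It covers data fundamentals, geometry, probability theory, statistical learning, deep learning, and more. The content is very rich, and this post is a complete copy kept as a backup. For the latest version, read the post Sam maintains: https://sgfin.github.io/learning-resources/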


ML Resources

This is a not-particularly-systematic attempt to curate a handful of my favorite resources for learning statistics and machine learning. This isn’t meant to be comprehensive, and in fact is still missing the vast majority of my favorite explainers. Rather, it’s just a smattering of resources I’ve found myself turning to multiple times and thus would like to have in one place. The organization is as follows: Open Courses and Textbooks; Tutorials, Overviews, and (Individual) Lecture Notes; and Cheatsheets.

Finally, I’ve added a section with links to a few miscellaneous websites that often produce great content.

Of the above, the second section is both the most incomplete and the one that I am most excited about. I hope to use it to capture the best explanations of tricky topics that I have read online, to make it easier to re-learn them later when I inevitably forget. (In a perfect world, Chris Olah and/or distill.pub would just write an article on everything, but in the meantime I have to gather scraps from everywhere else.)

If you stumble upon this list and have suggestions for me to add (especially for the middle section!), please feel free to reach out! But I’m only trying to post things on here that I’ve read, so it may be caught in my to-read list for a while before it makes it on here. Of course, the source for this webpage is on github, so you can also just take it.

Open Courses and Textbooks

I’m trying to limit this list to things that are legally accessible online, for free.

Foundation

File | Description
Math for ML Book | Math for machine learning book by Faisal and Ong, available on github.
Boyd Applied Linear Algebra | Freely available book from Boyd and Vandenberghe on Applied LA (website).
Fast.ai Computational Linear Algebra | Rachel Thomas has put together this great online textbook for computational linear algebra with accompanying youtube videos.
MIT 6.041 Intro Probability | John Tsitsiklis et al. have put together some great resources. Their classic MIT intro to probability has been archived on OCW and also offered on Edx (Part 1, Part 2). The textbook is also excellent.
Joe Blitzstein’s Stat110 | Joe Blitzstein’s undergrad probability course has a high overlap in content with 6.041. Like 6.041, it also has a great textbook, youtube videos, and an edx offering. It’s a bit more playful, as well.
MathematicalMonk | This guy is amazing. Some 250 youtube tutorials on ML, Probability, and Information Theory. What’s great about these playlists is any individual video could go into section 2!

Statistics

File | Description
Doug Sparks’ Stats 200 | Nice course notes from Doug Sparks’ 2014 offering of Stats 200.
Modern Statistics for Modern Biology | This online textbook is from Susan Holmes and Wolfgang Huber, and provides a nice and accessible intro to the parts of modern data science relevant to computational biologists. It also happens to be a piece of typographic art, created with bookdown.
Statistical Rethinking | Lecture videos on youtube accompany this very well-reviewed introductory textbook.
Hernan and Robins Causal Inference Book | Long-upcoming textbook on causal inference (from the epidemiology perspective), with drafts fairly frequently updated on the web page.

Classic Machine Learning

File | Description
CS 229 Lecture Notes | Classic note set from Andrew Ng’s amazing grad-level intro to ML: CS229.
ESL and ISL from Hastie et al | Beginner (ISL) and advanced (ESL) presentations of classic machine learning from world-class stats professors. Slides and video for a MOOC on ISL are available here.
CS 228 PGM Notes | Really great course notes on Probabilistic Graphical Models from Stanford. PDF export wasn’t ideal, so linking only to the website.
Blei Foundations of Graphical Models Course | Course notes on Foundations of Graphical Models from David Blei’s 2016 offering (2016 website).

Deep Learning

File | Description
Roger Grosse’s CSC321 Notes | Notes from Roger Grosse’s CSC 321 (full website here). Probably the single best intro to DL course I’ve found from any university. Notes and slides are gorgeous.
Fast.Ai | Wonderful set of intro lectures + notebooks from Jeremy Howard and Rachel Thomas. In addition, Hiromi Suenaga has released excellent and self-contained notes of the whole series with timestamp links back to videos: FastAI DL Part 1, FastAI DL Part 2, and FastAI ML.
CS231N DL for Vision | Amazing notes from Andrej Karpathy, with lectures on Youtube as well.
CS224 Deep Learning for NLP 2017 | Fantastic course notes on Deep Learning for NLP from Stanford’s CS224. Github repo here.
CMU CS 11-747 | Fantastic course on Deep Learning for NLP from CMU’s Graham Neubig. Really great lecture videos on Youtube here.
Deep Learning Book | This textbook by Ian Goodfellow, Yoshua Bengio, and Aaron Courville is probably the closest we have to a de facto standard textbook for DL.

Reinforcement Learning

File | Description
Sutton and Barto Open RL Book | De facto standard intro to RL, even though the textbook is only now about to be published!
Berkeley Deep Reinforcement Learning | RL class from Berkeley taught by top dogs in the field, lectures posted to Youtube.

Optimization

File | Description
Boyd Convex Optimization Book | Famous and freely available textbook from Boyd and Vandenberghe, accompanied by slides and Youtube videos. More advanced follow-up class here.
NYU Optimization-based Data Analysis 2016 and 2017 | Fantastic course notes on optimization-based data analysis from NYU (2016 website and 2017 website).

Tutorials, Overviews, and (Individual) Lecture Notes

This section is fledgling at best, but was my real motivation in making this page. Archetypes include basically anything on distill.pub, good blog or medium posts, etc. Depth-first learning looks like a great access point here, but I haven’t gotten to do more than skim any of those, yet.

Fundamentals

File | Description
CS 229 Linear Algebra Notes | Linear algebra reference from Stanford’s Machine Learning Course.
Matrix Calc for DL (pdf here) | Really nice overview of matrix calculus for deep learning from Parr/Howard. Citable on arxiv.

Probability and Statistics

File | Description
Hernan Selection Bias | Nice summary of selection bias via DAGs by Hernan et al.

Classic Machine Learning/Data Science NOS

File | Description
Roughgarden SVD Notes | Really great presentation of SVD from Tim Roughgarden’s CS168 at Stanford.
Roughgarden PCA Notes | Really great presentation of PCA from Tim Roughgarden’s CS168 at Stanford.

Bayesian Machine Learning

File | Description
Blei Exponential Families/Variational Inference | A couple of the course notes I particularly like from Blei’s 2011 Probabilistic Modeling Course.
Blei Variational Inference Review | Overview of Variational Inference from David Blei, available on arxiv.

Deep Learning

File | Description
Adversarial Examples/Robust ML Part 1, Part 2, and Part 3 | The Madry lab is one of the top research groups in robust deep learning research. They put together a fantastic intro to these topics on their blog. I hope they keep making posts…
Distill Attention | Amazingly clear presentation of the attention mechanism and its (early) variants.
Distill Building Interpretability | Coolest visualizations of NN internals I’ve ever seen.
Distill Feature Visualization | Running theme: If it’s on distill.pub, read it.
Chris Olah Understanding LSTMs | Chris Olah is a master of his craft, and here offers a fantastic overview of LSTMs and GRUs.

Natural Language Processing

File | Description
Chris Olah on Word Embeddings | Chris Olah explaining word embeddings and the like.
The Annotated Transformer | Harvard’s Sasha Rush created a line-by-line annotation of “Attention is All You Need” that also serves as a working notebook. Pedagogical brilliance, and it would be awesome to do this for a couple papers per year.
Goldberg’s Primer on NNs for NLP | Overview of Deep Learning for NLP from Yoav Goldberg, downloaded from here.
Neubig’s Tutorial on NNs for NLP | Overview of Deep Learning for NLP from Graham Neubig. Downloaded from arxiv and pairs nicely with his course and videos.

Reinforcement Learning

File | Description
Karpathy’s Pong From Pixels | Andrej Karpathy has a real gift for didactics. This is a self-contained explanation of deep reinforcement learning sufficient to understand a basic Atari agent.
Weng’s A (Long) Peek into RL | A nice blog post covering the foundations of reinforcement learning.
OpenAI’s Intro to RL | The introductory tutorial for OpenAI’s new “Spinning Up in Deep RL” website.

Information Theory

File | Description
Chris Olah Visual Information Theory | As always, Chris Olah creates an amazing presentation both in words and images. The goal is to visualize key information theory concepts.
Cover and Thomas Ch2 - Entropy and Information | The extremely well-written introductory chapter from the classic information theory textbook.
Cover and Thomas Ch11 - Info Theory and Statistics | The information theory and statistics chapter from the classic information theory textbook.
Deriving Probability Distributions from Maximum Entropy Principle | It feels slimy and self-serving to include this, but I wrote this post to better understand how information theory can be used to understand/derive common probability distributions from first principles.
Deriving the information entropy of the multivariate gaussian | Another blog post I wrote to try to understand information theory + statistics.

Optimization

File | Description
Ruder Gradient Descent Overview (PDF here) | Great overview of gradient descent algorithms.
Bottou Large-Scale Optimization | Notes on optimization from Bottou, Curtis, and Nocedal. Downloaded from arxiv.

Cheatsheets

Math

File | Description
Probability Cheatsheet | Probability cheat sheet, from William Chen’s github.
CS 229 TA Cheatsheet 2018 | TA cheatsheet from the 2018 offering of Stanford’s Machine Learning Course. Github repo here.
CS Theory Cheatsheet | CS theory cheat sheet, originally accessed here.

Programming

File | Description
R dplyr cheatsheet | Cheatsheet for Hadley’s amazing data wrangling package, dplyr. One of many from RStudio.
R ggplot2 cheatsheet | Cheatsheet for Hadley’s amazing plotting package, ggplot2. One of many from RStudio.
SQL Joins cheatsheet | Graphical description of classic SQL joins w/ toy code.
Python pandas cheatsheet | Cheatsheet for python’s data wrangling package, pandas. Downloaded from here.
Python numpy cheatsheet | Cheatsheet for python’s numerical package, numpy. Downloaded from Datacamp.
Python keras cheatsheet | Cheatsheet for python’s NN package, keras. Downloaded from Datacamp.
Python scikit-learn cheatsheet | Cheatsheet for python’s ML package, scikit-learn. Downloaded from Datacamp.
Python seaborn tutorial | Tutorial for python’s plotting system, seaborn. Haven’t found a great one yet for matplotlib.
Graphic Design cheatsheet | Cute little graphic design cheatsheet, downloaded from here.

Miscellaneous websites

File | Description
Chris Olah’s Blog | Essentially everything on here is gold. I am so grateful for the hours he must put into these posts.
distill.pub | Distill navigates a really interesting gap between super-blog and research journal. I wish that we had more publications like this.
Pytorch Tutorials | The tutorials put out by the pytorch developers are really fantastic. Easy to see why the community is growing so fast.
Sebastian Ruder’s blog | Sebastian has produced a lot of really great explanations, like the one on gradient descent methods I linked to above. He also maintains a website tracking progress on NLP benchmarks.
Berkeley AI Research (BAIR) Blog | BAIR produces a lot of great research, and uses this blog to release more accessible presentations of their papers.
Off the Convex Path | Nice blog on machine learning and optimization.
Ferenc Huszár’s blog | Pretty popular blog that has a lot of explorations/musings on ML from an author with a rigorous mathematical perspective.
Thibaut Lienart’s Blog | This website has some notes on math and optimization that seem interesting.