A reading list for the larval stage sysadmin. This list is focused on the UNIX family of OSes, mainly because that is my area of expertise, but PRs about other OSes are welcome.
So you've got your first sysadmin/SRE job or internship. Congratulations, it's going to be an interesting ride.
Articles and Books to Read
A Few Ops Lessons We All Learn the Hard Way - A collection of lessons that everyone in Ops inevitably learns. You may not personally experience all of them, but they'll ring true once you've been in ops for a while.
Clean Code - Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Robert C. Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code "on the fly" into a book that will instill within you the values of a software craftsman and make you a better programmer, but only if you work at it.
Continuous Delivery - A book that has rapidly become the guide to planning and implementing build pipelines.
DevOps Roadmap - Community-driven articles, resources, guides, interview questions and quizzes for DevOps. Learn to become a modern DevOps engineer by following the steps, skills, resources and guides listed in this roadmap.
Effective DevOps - A practical guide for creating affinity among teams and promoting efficient tool usage in your company.
Git Magic (free ebook) - git is a version control Swiss army knife: a reliable, versatile, multipurpose revision control tool whose extraordinary flexibility makes it tricky to learn, let alone master.
Hello DNS - Every sysadmin/SRE needs to know how DNS works. Start with DNS Basics; it's a good introduction.
Lean Startup or Lean Enterprise - This pair describes the process surrounding implementation and use of Lean principles in Startup and Enterprise organizations. There are a number of companion pieces that extend the principles to specific fields of study and implementation, such as Lean Analytics.
LinkedIn's School of SRE - LinkedIn uses this curriculum to onboard non-traditional hires and new college grads into the SRE role.
Systems Performance: Enterprise and Cloud by Brendan Gregg - This award-winning book is a favorite of many sysadmins and SREs; it addresses systems performance at scale.
The Art of Capacity Planning - John Allspaw's book is a hands-on, practical guide to planning for growth, with many techniques and considerations to help you plan, deploy, and manage web application infrastructure.
The Art of Monitoring - James Turnbull's book on the art of modern application and infrastructure monitoring and metrics.
The Practice of Cloud System Administration, by Tom Limoncelli. Focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. It includes case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants, explained in practical ways that are useful to all enterprises.
Time Management for System Administrators, by Tom Limoncelli. You're going to be pulled in a dozen different directions; if you can't manage your time, you and your job performance are going to suffer.
Web Operations: Keeping the Data on Time - A collection of essays and interviews with web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll, teaching strategies for designing your web site to scale up smoothly to web-scale load.
Wizard Zines - Julia Evans has a great set of zines she's published about many topics useful to a starting (or even an experienced) sysadmin/SRE.
Languages
The Dev part of DevOps means you're going to inevitably end up writing some code. Here's a list of free programming books for many languages.
Here are some of the scripting languages you're most likely to see in your infrastructure, with links to some good references and tutorials.
Awk
The awk family (awk, gawk, nawk and I'm sure I've missed other implementations) of scripting languages is one of the oldest: the first version of awk was written in 1977. It's on pretty much any UNIX (even minimal variants that might not have Perl, Python or Ruby) and is still very useful.
I still use it frequently for pulling columns out of tabular output, because by default (and unlike cut, where you have to count spaces) it treats each run of consecutive whitespace characters as a single delimiter, so you can simply pipe things to awk '{print $3}'. But it's also Turing-complete; people can write, and have written, complex programs in it.
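To see why awk's default splitting is handy, here is a small Python sketch contrasting it with splitting on every single space, which is roughly what cut -d' ' does. Python's str.split() with no argument behaves like awk's default field separator for a single line: runs of whitespace count as one separator and leading/trailing whitespace is ignored. The sample line is made up for illustration.

```python
def awk_field(line: str, n: int) -> str:
    """Return field n (1-based, n >= 1) like awk '{print $n}' with the default FS."""
    fields = line.split()  # runs of whitespace act as a single separator
    # awk prints an empty string for a field past the end of the line
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def cut_field(line: str, n: int) -> str:
    """Return field n roughly like cut -d' ' -fN: every single space is a delimiter."""
    fields = line.split(" ")
    return fields[n - 1] if 1 <= n <= len(fields) else ""


# Made-up ps-style output whose columns are padded with several spaces.
line = "root      1234  0.0  sshd"
print(awk_field(line, 2))        # 1234
print(repr(cut_field(line, 2)))  # '' because the gap after "root" is several spaces
```

With cut you would have to count the padding spaces to reach the PID; awk (and str.split()) just skip them.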
Bash
bash is objectively a terrible programming language. All variables default to being globals, there is no module system built into the language, dealing with hashes is horrible, and there are other horrors resulting from it trying to be backward compatible with sh.
That said, it is on almost every system, so every *NIX sysadmin needs to know bash.
Here are some useful resources to help you step up your shell scripting game:
The Art of the Command Line - A good set of notes and tips on using the command line, useful when working on Linux/Unix.
shellcheck is a linter for bash. It'll help you find unused variables, deprecated syntax and other things that make your bash scripts less stable. You can install it with apt-get, brew, cabal, or yum.
shellharden - A syntax highlighter and a tool to semi-automate the rewriting of scripts to ShellCheck conformance, mainly focused on quoting.
zshelldoc - Documentation generator for Bash & ZSH, with call-trees, comment extraction, etc.
Finally, remember that bash is not sh. If you're writing a script in bash, and testing it with bash, don't put #!/bin/sh as the shebang. Firstly, because bash behaves differently when called as sh, and secondly, not all *NIX systems (and not even all linux distributions) use bash as their /bin/sh any more.
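The shebang mistake described above is easy to catch automatically. Below is a crude Python sketch that flags scripts starting with a literal #!/bin/sh whose body uses a few bash-only constructs. The pattern list is a hypothetical, deliberately incomplete sample chosen for illustration; shellcheck, which checks a script against the shell named in its shebang, does this job properly and should be preferred.

```python
import re

# Hypothetical, deliberately small sample of bash-only constructs.
BASHISMS = {
    r"\[\[": "[[ ... ]] test",
    r"^\s*function\s+\w+": "'function' keyword",
    r"\w+=\(": "array assignment",
    r"\$\{\w+\[": "array expansion",
}


def sh_script_bashisms(text: str) -> list[str]:
    """Return descriptions of bash-only constructs found in a #!/bin/sh script."""
    lines = text.splitlines()
    # Only scripts that declare a literal /bin/sh shebang are checked.
    if not lines or not re.match(r"#!\s*/bin/sh\b", lines[0]):
        return []
    found = []
    for pattern, description in BASHISMS.items():
        if any(re.search(pattern, line) for line in lines[1:]):
            found.append(description)
    return found


script = "#!/bin/sh\nif [[ -f /etc/passwd ]]; then\n  echo ok\nfi\n"
print(sh_script_bashisms(script))  # ['[[ ... ]] test']
```

The fix for anything it reports is either to rewrite the construct in POSIX sh or to change the shebang to bash.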
PowerShell
Often you'll find yourself in a Windows environment, like it or not. These resources might help you in those cases:
PS Cmdlets In Your Inbox lets you schedule a task to get PowerShell Cmdlets via email daily or at the command line.
Python
Python has much better support for string manipulation and system infrastructure than Bash. In addition, there is a rich library of modules supporting various tasks you can use in your scripts that are just a pip3 install away.
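As a small taste of that standard library, here is a Python sketch that tallies the values in one column of some log lines, roughly what awk '{print $9}' | sort | uniq -c | sort -rn does in a shell pipeline. The assumed input is the common web access-log format, where the HTTP status code is the 9th whitespace-separated field; the sample lines are made up.

```python
from collections import Counter


def tally_column(lines, column: int) -> list[tuple[str, int]]:
    """Count values in a 1-based whitespace-separated column, most common first."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= column:  # skip short or malformed lines
            counts[fields[column - 1]] += 1
    return counts.most_common()


# Made-up access-log lines; the status code is the 9th field.
log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /x HTTP/1.0" 404 512',
    '10.0.0.3 - - [10/Oct/2000:13:55:38 -0700] "GET / HTTP/1.0" 200 2326',
]
print(tally_column(log, 9))  # [('200', 2), ('404', 1)]
```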
Perl
Perl has a long history of being the system administrator's friend, bringing the best of bash, sed and awk together. It is also suitable for building tools for the system administrator to utilise in their work.
cpanm - An alternative and very friendly tool for installing modules from CPAN. This pairs well with perlbrew.
perlbrew - A tool for managing one or more Perl installations, without needing to modify the system-level Perl.
pinto - A tool for managing a private CPAN repository.
Web
Mojolicious - A rich framework for doing all things web, from building web services and sites to building HTTP client applications.
Plack/PSGI - PSGI is the Perl counterpart of Python's WSGI, and Plack is a toolkit for it, with many Plack servers available for use.
Modules
Task::Kensho - A list of recommended modules for many purposes, including reading configuration files, connecting to databases, logging, sending email, web crawling and development, and handling XML.
Terraform is a tool that allows you to configure your infrastructure as code, just like Chef/Puppet/etc allow you to manage the configuration of individual machines as code, with all the benefits of being able to diff, code review, etc. Terraform works with (as of this edit) AWS, Google Cloud, Microsoft Azure, vSphere and many other systems.
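To make the "diff" idea concrete, here is a toy Python sketch of what a plan step does conceptually: compare the resources your code declares with the resources that currently exist, and list what would be created, changed or destroyed. This is not how Terraform is implemented; the resource names and attributes are made up, and real tools also track state and dependencies.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired vs. actual resources (name -> attributes) and list the actions."""
    return {
        "create": sorted(name for name in desired if name not in actual),
        "update": sorted(
            name for name in desired
            if name in actual and desired[name] != actual[name]
        ),
        "destroy": sorted(name for name in actual if name not in desired),
    }


# Hypothetical resources: what the code declares vs. what exists right now.
desired = {
    "web-1": {"size": "small"},
    "web-2": {"size": "large"},
    "db-1": {"size": "large"},
}
actual = {
    "web-1": {"size": "small"},
    "web-2": {"size": "small"},
    "old-cache": {"size": "small"},
}
print(plan(desired, actual))
# {'create': ['db-1'], 'update': ['web-2'], 'destroy': ['old-cache']}
```

Because the desired state lives in text files, this comparison is something a reviewer can read before anything is touched.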
AWS
AWSCli provides a unified command line interface to Amazon Web Services. Wean yourself off the web UI if you want to be truly productive.
og-aws is an excellent resource on AWS, written by and for engineers who use AWS extensively.
S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage, Backblaze B2 or DreamHost DreamObjects.
Azure
Az PowerShell Module - The cross-platform (i.e. PS Core) Azure PowerShell module. It replaces the AzureRm module and provides a migration path from it.
Azure CLI 2.0 - The cross-platform command line interface for Azure (written in Python).
OpenStack
Planet OpenStack - An aggregated feed from across the Internet of OpenStack-related content, including contributions from individuals.
Public Clouds - A list of providers who offer cloud services similar to AWS, GCP or Azure, running on OpenStack.
SuperUser - An online publication aggregating and editorialising content related to OpenStack and Open Infrastructure.
Stackalytics - Code contribution statistics for OpenStack and related projects.
Configuration Management
Quite simply, if you aren't using configuration management, you're doing it wrong.
You don't want to manually configure any servers. No matter how hard you try, they won't end up truly identical, and having meat typing in commands takes far too long per server, doesn't scale, and the manual labor will discourage you from standing up new VMs for testing.
Treating your configuration as something described in text files allows you to treat it like code. You can do pull requests, get your changes reviewed by your team, view the differences between your configuration at different times, and, perhaps most importantly, find out who changed the configuration, when, and (if they wrote good commit messages) why.
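Because the configuration is just text, ordinary text-comparison tools apply to it. This Python sketch uses the standard library's difflib to produce a unified diff between two versions of a made-up sshd_config fragment; in practice your version control system (git diff and friends) does this for you.

```python
import difflib


def config_diff(old: str, new: str, name: str = "sshd_config") -> str:
    """Return a unified diff between two versions of a config file's text."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


# Made-up before/after versions of a config fragment.
old = "Port 22\nPermitRootLogin yes\nPasswordAuthentication yes\n"
new = "Port 22\nPermitRootLogin no\nPasswordAuthentication yes\n"
print(config_diff(old, new))
```

The output shows exactly one line removed and one added, which is the kind of change a teammate can review in seconds.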
There are several good options:
Ansible is designed to be minimal in nature, consistent, secure, and highly reliable. Owned & supported by Red Hat.
CFEngine has been in continuous development since 1993. Unlike some of its peers on this list, it is written in C and is built with speed and scalability in mind. It should be considered for very, very large systems and for very small (think embedded) systems.
Chef is written in Ruby and Erlang and uses a Ruby DSL to describe system configuration.
Puppet makes it easy to automate the provisioning, configuration and ongoing management of your machines and the software running on them. Make rapid, repeatable changes and automatically enforce the consistency of systems and devices across physical and virtual machines, on-premises or in the cloud.
Salt orchestrates the build and ongoing management of your infrastructure.
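The tools above differ a lot, but they share one core idea: you declare the state you want, and applying that declaration repeatedly is safe because each run only changes what isn't already correct (idempotence). Here is a minimal Python sketch of that idea for a single line in a file. ensure_line is a hypothetical helper, not any tool's API, though it resembles what modules such as Ansible's lineinfile do.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already correct,
    so a second run with the same arguments changes nothing.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Demo against a throwaway file (the file name and setting are made up).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.conf"
    print(ensure_line(target, "max_connections = 100"))  # True: line added
    print(ensure_line(target, "max_connections = 100"))  # False: already present
```

Reporting "changed" versus "unchanged" is also what lets these tools tell you, on each run, whether a server had drifted.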
Docker
Docker is a tool for running and managing containers. Containers are rapidly growing in popularity for local development (as an alternative to virtual machines), and can also run software in production with tools like Kubernetes or Amazon ECS.
Installing Docker
Follow the installation instructions for your preferred platform:
The Docker Book - An excellent resource for getting started with Docker. This book is quick & easy to read.
Kubernetes
Kubernetes is a portable open-source container orchestration system used to automate deployment, scaling, and management of containerized applications.
Tutorials
There are many good tutorials at kubernetes.io. I recommend starting with either the minikube walkthrough, since it will get you a running test cluster quickly, or enabling the Kubernetes cluster option in Docker Desktop.
If you want to understand everything that is involved in getting a Kubernetes cluster up and running, Kubernetes the Hard Way by Kelsey Hightower is hard to beat.
Have you ever wondered exactly what happens when you type something like kubectl run nginx --image=nginx --replicas=3? What happens when K8s... is a guide that leads you through the full lifecycle of a request from the client to the kubelet, linking off to the source code where necessary to illustrate what's going on.
Utilities
krew - Makes it easy to use kubectl plugins. krew helps you discover plugins and install and manage them on your machine. It is similar to tools like apt, dnf or brew. At the time of writing, over 70 kubectl plugins are available on krew.
kubectx - Provides the kubectx command, which makes it easy to switch between the contexts (clusters) defined in your .kube/config, and kubens, which helps you switch between Kubernetes namespaces smoothly.
Monitoring
There are several good projects for monitoring.
Grafana - Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture.
osquery - For Windows, Linux, macOS, and FreeBSD. Use SQL queries to look into items such as installed programs, running processes, and other events for inventory and monitoring.
Prometheus - Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company.
Articles/Tutorials
Impactful Dashboards - It's easy to make monitoring dashboards that are a jumble of poorly presented information; this article gives guidelines on making good dashboards.
Regular Expressions
Inevitably you're going to find yourself in a situation where you have to look at logs to see what's going wrong with a service. When it's a multi-gigabyte logfile, that can be extremely painful.
Enter regexes and the grep family of tools.
When you have a multi-gigabyte logfile, it's a lot less painful to look at just the entries generated by the service that you got alerted about. Even better is to look only at the error messages from that service, and something as basic as grep -i yourservice < log | grep -i errorcode can turn a potentially multi-hour ordeal into a task of a minute or two.
debuggex.com will visualize regular expressions graphically.
Regex for Noobs - An illustrated guide to regex that aims to provide a gentle introduction for people who have never fiddled with regex, want to, but are kind of intimidated by the whole thing.
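The same filter can be written in Python when you need it inside a larger script. This sketch mirrors grep -i yourservice < log | grep -i errorcode: it keeps only lines matching every pattern, case-insensitively. Because it iterates lazily, you can pass it an open file object and stream a large log without loading it into memory. The service name, error string and sample lines are placeholders.

```python
import re
from typing import Iterable, Iterator


def grep_all(lines: Iterable[str], *patterns: str) -> Iterator[str]:
    """Yield lines matching every regex in `patterns`, ignoring case (like chained grep -i)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    for line in lines:
        if all(rx.search(line) for rx in compiled):
            yield line


# Made-up log lines; with a real file, pass an open file object instead of a list.
log = [
    "Jan 01 10:00:00 host yourservice[42]: started\n",
    "Jan 01 10:00:05 host yourservice[42]: ERRORCODE 17 while connecting\n",
    "Jan 01 10:00:06 host otherservice[7]: errorcode 3\n",
]
for match in grep_all(log, "yourservice", "errorcode"):
    print(match, end="")
```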
sed and awk Pocket Reference presents a concise summary of regular expressions and pattern matching, and summaries of sed and awk and how to use them to edit files and convert data from one format to another.
Serverless
Serverless doesn't mean no sysadmins, even though there aren't instances to administer. We need to change the common processes we rely on to monitor and manage services that run on serverless platforms, because there are no system-level metrics to tell us how our application is working.
Here are a few resources to help:
Building observability into a serverless application (Video) - Yan Cui presents some guidelines for implementing observability in serverless applications on AWS. The patterns in this talk can be applied to other platforms as well.
Source control
No matter what source control system you use (git, hg, perforce, whatever), you're going to have to write commit messages. Make them good. It may be obvious today why you made the change, but in six months or a year you won't have that context.
Explain why you made the change, not just what you changed. And no, the diff is not an explanation. Always start your commit messages with a single line that explains what you were trying to do in general, then go into more detail in the body. Talk about what you intend the change to do and why, more than how you did it. If there's an issue or ticket number, include that in your commit message too; it'll give more context to your coworkers (or to you in a year).
Good commit messages help the rest of your team understand what you're trying to do, and make it easier for them to find logic errors in your pull requests. The code may be syntactically correct, but if reviewers understand what you're trying to do, they can spot when it isn't actually doing what you say you want it to do.
Here are a few articles that, while focused on git, apply to any source control system:
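If you want a safety net for the summary-line-then-body structure described above, a git commit-msg hook can check it: git runs the hook with the path of the proposed message file as its only argument and aborts the commit if the hook exits non-zero. Below is a Python sketch of such a check. The specific rules (a summary of at most 50 characters, a blank line, then a non-empty body) follow a widely used git convention rather than anything this list mandates, so treat them as assumptions and adjust them to your team.

```python
import sys


def check_commit_message(message: str, max_summary: int = 50) -> list[str]:
    """Return a list of problems with a commit message (empty if it looks fine)."""
    # Ignore '#' comment lines, which git typically strips from the final message.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    if not lines or not lines[0].strip():
        return ["missing summary line"]
    problems = []
    summary = lines[0]
    if len(summary) > max_summary:
        problems.append(f"summary is {len(summary)} chars (limit {max_summary})")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between summary and body")
    if len(lines) < 3 or not any(ln.strip() for ln in lines[2:]):
        problems.append("no body explaining why the change was made")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Installed as .git/hooks/commit-msg, git passes the message file path as argv[1].
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = check_commit_message(f.read())
    for issue in issues:
        print(f"commit-msg: {issue}", file=sys.stderr)
    sys.exit(1 if issues else 0)
```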
Testing
Testing is incredibly important, and you should test your infrastructure as well as your applications.
Test Harnesses
Test Kitchen (https://kitchen.ci) - Test your configuration management tooling. Test Kitchen was originally written to test Chef cookbooks, but can be used for other configuration management systems as well.
Text Editors
Don't get involved in the Editor Wars. Just. Don't. Your choice of tool does not need defending. Nor does anyone else's choice.
However, you should care about your tools. You should be able to use them efficiently.
Vim
vim is a reality of life for SysAdmins. It is the one editor you can be sure is installed in even the most minimal *NIX or linux install. You must be able to do at least basic edits with it. You don't need to love it, but you will have to use it.
Emacs
One of the biggest problems with emacs is that its defaults present a fairly different experience from what people are used to. Your first stop should be learning the basics using the built-in tutorial, followed by the mini-manual from tuhdo:
There are several excellent starter kits out there, with varying degrees of whiz-bang. Roughly sorted by whiz-bang, here are the starter kits that exist, with spacemacs being the most popular:
Use tools with which you are productive. If you want to use a GUI Text Editor or IDE, don't let anyone give you a hard time about that.
There are GUI versions of vim and emacs that have ardent followers.
Atom was an editor with significant traction and a large plugin ecosystem; GitHub archived the project in December 2022.
Sublime Text is another editor with an extensive plugin ecosystem and arguably one of the inspirations for Atom.
Visual Studio Code is a cross-platform editor that is gaining traction in the marketplace.
Blogs and Podcasts
Arrested DevOps is hosted by Matt Stratton, Trevor Hess, and Bridget Kromhout. ADO is the podcast that helps you achieve understanding, develop good practices, and operate your team and organization for maximum DevOps awesomeness.
Code as Craft is Etsy's ops blog and is full of well written examples of dealing with real-world problems at scale.
Corecursive - Each episode someone shares the fascinating story behind a piece of software being built.
DevOps'ish - A weekly newsletter assembled by open source contributor, DevOps leader, and Cloud Native Computing Foundation (CNCF) Ambassador Chris Short.
Julia Evans' Blog - Julia writes a great blog where she dives into interesting ops topics and explains them clearly.
Kitchen Soap - John Allspaw, formerly CTO at Etsy, writes a great blog about web operations, operating at scale, and other things that are interesting to ops types.
Last Week in AWS - Corey Quinn's weekly newsletter about the latest goings-on in the world of AWS.
Last Week in Kubernetes Development - Weekly newsletter summarizing code activity in the Kubernetes project: merges, PRs, deprecations, version updates, release schedules, and the weekly community meeting.
Monitoring Weekly - Weekly compilation of curated articles, news and tools related to monitoring.
On the Metal - Bryan Cantrill and Jessie Frazelle host a podcast about all sorts of interesting aspects of computing.
SRE Weekly - SRE Weekly is a newsletter devoted to everything related to keeping a site or service available as consistently as possible.
Online Communities
DevOpsChat Slack is another community of DevOps minded folk with a diverse set of topic specific chat rooms. Home to Arrested DevOps.
Hangops Slack is a community of DevOps minded folk with many subject focused chat rooms.
PowerShell Slack is a community of PowerShell enthusiasts and Windows centric DevOps topics.
Windows Administration
Help wanted here.
Other Resources
Packetlife has some great cheat sheets and posters for a lot of applications (wireshark and tcpdump, for example) and networking principles. Well worth a look, even if you think you know the apps in question.
Free Services
Free-for-Dev is a list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev.
Miscellanea
awesome-scalability - An organized reading list for illustrating the patterns behind scalable, reliable, and performant large-scale systems.
Alice Goldfuss wrote an excellent article, How to Get Into SRE, about her path to becoming an SRE.
Alice also gave a great presentation, Passing the Console: Fostering the Next Generation of Ops Professionals, at LISA16.
Julia Evans has a couple of great resources on making your 1-on-1s with your manager more effective. 1-on-1s should not just be a status report on what you're working on; you should be using them to focus on bigger-picture goals (both yours and the organization's) and your career. Read her article on 1-on-1 ideas, and I recommend buying her Help, I have a Manager! zine.
Communication
Writing good documentation and design docs is as important as writing code. The more senior you are, the more writing you're going to have to do - communication skills are a must.
Email - Like it or not, you're going to write a lot of email in the course of your work. Lazarus Lazaridis wrote a good article on Composing Better Emails.
Patrick McKenzie wrote a great blog post on salary negotiation. Salary negotiation is one of the few times in your life where a five minute conversation can earn you (or cost you!) thousands of dollars - be prepared.
The Holloway Guide to Equity Compensation - Stock options, RSUs, job offers, and taxes: a detailed reference, including hundreds of resources, explained from the ground up.
What I Wish I'd Known About Equity Before Joining A Unicorn - This is an excellent (though USA-centric) summary of how to value stock options and what the tax implications are and how to minimize potential tax. I heartily recommend reading it before you accept any offers involving stock or stock options as part of your compensation.