Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Set up a CI pipeline | CI | Exercise | ||
Deploy to Kubernetes | Deployment | Exercise | Solution |
You can answer it by describing what DevOps means to you and/or rely on how companies define it. I've put here a couple of examples.
Amazon:
"DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market."
Microsoft:
"DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The contraction of “Dev” and “Ops” refers to replacing siloed Development and Operations to create multidisciplinary teams that now work together with shared and efficient practices and tools. Essential DevOps practices include agile planning, continuous integration, continuous delivery, and monitoring of applications."
Red Hat:
"DevOps describes approaches to speeding up the processes by which an idea (like a new software feature, a request for enhancement, or a bug fix) goes from development to deployment in a production environment where it can provide value to the user. These approaches require that development teams and operations teams communicate frequently and approach their work with empathy for their teammates. Scalability and flexible provisioning are also necessary. With DevOps, those that need power the most, get it—through self service and automation. Developers, usually coding in a standard development environment, work closely with IT operations to speed software builds, tests, and releases—without sacrificing reliability."
Google:
"...The organizational and cultural movement that aims to increase software delivery velocity, improve service reliability, and build shared ownership among software stakeholders"
A couple of examples:
The answer can focus on:
Things to think about:
A few ideas to think about:
This is a more practical version of the previous question where you might be asked additional specific questions on the technology you chose
Things to think about:
A development practice where developers integrate code into a shared repository frequently. It can range from a couple of changes every day or a week to a couple of changes in one hour in larger scales.
Each piece of code (change/patch) is verified, to make sure the change is safe to merge. Today, it's a common practice to test the change using an automated build that makes sure the code can be integrated. It can be one build which runs several tests at different levels (unit, functional, etc.) or several separate builds, all or some of which have to pass in order for the change to be merged into the repository.
A development strategy used by developers to release software automatically into production where any code commit must pass through an automated testing phase. Only when this is successful is the release considered production worthy. This eliminates any human interaction and should be implemented only after production-ready pipelines have been set with real-time monitoring and reporting of deployed assets. If any issues are detected in production it should be easy to rollback to previous working state.
For more info please read here
There are many answers to such a question, as CI processes vary depending on the technologies used and the type of project the change was submitted to. Such processes can include one or more of the following stages:
An example of one possible answer:
A developer submitted a pull request to a project. The PR (pull request) triggered two jobs (or one combined job): one job for running lint tests on the change, and a second job for building a package which includes the submitted change and running multiple API/scenario tests using that package. Once all tests pass and the change is approved by a maintainer/core, it's merged/pushed to the repository. If some of the tests fail, the change will not be allowed to be merged/pushed to the repository.
A completely different answer or CI process can describe how a developer pushes code to a repository, a workflow is then triggered to build a container image and push it to a registry, and once the image is in the registry, the new changes are applied to the k8s cluster.
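As an illustration only, here is a minimal sketch of such a CI job written as a shell script; the script names (run_lint.sh, build_package.sh, run_api_tests.sh) are hypothetical placeholders for whatever tooling the project actually uses:

```bash
#!/bin/bash
# Minimal CI job sketch (hypothetical helper scripts - adapt to your project)
set -e                  # stop at the first failing step

./run_lint.sh           # lint the submitted change
./build_package.sh      # build a package that includes the change
./run_api_tests.sh      # run API/scenario tests against that package

echo "All checks passed - the change can be merged"
```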
A development strategy used to frequently deliver code to QA and Ops for testing. This entails having a staging area that has production-like features where changes can only be accepted for production after a manual review. Because of this human involvement there is usually a time lag between release and review, making it slower and more error prone compared to continuous deployment.
For more info please read here
There are multiple approaches as to where to store the CI/CD pipeline definitions:
Both have advantages and disadvantages. With the "configuration->deployment" model for example, where you build one image to be used by multiple deployments, there is less chance of deployments being different from one another, so it has a clear advantage of a consistent environment.
In the mutable infrastructure paradigm, changes are applied on top of the existing infrastructure and over time the infrastructure builds up a history of changes. Ansible, Puppet and Chef are examples of tools which follow the mutable infrastructure paradigm.
In the immutable infrastructure paradigm, every change is actually a new infrastructure. So a change to a server will result in a new server instead of updating it. Terraform is an example of a technology which follows the immutable infrastructure paradigm.
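A minimal sketch of what this difference can look like in practice, assuming Ansible for the mutable path and Packer + Terraform for the immutable path; the playbook and template file names are hypothetical:

```bash
# Mutable approach: change the existing servers in place (hypothetical playbook name)
ansible-playbook -i inventory.ini upgrade_nginx.yml

# Immutable approach: bake a new image and replace the old servers with new ones
packer build web-server.pkr.hcl    # hypothetical Packer template
terraform apply                    # roll out new instances from the new image
```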
Read this fantastic article on the topic.
From the article: "Thus, software distribution is about the mechanism and the community that takes the burden and decisions to build an assemblage of coherent software that can be shipped."
Different distributions can focus on different things like: focus on different environments (server vs. mobile vs. desktop), support specific hardware, specialize in different domains (security, multimedia, ...), etc. Basically, different aspects of the software and what it supports, get different priority in each distribution.
Wikipedia: "A software repository, or “repo” for short, is a storage location for software packages. Often a table of contents is stored, as well as metadata."
Read more here
Caching is fast access to frequently used resources which are computationally expensive or IO intensive and do not change often. There can be several layers of cache that can start from CPU caches to distributed cache systems. Common ones are in memory caching and distributed caching.
Caches are typically data structures that contain some data, such as a hashtable or dictionary. However, any data structure can provide caching capabilities, like a set, sorted set, sorted dictionary, etc. While caching is used in many applications, it can create subtle bugs if not implemented or used correctly. For example, cache invalidation, expiration or updating is usually quite challenging.
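A minimal sketch of the idea, using a bash (4+) associative array as a tiny in-memory cache; the "expensive" computation is simulated with sleep:

```bash
#!/bin/bash
# Tiny in-memory cache sketch: cache results of an expensive lookup per key
declare -A cache

lookup() {
    local key="$1"
    if [[ -n "${cache[$key]}" ]]; then             # cache hit: return stored value
        echo "${cache[$key]}"
    else                                           # cache miss: compute and store
        local value
        value=$(sleep 1; echo "result-for-$key")   # stands in for an expensive call
        cache[$key]="$value"
        echo "$value"
    fi
}

lookup foo   # slow (miss)
lookup foo   # fast (hit)
```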
Stateless applications don't store any data on the host, which makes them ideal for horizontal scaling and microservices. Stateful applications depend on storage to save state and data; databases are a typical example of stateful applications.
Reliability, when used in DevOps context, is the ability of a system to recover from infrastructure failure or disruption. Part of it is also being able to scale based on your organization or team demands.
Styling, unit, functional, API, integration, smoke, scenario, ...
You should be able to explain those that you mention.
There are multiple ways to answer this question (there is no right and wrong here):
Wikipedia: "Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions"
Read about Chaos Engineering here
IAC (infrastructure as code) is a declarative approach to defining the infrastructure or architecture of a system. Some implementations are ARM templates for Azure and Terraform, which can work across multiple cloud providers.
Build artifacts are usually stored in a repository. They can be used in release pipelines for deployment purposes. Usually there is retention period on the build artifacts.
There are several deployment strategies:
* Rolling
* Blue green deployment
* Canary releases
* Recreate strategy
Configuration drift happens when, in an environment of servers with the exact same configuration and software, a certain server or servers have updates or configuration applied to them which other servers don't get, and over time these servers become slightly different from all the others.
This situation might lead to bugs which are hard to identify and reproduce.
Declarative - You write code that specifies the desired end state
Procedural - You describe the steps to get to the desired end state
Declarative Tools - Terraform, Puppet, CloudFormation
Procedural Tools - Ansible, Chef
To better emphasize the difference, consider creating two virtual instances/servers. In the declarative style, you would specify two servers and the tool will figure out how to reach that state. In the procedural style, you need to specify the steps to reach the end state of two instances/servers - for example, create a loop and in each iteration of the loop create one instance (running the loop twice, of course).
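A minimal sketch of the procedural style using the AWS CLI (the AMI ID and instance type are placeholders); a declarative tool such as Terraform would instead be told the desired end state of two instances and work out the steps itself:

```bash
# Procedural style: spell out the steps yourself (placeholder AMI ID / instance type)
for i in 1 2; do
    aws ec2 run-instances \
        --image-id ami-12345678 \
        --instance-type t2.micro \
        --count 1
done
# A declarative tool would instead state "I want 2 instances of this kind"
# and figure out what to create, change or destroy to reach that state.
```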
Note: cross-dependency is when you have two or more changes to separate projects and you would like to test them in mutual build instead of testing each change separately.
GitLab: "GitOps is an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD tooling, and applies them to infrastructure automation".
Read more here
Google: "One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel."
Read more about it here
Google: "the SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services"
Read more about it here
Atlassian: "An error budget is the maximum amount of time that a technical system can fail without contractual consequences."
Read more about it here
Wrong. No system can guarantee 100% availability, as no system is immune to downtime. Many systems and services will fall somewhere between 99% and 100% uptime (or at least this is how most systems and services should be).
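As a rough illustration, availability targets ("nines") can be translated into the amount of downtime they allow per year:

```bash
# Rough downtime per year for a given availability target, computed with bc
echo "scale=2; (1 - 0.999) * 365 * 24" | bc    # 99.9%  -> ~8.76 hours/year
echo "scale=2; (1 - 0.99)  * 365 * 24" | bc    # 99%    -> ~87.6 hours/year
```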
* MTTF (mean time to failure), also known as uptime, can be defined as how long the system runs before it fails.
* MTTR (mean time to recover), on the other hand, is the amount of time it takes to repair a broken system.
* MTBF (mean time between failures) is the amount of time between failures of the system.
Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability"
Read more about it here
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Jobs 101 | Jobs | Exercise | ||
Remove Jobs | Scripts - Jobs | Exercise | Solution | |
Remove Builds | Scripts - Builds | Exercise | Solution |
Jenkins is an open source automation tool written in Java with plugins built for Continuous Integration purpose. Jenkins is used to build and test your software projects continuously making it easier for developers to integrate changes to the project, and making it easier for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.
Jenkins integrates development life-cycle processes of all kinds, including build, document, test, package, stage, deploy, static analysis and much more.
This might be considered to be an opinionated answer:
You can report via:
Each has its own disadvantages and advantages. Emails for example, if sent too often, can be eventually disregarded or ignored.
The pipelines will have multiple stages:
Jenkins documentation provides some basic intro for securing your Jenkins server.
You can describe the UI way to add new nodes but better to explain how to do in a way that scales like a script or using dynamic source for nodes like one of the existing clouds.
Cloud computing refers to the delivery of on-demand computing services over the internet on a pay-as-you-go basis.
In simple words, cloud computing is a service that lets you use any computing service such as a server, storage, networking, databases, and intelligence, right through your browser without owning anything. You can do almost anything you can think of, as long as it doesn't require you to be physically close to your hardware.
Cloud service providers are companies that establish public clouds, manage private clouds, or offer on-demand cloud computing components (also known as cloud computing services) like Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service(SaaS). Cloud services can reduce business process costs when compared to on-premise IT.
IAAS - Infrastructure as a Service
PAAS - Platform as a Service
SAAS - Software as a Service
With cloud providers, someone else owns and manages the hardware, hires the relevant infrastructure teams and pays for real estate (for both hardware and people). You can focus on your business.
In an on-premise solution, it's quite the opposite. You need to take care of hardware and infrastructure teams and pay for everything, which can be quite expensive. On the other hand, it's tailored to your needs.
The main idea behind serverless computing is that you don't need to manage the creation and configuration of servers. All you need to focus on is splitting your app into multiple functions which will be triggered by some actions.
It's important to note that:
AWS definition: "AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost"
Read more about auto scaling here
False. Auto scaling adjusts capacity and this can mean removing some resources based on usage and performances.
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Hello Function | Lambda | Exercise | Solution | |
URL Function | Lambda | Exercise | Solution |
Within each region, there are multiple isolated locations known as Availability Zones. Multiple availability zones ensure high availability in case one of them goes down.
Edge locations are basically a content delivery network which caches data and ensures lower latency and faster delivery to users in any location. They are located in major cities around the world.
True.
Note: opinionated answer.
No. There are a couple of factors to consider when choosing a region (order doesn't mean anything):
Full explanation is here.
In short: it's used for managing users, groups, access policies & roles
True
A way of allowing an AWS service to use another AWS service. You assign roles to AWS resources. For example, you can make use of a role which allows the EC2 service to access S3 buckets (read and write).
Policy documents are used to grant permissions as to what a user, group or role is able to do. Their format is JSON.
There can be several reasons for that. One of them is lack of a policy. To solve that, the admin has to attach a policy to the user that allows them to access the S3 bucket.
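For example, a sketch of attaching AWS's managed read-only S3 policy to a user with the AWS CLI (the user name is a placeholder):

```bash
# Attach an S3 read-only policy to the user (user name is a placeholder)
aws iam attach-user-policy \
    --user-name some-user \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```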
Only a login access.
"a web service that provides secure, resizable compute capacity in the cloud".Read more here
Amazon Machine Images is "An Amazon Machine Image (AMI) provides the information required to launch an instance".Read more here
"the instance type that you specify determines the hardware of the host computer used for your instance"Read more about instance types here
False. From the above list only compute optimized is available.
"provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices."More on EBS here
On Demand - pay a fixed rate by the hour/second with no commitment. You can provision and terminate it at any given time.Reserved - you get capacity reservation, basically purchase an instance for a fixed time of period. The longer, the cheaper.Spot - Enables you to bid whatever price you want for instances or pay the spot price.Dedicated Hosts - physical EC2 server dedicated for your use.
"A security group acts as a virtual firewall that controls the traffic for one or more instances"More on this subject here
EBS
Standard RI - most significant discount + suited for steady-state usage
Convertible RI - discount + ability to change RI attributes + suited for steady-state usage
Scheduled RI - launch within time windows you reserve
Learn more about EC2 RI here
AWS Lambda
AWS definition: "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume."
Read more on it here
False. Charges are being made when the code is executed.
True
Amazon definition: "Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service. Customers such as Duolingo, Samsung, GE, and Cook Pad use ECS to run their most sensitive and mission critical applications because of its security, reliability, and scalability."
Learn more here
Amazon definition: "Amazon Elastic Container Registry (ECR) is a fully-managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images."
Learn more here
Amazon definition: "AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS)."
Learn more here
S3 stands for Simple Storage Service (the three S's). S3 is an object storage service which is fast, scalable and durable. S3 enables customers to upload, download or store any file or object that is up to 5 TB in size.
More on S3 here
An S3 bucket is a resource which is similar to a folder in a file system and allows storing objects, which consist of data.
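A small sketch of typical bucket operations with the AWS CLI (the bucket name is a placeholder and must be globally unique):

```bash
aws s3 mb s3://my-example-bucket                            # create a bucket
aws s3 cp ./some_file.txt s3://my-example-bucket/some_file.txt   # upload an object
aws s3 ls s3://my-example-bucket                            # list the bucket's objects
```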
True
Object Durability: The percent over a one-year time period that a file will not be lost
Object Availability: The percent over a one-year time period that a file will be accessible
Each object has a storage class assigned to it, affecting its availability and durability. This also has an effect on costs.
Storage classes offered today:
Standard:
Standard-IA (Infrequent Access)
One Zone-IA (Infrequent Access):
Intelligent-Tiering:
Glacier: Archive data with retrieval time ranging from minutes to hours
Glacier Deep Archive: Archive data that rarely, if ever, needs to be accessed with retrieval times in hours
Both Glacier and Glacier Deep Archive are:
More on storage classes here
Glacier Deep Archive
Expedited, Standard and Bulk
False. Unlimited capacity.
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage".More on Storage Gateway here
Explained in detail here
Stored Volumes - Data is located at the customer's data center and periodically backed up to AWS
Cached Volumes - Data is stored in the AWS cloud and cached at the customer's data center for quick access
AWS definition: "Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket"
Learn more here
Amazon definition: "Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources."
Learn more here
"AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS."
Learn more here
RTO - The maximum acceptable length of time that your application can be offline.
RPO - The maximum acceptable length of time during which data might be lost from your application due to an incident.
Lowest - Multi-site
Highest - The cold method
AWS definition: "Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency, high transfer speeds, all within a developer-friendly environment."
More on CloudFront here
True
A transport solution which was designed for transferring large amounts of data (petabyte-scale) into and out the AWS cloud.
AWS definition: "Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions."
More on ELB here
The shared responsibility model defines what the customer is responsible for and what AWS is responsible for.
More on the shared responsibility model here
False. AWS is responsible for the hardware in its sites, but not for security groups, which are created and managed by the users.
AWS definition: "apply to both the infrastructure layer and customer layers, but in completely separate contexts or perspectives. In a shared control, AWS provides the requirements for the infrastructure and the customer must provide their own control implementation within their use of AWS services"
Learn more about it here
AWS definition: "AWS Artifact is your go-to, central resource for compliance-related information that matters to you. It provides on-demand access to AWS’ security and compliance reports and select online agreements."
Read more about it here
AWS definition: "Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices.""
Learn more here
AWS definition: "AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS."
Amazon definition: "AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud."
Learn more here
True
AWS definition: "KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications."More on KMS here
It describes prohibited uses of the web services offered by AWS.More on AWS Acceptable Use Policy here
False. On some services, like EC2, CloudFront and RDS, penetration testing is allowed.
False.
False. Security key is an example of an MFA device.
Amazon definition: "Amazon Cognito handles user authentication and authorization for your web and mobile apps."
Learn more here
Amazon definition: "AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources."
Learn more here
Amazon definition: "You can create on-demand backups of your Amazon DynamoDB tables, or you can enable continuous backups using point-in-time recovery. For more information about on-demand backups, see On-Demand Backup and Restore for DynamoDB."
Learn more here
Amazon definition: "A global table is a collection of one or more replica tables, all owned by a single AWS account."
Learn more here
Amazon definition: "Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds..."
Learn more here
cloud data warehouse
Amazon ElastiCache is a fully managed Redis or Memcached in-memory data store. It's great for use cases like two-tier web applications where the most frequently accessed data is stored in ElastiCache so response time is optimal.
A MySQL- and PostgreSQL-compatible relational database. It is also the default database engine proposed to the user when creating a database with RDS. Great for use cases like two-tier web applications that have a MySQL or PostgreSQL database layer and need automated backups for the application.
Amazon definition: "Amazon DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data."
Learn more here
EBS
AWS definition: "Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads."Read more about here
"A logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define"Read more about it here.
False
True. Just to clarify, a single subnet resides entirely in one AZ.
"component that allows communication between instances in your VPC and the internet" (AWS docs).Read more about it here
True
docs.aws: "A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses."
False. Only one internet gateway can be attached to a single VPC.
False.
Allows you to connect your corporate network to AWS network.
AWS CodeDeploy
CloudFormation
Cognito
Lightsail
Cost Explorer
Trusted Advisor
AWS Snowball
AWS RedShift
VPC
Amazon Aurora
AWS Database Migration Service (DMS)
AWS CloudTrail
AWS RDS
AWS DynamoDB
AWS Rekognition
AWS X-Ray
SNS
AWS Athena
AWS Glue
Amazon GuardDuty
AWS Organizations
AWS WAF
CloudWatch
AWS Inspector
Route 53
Amazon DocumentDB
AWS Cognito
Simple Queue Service (SQS)
AWS Shield
ElastiCache
Amazon S3 Transfer Acceleration
Route 53
Lambda - to define a function that gets an input and returns a certain string
API Gateway - to define the URL trigger (= when you insert the URL, the function is invoked).
Kinesis
"Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service..."Some of Route 53 features:
More on Route 53 here
AWS definition: "Amazon CloudWatch is a monitoring and observability service..."
More on CloudWatch here
AWS definition: "AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account."
Read more on CloudTrail here
AWS definition: "a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems, and serverless applications."
Read more about it here
AWS definition: "AWS Organizations helps you centrally govern your environment as you grow and scale your workloads on AWS."More on Organizations here
AWS organizations service and the definition by Amazon: "SCPs offer central control over the maximum available permissions for all accounts in your organization, allowing you to ensure your accounts stay within your organization’s access control guidelines."
Learn more here
It mainly works on "pay-as-you-go" meaning you pay only for what are using and when you are using it.In s3 you pay for 1. How much data you are storing 2. Making requests (PUT, POST, ...)In EC2 it's based on the purchasing option (on-demand, spot, ...), instance type, AMI type and the region used.
More on AWS pricing model here
Amazon definition: "Amazon Connect is an easy to use omnichannel cloud contact center that helps companies provide superior customer service at a lower cost."
Learn more here
Amazon definition: "APN Consulting Partners are professional services firms that help customers of all types and sizes design, architect, build, migrate, and manage their workloads and applications on AWS, accelerating their journey to the cloud."
Learn more here
True. You pay differently based on the chosen region.
AWS Definition: "AWS Infrastructure Event Management is a structured program available to Enterprise Support customers (and Business Support customers for an additional fee) that helps you plan for large-scale events such as product or application launches, infrastructure migrations, and marketing events."
Amazon definition: "AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers."
Learn more here
AWS definition: "Lightsail is an easy-to-use cloud platform that offers you everything needed to build an application or website, plus a cost-effective, monthly plan."
AWS definition: "Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use."
Learn more here
Amazon definition: "You can use resource groups to organize your AWS resources. Resource groups make it easier to manage and automate tasks on large numbers of resources at one time. "
Learn more here
Amazon definition: "AWS Global Accelerator is a service that improves the availability and performance of your applications with local or global users..."
Learn more here
Amazon definition: "AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources."
Learn more here
AWS definition: "AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture."Learn more here
Amazon definition: "AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet."
Learn more about it here
"Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL."
Learn more about AWS Athena here
Amazon definition: "Amazon Cloud Directory is a highly available multi-tenant directory-based store in AWS. These directories scale automatically to hundreds of millions of objects as needed for applications."
Learn more here
AWS definition: "AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services...You can simply upload your code and Elastic Beanstalk automatically handles the deployment"
Learn more about it here
Amazon definition: "Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of Amazon SWF as a fully-managed state tracker and task coordinator in the Cloud."
Learn more on Amazon Simple Workflow Service here
AWS definition: "big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto."
Learn more here
AWS definition: "Quick Starts are built by AWS solutions architects and partners to help you deploy popular technologies on AWS, based on AWS best practices for security and high availability."
Read more here
Amazon definition: "AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS."
Learn more here
Amazon definition: "AWS Professional Services created the AWS Cloud Adoption Framework (AWS CAF) to help organizations design and travel an accelerated path to successful cloud adoption. "
Learn more here
AWS definition: "AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser"
Amazon definition: "AWS Application Discovery Service helps enterprise customers plan migration projects by gathering information about their on-premises data centers."
Learn more here
AWS definition: "The Well-Architected Framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization"
Learn more here
AWS Lambda
AWS Athena
AWS definition: "Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications".
Learn more about it here
Ethernet simply refers to the most common type of Local Area Network (LAN) used today. A LAN—in contrast to a WAN (Wide Area Network), which spans a larger geographical area—is a connected network of computers in a small area, like your office, college campus, or even home.
A set of protocols that define how two or more devices can communicate with each other. To learn more about TCP/IP, read here
A MAC address is a unique identification number or code used to identify individual devices on the network.
Packets that are sent on the ethernet are always coming from a MAC address and sent to a MAC address. If a network adapter is receiving a packet, it is comparing the packet’s destination MAC address to the adapter’s own MAC address.
When a device sends a packet to the broadcast MAC address (FF:FF:FF:FF:FF:FF), it is delivered to all stations on the local network. It needs to be used in order for all devices to receive your packet at the datalink layer.
An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification and location addressing.
A subnet mask is a 32-bit number that masks an IP address and divides it into a network address and a host address. A subnet mask is made by setting the network bits to all "1"s and the host bits to all "0"s. Within a given network, two host addresses are reserved for special purposes and cannot be assigned to hosts: the all-"0" host address is the network address and the all-"1" ("255") host address is the broadcast address.
For example:
| Address Class | No of Network Bits | No of Host Bits | Subnet mask | CIDR notation |
| ------------- | ------------------ | --------------- | --------------- | ------------- |
| A | 8 | 24 | 255.0.0.0 | /8 |
| A | 9 | 23 | 255.128.0.0 | /9 |
| A | 12 | 20 | 255.240.0.0 | /12 |
| A | 14 | 18 | 255.252.0.0 | /14 |
| B | 16 | 16 | 255.255.0.0 | /16 |
| B | 17 | 15 | 255.255.128.0 | /17 |
| B | 20 | 12 | 255.255.240.0 | /20 |
| B | 22 | 10 | 255.255.252.0 | /22 |
| C | 24 | 8 | 255.255.255.0 | /24 |
| C | 25 | 7 | 255.255.255.128 | /25 |
| C | 28 | 4 | 255.255.255.240 | /28 |
| C | 30 | 2 | 255.255.255.252 | /30 |
You can read more about the OSI model in penguintutor.com
Unicast: One-to-one communication where there is one sender and one receiver.
Broadcast: Sending a message to everyone in the network. The address ff:ff:ff:ff:ff:ff is used for broadcasting. Two common protocols which use broadcast are ARP and DHCP.
Multicast: Sending a message to a group of subscribers. It can be one-to-many or many-to-many.
CSMA/CD stands for Carrier Sense Multiple Access / Collision Detection. Its primary focus is to manage access to a shared medium/bus where only one host can transmit at a given point in time.
CSMA/CD algorithm:
A router is a physical or virtual appliance that passes information between two or more packet-switched computer networks. A router inspects a given data packet's destination Internet Protocol address (IP address), calculates the best way for it to reach its destination and then forwards it accordingly.
Network Address Translation (NAT) is a process in which one or more local IP address is translated into one or more Global IP address and vice versa in order to provide Internet access to the local hosts.
A proxy server acts as a gateway between you and the internet. It’s an intermediary server separating end users from the websites they browse.
If you’re using a proxy server, internet traffic flows through the proxy server on its way to the address you requested. The request then comes back through that same proxy server (there are exceptions to this rule), and then the proxy server forwards the data received from the website to you.
Proxy servers provide varying levels of functionality, security, and privacy depending on your use case, needs, or company policy.
TCP 3-way handshake or three-way handshake is a process which is used in a TCP/IP network to make a connection between server and client.
A three-way handshake is primarily used to create a TCP socket connection. It works when:
From wikipedia: "the length of time it takes for a signal to be sent plus the length of time it takes for an acknowledgement of that signal to be received"
Bonus question: what is the RTT of LAN?
TCP establishes a connection between the client and the server to guarantee the order of the packets; UDP, on the other hand, does not establish a connection between client and server and doesn't handle packet order. This makes UDP more lightweight than TCP and a perfect candidate for services like streaming.
Penguintutor.com provides a good explanation.
A default gateway serves as an access point or IP router that a networked computer uses to send information to a computer in another network or the internet.
ARP stands for Address Resolution Protocol. When you try to ping an IP address on your local network, say 192.168.1.1, your system has to turn the IP address 192.168.1.1 into a MAC address. This involves using ARP to resolve the address, hence its name.
Systems keep an ARP look-up table where they store information about what IP addresses are associated with what MAC addresses. When trying to send a packet to an IP address, the system will first consult this table to see if it already knows the MAC address. If there is a value cached, ARP is not used.
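To inspect that ARP (neighbour) table on a Linux host, for example:

```bash
ip neigh show   # modern way to print the ARP/neighbour table
arp -n          # older net-tools equivalent, if installed
```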
It stands for Dynamic Host Configuration Protocol, and allocates IP addresses, subnet masks and gateways to hosts. This is how it works:
Read more here
NAT stands for Network Address Translation. It's a way to map multiple local private addresses to a public one before transferring the information. Organizations that want multiple devices to employ a single IP address use NAT, as do most home routers. For example, your computer's private IP could be 192.168.1.100, but your router maps the traffic to its public IP (e.g. 1.1.1.1). Any device on the internet would see the traffic coming from your public IP (1.1.1.1) instead of your private IP (192.168.1.100).
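As an illustration, a Linux box acting as a home-router-style NAT gateway might masquerade outgoing traffic like this (the interface name eth0 is an assumption):

```bash
# Enable forwarding and source-NAT (masquerade) traffic leaving through eth0
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```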
The exact meaning usually depends on the context, but overall the data plane refers to all the functions that forward packets and/or frames from one interface to another, while the control plane refers to all the functions that make use of routing protocols.
There is also "Management Plane" which refers to monitoring and management functions.
Latency. To have a good latency, a search query should be forwarded to the closest datacenter.
Throughput. To have a good throughput, the upload stream should be routed to an underutilized link.
00110011110100011101
The internet refers to a network of networks, transferring huge amounts of data around the globe.
The World Wide Web is an application running on millions of servers, on top of the internet, accessed through what is known as a web browser.
An ISP (Internet Service Provider) is a company that provides local internet access.
A completely free application for testing your knowledge on Linux
Only you know :)
For example:
The -r (or -R in some commands) flag allows the user to run a certain command recursively. For example, listing all the files under the following tree is possible when done recursively (ls -R):

/dir1
├── dir2
│   ├── file1
│   └── file2
└── dir3
    └── file3
To list all the files, one can run ls -R /dir1
These are hidden files, which are not displayed when performing a standard ls listing. An example of such a file is .bashrc, which is used to execute scripts when a shell starts. Hidden files can also store configuration for services on your host, like the kubectl configuration under ~/.kube. The command used to list them is ls -a
myProgram < input.txt > executionOutput.txt
sed -i 's/salad/burger/g' <file_name>
Using the mv
command.
rm -rf dir
cat or less
chmod 777 /tmp/x
cd ~
sed -i s/good/great/g /tmp/y
echo hello world
echo "hello world"
The echo command receives two separate arguments in the first execution, while in the second execution it gets one argument, the string "hello world". The output will be the same.
Using a pipe in Linux, allows you to send the output of one command to the input of another command. For example: cat /etc/services | wc -l
sed 's/1/2/g' /tmp/myFile # sed "s/1/2/g" is also fine
find . -iname "*.yaml" -exec sed -i "s/1/2/g" {} \;
history command or .bash_history file
df
you get "command not found". What could be wrong and how to fix it?Most likely the default/generated $PATH was somehow modified or overridden thus not containing /bin/
where df would normally go.This issue could also happen if bash_profile or any configuration file of your interpreter was wrongly modified, causing erratics behaviours.You would solve this by fixing your $PATH variable:
As to fix it there are several options:
PATH="$PATH":/user/bin:/..etc
Note: There are many ways of getting errors like this: if bash_profile or any configuration file of your interpreter was wrongly modified; causing erratics behaviours,permissions issues, bad compiled software (if you compiled it by yourself)... there is no answer that will be true 100% of the time.
You can use the commands cron
and at
.With cron, tasks are scheduled using the following format:
*/30 * * * * bash myscript.sh
Executes the script every 30 minutes.
The tasks are stored in a cron file, you can write in it using crontab -e
Alternatively if you are using a distro with systemd it's recommended to use systemd timers.
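As a sketch of the systemd alternative, a transient timer can be created with systemd-run (the script path is a placeholder, matching the cron example above):

```bash
# Run the script every 30 minutes via a transient systemd timer
systemd-run --on-calendar='*:0/30' /path/to/myscript.sh
# Inspect the timers that systemd knows about
systemctl list-timers
```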
/
?The root of the filesystem. The beginning of the tree.
/tmp
folder is cleaned automatically, usually upon reboot.
Using the chmod
command.
777 - you give the owner, group and other: Execute (1), Write (2) and Read (4); 4+2+1 = 7
644 - owner has Read (4) and Write (2), 4+2 = 6; group and other have Read (4)
750 - owner has r+w+x (7), group has Read (4) and Execute (1), 4+1 = 5; other have no permissions (0)
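For example, the numeric and symbolic notations are interchangeable (some_file is a placeholder):

```bash
chmod 750 some_file             # owner: rwx, group: r-x, other: ---
chmod u=rwx,g=rx,o= some_file   # same result, written symbolically
ls -l some_file                 # verify the resulting permissions
```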
chmod +x some_file
True
#!/bin/bash
#!/bin/bash
is called a shebang (she-bang)
/bin/bash is the most common shell used as the default login shell on Linux systems. The shell's name is an acronym for Bourne-Again SHell. Bash can execute the vast majority of shell scripts and is widely used because it is well developed, has many features and a convenient syntax.
Depends on the language and settings used. When a script written in Bash fails to run a certain command, it will keep running and will execute all other commands mentioned after the command which failed. Most of the time we would actually want the opposite to happen. In order to make Bash exit when a specific command fails, use 'set -e' in your script.
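A minimal sketch demonstrating the effect of set -e:

```bash
#!/bin/bash
set -e          # exit as soon as any command fails
echo "before"
false           # this command fails (returns a non-zero exit code)
echo "after"    # never reached, because set -e stops the script
```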
echo $0
echo $?
echo $$
echo $@
echo $#
Answer depends on the language you are using for writing your scripts. If Bash is used for example then:
If Python, then using pdb is very useful.
Using the keyword read
so for example read x
will wait for user input and will store it in the variable x.
continue
and break
. When do you use them if at all?
x = 2
echo $x
Should be x=2 (in Bash, an assignment must not have spaces around the = sign)
shuf -i 9999999-99999999 -n 1
:(){ :|:& };:
A short way of using if/else. An example:
[[ $a = 1 ]] && b="yes, equal" || b="nope"
diff <(ls /tmp) <(ls /var/tmp)
|
is not possible. It can be used when a command does not support
STDIN
or you need the output of multiple commands.
https://superuser.com/a/1060002/167769
A daemon is a program that runs in the background without direct control of the user, although the user can talk to the daemon at any time.
systemd has many features such as user processes control/tracking, snapshot support, inhibitor locks..
If we visualize the unix/linux system in layers, systemd would fall directly after the linux kernel.
Hardware -> Kernel -> Daemons, System Libraries, Server Display.
journalctl
/var/log
tail -f <file_name>
dstat -t
is great for identifying network and disk issues.netstat -tnlaup
can be used to see which processes are running on which ports.lsof -i -P
can be used for the same purpose as netstat.ngrep -d any metafilter
for matching regex against payloads of packets.tcpdump
for capturing packetswireshark
same concept as tcpdump but with GUI (optional).
dstat -t
is great for identifying network and disk issues.opensnoop
can be used to see which files are being opened on the system (in real time).
strace
is great for understanding what your program does. It prints every system call your program executed.
top
will show you how much CPU percentage each process consumesperf
is a great choice for sampling profiler and in general, figuring out what your CPU cycles are "wasted" onflamegraphs
is great for CPU consumption visualization (http://www.brendangregg.com/flamegraphs.html)
top
for anything unusualdstat -t
to check if it's related to disk or network.sar
iostat
The kernel is part of the operating system and is responsible for tasks like:
uname -a
command
The operating system executes the kernel in protected memory to prevent anyone from changing it (and risking a crash). This is what is known as "kernel space". "User space" is where users execute their commands or applications. It's important to create this separation since we can't rely on user applications not to tamper with the kernel, causing it to crash.
Applications can access system resources and indirectly the kernel space by making what is called "system calls".
Wikipedia Definition: "SSH or Secure Shell is a cryptographic network protocol for operating network services securely over an unsecured network."
Hostinger.com Definition: "SSH, or Secure Shell, is a remote administration protocol that allows users to control and modify their remote servers over the Internet."
An SSH server will have the SSH daemon running. Depending on the distribution, you should be able to check whether the service is running (e.g. systemctl status sshd).
Telnet also allows you to connect to a remote host, but as opposed to SSH where the communication is encrypted, in telnet the data is sent in clear text, so it isn't considered secure because anyone on the network can see exactly what is sent, including passwords.
~/.ssh/known_hosts
?It means that the key of the remote host was changed and doesn't match the one stored on the machine (in ~/.ssh/known_hosts).
ssh-keygen
is used for?ls [XYZ]
matchls [^XYZ]
matchls [0-5]
matchgrep '[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}' some_file
grep -E "error|failure" some_file
grep '[0-9]$' some_file
aaabbbccc.aaaaaaaaa
lines 1 and 3.
An exit code (or return code) represents the code returned by a child process to its parent process.
0 is an exit code which represents success, while any non-zero value represents an error. Each number has a different meaning, based on how the application was developed.
I consider this as a good blog post to read more about it: https://shapeshed.com/unix-exit-codes
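A quick way to see exit codes in action in a shell (the codes shown are typical for GNU coreutils):

```bash
ls /tmp            # succeeds
echo $?            # prints 0
ls /no/such/dir    # fails
echo $?            # prints a non-zero code (often 2 for GNU ls)
```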
Another way to ask this: what happens from the moment you turned on the server until you get a prompt
For each file (and directory) in Linux there is an inode, a data structure which stores metadata related to the file like its size, owner, permissions, etc.
File name (it's part of the directory file)
Run mount
mount
command but you get no output. How would you check what mounts you have on your system?
cat /proc/mounts
A hard link is the same file, using the same inode.
A soft link is a shortcut to another file, using a different inode.
False
True
True.
There are many answers for this question. One way is running df -T
du -sh
False. /tmp is cleared upon system boot, while /var/tmp is cleared every couple of days or not cleared at all (depends on the distro).
One can use uptime
or top
This article summarizes the load average topic in a great way
pidstat
iostat -xz 1
You can use the commands top
and free
sar -n TCP,ETCP 1
ps -ef
You can achieve that by specifying & at the end of the command. As to why: since some commands/processes can take a lot of time to finish execution or run forever, you may want to run them in the background instead of waiting for them to finish before regaining control of the current session.
The default signal is SIGTERM (15). This signal kills the process gracefully, which means it allows it to save its current state.
SIGTERM - default signal for terminating a process
SIGHUP - commonly used for reloading configuration
SIGKILL - a signal which cannot be caught or ignored
To view all available signals run kill -l
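A minimal sketch of handling these signals in a shell script using trap, and of sending them from another terminal:

```bash
#!/bin/bash
# React to SIGTERM/SIGHUP instead of dying immediately
trap 'echo "caught SIGTERM, cleaning up"; exit 0' TERM
trap 'echo "caught SIGHUP, reloading config"' HUP

echo "running as PID $$"
while true; do sleep 1; done

# From another shell:
#   kill -HUP  <pid>   # triggers the reload handler
#   kill -TERM <pid>   # triggers the cleanup handler and exits
```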
kill 0
does?kill -0
does?A background process. Most of these processes are waiting for requests or set of conditions to be met before actually running anything.Some examples: sshd, crond, rpcbind.
Running (R)
Uninterruptible Sleep (D) - the process is waiting for I/O
Interruptible Sleep (S)
Stopped (T)
Dead (X)
Zombie (Z)
A process which has finished to run but has not exited.
One reason it happens is when a parent process is programmed incorrectly. Every parent process should execute wait() to get the exit code from the child process which finished running. But when the parent isn't checking for the child's exit code, the child process can still exist although it has finished running.
You can't kill a zombie process the regular way with kill -9
for example as it's already dead.
One way to kill a zombie process is by sending SIGCHLD to the parent process, telling it to reap its terminated child processes. This might not work if the parent process wasn't programmed properly. The invocation is kill -s SIGCHLD [parent_pid]
You can also try closing/terminating the parent process. This will make the zombie process a child of init (1) which does periodic cleanups and will at some point clean up the zombie process.
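A small sketch for spotting zombie processes and their parents (the parent PID is a placeholder):

```bash
# List zombie processes together with their parent PID
ps -eo pid,ppid,state,comm | awk '$3 ~ /Z/'
# Ask the parent to reap its children (may not help if the parent is buggy)
kill -s SIGCHLD [parent_pid]
```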
If you mention the ps command with arguments at any point, be familiar with what these arguments do exactly.
strace
does? What about ltrace
?find /some_dir -iname *.yml -print0 | xargs -0 -r sed -i "s/1/2/g"
The ls executable is built for an incompatible architecture.
You can use the split
command this way: split -l 25 some_file
In Linux (and Unix) the first three file descriptors are:
0 - stdin (standard input)
1 - stdout (standard output)
2 - stderr (standard error)
This is a great article on the topic: https://www.computerhope.com/jargon/f/file-descriptor.htm
ip link show
The loopback interface is a special, virtual network interface that your computer uses to communicate with itself. It is used mainly for diagnostics and troubleshooting, and to connect to servers running on the local machine.
One of the following would work:
netstat -tnlp | grep <port_number>
lsof -i -n -P | grep <port_number>
False
Technically, yes.
Telnet is a type of client-server protocol that can be used to open a command line on a remote computer, typically a server. By default, all the data sent and received via telnet is transmitted in clear/plain text, therefore it should not be used as it does not encrypt any data between the client and the server.
One way would be ping6 ff02::1
There are a couple of modes:
cat /etc/hostname
You can also run hostnamectl
or hostname
but that might print only a temporary hostname. The one in the file is the permanent one.
/etc/resolv.conf
is used for? What does it include?
You can specify one or more of the following:
dig
host
nslookup
The answer depends on the distribution being used.
In Fedora/CentOS/RHEL/Rocky it can be done with rpm
or dnf
commands. In Ubuntu it can be done with the apt
command.
Package managers allow you to manage a package's lifecycle: installing, removing and updating packages.
In addition, you can specify in a spec how a certain package will be installed - where to copy the files, which commands to run prior to the installation, post the installation, etc.
dnf provides /usr/bin/git
Depends on the init system.
Systemd: systemctl enable [service_name]
System V: update-rc.d [service_name]
and add this line id:5678:respawn:/bin/sh /path/to/app
to /etc/inittab
Upstart: add an Upstart init script at /etc/init/service.conf
ssh 127.0.0.1
but it fails with "connection refused". What could be the problem?Nginx, Apache httpd.
adduser user_name --shell=/bin/false --no-create-home
You can also add a user and then edit /etc/passwd.
su command. Use su - to switch to root
Re-install the OS IS NOT the right answer :)
Using the last
command.
/proc/cpuinfo
dmidecode
lsblk
True. Only in kernel space do they have full access to hardware resources.
True. Inside the namespace it's PID 1 while to the parent namespace the PID is a different one.
False. The opposite is true. Parent PID namespace is aware and has visibility of processes in child PID namespace and child PID namespace has no visibility as to what is going on in the parent PID namespace.
False. Network namespace has its own interfaces and routing table. There is no way (without creating a bridge for example) for one network namespace to reach another.
True
False. In every child user namespace, it's possible to have a separate root user with uid of 0.
In time namespaces processes can use different system time.
awk
command does? Have you used it? What for?From Wikipedia: "AWK is domain-specific language designed for text processing and typically used as a data extraction and reporting tool"
awk '{print $4}' file
awk 'length($0) > 79' file
lsof
command does? Have you used it? What for?Using system calls
fork() is used for creating a new process. It does so by cloning the calling process but the child process has its own PID and any memory locks, I/O operations and semaphores are not inherited.
Not enough memory to create a new process
wait() is used by a parent process to wait for the child process to finish execution. If wait is not used by a parent process, then a child process might become a zombie process.
The kernel notifies the parent by sending the SIGCHLD to the parent.
waitpid() is a variant of wait() that waits for a specific child process and, with the WNOHANG option, can be used in a non-blocking way.
It also makes it possible for a library routine (e.g. system()) to wait for the child it created without interfering with other child processes for which the process has not waited.
True in most cases though there are cases where wait() returns before the child exits.
It transforms the current running program into another program.
Given the name of an executable and some arguments, it loads the code and static data from the specified executable and overwrites its current code segment and current static data. After initializing its memory space (like stack and heap), the OS runs the program, passing any arguments as the argv of that process.
True
Since a successful exec() replaces the current process, it can't return anything to the process that made the call.
fork(), exec() and the wait() system call is also included in this workflow.
Executes a program. The program is passed as a filename (or path) and must be a binary executable or a script.
"Pipes provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe.A pipe is created using pipe(2), which returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end."
ls -l
The shell reads the input using getline(), which reads the input file stream and stores it in a buffer as a string
The buffer is broken down into tokens and stored in an array this way: {"ls", "-l", "NULL"}
The shell checks if an expansion is required (in the case of ls *.c)
Once the program is in memory, its execution starts, first by calling readdir()
Notes:
ls -l *.log
?alias x=y
does?This way provides a lot of flexibility. It allows the shell for example, to run code after the call to fork() but before the call to exec(). Such code can be used to alter the environment of the program it about to run.
The shell figures out, using the PATH variable, where the executable of the command resides in the filesystem. It then calls fork() to create a new child process for running the command. Once the fork is executed successfully, it calls a variant of exec() to execute the command and finally waits for the command to finish using wait(). When the child completes, the shell returns from wait() and prints out the prompt again.
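The same fork/exec/wait pattern can be observed from the shell itself, for example with a background job:

```bash
sleep 3 &          # the shell fork()s a child and exec()s "sleep" in it
child=$!           # PID of the forked child
wait "$child"      # the shell wait()s for the child to finish
echo "child $child exited with status $?"
```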
There are a couple of ways to do that:
open("/my/file") = 5
read(5, "file content")
These system calls are reading the file /my/file
and 5 is the file descriptor number.
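For example, to watch only the open/read system calls a command makes with strace (syscall names such as openat may vary slightly between libc/kernel versions):

```bash
strace -e trace=openat,read cat /etc/hostname
```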
From wikipedia: a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point
ip a
you see there is a device called 'lo'. What is it and why do we need it?traceroute
command does? How does it works?Another common way to task this questions is "what part of the tcp header does traceroute modify?"
This is a good article about the topic: https://ops.tips/blog/how-linux-creates-sockets
MemFree - The amount of unused physical RAM in your system
MemAvailable - The amount of memory available for new workloads (without pushing the system to use swap), based on MemFree, Active(file), Inactive(file), and SReclaimable.
ls, wc, dd, df, du, ps, ip, cp, cd ...
$OLDPWD
X=2
for example. But this will not be available to subprocesses or new shells spawned from the current one. To have it available in them as well, use export X=2
It's used in commands to mark the end of command options. One common example is when used with git to discard local changes: git checkout -- some_file
/dev
GPL v2
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Fork 101 | Fork | Link | Link | |
Fork 102 | Fork | Link | Link |
From the book "Operating Systems: Three Easy Pieces":
"responsible for making it easy to run programs (even allowing you to seemingly run many at the same time), allowing programs to share memory, enabling programs to interact with devices, and other fun stuff like that".
A process is a running program. A program is one or more instructions and the program (or process) is executed by the operating system.
It would support the following:
False. It was true in the past but today's operating systems perform lazy loading which means only the relevant pieces required for the process to run are loaded first.
Even when using a system with one physical CPU, it's possible to allow multiple users to work on it and run programs. This is possible with time sharing where computing resources are shared in a way it seems to the user the system has multiple CPUs but in fact it's simply one CPU shared by applying multiprogramming and multi-tasking.
Somewhat the opposite of time sharing. While in time sharing a resource is used for a while by one entity and then the same resource can be used by another entity, in space sharing the space is shared by multiple entities but in a way where it's not being transferred between them.
It's used by one entity until this entity decides to get rid of it. Take for example storage. In storage, a file is yours until you decide to delete it.
CPU scheduler
The kernel is part of the operating system and is responsible for tasks like:
True
Buffer: a reserved place in RAM which is used to hold data for temporary purposes.
Cache: usually used when processes are reading from and writing to the disk, to make the process faster by making similar data used by different programs easily accessible.
Virtualization uses software to create an abstraction layer over computer hardware that allows the hardware elements of a single computer—processors, memory, storage and more - to be divided into multiple virtual computers, commonly called virtual machines (VMs).
Red Hat: "A hypervisor is software that creates and runs virtual machines (VMs). A hypervisor, sometimes called a virtual machine monitor (VMM), isolates the hypervisor operating system and resources from the virtual machines and enables the creation and management of those VMs."
Read more here
Hosted hypervisors and bare-metal hypervisors.
Due to having its own drivers and direct access to hardware components, a bare-metal hypervisor will often have better performance, stability and scalability.
On the other hand, there will probably be some limitations regarding loading (any) drivers, so a hosted hypervisor will usually benefit from better hardware compatibility.
Operating system virtualization
Network functions virtualization
Desktop virtualization
Yes, it's an operating-system-level virtualization, where the kernel is shared and allows using multiple isolated user-space instances.
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
My First Task | Tasks | Exercise | Solution | |
Upgrade and Update Task | Tasks | Exercise | Solution | |
My First Playbook | Playbooks | Exercise | Solution |
Task – a call to a specific Ansible module
Module – the actual unit of code executed by Ansible on your own host or a remote host. Modules are indexed by category (database, file, network, …) and also referred to as task plugins.
Play – One or more tasks executed on a given host(s)
Playbook – One or more plays. Each play can be executed on the same or different hosts
Role – Ansible roles allow you to group resources based on certain functionality/service such that they can be easily reused. In a role, you have directories for variables, defaults, files, templates, handlers, tasks, and metadata. You can then use the role by simply specifying it in your playbook.
Ansible is:
True. In immutable infrastructure approach, you'll replace infrastructure instead of modifying it.
Ansible rather follows the mutable infrastructure paradigm, where it allows you to change the configuration of different components. This approach is not perfect and has its own disadvantages, like "configuration drift", where different components may reach a different state for different reasons.
False. It uses a procedural style.
While it's possible to provision resources with Ansible, some prefer to use tools that follow the immutable infrastructure paradigm.
Ansible doesn't save state by default. So a task that creates 5 instances, for example, will create 5 additional instances when executed again (unless an additional check is implemented or explicit names are provided), while other tools might check if 5 instances exist. If only 4 exist (by checking the state file for example), one additional instance will be created to reach the end goal of 5 instances.
ansible-doc -l
for list of modules and ansible-doc [module_name]
for detailed information on a specific module
An inventory file defines hosts and/or groups of hosts on which Ansible tasks are executed.
An example of inventory file:
192.168.1.2
192.168.1.3
192.168.1.4
[web_servers]
190.40.2.20
190.40.2.21
190.40.2.22
A dynamic inventory file tracks hosts from one or more sources like cloud providers and CMDB systems.
You should use one when using external sources and especially when the hosts in your environment are being automatically
spun up and shut down, without you tracking every change in these sources.
- name: Install a package
package:
name: "zlib"
state: present
- name: Install a package
package:
name: "{{ package_name|default('zlib') }}"
state: present
- name: Install a package
package:
name: "zlib"
state: present
use: "{{ use_var }}"
With "default(omit)"
- name: Install a package
package:
name: "zlib"
state: present
use: "{{ use_var|default(omit) }}"
---
- name: Print information about my host
hosts: localhost
gather_facts: 'no'
tasks:
- name: Print hostname
debug:
msg: "It's me, {{ ansible_hostname }}"
When given a written code, always inspect it thoroughly. If your answer is “this will fail” then you are right. We are using a fact (ansible_hostname), which is a gathered piece of information from the host we are running on. But in this case, we disabled facts gathering (gather_facts: no) so the variable would be undefined which will result in failure.
when the environment variable 'BEST_YEAR' is empty or false.
{{ (certain_variable == 1) | ternary("one", "two") }}
{{ some_string_var | bool }}
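A hedged example of how these two filters might be combined in a single task; the variable name feature_flag is made up for illustration:

```yaml
- name: Demonstrate the bool and ternary filters
  debug:
    msg: "Feature is {{ (feature_flag | default('no') | bool) | ternary('enabled', 'disabled') }}"
```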
- hosts: localhost
tasks:
- name: Install zlib
package:
name: zlib
state: present
---
- hosts: all
vars:
mario_file: /tmp/mario
package_list:
- 'zlib'
- 'vim'
tasks:
- name: Check for mario file
stat:
path: "{{ mario_file }}"
register: mario_f
- name: Install zlib and vim if mario file exists
become: "yes"
package:
name: "{{ item }}"
state: present
with_items: "{{ package_list }}"
when: mario_f.stat.exists
- name: Ensure all files exist
assert:
that:
- item.stat.exists
loop: "{{ files_list }}"
I'm <HOSTNAME> and my operating system is <OS>
Replace <HOSTNAME> and <OS> with the actual data for the specific host you are running on
The playbook to deploy the system_info file
---
- name: Deploy /tmp/system_info file
hosts: all:!controllers
tasks:
- name: Deploy /tmp/system_info
template:
src: system_info.j2
dest: /tmp/system_info
The content of the system_info.j2 template
# {{ ansible_managed }}
I'm {{ ansible_hostname }} and my operating system is {{ ansible_distribution }}
According to variable precedence, which one will be used?
The right answer is ‘toad’.
Variable precedence is about how variables override each other when they are set in different locations. If you haven't experienced it so far, I'm sure at some point you will, which makes it a useful topic to be aware of.
In the context of our question, the order will be extra vars (always override any other variable) -> host facts -> inventory variables -> role defaults (the weakest).
Here is the order of precedence from least to greatest (the last listed variables winning prioritization):
A full list can be found at PlayBook Variables . Also, note there is a significant difference between Ansible 1.x and 2.x.
Serial
is like running the playbook for each batch of hosts in turn, waiting for completion of the complete play before moving on to the next batch. forks
=1 means run the first task in a play on one host before running the same task on the next host, so the first task will be run for each host before the next task is touched. The default forks value in Ansible is 5.
[defaults]
forks = 30
- hosts: webservers
serial: 1
tasks:
- name: ...
Ansible also supports throttle
This keyword limits the number of workers up to the maximum set via the forks setting or serial. This can be useful in restricting tasks that may be CPU-intensive or interact with a rate-limiting API
tasks:
- command: /path/to/cpu_intensive_command
throttle: 1
def cap(self, string):
return string.capitalize()
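For context, a method like this is typically wired into a custom filter plugin. A minimal sketch, assuming the conventional filter_plugins/ directory layout (the file name sample_filters.py is illustrative):

```python
# filter_plugins/sample_filters.py
class FilterModule(object):
    def filters(self):
        # map the filter name used in templates/playbooks to the callable
        return {"cap": self.cap}

    def cap(self, string):
        return string.capitalize()
```

It could then be used in a template or task as {{ 'devops' | cap }}.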
Goku = 9001
Vegeta = 5200
Trunks = 6000
Gotenks = 32
With one task, switch the content to:
Goku = 9001
Vegeta = 250
Trunks = 40
Gotenks = 32
- name: Change saiyans levels
lineinfile:
dest: /tmp/exercise
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
with_items:
- { regexp: '^Vegeta', line: 'Vegeta = 250' }
- { regexp: '^Trunks', line: 'Trunks = 40' }
...
False. Ansible will execute a single task on all hosts before moving to the next task in a play. As of today, it uses 5 forks by default.
This behaviour is described as a "strategy" in Ansible and it's configurable.
A strategy in Ansible describes how Ansible will execute the different tasks on the hosts. By default Ansible uses the "linear" strategy, which defines that each task will run on all hosts before proceeding to the next task.
serial
keyword is used for?
It's used to specify the number (or percentage) of hosts to run the full play on, before moving to the next batch of hosts in the group.
For example:
- name: Some play
hosts: databases
serial: 4
If your group has 8 hosts, it will run the whole play on 4 hosts and then the same play on the other 4 hosts.
"{{ some_var | type_debug }}"
Terraform.io: "Terraform is an infrastructure as code (IaC) tool that allows you to build, change, and version infrastructure safely and efficiently."
A common wrong answer is to say that Ansible and Puppet are configuration management tools and Terraform is a provisioning tool. While technically true, it doesn't mean Ansible and Puppet can't be used for provisioning infrastructure. Also, it doesn't explain why Terraform should be used over CloudFormation, if at all.
The benefits of Terraform over the other tools:
terraform_directory/
providers.tf -> lists providers (source, version, etc.)
variables.tf -> any variable used in other files such as main.tf
main.tf -> lists the resources
False. Terraform follows immutable infrastructure paradigm.
A configuration is a root module along with a tree of child modules that are called as dependencies from the root module.
terraform init
terraform plan
terraform validate
terraform apply
terraform init
scans your code to figure out which providers you are using and downloads them.
terraform plan
will let you see what Terraform is about to do before actually doing it.
terraform validate
checks if the configuration is syntactically valid and internally consistent within a directory.
terraform apply
will provision the resources specified in the .tf files.
HashiCorp: "Terraform uses resource blocks to manage infrastructure, such as virtual networks, compute instances, or higher-level components such as DNS records. Resource blocks represent one or more infrastructure objects in your Terraform configuration."
aws_instance.web_server
True
resource_type.resource_name.attribute_name
. They are usually set by the provider or the API.
terraform.io: "Terraform relies on plugins called "providers" to interact with cloud providers, SaaS providers, and other APIs... Each provider adds a set of resource types and/or data sources that Terraform can manage. Every resource type is implemented by a provider; without providers, Terraform can't manage any kind of infrastructure."
libvirt
Input variables serve as parameters to the module in Terraform. They allow you for example to define once the value of a variable and use that variable in different places in the module so next time you would want to change the value, you will change it in one place instead of changing the value in different places in the module.
variable "app_id" {
type = string
description = "The id of application"
default = "some_value"
}
Usually they are defined in their own file (vars.tf for example).
They are referenced with var.VARIABLE_NAME
vars.tf:
variable "memory" {
type = string
default "8192"
}
variable "cpu" {
type = string
default = "4"
}
main.tf:
resource "libvirt_domain" "vm1" {
name = "vm1"
memory = var.memory
cpu = var.cpu
}
Using validation
block
variable "some_var" {
type = number
validation {
condition = var.some_var > 1
error_message = "you have to specify a number greater than 1"
}
}
It doesn't show its value when you run terraform apply
or terraform plan
but eventually it's still recorded in the state file.
True
terraform.tfvars
-var
or -var-file
According to variable precedence, which source will be used first?
The order is:
terraform.tfvars
-var
or -var-file
Using a .tfvars file, which contains simple variable name assignments this way:
x = 2
y = "mario"
z = "luigi"
terraform.tfstate
file is used for?
It keeps track of the IDs of created resources so that Terraform knows what it's managing.
terraform state mv
As such, tfstate shouldn't be stored in git repositories. Secured storage, such as a secured bucket, is a better option.
If two users or processes are concurrently editing the state file, it can result in an invalid state file that doesn't actually represent the state of the resources.
To avoid that, Terraform can apply state locking if the backend supports it. For example, the AWS S3 backend supports state locking and consistency via DynamoDB. If the backend supports it, Terraform will usually make use of state locking automatically, so nothing is required from the user to activate it.
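A minimal sketch of such a backend configuration, assuming a pre-created S3 bucket and DynamoDB table (both names here are placeholders, not real resources):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # placeholder bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks"   # placeholder table with a "LockID" hash key
    encrypt        = true
  }
}
```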
There is no right or wrong here, but it seems that the overall preferred way is to have a dedicated state file per environment.
terraform apply
You use it this way: variable "my_var" {}
Provisioners are used to execute actions on a local or remote machine. They are extremely useful when you've provisioned an instance and want to make a couple of changes on the machine you've created, without manually SSHing into it after Terraform finishes running and executing the commands by hand.
local-exec
and remote-exec
in the context of provisioners?
It's a resource which was successfully created but failed during provisioning. Terraform will fail and mark this resource as "tainted".
terraform taint
does?terraform taint resource.id
manually marks the resource as tainted in the state file. So when you run terraform apply
the next time, the resource will be destroyed and recreated.
string
number
bool
list(<TYPE>)
set(<TYPE>)
map(<TYPE>)
object({<ATTR_NAME> = <TYPE>, ... })
tuple([<TYPE>, ...])
There are quite a few cases you might need to use them:
aws_iam_policy_document
terraform output
does?
remote_state
A Terraform module is a set of Terraform configuration files in a single directory. Modules are small, reusable Terraform configurations that let you manage a group of related resources as if they were a single resource. Even a simple configuration consisting of a single directory with one or more .tf files is a module. When you run Terraform commands directly from such a directory, it is considered the root module. So in this sense, every Terraform configuration is part of a module.
The Terraform Registry provides a centralized location for official and community-managed providers and modules.
remote-exec
and local-exec
terratest
, and to test that a module can be initialized, can create resources, and can destroy those resources cleanly.
.tfvars
files or CLI arguments, how can you inject dependencies from other modules?
remote-state
to lookup the outputs from other modules. It is also common in the community to use a tool called terragrunt
to explicitly inject variables between modules.
Terraform import is used to import existing infrastructure. It allows you to bring resources created by some other means (e.g. manually launched cloud resources) under Terraform management.
terraform import RESOURCE ID
eg. Let's say you want to import an aws instance. Then you'll perform following:
resource "aws_instance" "tf_aws_instance" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
tags = {
Name = "import-me"
}
}
terraform import aws_instance.tf_aws_instance i-12345678
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
My First Dockerfile | Dockerfile | Link | Link |
Containers are a form of operating system virtualization. A single container might be used to run anything from a small microservice or software process to a larger application. Inside a container are all the necessary executables, binary code, libraries, and configuration files, making it easy to ship and run the application with the same expected results on different machines.
The primary difference between containers and VMs is that containers allow you to virtualize multiple workloads on the operating system, while in the case of VMs the hardware is being virtualized to run multiple machines, each with its own OS. You can also think about it as containers being OS-level virtualization while VMs are hardware virtualization.
You should choose VMs when:
You should choose containers when:
Through the use of namespaces and cgroups. Linux kernel has several types of namespaces:
Docker CLI passes your request to the Docker daemon.
The Docker daemon downloads the image from Docker Hub.
The Docker daemon creates a new container by using the image it downloaded.
The Docker daemon redirects output from the container to the Docker CLI, which redirects it to the standard output.
dockerd - The Docker daemon itself. The highest level component in your list and also the only 'Docker' product listed. Provides all the nice UX features of Docker.
(docker-)containerd - Also a daemon, listening on a Unix socket, exposes gRPC endpoints. Handles all the low-level container management tasks, storage, image distribution, network attachment, etc...
(docker-)containerd-ctr - A lightweight CLI to directly communicate with containerd. Think of it as how 'docker' is to 'dockerd'.
(docker-)runc - A lightweight binary for actually running containers. Deals with the low-level interfacing with Linux capabilities like cgroups, namespaces, etc...
(docker-)containerd-shim - After runC actually runs the container, it exits (allowing us to not have any long-running processes responsible for our containers). The shim is the component which sits between containerd and runc to facilitate this.
In short:
Cgroups = limit how much you can use; namespaces = limit what you can see (and therefore use)
Cgroups involve resource metering and limiting: memory, CPU, block I/O, network
Namespaces provide processes with their own view of the system.
There are multiple namespaces: pid, net, mnt, uts, ipc, user
docker.io/library/busybox:latest resolved to a manifestList object with 9 entries; looking for a unknown/amd64 match
found match for linux/amd64 with media type application/vnd.docker.distribution.manifest.v2+json, digest sha256:400ee2ed939df769d4681023810d2e4fb9479b8401d97003c710d0e20f7c49c6
pulling blob "sha256:61c5ed1cbdf8e801f3b73d906c61261ad916b2532d6756e7c4fbcacb975299fb Downloaded 61c5ed1cbdf8 to tempfile /var/lib/docker/tmp/GetImageBlob909736690
Applying tar in /var/lib/docker/overlay2/507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7/diff" storage-driver=overlay2
Applied tar sha256:514c3a3e64d4ebf15f482c9e8909d130bcd53bcc452f0225b0a04744de7b8c43 to 507df36fe373108f19df4b22a07d10de7800f33c9613acb139827ba2645444f7, size: 1223534
podman run
or docker run
Create a new image from a container’s changes
Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
COPY takes in a src and destination. It only lets you copy in a local file or directory from your host (the machine building the Docker image) into the Docker image itself. ADD lets you do that too, but it also supports 2 other sources. First, you can use a URL instead of a local file / directory. Secondly, you can extract a tar file from the source directly into the destination. Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That's because it's more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious.
RUN lets you execute commands inside of your Docker image. These commands get executed once at build time and get written into your Docker image as a new layer. CMD is the command the container executes by default when you launch the built image. A Dockerfile can only have one CMD. You could say that CMD is a Docker run-time operation, meaning it's not something that gets executed at build time. It happens when you run an image. A running image is called a container.
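A short, illustrative Dockerfile that puts COPY, RUN and CMD side by side (the image tag and file names are made up):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# COPY: bring a local file from the build context into the image
COPY requirements.txt .
# RUN: executed once at build time; the result is baked into a new image layer
RUN pip install -r requirements.txt
COPY . .
# CMD: the default command executed when a container is started from this image
CMD ["python", "app.py"]
```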
A common answer to this is to use hadolint project which is a linter based on Dockerfile best practices.
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
For example, you can use it to set up ELK stack where the services are: elasticsearch, logstash and kibana. Each running in its own container.
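A minimal sketch of what such a docker-compose.yml could look like (image tags, ports and settings are illustrative; a production ELK setup needs more configuration, including Logstash):

```yaml
version: "3.8"
services:
  elasticsearch:
    image: elasticsearch:8.13.4
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
  kibana:
    image: kibana:8.13.4
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
```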
docker-compose up
to run the services
Docker Hub is a native Docker registry service which allows you to run pull and push commands to install and deploy Docker images from the Docker Hub.
Docker Cloud is built on top of Docker Hub, so Docker Cloud provides you with more options/features compared to Docker Hub. One example is Swarm management, which means you can create new swarms in Docker Cloud.
A Docker image is built up from a series of layers. Each layer represents an instruction in the image's Dockerfile. Each layer except the very last one is read-only. Each layer is only a set of differences from the layer before it. The layers are stacked on top of each other. When you create a new container, you add a new writable layer on top of the underlying layers. This layer is often called the "container layer". All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable container layer. The major difference between a container and an image is the top writable layer. All writes to the container that add new or modify existing data are stored in this writable layer. When the container is deleted, the writable layer is also deleted. The underlying image remains unchanged. Because each container has its own writable container layer, and all changes are stored in this container layer, multiple containers can share access to the same underlying image and yet have their own data state.
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
My First Pod | Pods | Exercise | Solution | |
"Killing" Containers | Pods | Exercise | Solution | |
Creating a Service | Service | Exercise | Solution | |
Creating a ReplicaSet | ReplicaSet | Exercise | Solution | |
Operating ReplicaSets | ReplicaSet | Exercise | Solution | |
ReplicaSets Selectors | ReplicaSet | Exercise | Solution |
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
To understand what Kubernetes is good for, let's look at some examples:
Red Hat Definition: "A Kubernetes cluster is a set of node machines for running containerized applications. If you’re running Kubernetes, you’re running a cluster.At a minimum, a cluster contains a worker node and a master node."
Read more here
metadata, kind and apiVersion
Kubectl is the Kubernetes command line tool that allows you to run commands against Kubernetes clusters. For example, you can use kubectl to deploy applications, inspect and manage cluster resources, and view logs.
A node is a virtual or a physical machine that serves as a worker for running the applications.
It's recommended to have at least 3 nodes in a production environment.
The master coordinates all the workflows in the cluster:
kubectl get nodes
False. A Kubernetes cluster consists of at least 1 master and can have 0 workers (although that wouldn't be very useful...)
A Pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers.
Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
kubectl run my-pod --image=nginx:alpine --restart=Never
If you are a Kubernetes beginner you should know that this is not a common way to run Pods. The common way is to run a Deployment which in turn runs Pod(s).
In addition, Pods and/or Deployments are usually defined in files rather than executed directly using only the CLI arguments.
Pods are usually indeed not created directly. You'll notice that Pods are usually created as part of other entities such as Deployments or ReplicaSets.
If a Pod dies, Kubernetes will not bring it back. This is why it's more useful for example to define ReplicaSets that will make sure that a given number of Pods will always run, even after a certain Pod dies.
A pod can include multiple containers but in most cases it would probably be one container per pod.
A web application with separate (= in their own containers) logging and monitoring components/adapters is one example.
A CI/CD pipeline (using Tekton for example) can run multiple containers in one Pod if a Task contains multiple commands.
False. By default, pods are non-isolated = pods accept traffic from any source.
False. "Pending" is after the Pod was accepted by the cluster, but the container can't run for different reasons like images not yet downloaded.
kubectl get po
kubectl get pods --all-namespaces
False. A single Pod can run on a single node.
kubectl delete pod pod_name
kubectl get po -o wide
Read more about it here
True.
kubectl run web --image nginxinc/nginx-unprivileged
kubectl describe pods <POD_NAME>
it will tell whether the container is running:
Status: Running
kubectl exec web -- ls
kubectl run database --image mongo
you see the status is "CrashLoopBackOff". What could possibly went wrong and what do you do to confirm?"CrashLoopBackOff" means the Pod is starting, crashing, starting...and so it repeats itself.
There are many different reasons to get this error - lack of permissions, init-container misconfiguration, persistent volume connection issue, etc.
One of the ways to check why it happened is to run kubectl describe po <POD_NAME>
and having a look at the exit code
Last State: Terminated
Reason: Error
Exit Code: 100
Another way to check what's going on, is to run kubectl logs <POD_NAME>
. This will provide us with the logs from the containers running in that Pod.
livenessProbe:
exec:
command:
- cat
- /appStatus
initialDelaySeconds: 10
periodSeconds: 5
These lines make use of liveness probe
. It's used to restart a container when it reaches a non-desired state.
In this case, if the command cat /appStatus
fails, Kubernetes will kill the container and will apply the restart policy. The initialDelaySeconds: 10
means that Kubelet will wait 10 seconds before running the command/probe for the first time. From that point on, it will run it every 5 seconds, as defined with periodSeconds
readinessProbe:
tcpSocket:
port: 2017
initialDelaySeconds: 15
periodSeconds: 20
They define a readiness probe where the Pod will not be marked as "Ready" before it will be possible to connect to port 2017 of the container. The first check/probe will start after 15 seconds from the moment the container started to run and will continue to run the check/probe every 20 seconds until it will manage to connect to the defined port.
It wasn't able to pull the image specified for running the container(s). This can happen, for example, if the client didn't authenticate.
More details can be obtained with kubectl describe po <POD_NAME>
.
TERM
signal is sent to kill the main processes inside the containers of the given Pod
KILL
signal is used to kill the processes forcefully and the containers as well
Liveness probes are a useful mechanism for restarting a container when a certain check/probe, defined by the user, fails.
For example, the user can define that the command cat /app/status
will run every X seconds and the moment this command fails, the container will be restarted.
You can read more about it in kubernetes.io
Readiness probes are used by the Kubelet to know when a container is ready to start accepting traffic.
For example, a readiness probe can be to connect port 8080 on a container. Once Kubelet manages to connect it, the Pod is marked as ready
You can read more about it in kubernetes.io
Only containers whose state is set to Success will be able to receive requests sent to the Service.
A Kubernetes Deployment is used to tell Kubernetes how to create or modify instances of the pods that hold a containerized application. Deployments can scale the number of replica pods, enable rollout of updated code in a controlled manner, or roll back to an earlier deployment version if necessary.
A Deployment is a declarative statement for the desired state for Pods and Replica Sets.
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
EOF
kubectl edit deployment some-deployment
The pod will terminate and another, new pod, will be created.
Also, when looking at the replicaset, you'll see the old replica doesn't have any pods and a new replicaset is created.
One way is by specifying the deployment name: kubectl delete deployment [deployment_name]
Another way is using the deployment configuration file: kubectl delete -f deployment.yaml
The pod related to the deployment will terminate and the replicaset will be removed.
Using a Service.
"An abstract way to expose an application running on a set of Pods as a network service." - read more here
In simpler words, it allows you to add an internal or external connectivity to a certain application running in a container.
True
The default is ClusterIP and it's used for exposing a port internally. It's useful when you want to enable internal communication between Pods and prevent any external access.
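A minimal ClusterIP Service sketch (names and ports are illustrative); since ClusterIP is the default, the type line could be omitted:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: some-app-svc
spec:
  type: ClusterIP
  selector:
    app: some-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```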
kubectl describe service <SERVICE_NAME>
kubectl expose rs some-replicaset --name=replicaset-svc --target-port=2017 --type=NodePort
It exposes a ReplicaSet by creating a service called 'replicaset-svc'. The exposed port is 2017 and the service type is NodePort which means it will be reachable externally.
Run kubectl describe service
and if the IPs from "Endpoints" match any IPs from the output of kubectl get pod -o wide
spec:
selector:
app: some-app
ports:
- protocol: TCP
port: 8081
targetPort: 8081
Adding type: LoadBalancer
and nodePort
spec:
selector:
app: some-app
type: LoadBalancer
ports:
- protocol: TCP
port: 8081
targetPort: 8081
nodePort: 32412
Ingress
From Kubernetes docs: "Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource."
Read more here
metadata:
name: someapp-ingress
spec:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: someapp-ingress
spec:
rules:
- host: my.host
http:
paths:
- backend:
serviceName: someapp-internal-service
servicePort: 8080
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: someapp-ingress
spec:
rules:
- host: my.host
http:
paths:
- backend:
serviceName: someapp-internal-service
servicePort: 8080
host is the entry point of the cluster so basically a valid domain address that maps to cluster's node IP address
the http line is used for specifying that incoming requests will be forwarded to the internal service using HTTP.
backend is referencing the internal service (serviceName is the name under metadata and servicePort is the port under the ports section).
An implementation of Ingress. It's basically another pod (or set of pods) that evaluates and processes Ingress rules and this way manages all the redirections.
There are multiple Ingress Controller implementations (the one from Kubernetes is Kubernetes Nginx Ingress Controller).
kubectl get ingress
It specifies what to do with an incoming request to the Kubernetes cluster that isn't mapped to any backend (= no rule for mapping the request to a service). If the default backend service isn't defined, it's recommended to define one so users still see some kind of message instead of nothing or an unclear error.
Create a Service resource that specifies the name of the default backend as reflected in kubectl describe ingress ...
and the port under the ports section.
Add tls and secretName entries.
spec:
tls:
- hosts:
- some_app.com
secretName: someapp-secret-tls
True
Network Policies
spec:
replicas: 2
selector:
matchLabels:
type: backend
template:
metadata:
labels:
type: backend
spec:
containers:
- name: httpd-yup
image: httpd
It defines a replicaset for Pods whose type is set to "backend" so at any given point of time there will be 2 concurrent Pods running.
kubernetes.io: "A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods."
In simpler words, a ReplicaSet will ensure the specified number of Pod replicas is running for a selected Pod. If there are more Pods than defined in the ReplicaSet, some will be removed. If there are fewer than what is defined in the ReplicaSet, more replicas will be added.
kubectl delete po ...
?The ReplicaSet will create a new Pod in order to reach the desired number of replicas.
False. It will terminate one of the Pods to reach the desired state of 2 replicas.
kubectl get rs
Yes, with --cascade=false
.
kubectl delete -f rs.yaml --cascade=false
1
kubectl get rs
means?
NAME   DESIRED   CURRENT   READY   AGE
web    2         2         0       2m23s
The replicaset web
has 2 replicas. It seems that the containers inside the Pod(s) are not yet running since the value of READY is 0. It might be normal since it takes time for some containers to start running and it might be due to an error. Running kubectl describe po POD_NAME
or kubectl logs POD_NAME
can give us more information.
kubectl get rs
and while DESIRED is set to 2, you see that READY is set to 0. What are some possible reasons for it to be 0?
False. The Pods can already be running, and initially they can be created by any object. It doesn't matter to the ReplicaSet and it's not a requirement for it to acquire and monitor them.
False. It will take care of running the missing Pods.
The field template
in spec section is mandatory. It's used by the ReplicaSet to create new Pods when needed.
kubectl describe rs <ReplicaSet Name>
It will be visible under Events
(the very last lines)
True (and not only the Pods but anything else it created).
True. When the label used by a ReplicaSet in its selector field is removed from a Pod, that Pod is no longer controlled by the ReplicaSet, and the ReplicaSet will create a new Pod to compensate for the one it "lost".
kubernetes.io: "NetworkPolicies are an application-centric construct which allow you to specify how a pod is allowed to communicate with various network "entities"..."
In simpler words, Network Policies specify how pods are allowed/disallowed to communicate with each other and/or other network endpoints.
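A minimal NetworkPolicy sketch (the names and labels are made up): it allows ingress to Pods labeled app=backend only from Pods labeled app=frontend, on TCP port 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```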
False. By default pods are non-isolated.
Denied. Both the source and destination policies have to allow the traffic for it to be allowed.
It has three main parts:
YAML
kubectl get deployment [deployment_name] -o yaml
etcd
True
True
True
Namespaces allow you to split your cluster into virtual clusters where you can group your applications in a way that makes sense and is completely separated from the other groups (so you can, for example, create an app with the same name in two different namespaces).
When using the default namespace alone, it becomes hard over time to get an overview of all the applications you manage in your cluster. Namespaces make it easier to organize the applications into groups that makes sense, like a namespace of all the monitoring applications and a namespace for all the security applications, etc.
Namespaces can also be useful for managing Blue/Green environments where each namespace can include a different version of an app and also share resources that are in other namespaces (namespaces like logging, monitoring, etc.).
Another use case for namespaces is one cluster, multiple teams. When multiple teams use the same cluster, they might end up stepping on each other's toes. For example, if they end up creating an app with the same name, one of the teams will override the app of the other team, because there can't be two apps in Kubernetes with the same name (in the same namespace).
False. When a namespace is deleted, the resources in that namespace are deleted as well.
kubectl get namespaces
kubectl config view | grep namespace
It holds information on heartbeats of nodes. Each node gets an object which holds information about its availability.
One way is by running kubectl create namespace [NAMESPACE_NAME]
Another way is by using namespace configuration file:
apiVersion: v1
kind: Namespace
metadata:
  name: some-namespace
Any resource you create while using Kubernetes.
True. With namespaces you can limit CPU, RAM and storage usage.
kubectl config set-context --current --namespace=some-namespace
and validate with kubectl config view --minify | grep namespace:
OR
kubens some-namespace
kubectl create quota some-quota --hard-cpu=2,pods=2
Service.
No, you would have to create a separate one in the y namespace.
apiVersion: v1
kind: ConfigMap
metadata:
name: some-configmap
data:
some_url: samurai.jack
It's referencing the service "samurai" in the namespace called "jack".
Volume and Node.
kubectl api-resources --namespaced=true
One way is by specifying --namespace like this: kubectl apply -f my_component.yaml --namespace=some-namespace
Another way is by specifying it in the YAML itself:
apiVersion: v1
kind: ConfigMap
metadata:
name: some-configmap
namespace: some-namespace
and you can verify with: kubectl get configmap -n some-namespace
kubectl exec
does?kubectl get all
does?kubectl get pod
does?kubectl get all | grep [APP_NAME]
kubectl apply -f [file]
does?kubectl api-resources --namespaced=false
does?
Lists the components that aren't bound to a namespace.
kubectl describe pod pod_name
kubectl exec some-pod -it -- ls
kubectl expose deploy some-deployment --port=80 --target-port=8080
kubectl run nginx --image=nginx --restart=Never --port 80 --expose
kubectl create deployment kubernetes-httpd --image=httpd
kubectl scale deploy some-deployment --replicas=8
kubectl api-resources --namespaced=false
kubectl delete pods --field-selector=status.phase!='Running'
kubectl logs [pod-name]
command does?
kubectl describe pod [pod name]
command does?
kubectl top pod
kubectl get componentstatus
does?Outputs the status of each of the control plane components.
Minikube is a lightweight Kubernetes implementation. It creates a local virtual machine and deploys a simple (single node) cluster.
Start by inspecting the pods' status. We can use the command kubectl get pods
(--all-namespaces for pods in system namespace)
If we see "Error" status, we can keep debugging by running the command kubectl describe pod [name]
. In case we still don't see anything useful we can try stern for log tailing.
In case we find out there was a temporary issue with the pod or the system, we can try restarting the pod with the following kubectl scale deployment [name] --replicas=0
Setting the replicas to 0 will shut down the process. Now start it with kubectl scale deployment [name] --replicas=1
They become candidates for termination.
False. CPU is a compressible resource while memory is a non-compressible resource - once a container reaches its memory limit, it will be terminated.
Explained here
"Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop."
The process of managing stateful applications in Kubernetes isn't as straightforward as managing stateless applications, where reaching the desired state and upgrades are both handled the same way for every replica. In stateful applications, upgrading each replica might require different handling due to the stateful nature of the app; each replica might be in a different state. As a result, we often need a human operator to manage stateful applications. A Kubernetes Operator is supposed to assist with this.
This also helps with automating a standard process on multiple Kubernetes clusters.
It uses the control loop used by Kubernetes in general. It watches for changes in the application state. The difference is that it uses a custom control loop.
In addition, it also makes use of CRDs (Custom Resource Definitions), so basically it extends the Kubernetes API.
True
An open source toolkit used to manage Kubernetes-native applications, called operators, in an automated and efficient way.
It's part of the Operator Framework, used for managing the lifecycle of operators. It basically extends Kubernetes so a user can use a declarative way to manage operators (installation, upgrade, ...).
It includes:
Secrets let you store and manage sensitive information (passwords, ssh keys, etc.)
kubectl create secret generic some-secret --from-literal=password='donttellmypassword'
kubectl create secret generic some-secret --from-file=/some/file.txt
type: Opaque
in a secret file means? What other types are there?Opaque is the default type used for key-value pairs.
False. Some known security mechanisms like "encryption" aren't enabled by default.
apiVersion: v1
kind: Secret
metadata:
name: some-secret
type: Opaque
stringData:  # stringData accepts plain text; with "data" the value must be base64-encoded
  password: mySecretPassword
kubectl apply -f some-secret.yaml
spec:
  containers:
    - name: some-container  # illustrative container name
      env:
        - name: USER_PASSWORD
          valueFrom:
            secretKeyRef:
              name: some-secret
              key: password
False
Persistent Volumes allow us to save data so basically they provide storage that doesn't depend on the pod lifecycle.
True
False
Role
and RoleBinding"
objectsRole
and ClusterRole
objects?Kubernetes.io: "A service account provides an identity for processes that run in a Pod."
An example of when to use one: you define a pipeline that needs to build and push an image. In order to have sufficient permissions to build and push an image, that pipeline would require a service account with sufficient permissions.
The pod is automatically assigned with the default service account (in the namespace where the pod is running).
kubectl get serviceaccounts
Namespaces will allow to limit resources and also make sure there are no collisions between teams when working in the cluster (like creating an app with the same name).
Separate configuration from pods. It's good for cases where you might need to change configuration at some point but you don't want to restart the application or rebuild the image, so you create a ConfigMap and connect it to a pod, but externally to the pod.
Overall it's good for:
False. Use secret.
Scale the number of pods automatically on observed CPU utilization.
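A minimal HorizontalPodAutoscaler sketch (the Deployment name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: some-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: some-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```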
The control plane component kube-scheduler asks the following questions,
View more here
Package manager for Kubernetes. Basically the ability to package YAML files and distribute them to other users and apply them in different clusters.
Sometimes when you would like to deploy a certain application to your cluster, you need to create multiple YAML files/components like: Secret, Service, ConfigMap, etc. This can be a tedious task. So it would make sense to ease the process by introducing something that will allow us to share these bundles of YAMLs every time we would like to add an application to our cluster. This something is called Helm.
A common scenario is having multiple Kubernetes clusters (prod, dev, staging). Instead of individually applying different YAMLs in each cluster, it makes more sense to create one Chart and install it in every cluster.
Helm Charts is a bundle of YAML files. A bundle that you can consume from repositories or create your own and publish it to the repositories.
It is useful for scenarios where you have multiple applications and all are similar, so there are minor differences in their configuration files and most values are the same. With Helm you can define a common blueprint for all of them and the values that are not fixed and change can be placeholders. This is called a template file and it looks similar to the following
apiVersion: v1
kind: Pod
metadata:
name: {{ .Values.name }}
spec:
containers:
- name: {{ .Values.container.name }}
image: {{ .Values.container.image }}
port: {{ .Values.container.port }}
The values themselves will be in a separate file:
name: some-app
container:
name: some-app-container
image: some-app-image
port: 1991
someChart/ -> the name of the chart
Chart.yaml -> meta information on the chart
values.yaml -> values for template files
charts/ -> chart dependencies
templates/ -> template files :)
helm search hub [some_keyword]
Or directly on the command line: helm install --set some_key=some_value
Helm allows you to upgrade, remove and roll back to previous versions of charts. In Helm 2 this was handled with a server-side component known as "Tiller". In Helm 3, Tiller was removed due to security concerns.
"Submariner enables direct networking between pods and services in different Kubernetes clusters, either on premise or in the cloud."
You can learn more here
In statically typed languages the variable type is known at compile-time instead of at run-time. Such languages are: C, C++ and Java.
An expression is anything that results in a value (even if the value is None). Basically, any sequence of literals so, you can say that a string, integer, list, ... are all expressions.
Statements are instructions executed by the interpreter like variable assignments, for loops and conditionals (if-else).
SOLID design principles are about:
SOLID is:
True
It's a search algorithm used with sorted arrays/lists to find a target value by repeatedly dividing the array in half and comparing the middle value to the target value. If the middle value is smaller than the target value, then the target value is searched in the right part of the divided array, else in the left side. This continues until the value is found or the array can't be divided any further.
The average performance of the above algorithm is O(log n). Best case performance is O(1) and worst case is O(log n).
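A small Python sketch of the algorithm described above:

```python
def binary_search(sorted_li, target):
    """Return the index of target in sorted_li, or -1 if it isn't there."""
    low, high = 0, len(sorted_li) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_li[mid] == target:
            return mid
        if sorted_li[mid] < target:
            low = mid + 1      # target can only be in the right half
        else:
            high = mid - 1     # target can only be in the left half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```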
access
, search
insert
and remove
for the following data structures:There are multiple ways to detect a loop in a linked list. I'll mention three here:
Worst solution:
Two pointers where one points to the head and one points to the last node. Each time you advance the last pointer by one and check whether the distance between head pointer to the moved pointer is bigger than the last time you measured the same distance (if not, you have a loop).
The reason it's probably the worst solution, is because time complexity here is O(n^2)
Decent solution:
Create a hash table and start traversing the linked list. Every time you move, check whether the node you moved to is in the hash table. If it isn't, insert it into the hash table. If at any point you do find the node in the hash table, it means you have a loop. When you reach None/Null, it's the end and you can return a "no loop" value. This one is very easy to implement (just create a hash table, update it and check whether the node is in the hash table every time you move to the next node), but since the auxiliary space is O(n) because you create a hash table, it's not the best solution.
Good solution:
Instead of creating a hash table to document which nodes in the linked list you have visited, as in the previous solution, you can modify the Linked List (or the Node to be precise) to have a "visited" attribute. Every time you visit a node, you set "visited" to True.
Time complexity is O(n) and auxiliary space is O(1), so it's a good solution, but the only problem is that you have to modify the Linked List.
Best solution:
You set two pointers to traverse the linked list from the beginning. You move one pointer by one each time and the other pointer by two. If at any point they meet, you have a loop. This solution is also called "Floyd's Cycle-Finding"
Time complexity is O(n) and auxiliary space is O(1). Perfect :)
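A small Python sketch of Floyd's cycle-finding on a minimal Node class (the class itself is illustrative):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_loop(head):
    """Floyd's cycle-finding: slow moves one step, fast moves two."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False

# quick check: a -> b -> c -> a (loop), then break the loop
a, b, c = Node('a'), Node('b'), Node('c')
a.next, b.next, c.next = b, c, a
print(has_loop(a))   # True
c.next = None
print(has_loop(a))   # False
```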
def find_triplets_sum_to_zero(li):
li = sorted(li)
for i, val in enumerate(li):
low, up = 0, len(li)-1
while low < i < up:
tmp = val + li[low] + li[up]
if tmp > 0:
up -= 1
elif tmp < 0:
low += 1
else:
yield li[low], val, li[up]
low += 1
up -= 1
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Identify the data type | Data Types | Exercise | Solution | |
Identify the data type - Advanced | Data Types | Exercise | Solution | |
Reverse String | Strings | Exercise | Solution | |
Compress String | Strings | Exercise | Solution |
1. It is a high-level general-purpose programming language created in 1991 by Guido van Rossum.
2. The language is interpreted, with CPython (written in C) being the most used/maintained implementation.
3. It is strongly typed. The typing discipline is duck typing and gradual.
4. Python focuses on readability and makes use of whitespace/indentation instead of brackets { }
5. The Python package manager is called pip ("pip installs packages"), having more than 200,000 available packages.
6. Python comes with pip installed and a big standard library that offers the programmer many pre-cooked solutions.
7. In python **Everything** is an object.
List
Dictionary
Set
Numbers (int, float, ...)
String
Bool
Tuple
Frozenset
Mutability determines whether you can modify an object of specific type.
The mutable data types are:
List
Dictionary
Set
The immutable data types are:
Numbers (int, float, ...)
String
Bool
Tuple
Frozenset
bool("") -> evaluates to False
bool(" ") -> evaluates to True
[] is not []
? explain the resultIt evaluates to True.
The reason is that the two created empty lists are different objects. x is y
only evaluates to true when x and y are the same object.
True-True
?0
True
char = input("Insert a character: ")
if char == "a" or char == "o" or char == "e" or char =="u" or char == "i":
print("It's a vowel!")
char = input("Insert a character: ") # For readablity
if lower(char[0]) in "aieou": # Takes care of multiple characters and separate cases
print("It's a vowel!")
OR
if lower(input("Insert a character: ")[0]) in "aieou": # Takes care of multiple characters and small/Capital cases
print("It's a vowel!")
def sum(a, b):
return (a + b)
In general, first-class objects in programming languages are objects which can be assigned to a variable, used as a return value, and passed as arguments or parameters.
In python you can treat functions this way. Let's say we have the following function
def my_function():
return 5
You can then assign a function to a variable like this x = my_function
or you can return functions as return values like this return my_function
By definition inheritance is the mechanism where an object acts as a base of another object, retaining all its
properties.
So if Class B inherits from Class A, every characteristics from class A will be also available in class B.
Class A would be the 'Base class' and B class would be the 'derived class'.
This comes handy when you have several classes that share the same functionalities.
The basic syntax is:
class Base: pass
class Derived(Base): pass
A more elaborate example:
class Animal:
def __init__(self):
print("and I'm alive!")
def eat(self, food):
print("ñom ñom ñom", food)
class Human(Animal):
def __init__(self, name):
print('My name is ', name)
super().__init__()
def write_poem(self):
print('Foo bar bar foo foo bar!')
class Dog(Animal):
def __init__(self, name):
print('My name is', name)
super().__init__()
def bark(self):
print('woof woof')
michael = Human('Michael')
michael.eat('Spam')
michael.write_poem()
bruno = Dog('Bruno')
bruno.eat('bone')
bruno.bark()
>>> My name is Michael
>>> and I'm alive!
>>> ñom ñom ñom Spam
>>> Foo bar bar foo foo bar!
>>> My name is Bruno
>>> and I'm alive!
>>> ñom ñom ñom bone
>>> woof woof
Calling super() calls the base class method; thus, by calling super().__init__() we called Animal's __init__.
There is a more advanced python feature called MetaClasses that aid the programmer to directly control class creation.
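A tiny, illustrative metaclass (the class names are made up): it rewrites the name of every class it creates, just to show that class creation itself can be hooked:

```python
class ShoutyMeta(type):
    def __new__(mcls, name, bases, namespace):
        # runs when the class statement is executed, before any instance exists
        return super().__new__(mcls, name.upper(), bases, namespace)

class quiet(metaclass=ShoutyMeta):
    pass

print(quiet.__name__)   # QUIET
```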
In the following block of code x
is a class attribute while self.y
is an instance attribute
class MyClass(object):
x = 1
def __init__(self, y):
self.y = y
# Note that you generally don't need to know the compiling process but knowing where everything comes from
# and giving complete answers shows that you truly know what you are talking about.
Generally, every compiling process has two steps:
- Analysis
- Code Generation.
Analysis can be broken into:
1. Lexical analysis (Tokenizes source code)
2. Syntactic analysis (Check whether the tokens are legal or not, tldr, if syntax is correct)
for i in 'foo'
^
SyntaxError: invalid syntax
We missed ':'
3. Semantic analysis (Contextual analysis, legal syntax can still trigger errors, did you try to divide by 0,
hash a mutable object or use an undeclared function?)
1/0
ZeroDivisionError: division by zero
These three analysis steps are responsible for error handling.
The second step is responsible for syntax errors, the most common kind of error.
The third step is responsible for exceptions.
As we have seen, Exceptions are semantic errors, there are many builtin Exceptions:
ImportError
ValueError
KeyError
FileNotFoundError
IndentationError
IndexError
...
You can also have user defined Exceptions that have to inherit from the `Exception` class, directly or indirectly.
Basic example:
class DividedBy2Error(Exception):
def __init__(self, message):
self.message = message
def division(dividend,divisor):
if divisor == 2:
raise DividedBy2Error('I dont want you to divide by 2!')
return dividend / divisor
division(100, 2)
>>> __main__.DividedBy2Error: I dont want you to divide by 2!
Exceptions: Errors detected during execution are called Exceptions.
Handling Exception: When an error occurs, or exception as we call it, Python will normally stop and generate an error message.
Exceptions can be handled using try
and except
statement in python.
Example: Following example asks the user for input until a valid integer has been entered.
If the user enters a non-integer value, it will raise an exception, and using except it will catch that exception and ask the user to enter a valid integer again.
while True:
try:
a = int(input("please enter an integer value: "))
break
except ValueError:
print("Ops! Please enter a valid integer value.")
For more details about errors and exceptions follow this https://docs.python.org/3/tutorial/errors.html
def true_or_false():
try:
return True
finally:
return False
A lambda
expression is an 'anonymous' function; the difference from a normal function defined using the keyword `def` is the syntax and usage.
The syntax is:
lambda [parameters]: [expression]
Examples:
x = lambda a: a + 10
print(x(10))
addition = lambda x, y: x + y
print(addition(10, 20))
square = lambda x : x ** 2
print(square(5))
Generally it is considered bad practice under PEP 8 to assign a lambda expression to a name; they are meant to be used as parameters and inside other defined functions.
x, y = y, x
First you ask the user for the amount of numbers that will be used. Use a while loop that runs until amount_of_numbers becomes 0, subtracting 1 from amount_of_numbers on each iteration. In the while loop you ask the user for a number, which is added to a variable each time the loop runs.
def return_sum():
amount_of_numbers = int(input("How many numbers? "))
total_sum = 0
while amount_of_numbers != 0:
num = int(input("Input a number. "))
total_sum += num
amount_of_numbers -= 1
return total_sum
li = [2, 5, 6]
print("{0:.3f}".format(sum(li)/len(li)))
A tuple is a built-in data type in Python. It's used for storing multiple items in a single variable.
A list, as opposed to a tuple, is a mutable data type. It means we can modify it and add items to it.
x = [1, 2, 3]
x.append(2)
len(some_list)
some_list[-1]
Don't use append unless you would like the whole list added as a single item; use extend to add the individual elements.
my_list[0:3] = []
Maximum: max(some_list)
Minimum: min(some_list)
sorted(some_list, reverse=True)[:3]
Or
some_list.sort(reverse=True)
some_list[:3]
sorted_li = sorted(li, key=len)
Or without creating a new list:
li.sort(key=len)
sorted(list) will return a new list (original list doesn't change)
list.sort() will return None but the list is changed in-place
sorted() works on any iterable (Dictionaries, Strings, ...)
list.sort() is faster than sorted(list) in case of Lists
[['1', '2', '3'], ['4', '5', '6']]
nested_li = [['1', '2', '3'], ['4', '5', '6']]
[[int(x) for x in li] for li in nested_li]
sorted(li1 + li2)
Another way:
i, j = 0, 0
merged_li = []
while i < len(li1) and j < len(li2):
    if li1[i] < li2[j]:
        merged_li.append(li1[i])
        i += 1
    else:
        merged_li.append(li2[j])
        j += 1
merged_li = merged_li + li1[i:] + li2[j:]
There are many ways of solving this problem.
# Note: the `:list` and `-> bool` annotations are just Python type hints; they are not needed for the correct execution of the algorithm.
Taking advantage of sets and len:
def is_unique(l:list) -> bool:
return len(set(l)) == len(l)
This approach can also be seen in other programming languages:
def is_unique2(l:list) -> bool:
seen = []
for i in l:
if i in seen:
return False
seen.append(i)
return True
Here we just count and make sure every element is just repeated once.
def is_unique3(l:list) -> bool:
for i in l:
if l.count(i) > 1:
return False
return True
This one might look more convoluted but hey, one-liners:
def is_unique4(l:list) -> bool:
return all(map(lambda x: l.count(x) < 2, l))
def my_func(li = []):
li.append("hmm")
print(li)
If we call it 3 times, what would be the result of each call?
['hmm']
['hmm', 'hmm']
['hmm', 'hmm', 'hmm']
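The reason is that the default list is created once, at function definition time, and the same object is reused on every call. A common fix (shown here as an illustrative variant, not part of the original question) is to default to None:

def my_func(li=None):
    # create a fresh list per call instead of sharing one default object
    if li is None:
        li = []
    li.append("hmm")
    print(li)

my_func()   # ['hmm']
my_func()   # ['hmm'] -- no accumulation this time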
for item in some_list:
print(item)
for i, item in enumerate(some_list):
print(i)
Using range like this
for i in range(1, len(some_list)):
some_list[i]
Another way is using slicing
for i in some_list[1:]:
Method 1
for i in reversed(li):
...
Method 2
n = len(li) - 1
while n >= 0:
    ...
    n -= 1
li = [[1, 4], [2, 1], [3, 9], [4, 2], [4, 5]]
sorted(li, key=lambda l: l[1])
or
li.sort(key=lambda l: l[1])
nums = [1, 2, 3]
letters = ['x', 'y', 'z']
list(zip(nums, letters))
[{'name': 'Mario', 'food': ['mushrooms', 'goombas']}, {'name': 'Luigi', 'food': ['mushrooms', 'turtles']}]
Extract all types of food. The final output should be: {'mushrooms', 'goombas', 'turtles'}
brothers_menu = \
[{'name': 'Mario', 'food': ['mushrooms', 'goombas']}, {'name': 'Luigi', 'food': ['mushrooms', 'turtles']}]
# "Classic" Way
def get_food(brothers_menu) -> set:
temp = []
for brother in brothers_menu:
for food in brother['food']:
temp.append(food)
return set(temp)
# One liner way (Using list comprehension)
set(food for bro in brothers_menu for food in bro['food'])
my_dict = dict(x=1, y=2)
OR
my_dict = {'x': 1, 'y': 2}
OR
my_dict = dict([('x', 1), ('y', 2)])
del my_dict['some_key']
You can also use my_dict.pop('some_key'), which returns the value of the removed key.
{k: v for k, v in sorted(x.items(), key=lambda item: item[1])}
dict(sorted(some_dictionary.items()))
some_dict1.update(some_dict2)
{'a': {'b': {'c': 1}}}
from functools import reduce

output = {}
string = "a.b.c"
path = string.split('.')
target = reduce(lambda d, k: d.setdefault(k, {}), path[:-1], output)
target[path[-1]] = 1
print(output)
with open('file.txt', 'w') as file:
file.write("My insightful comment")
import json
with open('file.json', 'w') as f:
f.write(json.dumps(dict_var))
import os
print(os.getcwd())
/dir1/dir2/file1
print the file name (file1)
import os
print(os.path.basename('/dir1/dir2/file1'))
# Another way
print(os.path.split('/dir1/dir2/file1')[1])
/dir1/dir2/file1
import os
## Part 1.
# os.path.dirname gives path removing the end component
dirpath = os.path.dirname('/dir1/dir2/file1')
print(dirpath)
## Part 2.
print(os.path.basename(dirpath))
/home
and luigi
will result in /home/luigi
Using the re module
While you iterate through the characters, store them in a dictionary and check for every character whether it's already in the dictionary.
def firstRepeatedCharacter(str):
chars = {}
for ch in str:
if ch in chars:
return ch
else:
chars[ch] = 0
x = "itssssssameeeemarioooooo"
y = ''.join(set(x))
def permute_string(string):
if len(string) == 1:
return [string]
permutations = []
for i in range(len(string)):
swaps = permute_string(string[:i] + string[(i+1):])
for swap in swaps:
permutations.append(string[i] + swap)
return permutations
print(permute_string("abc"))
Short way (but probably not acceptable in interviews):
from itertools import permutations
[''.join(p) for p in permutations("abc")]
Detailed answer can be found here: http://codingshell.com/python-all-string-permutations
>> ', '.join(["One", "Two", "Three"])
>> " ".join("welladsadgadoneadsadga".split("adsadga")[:2])
>> "".join(["c", "t", "o", "a", "o", "q", "l"])[0::2]
>>> 'One, Two, Three'
>>> 'well done'
>>> 'cool'
x = "pizza"
, what would be the result of x[::-1]?
It returns the string reversed, i.e. 'azzip' (x itself is not modified).
"".join(["a", "h", "m", "a", "h", "a", "n", "q", "r", "l", "o", "i", "f", "o", "o"])[2::3]
mario
for i in range(3, 3):
print(i)
No output :)
yield
? When would you use it?
[['Mario', 90], ['Geralt', 82], ['Gordon', 88]]
How to sort the list by the numbers in the nested lists? One way is:
the_list.sort(key=lambda x: x[1])
For the following slicing exercises, assume you have the following list: my_list = [8, 2, 1, 10, 5, 4, 3, 9]
pdb :D
return
returns?
Short answer: it returns a None object.
We could go a bit deeper and explain the difference between
def a ():
return
>>> None
And
def b():
pass
>>> None
Or we could be asked this as a following question, since they both give the same result.
We could use the dis module to see what's going on:
2 0 LOAD_CONST 0 (<code object a at 0x0000029C4D3C2DB0, file "<dis>", line 2>)
2 LOAD_CONST 1 ('a')
4 MAKE_FUNCTION 0
6 STORE_NAME 0 (a)
5 8 LOAD_CONST 2 (<code object b at 0x0000029C4D3C2ED0, file "<dis>", line 5>)
10 LOAD_CONST 3 ('b')
12 MAKE_FUNCTION 0
14 STORE_NAME 1 (b)
16 LOAD_CONST 4 (None)
18 RETURN_VALUE
Disassembly of <code object a at 0x0000029C4D3C2DB0, file "<dis>", line 2>:
3 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
Disassembly of <code object b at 0x0000029C4D3C2ED0, file "<dis>", line 5>:
6 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
An empty return
is exactly the same as return None
and functions without any explicit return will always return None regardless of the operations, therefore
def sum(a, b):
global c
c = a + b
>>> None
li = []
for i in range(1, 10):
li.append(i)
[i for i in range(1, 10)]
def is_int(num):
if isinstance(num, int):
print('Yes')
else:
print('No')
What would be the result of is_int(2) and is_int(False)?
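Both calls print 'Yes', because bool is a subclass of int in Python:

is_int(2)        # Yes
is_int(False)    # Yes -- isinstance(False, int) is True because bool subclasses int
print(issubclass(bool, int))   # True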
The reason we need to implement it in the first place is that a linked list isn't part of the Python standard library.
To implement a linked list, we have to implement two structures: The linked list itself and a node which is used by the linked list.
Let's start with a node. A node has some value (the data it holds) and a pointer to the next node
class Node(object):
def __init__(self, data):
self.data = data
self.next = None
Now the linked list. An empty linked list has nothing but an empty head.
class LinkedList(object):
def __init__(self):
self.head = None
Now we can start using the linked list
ll = LinkedList()
ll.head = Node(1)
ll.head.next = Node(2)
ll.head.next.next = Node(3)
What we have is:
| 1 | -> | 2 | -> | 3 |
Usually, more methods are implemented, like a push_head() method where you insert a node at the beginning of the linked list
def push_head(self, value):
new_node = Node(value)
new_node.next = self.head
self.head = new_node
def print_list(self):
    node = self.head
    while node:
        print(node.data)
        node = node.next
Let's use Floyd's cycle-finding algorithm (a slow pointer and a fast pointer):
def loop_exists(self):
    one_step_p = self.head
    two_steps_p = self.head
    while one_step_p and two_steps_p and two_steps_p.next:
        one_step_p = one_step_p.next
        two_steps_p = two_steps_p.next.next
        if one_step_p == two_steps_p:
            return True
    return False
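A quick usage sketch, assuming loop_exists is added as a method of the LinkedList class above:

ll = LinkedList()
ll.head = Node(1)
ll.head.next = Node(2)
ll.head.next.next = Node(3)
print(ll.loop_exists())            # False -- the list ends at 3

ll.head.next.next.next = ll.head   # create a cycle: 3 points back to 1
print(ll.loop_exists())            # True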
PEP8 is a list of coding conventions and style guidelines for Python
5 style guidelines:
1. Limit all lines to a maximum of 79 characters.
2. Surround top-level function and class definitions with two blank lines.
3. Use a trailing comma when making a tuple of one element
4. Use spaces (and not tabs) for indentation
5. Use 4 spaces per indentation level
assert
does in Python?
assert
in non-test/production code?
x = [1, 2, 3]
, what is the result of list(zip(x))?
[(1,), (2,), (3,)]
list(zip(range(5), range(50), range(50)))
list(zip(range(5), range(50), range(-2)))
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
[] (zip stops at the shortest iterable, and range(-2) is empty)
a.num2
assuming the following code
class B:
    def __get__(self, obj, objtype=None):
        return 10

class A:
    num1 = 2
    num2 = B()
Accessing a.num2 invokes B.__get__, so the result is 10 (a.num1 is simply 2).
some_car = Car("Red", 4)
assuming the following code
class Print:
    def __get__(self, obj, objtype=None):
        value = obj._color
        print("Color was set to {}".format(value))
        return value

    def __set__(self, obj, value):
        print("The color of the car is {}".format(value))
        obj._color = value

class Car:
    color = Print()

    def __init__(self, color, age):
        self.color = color
        self.age = age
Creating some_car = Car("Red", 4) triggers Print.__set__ through the assignment self.color = color, so "The color of the car is Red" is printed.
def add(num1, num2):
return num1 + num2
def sub(num1, num2):
return num1 - num2
def mul(num1, num2):
return num1*num2
def div(num1, num2):
return num1 / num2
operators = {
'+': add,
'-': sub,
'*': mul,
'/': div
}
if __name__ == '__main__':
operator = str(input("Operator: "))
num1 = int(input("1st number: "))
num2 = int(input("2nd number: "))
print(operators[operator](num1, num2))
This is a good reference https://docs.python.org/3/library/datatypes.html
def wee(word):
return word
def oh(f):
return f + " Ohh"
>>> oh(wee("Wee"))
<<< Wee Ohh
This allows us to control what happens before the execution of any given function, and if we add another wrapper function (a function that receives a function and returns a new function), we can also control what happens after the execution.
Sometimes we want to control the before-after execution of many functions and it would get tedious to write
f = function(function_1())
f = function(function_1(function_2(*args)))
every time. That's what decorators do: they introduce syntax to write all of this on the go, using the '@' keyword.
These two decorators (n_times and timer) are usually used to demonstrate decorator functionality; you can find them in lots of tutorials/reviews. I first saw these examples in PyData 2017. https://www.youtube.com/watch?v=7lmCu8wz8ro&t=3731s
Simple decorator:
def deco(f):
print(f"Hi I am the {f.__name__}() function!")
return f
@deco
def hello_world():
return "Hi, I'm in!"
a = hello_world()
print(a)
>>> Hi I am the hello_world() function!
Hi, I'm in!
This is the simplest decorator version; it basically saves us from writing a = deco(hello_world)().
But at this point we can only control what happens before the execution; let's take on the after:
def deco(f):
def wrapper(*args, **kwargs):
print("Rick Sanchez!")
func = f(*args, **kwargs)
print("I'm in!")
return func
return wrapper
@deco
def f(word):
print(word)
a = f("************")
>>> Rick Sanchez!
************
I'm in!
deco receives a function -> f
wrapper receives the arguments -> *args, **kwargs
wrapper returns the result of calling the function with those arguments -> f(*args, **kwargs)
deco returns wrapper.
As you can see we conveniently do things before and after the execution of a given function.
For example, we could write a decorator that calculates the execution time of a function.
import time
def deco(f):
def wrapper(*args, **kwargs):
before = time.time()
func = f(*args, **kwargs)
after = time.time()
print(after-before)
return func
return wrapper
@deco
def f():
time.sleep(2)
print("************")
a = f()
>>> 2.0008859634399414
Or create a decorator that executes a function n times.
def n_times(n):
def wrapper(f):
def inner(*args, **kwargs):
for _ in range(n):
func = f(*args, **kwargs)
return func
return inner
return wrapper
@n_times(4)
def f():
print("************")
a = f()
>>>************
************
************
************
class Car:
def __init__(self, model, color):
self.model = model
self.color = color
def __eq__(self, other):
if not isinstance(other, Car):
return NotImplemented
return self.model == other.model and self.color == other.color
>> a = Car('model_1', 'red')
>> b = Car('model_2', 'green')
>> c = Car('model_1', 'red')
>> a == b
False
>> a == c
True
tail
command in Python? Bonus: implement head
as well
Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability".
This approach requires a human to always check why the value was exceeded and how to handle it, while today it is more effective to notify people only when they need to take an actual action. If the issue doesn't require any human intervention, then the problem can be fixed by some processes running in the relevant environment.
Alerts
Tickets
Logging
From Prometheus documentation: "if you need 100% accuracy, such as for per-request billing".
Prometheus server is responsible for scraping and storing the data
Push gateway is used for short-lived jobs
Alert manager is responsible for alerts ;)
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
My first Commit | Commit | Exercise | Solution | |
Time to Branch | Branch | Exercise | Solution | |
Squashing Commits | Commit | Exercise | Solution |
You can check if there is a ".git" directory inside it.
git pull
and git fetch
? In short, git pull = git fetch + git merge
When you run git pull, it gets all the changes from the remote or central repository and applies them to your corresponding branch in your local repository.
git fetch gets all the changes from the remote repository and stores them in a separate branch in your local repository
git directory
, working directory
and staging area
The Git directory is where Git stores the meta data and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the index, but it’s becoming standard to refer to it as the staging area.
This answer taken from git-scm.com
First, you open the files which are in conflict and identify what the conflicts are.
Next, based on what is accepted in your company or team, you either discuss the conflicts with your colleagues or resolve them by yourself.
After resolving the conflicts, you add the files with `git add`.
Finally, you run `git rebase --continue`
git reset
and git revert
?git revert
creates a new commit which undoes the changes from last commit.
git reset
depends on the usage; it can modify the index or change the commit that the branch head is currently pointing at.
Using the git rebase
command
git rebase
?
Mentioning two or three should be enough and it's probably good to mention that 'recursive' is the default one.
recursive, resolve, ours, theirs
This page explains it the best: https://git-scm.com/docs/merge-strategies
`git diff`
git checkout HEAD~1 -- /path/of/the/file
.git
directory? What can you find there?.git
folder contains all the information that is necessary for your project in version control and all the information about commits, remote repository address, etc. All of them are present in this folder. It also contains a log that stores your commit history so that you can roll back to history.
This info copied from https://stackoverflow.com/questions/29217859/what-is-the-git-folder
You delete a remote branch with this syntax:
git push origin :[branch_name]
gitattributes allow you to define attributes per pathname or path pattern.
You can use it, for example, to control line endings in files. Windows and Unix-based systems use different characters for new lines (\r\n and \n respectively). So using gitattributes we can align it for both Windows and Unix with * text=auto
in .gitattributes for anyone working with git. This way, if you use the Git project in Windows you'll get \r\n and if you are using Unix or Linux, you'll get \n.
git checkout -- <file_name>
git reset HEAD~1
for removing the last commit. If you would like to also discard the changes, run `git reset --hard`
git rm
False. If you would like to keep a file on your filesystem, use git reset <file_name>
Probably good to mention that it's:
This is a great article about Octopus merge: http://www.freblogg.com/2016/12/git-octopus-merge.html
Go also has a good community.
var x int = 2
and x := 2
?The result is the same, a variable with the value 2.
With var x int = 2
we are setting the variable type to integer while with x := 2
we are letting Go figure out by itself the type.
False. We can't redeclare variables, but yes, we must use declared variables.
This should be answered based on your usage but some examples are:
func main() {
var x float32 = 13.5
var y int
y = x
}
package main
import "fmt"
func main() {
var x int = 101
var y string
y = string(x)
fmt.Println(y)
}
It looks up which Unicode code point 101 is (the character 'e') and uses it for converting the integer to a string. If you want to get "101" you should use the package "strconv" and replace y = string(x)
with y = strconv.Itoa(x)
package main
func main() {
var x = 2
var y = 3
const someConst = x + y
}
Constants in Go can only be declared using constant expressions. But x, y and their sum are variables, so compilation fails with the error: const initializer x + y is not a constant
package main
import "fmt"
const (
x = iota
y = iota
)
const z = iota
func main() {
fmt.Printf("%v\n", x)
fmt.Printf("%v\n", y)
fmt.Printf("%v\n", z)
}
Go's iota identifier is used in const declarations to simplify definitions of incrementing numbers. Because it can be used in expressions, it provides a generality beyond that of simple enumerations.
x and y are in the first iota group, z is in the second, so the output is 0, 1 and 0.
Iota page in Go Wiki
It avoids having to declare all the variables for the return values. It is called the blank identifier.
answer in SO
package main
import "fmt"
const (
_ = iota + 3
x
)
func main() {
fmt.Printf("%v\n", x)
}
Since the first iota is declared with the value 3
( + 3
), the next one has the value 4
package main
import (
"fmt"
"sync"
"time"
)
func main() {
var wg sync.WaitGroup
wg.Add(1)
go func() {
time.Sleep(time.Second * 2)
fmt.Println("1")
wg.Done()
}()
go func() {
fmt.Println("2")
}()
wg.Wait()
fmt.Println("3")
}
Output: 2 1 3
package main
import (
"fmt"
)
func mod1(a []int) {
for i := range a {
a[i] = 5
}
fmt.Println("1:", a)
}
func mod2(a []int) {
a = append(a, 125) // !
for i := range a {
a[i] = 5
}
fmt.Println("2:", a)
}
func main() {
s1 := []int{1, 2, 3, 4}
mod1(s1)
fmt.Println("1:", s1)
s2 := []int{1, 2, 3, 4}
mod2(s2)
fmt.Println("2:", s2)
}
Output:
1: [5 5 5 5]
1: [5 5 5 5]
2: [5 5 5 5 5]
2: [1 2 3 4]
In mod1, a shares the same underlying array as s1, so when we assign to a[i] we are changing the values of s1 as well.
In mod2, append creates a new slice (backed by a new array), so we are changing only a, not s2.
package main
import (
"container/heap"
"fmt"
)
// An IntHeap is a min-heap of ints.
type IntHeap []int
func (h IntHeap) Len() int { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x interface{}) {
// Push and Pop use pointer receivers because they modify the slice's length,
// not just its contents.
*h = append(*h, x.(int))
}
func (h *IntHeap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
func main() {
h := &IntHeap{4, 8, 3, 6}
heap.Init(h)
heap.Push(h, 7)
fmt.Println((*h)[0])
}
Output: 3
MongoDB advantages are as follows:
The main difference is that SQL databases are structured (data is stored in the form of tables with rows and columns, like an Excel spreadsheet table) while NoSQL is unstructured, and the data storage can vary depending on how the NoSQL DB is set up, such as key-value pair, document-oriented, etc.
db.books.find({"name": /abc/})
db.books.find().sort({x:1})
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
My First Project | Projects | Exercise | Solution |
OpenShift is a container orchestration platform based on Kubernetes.
It can be used for deploying applications while having minimal management overhead.
It's built on top of Kubernetes while defining its own custom resources in addition to the built ones.
False. OpenShift is a PaaS (platform as a service) solution.
The Scheduler.
Application high availability by spreading pod replicas between worker nodes
A project in OpenShift is a Kubernetes namespace with annotations.
In simpler words, think about it as an isolated environment for users to manage and organize their resources (like Pods, Deployments, Service, etc.).
oc get projects
will list all projects. The "STATUS" column can be used to see which projects are currently active.
oc adm policy add-role-to-user <role> <user> -n <project>
Federation
Management and deployment of services and workloads across multiple independent clusters from a single API
The time it takes for a disk to reach the place where the data is located and read a single block/sector.
Bonus question: What is the random seek time in SSD and magnetic disk?
Answer: Magnetic is about 10ms and SSD is somewhere between 0.08 and 0.16ms
Master node automatically restarts the pod unless it fails too often.
It's marked as bad by the master node and temporarily not restarted anymore.
oc get po -o wide
oc get serviceaccounts
A route exposes a service by giving it an externally reachable hostname.
False. It can run on any node.
From OpenShift Docs: "Similar to the way that RBAC resources control user access, administrators can use security context constraints (SCCs) to control permissions for pods".
oc adm policy add-role-to-user view user1 -n wonderland
oc whoami --show-context
A Replication Controller is responsible for ensuring the specified number of pods is running at all times.
If more pods are running than needed -> it deletes some of them
If not enough pods are running -> it creates more
A few examples:
You can have an entirely different answer. It's based only on your experience and preferences.
Note: write them in any language you prefer
EXAMPLE ONE
#!/bin/bash
SERVERIP=<IP Address>
NOTIFYEMAIL=test@example.com
ping -c 3 $SERVERIP > /dev/null 2>&1
if [ $? -ne 0 ]
then
# Use mailer here:
mailx -s "Server $SERVERIP is down" -t "$NOTIFYEMAIL" < /dev/null
fi
EXAMPLE ONE
#! /bin/bash
for x in *
do
if [ -s $x ]
then
continue
else
rm -rf $x
fi
done
Name | Topic | Objective & Instructions | Solution | Comments |
---|---|---|---|---|
Functions vs. Comparisons | Query Improvements | Exercise | Solution |
SQL (Structured Query Language) is a standard language for relational databases (like MySQL, MariaDB, ...).
It's used for reading, updating, removing and creating data in a relational database.
The main difference is that SQL databases are structured (data is stored in the form oftables with rows and columns - like an excel spreadsheet table) while NoSQL isunstructured, and the data storage can vary depending on how the NoSQL DB is set up, suchas key-value pair, document-oriented, etc.
SQL - Best used when data integrity is crucial. SQL is typically implemented with many businesses and areas within the finance field due to its ACID compliance.
NoSQL - Great if you need to scale things quickly. NoSQL was designed with web applications in mind, so it works great if you need to quickly spread the same information around to multiple servers.
Additionally, since NoSQL does not adhere to the strict table with columns and rows structure that relational databases require, you can store different data types together.
For these questions, we will be using the Customers and Orders tables shown below:
Customers
Customer_ID | Customer_Name | Items_in_cart | Cash_spent_to_Date |
---|---|---|---|
100204 | John Smith | 0 | 20.00 |
100205 | Jane Smith | 3 | 40.00 |
100206 | Bobby Frank | 1 | 100.20 |
ORDERS
Customer_ID | Order_ID | Item | Price | Date_sold |
---|---|---|---|---|
100206 | A123 | Rubber Ducky | 2.20 | 2019-09-18 |
100206 | A123 | Bubble Bath | 8.00 | 2019-09-18 |
100206 | Q987 | 80-Pack TP | 90.00 | 2019-09-20 |
100205 | Z001 | Cat Food - Tuna Fish | 10.00 | 2019-08-05 |
100205 | Z001 | Cat Food - Chicken | 10.00 | 2019-08-05 |
100205 | Z001 | Cat Food - Beef | 10.00 | 2019-08-05 |
100205 | Z001 | Cat Food - Kitty quesadilla | 10.00 | 2019-08-05 |
100204 | X202 | Coffee | 20.00 | 2019-04-29 |
Select *
From Customers;
Select Items_in_cart
From Customers
Where Customer_Name = "John Smith";
Select SUM(Cash_spent_to_Date) as SUM_CASH
From Customers;
Select count(1) as Number_of_People_w_items
From Customers
where Items_in_cart > 0;
You would join them on the unique key. In this case, the unique key is Customer_ID in both the Customers table and the Orders table
Select c.Customer_Name, o.Item
From Customers c
Left Join Orders o
On c.Customer_ID = o.Customer_ID;
with cat_food as (
Select Customer_ID, SUM(Price) as TOTAL_PRICE
From Orders
Where Item like "%Cat Food%"
Group by Customer_ID
)
Select Customer_name, TOTAL_PRICE
From Customers c
Inner JOIN cat_food f
ON c.Customer_ID = f.Customer_ID
where c.Customer_ID in (Select Customer_ID from cat_food);
Although this was a simple statement, the "with" clause really shines when a complex query needs to be run on a table before joining to another. With statements are nice, because you create a pseudo temp table when running your query, instead of creating a whole new table.
The sum of all the purchases of cat food wasn't readily available, so we used a with statement to create the pseudo table to retrieve the sum of the prices spent by each customer, then joined the table normally.
SELECT count(*)
FROM shawarma_purchases
WHERE YEAR(purchased_at) = '2017'

vs.

SELECT count(*)
FROM shawarma_purchases
WHERE purchased_at >= '2017-01-01' AND purchased_at <= '2017-12-31'
SELECT count(*)
FROM shawarma_purchases
WHERE
  purchased_at >= '2017-01-01' AND
  purchased_at <= '2017-12-31'
When you use a function (YEAR(purchased_at)
) the database has to evaluate it for every row and scan the whole table, as opposed to using an index on the column as it is, in its natural state.
An availability set is a logical grouping of VMs that allows Azure to understand how your application is built to provide redundancy and availability. It is recommended that two or more VMs are created within an availability set to provide for a highly available application and to meet the 99.95% Azure SLA.
It's a monitoring service that provides threat protection across all of the services in Azure. More specifically, it:
Azure AD is a cloud-based identity service. You can use it as a standalone service or integrate it with an existing Active Directory service you are already running.
startup-script
Fun fact: Anthos means "flower" in Greek; flowers grow in the ground (earth) but need rain from the clouds to flourish.
On GCP the Kubernetes api-server is the only control plane component exposed to customers, whilst Compute Engine manages instances in the project.
It is a core component of the Anthos stack which provides platform, service and security operators with a single, unified approach to multi-cluster management that spans both on-premises and cloud environments. It closely follows K8s best practices, favoring declarative approaches over imperative operations, and actively monitors cluster state and applies the desired state as defined in Git. It includes three key components as follows:
It follows common modern software development practices which makes cluster configuration, management and policy changes auditable, revertable, and versionable easily enforcing IT governance and unifying resource management in an organisation.
It is part of the Anthos stack that brings a serverless container experience to Anthos, offering a high-level platform experience on top of K8s clusters. It is built with Knative, an open-source operator for K8s that brings serverless application serving and eventing capabilities.
Platform teams in organisations that wish to offer developers additional tools to test, deploy and run applications can use Knative to enhance this experience on Anthos as Cloud Run. Below are some of the benefits;
You can read about TripleO right here
There are many reasons for that. One example: you can't remove a router if there are active ports assigned to it.
Not by default. Object Storage API limits the maximum to 5GB per object but it can be adjusted.
False. Two objects can have the same name if they are in different containers.
Using:
A list of services and their endpoints
Codefresh definition: "Zero trust is a security concept that is centered around the idea that organizations should never trust anyone or anything that does not originate from their domains. Organizations seeking zero trust automatically assume that any external services it commissions have security breaches and may leak sensitive information"
Authentication is the process of identifying whether a service or a person is who they claim to be. Authorization is the process of identifying what level of access the service or the person has (after authentication is done).
SSO (Single Sign-on), is a method of access control that enables a user to log in once and gain access to the resources of multiple software systems without being prompted to log in again.
Multi-Factor Authentication (2FA is a common form of it). It requires the user to present two or more pieces of evidence (credentials) when logging into an account.
Access control based on user roles (i.e., a collection of access authorizations a user receives based on an explicit or implicit assumption of a given role). Role permissions may be inherited through a role hierarchy and typically reflect the permissions needed to perform defined functions within an organization. A given role may apply to a single individual or to several individuals.
Wikipedia Definition: "SSH or Secure Shell is a cryptographic network protocol for operating network services securely over an unsecured network."
Hostinger.com Definition: "SSH, or Secure Shell, is a remote administration protocol that allows users to control and modify their remote servers over the Internet."
This site explains it in a good way.
Symmetric encryption is any technique where the same key is used to both encrypt and decrypt the data/entire communication.
Asymmetric encryption is any technique where two different keys are used for encryption and decryption; these keys are known as the public key and the private key.
Wikipedia: "Key exchange (also key establishment) is a method in cryptography by which cryptographic keys are exchanged between two parties, allowing use of a cryptographic algorithm."
False. This description fits the asymmetrical encryption.
True. It is only used during the key exchange algorithm of symmetric encryption.
Hashes are used in SSH to verify the authenticity of messages and to verify that nothing tampered with the data received.
Cross-Site Scripting (XSS) is a type of attack in which the attacker inserts browser-executable code within an HTTP response. The injected attack is not stored in the web application; it only affects users who open the maliciously crafted link or third-party web page. A successful attack allows the attacker to access any cookies, session tokens, or other sensitive information retained by the browser and used with that site
You can test by detecting user-defined variables and how to input them. This includes hidden or non-obvious inputs such as HTTP parameters, POST data, hidden form field values, and predefined radio or selection values. You then analyze each found vector to see if there are potential vulnerabilities; when found, you craft input data for each input vector. Then you test the crafted input and see if it works.
SQL injection is an attack that consists of inserting either a partial or a full SQL query through data input from the browser to the web application. A successful SQL injection allows the attacker to read sensitive information stored in the database of the web application.
You can test by using a stored procedure, since the application must sanitize the user input to get rid of the risk of code injection. If it doesn't, the user could enter bad SQL that will then be executed within the procedure
DNS spoofing occurs when a particular DNS server's records are "spoofed" or altered maliciously to redirect traffic to the attacker. This redirection of traffic allows the attacker to spread malware, steal data, etc.
Prevention
Stuxnet is a computer worm that was originally aimed at Iran’s nuclear facilities and has since mutated and spread to other industrial and energy-producing facilities. The original Stuxnet malware attack targeted the programmable logic controllers (PLCs) used to automate machine processes. It generated a flurry of media attention after it was discovered in 2010 because it was the first known virus to be capable of crippling hardware and because it appeared to have been created by the U.S. National Security Agency, the CIA, and Israeli intelligence.
Spectre is an attack method which allows a hacker to “read over the shoulder” of a program it does not have access to. Using code, the hacker forces the program to pull up its encryption key allowing full access to the program
Cross-Site Request Forgery (CSRF) is an attack that makes the end user initiate an unwanted action on a web application in which the user has an authenticated session. The attacker may use an email to get the end user to click a link that then executes malicious actions. When a CSRF attack is successful, it compromises the end user's data
You can use OWASP ZAP to analyze a "request", and if it appears that there is no protection against cross-site request forgery when the Security Level is set to 0 (the value of csrf-token is SecurityIsDisabled), one can use data from this request to prepare a CSRF attack by using OWASP ZAP
HTTP header injection vulnerabilities occur when user input is insecurely included within server response headers. If an attacker can inject newline characters into the header, then they can inject new HTTP headers and also, by injecting an empty line, break out of the headers into the message body and write arbitrary content into the application's response.
A buffer overflow (or buffer overrun) occurs when the volume of data exceeds the storage capacity of the memory buffer. As a result, the program attempting to write the data to the buffer overwrites adjacent memory locations.
MAC address flooding attack (CAM table flooding attack) is a type of network attack where an attacker connected to a switch port floods the switch interface with a very large number of Ethernet frames with different fake source MAC addresses.
CPDoS or Cache Poisoned Denial of Service. It poisons the CDN cache. By manipulating certain header requests, the attacker forces the origin server to return a Bad Request error which is stored in the CDN’s cache. Thus, every request that comes after the attack will get an error page.
The Elastic Stack consists of:
Elasticserach, Logstash and Kibana are also known as the ELK stack.
From the official docs:
"Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents"
Beats are lightweight data shippers. These data shippers are installed on the client where the data resides. Examples of beats: Filebeat, Metricbeat, Auditbeat. There are many more.
From the official docs:
"Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices. You can easily perform advanced data analysis and visualize your data in a variety of charts, tables, and maps."
The process may vary based on the chosen architecture and the processing you may want to apply to the logs. One possible workflow is:
This is where data is stored and also where different processing takes place (e.g. when you search for data).
Part of a master node's responsibilities:
While there can be multiple master-eligible nodes, in reality only one of them is the elected master node.
A node which is responsible for parsing the data. In case you don't use Logstash, this node can receive data from beats and parse it, similarly to how it would be parsed in Logstash.
A coordinating node is responsible for routing requests in and out of the cluster (to the data nodes).
Index in Elastic is in most cases compared to a whole database from the SQL/NoSQL world.
You can choose to have one index to hold all the data of your app, or have multiple indices where each index holds a different type of data from your app (e.g. an index for each service your app is running).
The official docs also offer a great explanation (in general, it's really good documentation, as every project should have):
"An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data"
An index is split into shards and documents are hashed to a particular shard. Each shard may be on a different node in a cluster and each one of the shards is a self contained index.
This allows Elasticsearch to scale to an entire cluster of servers.
From the official docs:
"An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in."
Continuing with the comparison to SQL/NoSQL, a document in Elastic is a row in a table in the case of SQL, or a document in a collection in the case of NoSQL. As in NoSQL, a document is a JSON object which holds data on one unit of your app. What this unit is depends on your app: if your app is about books, each document describes a book; if your app is about shirts, each document is a shirt.
Red means some data is unavailable. Yellow can be caused by running a single-node cluster instead of a multi-node one (replica shards can't be allocated).
False. From the official docs:
"Each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees."
In a network/cloud environment where failures can be expected at any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason. To this end, Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.
Term Frequency is how often a term appears in a given document and Document Frequency is how often a term appears in all documents. They both are used for determining the relevance of a term by calculating Term Frequency / Document Frequency.
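A tiny illustrative calculation of both values in plain Python (simplified, not how Elastic computes scores internally; the documents and term below are made up):

docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]
term = "quick"

# Term frequency: how often the term appears in a single document
tf = docs[0].split().count(term)
# Document frequency: in how many documents the term appears at all
df = sum(1 for d in docs if term in d.split())
print(tf, df, tf / df)   # 1 2 0.5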
"The index is actively being written to".More about the phases here
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{ "name": "John Doe" }'
It creates the customer index if it doesn't exist and adds a new document with the field name set to "John Doe". The document gets the ID 1 because the ID is specified in the request.
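The same call can be made from Python as well; this is only an illustrative sketch, assuming the third-party requests library and an Elasticsearch instance listening on localhost:9200:

import requests

# Index (create/replace) document with ID 1 in the "customer" index
resp = requests.put(
    "http://localhost:9200/customer/_doc/1",
    json={"name": "John Doe"},      # sends a JSON body with the right Content-Type
    params={"pretty": ""},
)
print(resp.json())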
The Bulk API is used when you need to index multiple documents. For a high number of documents it is significantly faster than individual requests, since there are fewer network roundtrips.
From the official docs:
"In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score meta-field."
"In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data"
There are several possible answers for this question. One of them is as follows:
A small-scale architecture of Elastic will consist of the Elastic Stack as it is. This means we will have Beats, Logstash, Elasticsearch and Kibana.
A production environment with large amounts of data can include some kind of buffering component (e.g. Redis or RabbitMQ) and also a security component such as Nginx.
A Logstash plugin which takes data in one format, modifies it, and emits it in another format.
The raw data as it is stored in the index. You can search and filter it.
The total number of documents matching the search query. If no query is used, then it is simply the total number of documents.
"Visualize" is where you can create visual representations for your data (pie charts, graphs, ...)
False. One harvester harvests one file.
You can generate certificates with the provided Elastic utilities and change the configuration to enable security using the certificates model.
DNS (Domain Name Systems) is a protocol used for converting domain names into IP addresses.
As you know, computer networking is done with IP addresses (layer 3 of the OSI model), but for us humans it's hard to remember IP addresses; it's much easier to remember names. This is why we need something such as DNS to convert any domain name we type into an IP address. You can think of DNS as a huge phonebook or database where each name has a corresponding IP.
The process of translating IP addresses to domain names.
A mapping between domain name and an IP address.
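As a small illustration, the standard library's socket module can perform both a forward lookup (name to IP) and a reverse lookup (IP to name); example.com and 8.8.8.8 are just example inputs here:

import socket

print(socket.gethostbyname("example.com"))   # forward lookup: name -> IP
print(socket.gethostbyaddr("8.8.8.8")[0])    # reverse lookup: IP -> name (e.g. dns.google)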
In general the process is as follows:
It's resolved in this order:
A (Address) Maps a host name to an IP address. When a computer has multiple adapter cards and IP addresses, it should have multiple address records.
An AAAA Record performs the same function as an A Record, but for an IPv6 Address.
While an A record points a domain name to an IP address, a PTR record does the opposite and resolves the IP address to a domain name.
DNS uses UDP port 53 for resolving queries either regular or reverse. DNS uses TCP for zone transfer.
True.
According to Martin Kleppmann:
"Many processes running on many machines...only message-passing via an unreliable network with variable delays, and the system may suffer from partial failures, unreliable clocks, and process pauses."
Another definition: "Systems that are physically separated, but logically connected"
According to the CAP theorem, it's not possible for a distributed data store to provide more than two of the following at the same time:
Ways to improve:
It's an architecture in which data is stored in and retrieved from a single, non-shared source, usually exclusively connected to one node, as opposed to architectures where a request can reach one of many nodes and the data will be retrieved from one shared location (storage, memory, ...).
If it couldn't find a DNS record locally, a full DNS resolution is started.
It connects to the server using the TCP protocol
The browser sends an HTTP request to the server
The server sends an HTTP response back to the browser
The browser renders the response (e.g. HTML)
The browser then sends subsequent requests as needed to the server to get the embedded links, javascript, images in the HTML and then steps 3 to 5 are repeated.
I like this definition from here:
"An explicitly and purposefully defined interface designed to be invoked over a network that enables software developers to get programmatic access to data and functionality within an organization in a controlled and comfortable way."
Automation is the act of automating tasks to reduce human intervention or interaction in regards to IT technology and systems.
While automation focuses on a task level, orchestration is the process of automating processes and/or workflows which consist of multiple tasks, usually across multiple systems.
Data about data. Basically, it describes the kind of information that the underlying data holds.
I can't answer this for you :)
Data serialization language used by many technologies today like Kubernetes, Ansible, etc.
True, because YAML is a superset of JSON.
{
  "applications": [
    {
      "name": "my_app",
      "language": "python",
      "version": 20.17
    }
  ]
}
applications:
  - name: "my_app"
    language: "python"
    version: 20.17
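If you prefer to convert programmatically, here is a small sketch that assumes the third-party PyYAML package (yaml) is installed:

import json
import yaml   # third-party: pip install pyyaml

json_text = '{"applications": [{"name": "my_app", "language": "python", "version": 20.17}]}'
data = json.loads(json_text)
# Dump the parsed structure as YAML, keeping the original key order
print(yaml.safe_dump(data, sort_keys=False))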
someMultiLineString: |
look mama
I can write a multi-line string
I love YAML
It's good for use cases like writing a shell script where each line of the script is a different command.
someMultiLineString: |
to someMultiLineString: >
?
Using > will make the multi-line string fold into a single line
someMultiLineString: >
This is actually
a single line
do not let appearances fool you
They allow you to reference values instead of writing them directly, and they are used like this:
username: {{ my.user_name }}
Using this: ---
For Examples:
document_number: 1
---
document_number: 2
False. It doesn't maintain state for incoming requests.
It consists of:
HTTP is stateless. To share state, we can use Cookies.
TODO: explain what is actually a Cookie
The server didn't receive a response from another server it communicates with in a timely manner.
A load balancer accepts (or denies) incoming network traffic from a client, and based on some criteria (application related, network, etc.) it distributes those communications out to servers (at least one).
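As a toy illustration of the idea (not a real load balancer; the backend addresses are made up), round-robin distribution can be sketched like this:

from itertools import cycle

# Hypothetical backend pool; a real load balancer would also health-check these.
backends = cycle(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def pick_backend():
    # Each incoming request is handed to the next server in the rotation
    return next(backends)

for request_id in range(5):
    print(request_id, "->", pick_backend())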
L4 and L7
Yes, you can use DNS for performing load balancing.
Recommended read:
Cons:
The maximum timeout value can be set between 1 and 3,600 seconds on both GCP and AWS.
In Copyleft, any derivative work must use the same licensing while in permissive licensing there are no such condition. GPL-3 is an example of copyleft license while BSD is an example of permissive license.
SSH
HTTP
DHCP
DNS
...
These are not DevOps related questions as you probably noticed, but since they are part of the DevOps interview process I've decided it might be good to keep them
Tell them how you heard about them :D
Relax, there is no wrong or right answer here... I think.
If you worked in this area for more than 5 years it's hard to imagine the answer would be no. It also doesn't have to be big service outage. Maybe you merged some code that broke a project or its tests. Simply focus on what you learned from such experience.
You know your order best; just think carefully about whether you really want to put salary at the top or at the bottom...
Bad answer: I don't.
Better answer: Every person has strengths and weaknesses. This is true also for colleagues I don't have a good work relationship with, and this is what helps me to create a good work relationship with them. If I am able to highlight or recognize their strengths, I'm able to focus mainly on that when communicating with them.
You know the best, but some ideas if you find it hard to express yourself:
You know the best :)
You can use and elaborate on one or all of the following:
A list of questions you as a candidate can ask the interviewer during or after the interview. These are only a suggestion; use them carefully. Not every interviewer will be able to answer these (or be happy to), which should perhaps be a red flag warning for you regarding working in such a place, but that's really up to you.
Be careful when asking this question - all companies, regardless of size, have some level of tech debt. Phrase the question in the light that all companies have to deal with this, but you want to see the current pain points they are dealing with
This is a great way to figure out how managers deal with unplanned work, and how good they are at setting expectations with projects.
This can give you insights into some of the cool projects a company is working on, and whether you would enjoy working on projects like these. This is also a good way to see if the managers are allowing employees to learn and grow with projects outside of the normal work you'd do.
Similar to the tech debt question, this helps you identify any pain points with the company. Additionally, it can be a great way to show how you'd be an asset to the team.
For example, if they mention they have problem X, and you've solved that in the past, you can show how you'd be able to mitigate that problem.
Not only will this tell you what is expected of you, it will also provide a big hint about the type of work you are going to do in the first months of your job.
ACID stands for Atomicity, Consistency, Isolation, Durability. In order to be ACID compliant, the database must meet each of the four criteria.
Atomicity - When a change occurs to the database, it should either succeed or fail as a whole.
For example, if you were to update a table, the update should completely execute. If it only partially executes, the update is considered failed as a whole and will not go through - the DB will revert back to its original state before the update occurred. It should also be mentioned that atomicity ensures that each transaction is completed as its own standalone "unit" - if any part fails, the whole statement fails.
Consistency - any change made to the database should bring it from one valid state into the next.
For example, if you make a change to the DB, it shouldn't corrupt it. Consistency is upheld by checks and constraints that are pre-defined in the DB. For example, if you tried to change a value from a string to an int when the column should be of datatype string, a consistent DB would not allow this transaction to go through, and the action would not be executed
Isolation - this ensures that a database will never be seen "mid-update" - as multiple transactions are running at the same time, it should still leave the DB in the same state as if the transactions were being run sequentially.
For example, let's say that 20 other people were making changes to the database at the same time. At the time you executed your query, 15 of the 20 changes had gone through, but 5 were still in progress. You should only see the 15 changes that had completed - you wouldn't see the database mid-update as the change goes through.
Durability - Once a change is committed, it will remain committed regardless of what happens (power failure, system crash, etc.). This means that all completed transactions must be recorded in non-volatile memory.
Note that SQL is by nature ACID compliant. Certain NoSQL DBs can be ACID compliant depending on how they operate, but as a general rule of thumb, NoSQL DBs are not considered ACID compliant
Sharding is a horizontal partitioning.
Are you able to explain what is it good for?
Not much information is provided as to why it became a bottleneck or what the current architecture is, so one general approach could be
to reduce the load on your database by moving frequently-accessed data to an in-memory structure (a cache).
A connection pool is a cache of database connections; it is used to avoid the overhead of establishing a new connection for every query done to a database.
A connection leak is a situation where a database connection isn't closed after being created and is no longer needed.
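A minimal, illustrative pool sketch; create_connection here is a hypothetical placeholder for whatever your database driver provides:

import queue

class ConnectionPool:
    def __init__(self, create_connection, size=5):
        # Pre-create a fixed number of connections and hand them out on demand
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(create_connection())

    def acquire(self):
        return self._pool.get()      # blocks if all connections are in use

    def release(self, conn):
        self._pool.put(conn)         # forgetting this call is exactly a connection leak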
"A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of organisation's decision-making process"
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
Given a text file, perform the following exercises
Bonus: extract the last word of each line
The ability to easily grow in size and capacity based on demand and usage.
The ability to grow but also to shrink based on what is required.
Fault Tolerance - The ability to self-heal and return to normal capacity. Also the ability to withstand a failure and remain functional.
High Availability - Being able to access a resource (in some use cases, using different platforms)
Vertical Scaling is the process of adding resources to increase power of existing servers. For example, adding more CPUs, adding more RAM, etc.
With vertical scaling alone, the component still remains a single point of failure. In addition, it has a hardware limit: if you don't have more resources to add, you might not be able to scale vertically any further.
Horizontal scaling is the process of adding more resources that will be able to handle requests as one unit.
A load balancer. You can add more resources, but if you would like them to be part of the process, you have to distribute the requests/responses among them. Also, data inconsistency is a concern with horizontal scaling.
The load on the producers or consumers may be high which will then cause them to hang or crash.
Instead of working in "push mode", the consumers can pull tasks only when they are ready to handle them. It can be fixed by using a streaming platform like Kafka, Kinesis, etc. This platform will make sure to handle the high load/traffic and pass tasks/messages to consumers only when they are ready to get them.
You can mention:
roll-back & roll-forward
cut over
dress rehearsals
DNS redirection
Additional exercises can be found in system-design-notebook repository.
A central processing unit (CPU) performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).
RAM (Random Access Memory) is the hardware in a computing device where the operating system (OS), application programs and data in current use are kept so they can be quickly reached by the device's processor. RAM is the main memory in a computer. It is much faster to read from and write to than other kinds of storage, such as a hard disk drive (HDD), solid-state drive (SSD) or optical drive.
An embedded system is a computer system - a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is embedded as part of a complete device often including electrical or electronic hardware and mechanical parts.
Raspberry Pi
As defined by Doug Laney:
DataOps seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs and models that create value. DataOps combines Agile development, DevOps and statistical process controls and applies them to data analytics.
An answer from talend.com:
"Data architecture is the process of standardizing how organizations collect, store, transform, distribute, and use data. The goal is to deliver relevant data to people who need it, when they need it, and help them make sense of it."
Wikipedia's explanation on Data Warehouse
Amazon's explanation on Data Warehouse
Responsible for managing the compute resources in clusters and scheduling users' applications
A programming model for large-scale data processing
In general, Packer automates machine image creation. It allows you to focus on configuration prior to deployment while making the images. This allows you to start instances much faster in most cases.
A configuration->deployment which has some advantages like:
If you are looking for a way to prepare for a certain exam this is the section for you. Here you'll find a list of certificates, each references to a separate file with focused questions that will help you to prepare to the exam. Good luck :)
Thanks to all of our amazing contributors who make it easy for everyone to learn new things :)
Logos credits can be found here
1. Requirements for a DevOps team: not a temporary construct, responsible for a small area, full-time, cross-functional, small, made up of diverse experts, self-organizing, co-located, and responsible for the tools being used. 2. Advantages of visualizing work: discovering work that has already been accepted; spotting areas with potential capacity shortages; seeing where resources are already or will soon be exhausted; blocked tasks; unfinished tasks; and, if there is no time to finish all the work accepted in this iteration, which of it is worth trying to complete in order to maximize useful results. 3. Kanban creates a pull system: it improves flow and reduces...
DevOps (a combination of the English words Development and Operations) is a collective term for a set of processes, methods and systems used to promote communication, collaboration and integration between development (application/software engineering), technology operations and quality assurance (QA) departments. It emerged because the software industry increasingly realized that, in order to deliver software products and services on time, development and operations must work closely together. DevOps can be seen as the intersection of development (software engineering), technology operations and quality assurance (QA). Traditional software organizations treat development, IT operations and...
This chapter accounts for 20% of the exam. 1. Applicability (5%). (1) Which enterprises do not need to consider DevOps? Enterprises where only part of the value stream is involved; enterprises that do not recognize IT as a critical part of the business; enterprises hoping to quickly reduce accumulated technical debt or eliminate the fragility of their IT infrastructure. (2) DevOps can be considered under the following conditions: the core business is highly dependent on IT; IT changes rapidly in the enterprise; the main business requires rapid change in order to test hypotheses about new business ideas; the enterprise cannot accept IT-related core business risks; it has already tried other ways to improve efficiency or...
What is DevOps? Here is the definition from Wikipedia: DevOps (a compound of Development and Operations) is a culture, movement or practice that emphasizes collaboration and communication between software developers and other information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims to establish a culture and environment where building, testing and releasing software can happen rapidly, frequently and more reliably. The past and present of DevOps: we know that software engineering...
Overview: DevOps is an abbreviation of the words Development and Operations. DevOps is a set of best practices that aims to promote collaboration and communication between IT professionals (developers, operators and support staff) throughout the lifecycle of applications and services, ultimately achieving: continuous integration - a smooth handoff from development to operations and support; continuous deployment - continuous releases, or releasing as often as possible; continuous feedback - seeking feedback from stakeholders at every stage of the application and service lifecycle. DevOps changes the way employees think about their work...
In the previous lesson we walked through the complete process of CI/CD with GitLab CI and Kubernetes. In this lesson, building on the knowledge from earlier lessons, we present a complete example: using Jenkins + Gitlab + Harbor + Helm + Kubernetes to implement a full CI/CD pipeline. In fact, in the previous lessons we already learned about the perfect combination of Jenkins Pipeline and Kubernetes; we...
Translation: Ranger Tsao. Introduction: Docker is a lightweight, isolated container in which applications can be deployed. Applications run in parallel in isolated Linux containers. If you have never used Docker, you can easily get started by following the official tutorial. Vert.x provides two Docker images for developers to run and deploy programs: vertx/vertx3, a base image that needs some extension before it can run programs; vertx/vertx3-exec, which provides the host system with...
ks-devops is a DevOps platform built on Kubernetes. Features: out-of-the-box CI/CD pipelines; a built-in automation toolkit for DevOps with Kubernetes; DevOps on top of Kubernetes implemented with Jenkins Pipelines; pipeline management via CLI.
Learn DevOps: Learn the craft of "DevOps" (Developer Operations) to easily/reliably deploy your App and keep it Up! Why? You should learn more "advanced" DevOps if: You / your team have "out-grown" Herok
Docker: Docker Concepts, Docker Notes, Docker Tutorial. Kubernetes: Kubernetes Concepts, Kubernetes Commands. Prometheus: Prometheus Concepts, Prometheus Tutorial. Git: Git Concepts, Git Advanced. Ansible: Ansible