
awesome-system-design

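License: CC0-1.0
Development language: Java
Category: Database-related
Software type: Open source software
Region: Unknown
Submitter: 程胡非
Operating system: Cross-platform
Open source organization: (none listed)
Target audience: Unknown
 Software overview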

If you appreciate the content, help support the project's visibility.

A curated list of awesome system design articles, videos and resources for distributed computing, AKA Big Data.

Whether you're preparing for an interview or you want to design a distributed/microservice oriented application, this list will definitely help you achieve that.

Attention: Stars on GitHub do not reflect usage or popularity for every item listed here.

Inspired By Awesome-BigData

Started By Gabriel Leon de Mattos

Contents

Articles

Books

Videos

Tools

Bonus

Articles

Introduction / Interviews

Advanced


Books

  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services - [Paid] - Covers distributed systems and lightly demonstrates what the code looks like.

  • Designing Data-Intensive Applications - [Paid] - Explains in depth the various technologies used when working with distributed systems, how they came to be, and what problems they aim to solve.

  • The System Design Manual - [Paid] - Covers the core aspects of distributed systems: network fundamentals, the theory underpinning distributed systems, architectural patterns of scalable systems, stability patterns that harden systems against failures, and operational best practices for maintaining large-scale systems with a small team.

  • Building Microservices - [Free] - Awesome book that covers designing system architecture with microservices in depth and includes the most relevant topics in this area.

  • Monolith to Microservices - [Free] - Written by the same author as the book above, this one covers the migration from a monolith to microservices. Starting with the previous book is recommended.

  • Distributed Systems (3rd Edition) - [Free] - Great overview of and in-depth introduction to distributed systems. Recommended for intermediate level readers.


Videos

A collection of videos about distributed systems.

Introduction / Interviews

Advanced

Tools

  • A collection of the most commonly used tools for distributed systems.

Relational Database Management System

  • MariaDB - MariaDB is a fork of MySQL server.

  • MySQL - Widely used relational database.

  • PostgreSQL - Relational database that has been gaining popularity.

  • SQLite - Another widely used database that is built into all mobile phones and most computers.

  • SQL Server - Widely used relational database.

NoSQL

Cache (Key-Value)

  • Apache Ignite - [3.3k ] - In memory caching with ACID properties.

  • Couchbase - Inspired by memcached, adding features such as replication and persistence.

  • Oracle Coherence - [126 ] - High scaling, low latency in-memory caching.

  • Memcached - [10.2k ] - One of the first in-memory caching databases, high performing and multi-threaded.

  • Redis - [44k ] - Widely used in-memory caching database with many added features such as persistent storage and support for strings, lists, sets, hashes, streams, bitmaps, etc.

Store (Key-Value)

  • Apple FoundationDB - [10k ] - Multi-model (many data types in a single database), ACID key-value store. Easily scalable and fault tolerant.

  • Cosmos DB - Microsoft's globally distributed, multi-model database service. Elastically and independently scale throughput and storage. SQL, MongoDB, Cassandra, Tables, Gremlin, and Spark APIs.

Document Store

  • CouchDB - [4.6k ] - ACID compliant NoSQL document-store DB, provides a RESTful HTTP API for reading and updating database documents.

  • MongoDB - One of the most popular general-purpose 'NoSQL' databases.

  • RethinkDB - [23.8k ] - Document-store DB.

  • ElasticSearch - [49.9k ] - Widely popular 'NoSQL' database for fast and scalable search engines.

  • Cosmos DB - Microsoft's globally distributed, multi-model database service. Elastically and independently scale throughput and storage. SQL, MongoDB, Cassandra, Tables, Gremlin, and Spark APIs.

Wide Column Store

  • Amazon DynamoDB - Key-Value and Document database, highly performant, scalable and secure.

  • Google Bigtable - Scalable and performant 'NoSQL' database for large analytical and operational workloads.

  • Cassandra - Facebook-born project that is very fast and easily scalable, with the option to choose the consistency level for each operation.

  • Scylla - [4.9k ] - 'NoSQL' data store built on the Seastar framework, compatible with Cassandra.

  • HBase - [3.6k ] - Modeled after Google's Bigtable and written in Java. Developed as a part of Apache Hadoop project and runs on top of HDFS or Alluxio. (See Hadoop Related)

  • Cosmos DB - Microsoft's globally distributed, multi-model database service. Elastically and independently scale throughput and storage. SQL, MongoDB, Cassandra, Tables, Gremlin, and Spark APIs.
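
The per-operation consistency mentioned in the Cassandra entry is commonly reasoned about with a quorum rule: with N replicas, a read that contacts R replicas and a write acknowledged by W replicas are guaranteed to share at least one replica when R + W > N. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and are not part of any database driver.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when any read set must overlap any write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum reads plus quorum writes overlap on at least one replica.
    print(q, read_sees_latest_write(q, q, n))
    # Reading and writing a single replica each gives no such guarantee.
    print(read_sees_latest_write(1, 1, n))
```

Choosing smaller R and W trades that overlap guarantee for lower latency, which is the choice the per-operation setting exposes.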

Graph

  • Amazon Neptune - Fast, reliable and fully managed graph database service.

  • ArangoDB - [10k ] - Flexible database for documents, key-value, graphs. Uses its own query language, AQL.

  • Neo4j - [7.9k ] - Good support for a graph db, ACID compliant and flexible.

  • Cosmos DB - Microsoft's globally distributed, multi-model database service. Elastically and independently scale throughput and storage. SQL, MongoDB, Cassandra, Tables, Gremlin, and Spark APIs.

Distributed File Systems

  • HDFS - The Hadoop Distributed File System is a widely popular choice among big data file systems, providing high-throughput access to data.

  • Lustre - File system for computer clusters.

  • CephFS - Unified, distributed storage system.

  • GlusterFS - Scale-out NAS file system.

  • MooseFS - POSIX-compliant distributed file system.

  • XtreemFS - Fault tolerant file system.

Resource Management

  • Kubernetes - Highly popular way to deploy, manage and automatically scale a cluster of containers on bare-metal or virtual servers.

Stream Processing

  • Apache Samza - Build stateful applications that process data in real time from multiple sources, including Kafka. Offers an easy and inexpensive multi-subscriber model, can eliminate backpressure, and provides reliable persistence with low latency.

  • Apache Flink - Based on the concept of streams and transformations. Uses Maven and handles batch tasks as data streams with finite boundaries. Low latency, high throughput.

  • Amazon Kinesis Streams - Durable, scalable, real-time service. Collects gigabytes of data per second from hundreds of thousands of sources, including database event streams, website clickstreams, financial transactions, etc.

  • Azure Stream Analytics - Real-time analytics service that is designed for mission-critical workloads.
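
The Flink entry describes batch tasks as data streams with finite boundaries. A minimal plain-Python sketch of that idea follows. It does not use Flink's API, and `running_total` is a hypothetical name chosen for illustration. The same transformation is applied to a bounded input (a batch) and to an unbounded one (a stream).

```python
import itertools
from typing import Iterable, Iterator


def running_total(events: Iterable[int]) -> Iterator[int]:
    """Emit the cumulative sum after each incoming event."""
    total = 0
    for value in events:
        total += value
        yield total


if __name__ == "__main__":
    # Bounded input: the "batch" is just a stream that ends.
    print(list(running_total([3, 1, 4])))

    # Unbounded input: the same operator, consumed lazily.
    endless = itertools.count(1)  # 1, 2, 3, ...
    print(list(itertools.islice(running_total(endless), 3)))
```

Because the operator only ever looks at one event at a time, nothing in it needs to know whether the input will end.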

Message Broker

  • Amazon MQ - Managed message broker service from Amazon for open source brokers such as Apache ActiveMQ.

  • Apache ActiveMQ - A multi-protocol, Java-based messaging server.

  • Apache Kafka - Widely popular message broker with low latency for data streaming.

  • RabbitMQ - Widely popular lightweight message broker written in Erlang that also supports multiple messaging protocols.

  • IronMQ - Very fast and highly scalable messaging broker. (not open source)

  • Apache Pulsar - Created by Yahoo, also highly scalable, with low latency, geo-replication and multi-tenancy.

  • Kestrel - Written in Scala and speaks the memcached protocol. It works much like Kafka.

  • Azure Service Bus - A fully managed enterprise integration message broker.
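
Log-based brokers such as Kafka let several subscribers read the same messages independently by tracking a separate read offset for each one. The sketch below illustrates that idea with an in-memory list. `TopicLog` and its methods are hypothetical names, not the API of any broker listed above, and the sketch ignores durability, partitioning and networking.

```python
from typing import Dict, List


class TopicLog:
    """Append-only log with one read offset per subscriber."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, record: str) -> int:
        """Append a record and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber: str, max_records: int = 10) -> List[str]:
        """Return unread records for this subscriber and advance its offset."""
        start = self._offsets.get(subscriber, 0)
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    log.publish("order-created")
    log.publish("order-paid")
    print(log.poll("billing"))       # both records
    print(log.poll("shipping", 1))   # first record only; offsets are independent
```

Reading does not remove records from the log, which is what makes adding another subscriber cheap.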

Load Balancers

Open Source Software

  • SeeSaw - [5.1k ] - Used by Google, developed in Go, Linux-based virtual load balancer server.

  • HAProxy - Widely popular option, provides high availability, proxying, and TCP/HTTP load balancing. Used by Reddit, Imgur, MaxCDN, GitHub, Airbnb.

  • Zevenet - Supports L3, L4 and L7. Easy install with a Docker repo. Supports advanced health-check monitoring.

  • Neutrino - Used by eBay, built with Scala and Netty. Supports round-robin and least-connection algorithms.

  • Nginx - Wait, isn't Nginx a web server? Yes, but the open source version also supports a basic level of content switching and request routing. The Plus edition supports load balancing, WAF, monitoring, etc.

  • OpenResty - Nginx + Lua, a perfect combination.
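
The round-robin and least-connection algorithms named in the Neutrino entry can be sketched in a few lines. This is a simplified illustration with hypothetical class names, not the implementation used by any load balancer listed above. It leaves out health checks, weights and thread safety.

```python
import itertools
from typing import Dict, Iterable


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the backend that currently has the fewest open connections."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # Ties go to the backend that was registered first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


if __name__ == "__main__":
    rr = RoundRobin(["a", "b", "c"])
    print([rr.pick() for _ in range(4)])

    lc = LeastConnections(["a", "b"])
    first, second = lc.acquire(), lc.acquire()
    lc.release(first)
    print(first, second, lc.acquire())
```

Round-robin ignores how long requests take, while least-connection adapts when some requests hold a backend longer than others.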

Hardware

  • F5 - Robust hardware load balancer option, supporting multiple protocols (IP, TCP, FTP, UDP, HTTP).

  • TP-Link - Cheaper alternative that works as a load balancer.

  • Barracuda - One of the top choices for load balancing when it comes to in-house servers. Top security measures built in, comprehensive reports and monitoring outbound traffic for data loss prevention.

Cloud

  • Amazon Elastic Load Balancing - Popular choice for Amazon customers, supports Lambda functions, highly scalable.

  • Google Load Balancing - Popular choice for Google customers, comes with an auto-scaling feature, very fast, has an integrated CDN.

  • Cloudflare Load Balancing - Scalable load balancing by Cloudflare, featuring fast failover and a dashboard.

  • DigitalOcean Load Balancing - If you're a DigitalOcean customer, this is a good option: very cheap, regionally available, scalable, and easy to deploy among your other droplets.

  • Azure Load Balancing - Popular choice for Microsoft's Azure customers. Supports internal and external traffic, IPv6, monitoring and the standard set of load balancing features.

Hadoop Ecosystem

Dashboard

  • Ambari - Dashboard that integrates most Hadoop-related technologies for easy management and execution.

Data Ingestion

  • Sqoop - Efficiently transfer data between Hadoop and structured datastores such as relational databases.

  • Flume - Distributed, highly available and efficient in collecting, aggregating and moving large amounts of log data.

  • Apache Kafka - Widely popular message broker with low latency for data streaming.

Workflow Scheduler

  • Oozie - Create workflows in XML to execute jobs (from other Hadoop ecosystem applications) in steps. Also allows for parallel execution.

Query

  • Hive - Query data stored in Hadoop using SQL.
  • Pig - Scripting language that looks like SQL for querying Hadoop data.

Processing

  • Tez - Solves a similar problem to Spark and MapReduce. It is more efficient than MapReduce because it works out the most efficient way to execute a job.
  • MapReduce - As the name implies, maps the data and then reduces the results.
  • Spark - Powerful data processing engine that not only processes data like Tez (and MapReduce) but can also process streams of data in real time, apply regression analysis algorithms in ML, and much more.
  • Apex - Retired project. A YARN-native platform that unifies stream and batch processing.
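
The map and reduce steps named in the MapReduce entry can be illustrated with the classic word count. This is a single-process sketch in plain Python with hypothetical function names. It does not use Hadoop, and a real job would run the phases in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[int]) -> int:
    """Reduce: collapse the values for one key into a single result."""
    return sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return {key: reduce_phase(values) for key, values in shuffle(pairs).items()}


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Each line can be mapped independently and each key can be reduced independently, which is what allows the work to be spread over many machines.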

DB

  • HBase - [3.6k ] - Modeled after Google's Bigtable and written in Java. Developed as a part of Apache Hadoop project.

Resource Management

  • YARN - 'Yet Another Resource Negotiator', works like a kernel to manage compute resources across the cluster.
  • Mesos - Works like a Linux kernel by managing CPU, memory, storage and other resources across the cluster.

REST Framework

  • Gin - [40.6k ] - Blazingly fast microservice framework written in Go, with high throughput capacity.

  • Phoenix - [15.5k ] - Distributed processing, easily scalable, support for channels and live chat. Written in Elixir and running on the Erlang BEAM virtual machine, this framework is very efficient for large scale systems and supports high throughput.

  • Express.js - [49.6k ] - Fast Node.js framework for REST APIs that performs well under many scenarios.

  • Rails - [46.2k ] - Written in Ruby, Rails takes APIs from prototype to production quickly and efficiently.

  • Play Framework - [11.6k ] - Very fast, high throughput framework written in Scala/Java that is RESTful by default.

  • Flask - [51.6k ] - A lightweight Python microframework for fast prototyping and production.

  • FastAPI - [22.7k ] - A lightweight Python microframework inspired by Flask but more modern, using Python async.

  • Django REST - [18.4k ] - Written in Python, Django REST is a powerful and flexible framework for building REST APIs. Its efficiency and time to market resemble those of Rails.

  • ASP.NET Core MVC - A rich framework for building web apps and APIs using the Model-View-Controller design pattern in C# or F#. Number 6 on TechEmpower Composite Benchmarks for web frameworks.

  • Fastify - [15.4k ] - A Node.js web framework highly focused on providing the best developer experience with the least overhead and a powerful plugin architecture.
