
Building on Quicksand - Pat Helland, Dave Campbell: Reading Notes

郭瀚海
2023-12-01

ABSTRACT

 

"Reliable systems have always been built out of unreliable components"

 

"There are two implications of asynchronous state capture:

1) Everything promised by the primary is probabilistic.

2) Applications must ensure eventual consistency."

 

2. An Abstraction for Fault Tolerance


2.1 Modeling “The System”

 

"In considering interactions with a fault tolerant “system”, we want to look at its behavior as a black-box. From the outside, requests are sent into the system for processing. In years past, these requests looked like block mode screen input. Nowadays, they typically take the form of XML, SOAP, and/or other web-style requests."

 

"To be robust, these incoming requests are retried by their source. In classic fashion, a request is issued and if a timer expires, it is reissued. The fault tolerant server system had better make this work idempotent or the retries would occasionally result in duplicative work."

 

2.2 Transparent Fault Tolerance

 

"We have observed the pattern in which a fault tolerant algorithm is broken into idempotent sub-algorithms. By capturing sufficient information between the idempotent steps and sending it across the failure boundary, the overarching algorithm can tolerate faults.

From this perspective, you can imagine stepping across a river from rock to rock, always keeping one foot on solid ground. It is important to realize this provides a linear sequence of steps marching forward through the work."
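
The "rock to rock" pattern can be sketched in a few lines. This is only an illustration under stated assumptions: `run_workflow`, the `Step` type, and the dict standing in for storage across the failure boundary are hypothetical names, not anything defined in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: a step is a name plus a function
# that takes the current state and returns the next state.
Step = Tuple[str, Callable[[dict], dict]]


def run_workflow(steps: List[Step], state: dict, checkpoint: Dict[str, dict]) -> dict:
    """Run steps in a fixed linear order, recording state after each one.

    `checkpoint` stands in for storage on the far side of the failure
    boundary. After a crash, calling this again with the same `checkpoint`
    resumes from the last completed step.
    """
    for name, step in steps:
        if name in checkpoint:
            # Step already completed before the failure: reuse its result.
            state = checkpoint[name]
            continue
        # A step may run more than once if it crashes midway,
        # so each step must be idempotent on its own.
        state = step(dict(state))
        # One foot on solid ground before moving to the next rock.
        checkpoint[name] = dict(state)
    return state
```

If a step fails, rerunning the workflow skips every step that already has a checkpoint and retries only the one that was interrupted.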


3. Preserving Transparency While Growing


3.2 Example 2: Tandem NonStop circa 1986

 

"This scheme meant that a processor failure would result in more transactions aborting. This was a very rare event and was completely within the system rules which allowed transactions to abort without cause."

A failure now costs more, but because failures are so rare, trading that extra cost for better performance is well worth it.

 

4. The Creeping Arrival of Asynchrony

 

"In this section, we see the first example of acknowledging the incoming request BEFORE ensuring the work is sent to the backup. This is asynchronous checkpointing to the backup."

 

4.1 Example 3: Log Shipping


"The change from a synchronous transfer of state to an asynchronous transfer is an interesting erosion of the basic abstraction and is another example of where the cost for “consistency at a distance” is too high just as it was when we tried to stretch 2PC beyond resource managers in the same room."

 

4.2 Log Shipping and Takeover Semantics

 

"In most deployments of log-shipping, this is not considered in the application design. It is assumed that this window is rare and that it is unnecessary to plan for it. Bad stuff just happens if you get unlucky."

 

4.3 Revisiting the Abstraction

 

"The close proximity of the components allowed for practical use of synchronous state copying. In the log shipping example, the delay is considered impractical and the transfer of the state is asynchronous. This results in “faults” in the fault tolerance provided for data center failures."

 

5. Loosening the Abstraction

 

"In this new world, history cannot be exactly replayed and we must count on the ability to reorder the work. This means that we cannot completely know the accurate state of the system. It also means we must move the correctness and reordering semantics up from being based on system properties (i.e. READ and WRITE) to application based business operations."

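Correctness guarantees must move up from the system level (READ and WRITE) to the application level (business operations).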

 

"Section 5 examines a number of different aspects of asynchronous checkpointing and how it impacts application design."

A change in the fault-tolerance abstraction affects the design of the system.

 

"Finally, we summarize the abstraction by observing that either you have synchronous checkpoints to your backup or you must sometimes apologize for your behavior…"

 

5.1 Asynchrony and the Truth

 

"The deeper observation is that two things are coupled:
1) The change from synchronous checkpointing to asynchronous to save latency, and
2) The loss of a notion of an authoritative truth."

 

5.2 Probabilistic Business Rules

 

"If a primary uses asynchronous checkpointing and applies a business rule on the incoming work, it is necessarily a probabilistic rule. The primary, despite its best intentions cannot know it will be alive to enforce the business rules."

 

"Distribution + Asynchrony -> Probabilities of Enforcement"


5.4 Idempotence and Partitioned Workflow

 

"It is essential to ensure that the work of a single operation is idempotent. This is an important design consideration in the creation of an application that can handle asynchrony as it tolerates faults and as it allows loose-coupling for latency, scale, and offline."

The importance of idempotence for fault tolerance.

 

"The unique identifier of the work (the “uniquifier”) has two very important roles:
1) The uniquifier provides the key for partitioning the work in a scalable system.
2) The uniquifier allows the system to recognize multiple executions of the same request. In this fashion, they can be collapsed and the work becomes idempotent."

One way to achieve idempotence: the "uniquifier".
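
The two roles of the uniquifier can be shown together in a small sketch. Everything here is assumed for illustration: `PartitionedService`, the hash-based partition choice, and the placeholder "work" (doubling an amount) are hypothetical and not taken from the paper.

```python
import hashlib
from typing import Dict, List


class PartitionedService:
    """Hypothetical service that uses a request's uniquifier twice:
    to pick a partition, and to collapse repeated executions."""

    def __init__(self, n_partitions: int) -> None:
        # Each partition remembers the results of requests it has handled.
        self.partitions: List[Dict[str, int]] = [{} for _ in range(n_partitions)]
        self.executions = 0  # how many times real work was performed

    def _partition_for(self, uniquifier: str) -> Dict[str, int]:
        # Role 1: the uniquifier is the partitioning key.
        digest = hashlib.sha256(uniquifier.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def handle(self, uniquifier: str, amount: int) -> int:
        seen = self._partition_for(uniquifier)
        # Role 2: a retry of the same request is recognized and collapsed.
        if uniquifier in seen:
            return seen[uniquifier]
        self.executions += 1
        result = amount * 2  # placeholder for the real business operation
        seen[uniquifier] = result
        return result
```

Because a retried request always hashes to the same partition, the duplicate check only needs that partition's local memory.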

 

5.5 What’s Your Stomach for Risk?

 

"Note that it is possible to have multiple business rules with different guarantees. Some operations can choose classic consistency over availability (i.e. they will slow down, eat the latency, and make darn sure before promising). Other operations can be more cavalier."

Use different strategies for different requirements; stay flexible.

 

"The major point is that availability (and its cousins offline and latency-reduction) may be traded off with classic notions of consistency. This tradeoff may frequently be applied across many different aspects at many levels of granularity within a single application."

 

5.6 Fussing and Whining (but Not Too Often)

 

"The best model for coping with violations of the business rule is:
1. Send the problem to a human (via email or something else),
2. If that’s too expensive, write some business specific software to reduce the probability that a human needs to be involved."

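What to do when something goes wrong? Hand it to a human!

First, no self-healing system is perfect. Second, bringing in a human may cost far less than designing a thoroughly error-proof system. So, after weighing the tradeoff, human intervention is worth considering as a way to simplify the system's design. Of course, the system should give maintainers as much information and help as possible to make their work easier.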
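
One possible shape for this "software first, human as the fallback" policy is sketched below. The names `handle_violation`, `Fixer`, and the list used as a human work queue are assumptions made for the example.

```python
from typing import Callable, List, Optional

# Hypothetical: a fixer inspects a violation and returns a description of
# the automated resolution, or None if it cannot handle the case.
Fixer = Callable[[dict], Optional[str]]


def handle_violation(violation: dict, fixers: List[Fixer], human_queue: List[dict]) -> str:
    """Try business-specific software first; escalate to a human otherwise."""
    for fixer in fixers:
        resolution = fixer(violation)
        if resolution is not None:
            return resolution
    # No automated fix applied: pass the full violation record along so the
    # person handling it has as much context as possible.
    human_queue.append(violation)
    return "escalated"
```

Each fixer added to the list lowers the share of violations that reach the queue, which is the tradeoff the quote describes.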

 

5.7 Memories, Guesses, and Apologies

 

"Arguably, all computing really falls into three categories: memories, guesses, and apologies[16, 19]. The idea is that everything is done locally with a subset of the global knowledge. You know what you know when an action is performed. Since you have only a subset of the knowledge, your actions are really only guesses."

 

"Memories: Your local replica has seen what it has seen and (hopefully) remembers it. The cost of spreading that knowledge includes bandwidth, computation, and latency (in the case where you are waiting for the backup to acknowledge your memory of an operation).

 

Guesses: Any time an application takes an action based upon local information, it may be wrong.  ... It is simply a matter of business choice as to the quality of the guess.

 

Apologies: When a mistake is made, you apologize. Every business includes apologies."

 

"In a loosely coupled world choosing some level of availability over consistency, it is best to think of all computing as memories, guesses, and apologies."

 

5.8 Synchronous Checkpoints OR Apologies!

 

"So, section 5 is pointing out that there are design options:
1) You can synchronously checkpoint and incur the latency, or
2) You can asynchronously checkpoint, save the latency, and experience modified application semantics."
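
The two options can be contrasted with a toy model. This is a sketch under simplifying assumptions: `Primary`, `Backup`, and `lost_on_takeover` are hypothetical names, and network latency is not modeled, only the ordering of acknowledgment and shipping.

```python
from collections import deque


class Backup:
    def __init__(self) -> None:
        self.log = []

    def apply(self, record) -> None:
        self.log.append(record)


class Primary:
    def __init__(self, backup: Backup, synchronous: bool) -> None:
        self.backup = backup
        self.synchronous = synchronous
        self.log = []            # everything the primary has acknowledged
        self.unshipped = deque() # acknowledged but not yet at the backup

    def handle(self, record) -> str:
        self.log.append(record)
        if self.synchronous:
            # Option 1: the backup has the record before we acknowledge.
            self.backup.apply(record)
        else:
            # Option 2: acknowledge now, ship later.
            self.unshipped.append(record)
        return "ack"

    def ship(self) -> None:
        # Background log shipping for the asynchronous case.
        while self.unshipped:
            self.backup.apply(self.unshipped.popleft())


def lost_on_takeover(primary: Primary) -> list:
    """Acknowledged records the backup would not know about if the
    primary failed at this instant."""
    return [r for r in primary.log if r not in primary.backup.log]
```

With `synchronous=True` the lost list is always empty; with `synchronous=False` anything acknowledged since the last `ship()` is work the backup may later have to apologize for.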

 

"In summary, all of these choices depend on their business value!"

 

7. Managing Resources with Asynchrony


7.1 Over-Booking versus Over-Provisioning

 

"As we consider a system with asynchronous checkpointing, we are considering a system with a probability that two or more replicas will be allocating resources to their users. Since these replicas will sometimes be incommunicado, we must consider the policy used for allocating resources while not in communication. There are two approaches:
1) Over-Provisioning. In this approach, each replica has a fixed subset of the resources that it may allocate.
2) Over-Booking. Unlike over-provisioning, over-booking allows for the possibility that the disconnected replicas will occasionally promise something they cannot deliver."
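
A small sketch may help contrast the two policies. It assumes a single pool of 10 identical units and two disconnected replicas; `Replica` and `oversold` are hypothetical names for the example.

```python
from typing import List


class Replica:
    """Hypothetical replica that allocates against a local limit
    while it cannot talk to its peers."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.allocated = 0

    def allocate(self, n: int) -> bool:
        if self.allocated + n > self.limit:
            return False  # decline rather than exceed the local limit
        self.allocated += n
        return True


def oversold(replicas: List[Replica], total: int) -> int:
    """Units promised beyond what actually exists, found at reconciliation."""
    return max(0, sum(r.allocated for r in replicas) - total)


TOTAL = 10

# Over-provisioning: each replica owns a fixed share of the pool, so the
# sum of promises can never exceed TOTAL, but a replica may refuse a
# request even though its peer still has spare units.
provisioned = [Replica(limit=TOTAL // 2), Replica(limit=TOTAL // 2)]

# Over-booking: each replica may promise up to the whole pool, so more
# requests are accepted, and reconciliation may reveal a shortfall.
overbooked = [Replica(limit=TOTAL), Replica(limit=TOTAL)]
```

The only difference between the two setups is the local limit, which is what makes this a policy choice rather than a mechanism change.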

 

7.2 Computing versus Reality

 

"This is true in that the computational resource will not show an allocation for which there are no resources. Unfortunately, the real world is not always accurately modeled in the computers (and cannot always be)."

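Reality certainly admits many possibilities, but perhaps the computer need not make exact guarantees? That is, as long as it works in the general case, an occasional error is not so terrible (for some businesses, of course)? After all, the efficiency gained by introducing computers is enormous.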


7.4 The Quest for Fungibility

 

"As you look at functions in the computing world, you see an ever increasing categorization of things into fungible buckets."

 

"The real world is rife with algorithms for idempotence, commutativity, and associativity. They are part of the lubrication of real world business and of the applications we must support on our fault tolerant platforms. A major trick is to look for mechanisms to create equivalence of the operation or resource."

 

7.5 The Importance of Uniquifiers

 

"One important pattern in the management of asynchrony is the usage of the unique identifier in tracking the request through the distributed system. ... The detection of the redundant work is made possible by the uniquifier on the request."

 

"So, even as we look at the topic of managing resources under asynchrony, we see the importance of having uniquely identified requests so we can create idempotent behavior."

 

7.6 Eventually We’ll Talk and Be Consistent

 

"When an application is built to support eventual consistency, the design should ensure that the order of the work’s arrival at the node is not the determining factor in the outcome."

This is the key to guaranteeing consistency.
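
To make the point about arrival order concrete, here is a hedged sketch comparing two hypothetical designs for an account balance: one where each operation overwrites the value, and one where each operation is a uniquely identified delta. The function names and the sample operations are assumptions for illustration.

```python
from itertools import permutations


def apply_overwrites(ops):
    """Order-dependent design: each operation sets the balance,
    so whichever arrives last wins."""
    balance = 0
    for _, value in ops:
        balance = value
    return balance


def apply_deltas(ops):
    """Order-independent design: each operation is a delta identified
    by a uniquifier; duplicates are ignored and addition commutes."""
    applied = {}
    for uniquifier, delta in ops:
        applied.setdefault(uniquifier, delta)
    return sum(applied.values())


overwrites = [("w1", 100), ("w2", 70), ("w3", 75)]
deltas = [("d1", 100), ("d2", -30), ("d3", 5)]

# Every arrival order of the deltas yields the same balance...
delta_outcomes = {apply_deltas(p) for p in permutations(deltas)}
# ...while the overwrite design yields a different balance per last arrival.
overwrite_outcomes = {apply_overwrites(p) for p in permutations(overwrites)}
```

In this sketch the delta design converges on a single outcome regardless of order, while the overwrite design has as many outcomes as there are distinct final writes.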

 

"As mentioned above, sometimes the operations accumulated by different replicas result in a violation of the application’s business rules. ... This level of violation of the business rules becomes a probabilistic analysis with the application designers choosing their stomach for risk."

Conflicts still cannot be avoided, however; in that case the system designer has to weigh the options and make a suitable tradeoff.

 

7.7 Back to the Future

 

"Whenever the authors struggle with explaining how to implement loosely-coupled solutions, we look to how things were done before computers."

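Humans have rich experience with loosely coupled systems, dating from long before computers existed, so when designing systems we should learn from that old wisdom.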

 

8. CAP and ACID2.0

 

"As mentioned above, the CAP Theorem states that with Consistency, Availability, and Partition tolerance you can have any two at once but not three."

 

"Consider the new ACID (or ACID2.0). The letters stand for: Associative, Commutative, Idempotent, and Distributed. The goal for ACID2.0 is to succeed if the pieces of the work happen:
 At least once,
 Anywhere in the system,
 In any order.
This defines a new KIND of consistency."

A quick search suggests this term was coined by the authors themselves, and it targets distributed environments.
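
One way to see what the first three letters demand is to check them against a concrete merge function. The sketch below assumes each replica's state is the set of uniquely identified operations it has seen and that replicas merge by set union; `merge` and `balance` are hypothetical names, and this is one possible instance, not the paper's prescribed design.

```python
from typing import FrozenSet, Tuple

# An operation is (uniquifier, delta). It is assumed that a given
# uniquifier always carries the same delta.
Op = Tuple[str, int]


def merge(a: FrozenSet[Op], b: FrozenSet[Op]) -> FrozenSet[Op]:
    """Combine what two replicas have seen."""
    return a | b


def balance(state: FrozenSet[Op]) -> int:
    return sum(delta for _, delta in state)


x = frozenset({("a", 5)})
y = frozenset({("b", -2)})
z = frozenset({("a", 5), ("c", 1)})

# Associative: grouping of merges does not matter ("anywhere in the system").
assert merge(merge(x, y), z) == merge(x, merge(y, z))
# Commutative: order of merges does not matter ("in any order").
assert merge(x, y) == merge(y, x)
# Idempotent: seeing the same work again changes nothing ("at least once").
assert merge(x, x) == x
```

Set union satisfies all three properties by construction, which is why state expressed as "the set of operations seen" merges so easily across replicas.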


8.1 Fault Tolerance on ACID

 

"To maintain serializability, classic algorithms do one thing at a time. All the concurrency mechanisms we know and love work hard to provide an appearance that one thing happens at a time."

 

8.2 Fault Tolerance on ACID2.0

 

"When the application is constrained to the additional requirements of commutativity and associativity, the world gets a LOT easier."

 

"Surprisingly, we find that many common business practices comply with these constraints. Looking at the business operations from the standpoint of how work has traditionally been performed shows many examples supportive of this approach. It appears we in database-land have gotten so attached to our abstractions of READ and WRITE that we forgot to look at what normal people do for inspiration."

 

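In my view, whether to go serial or concurrent depends on the business logic; if the business allows it, ACID2.0 looks more attractive. Analyzing the business model and logic before designing the system is therefore essential.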

 

9. Future Work

 

"Our forefathers were VERY smart and were dealing with loosely coupled systems to implement their businesses."

 

10. Conclusion

 

"For years, the state of the art in fault tolerant systems provided crisp transactional behavior by synchronously checkpointing state across the failure boundaries. As the size of the failure unit has increased, the latency involved in synchronous checkpointing has grown to be punitive."

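The crux is still availability: high latency means low availability, so consistency has to be sacrificed to gain availability. Of course, this holds only for some systems; there are many systems where consistency has higher priority.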

 

"It is the reorderability of work and repeatability of work that is essential to allowing successful application execution on top of the chaos of a distributed world in which systems come and go when they feel like it."

 

 
