Lecture 7. Asynchrony, BA consensus and the FLP impossibility.

HW:

Problem 1 (20 points) The FLP impossibility result (Lectures 4 and 5) concerns the Byzantine agreement (BA) problem. But what about the Byzantine broadcast (BB) problem discussed in Lecture 2?

(a) (5 points) Show that, in the asynchronous model with f < n/3, BA reduces to BB. (That is, given a BB protocol that satisfies validity and agreement, build a BA protocol that satisfies validity and agreement.) In light of the FLP result, what does this imply about the BB problem in the asynchronous model?

(b) (5 points) In a previous HW you showed that BB reduces to BA in the synchronous model (no matter what f is). Does your reduction also work in the asynchronous model? If not, why not?

(c) (5 points) Give a simple and direct proof that, in the asynchronous model, no deterministic BB protocol always terminates while satisfying both validity and agreement (even for f = 1). [Hint: Consider first the case of an adversarial sender who never sends any messages. What must the other nodes do?]

(d) (5 points) Show that the FLP impossibility result implies that there is no protocol for SMR that satisfies both consistency and liveness in the asynchronous model, even for f = 1. [Hint: like in (a), use a suitable reduction.]

Problem 2 (35 points) Recall again the FLP impossibility result for BA in the asynchronous model.

(a) (5 points) The proof given in lecture is somewhat nonconstructive. Give a direct proof for the n = 2 (and f = 1) case, with an explicit description of the relevant adversary strategies.

(b) (5 points) A very restricted type of adversary is one who only uses crash faults. This means that the only misbehavior allowed by an adversarial node is: at some point in the protocol’s execution (at a time decided by the adversary), the node never sends a message again (no matter how long the protocol runs). Explain why the proof from lecture does not immediately apply to crash-fault adversaries.

(c) (15 points) Modify the proof from lecture so that it holds also for crash-fault adversaries. [If you must, you can rewrite the whole proof. Better would be to explain exactly which steps of the proof need to be changed, and what changes need to be made in each case.]

(d) (10 points) Suppose we assume that each node has its own public key-private key pair, and that all nodes’ public keys are common knowledge prior to the start of the protocol. (I.e., the same PKI trusted setup assumption as for the Dolev-Strong protocol.) You can assume that the amount of computation performed by each adversarial node is polynomial (in the number n of nodes); message delivery continues to be completely arbitrary. Does the proof of the FLP impossibility result continue to hold? Give a compelling justification for your answer.

Problem 3 (30 points) The point of this problem is to see how randomization can help mitigate the FLP impossibility result for BA in the asynchronous model. Throughout, we assume a known upper bound f on the number of faulty (Byzantine) nodes, and assume that f < n/5. (Recall that the FLP result applies for deterministic protocols even when f = 1.)

Consider a protocol in which each node has a local notion of “rounds.” Each round will have two phases (with messages sent and received in each phase), followed by a local update step. By “phase 17(b),” for example, we mean the second phase of the 17th round. Every message sent by an honest node i is annotated with the phase (from i’s local perspective) to which it belongs.

Each node i maintains a bit $x_i$, initially set to its private input.

First phase. When node i reaches the (local) phase r(a) (i.e., the first phase of round r), it sends its current bit $x_i$ to all nodes (including itself), annotated as usual with the current phase r(a). Node i then idles in phase r(a) until it receives r(a)-phase messages from n − f distinct nodes. (If it receives more than one r(a)-phase message from the same (dishonest) node, it ignores all but the first of these.) [The node discards any messages it receives that are associated with earlier rounds that the node has already moved on from. The node remembers any messages it receives that are associated with later rounds, to be taken into account once the node catches up and reaches the relevant round.]

Second phase. If more than (n+f)/2 of the phase-r(a) messages received agree on a common bit v, then send v to all nodes. Otherwise, send a null message ⊥ to all nodes. Node i then idles in phase r(b) until it receives r(b)-phase messages from n − f distinct nodes.

Local update step.
  • If more than (n+f)/2 of the phase-r(b) messages received agree on a common bit v, then commit to v as the final output.
  • Otherwise, if at least f + 1 of the received phase-r(b) messages agree on a common bit v, set $x_i$ to v.
  • Otherwise, set $x_i$ to 0 or 1, with 50/50 probability.
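The two decision rules of a round can be sketched in Python as follows. This is only an illustrative sketch of the thresholds described above, not a full node implementation: the function names, the use of `None` to stand in for ⊥, and the representation of received messages as a plain list (one entry per distinct sender) are all assumptions made here.

```python
import random
from collections import Counter


def second_phase_message(phase_a_bits, n, f):
    """Message to send in phase r(b), given the n - f bits received in phase r(a).

    Returns the common bit v if more than (n + f) / 2 messages agree on it,
    and None (standing in for the null message) otherwise.
    """
    for v, count in Counter(phase_a_bits).items():
        if count > (n + f) / 2:
            return v
    return None


def local_update(phase_b_msgs, n, f, rng=random):
    """Local update step, given the n - f messages received in phase r(b).

    Returns (committed, bit). Null messages (None) never count towards a bit.
    The second rule takes the first bit with at least f + 1 supporters; that
    this bit is unique is exactly what part (a) of the problem asks to prove.
    """
    counts = Counter(m for m in phase_b_msgs if m is not None)
    for v, count in counts.items():
        if count > (n + f) / 2:
            return True, v          # commit to v as the final output
    for v, count in counts.items():
        if count >= f + 1:
            return False, v         # adopt v as the new bit
    return False, rng.randint(0, 1)  # fair coin flip


# Example with n = 6, f = 1 (so f < n/5 and each phase waits for 5 messages).
print(second_phase_message([1, 1, 1, 1, 0], 6, 1))   # 4 > 3.5, so send 1
print(local_update([1, 1, None, None, None], 6, 1))  # 2 >= f + 1, so adopt 1
```

Passing a seeded `random.Random` instance as `rng` makes the coin flips reproducible when experimenting.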

After the round-r local update step, a node proceeds to the first phase of round r + 1.

(a) (7 points) Prove that the protocol is well defined. Specifically, prove that (no matter what the Byzantine nodes and the adversary controlling message delivery do), for every round r, no node will ever receive at least f + 1 phase-r(b) messages for two different values v.

(b) (7 points) Prove that, whenever the protocol terminates, it satisfies validity.

(c) (9 points) Prove that, whenever the protocol terminates, it satisfies agreement.

(d) (7 points) Prove that, with probability 1 (over the protocol’s random coin flips), the protocol terminates in a finite number of steps.

Recap

Permissioned setting:
  • Synchrony:
    • PKI, any f<n ⇒ solution for BB
    • no PKI, f≥n/3 ⇒ no BB protocol
  • Asynchrony: ?

Relaxing the Synchronous Assumption

Recall the 4 assumptions, and what happens to each of them now:
  • still in the permissioned setting; later we will move to permissionless consensus (the big innovation in Bitcoin)
  • PKI will not be important for the next couple of lectures
  • we will move to the asynchronous model
  • the number of Byzantine nodes for us will be “at least 1”, i.e. f≥1

Synchronous model: (i) shared global clock; (ii) every message sent at time t arrives at time t+1.

However, (ii) is not a realistic way to model the internet, since there are outages and denial-of-service (DoS) attacks.

If there is a known bound △ on the maximum message delay (even a very large one, like 1000 seconds), then by the inflating time trick we are still in the synchronous model. The same holds if there is an a priori bound △ on the maximum difference between the nodes’ clocks.

These inflating time tricks to deal with failure of (i) and (ii) are very unsatisfying, because: (1) they don’t force us to deal with outages and DoS attacks; (2) they produce silly protocols in which nodes mostly idle.

The Asynchronous model

The basic idea: (i) no shared clock; (ii) no bound on the max message delay; (iii) minimal assumption: every message arrives, eventually (otherwise things are uninteresting).

Precise model:

  • pool M of outstanding not-yet-delivered messages
  • while(TRUE):
    • one message (r,m) is delivered (to recipient r) [useful to think of each message and their order as being chosen by an “adversary” whose sole goal is to foil the consensus protocol]
    • r can add any number of messages to M

Assume:
  • M is initialized to be $\{(i,\perp)\}_{i=1}^n$ (dummy messages to everyone to get started)
  • every message is delivered, eventually
  • each node, whenever it receives a message, injects a dummy message to itself into the pool M, in order to be able to speak later
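The delivery loop above can be simulated with a few lines of Python. This is a toy sketch under stated assumptions: `run_async`, `on_receive`, `choose`, and `greet` are hypothetical names, `None` stands in for the dummy payload ⊥, the loop is cut off after `max_steps` deliveries, and the "dummy message to itself" rule is omitted so that the example terminates. Note that an arbitrary `choose` function does not by itself enforce eventual delivery; that remains an assumption on the adversary.

```python
import random


def run_async(n, on_receive, choose, max_steps=1000):
    """Simulate the asynchronous delivery loop for at most max_steps deliveries.

    on_receive(recipient, msg) returns the list of (recipient, msg) pairs that
    the receiving node adds to the pool; choose(pool) returns the index of the
    message the delivery adversary wants delivered next.
    """
    pool = [(i, None) for i in range(n)]   # dummy messages (i, ⊥) to everyone
    log = []
    for _ in range(max_steps):
        if not pool:
            break
        index = choose(pool)                # adversary picks the next delivery
        recipient, msg = pool.pop(index)
        log.append((recipient, msg))
        pool.extend(on_receive(recipient, msg))  # recipient may add messages
    return log


def greet(recipient, msg, n=3):
    """Toy protocol: on the initial dummy message, greet every other node."""
    if msg is None:
        return [(j, f"hello from {recipient}") for j in range(n) if j != recipient]
    return []


rng = random.Random(0)
log = run_async(3, greet, lambda pool: rng.randrange(len(pool)))
print(len(log))  # 3 dummy deliveries + 6 greetings = 9, whatever the order
```

Replacing the random scheduler by `lambda pool: 0` gives first-in-first-out delivery, which is the scheduling idea used later when combining Lemma 1 and Lemma 2.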

Note that we now have two adversaries:
  • Byzantine nodes
  • Adversarial message delivery (in practice delivery is probably not adversarial, but since we want to prove a theorem, we assume that it may go in any way, in particular the worst possible way)

Why have we made all these strange modeling assumptions? The model doesn’t reflect the internet, but it makes much weaker assumptions than the internet does, so if we could design a protocol under these assumptions, we would be in great shape!

Byzantine Agreement problem

Before:
  • SMR (many outputs ⇒ multi-shot consensus problem)
  • BB (one output ⇒ single-shot consensus problem)

Now:
  • Byzantine Agreement, another single-shot consensus problem.

Protocol: specifies what messages a node sends upon receipt of a new message (as a function of what the node knows: its private input + messages it has seen thus far)

The Byzantine Agreement problem.

Setup: every node $i$ has its own private input $v_i^*\in V$ (for FLP we can take $V=\{0,1\}$), as opposed to the BB problem, where only the sender has a private input.

Goals:
1. Termination: each node eventually halts with some output $v_i$;
2. Agreement: all honest nodes output the same $v_i$ (safety property);
3. Validity: if all honest nodes have the same private input $v^*$, then their output $v_i$ is also equal to $v^*$ (liveness property).

(again solving 1&2 or 1&3 is pretty easy)
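One way to see why two of the three goals are easy is to look at protocols that send no messages at all (so they terminate immediately). The sketch below is only an illustration with all nodes honest; the function names are made up for this example.

```python
def always_zero(private_input):
    """Ignore the input and output 0: Termination + Agreement, but not Validity."""
    return 0


def echo_input(private_input):
    """Output your own input: Termination + Validity, but not Agreement."""
    return private_input


def agreement(outputs):
    """True if all (honest) outputs coincide."""
    return len(set(outputs)) <= 1


def validity(inputs, outputs):
    """True if unanimous (honest) inputs force the same unanimous output."""
    if len(set(inputs)) == 1:
        return all(out == inputs[0] for out in outputs)
    return True  # validity demands nothing when inputs differ


unanimous, mixed = [1, 1, 1], [0, 1, 1]
print(agreement([always_zero(v) for v in unanimous]))            # True
print(validity(unanimous, [always_zero(v) for v in unanimous]))  # False
print(agreement([echo_input(v) for v in mixed]))                 # False
print(validity(mixed, [echo_input(v) for v in mixed]))           # True
```

The difficulty of BA lies entirely in getting all three properties at once.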

Proposition 1: BB reduces to BA, i.e. BA solution ⇒ BB solution. Proof: […]

Proposition 2: as long as f < n/2, BA reduces to BB, i.e. BB solution ⇒ BA solution. Proof: […]

The FLP Impossibility result

Theorem. [Fischer-Lynch-Paterson ’85] For every n≥2, even for f=1, no deterministic protocol for Byzantine agreement satisfies Termination+Agreement+Validity in the asynchronous permissioned model.

Note: this formally separates what’s possible in synchronous vs asynchronous settings.

Remark 1: the type of fault required is just a crash fault (no need for Byzantine behavior!). But we will prove this theorem using Byzantine behavior, to make our lives easier.

Remark 2: impossibility for BA implies impossibility for BB and SMR.

Remark 3: the goal of all these impossibility results is to understand what fundamental trade-offs we need to make.

Workarounds:
  • allow randomized protocols (nodes flip coins)
  • strengthen assumptions: partially synchronous model (see Lecture 8)

A very long but beautiful proof:

Start of the proof: Configurations.

Suppose such a protocol exists, call it $\pi$. We will prove that there are situations where it runs forever, contradicting Termination.

Definition: a configuration C records
  • the state of every node (private input + sequence of msgs received thus far);
  • the message pool M.

Note that delivery of a msg (r,m) causes a state transition C→C’. Therefore a sequence of message deliveries can be thought of as a path in a directed graph where
  • vertices = configurations;
  • edges = state transitions (message deliveries).


Because the delivered messages are picked by an adversary, we can think of the adversary as a person choosing a path in the graph. Our goal is to show that the adversary can find an infinite path (one containing infinitely many edges in which a message is delivered to an honest node).

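Remember that we assume that all private inputs (and hence outputs) are either 0 or 1. This is fine: a protocol for a larger set V would in particular have to handle binary inputs, so impossibility with only 2 choices implies impossibility with more.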

Definition:
  • 0-configuration = no matter what the adversary does from here (which node is Byzantine, what it does, and the order of message deliveries), all honest nodes output 0.
  • 1-configuration = no matter what the adversary does from here, all honest nodes output 1.
  • Ambiguous configuration = the adversary has both the option of forcing all 0’s and the option of forcing all 1’s.

Remark: the agreement property of $\pi$ guarantees that these are the only possibilities for configurations.

Proof plan: show there exists an infinite sequence $C_0\to C_1\to C_2\to \ldots$ of ambiguous configurations (i.e. $\pi$ might run forever, contradicting termination). In Lemma 1 we will prove that $C_0$ exists, and in Lemma 2 we will show how to find the next ambiguous configuration at each step.

Lemma 1: Initial Ambiguous Configuration

Lemma 1 For any allegedly correct (satisfying BA) deterministic protocol $\pi$ there exist private inputs (0 or 1 for every node) such that the corresponding initial configuration is ambiguous.

Proof: let $X_i$ = the initial configuration where the private inputs of the first $i$ nodes are 1, and those of the last $n-i$ nodes are 0: [11…1100…00]

Note: (i) $X_0$ is a 0-configuration (because of the validity property of $\pi$); (ii) $X_n$ is a 1-configuration (because of the validity property of $\pi$).

Claim: one of the $X_i$’s is ambiguous.

Proof: continuity idea. Let’s flip bits from 0 to 1 one at a time, starting from the left. At some point we must move from a 0-configuration to something else.

Let $i$ be such that (1) $X_{i-1}$ is a 0-configuration; (2) $X_i$ is not a 0-configuration.

We will argue that $X_i$ must be ambiguous (i.e. cannot be a 1-configuration).

(A) The adversary has a strategy that forces the all-1’s output, simply because we know that $X_i$ is not a 0-configuration.

(B) The adversary also has a strategy that forces the all-0’s output. Suppose node $i$ is Byzantine (note that we have not fixed the Byzantine node in advance). Its input is 1 in $X_i$, but suppose it executes $\pi$ as if it were an honest node with private input 0. Because $X_{i-1}$ is a 0-configuration, we deduce that in this case $\pi$ will halt with all honest nodes outputting 0, as required (because honest nodes cannot distinguish this situation ($X_i$ with Byzantine node $i$) from the configuration $X_{i-1}$).

qed

qed
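The "continuity" step of Lemma 1 is just a left-to-right scan over $X_0, X_1, \ldots, X_n$. The sketch below illustrates it; `is_zero_config` is a hypothetical oracle (deciding whether a configuration is a 0-configuration is not something we can actually compute in general), so the code only illustrates the structure of the argument.

```python
def inputs_of_X(i, n):
    """Private inputs in X_i: the first i nodes get 1, the last n - i get 0."""
    return [1] * i + [0] * (n - i)


def first_non_zero_index(n, is_zero_config):
    """Smallest i >= 1 such that X_i is not a 0-configuration.

    is_zero_config(i) is a hypothetical oracle for "X_i is a 0-configuration".
    Since X_0 is a 0-configuration by validity, the returned i also satisfies
    "X_{i-1} is a 0-configuration", which is what the proof needs.
    """
    for i in range(1, n + 1):
        if not is_zero_config(i):
            return i
    raise ValueError("X_n classified as a 0-configuration, violating validity")


print(inputs_of_X(3, 5))                            # [1, 1, 1, 0, 0]
print(first_non_zero_index(5, lambda i: i < 3))     # 3 for this toy oracle
```

The scan always succeeds for a correct protocol, because validity forces $X_n$ to be a 1-configuration.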

Lemma 2: Extending the Ambiguous Sequence of Configurations

Lemma 2. Let $C_i$ be ambiguous, and (r,m) a message in $C_i$’s message pool. Then there exists a sequence of message deliveries such that: (i) last step = delivery of (r,m); (ii) leads to an ambiguous configuration.

[$C_i$, ambiguous] → … → … → [$C'_i$] —(r,m)→ [$C_{i+1}$, ambiguous]

Remark 1. Since $C_i$ and $C_{i+1}$ are both ambiguous, all configurations in between are also ambiguous.

Remark 2. Why single out the message (r,m) in the statement of the lemma? Well, because every message needs to be delivered (by the axiom of the asynchronous model), and while we find a path of ambiguous configurations, we may violate this property by leaving some message undelivered forever. For example, we may take the dummy messages from a node $i$ and keep delivering them to node $i$ over and over.

Lemma 1 + Lemma 2 ⇒ FLP Theorem

  • Let $C_0$ be the ambiguous configuration promised by Lemma 1.
  • To define $C_{i+1}$ from $C_i$:
    • let (r,m) be the oldest message in $C_i$’s pool;
    • let $C_{i+1}$ be the result of applying Lemma 2 to $C_i$ and (r,m).
  • We want to deliver the oldest message (FIFO: first in, first out; this guarantees delivery of every message), not immediately (since that may break the ambiguity) but by invoking Lemma 2, in order to retain the ambiguity.
  • ⇒ All of $C_0,C_1,C_2,\ldots$ are ambiguous, and $\pi$ never halts.
  • Note: each message gets delivered eventually, because if (r,m) is added to the pool at or before $C_i$ (denote by $M_i$ the message pool of $C_i$), then it gets delivered at or before $C_{i+|M_i|}$.
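The delivery bound in the last note can be checked on a toy first-in-first-out pool. The sketch below is a simplification made for illustration: each step delivers only the oldest message (Lemma 2 may deliver additional messages along the way, which only helps), and `new_messages_per_step` is a hypothetical stand-in for whatever messages the nodes add.

```python
from collections import deque


def delivery_steps(initial_pool, new_messages_per_step, steps):
    """Deliver the oldest message at each step and record when each was delivered.

    Returns {message: j}, meaning the message has been delivered by C_j.
    new_messages_per_step(j) lists the messages added to the pool at step j.
    """
    pool = deque(initial_pool)
    delivered_at = {}
    for j in range(1, steps + 1):
        if not pool:
            break
        oldest = pool.popleft()               # FIFO: oldest message goes first
        delivered_at[oldest] = j
        pool.extend(new_messages_per_step(j))
    return delivered_at


times = delivery_steps(["a", "b", "c"], lambda j: [f"new{j}"], steps=6)
print(times)
# "a", "b", "c" are in M_0 with |M_0| = 3, so all are delivered by C_3;
# "new1" joins at C_1, where the pool again has 3 messages, so by C_4.
```

New messages always queue up behind the ones already in the pool, which is why a message present at $C_i$ waits for at most $|M_i|$ applications of Lemma 2.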

Start of the proof of Lemma 2

This is the heart of the argument of the theorem.

Lemma 2. Let $C_i$ be ambiguous, and (r,m) a message in $C_i$’s message pool. Then there exists a sequence of message deliveries such that: (i) last step = delivery of (r,m); (ii) leads to an ambiguous configuration.

[$C_i$, ambiguous] → … → … → [$C'_i$] —(r,m)→ [$C_{i+1}$, ambiguous]

Proof:

Easy case: delivering (r,m) immediately at $C_i$ yields an ambiguous configuration, in which case we are done.

(*) So we assume that delivering (r,m) immediately at $C_i$ leads to a 0-configuration (the case of a 1-configuration is symmetric). Some more terminology:

Definition: a configuration $C$ is a 0*-configuration (predecessor of a 0-configuration) if: (i) $C$ is reachable from $C_i$ without delivering (r,m); (ii) if (r,m) is delivered at $C$, then $C$ transitions to a 0-configuration: [$C$] —(r,m)→ [0-configuration]

Similarly, we can define 1*- and ambiguous*-configurations.

Note:
  • Lemma 2 <=> there exists an ambiguous*-configuration $C'_i$.
  • Assumption (*) <=> $C_i$ is a 0*-configuration.

Hunting for a non-0*-configuration.

Goal: find an ambiguous*-configuration. Assumption: $C_i$ is a 0*-configuration.

Let’s do breadth-first search from $C_i$, without delivering the message (r,m).

In the breadth-first search we must encounter a 1*- or ambiguous*-configuration, because:
  • $C_i$ is ambiguous;
  • if all the encountered configurations were 0*, then after the eventual delivery of (r,m) the configuration would become a 0-configuration, making the initial $C_i$ a 0-configuration and contradicting its ambiguity.

(Note that if in the process the protocol halts with all honest nodes outputting 1, then it has entered a 1*-configuration by definition, as needed; if it halts with all honest nodes outputting 0, then this is an example of a 0*-configuration.)

What we want is to encounter an ambiguous*-configuration. Let $Y$ = the non-0*-configuration closest to $C_i$ (i.e. reached by the fewest message deliveries). Let $X$ be $Y$’s predecessor on a shortest $C_i$→$Y$ path:

[$C_i$, ambig, 0*] → [ , 0*] → … → [$X$, 0*] —(r’,m’)→ [$Y$, non-0*]

(Remark: (r,m) stays in the msg pool, and because the path is shortest, all the configurations preceding $Y$ are 0*.)
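The search for the closest non-0*-configuration is an ordinary breadth-first search. The following sketch runs it on an abstract graph; `successors` (the configurations reachable by one delivery other than (r,m)) and `is_zero_star` are hypothetical oracles, and the tiny graph at the bottom is made up purely to exercise the code.

```python
from collections import deque


def nearest_non_zero_star(start, successors, is_zero_star):
    """Breadth-first search for the closest configuration that is not 0*.

    Returns (X, Y), where Y is a non-0* configuration at minimum distance from
    start and X is its predecessor on a shortest path (None if Y is start).
    Returns None if every reachable configuration is 0*.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if not is_zero_star(config):
            return parent[config], config
        for nxt in successors(config):
            if nxt not in parent:      # first visit is along a shortest path
                parent[nxt] = config
                queue.append(nxt)
    return None


# Toy graph: every configuration except "Y" is 0*.
graph = {"Ci": ["A", "B"], "A": ["Y"], "B": ["A"], "Y": []}
zero_star = {"Ci", "A", "B"}
print(nearest_non_zero_star("Ci", graph.__getitem__, zero_star.__contains__))
# ('A', 'Y')
```

Because BFS visits configurations in order of distance, every configuration on the returned path before Y is 0*, matching the remark above.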

Claim. Y cannot be a 1*-configuration. So Y must be ambiguous*, finishing Lemma 2. (The configuration $C'_i$ in Lemma 2 is precisely the configuration Y.)

proof: suppose, for contradiction, Y is a 1* configuration. Let’s now focus on the following step:

[X, 0*] —(r’,m’)→ [Y, 1*]

Let’s imagine we deliver (r,m) at Y ⇒ we transition to a 1-configuration:

[X, 0*-config] —(r’,m’)→ [Y, 1*-config] —(r,m)→ [Z, 1-config]

What if we deliver the two messages in the reverse order:

[X, 0*-config] —(r,m)→ [W, 0-config] —(r’,m’)→ [V, 0-config]

Point: (i) msg order (r,m), (r’,m’) ⇒ 0-config V (ii) msg order (r’,m’), (r,m) ⇒ 1-config Z

Case 1: r≠r’. All nodes see same sequence of received messages in orders (i) and (ii) (from their viewpoint they don’t know which came first) ⇒ all honest nodes behave identically in (i) and (ii) ⇒ V and Z cannot be different configurations, contradicting the above.

Case 2: r=r’. If this node r=r’ is Byzantine, then honest nodes again cannot tell the difference, and as above V and Z cannot be different configurations, contradicting the above. In more detail:
  • scenario 0: r receives m before m’ (⇒ final outputs = 0), and r follows $\pi$ honestly.
  • scenario 1: r receives m’ before m (⇒ final outputs = 1), but r acts as if it had received m before m’, and then follows $\pi$ honestly.
In scenario 1 the honest nodes are deceived and need to output all 0’s as in scenario 0, but this contradicts the fact that Z is a 1-configuration.

qed
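The difference between Case 1 and Case 2 can be seen in a toy model where a configuration only records each node’s sequence of received messages. This is a deliberately stripped-down sketch (the message pool is left out, and `deliver` is a name invented here); since $\pi$ is deterministic, what a node sends depends only on its own history, so equal histories lead to equal behavior.

```python
def deliver(config, message):
    """Return the configuration after delivering message = (recipient, payload).

    A configuration is modelled as a dict mapping each node to the tuple of
    payloads it has received so far (in order).
    """
    recipient, payload = message
    new_config = dict(config)
    new_config[recipient] = config[recipient] + (payload,)
    return new_config


X = {"r": (), "s": ()}

# Case 1: different recipients, so the two delivery orders commute.
a, b = ("r", "m"), ("s", "m'")
print(deliver(deliver(X, a), b) == deliver(deliver(X, b), a))   # True

# Case 2: same recipient, so only r itself can see the difference.
a, b = ("r", "m"), ("r", "m'")
V, Z = deliver(deliver(X, a), b), deliver(deliver(X, b), a)
print(V["r"], Z["r"])        # ('m', "m'") versus ("m'", 'm')
print(V["s"] == Z["s"])      # True: every other node sees the same history
```

In Case 2 the only node that can distinguish the two orders is r, which is exactly the node the adversary makes Byzantine.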

(Q: I might have misunderstood something, but isn't the proof suspicious in the sense that the lemma 1 and lemma 2 might need different nodes to be Byzantine? In mathematical terms, the configuration C_i might only be reachable if another node r'' is Byzantine. In such cases, it might not be possible to assign r=r' to be a Byzantine node since f=1. A: The protocol doesn't know which (if any) node is Byzantine. The mere threat that a particular node might be Byzantine is enough to prolong ambiguity.)

qed

Conclusions after the FLP impossibility result

  • Impossibility results make clear the compromises + trade-offs required (i.e. Safety vs Liveness is a fundamental trade-off with unbounded msg delays)
  • They also clarify which assumptions matter (PKI in synchronous model, synchronous vs asynchronous model)
  • They also guide you to the right model for consensus/blockchain protocols (e.g., partially synchronous model)
  • The proofs are beautiful, especially the hexagon argument