Two-phase commit: failures in a distributed system

Two-phase commit

Failures in a distributed system

Example

Strawman solution

Reasoning about correctness

Correctness first

Performance Issues

Handling timeout on A/B

Handling timeout on TC

Possible termination protocol

Handling crash and reboot

Handling crash and reboot

Recovery upon reboot

Summary: two-phase commit

A case study of two-phase commit in real systems

What problem is Sinfonia addressing?

Sinfonia architecture

Sinfonia mini-transactions

Mini-transaction details

Sinfonia uses 2P commit

Potential uses of mini-transactions

Sinfonia’s 2P protocol

Sinfonia’s 2P protocol

Sinfonia applications

SinfoniaFS architecture

Example use of mini-transaction

General use of mini-transaction in SinfoniaFS

More examples: append to file

Sinfonia’s mini-transaction is fast

Two-phase commit

Failures in a distributed system

Consistency requires agreement among multiple servers

Achieving agreement in the presence of failures is hard

This class:

Example

Clients want all-or-nothing transactions

Strawman solution

What can go wrong?

Reasoning about correctness

TC, A, and B each have a notion of committing

Correctness:

Performance:

Correctness first

Performance Issues

What about timeouts?

What about reboots?

Handling timeout on A/B

TC times out waiting for the “yes/no” response from A (or B)

Can TC unilaterally decide to commit?

Can TC unilaterally decide to abort?
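The two questions above can be worked through in a small sketch. It assumes the usual answer: TC must not commit on a missing vote, because that vote might have been “no”, but it may abort, because nobody has been told to commit yet. The names `Decision` and `decide` are hypothetical and only serve the illustration.

```python
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def decide(votes: Dict[str, Optional[str]]) -> Decision:
    """Decide the outcome from the votes TC has collected.

    A value of None means TC timed out waiting for that participant.
    """
    # A missing vote might have been "no", so committing would be unsafe.
    # Aborting is safe: no participant has been told to commit yet.
    if any(vote is None for vote in votes.values()):
        return Decision.ABORT
    # Commit only when every participant explicitly voted "yes".
    if all(vote == "yes" for vote in votes.values()):
        return Decision.COMMIT
    return Decision.ABORT
```

In this sketch a timeout is treated exactly like a “no” vote, which keeps the rule "no commit unless everyone says yes" intact.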

Handling timeout on TC

If B responded with “no” …

If B responded with “yes” …
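The two cases can be sketched as a single function. This is a minimal illustration, assuming the standard reasoning: after voting “no”, B knows TC cannot have decided to commit, so it may abort on its own; after voting “yes”, TC may already have committed, so B must not decide alone. The function name and the returned strings are hypothetical.

```python
def on_tc_timeout(my_vote: str) -> str:
    """Return what participant B may do when it times out waiting for TC."""
    if my_vote == "no":
        # TC needs a "yes" from everyone to commit, so abort is the only
        # possible outcome and B can apply it unilaterally.
        return "abort"
    if my_vote == "yes":
        # TC may have decided "commit" already; B cannot back out and
        # has to learn the outcome from someone else.
        return "run-termination-protocol"
    raise ValueError(f"unexpected vote: {my_vote!r}")
```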

Possible termination protocol

Execute termination protocol if B times out on TC and has voted “yes”

B sends “status” message to A
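The slide text here stops after the “status” message, so the case analysis below is an assumption based on the usual textbook termination protocol, not a transcription of the lecture. The status strings and the function name are hypothetical.

```python
from typing import Optional


def termination_decision(a_status: Optional[str]) -> str:
    """Decide what B does after asking A for its status.

    a_status is A's reply, or None if A did not answer.
    Returns "commit", "abort", or "wait".
    """
    if a_status is None:
        # No reply from A: B learns nothing and must keep waiting.
        return "wait"
    if a_status in ("commit", "abort"):
        # A already received the outcome from TC: B adopts it.
        return a_status
    if a_status in ("not-voted", "voted-no"):
        # TC cannot have collected "yes" from everyone, so it cannot
        # have committed. A must also make sure it never votes "yes" later.
        return "abort"
    if a_status == "voted-yes":
        # Both voted "yes": TC may have decided either way, so B must
        # wait for TC to come back.
        return "wait"
    raise ValueError(f"unknown status: {a_status!r}")
```

The “wait” outcomes show why two-phase commit can block: when both participants voted “yes” and TC is unreachable, neither can decide safely.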

Handling crash and reboot

Nodes cannot back out if commit is decided

TC crashes just after deciding “commit”

A/B crashes after sending “yes”

Handling crash and reboot

All nodes must log protocol progress

What and when does TC log to disk?

What and when does A/B log to disk?
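A minimal sketch of the logging order, assuming the standard write-ahead rule: TC forces “commit” to disk before sending any commit message, and A/B force “yes” to disk before sending their vote. The helper names, the log format, and the `send` callback are hypothetical.

```python
import os
from typing import Callable


def log_durably(path: str, record: str) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()
        os.fsync(f.fileno())


def vote_yes(log_path: str, send: Callable[[str], None]) -> None:
    """Participant side: log "yes" first, then send the vote to TC."""
    log_durably(log_path, "yes")
    send("yes")


def decide_commit(log_path: str, send: Callable[[str], None]) -> None:
    """TC side: log "commit" first, then tell the participants."""
    log_durably(log_path, "commit")
    send("commit")
```

Logging before sending means a node that reboots can never have promised something over the network that its own disk does not remember.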

Recovery upon reboot

If TC finds no “commit” on disk, abort

If TC finds “commit”, commit

If A/B finds no “yes” on disk, abort

If A/B finds “yes”, run termination protocol to decide
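The four recovery rules above map directly onto two small functions. This is a sketch that models the on-disk log as a list of record strings; the function names and return values are hypothetical.

```python
from typing import List


def recover_tc(log: List[str]) -> str:
    """TC recovery: the outcome is "commit" only if that was logged."""
    return "commit" if "commit" in log else "abort"


def recover_participant(log: List[str]) -> str:
    """A/B recovery: without a logged "yes", the node never promised anything."""
    if "yes" not in log:
        return "abort"
    # A logged "yes" means TC may have committed; the node must ask others.
    return "run-termination-protocol"
```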

Summary: two-phase commit

All nodes that decide reach the same decision

No commit unless everyone says "yes".

If there are no failures and everyone says "yes", then commit.

If there are failures, then after repair and a long enough wait for recovery, some decision is reached.
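To make the summary concrete, here is a failure-free run of the protocol in a single process. It is a sketch under stated assumptions: the `Participant` class, the account balances, and `run_transaction` are hypothetical, and timeouts, logging, and the network are left out.

```python
from typing import List, Optional, Tuple


class Participant:
    """A hypothetical server holding one account balance."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.pending: Optional[int] = None  # tentative change, not yet applied

    def prepare(self, delta: int) -> bool:
        """Phase 1: vote "yes" (True) only if the change can be applied."""
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self) -> None:
        """Phase 2: make the tentative change permanent."""
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def abort(self) -> None:
        """Phase 2: discard the tentative change."""
        self.pending = None


def run_transaction(changes: List[Tuple[Participant, int]]) -> str:
    """Act as TC: collect every vote, then send one decision to everyone."""
    votes = [node.prepare(delta) for node, delta in changes]
    decision = "commit" if all(votes) else "abort"
    for node, _ in changes:
        if decision == "commit":
            node.commit()
        else:
            node.abort()
    return decision
```

Every participant receives the same decision, and a single "no" vote leaves all balances untouched, which is the all-or-nothing behavior the summary describes.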

A case study of two-phase commit in real systems

Sinfonia (SOSP’07)

What problem is Sinfonia addressing?

Targeted uses

Sinfonia: a shared data service

Goal: reduce development effort for system programmers

Sinfonia architecture

Sinfonia mini-transactions

Provide all-or-nothing atomic operations

Trade off expressiveness for efficiency

Result

Mini-transaction details

Mini-transaction

Example:

Sinfonia uses 2P commit

Potential uses of mini-transactions

1. atomic swap operation

2. atomic read of many data items

3. try to acquire a lease

4. try to acquire multiple leases atomically

5. change data if lease is held

6. validate cache then change data
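Several of these uses can be tried out with a toy executor. The sketch assumes the mini-transaction shape described in the Sinfonia paper (compare items, read items, write items: if every comparison matches, return the reads and apply the writes). It runs against a single in-memory dictionary rather than real memory nodes, and the function name and addressing scheme are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Memory = Dict[str, bytes]


def exec_minitransaction(
    mem: Memory,
    compare: List[Tuple[str, bytes]],
    read: List[str],
    write: List[Tuple[str, bytes]],
) -> Optional[Dict[str, Optional[bytes]]]:
    """Run one mini-transaction; return the read values, or None on abort."""
    # Compare items: every location must hold the expected value.
    for addr, expected in compare:
        if mem.get(addr) != expected:
            return None  # abort: nothing is read or written
    # Read items: collect values before applying any write.
    result = {addr: mem.get(addr) for addr in read}
    # Write items: applied only because all comparisons matched.
    for addr, value in write:
        mem[addr] = value
    return result
```

For example, "try to acquire a lease" becomes a compare on the lease location (expecting it to be free) plus a write of the new holder, and "change data if lease is held" becomes a compare on the lease plus a write to the data.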

Sinfonia’s 2P protocol

Transaction coordinator is at application node instead of memory node

Problem: a crashed TC blocks transaction progress

Sinfonia’s 2P protocol

TC keeps no log

A transaction is committed iff all participants have “yes” in their logs

Recovery coordinator cleans up

Transaction blocks if a memory node crashes
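The commit rule above can be sketched as the check a recovery coordinator might run over the participants' logs. This is an assumption-laden illustration, not Sinfonia's actual recovery code: each participant's logged vote is modeled as "yes", "no", or None when the memory node has crashed and its log cannot be read, and the function name is hypothetical.

```python
from typing import Dict, Optional


def resolve(logged_votes: Dict[str, Optional[str]]) -> str:
    """Return "commit", "abort", or "blocked" for one transaction."""
    votes = list(logged_votes.values())
    # Any logged "no" means not all participants have "yes": abort.
    if any(vote == "no" for vote in votes):
        return "abort"
    # A crashed node's vote is unknown, so the outcome cannot be decided yet.
    if any(vote is None for vote in votes):
        return "blocked"
    # Every participant has "yes" in its log: the transaction is committed.
    return "commit"
```

Because the outcome is defined by the participants' logs alone, TC does not need a log of its own; the price is that an unreadable participant log blocks the transaction.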

Sinfonia applications

SinfoniaFS

SinfoniaFS exports an NFS interface