Operating-Systems-Notes

Distributed File Systems

DFS models

Remote File Service : Extremes

  1. Extreme1 : Upload/Download
    • like FTP, SVN
    • + local reads/writes at the client
    • - entire file is downloaded/uploaded even for small accesses
    • - server gives up control
  2. Extreme2 : True Remote File Access
    • Every access to remote file, nothing done locally
    • + file access centralized, easy to reason about consistency
    • - every file operation pays the network cost, which limits server scalability
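A rough way to see the trade-off between the two extremes is to count network traffic. The sketch below is a toy cost model, not a measurement. The 10 MiB file size, the 4 KiB access size and the 100-byte per-request header are assumed values, and `Cost`, `upload_download` and `true_remote` are hypothetical names. It also assumes that in the upload/download model the file is written back on close.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    bytes_on_wire: int
    round_trips: int


def upload_download(file_size: int, accesses: list[int]) -> Cost:
    # Extreme 1: fetch the whole file on open, push it back on close (assumed).
    # The accesses in between are served locally, so they add no network cost.
    return Cost(bytes_on_wire=2 * file_size, round_trips=2)


def true_remote(accesses: list[int], header: int = 100) -> Cost:
    # Extreme 2: every access goes to the server; nothing is done locally.
    # 'header' is an assumed per-request protocol overhead in bytes.
    return Cost(bytes_on_wire=sum(n + header for n in accesses),
                round_trips=len(accesses))


if __name__ == "__main__":
    size = 10 * 1024 * 1024            # assumed 10 MiB file
    one_small = [4096]                 # a single 4 KiB read
    many_small = [4096] * 10_000       # 10,000 reads of 4 KiB each
    for label, acc in [("one access", one_small), ("many accesses", many_small)]:
        print(label, upload_download(size, acc), true_remote(acc))
```

With one small access, upload/download moves the whole file twice while true remote access sends only a few kilobytes. With thousands of small accesses, true remote access moves more bytes and also needs thousands of round trips.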

Remote File Service : A compromise

A more practical Remote File access (with Caching)

  1. Allow clients to store parts of files locally (blocks)
    • + low latency on file operations
    • + server load reduces => more scalable
  2. Force clients to interact with server (frequently)
    • + server has insights into what clients are doing
    • + server has control over which accesses can be permitted => easier to maintain consistency
    • - server is more complex, and different file sharing semantics are required
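A minimal sketch of block-level caching follows. It assumes that in-process method calls stand in for RPCs and uses a deliberately tiny `BLOCK_SIZE` of 4 bytes. `Server`, `Client` and `fetch_block` are hypothetical names. Repeated reads are served from the client's local block cache, so the server's request count stays low. Every miss still goes through the server, which therefore records which client caches which block; that record is what would later let it enforce consistency.

```python
from collections import defaultdict

BLOCK_SIZE = 4  # tiny block size (assumed) so the example is easy to follow


class Server:
    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files
        self.requests = 0                    # load seen by the server
        self.cachers = defaultdict(set)      # (file, block) -> ids of caching clients

    def fetch_block(self, client_id: str, name: str, idx: int) -> bytes:
        self.requests += 1
        self.cachers[(name, idx)].add(client_id)   # server learns who caches what
        start = idx * BLOCK_SIZE
        return self.files[name][start:start + BLOCK_SIZE]


class Client:
    def __init__(self, client_id: str, server: Server) -> None:
        self.id = client_id
        self.server = server
        self.cache: dict[tuple[str, int], bytes] = {}

    def read(self, name: str, offset: int, count: int) -> bytes:
        out = bytearray()
        first = offset // BLOCK_SIZE
        last = (offset + count - 1) // BLOCK_SIZE
        for idx in range(first, last + 1):
            key = (name, idx)
            if key not in self.cache:                 # miss: go to the server
                self.cache[key] = self.server.fetch_block(self.id, name, idx)
            out += self.cache[key]                    # hit: served locally
        start = offset - first * BLOCK_SIZE
        return bytes(out[start:start + count])


if __name__ == "__main__":
    srv = Server({"notes.txt": b"hello distributed world"})
    c = Client("c1", srv)
    print(c.read("notes.txt", 0, 5), srv.requests)   # two blocks fetched
    print(c.read("notes.txt", 2, 4), srv.requests)   # same blocks, no new requests
```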

Stateless vs Stateful File server

  1. Stateless
    • Keeps no state; works with the extreme models, but can't support the 'practical' model
    • - Can't support caching and consistency management
    • - Every request is self-contained => more bits transferred
    • + No server resources (CPU, main memory) are spent holding client state; on failure, just restart
  2. Stateful
    • Keeps the client state needed for the 'practical' model, to track what is cached/accessed
    • + Can support locking, caching, incremental operations
    • - Overheads to maintain state and consistency, depending on the caching mechanism and consistency protocol
    • - On failure, needs checkpointing and recovery mechanisms
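To make the contrast concrete, here is a toy sketch. All names are hypothetical, files are held in memory, and method calls stand in for network requests. The stateless read carries the path, offset and count in every request, so it sends more bits but is unaffected by a restart. The stateful server hands out a handle and tracks the read position, which allows incremental reads but loses everything on a crash unless it checkpoints.

```python
import itertools
from dataclasses import dataclass

# Assumed in-memory file store shared by both server variants.
FILES = {"a.txt": b"0123456789"}


class StatelessServer:
    """Keeps no per-client state: each request names the file, offset and count."""

    def read(self, path: str, offset: int, count: int) -> bytes:
        return FILES[path][offset:offset + count]


@dataclass
class OpenFile:
    path: str
    position: int = 0


class StatefulServer:
    """Remembers which files each client has open and where it is reading."""

    def __init__(self) -> None:
        self._next_handle = itertools.count(1)
        self.open_files: dict[int, OpenFile] = {}

    def open(self, path: str) -> int:
        handle = next(self._next_handle)
        self.open_files[handle] = OpenFile(path)
        return handle

    def read(self, handle: int, count: int) -> bytes:
        f = self.open_files[handle]          # raises KeyError if the state was lost
        data = FILES[f.path][f.position:f.position + count]
        f.position += len(data)              # incremental: the next read continues here
        return data

    def crash_and_restart(self) -> None:
        # Without checkpointing, a restart wipes all per-client state.
        self.open_files.clear()


if __name__ == "__main__":
    print(StatelessServer().read("a.txt", 3, 4))   # the client tracks the offset itself
    srv = StatefulServer()
    h = srv.open("a.txt")
    print(srv.read(h, 4), srv.read(h, 4))          # the server tracks the offset
    srv.crash_and_restart()
    print(h in srv.open_files)                      # the client's handle is now stale
```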

Caching state in a DFS

  1. SMP
    • How: write-update / write-invalidate
    • When: on write
  2. DFS
    • How: client-driven / server-driven
    • When: on demand, periodically, on open, ...
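The SMP policies carry over to a DFS when the server drives coherence. The sketch below shows one hypothetical server-driven scheme, triggered on write, that can use either SMP policy. It assumes whole-file caching and in-process calls instead of RPCs, and `CoherentServer` is an illustrative name. A client-driven DFS would instead have each client check with the server on open or periodically, which this sketch does not show.

```python
from collections import defaultdict


class Client:
    """A client holding whole-file copies in a local cache (whole-file caching assumed)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: dict[str, bytes] = {}


class CoherentServer:
    """Server-driven coherence, triggered on write, with an SMP-style policy."""

    def __init__(self, policy: str) -> None:
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.files: dict[str, bytes] = {}
        self.cachers: dict[str, set[Client]] = defaultdict(set)  # file -> caching clients

    def read(self, client: Client, name: str) -> bytes:
        if name not in client.cache:             # miss: fetch and remember the cacher
            client.cache[name] = self.files[name]
            self.cachers[name].add(client)
        return client.cache[name]

    def write(self, writer: Client, name: str, data: bytes) -> None:
        self.files[name] = data
        writer.cache[name] = data
        self.cachers[name].add(writer)
        for other in list(self.cachers[name]):
            if other is writer:
                continue
            if self.policy == "invalidate":
                other.cache.pop(name, None)      # stale copy dropped; next read refetches
                self.cachers[name].discard(other)
            else:
                other.cache[name] = data         # new contents pushed to every cacher
```

Write-invalidate sends each cacher only a short message and makes it refetch on its next read; write-update ships the new data to every cacher whether or not it reads the file again.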

Replication vs Partitioning

  1. Replication: each machine holds all files
    • + Load balancing, availability, fault tolerance
    • - Writes become more complex: either write synchronously to all replicas, or write to one, then propagate to the others
    • - Replicas must be reconciled, e.g. by voting
  2. Partitioning: each machine holds a subset of the files
    • + Availability compared with a single-server DFS
    • + Scalability with file system size
    • + Single-file writes are simpler
    • - On failure, a portion of the data is lost
    • - Load balancing is harder; if load is not balanced, hot spots are possible
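Both placement strategies can be sketched in a few lines. For partitioning, the sketch maps a file name to exactly one server with a stable hash. It uses `zlib.crc32` because Python's built-in `hash` of strings varies between runs. For replication, it uses quorum voting, one common form of the voting mentioned above. A write must reach W of the N replicas, a read asks R replicas and trusts the highest version number, and the conditions R + W > N and 2W > N make every quorum overlap the latest write. The quorum sizes, version numbers and the `targets`/`sources` lists of reachable replicas are assumptions of this sketch, and all names are hypothetical.

```python
import zlib


def partition_server(name: str, servers: list[str]) -> str:
    """Partitioning: each file lives on exactly one server, picked by a stable hash."""
    return servers[zlib.crc32(name.encode()) % len(servers)]


class Replica:
    def __init__(self) -> None:
        self.store: dict[str, tuple[int, bytes]] = {}   # name -> (version, data)


class QuorumReplicatedFS:
    """Replication: any replica may hold any file; reconciliation by quorum voting."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if r + w <= n or 2 * w <= n:
            raise ValueError("need R + W > N and 2W > N so quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _latest(self, name: str, ids: list[int]) -> tuple[int, bytes]:
        # Highest-versioned copy among the contacted replicas (version 0 = absent).
        return max(self.replicas[i].store.get(name, (0, b"")) for i in ids)

    def write(self, name: str, data: bytes, targets: list[int]) -> None:
        # 'targets': indices of the replicas reachable for this write (assumed input).
        if len(set(targets)) < self.w:
            raise RuntimeError("write quorum not reached")
        version = self._latest(name, targets)[0] + 1   # 2W > N: targets overlap the last write
        for i in targets:
            self.replicas[i].store[name] = (version, data)

    def read(self, name: str, sources: list[int]) -> bytes:
        # 'sources': indices of the replicas reachable for this read (assumed input).
        if len(set(sources)) < self.r:
            raise RuntimeError("read quorum not reached")
        return self._latest(name, sources)[1]          # R + W > N: sources overlap the last write


if __name__ == "__main__":
    print(partition_server("notes.txt", ["s1", "s2", "s3"]))
    fs = QuorumReplicatedFS(n=3, r=2, w=2)
    fs.write("f", b"v1", targets=[0, 1])
    fs.write("f", b"v2", targets=[1, 2])   # replica 0 still holds the old v1
    print(fs.read("f", sources=[0, 2]))    # the vote picks the higher version, v2
```

Writes reach W of the N replicas rather than all of them, so some replicas can lag; the version numbers let a read quorum reconcile them. Partitioning avoids this cost, but losing a server loses every file that hashes to it.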