situation: a collection of computers connected by a network,
which allows for:
resource sharing - share files, use remote hardware devices
(tape drives, printers)
computational speedup
reliability - if a node fails, others are available while
repairs are made
The OS for a distributed system could be:
Network Operating System - users are aware of multiple
nodes and access resources explicitly:
remote login to the appropriate machine (telnet, rlogin, ssh)
transferring files explicitly (ftp, scp)
just use a standard OS with network support
simpler, but more burden on the user
Distributed Operating System - users need not know
where things are on the network
system handles the transparency
data migration - automated file transfer (scp), file sharing
(NFS, AFS, Samba)
computation migration - transfer computation across the
system
remote procedure call (RPC) - a process on one node makes a
function call (essentially) that runs on a different node (see
the sketch after this list)
process migration - execute an entire process, or parts of
it, on different nodes
more complex, but more automated
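A minimal sketch of the RPC idea, using Python's standard-library
xmlrpc modules; the add() function, host name, and port are made-up
examples, not part of any particular distributed OS:

    # --- server side (runs on one node) ---
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        return a + b                      # this body executes on the server node

    server = SimpleXMLRPCServer(("0.0.0.0", 8000))
    server.register_function(add, "add")
    server.serve_forever()

    # --- client side (runs on another node) ---
    # import xmlrpc.client
    # proxy = xmlrpc.client.ServerProxy("http://servernode:8000/")
    # print(proxy.add(2, 3))              # looks like a local call, runs remotely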
Design Issues for a Distributed OS: Transparency
access transparency - ability to access local and remote
resources in a uniform manner
location transparency - users have no awareness of object
locations
migration transparency - object can be moved without
changing name or access method
concurrency transparency - share objects without
interference
replication transparency - consistency of multiple
instances (or partitions) of files and data
parallelism transparency - parallelism without need to
have users aware of the details
failure transparency - graceful degradation rather than
system disruption, minimize damage to users - fault tolerance
performance transparency - consistent and predictable
performance when the system structure or load changes
size transparency - system can grow without users'
knowledge - scalability - difficult issue, as bottlenecks may
arise in large systems
Distributed File Systems
share disk files over a network
Terminology:
Service - software running on one or more
machines providing some function to other machines
Server - service software running on a single machine
Client - process that can invoke a service
Client interface - set of client operations
provided by a service
For a DFS, the client interface most likely includes a set
of file operations, much like those available to access local
disks (create, delete, read, write)
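As a concrete (hypothetical) sketch of such a client interface,
in Python - the operation names mirror the list above; any real DFS
defines its own exact set:

    # Hypothetical sketch of a DFS client interface: the same operations a
    # local file system offers, independent of where the file is stored.
    from abc import ABC, abstractmethod

    class FileService(ABC):
        @abstractmethod
        def create(self, path: str) -> None: ...

        @abstractmethod
        def delete(self, path: str) -> None: ...

        @abstractmethod
        def read(self, path: str, offset: int, count: int) -> bytes: ...

        @abstractmethod
        def write(self, path: str, offset: int, data: bytes) -> None: ...

A local implementation and a remote implementation can then be used
interchangeably by client programs.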
DFS Issues:
remote and local files should be accessible through the
same client interface
performance is important (access time, service time,
latency)
naming - file names need to have enough information to
find the location
replication - are there replicas of some or all files?
How are they kept synchronized?
caching - use local disks to cache remote files, use
memory cache on client or server
Naming Schemes
location transparency - file name does not reveal
the file's physical storage location
location independence/migration transparency - file name
remains the same when the physical location changes
Approach 1: file names include a location and a path
bull:/home/faculty
\\ntserver\path\to\file
no location transparency or independence here
Approach 2: remote directories are attached to local
directories, in much the same way local filesystems are included
Sun Network File System (NFS) (more soon)
Windows "attach network drive"
Mac "connect to server"
File names do not include the server name, but the server name
is needed to make the initial connection or "mount"
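For example, on a typical Linux client (hostnames and paths here are
made up):

    mount -t nfs fileserver:/export/home /mnt/home

The server name appears only in this one-time mount command;
afterwards, files are named /mnt/home/... with no mention of
fileserver.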
Approach 3: total integration of file systems - a single
global name structure spans all files
NFS (Sun's Network File System):
a means to share filesystems (or parts of filesystems) among
clients transparently
a remote directory is mounted over a local file system
directory, just as we mount local disk partitions into the directory
hierarchy
mounting requires knowledge of physical location of files -
both hostname and local path on the server
usage, however, is transparent - works just like any filesystem
mounted from a local disk
mount mechanism is separate from file service mechanism; each
uses RPC (remote procedure call)
interoperable - can work with different machine architectures,
OS, networks
mount mechanism requires that the client is authorized to connect
(see /etc/exports in most Unix variants, /etc/dfs/dfstab
in Solaris) - mountd process services mount requests
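A hypothetical /etc/exports entry (Linux syntax; host and path names
made up):

    /export/home    client1.example.com(rw)  client2.example.com(ro)

mountd consults this list and refuses mount requests from hosts not
named in it.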
when a mount is completed, a file handle (essentially just
the inode of the remote directory) is returned that the client OS can
use to translate local file requests to NFS file service requests -
nfsd process services file access requests
NFS fits in at the virtual file system (VFS) layer in the OS -
instead of translating a path to a particular local partition and file
system type, requests are converted to NFS requests
NFS servers are stateless - each request has to provide all
necessary information - server does not maintain information about
client connections
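A sketch of what statelessness means for the service interface, in
Python (the handle table and function names are made up, and real NFS
file handles encode more than an inode number):

    # Each request carries the file handle and offset - the server keeps no
    # record of clients, open files, or file positions between requests.
    EXPORTS = {42: "/export/home/data.txt"}    # file handle -> local path

    def nfs_read(handle: int, offset: int, count: int) -> bytes:
        with open(EXPORTS[handle], "rb") as f:
            f.seek(offset)
            return f.read(count)

    def nfs_write(handle: int, offset: int, data: bytes) -> int:
        with open(EXPORTS[handle], "r+b") as f:
            f.seek(offset)
            return f.write(data)

If the server crashes and reboots, clients simply retry their
requests - there is no per-client state to rebuild.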
two main ways for clients to know what to request and from
where: entries in the file system table (/etc/fstab or
/etc/vfstab), or automount tables (see /etc/auto_* on bullpen,
for example). fstab entries are mounted when the system comes
up; automount entries are mounted on demand and unmounted when
not active
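A hypothetical /etc/fstab entry for an NFS mount (Linux syntax,
made-up names):

    fileserver:/export/home   /home   nfs   defaults   0 0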
many NFS implementations include an extra client-side process,
nfsiod, that acts as an intermediary, allowing things like
read-ahead and delayed writes. This improves performance, though the
delayed writes add some danger (see below).
Caching Remote File Access
Caching is important - can use the regular buffer cache on
the client, and client disk space as well. The server
automatically uses its buffer cache for NFS requests just as it
does for local requests
Advantages of caching on disk:
reliability (non-volatile)
can remain across reboots
can be larger
Advantages of caching in memory:
can be used for diskless workstations
faster
already have memory cache for local access, and
putting a remote file cache there allows reuse of that
mechanism
Cache Update Policy - mostly the same issues we have seen in
other contexts
Write-through - write data through to disk as soon as
write call is made - reliable, but performance is poor
Delayed-write - write to cache now, to server
"later" - much faster write, but dangerous!
Cache Consistency: need to know if the copy of a disk block in
our cache is consistent with the master copy
client-initiated: client that wants to reuse a block
checks with the server
server-initiated: server notifies clients of any changes
from other processes
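A sketch of the client-initiated approach in Python (server_getattr
and server_read stand in for the actual RPCs; using a modification
time as the version check is a simplification):

    cache = {}    # block -> (data, server mtime when cached)

    def server_getattr(block):        # stub: ask server for last-modified time
        return 0.0

    def server_read(block):           # stub: fetch the block from the server
        return b""

    def read_block(block):
        mtime = server_getattr(block)
        if block in cache and cache[block][1] == mtime:
            return cache[block][0]    # cached copy still consistent - reuse it
        data = server_read(block)     # otherwise refetch from the server
        cache[block] = (data, mtime)
        return data

Checking on every access is expensive; a real client might check only
when a file is opened, or only after a cached entry reaches a certain
age.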
Stateless vs. Stateful File Service
stateless (e.g., NFS) - each request is self-contained; simple
crash recovery, since there is no per-client state to lose or rebuild
stateful - server tracks clients and open files; allows
optimizations like read-ahead, but crash recovery is harder
Replication
reliability/availability: one server goes down, use
another that has a replica
efficiency: use the closest or least-busy server
technique used by web servers - a request for a file at
www.something.com may be silently redirected to one of a number
of servers, like www2.something.com, www28.something.com, etc.
main issue: keeping replicas up to date when one or more
is changing
if there is a "master copy" we can use caching ideas -
just treat replicas like cached copies
if there is no master, any change must be made to all
replicas
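A sketch of the no-master case in Python (the replica list and
send_write are made-up stand-ins):

    replicas = ["fs1.example.com", "fs2.example.com", "fs3.example.com"]

    def send_write(host, path, data):     # stub standing in for the write RPC
        pass

    def replicated_write(path, data):
        for host in replicas:             # the change goes to every replica,
            send_write(host, path, data)  # not just the nearest one

A real system must also handle a replica that is down or unreachable
mid-update, which is where most of the complexity lies.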
AFS - a globally distributed file system.
we saw the naming convention that includes a site (cell) name in
the path (e.g., /afs/cs.cmu.edu/...)
use of a file cache on local disks - important, as accesses
may now (potentially) travel over the Internet, with much higher
latency
the system caches entire files locally, not individual disk
blocks
file permissions are now very important, as many users can
browse - AFS supports more complicated file permissions,
including ACLs
files can move among servers in the same cell without their
names changing