Yury A Izrailevsky


Comparative Analysis of Common

Approaches in Distributed File Systems

(Based on WebNFS, AFS and CIFS)


CS576 - Distributed Operating Systems Seminar

Professor John Carter



Department of Computer Science

University of Utah

Winter Quarter, 1997


Introduction

The explosive growth of the World Wide Web, along with the evolution of the HTTP, HTML, CGI, and other Internet standards, has enabled many applications which would not have been feasible just a few years ago. There is little doubt that this evolution will eventually lead to support for general-purpose distributed computing over the Web. Such computation minimally requires access to local and remote files, private and public data, and local and remote computation. Issues like transparency, architectural and operating system compatibility, speed, and security (as well as other current Internet problems) will need to be addressed to support a robust and secure distributed computing environment, presented in a form of a distributed file system.

In recent years, this problem has received a lot of attention in the academic community, resulting in several research projects, most notably the Andrew File System (AFS) at CMU and the Web File System (WebFS) at Berkeley. Without diminishing the practical significance of these projects, I would like to concentrate on the commercial implementations of distributed file systems, specifically Sun's WebNFS, Transarc's AFS, and Microsoft's CIFS. Currently, these are the most important players on the market, each vendor striving to establish its system as a standard for file management across the Web.

In this paper I will attempt to analyze the particular approaches adopted by each system, their strengths and weaknesses, and point out some differences and similarities in the methods they adopt. This task is complicated by the fact that these file systems differ significantly not just in their implementation, but also in their functionality and even purposes. While AFS accentuates security and "extreme" transparency at the cost of losing its UNIX compatibility, WebNFS is a mere extension of UNIX-based NFS 3.0, and CIFS simply builds upon native Windows file-sharing protocols. Nonetheless, there exists a common denominator to all these systems, which is the ability to carry out Internet-wide file transactions and file sharing. In the next few sections, I will summarize the most important aspects of each of the file systems and compare them based on their effectiveness, speed, security, compatibility with other common Internet protocols, and other relevant issues.


WebNFS

Origins:

WebNFS is an enhanced version of SunSoft's NFS (Network File System) 3.0 protocol. Taking NFS as a basis, SunSoft extended it to work over intranets and the Internet, allowing users to read from and write to files over the Web instead of just viewing them through a browser.

There are currently about 10 to 12 million nodes connected to various NFS servers throughout the world. WebNFS allows access to the information on any one of these servers. Instead of having to use a Web browser to retrieve the files, switch tasks, and cut and paste the data into applications, users get direct and transparent access to Web data from within the applications themselves, in the same way applications now access local disks.

NFS was originally designed at Sun Microsystems as a file access protocol for local area networks. The basic idea behind NFS is that clients can "mount" a server filesystem to appear as if on the local hard drive. Instead of manually transferring entire files across the network, the user can rely on NFS to move the data as needed (originally in 8K blocks, now as negotiated between the client and the server), and take advantage of local caching.

Initially, NFS used the UDP transport protocol, which performs well on local area networks (LANs) but poorly on low-bandwidth, high-latency WANs like the Internet. TCP, on the other hand, is well suited for WANs, and its recent implementations perform reasonably well on LANs too. WebNFS has therefore adopted TCP as its transport, making it more suitable for widely distributed data transactions. UDP is also supported and can be used for local network transactions.

Another limitation of the earlier versions of NFS was the restriction of file offsets to 32-bit quantities, which confines the maximum file size to 2^32 bytes (about 4.3 GB). WebNFS extends file offsets and a number of other parameters to 64 bits, which (temporarily?) eliminates the problem.
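As a quick check of the arithmetic behind this limit, the sketch below computes how many bytes an unsigned offset of a given width can address. The helper name max_file_size is hypothetical and only serves the illustration.

```python
def max_file_size(offset_bits: int) -> int:
    """Bytes addressable with an unsigned file offset of the given width."""
    return 2 ** offset_bits


GB = 10 ** 9  # decimal gigabyte, assumed here for the comparison

print(max_file_size(32) / GB)  # roughly 4.29
print(max_file_size(64) / GB)  # roughly 1.8e10
```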

Firewalls:
For a long time, UDP-based NFS could not get through most firewalls, since UDP datagrams are considered insecure due to their vulnerability to replay attacks. The use of TCP allows WebNFS to avoid this problem.

Traditionally, NFS implementations use port 2049 for TCP or UDP connections. However, as an RPC-based protocol, NFS must first obtain its port number by contacting the Portmap service at port 111, which maps ports dynamically. In the majority of cases, NFS still gets port 2049, since it is designated for NFS transactions. WebNFS tries to avoid this extra step by skipping Portmap whenever possible and going directly to port 2049. This improvement, however, is relative, since eliminating the extra security step poses some potential danger.

Optimizing file access:

The MOUNT protocol is used to generate filehandles that uniquely identify files and directories on the server. Originally, all filehandles were fixed-size 32-byte sequences. WebNFS uses a variable filehandle size - between 0 and 64 bytes - which allows for more precision, yet saves memory and reduces network traffic in most cases.

Another WebNFS addition was the introduction of Public Filehandles and Multi-Component Lookup. A public filehandle is a zero-length filehandle that is usually associated with the root of the directory tree that is open to outside NFS connections. Thus, instead of having to "mount" that directory to receive its handle, all potential clients already have this handle and can eliminate the MOUNT step altogether. Another slow-down for remote NFS calls has been path evaluation. First, the client had to mount the filesystem on the server to obtain the root's filehandle. Then it had to issue a separate LOOKUP request for each step in the path. With the WebNFS protocol, the server does all the successive lookups itself once it receives a handle and a path to a file or a directory in a single Multi-Component Lookup request.

All these optimizations significantly reduce the amount of network traffic and time. Consider the following example: a client needs to get a handle for file "/foo/bar/yury.file" from the server. Originally, this would require the following requests:

PORTMAP, MOUNT, PORTMAP, LOOKUP foo, LOOKUP bar, LOOKUP yury.file.

With WebNFS we only need to issue one request:

LOOKUP /foo/bar/yury.file

Considering high latency for some network connections, this seems to be a very convenient feature.
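The saving can be made concrete with a small sketch that lists the requests for both cases and multiplies by an assumed round-trip time. The function names and the one-round-trip-per-request cost model are assumptions made for illustration, not part of either protocol.

```python
def classic_nfs_requests(path: str) -> list[str]:
    """Requests issued before WebNFS, following the sequence given in the text."""
    components = [c for c in path.split("/") if c]
    # Portmap + MOUNT to get the root filehandle, Portmap again to locate NFS,
    # then one LOOKUP per path component.
    return ["PORTMAP", "MOUNT", "PORTMAP"] + [f"LOOKUP {c}" for c in components]


def webnfs_requests(path: str) -> list[str]:
    """A single multi-component LOOKUP relative to the public filehandle."""
    return [f"LOOKUP {path}"]


def total_latency_ms(requests: list[str], round_trip_ms: float) -> float:
    """Assumes every request costs exactly one network round trip."""
    return len(requests) * round_trip_ms


path = "/foo/bar/yury.file"
print(total_latency_ms(classic_nfs_requests(path), 200))  # 1200
print(total_latency_ms(webnfs_requests(path), 200))       # 200
```

With a 200 ms round trip (an arbitrary figure), the six-request sequence costs six round trips against one for WebNFS.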

Another optimization adopted by WebNFS is the sliding window approach to sending requests (similar to the batching used by CIFS). Instead of waiting for the response to each individual request, a WebNFS client sends several requests at once, up to the size of the sliding window negotiated with the server. This minimizes the delays caused by network latency.
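The effect of the window can be estimated with a deliberately idealized model: each batch of outstanding requests costs one round trip, and transmission and server time are ignored. Both functions below are hypothetical helpers, not WebNFS code.

```python
import math


def stop_and_wait_ms(n_requests: int, rtt_ms: float) -> float:
    """One request in flight at a time: every request waits a full round trip."""
    return n_requests * rtt_ms


def windowed_ms(n_requests: int, window: int, rtt_ms: float) -> float:
    """Up to `window` requests in flight: one round trip per full or partial batch."""
    return math.ceil(n_requests / window) * rtt_ms


print(stop_and_wait_ms(32, 100))  # 3200
print(windowed_ms(32, 8, 100))    # 400
```

Under these assumptions the benefit grows with the window size until the window covers all pending requests.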

WebNFS URL:

NFS requests can be used to browse files on the World Wide Web. The syntax of the NFS URL requests is similar to those of HTTP and FTP protocols and has the following form:

nfs://server[:port #]/[/]path - corresponds to 'Lookup 0x0 "[/]path"'

The default for port number is 2049. The second (optional) slash before the path corresponds to the absolute path on the server specified. Its omission implies a relative path to the directory with the public filehandle (0x0). The following is an example of NFS URL:

nfs://www.cs.utah.edu//foo/bar

which requests a filehandle for "/foo/bar", given the absolute (relative to the root) path.
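A minimal sketch of how a client might split such a URL into its parts, using only the rules stated above (default port 2049, a second slash marking an absolute path). The function name parse_nfs_url and the shape of its return value are assumptions for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_NFS_PORT = 2049


def parse_nfs_url(url: str) -> dict:
    """Split an nfs:// URL into server, port, path, and an absolute-path flag."""
    parts = urlsplit(url)
    if parts.scheme != "nfs":
        raise ValueError("not an nfs:// URL")
    raw = parts.path  # "//foo/bar" for absolute, "/foo/bar" for relative
    absolute = raw.startswith("//")
    # Drop the separator slash; a second slash, if present, stays as the path root.
    path = raw[1:] if raw.startswith("/") else raw
    return {
        "server": parts.hostname,
        "port": parts.port or DEFAULT_NFS_PORT,
        "path": path,
        "absolute": absolute,
    }


print(parse_nfs_url("nfs://www.cs.utah.edu//foo/bar"))
```

A relative URL such as nfs://www.cs.utah.edu/foo/bar yields the path "foo/bar", to be looked up against the public filehandle.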

WebNFS vs. FTP:

Both protocols support similar functionality, although WebNFS seems to be more efficient for most tasks. Consider a simple task that involves transferring data from N files. FTP protocol would require establishing N+1 TCP connections to the server (one request to a control connection, plus a GET request for each file) and transferring N entire files, even though only a portion of their data is necessary. If a TCP connection is broken while transferring file data, another one is to be established anew, and the file is transferred from scratch.

WebNFS, just like FTP, also establishes a control TCP connection once it connects to the server. However, all subsequent file transfers go through the same connection, which eliminates the overhead of opening and closing connections for each file. WebNFS also keeps track of file offsets, so that only the needed data from the files is transferred across the network. Also, if the TCP connection is broken, once it is restarted the retransmission resumes based on the most recent file offset received, so that no data has to be transmitted more than once.
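The difference between restarting and resuming can be sketched by counting the bytes that cross the network when a connection breaks partway through a file. The model follows the behaviour described above; the function names and the list of break offsets are illustrative assumptions.

```python
def bytes_sent_restart(file_size: int, break_offsets: list[int]) -> int:
    """Each broken attempt is wasted: the next attempt starts again at byte 0."""
    wasted = sum(break_offsets)   # bytes delivered before each break
    return wasted + file_size     # plus one complete, successful transfer


def bytes_sent_resume(file_size: int, break_offsets: list[int]) -> int:
    """Each new attempt continues from the last offset already received."""
    position = 0
    sent = 0
    for offset in break_offsets:  # absolute offsets at which the link drops
        if offset > position:
            sent += offset - position
            position = offset
    return sent + (file_size - position)


# A 1000-byte file whose connection breaks at bytes 400 and 700.
print(bytes_sent_restart(1000, [400, 700]))  # 2100
print(bytes_sent_resume(1000, [400, 700]))   # 1000
```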

WebNFS vs. HTTP 1.0:
HTTP 1.0 is a simple network protocol for transferring documents, designed primarily for Web browsing. It is very similar to FTP and has the same limitations: a separate TCP connection is needed for each file requested, and file offsets are not supported. It should be noted, however, that the HTTP 1.1 protocol has tackled most of these issues.

HTTP offers support for MIME headers for the browser to distinguish document format. WebNFS does not support MIME tags, only raw binary data.

WebNFS seems to perform better than HTTP, especially under heavy loads. The following graph shows the effects of increasing client load (measured in Web operations per second) on the server's response time.

As we can see from the graph above, HTTP server's response time hits its limits at about 200 WebOps/sec, while WebNFS server performs reasonably even under the 600 WebOps/sec load. (Source: WebNFS white paper, for more information please see the online references).

Security:

The basis of WebNFS security is the Remote Procedure Call (RPC) layer. Currently, the following authentication flavors are supported:

File and directory access is controlled by using UNIX permission bits.

Proxy and Caching:
NFS clients usually cache data in memory. Disk caching is also quite common for some applications. As yet, WebNFS servers do not have any built-in proxy or proxy/cache mechanisms, so there is definitely some room for improvement.

Summary:
WebNFS may turn out to be a very helpful solution for many Internet and intranet based applications, whenever data has to be shared across distant servers. It offers a relatively fast, convenient, and somewhat transparent way of data communication and may significantly facilitate the way many problems are approached and solved in the modern computer world. Besides, it is fully compatible with the earlier versions of NFS servers, and so are its usage and maintenance. It also offers a number of advantages over using other common Internet protocols.

However, there are still a number of actual and potential problems associated with it. For example, it does not support server proxy/caching. The data is recognized in raw binary form only. Perhaps the biggest problem of WebNFS is its flawed security, weakened even further by eliminating a number of steps for efficiency reasons, such as skipping PORTMAP or doing everything through one persistent TCP connection. Although WebNFS supports higher-level identification and encryption protocols, it usually defaults to lower-level ones.


AFS

Origins:

AFS is a distributed UNIX-like filesystem for both Local Area and Wide Area Networks. Originally, it was developed at CMU as part of the Andrew project. Later, it was sold to and is currently marketed by Transarc Corp. of Pittsburgh, PA. Although the system has significantly evolved since its days at CMU, Transarc decided to maintain its name and global root (/afs).

As a distributed file system, AFS enables cooperating hosts (both clients and servers) to share filesystem resources across the network. Currently it is installed on and used by approximately 1500 nodes, which is four orders of magnitude fewer than NFS. There are fewer than 100 AFS cells (assigned according to Internet domain names), mostly corresponding to universities and research laboratories.

Global Structure and Naming Convention:

The root of the global AFS tree is /afs. AFS supports global naming convention, which means that any file on any server is seen the same from any other AFS location. In fact, requesting client or server does not even need to know the specific server the file belongs to. Example:

/afs/yurycell/yurypath/yuryfile

Moving entire filesystems within the same cell is easier than ever before - there is no need to update the "/etc/filesystems" file on each client that uses them, because the location of all filesystems mounted in the same cell is transparent.

If the string "@sys" appears in a file name used in AFS, it is automatically replaced with AFS's name for the type of machine on which the file name is being expanded. For example, on a Sun 4 running SunOS 4.1.3, the pathname "/afs/cs.utah.edu/@sys/bin/" becomes "/afs/cs.utah.edu/sun4c_413/bin/".

This feature may be used in pathnames or symbolic links to allow them to be machine independent by choosing the correct path according to the machine's architecture.
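The substitution can be sketched in a few lines. This is only an illustration of the idea: the function name expand_sys is hypothetical, and the choice to replace "@sys" only when it forms a whole path component is an assumption of the sketch.

```python
def expand_sys(path: str, sysname: str) -> str:
    """Replace every path component equal to "@sys" with the machine type name."""
    return "/".join(sysname if part == "@sys" else part for part in path.split("/"))


print(expand_sys("/afs/cs.utah.edu/@sys/bin/", "sun4c_413"))
# /afs/cs.utah.edu/sun4c_413/bin/
```

The same path string therefore resolves to different directories on machines of different architectures.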

Cells:

The next level of hierarchy in the AFS are cells. An AFS cell is a collection of servers grouped together administratively and presenting a single filesystem. Usually, a cell corresponds to a set of hosts under the same Internet domain name, such as:

However, it's up to the central AFS administration (i.e. Transarc) to assign you to a particular cell.

If one of the file servers crashes, its clients would not be affected as much since they can still operate with other data on the same cells and the remote cells it has permissions upon. Moving files from one server to another within the same cell is transparent and does not affect filenames.

Volumes and Replication:

AFS volumes represent the next level in the AFS hierarchy. UNIX divides disks into partitions. AFS further subdivides them into volumes. Volumes are size-limited (usually between several megabytes and several hundred megabytes) atomic units of server replication, each usually representing a directory subtree belonging to a single user. The command `fs listquota` can be used to obtain the name, quota, and usage of a volume.

If a user needs to temporarily exceed the limit of her volume, she cannot simply grab some more available space from the disk. Instead, she would either have to store the data on some other volume, or be assigned another volume for herself. This is great for enforcing disk quotas, but causes a lot of grief for someone who has to exceed the space limit in a given subtree, leading users to overestimate their needs and waste disk space.

AFS replicates its data by volumes. Secure data is typically replicated within the same cell. Publicly accessible, frequently requested data is replicated in multiple locations across the Internet. This keeps the data accessible even if the owner's server is down or the network is slow. For convenience, some volumes are declared read-only. This saves a lot of replication effort, since read-only volumes cannot be routinely modified.

AFS vs. Common Internet Protocols:

AFS is a TCP-based protocol that handles network communication in a manner very similar to that of WebNFS (i.e. it utilizes a persistent TCP connection for both control and data, only recognizes raw binary data, etc.). Initial connection overhead is slightly higher for AFS than it is for WebNFS since AFS has to go through a number of security checks that WebNFS usually skips. However, once the connection is established, AFS takes advantage of its caching mechanism and usually performs faster than NFS (25% faster than NFS 3.0 based on Andrew benchmark; no comparison is available between AFS and WebNFS).

AFS seems to perform a lot better than FTP or HTTP 1.0 when transferring data across the network (although I could not find any numbers to support this), since it offers advantages over these protocols similar to those of WebNFS: maintaining a single connection, keeping track of file offsets, etc.

Caching:

AFS supports client caching. All client machines run a Cache Manager process, which maintains information about the users logged in, finds and requests data on their behalf, and keeps chunks of retrieved data on local disks. Unlike WebNFS, which is stateless, AFS is stateful, i.e. it maintains the current state for each client.

Client caching reduces network traffic and speeds up "warm reads." After a client is done using a file, if any writes have been performed, it will be automatically updated on all replicated volumes.

To maintain consistency on read-write files, AFS uses the Callback mechanism, which ensures that the cached copy of a file is up-to-date. When a file is modified the fileserver breaks the callback. When the user accesses the file again the Cache Manager fetches a new copy if the callback has been broken.
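The callback mechanism can be illustrated with a small in-memory simulation. The class and method names below are hypothetical and the model is far simpler than the real Cache Manager; it only shows a server remembering which clients hold a cached copy and invalidating those copies when the file changes.

```python
class FileServer:
    def __init__(self):
        self.files = {}      # path -> contents
        self.callbacks = {}  # path -> clients promised a notification

    def fetch(self, client, path):
        """Return the file and record a callback promise for this client."""
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        """Save new contents and break the callbacks of all other clients."""
        self.files[path] = data
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.break_callback(path)
        self.callbacks[path] = {writer}


class CacheManager:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # path -> cached contents
        self.valid = set()  # paths whose callback is still intact
        self.fetches = 0    # number of trips to the server

    def read(self, path):
        if path not in self.valid:  # callback broken or file never fetched
            self.cache[path] = self.server.fetch(self, path)
            self.valid.add(path)
            self.fetches += 1
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.server.store(self, path, data)
        self.valid.add(path)

    def break_callback(self, path):
        self.valid.discard(path)


server = FileServer()
server.files["/afs/yurycell/yuryfile"] = "v1"
a, b = CacheManager(server), CacheManager(server)
a.read("/afs/yurycell/yuryfile")          # first read goes to the server
a.read("/afs/yurycell/yuryfile")          # warm read, served from the cache
b.write("/afs/yurycell/yuryfile", "v2")   # breaks a's callback
print(a.read("/afs/yurycell/yuryfile"), a.fetches)  # v2 2
```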

Scalability:

The scalability claimed by AFS is rather impressive. Server-to-client ratios range from 1:1 to 1:200 (the AFS goal), with a recommended ratio of 1:50. Dynamic cells (in which it is easy to add or remove clients and servers) can host tens of servers and thousands of clients. However, it is rather difficult to estimate the true (versus claimed) limits of AFS scalability, since the entire installed base for AFS is currently only about 1500 clients.

Security:

AFS uses Kerberos encryption to secure data traveling across the network. Mutual authentication is supported (both the service provider and the requester identify themselves). No alternative encryption algorithms, whether weaker or stronger, are provided, and no plug-ins are allowed. This means that one cannot browse AFS files across the Web, since no Web browsers (to the best of my knowledge) currently support the Kerberos encryption framework. On the other hand, this could be viewed as an advantage, since (unlike WebNFS) AFS does not default to lower levels of security under any circumstances.

AFS utilizes three system-defined protection groups:

AFS supports access control lists (ACLs). An ACL contains a list of the groups and users (up to 20 total) authorized to access a directory. ACLs work at the directory (not file) level. Subdirectories automatically get a copy of the parent's ACL upon creation, but it can be modified independently later.

There are a total of 7 access rights per directory: lookup, read, insert, write, delete, lock, and administer (change the ACL). Only administrators can execute the `chgrp` and `chown` commands. The traditional UNIX permission bits can be ignored for all practical purposes.
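The ACL rules described above (seven rights, at most 20 entries, directory-level scope, copy on subdirectory creation) can be modelled with a short sketch. The class DirectoryACL and its methods are hypothetical and do not correspond to any AFS command or API.

```python
RIGHTS = {"lookup", "read", "insert", "write", "delete", "lock", "administer"}
MAX_ACL_ENTRIES = 20


class DirectoryACL:
    def __init__(self, entries=None):
        self.entries = {}  # user or group name -> set of granted rights
        for name, rights in (entries or {}).items():
            self.grant(name, rights)

    def grant(self, name, rights):
        rights = set(rights)
        if rights - RIGHTS:
            raise ValueError(f"unknown rights: {sorted(rights - RIGHTS)}")
        if name not in self.entries and len(self.entries) >= MAX_ACL_ENTRIES:
            raise ValueError("an ACL holds at most 20 users and groups")
        self.entries[name] = rights

    def allows(self, user, groups, right):
        """True if the user, or any group the user belongs to, holds the right."""
        return any(right in self.entries.get(n, set()) for n in {user, *groups})

    def copy_for_subdirectory(self):
        """A new subdirectory starts with an independent copy of the parent's ACL."""
        return DirectoryACL({n: set(r) for n, r in self.entries.items()})


parent = DirectoryACL({"yury": RIGHTS, "staff": {"lookup", "read"}})
child = parent.copy_for_subdirectory()
child.grant("staff", {"lookup"})                # later change affects the child only
print(parent.allows("alice", ["staff"], "read"))  # True
print(child.allows("alice", ["staff"], "read"))   # False
```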

Summary:

Using AFS offers a number of benefits. It supports client caching and volume replication, which significantly improves AFS performance, especially under heavy network loads. Besides, it increases the independence from the data originated on the remote cells (or even other servers within the same cell), since these data can be cached or replicated locally.

Global naming convention and location independence allow the AFS server to support transparency when accessing data both locally and remotely.

High security standards, uniform across all AFS cells, make it possible to maintain a high level of security in any transaction, whether initiated locally or across the network.

Yet, there exists a number of notable disadvantages. Perhaps the greatest one is that AFS is NOT a UNIX filesystem, and is not compatible with any other filesystems. People who are used to administering UNIX file systems like NFS often find AFS installation and maintenance cumbersome and inconvenient.


CIFS

Origins:

CIFS, or Common Internet file system, is Microsoft's protocol that enables data and resource sharing and communication across both local and wide area networks. It is an extension of Windows native Server Message Block (SMB) protocol that allows file and printer sharing on local networks, optimized for the Internet.

CIFS is claimed to be a "platform independent" protocol. However, it is heavily adjusted to fit communication standards built into Microsoft Windows. This, however, is also true of WebNFS and AFS favoring UNIX-based standards. It is noteworthy that SMB has been successfully adopted and utilized across various platforms in the past, and there is a strong reason to believe that CIFS will be just as auspicious in this respect.

There are currently several different versions of CIFS protocol. Server and client can negotiate which particular version is to be used in their communication.

CIFS can be run over both TCP and UDP, although TCP has recently been prevailing on LANs, and is predominant for the Internet communication. In this respect, CIFS is similar to WebNFS and AFS, which also support both TCP and UDP as local network transports, but use TCP exclusively for communications across the Internet.

File and Printer Access:

CIFS supports all standard file operations: open, close, read, write, etc. Printers are treated just like files: they are opened and written to, causing a print job to be queued.

File and record locking is also supported by CIFS. Once a file has been locked by an application, it can not be accessed by non-locking applications.

Caching and Data Consistency:
CIFS supports caching, read-ahead and write-behind for all files, including unlocked ones. This scheme is used whenever only one client is accessing a file, or several clients are only reading from it. If several clients are accessing a file and one or more of them request writes, caching is considered unsafe. The server then notifies all clients of the unsafe state, and other, safer (non-caching) methods are adopted.
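The rule for when caching is safe can be written as a single predicate. This is a sketch of the decision as described above, not of the CIFS wire protocol; the function name and the (client, mode) tuples are assumptions.

```python
def caching_allowed(openers: list[tuple[str, str]]) -> bool:
    """openers holds (client, mode) pairs, where mode is "r" or "w".

    Caching is safe with a single client, or with several clients that only read.
    """
    clients = {client for client, _ in openers}
    any_writer = any(mode == "w" for _, mode in openers)
    return len(clients) <= 1 or not any_writer


print(caching_allowed([("a", "w")]))              # True: one client only
print(caching_allowed([("a", "r"), ("b", "r")]))  # True: readers only
print(caching_allowed([("a", "r"), ("b", "w")]))  # False: shared file with a writer
```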

Applications can register with the server to be notified whenever a file or directory is changed. Such updates help to avoid the problem of clients having to constantly poll the server in case they need consistent information.

It is clear that CIFS, just like AFS and unlike WebNFS, is a stateful file sharing protocol. Instead of simply processing requests as they come along, CIFS servers keep track of the state of each client. This determines which methods should be used and when, and allows for higher efficiency and security.

Extended Attributes:

CIFS allows non-file-system attributes, such as content description, author name, or expiration date, to be added to files. Extended attributes are optional and supplement standard file attributes like filename, length, creation time, etc. This feature somewhat resembles the MIME tags used by the HTTP protocol. CIFS is a pioneer in this respect, since neither WebNFS nor AFS supports this feature; both treat all files as binary data.

Directory Subtree Mounting and Replication:
CIFS supports mounting multiple servers and disk volumes as subtrees of the client's directory hierarchy, so that they appear to reside on the same server and volume. This is done in a fashion very similar to the way WebNFS and AFS mount remote servers. Changes in the physical location of the data caused by server reconfiguration are made transparent to the client (as long as the names remain consistent).

Just like AFS, CIFS supports subtree replication (which is not limited by discrete volumes, as it is done in AFS). Such replications are transparent to the client and help improve network load and fault tolerance (in case a remote file server decides to crash). However, obvious problems arise in trying to maintain consistency among replicated data. In general, it is dealt with similar to the way servers treat locally cached copies of single files.

Global Name Resolution:
The necessity to mount remote servers is reduced due to the possibility of using global file names. The syntax is the same as that for addressing local files. Consider the following example - a remote client wants to access file "yury.doc" residing in the "\home\ugrad\izrailev\" directory on server "labnt0.cs.utah.edu". There are two different ways the client could approach this task.

If it plans to access the server on a regular basis, it would be a good idea to add an entry to a table containing server names and file prefixes, say, mapping Z to "home" on "labnt0.cs.utah.edu". The call would then look as follows:

Z:\ugrad\izrailev\yury.doc

This is similar to the way AFS handles global name resolution, except instead of adding an index to a table, AFS would create a symbolic link.
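The table lookup behind the drive-letter form can be sketched as follows. The table layout and the function name resolve_drive_path are assumptions made for illustration; only the Z mapping itself comes from the example above.

```python
# Hypothetical prefix table: drive letter -> (server name, shared directory).
DRIVE_TABLE = {"Z": ("labnt0.cs.utah.edu", "home")}


def resolve_drive_path(path: str, table=DRIVE_TABLE) -> tuple[str, str]:
    """Translate a drive-letter path into (server, path on that server)."""
    drive, separator, rest = path.partition(":\\")
    if not separator or drive.upper() not in table:
        raise ValueError(f"no mapping for {path!r}")
    server, share = table[drive.upper()]
    return server, "\\" + share + "\\" + rest


server, remote_path = resolve_drive_path("Z:\\ugrad\\izrailev\\yury.doc")
print(server)       # labnt0.cs.utah.edu
print(remote_path)  # \home\ugrad\izrailev\yury.doc
```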

The URL format for remote file access is also available. For the example above, it would be the following:

file://labnt0.cs.utah.edu/home/ugrad/izrailev/yury.doc

Security and Authentication:

Each server makes a set of resources available for clients to access, which may include files, subdirectories, printers, etc. CIFS supports two different ways of controlling access to these resources - share level and user level.

Share level method assigns passwords to each particular resource. Several passwords for the same resource may denote different level of access privileges for that unit. Any client on the network who can identify server name, resource name, and password will be granted access.

The user level method, instead of assigning passwords to resources, keeps track of all user IDs and passwords. When requesting a resource, a client must identify the server, the resource, and its own ID and password. User level security offers a convenient way of maintaining and modifying a list of trusted clients without having to reassign a password to a resource each time this list has to change. It is also more convenient for keeping track of resource utilization statistics based on client IDs.

Therefore, user level mechanism is preferred to share level whenever possible. However, a number of servers and clients who use older versions of SMB do not support user level security. To maintain compatibility, a server has to default to whatever mechanism was requested by the client. This is a major security problem for CIFS, similar to that of WebNFS, which has to default to the level of security requested by the client, and is natural to all protocols that try to maintain backward compatibility. For AFS, on the other hand, this is not an issue since it deliberately makes itself incompatible with other protocols to maintain higher security levels.
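The two access-control models can be contrasted with a toy sketch. All table contents, names, and privilege labels below are invented for illustration, and passwords are kept in plain text only to keep the example short.

```python
# Share level: each resource carries its own passwords, one per privilege level.
SHARE_PASSWORDS = {("labnt0", "home"): {"readpw": "read", "fullpw": "full"}}

# User level: the server knows its users and what each may do with each resource.
USER_PASSWORDS = {"izrailev": "secret"}
USER_GRANTS = {("labnt0", "home"): {"izrailev": "full"}}


def share_level_access(server, resource, password):
    """Anyone who knows the resource password gets the matching privilege."""
    return SHARE_PASSWORDS.get((server, resource), {}).get(password)


def user_level_access(server, resource, user, password):
    """The user must authenticate first, then hold a grant on the resource."""
    if USER_PASSWORDS.get(user) != password:
        return None
    return USER_GRANTS.get((server, resource), {}).get(user)


print(share_level_access("labnt0", "home", "readpw"))              # read
print(user_level_access("labnt0", "home", "izrailev", "secret"))   # full
print(user_level_access("labnt0", "home", "izrailev", "wrong"))    # None
```

Revoking one client under the user level model means deleting one grant, whereas under the share level model the resource password must change for everyone.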

To encrypt data traveling across the net CIFS uses DES encryption protocol. To verify each other, both client and server use an 8-bit key humbly provided by Microsoft upon request (see CIFS Internet-draft, section 2.10.1).

Summary:
CIFS protocol has common features with both WebNFS and AFS, as well as a number of unique characteristics. Like AFS, it allows for data caching and replication, which is crucial under heavy network loads. The way it mounts data from remote servers into the local hierarchy is similar to that of WebNFS. Global naming resembles somewhat the AFS approach (although it can hardly be called location independent), while CIFS URL is a clone of WebNFS's URL notation.

High levels of security are not required when connecting to a CIFS server, which makes it just as vulnerable as WebNFS (or perhaps even more so). Both WebNFS and CIFS use DES encryption, while AFS utilizes a lot more secure Kerberos framework. Although WebNFS could potentially use Kerberos, it almost always defaults down to DES anyway.

Resource access control is handled differently in every system. CIFS uses either passwords for the resource (NOT a good idea), or user ID's and passwords. CIFS also provides access to printers by treating them as files. WebNFS uses traditional UNIX permission bits that heavily limit the combination of users and the types of access to that file. AFS uses ACLs to allow any combination of users (up to 20 groups total) and seven types of access privileges to each subdirectory (while WebNFS and CIFS do it on per-file basis). Yet, both CIFS and WebNFS maintain compatibility with their predecessors, something that AFS lacks.

All three systems use "standard" optimizations over the "veteran" Internet protocols HTTP and FTP. They all batch their requests, maintain a persistent TCP connection, and keep track of file offsets for the data that has already been transferred. While both WebNFS and AFS treat all files as binary data, CIFS appends extended file attributes. They may not be as explicit as MIME tags, but they are definitely better than plain file-system-type file attributes.

CIFS supports stateful servers in a manner similar to AFS, something WebNFS lacks. Both CIFS and AFS keep track of all the clients' cached and replicated data and (try to) maintain consistency among parallel copies.

All three protocols seem to have comparable levels of fault tolerance when it comes to resuming operations terminated by network problems. WebNFS uses a sliding window approach for batching requests, which seems to work rather well under changing network conditions. Local server crashes would halt both WebNFS and CIFS clients, while AFS clients could be switched transparently to a different file server in the same cell.

It is extremely difficult to draw any performance comparisons since very little information is available. AFS showed a 25% better performance than NFS 3.0 (WebNFS predecessor) based on Andrew benchmark, probably caused by the lack of disk caching on the part of NFS, but this could be easily compensated for by the optimizations that WebNFS introduced into connection setup and maintenance. It is hard to estimate performance without actual numbers, but based on the available information and my experience with NFS and SMB, WebNFS and CIFS should perform comparably.


Conclusion

As has hopefully been shown in this paper, each of the file sharing protocols presents certain advantages over the other ones, and is likely to be selected based on customer's particular needs. While AFS is suited better for data sharing protected by higher levels of security, WebNFS and CIFS simply extend already existent protocols to be more efficient for Internet-wide transactions.

The predominant distributed file system solution will emerge based upon many conditions, where the outcome of the current architectural and operating systems competition will play a decisive role. Nevertheless, I am positive that the combined design and trial experience of many different systems will help to identify the right approaches to solving the problem of widely distributed general-purpose computing.


Online References

WebNFS:

Introduction to WebNFS

Press Release

Auspex Press Release

Spyglass Press Release

White Paper

Short report on WebNFS

AFS:

AFS description and FAQ

AFS newsgroup

AFS Beginners Guide

CIFS:

CIFS Resource Center

CIFS product description

Internet draft on CIFS



izrailev@eng.utah.edu
