Distributed File System - File Search over LAN - File Storage Management over LAN.



The concept is to develop a system that monitors and keeps track of files saved on different desktops on a Local Area Network and makes these files available in a reliable and robust fashion, with strong search capability across the hard drives of desktop computers connected via an Ethernet LAN.


The need arises from a practical difficulty faced by desktop users in the Microwave CAD lab. Since users generally work on multiple machines, they tend to save their files on whichever local machine they are using at the time. Later, when they need particular files, they are forced to search the machines they previously worked on, which is a wasteful and time-consuming exercise. We intend to put in place a system that alleviates these issues and, in the process, also saves storage space and makes retrieval of user files efficient.


The Microwave CAD laboratory is equipped with 24 desktop workstations, about 10 of which run the Linux operating system while the rest run Microsoft Windows. All the desktops are interconnected by an Ethernet LAN. The challenge is to develop a tool, preferably running on Linux, that can accomplish the following tasks:

  1. Monitor user activity.

The tool must continuously monitor file storage activity on all the desktop computers (users could be urged to save their files in a commonly shared folder on the network) and keep track of which file is saved on which hard drive. This would allow it to migrate frequently used files across hard drives to achieve reliability and robustness.
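The tracking step can be illustrated with a minimal Python sketch. It assumes each desktop exposes one shared folder that can be walked periodically; the names `FileRecord`, `scan_shared_folder` and `update_catalog`, and the idea of a catalog keyed by relative path, are hypothetical choices for illustration and not part of the proposal.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """One file observed on one host (hypothetical record layout)."""
    host: str
    path: str      # path relative to the shared folder
    size: int      # size in bytes
    mtime: float   # last modification time (seconds since epoch)


def scan_shared_folder(host, root):
    """Walk `root` and return one FileRecord per regular file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rel = os.path.relpath(full, root)
            records.append(FileRecord(host, rel, st.st_size, st.st_mtime))
    return records


def update_catalog(catalog, records):
    """Record which hosts hold a copy of each path.

    `catalog` maps relative path -> set of host names.
    """
    for rec in records:
        catalog.setdefault(rec.path, set()).add(rec.host)
    return catalog
```

Running `scan_shared_folder` on every desktop at intervals and merging the results with `update_catalog` would give a single view of which file lives on which machine. A real deployment would more likely react to file system change notifications than rescan whole folders.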

  2. Actively migrate frequently used files to increase retrieval ratio.

Since some desktops are heavily used (i.e. they are frequently turned ON), the reliability of file retrieval can be increased if frequently used files are stored on desktops that are generally ON. This allows users to save their files on any machine and be assured that the system can fetch them regardless of whether the original machine is currently powered on.
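One possible migration policy is sketched below in Python. It assumes the system periodically probes each desktop and logs whether it was up, and that it counts file accesses; the function names, the `min_accesses` threshold and the data layouts are illustrative assumptions, not a policy fixed by this proposal.

```python
def availability(samples):
    """Fraction of probes in which a host was found powered on."""
    return sum(samples) / len(samples) if samples else 0.0


def plan_migrations(catalog, access_counts, uptime_log, min_accesses=5):
    """Suggest copying frequently used files to the most available host.

    catalog:       path -> set of hosts currently holding the file
    access_counts: path -> number of recent accesses
    uptime_log:    host -> list of booleans (True = host was up at probe time)
    Returns a dict path -> target host.
    """
    if not uptime_log:
        return {}
    # Highest availability first; ties broken by host name for determinism.
    ranked = sorted(uptime_log, key=lambda h: (-availability(uptime_log[h]), h))
    best = ranked[0]
    plan = {}
    for path, hosts in catalog.items():
        frequently_used = access_counts.get(path, 0) >= min_accesses
        if frequently_used and best not in hosts:
            plan[path] = best
    return plan
```

With this policy a file is only moved when it is both in demand and missing from the machine that is most often ON, which keeps network traffic low at the cost of concentrating popular files on one host.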

  3. Provide strong search abilities to find files over the network.

Since users will want to access their files from whichever desktop they are currently working on, the system must provide reliable search abilities. This can be accomplished with tools like Google’s desktop search, suitably integrated for use in a distributed environment such as the present one.
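To make the search requirement concrete, here is a small Python sketch of a file name keyword search over a network-wide catalog. It does not use or imitate Google's desktop search; it is a stand-in that assumes a hypothetical `catalog` mapping each file path to the set of hosts holding it, and the function names are illustrative.

```python
import re


def tokenize(text):
    """Split a path or query into lowercase alphanumeric tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def build_index(catalog):
    """Build an inverted index: token -> set of paths containing it."""
    index = {}
    for path in catalog:
        for tok in tokenize(path):
            index.setdefault(tok, set()).add(path)
    return index


def search(index, catalog, query):
    """Return (path, hosts) pairs whose path contains every query token."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = None
    for term in terms:
        found = index.get(term, set())
        matches = found if matches is None else matches & found
    return sorted((p, sorted(catalog[p])) for p in matches)
```

A query returns each matching file together with the machines that hold it, so the user can fetch a copy from any host that is currently ON. Searching file contents rather than names would need a full-text indexer in place of `tokenize`.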

  4. Maintain redundancy to avoid data loss.

The system must also incorporate some form of redundancy mechanism so that user files are not destroyed by a hard drive crash. This could mean storing the most frequently used data on more than one hard drive to prevent data loss due to a hard drive crash or system failure.
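A redundancy policy of this kind could look like the following Python sketch. It assumes a target of two copies for frequently used files and a preference-ordered list of hosts (for example, ordered by free disk space); the `copies` and `min_accesses` parameters and all names are assumptions made for illustration.

```python
def replication_plan(catalog, access_counts, hosts, copies=2, min_accesses=5):
    """List the extra hosts each frequently used file should be copied to.

    catalog:       path -> set of hosts currently holding the file
    access_counts: path -> number of recent accesses
    hosts:         candidate hosts in order of preference
    Returns a dict path -> list of hosts that should receive a new copy.
    """
    plan = {}
    for path, holders in catalog.items():
        if access_counts.get(path, 0) < min_accesses:
            continue  # rarely used: not replicated under this policy
        missing = copies - len(holders)
        if missing <= 0:
            continue  # already stored on enough drives
        candidates = [h for h in hosts if h not in holders]
        if candidates:
            plan[path] = candidates[:missing]
    return plan
```

For example, a file accessed ten times that exists only on `pc03` would be scheduled for one additional copy on the first preferred host other than `pc03`, so that a single drive failure no longer destroys it.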


The proposed features must be implemented for a heterogeneous environment, and in this respect the mixed Linux and Windows environment of the lab presents a unique opportunity to develop a system with the above-mentioned features.


The basic tasks required for implementation would be as follows:

  1. Programming to implement the system.
  2. Implementing network file systems.
  3. Setting up Samba sharing to share files between the Linux and Windows environments.
  4. Developing algorithms and policies to introduce redundancy and availability of frequently used files.
  5. Integrating Google desktop search with the system.


As a critique of this system, one could argue that users could simply be encouraged to use a common shared storage area accessible from all machines. But this would provide neither search abilities nor any redundancy. Also, since the physical host for this shared space might be powered down, such a setup offers no guarantee that files are available when needed. As an advantage of this method, there could be a net saving of storage space, since users would not tend to make multiple copies of their data on different machines.


Conceived By:

Kshitij Sudan