Clusterfile

Parallel Distributed File System

Summary

Clusterfile is a parallel file system for clusters of computers. In 2002, we focused on broadening its areas of application. The goal of our earlier design was to exploit the internal parallelism of applications efficiently. Internal parallelism is created by the I/O accesses of multiple processes of the same application. External parallelism, on the other hand, arises from simultaneous accesses by different applications. Our extensions address external parallelism by introducing system-wide optimizations in addition to application-specific ones.

We implemented the file system partly at user level and partly in the Linux kernel. Based on a kernel module supporting the VFS (Virtual File System) interface, Clusterfile may be mounted in the local directory tree of any cluster node. Metadata is managed by the kernel module in cooperation with a central manager instance. We introduced collective I/O operations to optimize simultaneous access from many nodes to the same file. Furthermore, we implemented an MPI I/O interface for Clusterfile, which we are currently comparing with other MPI I/O implementations.
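The benefit of collective I/O can be illustrated with a minimal sketch. It is not Clusterfile's actual implementation; the function name `coalesce` and the request layout are assumptions chosen for illustration. The idea shown is that many small, interleaved requests issued by different processes to the same file can be merged into a few large contiguous accesses before they reach the disks.

```python
from typing import List, Tuple

# Hypothetical request representation: (file offset, length in bytes).
Request = Tuple[int, int]


def coalesce(requests: List[Request]) -> List[Request]:
    """Merge overlapping or adjacent byte ranges into contiguous ranges."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if length <= 0:
            continue  # ignore empty requests
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The range touches or overlaps the previous one: extend it.
            last_offset, last_length = merged[-1]
            end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Four processes each access interleaved 4-byte blocks of the same file.
per_process = {
    rank: [(rank * 4 + stride * 16, 4) for stride in range(2)]
    for rank in range(4)
}
all_requests = [req for reqs in per_process.values() for req in reqs]
combined = coalesce(all_requests)
print(combined)  # a single contiguous range: [(0, 32)]
```

In this example eight small requests collapse into one 32-byte access. A real collective operation would additionally have to exchange the data between the participating nodes, which the sketch omits.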

In the future, we plan to increase application performance and scalability by introducing cooperative caching. This extension will be developed in cooperation with the subproject Scalable Servers on Cluster Computers. Furthermore, we will improve the scalability of metadata management through decentralization. We are currently studying various policies, such as distributing or replicating metadata across the cluster nodes.
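The difference between the two policy families can be sketched as follows. This is purely illustrative and makes no claim about the policies under study: the function names `owner_node` and `replica_nodes`, the node names, and the use of hashing are all assumptions. Under distribution, each file's metadata lives on exactly one node; under full replication, every node holds a copy.

```python
import hashlib
from typing import List


def owner_node(path: str, nodes: List[str]) -> str:
    """Distribution (assumed policy): hash the path to pick a single owner."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def replica_nodes(path: str, nodes: List[str]) -> List[str]:
    """Full replication (assumed policy): every node stores the metadata."""
    return list(nodes)


# Hypothetical cluster and file names.
cluster = ["node0", "node1", "node2", "node3"]
for path in ["/data/a.dat", "/data/b.dat"]:
    print(path, "owner:", owner_node(path, cluster),
          "replicas:", replica_nodes(path, cluster))
```

Distribution spreads lookup and update load but requires a remote request whenever the owner is another node. Replication makes lookups local but requires every update to reach all copies.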