Directory Tree Organization and Access in Large-Scale Distributed File Systems

Background

The directory tree is a logical organization of files in a distributed file system; it is easy to understand and can be implemented efficiently. However, when the number of files is very large, it is difficult to divide the tree into sub-trees and manage them in a distributed manner without degrading performance. Several approaches exist for handling huge numbers of files, but none fully achieves the goal of making every directory operation atomic without performance degradation. The three state-of-the-art implementations are:

1. Statically split the directory tree into sub-trees and use a sub-tree routing table to locate the service responsible for each sub-tree. (HDFS federation)

2. Store the directory tree structure in a distributed database or BigTable, and use SQL for directory tree management. (Microsoft Data Lake Store)

3. Store the directory tree structure in a distributed key-value system, and manage the tree through key-value get and put operations. (Ceph)
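
To make approaches 1 and 3 more concrete, the following is a minimal sketch, not the design of any of the systems named above. The routing-table contents, the server names (`ns-0`, `ns-1`, `ns-2`), the `(parent inode, name)` key layout, and the `KVDirectoryTree` class are all assumptions chosen for illustration; an in-memory `dict` stands in for the distributed key-value system.

```python
from typing import Dict, Optional, Tuple

# Approach 1 (sketch): a static sub-tree routing table.
# Mount points and server names are hypothetical.
ROUTING_TABLE: Dict[str, str] = {
    "/": "ns-0",
    "/user": "ns-1",
    "/user/logs": "ns-2",
}


def route(path: str) -> str:
    """Return the server owning `path` via longest-prefix match on path components."""
    parts = [p for p in path.split("/") if p]
    # Try the full path first, then progressively shorter prefixes down to "/".
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in ROUTING_TABLE:
            return ROUTING_TABLE[prefix]
    raise KeyError(path)


# Approach 3 (sketch): the tree stored as key-value pairs.
# Assumed layout: key = (parent inode, entry name), value = child inode.
ROOT_INODE = 1


class KVDirectoryTree:
    def __init__(self) -> None:
        self._kv: Dict[Tuple[int, str], int] = {}  # stand-in for a distributed KV store
        self._next_inode = ROOT_INODE + 1

    def create(self, parent: int, name: str) -> int:
        """Put one dentry under `parent` and return the new inode number."""
        key = (parent, name)
        if key in self._kv:
            raise FileExistsError(name)
        inode = self._next_inode
        self._next_inode += 1
        self._kv[key] = inode
        return inode

    def resolve(self, path: str) -> Optional[int]:
        """Walk the path with one get per component; None if any component is missing."""
        inode = ROOT_INODE
        for name in (p for p in path.split("/") if p):
            child = self._kv.get((inode, name))
            if child is None:
                return None
            inode = child
        return inode


tree = KVDirectoryTree()
user_dir = tree.create(ROOT_INODE, "user")
file_inode = tree.create(user_dir, "a.txt")
```

In this sketch, a rename that moves an entry from `/user` to `/user/logs` would touch two different servers under the routing-table scheme, and path resolution in the key-value scheme costs one get per path component. Both effects illustrate why atomicity and low latency are hard to obtain together.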

 

Target

  • All directory operations are atomic.
  • The logical directory tree supports at least 1 trillion dentries (directory entries).
  • Throughput, measured in operations per second (OPS), scales linearly as more servers are added.
  • Every directory operation completes within milliseconds, except inherently bulk operations such as listing a large number of dentries.

Related Research Topics

  • File system interface with POSIX semantics.
  • Directory splitting and merging.
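
As one possible reading of "directory splitting and merging", the sketch below splits a single large directory into hash shards when it grows and merges shards when it shrinks. The `ShardedDirectory` class, the `split_at` threshold, and the use of CRC32 as the placement hash are all assumptions for illustration, and the full rehash on every split or merge is a simplification that a production system would likely avoid.

```python
import zlib
from typing import Dict, List, Optional


class ShardedDirectory:
    """One logical directory whose dentries are spread over hash shards."""

    def __init__(self, split_at: int = 4) -> None:
        self.split_at = split_at  # hypothetical average shard size that triggers a split
        self.shards: List[Dict[str, int]] = [{}]

    def _shard_of(self, name: str, count: int) -> int:
        # CRC32 is stable across processes, unlike Python's built-in hash().
        return zlib.crc32(name.encode("utf-8")) % count

    def _reshard(self, count: int) -> None:
        # Simplification: move every dentry to its shard under the new shard count.
        entries = [item for shard in self.shards for item in shard.items()]
        self.shards = [{} for _ in range(count)]
        for name, inode in entries:
            self.shards[self._shard_of(name, count)][name] = inode

    def __len__(self) -> int:
        return sum(len(shard) for shard in self.shards)

    def add(self, name: str, inode: int) -> None:
        self.shards[self._shard_of(name, len(self.shards))][name] = inode
        # Split: double the shard count once the average shard exceeds the threshold.
        if len(self) > self.split_at * len(self.shards):
            self._reshard(len(self.shards) * 2)

    def remove(self, name: str) -> None:
        self.shards[self._shard_of(name, len(self.shards))].pop(name, None)
        # Merge: halve the shard count once the halved layout would be at most half full.
        half = len(self.shards) // 2
        if half >= 1 and len(self) * 2 <= self.split_at * half:
            self._reshard(half)

    def lookup(self, name: str) -> Optional[int]:
        return self.shards[self._shard_of(name, len(self.shards))].get(name)


d = ShardedDirectory(split_at=4)
for i in range(5):
    d.add(f"file{i}", 100 + i)
shards_after_split = len(d.shards)
for i in range(3):
    d.remove(f"file{i}")
shards_after_merge = len(d.shards)
```

The merge threshold is deliberately lower than the split threshold so that a directory hovering around one size does not split and merge repeatedly.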
