Sunday, August 3, 2008

Scalable directories

While working on my current project, I felt the need of having big, scalable directories. The problem with directories in a traditional system is that they are not really scalable. The directory search is linear and it incurs a heavy penalty when the directory is big. There's gotta be some way to reach to the needed directory entry given a name. The traditional UNIX convention of directory being a regular file is just not good enough. Well, directory being a file, eases a few things for sure but it does not help in itself being the directory. We will brief few short comings of traditional directories.
1. Directory is a regular file.
2. Directory does not scale well.
3. Directory search is linear. Bigger the directory, bigger the performance hit.

This problem gets worse for distributed file systems. If number of clients are writing to the same directory, there would be a lot of IOs until the cache is warm. Of course, the size of cache does matter while dealing with large number of clients. So what we need is -
1. Do something extra for the directory.
2. Make the directory search efficient.
3. If the underlying file system is a distributed file system with large number of clients writing to the same directory, use range locking so as to have fine grained locking primitives to parallelize the directory access.
Few projects show indeed such initiatives are underway like this -
These things make sure that we are on the right track to scale directories.
A directory should have some kind of indexing which helps to locate the required data set in constant time. Insertion and deletion should be fast enough to accommodate multiple clients simultaneously. Directory deletion which includes deletion of all its contents should be faster.
Going further ahead, this topic leads to the relation between a file system and a database. Are they similar? Are they different?
IMHO, file system and databases are totally different. File system is more of a father figure for a database or for any application for that matter. A file system does much more than storing and retrieving data as compared to a database. Both of them try to efficiently store and retrieve data, both of them flaunt transaction logging mechanism and both of them try to be consistent all the time.
The solution could be to blend the best of both. And I would prefer blending database techniques in a file system. So a file system can gain from blending a relational database schemas with it. A simple example of this could be tagging. A file system can sport tags that are equivalent to a directory with respect to grouping things together. A file system can have database like transaction semantics (all or nothing) while doing updates. Essentially all of this would help to make the file system scalable as well.


Anoop Kunchukuttan said...

Post after a long time! good to hear from you :)

Anand said...

I am going to start blogging on a full throttle now on :)