Sunday, August 3, 2008

Scalable directories

While working on my current project, I felt the need for big, scalable directories. The problem with directories in a traditional system is that they are not really scalable: the directory search is linear, and it incurs a heavy penalty when the directory is big. There has to be a faster way of reaching the needed directory entry given a name. The traditional UNIX convention of a directory being a regular file is just not good enough. Treating a directory as a file certainly eases a few things, but it does not, by itself, help the directory do its job. Here is a brief list of the shortcomings of traditional directories.
1. Directory is a regular file.
2. Directory does not scale well.
3. Directory search is linear. The bigger the directory, the bigger the performance hit.

This problem gets worse for distributed file systems. If a number of clients are writing to the same directory, there will be a lot of IOs until the cache is warm. Of course, the size of the cache matters when dealing with a large number of clients. So what we need is:
1. Do something extra for the directory, i.e. maintain an index alongside the plain list of entries.
2. Make the directory search efficient.
3. If the underlying file system is a distributed file system with a large number of clients writing to the same directory, use range locking so that fine-grained locks let clients access the directory in parallel.
A few projects show that such initiatives are indeed underway, for example this one - http://www.youtube.com/watch?v=2N36SE2T48Q&e
These efforts suggest we are on the right track to scaling directories.
A directory should have some kind of indexing that helps locate the required entry in (near) constant time. Insertion and deletion should be fast enough to accommodate multiple clients working simultaneously. Deleting a directory, which includes deleting all of its contents, should also be fast.
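To make the "do something extra" idea concrete, here is a minimal sketch of a hash-indexed directory. This is a toy in-memory model in Python, not the on-disk layout of any real file system (ext3/ext4's htree and XFS's B+tree directories are production versions of the same idea); the class and the bucket count are assumptions made purely for illustration.

import hashlib

class HashedDirectory:
    """Toy model of a hash-indexed directory.

    Instead of one flat list that must be scanned end to end, entries are
    spread across buckets keyed by a hash of the name, so a lookup only
    scans one small bucket.
    """

    def __init__(self, nbuckets=256):
        self.nbuckets = nbuckets
        self.buckets = [[] for _ in range(nbuckets)]  # bucket -> [(name, inode), ...]

    def _bucket(self, name):
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest[:4], "little") % self.nbuckets

    def add(self, name, inode):
        self.buckets[self._bucket(name)].append((name, inode))

    def lookup(self, name):
        # Only one bucket is scanned, not the entire directory.
        for entry_name, inode in self.buckets[self._bucket(name)]:
            if entry_name == name:
                return inode
        return None

    def remove(self, name):
        bucket = self.buckets[self._bucket(name)]
        bucket[:] = [entry for entry in bucket if entry[0] != name]

d = HashedDirectory()
d.add("report.txt", 1001)
print(d.lookup("report.txt"))  # -> 1001

The same structure also hints at how the locking point above could work: a lock per bucket (or per hash range) would let many clients insert into the same directory in parallel instead of serializing on one big directory lock.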
Going further, this topic leads to the relationship between a file system and a database. Are they similar? Are they different?
IMHO, file systems and databases are totally different. A file system is more of a father figure for a database, or for any application for that matter; it does much more than storing and retrieving data, compared to a database. Still, both of them try to store and retrieve data efficiently, both of them flaunt a transaction logging mechanism, and both of them try to be consistent all the time.
The solution could be to blend the best of both, and I would prefer blending database techniques into the file system. A file system can gain from borrowing relational database ideas. A simple example of this could be tagging: a file system can sport tags that are equivalent to a directory with respect to grouping things together. A file system can also have database-like transaction semantics (all or nothing) while doing updates. Essentially, all of this would help make the file system scalable as well.
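As a rough illustration of the tagging idea, a tag is really just a secondary index from a label to a set of files, which is exactly the kind of structure a database would maintain. A tiny sketch (the inode numbers and helper names are hypothetical):

from collections import defaultdict

# tag -> set of inode numbers; effectively a secondary index over files,
# grouping them the way a directory would, but without a fixed hierarchy.
tag_index = defaultdict(set)

def tag_file(inode, tag):
    tag_index[tag].add(inode)

def files_with_tag(tag):
    return tag_index[tag]

tag_file(1001, "vacation-photos")
tag_file(1002, "vacation-photos")
print(files_with_tag("vacation-photos"))  # -> {1001, 1002}

A real implementation would have to persist this index and keep it transactionally consistent with the rest of the metadata, which is exactly where the all-or-nothing update semantics mentioned above come in.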

Thursday, July 17, 2008

UNIX privilege protection

Recently I have been developing a feature where I need to protect a file from all users, including root. Most kernels do this by restricting permissions and/or attaching special flags to the file, indicating that it should not be touched. While shaving this morning, I had a weird idea.
I want to protect a file, so restricting the permissions to root is the obvious thing to do here. But root can still modify the file. If I want to block the root user as well, what if I had another, internal root-like user? This would also be a root user, say a root cousin, which is invisible to the outside world. In this case, root can read the file but cannot modify it. I know this breaks the standard UNIX legacy that there is only one all-powerful god, and that is root. What if we shatter this? There would be more than one god, so would there be a clash of titans? No, the rule is that one god does not interfere with another god: the gods are read-only with respect to each other. In this case, it gives the impression that the root user is not able to modify a certain file. Since internally the file is owned by the root cousin, root is not able to change or modify it. So the change in behavior is quite noticeable, and I don't know if it would be acceptable.
Since this root cousin is invisible to the end user, he/she cannot inherit the root cousin's privileges. And the root user cannot change or modify the files that need to be protected.
This is a very simple idea and might have occurred to a lot of people. The moment it struck me, I felt like noting it down somewhere, and that's why I am posting it here. I still need to figure out how to implement this. Will update about it soon.
A few basics on how to implement this. The root cousin needs a dedicated uid and gid, at least on UNIX. This uid and gid cannot be used by the end user; the kernel uses these identifiers for its own protection, so there is no way an end user can make a file owned by the root cousin. Only the kernel can use the dedicated uid and gid to protect some files even from root. It is also quite obvious that we should not use these root cousin privileges everywhere; their use should be kept minimal. Files that we don't want even root to change or modify should be owned by the root cousin. While displaying file properties (the ls command), should we display the dedicated uid and gid of the root cousin, or should we show root's uid and gid instead? If we show root's uid and gid, the end user will still have the illusion that these files are owned by root but cannot be changed or modified. Great!! This goes well with the UNIX methodology. Lots of questions are popping into my head. Will update again as I have more answers.
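Just to pin down the logic, here is a rough model of the permission check I have in mind. The reserved uid value, the constants and the function are all made up for illustration; this is not how any real kernel implements its access checks.

# Hypothetical reserved identity; the idea is that the kernel reserves
# this uid and never hands it out via chown, setuid or /etc/passwd.
ROOT_UID = 0
ROOT_COUSIN_UID = 1  # made-up value, never exposed to user space

MAY_READ = 0x1
MAY_WRITE = 0x2

def permission(requester_uid, file_owner_uid, mask):
    """Return True if the requested access should be allowed."""
    if file_owner_uid == ROOT_COUSIN_UID:
        # Files owned by the root cousin: everyone, including root,
        # gets read-only access.
        return mask == MAY_READ
    if requester_uid == ROOT_UID:
        # Normal root semantics for everything else.
        return True
    # ... the usual mode-bit checks for ordinary users would go here ...
    return False

print(permission(ROOT_UID, ROOT_COUSIN_UID, MAY_WRITE))  # False: even root can't write
print(permission(ROOT_UID, ROOT_COUSIN_UID, MAY_READ))   # True: root can still read

Whether ls should then report such a file as owned by root or by the root cousin is purely a presentation decision on top of this check.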

Thursday, July 10, 2008

This is not easy


Today we had the pleasure of listening to a talk by a reputed personality, Dr Deepak Phatak from IIT-Bombay, at our company. For those who don't know, Dr Phatak is a respected name in Computer Science in India and abroad. He is known for his contributions to Databases, Information Systems and Software Engineering. Today's topic was Innovation. In precise words, Dr Phatak spoke about the basics of living life. A few key points of his talk are summarized here. The things mentioned below are as per my perspective; if I have misinterpreted Dr Phatak somehow, that is entirely my fault. Please step up to correct things if so.

1. Always keep learning : Since early schooling, students are groomed to mug up the study material and throw it up in the exams. In this way, we kill the creativity in ourselves. Another important point he raised is to ask questions. Even I have observed that most of us are scared to ask a question in public, or, more to the point, scared to question something that has been going on for ages. Dr Phatak strongly recommended questioning orthodox, conventional methods; even if the convention turns out to be right, questioning it will satisfy your conscience. Statistics do not always describe a person completely. Each person is unique in his or her own way. What matters is his/her line of thinking and not his/her statistics.
2. Don't kill the child (curiosity) in you : A child is always curious about everything and eager to know why certain things are the way they are. Dr Phatak suggested that we keep that curiosity alive in ourselves. This curiosity will help us find answers to the questions we face, and sometimes it helps us find better answers, the kind that can be categorized as innovation. So we need to make a habit of being curious so that at least we can answer our own questions. In my opinion, every person has a fear of the unknown, and this curiosity will definitely help in overcoming that fear. Another point Dr Phatak raised is not to compartmentalize knowledge. Most of us (I do the same) try to categorize problems by their domains: automobiles belong to mechanical engineering, operating systems belong to computer science, and so forth. This has a negative impact on our learning approach. Dr Phatak encouraged us to treat everything the same and not categorize it by domain. This will help us learn about everything around us and not only our own domains.
3. Think different : According to Dr Phatak, most individuals tend to behave like ox-cart bullocks in the sense that they only know how to walk a known path. They do not dare to venture onto an unknown path: given a certain problem, this is the way to solve it, and that is their only approach. Dr Phatak strongly encouraged us to align our thinking along different lines. Conventional wisdom is not always right, or rather not always appropriate, for solving certain problems. People who go out of their way, those who think differently, can make things happen. This does not apply only to Computer Science but to each and every field.
4. Passion for work : In order to innovate, one must have passion; it is the driving factor that makes your thing work. Without passion, there would be no energy and no enthusiasm in your venture. Only passion can keep your venture alive in rough times. In short, without passion, your venture will be like a body with no soul.
5. Persistence and commitment : Once a venture is started, persistence and commitment are the other qualities that can lead it to completion. Like they say, starting a venture is very easy, but grooming it and maintaining it is very difficult. Hard work is definitely a part of it, but only persistence and commitment can carry your venture through.

After listening to Dr Phatak's speech, I realized that he has pointed out the very basic things that we have conveniently forgotten. As for myself, by this time I know what my flaws and weak points are (not all of them), and I am trying to overcome them. But it has been very difficult so far. With the points Dr Phatak mentioned, I think I need to go back to basics. If you build a strong foundation, the structure will thrive.

Saturday, May 31, 2008

Deduplication anyone?

Deduplication is one of the hot topics in the storage world. With tons of vendors offering dedup products and biggies like Netapp offering dedup solutions integrated with their NAS products, the competition is fierce. But how does deduplication help when the consumer is actually trying to keep redundant data in order to facilitate disaster recovery?
Effectively, dedup does the opposite of what RAID, replication and snapshots do. It is not exactly how it sounds, though: dedup essentially takes a whole different approach in order to save disk space on a file system. The granularity of dedup could be a file or a file system block. If done at the file level, it would dedup less data, since two files are rarely entirely identical. Blocks, however, are identical far more often, so block-level dedup would definitely save more space. We will discuss block-based dedup here.
- Dedup calculates a kind of identity signature for each block on the file system and stores it in a database. Blocks containing the same data generate the same signature and can therefore be detected as duplicates of an existing block. A cryptographic hash algorithm like MD5 or SHA1 can be used to generate this signature (a small sketch follows this list).
- How to store these signatures, that is, the layout of the database storing them, is highly platform dependent. The main requirement of this database is to give a list of blocks generating the same signature (having the same data), something like a hash bucket storing all elements that generate the same hash value.
- Another important requirement is that dedup should work while the file system is online; taking the file system offline is not an option. Hence, when a write comes to a block whose signature is already stored in the database, the signature needs to be regenerated to keep up with the latest data. This definitely needs a trap in the IO path, but it should have minimal impact on IO performance.
- Please keep in mind that dedup only deduplicates the data blocks and not the metadata. Metadata is duplicated on purpose and should not be touched.
- So the very first time dedup is started, it generates signatures for all the data blocks in the file system. Once this pass is finished, we have all the information in the database, and traversing the database gives us the lists of blocks bearing the same data.
- For each such list, only one copy is kept while the other blocks are freed, and the metadata of the freed blocks is updated to point to the remaining copy.
- One side effect of deduplication is that the next time a write comes to some block, we need to know whether that block is sharing data with some other block. If it is, we need to do a copy-on-write: allocate a new block, write the data to the new block and update the metadata to point at it. This way, writes might have to bear a read penalty.
- For file systems which already have copy-on-write in the IO path, like WAFL and ZFS, this is not a problem. Other file systems would have to bear this penalty.
- Getting the database in core is another problem. Either it is implemented as a cache, or it will occupy a lot of memory. This is very implementation specific.
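To make the first pass concrete, here is a minimal sketch of block-level signature collection and duplicate detection. The block size, the file path and the in-memory dict standing in for the signature database are all assumptions for illustration; a real dedup engine keeps its fingerprint database on disk and hooks into the IO path as described above.

import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed file system block size

def block_signatures(path):
    """First pass: yield (block_number, signature) for each block of a file."""
    with open(path, "rb") as f:
        blkno = 0
        while True:
            data = f.read(BLOCK_SIZE)
            if not data:
                break
            yield blkno, hashlib.sha1(data).hexdigest()
            blkno += 1

def find_duplicates(path):
    """Build the signature database (signature -> list of block numbers).

    Any signature mapped to more than one block marks a set of duplicates;
    a real dedup engine would keep one copy, remap the rest to it, and do
    copy-on-write if any of those shared blocks is later rewritten.
    """
    db = defaultdict(list)
    for blkno, sig in block_signatures(path):
        db[sig].append(blkno)
    return {sig: blocks for sig, blocks in db.items() if len(blocks) > 1}

# Example usage (the path is hypothetical):
# for sig, blocks in find_duplicates("/tmp/volume.img").items():
#     print(sig, "is shared by blocks", blocks)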
Any more thoughts?
Update: Curtis Preston explains this in a very simple manner. Have a look at this - http://www.backupcentral.com/content/view/175/47/

Monday, February 25, 2008

What would you prefer?

Was just thinking about what a person would choose when he/she has to pick between two kinds of systems. The choices are
- a cleanly designed, sturdy, effective system, but with limited functionality,
- a system clogged with features but not that effective (bits and pieces sewn together).
Let's call the first one system1 and the second one system2.
The answer clearly depends on a number of factors, and the needs of consumers are the driving factor. So even if most of us would favor the cleanly designed system, in practical life system2 might turn out to be more useful or effective. I would favor the clean design as well. The ideology is to build a good foundation and use the outputs of that foundation to develop new things, the same way a building is constructed or code is (or should be) written, because such systems are easier to maintain, easy to reuse and sturdy. On the other hand, system2 is usually not good as a standalone system, nor can it scale very well. It can lead to unused code and resources and clog up your workspace. It can make your system heavy and verbose, and it might lose the very cause of its own existence.
Even after starting with a clean design, keeping it clean is the toughest job. A focused team of engineers can do the job better than a thousand people, be it in software or any other field. In short, keeping the integrity of a project while dealing with a large number of people becomes really hard; it takes hard work, enthusiasm and an innovative mind to keep it up. That's why startups succeed and are able to do things that would take big corporations a lot of time and a lot of money.