Saturday, December 22, 2007

How about a cluster File System???

I was just speculating about what exactly I would need if I had to write a cluster file system. First, let us throw some light on the requirements. The file system needs to provide reliable storage, support multiple simultaneous reads and writes, deliver good performance and, most importantly, be fault tolerant.
A cluster-friendly file system basically needs a fine-grained and efficient Distributed Lock Manager (DLM), transport-level protocols that support range locking, namely NFSv4 and CIFS, and moreover a cluster protocol to manage operations across nodes. Usually this cluster protocol runs over high-speed interconnects like InfiniBand, Fibre Channel or Gigabit Ethernet (still slower) so as to enhance throughput.
A cluster is built for high performance computing, so if the IO throughput is not good, it serves no purpose to build one. The way I see it, a cluster is a bunch of machines working together under common software to gain higher throughput, presenting itself as a single entity. This fits perfectly with a collection of commodity computers; I am not talking about custom-built rack-mounted clusters.
Basically, IO throughput is enhanced by striping the data across all nodes in the cluster. This way we can do parallel reads/writes and read-aheads. Even though this increases the amount of metadata that needs to be maintained for a file, it improves throughput at a much larger scale. Most of the existing cluster file systems, including the Google File System and Lustre, implement similar techniques.
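To make the striping idea concrete, here is a minimal sketch of round-robin stripe placement. The stripe size, node count and function names are all made up for illustration; real cluster file systems keep this mapping in per-file metadata.

```python
# Hypothetical striping math: map a file offset to (node, stripe index,
# offset within the stripe) for round-robin striping across the cluster.
STRIPE_SIZE = 64 * 1024   # 64 KiB stripe unit (an assumption)
NUM_NODES = 4             # data nodes in the cluster (an assumption)

def locate(offset):
    """Return (node, stripe_number, offset_in_stripe) for a file offset."""
    stripe = offset // STRIPE_SIZE          # global stripe index
    node = stripe % NUM_NODES               # round-robin placement
    return node, stripe, offset % STRIPE_SIZE

# A 1 MiB sequential read touches stripes on every node, so all four
# nodes can serve parts of it in parallel.
nodes = {locate(off)[0] for off in range(0, 1024 * 1024, STRIPE_SIZE)}
print(sorted(nodes))
```

The payoff is exactly the parallelism described above: one logical read fans out into independent requests that every node can serve at once.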
A good lightweight Distributed Lock Manager would help minimize locking periods across files. Major file operations lock the parent directory, so fine-grained locking would help keep contention at the lowest possible level.
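The core of any lock manager is a compatibility check between lock modes. Here is a toy, single-node sketch (names invented, no queuing or distribution) of the shared/exclusive table a DLM maintains per resource:

```python
# Toy lock-table sketch (illustrative only): shared (read) locks are
# compatible with each other; an exclusive (write) lock conflicts with
# everything. A real DLM distributes this table and queues waiters.
SHARED, EXCLUSIVE = "shared", "exclusive"

class LockTable:
    def __init__(self):
        self.held = {}  # resource -> list of granted modes

    def try_lock(self, resource, mode):
        granted = self.held.setdefault(resource, [])
        if mode == SHARED and all(m == SHARED for m in granted):
            granted.append(mode)
            return True
        if mode == EXCLUSIVE and not granted:
            granted.append(mode)
            return True
        return False  # contention: caller must wait or retry

    def unlock(self, resource, mode):
        self.held[resource].remove(mode)

# Locking the file itself rather than its whole parent directory keeps
# contention fine-grained: two readers of the same file can proceed.
t = LockTable()
print(t.try_lock("/data/f1", SHARED))     # readers coexist
print(t.try_lock("/data/f1", SHARED))
print(t.try_lock("/data/f1", EXCLUSIVE))  # a writer must wait
```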
But making the cluster fault tolerant is one heck of a task. Say you are watching a movie served off the cluster: if the next frame is unavailable, the player will halt waiting for it. The catch is to get the data back within the limits of the application's timeout. If the backend is built with RAID, the application has to wait until the RAID rebuild is complete, and that takes so long that the application will time out for sure. So how do we solve this problem? There is no definite answer here. The only thing we can possibly do is take a top-down approach and build a framework that tolerates data loss. These strategies include RAID (to regenerate data after a loss), CRCs to detect corruption, etc. Still not foolproof :(
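The CRC half of that framework is cheap to illustrate. A sketch, using Python's standard `zlib.crc32` (the on-disk format and function names here are invented): the file system stores a checksum alongside each block and verifies it on every read, so silent corruption is at least detected even when it cannot yet be repaired.

```python
# Detecting silent corruption with a stored checksum. On a mismatch a
# real system would regenerate the block, e.g. from RAID parity.
import zlib

def write_block(data):
    return data, zlib.crc32(data)          # store data + its checksum

def read_block(data, stored_crc):
    if zlib.crc32(data) != stored_crc:
        raise IOError("corruption detected: regenerate from parity")
    return data

block, crc = write_block(b"movie frame 42")
assert read_block(block, crc) == b"movie frame 42"

corrupted = b"movie frame 43"              # a silent bit flip on disk
try:
    read_block(corrupted, crc)
except IOError as e:
    print(e)
```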
Managing data loss/outages is a tricky question and not completely answered. If the cluster is serving a data-intensive application with a dire need for uptime, keeping a full copy of the data would probably serve the purpose. This is hugely space-inefficient but would save you time for sure. The other option is going the RAID way: RAID is built to be space-efficient, but RAID rebuilds are really inefficient in practice.

Monday, October 29, 2007

Virtualization wave???

Server virtualization is on the rise. Virtual Machines (VMs) do hold their ground when compared to high-performing hardware. Just add some more main memory and some disk space to your existing machine and you can run a new instance of another operating system. How about running VMware on big, massive hardware like a Sun Niagara T2 or a rack of Intel quad-core CPUs?
How about an organization that wants to scale up its operations and needs bigger hardware to support it? It would probably go for a combination of both: buying fewer racks of hardware than needed and running VMs on them. Suppose it buys a few big n-core CPUs with support for virtualization. Fulfilling all your needs from one physical machine still gives rise to a single point of failure. Maybe spreading your VMs logically over a farm of servers would help: it eliminates the single point of failure while retaining all the benefits of a VM. The biggest benefits I see are power savings, backup, security and seamless upgrades. A VM running on one machine would not consume power beyond a limit. The host images could be backed up regularly. By nature, VMs are isolated and would not spread the damage to the host if contaminated by worms/viruses. And the most important of all, upgrades!!!!!
Upgrades seem to be an essential part of an organization's IT operations. To keep pace with ongoing hardware progress, an organization has to buy new hardware every few years. For hardware whose support has been discontinued, it is really hard to find a replacement, and data needs to be migrated manually, which is a cumbersome process. Here a VM wins hands down against a hardware machine. Moreover, I think if multiple VMs are running the same operating system image, it is quite possible to share disk space from a NAS-exported file system. Essentially, force all VMs running the same operating system to mount a NAS-exported file system and share a single image. I suppose a file system with a good distributed lock manager and support for snapshots would be able to support such a setup. Just imagine how much space we would be saving!! Moreover, this is somewhat like deduplication (in essence, not by implementation), where the common data is shared while each VM's changed data is kept separately and tracked with the help of the file system's snapshotting mechanism. What we have here is essentially a lethal combination!!! :)
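The "one shared image, per-VM changes" idea is essentially copy-on-write, and a tiny sketch makes the space saving obvious. Everything here (block layout, class names) is invented for illustration; a snapshotting file system would do the same bookkeeping at block level.

```python
# Sketch of sharing one OS image among VMs: every VM reads the common
# base blocks, and only its writes land in a private delta, much like
# a copy-on-write snapshot.
base_image = {0: b"kernel", 1: b"libc", 2: b"config"}  # shared, read-only

class VMDisk:
    def __init__(self, base):
        self.base = base      # shared by all VMs
        self.delta = {}       # only this VM's changed blocks

    def read(self, blk):
        # Prefer the private copy; fall back to the shared base block.
        return self.delta[blk] if blk in self.delta else self.base[blk]

    def write(self, blk, data):
        self.delta[blk] = data  # copy-on-write: the base stays untouched

vm1, vm2 = VMDisk(base_image), VMDisk(base_image)
vm1.write(2, b"vm1-config")
print(vm1.read(2))   # vm1 sees its private change
print(vm2.read(2))   # vm2 still sees the shared block
# Space used: one base image plus two tiny deltas, instead of two full images.
```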
There is built-in NFS support in VMware, but I don't know what other features they provide along with it. I guess both VMware and Xen support most of these facilities. VMware, owned by EMC, recently released 10% of its shares and the stock almost doubled within a few hours. XenSource, the company behind Xen, was also acquired by Citrix. I am wondering about the upcoming trends in virtualization!!!
Update: Oracle recently announced its own virtualization initiative based on Xen, and VMware's stock dropped on the news while Oracle's rose!!!

Thursday, August 16, 2007

Sun Niagara T2

Sun recently launched a new chip, the Niagara UltraSPARC T2. The T2 packs in tremendous processing power: it has 8 cores on die, each capable of holding 8 hardware threads at a time (64 threads total), and there is a dedicated FPU per core. Some features worth mentioning are a bigger L2 cache, an on-die 10 Gbps NIC, virtualization support, on-die PCI Express lanes, a crypto unit per core and 4 memory controllers. Since each core can hold 8 threads simultaneously, where other CPUs would make a context switch the T2 just switches hardware threads and saves time. Those features make the T2 a pretty obvious target for computing-power-hungry applications.
Sun has been pushing this chip to the market as a commodity part so that it can be used for general purposes: big data centers, financial services (which will benefit from the on-die crypto unit), web services, big telcos and proprietary solutions. It is quite evident that Sun's Microelectronics division has traded higher clock speed for lower power consumption. The T2 runs at a rather low frequency of 1.4 GHz while the competition is shipping processors running at 3-4 GHz. Clearly, Sun is pushing parallel computing as far as possible, since this approach keeps power consumption low with a low-frequency CPU while reaping all the benefits of parallelism.
As of now, this approach seems to be a winner. The T2 is safely ahead of the competition in terms of throughput, power consumption and efficiency, and it is currently leading the SPEC benchmarks. The interesting thing will be to see whether Sun can keep this lead: Intel and AMD are in the process of rolling out their quad-core CPUs shortly. Also, given such a low-frequency CPU, it would be interesting to see single-thread performance on the T2.
Sun will really have to push this chip to persuade customers to adopt this SPARC part instead of their usual x86 setups.
Sun is planning to introduce its next processor, named Rock, next year. It will be exciting to see what improvements Sun brings to the new processor over the T2. Sun also claims Solaris has been successfully run on the T2. Using Solaris along with ZFS on a T2 would be a killer combination as of now. I am eagerly waiting for the first official Sun launch of a storage box based on the T2.
P.S. A new MIT startup, Tilera, has announced its new chip, a 64-core processor with features similar to the US T2: on-chip memory controllers, Gbps NICs, PCIe lanes etc. It provides a mesh of cores, which serves as an alternative to a high-speed bus interconnect. Clock speeds are in the 600 MHz - 900 MHz range, and it usually takes one cycle to move data from one core to another. It would be nice to see a comparison benchmark of Tilera against the US T2.

Saturday, May 26, 2007

Storage Upcoming!!!

With ideas like thin provisioning taking shape, the storage market is buzzing with new ideas. And now comes pNFS (parallel NFS), which enables multiple NFS servers to come together as a single server and share a clustered file system.
NAS, though cheap, has fundamental limits in scaling and performance. Here pNFS chips in: it can reuse your commodity servers to form a cluster out of them and can really scale linearly. It almost works like a multipath file system where the regular NFS server is replaced by an NFSv4.1 Metadata Server (MDS). The MDS is moved out of the data path ("out of band"), and clients can now talk directly to the storage behind it.
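A rough sketch of that out-of-band flow, with every name, layout format and stripe size invented for illustration: the client asks the MDS only for a layout describing which data servers hold which stripes, then fetches the data directly, so no file data ever crosses the MDS.

```python
# Hypothetical pNFS-style read path: metadata from the MDS, data
# straight from the data servers.
STRIPE = 4  # bytes per stripe unit, tiny for the example

data_servers = {  # each server holds every Nth stripe of the file
    "ds0": b"AAAA" + b"CCCC",
    "ds1": b"BBBB" + b"DDDD",
}

def mds_get_layout(path):
    # Metadata-only answer: the MDS never touches file data.
    return {"stripe_size": STRIPE, "servers": ["ds0", "ds1"]}

def client_read(path, length):
    layout = mds_get_layout(path)
    servers = layout["servers"]
    out = b""
    for i in range(length // layout["stripe_size"]):
        ds = servers[i % len(servers)]              # round-robin server
        start = (i // len(servers)) * STRIPE        # offset on that server
        out += data_servers[ds][start:start + STRIPE]
    return out

print(client_read("/export/file", 16))
```

Because each stripe request goes to a different server, the reads can be issued in parallel, which is where the linear scaling comes from.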
The best part is that this effort has been standardized through the IETF and is on its way to becoming part of the NFSv4.1 draft, so storage vendors know which direction to drive their efforts. As of now only Panasas provides a pNFS solution, and chances are others will follow. It will be really interesting to watch the market over the next few months :)
Sun is likely to come up with the next implementation. For some good tutorials about pNFS, watch this space .... http://opensolaris.org/os/project/nfsv41/pnfsdemos/basics
An html version of the NFS 4.1 draft is available here
http://www.nfsv4-editor.org/draft-10/draft-ietf-nfsv4-minorversion1-10.html

Tuesday, May 8, 2007

You conscious?

I strictly believe one should not mix emotions with one's profession; at least a developer should not. One should be professional about one's work, not sentimental. Only then can you enjoy the perks of both.
But there is this certain sense of self that often stumbles upon you while working; call it being self-conscious. When you are working with an upset mind, you are very much aware that it is you working on something, which makes you nervous, and you are not able to give your best. You may take hours to complete things that would otherwise have been done in minutes. Basically you mess up everything you do.
It's quite obvious to get nervous if your boss yells at you or signals that he/she is not happy with you. It will definitely affect your work.
In my case, if I am upset, I take a break and have food :). I just spend a few minutes not thinking about what just happened and get back to work.
I think only experience can fix these things. If you are getting self-conscious, work more often; you will lose it eventually.

Bored???

Another software guy bored of his job? Don't know how to spend your time? Or, worst of all, are you on the bench? Don't worry, we have good news for you folks :). A recent Slashdot post talks about ideas that were born out of spare time: Web 2.0, Ajax and Linux kernel development, to mention a few.
Check this out - http://www.longtail.com/the_long_tail/2007/05/the_awesome_pow.html
So if you are suffering from the same problems, cheer up and try to work on a spare-time project. Chances are you will get better at whatever you do in software :) No harm done. Plus you can gain a little recognition in the web world.
Believe me, I am trying to work on a spare-time project myself. It's hard for sure :). But I won't quit. Let me know if you complete yours :)

System layering

Recently Andrew Morton, a lead Linux kernel developer, called ZFS a layering violation. And you can see a fantastic reply from Jeff Bonwick, the lead of the ZFS project, here -
http://blogs.sun.com/bonwick/entry/rampant_layering_violation
IMHO, even if ZFS violates the conventional layering stack, it will thrive as long as it performs better than other file systems. ZFS is the future of file systems and has introduced many path-breaking features which no other file system can claim as of now. A few mentions would be detecting silent data corruption, numerous cheap snapshots and a scalable infrastructure; in short, end-to-end data integrity, which is what matters at the end of the day.
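The trick behind that end-to-end integrity is worth sketching. In ZFS, each block pointer stores the checksum of the block it points to (a Merkle-tree-like arrangement), so corruption anywhere below the root is caught on read. This toy version uses SHA-256 and invented structures purely for illustration:

```python
# Sketch of checksums stored in the parent block pointer, the idea
# behind ZFS's detection of silent corruption.
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

# "Disk": the pointer carries the expected checksum of its target block.
blocks = {"blk0": b"file data"}
pointer = {"target": "blk0", "cksum": checksum(b"file data")}

def read_via_pointer(ptr):
    data = blocks[ptr["target"]]
    # Verify against the checksum stored one level up, not on the block
    # itself, so a block that rots along with its own checksum is still caught.
    if checksum(data) != ptr["cksum"]:
        raise IOError("silent corruption caught by the pointer's checksum")
    return data

assert read_via_pointer(pointer) == b"file data"
blocks["blk0"] = b"bit-rotted!"     # corruption the disk never reports
try:
    read_via_pointer(pointer)
except IOError as e:
    print(e)
```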
I might sound like another ZFS fan, but the truth is out there. ext2/ext3, apart from doing what they are supposed to do, are prone to silent corruption and are basically doing what FFS was doing years ago. Since the ZFS folks were brave enough to throw away years-old conventions, they have been able to aggregate control into a single entity, the file system, which can manage everything from RAID groups to backups to clones.
I personally believe Linux is the best thing to happen to computers in the last few years. But ZFS has leapfrogged almost everyone in file systems development. I think both OpenSolaris and Linux will thrive, and Linux might adopt the new features offered by ZFS.

Thursday, May 3, 2007

Hello world...

Hi!!! My name is Anand Vidwansa. This is my first post to the world. A typical beginning would be to tell you about myself, my hobbies, etc. etc. And I am not going to be a path-breaker here :)
I am just one of those software guys you find in bulk these days. My interests vary from music, bikes and hanging out to, finally, the damn computers. I am no different from any other software guy out there. Still, I want to tell you about myself. After all, I am a human being looking out for people with similar interests, and I guess computers would be the most common one.
So you would be seeing a lot of crap here about Music, philosophy, computers blah blah blah :)
My computer interests typically fall within storage, systems and *nix kernels. I have been working in the industry for the past two and a half years. I have primarily worked with file systems: VFS, physical file systems, quotas, archiving, NFS, CIFS and LVM.