Monday, August 20, 2007

OpenAFS, Acopia, and Panasas

AFS (in its incarnations as the Andrew File System, IBM AFS, and now OpenAFS) has been around for a long time. That longevity has brought it some distinct advantages: the user base is both broad and deep, and the product is stable. There is money to be made in the storage market, though, and OpenAFS has plenty of competitors. Witness the recent acquisition of Acopia by F5 Networks. Another vendor, Panasas, is clearly viewed as a promising business (e.g., take a look at their Board of Directors: venture capitalists would not be on the Board if they did not think the company would be profitable).

Those two companies, Acopia and Panasas, represent two different market segments that have historically been in the sweet spot of AFS usage. AFS is still strong in one of those areas, but it has lost ground in the other.

Acopia's claim to fame is virtualization: the ability to keep a namespace constant while the back ends are rearranged underneath it. They also do data migrations. They export via NFS or CIFS, so virtually any modern operating system can access data through their systems. This is very nice. The downside, though, is that Acopia is a hardware solution. Lori MacVittie's neat article about her personal NAS notwithstanding, using Acopia's ARX to provide seamless migration for a failed personal NAS just does not make fiscal sense.

AFS provides the same kind of virtualization, but at a different cost. First, no special hardware is needed. The cost comes in complexity: clients have to run the AFS client. Ports exist for many modern operating systems (from AIX to Windows), but installing clients is definitely more expensive than plugging a network-transparent NFS proxy into your network. The other cost is in administration: the learning curve for AFS is fairly steep. While efforts have been made to help people get started with AFS, there is still a lot of work to be done.

The two key features of AFS that provide this virtualization are the @sys magic (a path component that each client expands to its own platform name) and the separation of filesystems into volumes, with volume metadata managed by database servers. These key pieces let administrators glue namespaces together seamlessly. The stable semantics of volume migration also let administrators move data around a site even while users are accessing it, further insulating users from the details of the underlying storage infrastructure.
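To make those two mechanisms a little more concrete, here is a toy Python model, assuming a much simplified client: a path component spelled exactly `@sys` is replaced by the first entry in the client's list of system names that actually exists, and a dictionary of path prefixes stands in for volume mount points. `ToyVLDB`, `expand_at_sys`, `locate`, the cell `example.com`, and the server names are all hypothetical; the real cache manager and volume location server are considerably more involved. The sketch only illustrates the point above: moving a volume changes one database entry, not any path a user sees.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyVLDB:
    """Hypothetical stand-in for the AFS volume location database."""
    locations: dict[str, str] = field(default_factory=dict)  # volume name -> file server

    def server_for(self, volume: str) -> str:
        return self.locations[volume]

    def move_volume(self, volume: str, new_server: str) -> None:
        # Once the data has been copied, only this entry changes;
        # every path that reaches the volume stays exactly the same.
        self.locations[volume] = new_server


def expand_at_sys(path: str, sysnames: list[str], exists: Callable[[str], bool]) -> str:
    """Replace each component spelled exactly '@sys' with the first sysname that exists."""
    out: list[str] = []
    for comp in path.split("/"):
        if comp == "@sys":
            for name in sysnames:
                if exists("/".join(out + [name])):
                    comp = name
                    break
            else:  # no break: none of the sysnames matched
                raise FileNotFoundError("/".join(out + ["@sys"]))
        out.append(comp)
    return "/".join(out)


def locate(path: str, mounts: dict[str, str], vldb: ToyVLDB) -> str:
    """Find the file server for `path` via the longest matching mount point."""
    matches = [m for m in mounts if path == m or path.startswith(m + "/")]
    if not matches:
        raise FileNotFoundError(path)
    return vldb.server_for(mounts[max(matches, key=len)])


if __name__ == "__main__":
    existing = {"/afs/example.com/sw/amd64_linux26", "/afs/example.com/sw/i386_linux26"}
    path = expand_at_sys("/afs/example.com/sw/@sys/bin",
                         ["amd64_linux26", "i386_linux26"], existing.__contains__)
    mounts = {"/afs/example.com": "root.cell", "/afs/example.com/sw": "sw"}
    vldb = ToyVLDB({"root.cell": "fs1.example.com", "sw": "fs1.example.com"})
    print(path)                        # /afs/example.com/sw/amd64_linux26/bin
    print(locate(path, mounts, vldb))  # fs1.example.com
    vldb.move_volume("sw", "fs2.example.com")
    print(locate(path, mounts, vldb))  # fs2.example.com
```

The same user-visible path works on every platform that has a matching tree, and it keeps working after the volume behind it has moved to another server.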

Panasas, on the other hand, is a clear winner over AFS in its product niche: high-performance NFS. Like Lustre and several other products in the high-performance filesystem niche, Panasas achieves this by parallelizing remote filesystem accesses. AFS gets some performance benefit from its caching, but each file is still fetched from a single file server. AFS also doesn't really do NFS.
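The general idea behind parallel access can be sketched as RAID-0 style round-robin striping: a file is cut into fixed-size stripe units that are placed on servers in turn, so one large read becomes several smaller requests that different servers can serve at once. This is a generic illustration in the spirit of Lustre's default striping; it does not attempt to model Panasas's actual object layout or its redundancy, and the in-memory `bytearray` "servers" and the function names `stripe_map`, `write_striped`, and `parallel_read` are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def stripe_map(offset: int, length: int, stripe_unit: int, n_servers: int) -> list[tuple[int, int, int]]:
    """Split a file byte range into (server, object_offset, nbytes) pieces, round-robin."""
    pieces = []
    end = offset + length
    while offset < end:
        stripe_no = offset // stripe_unit      # which stripe unit of the file
        within = offset % stripe_unit          # position inside that unit
        server = stripe_no % n_servers         # units are dealt out to servers in turn
        obj_off = (stripe_no // n_servers) * stripe_unit + within  # offset in that server's object
        nbytes = min(stripe_unit - within, end - offset)
        pieces.append((server, obj_off, nbytes))
        offset += nbytes
    return pieces


def write_striped(data: bytes, stripe_unit: int, n_servers: int) -> list[bytearray]:
    """Lay a whole file out across n_servers in-memory 'objects' (one per server)."""
    objects = [bytearray() for _ in range(n_servers)]
    pos = 0
    for server, _obj_off, nbytes in stripe_map(0, len(data), stripe_unit, n_servers):
        # Writing the file front to back fills each object contiguously.
        objects[server].extend(data[pos:pos + nbytes])
        pos += nbytes
    return objects


def parallel_read(objects: list[bytearray], offset: int, length: int, stripe_unit: int) -> bytes:
    """Read a byte range by fetching every piece concurrently, then reassembling in order."""
    pieces = stripe_map(offset, length, stripe_unit, len(objects))

    def fetch(piece: tuple[int, int, int]) -> bytes:
        server, obj_off, nbytes = piece
        # Stand-in for a network request to one storage server.
        return bytes(objects[server][obj_off:obj_off + nbytes])

    with ThreadPoolExecutor(max_workers=len(objects)) as pool:
        parts = list(pool.map(fetch, pieces))  # map() yields results in input order
    return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 4              # a 1 KiB "file"
    objs = write_striped(data, stripe_unit=64, n_servers=3)
    print([len(o) for o in objs])             # [384, 320, 320]
    print(parallel_read(objs, 100, 500, 64) == data[100:600])  # True
```

The stripe unit is the main tuning knob: too small and per-request overhead dominates, too large and a modest read touches only one server, which is exactly the single-server situation described above for AFS.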

So who is buying Panasas? While I have no knowledge of their sales, I can make some fairly educated guesses: organizations with large data sets in NFS (or CIFS) that need greater performance. The large research organizations (high-energy physics labs, TeraGrid research groups, etc.) might be interested in Panasas, except that they already have Lustre, with support via ClusterFS. The most direct competitors to Panasas, then, are the NAS appliance vendors. It is interesting that most of the large research organizations have historically been heavy AFS users as well, and many still are. AFS is widely used for cross-site sharing of data, but it simply doesn't perform well enough to be competitive with NAS appliances running NFS.

My suggestion, then, to the OpenAFS community is to get serious about helping people get started with AFS, and to compete with the Acopias of the world. Also, look into improving AFS performance: since AFS is more complex than NFS, a raw performance comparison is unlikely ever to favor AFS; however, parallel filesystem access has been around for quite a while, and an implementation of it in AFS could be very interesting.