Deciding when to forget in the Elephant file system
An interesting paper on an approach to version-controlled file systems, mentioned during an interview with Hans Reiser:
"Modern file systems associate the deletion of a file with the immediate release of storage, and file writes with the irrevocable change of file contents. We argue that this behavior is a relic of the past, when disk storage was a scarce resource. Today, large cheap disks make it possible for the file system to protect valuable data from accidental delete or overwrite.
This paper describes the design, implementation, and performance of the Elephant file system, which automatically retains all important versions of user files. Users name previous file versions by combining a traditional pathname with a time when the desired version of a file or directory existed. Storage in Elephant is managed by the system using file-grain user-specified retention policies."
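To make the idea of naming a version by pathname plus time concrete, here is a minimal Python sketch. It is not Elephant's implementation: the `VersionedStore` class, its method names, the integer timestamps, and the two policy names (`keep_all`, `keep_one`) are all assumptions invented for illustration. It only shows that writes and deletes can append to a history instead of destroying data, and that a per-file policy can decide how much of that history is retained.

```python
import bisect


class VersionedStore:
    """Toy in-memory store: writes append a version instead of overwriting.

    Hypothetical illustration only; names and policies are not from the paper.
    """

    _DELETED = object()  # tombstone marking "file did not exist from here on"

    def __init__(self):
        self._times = {}   # path -> sorted list of timestamps
        self._data = {}    # path -> contents, parallel to self._times[path]
        self._policy = {}  # path -> "keep_all" (default) or "keep_one"

    def set_policy(self, path, policy):
        if policy not in ("keep_all", "keep_one"):
            raise ValueError(f"unknown policy: {policy}")
        self._policy[path] = policy

    def _append(self, path, content, when):
        times = self._times.setdefault(path, [])
        data = self._data.setdefault(path, [])
        if times and when <= times[-1]:
            raise ValueError("versions must be added in increasing time order")
        if self._policy.get(path, "keep_all") == "keep_one":
            # Behave like a traditional file system: drop all history.
            times.clear()
            data.clear()
        times.append(when)
        data.append(content)

    def write(self, path, content, when):
        self._append(path, content, when)

    def delete(self, path, when):
        # Deleting records a tombstone; earlier versions stay readable.
        self._append(path, self._DELETED, when)

    def read(self, path, when=None):
        """Return the contents of `path` as of time `when` (latest if None)."""
        times = self._times.get(path, [])
        if not times:
            raise FileNotFoundError(path)
        # Index of the last version written at or before `when`.
        i = len(times) if when is None else bisect.bisect_right(times, when)
        if i == 0 or self._data[path][i - 1] is self._DELETED:
            raise FileNotFoundError(f"{path} did not exist at that time")
        return self._data[path][i - 1]


store = VersionedStore()
store.write("/notes.txt", "draft", when=10)
store.write("/notes.txt", "final", when=20)
store.delete("/notes.txt", when=30)
print(store.read("/notes.txt", when=15))  # the version that existed at time 15
```

In this sketch, reading `/notes.txt` at time 15 returns the draft even though the file was later overwritten and then deleted. A file marked `keep_one` loses its history on every write, which is the traditional behavior the abstract calls a relic.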
---
Comment from the Slashdot interview with Hans Reiser:
"I'm going back to school this fall, and in a year I hope to be admitted into a Masters of Computer Science program. I'd like my main research focus to be on filesystems.
I'm preparing by reading everything I can find: I'm working on Tanenbaum & Woodhull's "OS Design & Implementation"; I've read "Design and Implementation of the Second Extended Filesystem"; Steve Pate's "UNIX Filesystems" is waiting on my shelf; and of course, there's the FAQ and ReiserFS v.3 Whitepaper at www.namesys.com. Specific questions: what branches of math are useful in this line of research? Any books, articles, etc., that I haven't listed that are a 'must read' or 'should read'? Those who have succeeded in building a better filesystem: what have they done that I should also do? Any mistakes I should avoid? Anything that no one told you about filesystems that you wish you had known up front? And are there any special tricks (above and beyond mastering your subject) to getting hired in this field once a degree is in hand?
Hans:
I was never able to get hired in this field, so I am probably not the one to ask about how to get hired.;-) Hmmm. Oh I know one! Don't tell your potential employer that you are working on your own file system nights and weekends, and you will retain all rights to it, and you won't stop work on it once they hire you.;-)
You should probably read about Plan 9, and about namespaces generally. The literature on namespaces seems to be just about hierarchical namespaces, but the notion present in that literature that they should be unified is a good one. I rather liked Gerard Salton's book on automatic text processing. Ted Nelson's Xanadu project was interesting reading, and you'll want to read Codd and Date about databases. Mikhail Gilula's book about set theoretic databases is a good one.
In regards to math, study the design of new mathematical models. Study closure, and its importance to various models ranging from algebra to relational algebra. Understand why mathematical models were designed to have the structure they have rather than learning what those structures are, so that you can learn to construct your own models. I don't know of any courses that teach that, but it is what is important to learn.
Are you sure that it wouldn't be better to hang out in cafes and bookstores for 4 years, and at the end of it write some piece of a filesystem? Cafes, bookstores, and attending random seminars will educate you better, and writing some piece of a filesystem will employ you better."