Thursday, February 26, 2009

New MRC based on BabuDB

Since the beginning of this year, we have been working on a new MRC implementation. We decided to rewrite the MRC completely for the following reasons:

  • The first MRC implementation relies on an in-memory database based on Java TreeMaps, with checkpoints created by writing the TreeMaps to disk using Java's built-in serialization support. This approach severely limits the size of a volume database: once a volume grows beyond a couple of hundred thousand files and directories, the former MRC runs out of memory and crashes when it tries to create a database checkpoint.
  • There is considerable potential to improve the first MRC's performance. The index structures used to arrange metadata are not optimal: in the general case, several lookups are needed to retrieve the metadata of a single file, and retrieving directory contents requires one lookup per directory entry. A better arrangement of indices can greatly speed up metadata lookups.
  • The first MRC does not support consistent metadata snapshots at runtime. To create a snapshot of the MRC database, for example when checkpointing it, the first implementation blocks all incoming requests until snapshot creation has finished, which renders the system unusable during that time. The ability to create snapshots without interrupting operation will later be required for creating consistent file system snapshots and backups at runtime.
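
To illustrate the point about index arrangement, here is a minimal sketch of one possible layout. It is not the actual MRC or BabuDB schema. It assumes a sorted index whose key combines the parent directory's ID with the entry name, so that a file's metadata is found with one lookup and a directory listing becomes a single range scan. The class `DirectoryIndexSketch`, its `Key` type, and the string metadata values are hypothetical names chosen for this example; a plain `java.util.TreeMap` stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DirectoryIndexSketch {

    /** Hypothetical composite key: sorts by parent directory ID, then by entry name. */
    static final class Key implements Comparable<Key> {
        final long parentId;
        final String name;

        Key(long parentId, String name) {
            this.parentId = parentId;
            this.name = name;
        }

        @Override
        public int compareTo(Key other) {
            int byParent = Long.compare(parentId, other.parentId);
            return byParent != 0 ? byParent : name.compareTo(other.name);
        }
    }

    // Sorted index; the String value stands in for a file's metadata record.
    private final TreeMap<Key, String> index = new TreeMap<>();

    public void put(long parentId, String name, String metadata) {
        index.put(new Key(parentId, name), metadata);
    }

    /** A single lookup retrieves the metadata of one file. */
    public String stat(long parentId, String name) {
        return index.get(new Key(parentId, name));
    }

    /** A single range scan returns all entries of a directory, in name order. */
    public List<String> list(long parentId) {
        // The empty string sorts before every other name, so the half-open range
        // [(parentId, ""), (parentId + 1, "")) covers exactly this directory.
        // Assumes parentId < Long.MAX_VALUE.
        Map<Key, String> range =
                index.subMap(new Key(parentId, ""), new Key(parentId + 1, ""));
        List<String> names = new ArrayList<>();
        for (Key key : range.keySet()) {
            names.add(key.name);
        }
        return names;
    }

    public static void main(String[] args) {
        DirectoryIndexSketch idx = new DirectoryIndexSketch();
        idx.put(1, "b.txt", "meta-b");
        idx.put(1, "a.txt", "meta-a");
        idx.put(2, "c.txt", "meta-c");
        System.out.println(idx.list(1));          // prints [a.txt, b.txt]
        System.out.println(idx.stat(2, "c.txt")); // prints meta-c
    }
}
```

Because all entries of a directory are adjacent in key order, the cost of a listing is one seek plus a sequential scan rather than one lookup per entry.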

We have now completed the rewrite, with a database backend based on BabuDB. The new backend offers higher performance and supports larger databases than the previous one. In addition, the MRC's architecture has been completely revised, and most of the code has been rewritten from scratch.