Purpose

At Cersat, we manage and use data from dozens of satellite mission archives every day. This requires a large storage capacity, since satellite datasets nowadays range from a few hundred gigabytes to hundreds of terabytes, and even petabytes for new missions.

Before 2011, most of our data archives were stored on a Long Term Archive using tape libraries (from Exabyte to LTO5), keeping only a small percentage of the data on disk for effective use. Tape is still a good solution to store data at a reasonable price for safety and long-term conservation, but the drawback is that data stored on such a facility was rarely used due to slow access.

For our new platform's storage layer, we wanted a single, large, and scalable online storage capacity. Here are the main features we required:

  • on-disk storage at low cost
  • a single filesystem, remotely accessible through NFS-like protocols and seen as one disk on the client side (illustrated in the sketch after this list)
  • horizontally scalable storage, from a few terabytes to petascale, simply by adding new storage nodes
  • horizontally scalable performance, to allow massive distributed processing
  • easy to manage
  • fault tolerant, with limited impact on users (e.g. when a storage node fails)
  • an open source filesystem running on Linux
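
To make the "one disk on the client side" point concrete, here is a minimal sketch (the mount point /mfs is a hypothetical example): standard calls report the aggregate capacity of all storage nodes as one ordinary local filesystem.

    import shutil

    # "/mfs" is a hypothetical client-side mount point for the
    # distributed filesystem; adjust to your own setup.
    total, used, free = shutil.disk_usage("/mfs")

    # The client sees a single filesystem whose capacity is the
    # aggregate of all storage nodes, growing as nodes are added.
    print(f"total: {total / 1e12:.1f} TB, free: {free / 1e12:.1f} TB")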

Experiments

In 2010-2011, we tried several distributed filesystem solutions: Hadoop DFS (HDFS), GlusterFS, Lustre, Ceph, MooseFS, and others.

Some of them were discarded outright as unsuitable for our needs:

  • HDFS: we could not break compatibility with our existing software, and HDFS requires dedicated client APIs to access the filesystem instead of standard POSIX calls (see the sketch after this list)
  • Lustre: far too complicated for us, as we have no dedicated team to manage the filesystem, and not fault tolerant enough
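
To illustrate the compatibility issue: our existing software reads files with plain POSIX calls, which work unchanged on any mounted filesystem, whereas HDFS requires going through its own client API. A minimal sketch, with a hypothetical file path:

    # Code like this works unchanged on a POSIX-mounted distributed
    # filesystem: no library or API change is needed on the client side.
    with open("/mfs/archive/mission/granule.nc", "rb") as f:  # hypothetical path
        header = f.read(1024)

    # On HDFS, the same read would have to go through a dedicated client
    # API (e.g. libhdfs bindings) instead of the plain built-in open().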

Others were tested in real conditions... not successfully:

  • GlusterFS: great in theory and in our first trials, when everything was working well. But the nightmare starts when a node fails: the situation is complicated to understand, there is no easy monitoring, and it takes hours of pain to get back to nominal. In practice, we had to do maintenance on it every week. A few things were also clearly unacceptable: a simple "ls" could take up to one minute, "find" was really, really slow, and even worse, it could return incomplete listings! Our conclusion was that it might be good for storing a few big system images, but not millions of satellite data files.

As a last resort, we tried a little-known solution: MooseFS. We installed it (easy), used it in real conditions (successfully), and it was the only filesystem still up and 100% nominal after weeks without any maintenance on our side. So we kept it!

Deployment

Since 2011, we have used the MooseFS distributed filesystem to store and serve data from satellite mission archives, up to petascale. Even if MooseFS is not perfect, it has allowed us to scale easily from 10 TB on 2 servers to more than 3 PB on a hundred servers. We use it every day for massive data processing as well as for scientists' daily work, as sketched below.
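
As an illustration of this daily use (the paths and file extension are hypothetical): processing jobs on any node simply mount the shared filesystem and traverse it with standard library calls, exactly as they would a local disk.

    import os

    # Hypothetical layout: one directory per satellite mission archive
    # under the shared mount point, traversed like any local disk.
    count = 0
    for root, dirs, files in os.walk("/mfs/archive"):
        for name in files:
            if name.endswith(".nc"):  # e.g. netCDF granules
                count += 1
    print(f"{count} data files found under /mfs/archive")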

TODO: link to the architecture and details of the MooseFS implementation

