Introducing the selfstorage software project


Disclaimer: this is an idea, not actual software.

I am convinced self-hosting needs self-healing distributed storage (such as Ceph) to be resilient. Nobody wants to self-host a Nextcloud instance only to see it disappear because the disk fails. Although a rigorous backup discipline is a good mitigation, such discipline is rare in practice and data loss eventually happens. Centralized services are generally good at doing whatever is necessary to never lose any data, and the same is expected from self-hosted services.

It is however very rare for a self-hosted service to rely on self-healing distributed storage. In my opinion, there are two reasons:

  • It is complex to diagnose and fix problems: once installed, the storage cluster will run unattended for months, if not years. A disk or a machine will fail and it will keep running. But eventually a problem will happen: it can be a bug or a human error. The cluster will end up in a state where data cannot be accessed, or it will stop working altogether. The self-hosting administrator then faces a problem in software they do not fully understand, cannot afford to buy support from a company, and may be contemplating data loss as a result. And they will wonder why they did not just rely on backups instead.
  • Data is in a single physical location: even the most advanced self-hosted installations rarely use machines distributed in multiple physical locations. If anything happens at this location, everything is lost and using self-healing distributed storage won’t help.

These are difficult problems to solve but nobody seems to be working on them at the moment. So I’m going to give it a try, although there is very little chance of success. I will start by writing a storage system from scratch. It is not the easiest path forward but this is something I’ve wanted to do for years and it’s worth a shot. It is likely that I’ll fail to create anything useful but it will stop haunting me.

The building blocks I’m going to use are:

To be useful the selfstorage needs to be:

  • reliable (not too many bugs and resilient to environmental changes such as disk failure or network failure)
  • efficient (I/O must be the bottleneck, not the CPU)
  • simpler than Ceph when problems need to be understood and fixed
  • able to share raw storage between independent selfstorage clusters, i.e. a self-hosted provider can be paired with other trusted self-hosted providers to keep part of the data and be able to recover from a total loss of their machines
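
To make the last requirement a bit more concrete, here is a minimal Python sketch of how a chunk could be placed both on local disks and on a trusted peer cluster. Nothing here is a design decision: the names `place_chunk` and `recoverable`, the default copy counts, and the use of rendezvous hashing to pick targets are all assumptions made for illustration only.

```python
import hashlib


def _score(chunk_id: str, target: str) -> int:
    """Deterministic pseudo-random score for a (chunk, target) pair."""
    digest = hashlib.sha256(f"{chunk_id}:{target}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick(chunk_id: str, targets: list[str], count: int) -> list[str]:
    """Rendezvous hashing: keep the `count` targets with the highest score."""
    ranked = sorted(targets, key=lambda t: _score(chunk_id, t), reverse=True)
    return ranked[:count]


def place_chunk(chunk_id, local_disks, peer_clusters,
                local_copies=2, remote_copies=1):
    """Hypothetical placement: some copies locally, some on trusted peers."""
    return {
        "local": pick(chunk_id, local_disks, local_copies),
        "remote": pick(chunk_id, peer_clusters, remote_copies),
    }


def recoverable(placement, failed_disks=frozenset(), local_site_lost=False):
    """A chunk is recoverable if at least one copy is still reachable."""
    if not local_site_lost:
        if any(d not in failed_disks for d in placement["local"]):
            return True
    # Either the whole site is gone or every local copy failed:
    # only copies held by peer clusters can help.
    return len(placement["remote"]) > 0


placement = place_chunk(
    "chunk-0001",
    local_disks=["disk-a", "disk-b", "disk-c"],
    peer_clusters=["peer-x", "peer-y"],
)
print(placement)
print(recoverable(placement, local_site_lost=True))
```

With at least one copy on a peer, the chunk survives the loss of the whole local site. With no peers, the same call reports the chunk as lost, which is the single-location problem described above.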

I’m not going to work on this full time: it will take a while before it works. It may be shorter if I just give up :wink:

You should look into the TiKV project. It leverages the Raft algorithm for consensus, and RocksDB for storage.

The TiKV project is very interesting. It is amazing how popular it became in such a short period of time. That certainly gives me pause, thanks for the reference. The only item from my checklist that seems to be missing is the focus on self-hosting, but maybe that can be added.