Problems downloading large (>1GB) files in Nextcloud

Hello,

I’ve experienced transient problems downloading large (1.3GB and 2.9GB) files stored in Nextcloud via the web interface, with the following error in the nginx reverse proxy error log (/var/log/nginx/error.log):

2021/01/04 09:02:05 [error] 28316#28316: *5384 upstream prematurely closed connection while reading upstream, client: 10.63.0.1, server: nextcloud, request: "GET /remote.php/webdav/MVI_6988.MP4?downloadStartSecret=bv26g HTTP/1.1", upstream: "http://127.0.0.1:4300/remote.php/webdav/MVI_6988.MP4?downloadStartSecret=bv26gd", host: "nextcloud"
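
For reference, this is roughly what I assume the relevant block of the reverse proxy configuration looks like; the port matches the upstream in the error log above, and the timeout and buffering values shown are the nginx defaults, not necessarily what is actually deployed:

  location / {
      proxy_pass http://127.0.0.1:4300;    # the upstream seen in the error log
      proxy_read_timeout 60s;              # default: give up if the upstream is idle for more than 60s
      proxy_send_timeout 60s;              # default
      proxy_buffering on;                  # default: buffer the response in memory, then in temporary files
      proxy_max_temp_file_size 1024m;      # default: cap on the temporary file used for buffering
  }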

The Nextcloud client has similar problems.

In the logs of the nginx front end for php-fpm inside the Nextcloud web container (docker-compose logs web, run from /srv/nextcloud/apache) there are no errors, only this warning:

2021/01/04 10:12:36 [warn] 30#30: *340 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/3/00/0000000003 while reading upstream, client: 172.19.0.1, server: , request: "GET /remote.php/webdav/MVI_6988.MP4?downloadStartSecret=1owxn HTTP/1.1", upstream: "fastcgi://172.19.0.4:9000", host: "nextcloud"
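
This warning is about response buffering: the nginx inside the web container keeps a few small in-memory buffers for the FastCGI response and spills the rest to a temporary file. A sketch of the defaults involved (I have not checked what the image actually ships with):

  # inside the PHP location block of the web container nginx configuration
  fastcgi_buffers 8 4k;                  # default: eight one-page buffers (4k or 8k depending on the platform)
  fastcgi_buffer_size 4k;                # default: buffer for the first part of the response
  fastcgi_max_temp_file_size 1024m;      # default: spill at most 1GB to a temporary file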

I also remember seeing 200 and 206 requests handled correctly, but with payloads no greater than 1GB.

There are no error messages in the /var/www/html/data/nextcloud.log file.

In an attempt to debug, I rebuilt the web image with docker-compose build web and ran docker-compose up -d (after changing one minor parameter). However… I can no longer reproduce the problem: something fixed itself, presumably because the container was rebuilt, but I have no clue whether that is actually the reason.

I found a seemingly related issue in the Nextcloud forums, but I’m not convinced it helps: the nginx documentation does not suggest it is such a good idea, and I doubt communications were ever interrupted for more than one minute.
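
If the suggestion in that issue was to raise the upstream timeouts, it would amount to something like the following in the reverse proxy configuration; the defaults are already 60 seconds and I doubt the transfer ever stalled for that long:

  proxy_connect_timeout 600s;
  proxy_send_timeout 600s;
  proxy_read_timeout 600s;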

To be continued!

Most downloads of a 2.9GB file fail, but not all. There is nothing in the error logs: why they sometimes abort is unclear.

  • abort at 1095447160
  • abort at 1094397560
  • complete with 3167423309
  • abort at 1096768432
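
For what it is worth, the abort offsets (in bytes) all sit a little past 1GiB (1073741824 bytes):

  1095447160 - 1073741824 ≈ 20.7 MiB past 1GiB
  1094397560 - 1073741824 ≈ 19.7 MiB past 1GiB
  1096768432 - 1073741824 ≈ 22.0 MiB past 1GiB

which looks suspiciously like a limit of about 1GiB plus whatever was already sitting in buffers.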

The problem cannot be reproduced on another Nextcloud installation with the exact same configuration (deployed with enough 2.1.14): downloading the same 2.9GB file from the web interface there is consistently successful. The installation that sometimes fails has more disk and more RAM, which should not be an issue (quite the contrary). It has a MySQL backend instead of a PostgreSQL backend, which should not be an issue either.

The machine was rebooted (in case the problem was a side effect of some kind of leftover) and the download still fails most of the time, around the first 1GB.

The failure pattern is changing: the file was downloaded successfully four times in a row from the browser. Maybe it’s a transient bug or a race condition of some kind rather than a configuration problem. I wonder if disabling buffering in the web container would improve the situation.
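
If I end up testing that, disabling buffering of the php-fpm responses in the web container would look something like this (a sketch, not what is currently deployed):

  # in the PHP location block of the web container nginx configuration
  fastcgi_buffering off;    # stream the response to the client instead of buffering it in memory or on disk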

The problem is most likely excessive RAM usage by the nginx reverse proxy in front of Nextcloud (note that in the case of Nextcloud the reverse proxy has enough_nginx_cache: false, meaning proxy_cache is not active):

28515 www-data  20   0 1206056   1.1g    696 D   0.5  57.2   2:50.77 nginx: worker process                                                          
25269 systemd+  20   0  244424 238812     28 D   0.0  12.0   1:47.27 nginx: worker process    


It does not trigger the OOM killer, but it very likely exceeds a threshold that causes the nginx worker to restart. The steps to reproduce seem to simply be: “download a 3GB+ file from Nextcloud until the nginx worker RSS grows over 1GB”.
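
In other words, the reproduction boils down to something like the following (the URL and credentials are placeholders), while keeping an eye on the worker RSS in another terminal:

  # watch the resident memory of the nginx workers while the download runs
  watch -n 5 'ps -C nginx -o pid,user,rss,cmd'

  # download a 3GB+ file through the reverse proxy, discarding the payload
  curl -o /dev/null -u user:password 'https://nextcloud.example.com/remote.php/webdav/MVI_6988.MP4'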

It is confirmed and addressed here.

This is helpful and it looks like the problem is fixed for downloads from the web interface. But the Nextcloud client (2.6.2 on Ubuntu 20.04) still has trouble synchronizing this file.