Feat webdav nextcloud chunked v2 #6133
base: master
Conversation
All chunked uploads are stuck at 100% now that I've fixed the retry mechanisms. I'll figure out the issue here.
Hitting an issue when copying a file to my nextcloud with this branch. Logs are pasted below.
The issue seems to be that […]. If this is not the correct place for these comments, definitely let me know. Thanks for your work on this feature. It'll really help me out!
@alkenerly This is the right place. Thanks a lot for testing, I'll look into it.
@alkenerly Can you retry with the latest changes?
It sometimes crashes here, not sure why (chunk size: […]). Note that both […]
Running some long-running transfers now. I am moving approx. 4 GB files using a 20 Mbit connection, so not only are the files large, my transfers will also stress the timeouts of both servers.

Tentatively: it works, with an exception. Sometimes the transfer works exactly how you would expect it to! Other times it gets a 423 Locked when finalizing the chunks. That error repeats itself 10 times in quick succession (with the pacer rate limit never causing a sleep longer than 2s). After 10 of those errors it cancels the chunked upload. However, when it attempts to restart the upload, it seems to figure out that the file was indeed uploaded correctly. It would appear that rclone successfully uploaded the file and Nextcloud successfully assembled the chunks. Interestingly, the first low-level retry is not triggered by a 423 file-locked error, but by a 504 gateway timeout.
I bet the first request to merge the chunks was successfully received, but for whatever reason I get a 504. That causes the subsequent requests to report that the files are locked (which makes sense, because Nextcloud is performing the merge operation on them). Maybe there isn't a practical way around this except for each user hunting down every kind of timeout that exists on their webserver / reverse proxies etc. If only there were a "hey Nextcloud, are you currently merging chunks from this directory right now?" API... Do you have any ideas?
@alkenerly Could you open an issue on NextCloud's repo? It may be possible to do a workaround but that would be super brittle and/or slow.
@devnoname120 Just so I can clarify the issue before opening an issue over there: I would think that a 504 is returned when there is a request that takes a long time to respond to. The request that is taking so long in this instance is probably the HTTP MOVE command that assembles the chunks. There would not be any other long outstanding requests other than that MOVE, right? It is weird, because that MOVE command probably takes far less time than uploading one chunk's worth of data at my slow data rate.
I may jump in here right away, as I've seen your development happening as a Nextcloud dev. The gateway timeout is usually something that can only be solved in the server configuration, where the frontend webserver or load balancer has too short a timeout set for requests. On the PHP side those requests do not get cancelled until the PHP maximum execution time is reached (which is 60 minutes by default with Nextcloud: https://github.com/nextcloud/server/blob/f110ff9f4d54c4124483f3ba0e6c681362b79a50/lib/base.php#L624). From the above description it seems like the following is happening: […]
The general recommendation would be to adjust the server configuration to increase the timeout.
The PUT requests are a bit different here, as there is data transferred during that time, so frontend webservers usually keep the connection open in that case, while for the MOVE the connection is basically idle between the client (rclone) and the webserver, as it just waits for the PHP operation to finish and return an HTTP response.
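For illustration only (this is not code from this PR): the final assembly request is a plain WebDAV MOVE of the upload folder's virtual .file onto the destination, and the connection then sits idle until PHP finishes merging. A rough Go sketch of issuing that MOVE with a deliberately generous client-side timeout; the URL layout follows the Nextcloud chunked-upload docs, and all URLs, the user, and the password are placeholders:

```go
// Illustrative sketch only: issuing the chunk-assembly MOVE with a long client
// timeout. The URL layout is assumed from the Nextcloud chunked-upload docs;
// URLs and credentials are placeholders, not values from this PR.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func assembleChunks(uploadDirURL, destinationURL, user, pass string) error {
	// Allow up to an hour, roughly matching Nextcloud's default PHP max
	// execution time mentioned above. Any reverse proxy (nginx, load
	// balancer, ...) in between needs a comparably large read timeout,
	// otherwise it answers 504 while the merge is still running.
	client := &http.Client{Timeout: 60 * time.Minute}

	req, err := http.NewRequest("MOVE", uploadDirURL+"/.file", nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Destination", destinationURL)

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated && resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("assembly MOVE failed: %s", resp.Status)
	}
	return nil
}

func main() {
	err := assembleChunks(
		"https://cloud.example.com/remote.php/dav/uploads/alice/rclone-chunked-0001",
		"https://cloud.example.com/remote.php/dav/files/alice/big.bin",
		"alice", "app-password",
	)
	fmt.Println("assembly result:", err)
}
```

The point of the sketch is that the client timeout and every proxy timeout in front of Nextcloud have to be at least as long as the merge itself, otherwise the 504 shows up even though the merge keeps running server-side.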
Thanks for jumping in! I am uploading another set of chunks now but am pausing rclone's execution before it issues the MOVE command. My plan is to issue the last MOVE command myself, probably using curl, so that I can get a rough time until I get the 504. I am running my Nextcloud behind an nginx reverse proxy, so it will be a little scavenger hunt to determine which timeout I need to increase in order to get around this issue. All that being said, I haven't personally found an issue with this branch (because the issue I have is a server configuration issue). Great work @devnoname120
@juliushaertl Would there be a way to check the state of the MOVE operation? Some workarounds I can do on my side: […]
I have increased the relevant timeouts in my environment to prevent getting 504s on chunk assembly. I'll leave this branch transferring files of various sizes for a bit and will report any issues I hit. Thanks again for the work you put in.
Transferred a mix of chunkable and non-chunkable files over the last few days and have had no issues.
Thanks a lot! Since then I implemented parallel chunk upload (see https://github.com/devnoname120/rclone/tree/feat-webdav-nextcloud-parallel-chunk-upload). There are two issues remaining: […]
So it will be for a future pull request. I think that this pull request is functional enough that it makes sense to keep it as is. I'll add some tests and clean up the code a bit, then I'll ask for reviews from the maintainers.
@alkenerly Can you confirm that it still works with my latest changes? There should be no difference in behavior, I just made the code cleaner.
There may always be scenarios where file locks can stay around for longer, e.g. if the PHP process gets killed for some reason. However, those should be cleaned up by a background job (if the database is used as a locking backend) or by their set TTL if Redis is used as a locking backend. The TTL has a default of 1 hour, but is also configurable in the Nextcloud config.php. However, I agree that it is not the ideal way to monitor whether a move is still pending.
Currently there is not really such a thing as a dedicated status check of the operation. During the move operation you would, as mentioned, receive a 423 (locked) status code; once it is finished, the temporary upload folder gets removed and you should then get a 404, as the source folder for the move is no longer found.
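As an aside (not part of this PR): the closest thing to a status check that falls out of this description is probing the temporary upload folder itself and treating a 404 as "the merge finished". A hedged Go sketch of that idea, with made-up URLs and credentials:

```go
// Illustrative only: infer whether a chunk-assembly MOVE has finished by probing
// the temporary upload folder, per the behaviour described above. All URLs and
// credentials here are placeholders.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func moveFinished(client *http.Client, uploadDirURL, user, pass string) (bool, error) {
	req, err := http.NewRequest("PROPFIND", uploadDirURL, nil)
	if err != nil {
		return false, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Depth", "0")

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotFound:
		return true, nil // upload folder already removed: merge finished
	case http.StatusMultiStatus, http.StatusLocked:
		return false, nil // folder still present (or locked): merge presumably still running
	default:
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}
	for {
		done, err := moveFinished(client,
			"https://cloud.example.com/remote.php/dav/uploads/alice/rclone-chunked-0001",
			"alice", "app-password")
		if err != nil || done {
			fmt.Println("done:", done, "err:", err)
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```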
@juliushaertl Thank you for your reply. It would be great if this feature could be added to NextCloud.
@ncw Could you do a first review pass on this pull request? I'll then add tests. Thanks!
I think this is looking nice :-)
I put a comment inline.
Looking at the commits, this needs a lot of tidying up!
I think we should have one commit each for the preliminary work, ie
- lib/rest
- fstests
I don't think cmd/cmount should be here.
I'm not sure how many commits the backend/webdav code should be.
I know it's been through a few different authors, so if you squash commits with different authors then add a Co-authored-by: Name1 Name2 <[email protected]> line at the end of the commit after a blank line - that will keep the authorship straight.
I don't know how experienced a git user you are, so if you'd like me to do the squashing I can.
I'll run the CI now too and we can see if that shows up anything else.
Thank you :-)
(force-pushed from 1a68409 to 96e58d9)
(force-pushed from 96e58d9 to 2783ac1)
@ncw Thank you for your comments. I addressed the changes + a lint formatting issue.
Co-authored-by: Thibault Coupin <[email protected]>
Co-authored-by: Nick Craig-Wood <[email protected]>
(force-pushed from 2783ac1 to 6b0c8ac)
Done! Are you ready for another review?
Yes! My main concern is to test low-level HTTP/2 retries. I looked into how to make a mock so as to simulate a […]. Would you mind taking over this part? Thank you.
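In case it helps whoever picks this up: the following is a hypothetical sketch, not an existing test in this PR, and it does not exercise HTTP/2-specific failure modes. It only shows how a plain net/http/httptest mock could reproduce the 504-then-423 sequence reported earlier in this thread, which is the behaviour the retry logic needs to survive:

```go
// Hypothetical test sketch: a mock WebDAV server whose MOVE endpoint first answers
// 504, then 423 while the "merge" is pending, then succeeds. A real test would
// point the webdav backend at srv.URL and drive its retry path instead of the raw
// request below.
package webdav

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"testing"
)

func TestMoveRetryAfterGatewayTimeout(t *testing.T) {
	var moves int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != "MOVE" {
			w.WriteHeader(http.StatusCreated) // MKCOL / chunk PUTs just succeed
			return
		}
		switch atomic.AddInt32(&moves, 1) {
		case 1:
			w.WriteHeader(http.StatusGatewayTimeout) // proxy gave up during assembly
		case 2, 3:
			w.WriteHeader(http.StatusLocked) // assembly still running server-side
		default:
			w.WriteHeader(http.StatusCreated) // assembly finished
		}
	}))
	defer srv.Close()

	req, err := http.NewRequest("MOVE", srv.URL+"/uploads/alice/upload-0001/.file", nil)
	if err != nil {
		t.Fatal(err)
	}
	req.Header.Set("Destination", srv.URL+"/files/alice/big.bin")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusGatewayTimeout {
		t.Fatalf("expected the first MOVE to hit a 504, got %d", resp.StatusCode)
	}
}
```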
Hey, I would love to see this feature upstream, as right now I have some bad issues when trying to upload files larger than 50 MB to Nextcloud!
Supersedes #4865.
Fixes #3666.
Follow-up PR with parallel chunks upload: #6179
Compared to the previous PR:
- Rebase on master & solve conflicts.
- […] rclone is closed abruptly.
- […] the MOVE operation without erroring out with 404.
- Rename the chunk_size config to nextcloud_chunk_size and set it to NextCloud's default: 10 MiB.
- […] application/x-www-form-urlencoded in order to follow the recommendations of NextCloud.

Known limitations:
- […] readers.NewRepeatableLimitReaderBuffer(). This creates a buffer of chunk size, so it may use lots of memory if the chunk size is set to a big value in the remote config. Maybe we could obtain a seekable reader from the upper-level abstraction instead of creating our own wrapper.
- […] 423 Locked. Need to figure out why this happens and if there is a workaround. The UNLOCK method may work to force the unlock.

TODO:
- […] (504 Gateway Time-out then 423 Locked).
- [ ] [In a follow-up PR because this is independent from these changes] Check the hash of the uploaded file to make sure it was uploaded properly?
- [ ] [In a follow-up PR, see https://github.com/devnoname120/rclone/tree/feat-webdav-nextcloud-parallel-chunk-upload] Upload chunks in parallel to boost the speed of sending big files with small chunks. (See rclone/backend/b2/b2.go lines 1214 to 1233 and rclone/backend/box/upload.go lines 232 to 249 in 1d6d41f.) […] io.Reader instead of an io.ReadSeeker, so we need to fetch the chunk data in order, and only then start the parallel jobs. Since the io.Reader may be from a remote, fetching the chunks in order and storing them temporarily would lower the speed (can't download the chunks in parallel). I'm not sure if there is a good reason rclone doesn't provide an io.ReadSeeker. Maybe it's just a matter of modifying rclone's core code to make it provide an io.ReadSeeker without major modifications.
- [ ] [In a follow-up PR] Implement fs.CleanUp() to purge all the temporary upload folders.

What is the purpose of this change?
Enable chunked uploads for Nextcloud (WebDav).
See https://docs.nextcloud.com/server/latest/developer_manual/client_apis/WebDAV/chunking.html
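For readers unfamiliar with that document, here is a condensed illustration of the sequence it describes: create a temporary upload folder, PUT the chunks into it, then MOVE its virtual .file onto the final path to trigger assembly. This is not code from this PR; the folder naming, chunk naming, and use of the Destination header on each request are my reading of the linked docs, and all URLs and credentials are placeholders:

```go
// Rough sketch of the Nextcloud chunked-upload sequence described in the document
// above. Everything here (paths, naming, credentials) is illustrative only.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

const chunkSize = 10 * 1024 * 1024 // matches the 10 MiB default used by this PR

func do(method, url, dest string, body io.Reader) error {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return err
	}
	req.SetBasicAuth("alice", "app-password")
	req.Header.Set("Destination", dest)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("%s %s: %s", method, url, resp.Status)
	}
	return nil
}

func main() {
	base := "https://cloud.example.com/remote.php/dav"
	uploadDir := base + "/uploads/alice/rclone-chunked-0001"
	dest := base + "/files/alice/big.bin"

	// 1. Create the temporary upload folder.
	if err := do("MKCOL", uploadDir, dest, nil); err != nil {
		panic(err)
	}

	// 2. Upload the file chunk by chunk.
	f, err := os.Open("big.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	buf := make([]byte, chunkSize)
	for i := 1; ; i++ {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			chunkURL := fmt.Sprintf("%s/%05d", uploadDir, i)
			if err := do("PUT", chunkURL, dest, bytes.NewReader(buf[:n])); err != nil {
				panic(err)
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}

	// 3. Trigger the server-side assembly (the long-running MOVE discussed earlier).
	if err := do("MOVE", uploadDir+"/.file", dest, nil); err != nil {
		panic(err)
	}
}
```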
Was the change discussed in an issue or in the forum before?
#3666
#4865
Checklist