The Wayback Machine - https://web.archive.org/web/20220530091440/https://github.com/rclone/rclone/pull/6025
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: overlap check is now filter-sensitive #6025

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Schlossgeist
Copy link

@Schlossgeist Schlossgeist commented Mar 6, 2022

What is the purpose of this change?

Previously, the overlap check was based on simple prefix checks of the source and destination paths. Now it actually checks whether the destination is excluded via any filter rule or a "--exclude-if-present"-file.

Was the change discussed in an issue or in the forum before?

Forum Post

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

Previously, the overlap check was based on simple prefix checks of the source and destination paths. Now it actually checks whether the destination is excluded via any filter rule or a "--exclude-if-present"-file.
@Schlossgeist
Copy link
Author

@Schlossgeist Schlossgeist commented May 11, 2022

I've noticed some initially unwanted (but maybe reasonable) behavior. Because of the way I implemented the recursive checks, the sync command is only allowed to be executed when the destination directory already exists (in contrast to sync commands under normal circumstances where the destination directory is created automatically).
Although I think that the ability to sync to subdirectories which are exculded from the sync is inherently useful, I also think that this is kind of an edge-case usecase which should required some additional action from the user (i. e. creating the destination directory). It also gives beginner users the impression that it is generally not ok to try and sync or move directories in this way.

What are your thoughts on this? @ncw

@ncw
Copy link
Member

@ncw ncw commented May 12, 2022

I've noticed some initially unwanted (but maybe reasonable) behavior. Because of the way I implemented the recursive checks, the sync command is only allowed to be executed when the destination directory already exists

This is because you are listing the directory I think.

It should be possible to do these checks without actually calling List shouldn't it? I'm concerned about calling List since it is potentially an expensive operation (think a bucket with 1 million entries) so we don't want to do it twice.

(in contrast to sync commands under normal circumstances where the destination directory is created automatically). Although I think that the ability to sync to subdirectories which are exculded from the sync is inherently useful, I also think that this is kind of an edge-case usecase which should required some additional action from the user (i. e. creating the destination directory). It also gives beginner users the impression that it is generally not ok to try and sync or move directories in this way.

I'm not sure I understood the problem there! Can you give some examples please?

I will give the code a quick first review now.

Copy link
Member

@ncw ncw left a comment

A quick initial review :-)

Thank you

@@ -796,6 +796,53 @@ func Overlapping(fdst, fsrc fs.Info) bool {
return strings.HasPrefix(fdstRoot, fsrcRoot) || strings.HasPrefix(fsrcRoot, fdstRoot)
}

// OverlappingFilterCheck returns true if fdst and fsrc point to the same
// underlying Fs and they overlap without fdst being excluded by any filter rule.
func OverlappingFilterCheck(ctx context.Context, fdst fs.Fs, fsrc fs.Fs) bool {
Copy link
Member

@ncw ncw May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs testing in operations_test.go

}

// FilterCheckR checks if fdst would be included in the sync
func FilterCheckR(ctx context.Context, fdstRelative string, inputPath string, fsrc fs.Fs) bool {
Copy link
Member

@ncw ncw May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs testing in operations_test.go also

include := true
fi := filter.GetConfig(ctx)
includeDirectory := fi.IncludeDirectory(ctx, fsrc)
entries, _ := fsrc.List(ctx, inputPath)
Copy link
Member

@ncw ncw May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really don't like calling List here. We should be able to do this just by examining the filter and paths shouldn't we?


subRemoteName := r.FremoteName + "/rclone-sync-test"
FremoteSync, err := fs.NewFs(ctx, subRemoteName)
FremoteSync.Mkdir(ctx, "")
Copy link
Member

@ncw ncw May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error for this needs to be checked.

require.NoError(t, FremoteSync.Mkdir(ctx, ""))

is fine

@Schlossgeist
Copy link
Author

@Schlossgeist Schlossgeist commented May 16, 2022

After the rewrite I essentially got the automatic directory creation for free, so I just decided to keep it to be consistent with the standard sync behavior. If you're satisfied with the implementation, I can continue to look into how to properly write the tests for operations_test.go.

Copy link
Member

@ncw ncw left a comment

I think this looks much better now.

I don't think you need to call Mkdir either.

Can you write a table driven test for FilterCheckR testing all the corner cases - it is quite complicated that function and I'd like to be sure it is correct!

A test for OverlappingFilterCheck would be nice too.

Thank you :-)

fdstRoot := fixRoot(fdst)
fsrcRoot := fixRoot(fsrc)
if strings.HasPrefix(fdstRoot, fsrcRoot) {
fdstRelative := strings.ReplaceAll(fdstRoot, fsrcRoot, "")
Copy link
Member

@ncw ncw May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better written something like this (ReplaceAll could replace stuff it isn't supposed to).

fdstRelative := fdstRoot[len(fsrcRoot):]

}

if !include {
err := fsrc.Mkdir(ctx, fdstRelative)
Copy link
Member

@ncw ncw May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be necessary to run Mkdir here - the sync will do it for you.

for i := 0; i <= pos; i++ {
newPath += dirs[i]
}
dir := fs.NewDir(newPath, time.Now())
Copy link
Member

@ncw ncw May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this dir gets used anywhere does it? How about currentPath := newPath?

@Schlossgeist
Copy link
Author

@Schlossgeist Schlossgeist commented May 18, 2022

I used the existing TestOverlapping for inspiration. I only wrote a test for OverlappingFilterCheck though. Can you explain why there should be a separate test for FilterCheckR? This function is only called from OverlappingFilterCheck with almost the same arguments every time. It feels a bit redundant to test the same function in sync_test.go and twice in operations_test.go.

@ncw
Copy link
Member

@ncw ncw commented May 19, 2022

I used the existing TestOverlapping for inspiration. I only wrote a test for OverlappingFilterCheck though. Can you explain why there should be a separate test for FilterCheckR? This function is only called from OverlappingFilterCheck with almost the same arguments every time. It feels a bit redundant to test the same function in sync_test.go and twice in operations_test.go.

It might be that FilterCheckR shouldn't be an exported function, if so rename it filterCheckR and then you don't need to make tests for it :-)

@Schlossgeist
Copy link
Author

@Schlossgeist Schlossgeist commented May 19, 2022

I renamed the function because it should not be used outside the package. As you might have already guessed, I am new to programming with go and didn't know about this concept. Thank you for your time and patience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants