The Wayback Machine - https://web.archive.org/web/20220519145004/https://github.com/python/cpython/pull/29130
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45558: shutil.copytree: Allow disabling copystat #29130

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

doronbehar
Copy link

@doronbehar doronbehar commented Oct 21, 2021

Sometimes it is desirable to not copy any permissions and attributes from the source directory. This change allows disable running copystat after a copy.

https://bugs.python.org/issue45558

@the-knights-who-say-ni
Copy link

@the-knights-who-say-ni the-knights-who-say-ni commented Oct 21, 2021

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@doronbehar

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

FFY00
FFY00 approved these changes Oct 25, 2021
Copy link
Member

@FFY00 FFY00 left a comment

The implementation looks alright, but the docs need some fixes.

I am a bit unsure about this feature, but let's wait until a core dev replies to the bpo 😊

Lib/shutil.py Show resolved Hide resolved
@@ -450,7 +450,7 @@ def _ignore_patterns(path, names):
return _ignore_patterns

def _copytree(entries, src, dst, symlinks, ignore, copy_function,
ignore_dangling_symlinks, dirs_exist_ok=False):
copy_stat, ignore_dangling_symlinks, dirs_exist_ok=False):
Copy link
Member

@FFY00 FFY00 Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_stat should be added to the docs too.

Copy link
Author

@doronbehar doronbehar Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_stat should be added to the docs too.

Do you mean to add this to the docs of _copytree? I thought it is a "hidden" function.

Copy link
Member

@FFY00 FFY00 Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I commented on the wrong location. This should be added to the docs of shutil.copytree, it still does not list the new copy_stat argument in the function definition (should be line 228), it only mentions it in the description.

Copy link
Author

@doronbehar doronbehar Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think understood you correctly.

@doronbehar doronbehar force-pushed the shutil.copytree-copy_stat branch from 1f0f8a3 to d054024 Compare Oct 26, 2021
@github-actions
Copy link

@github-actions github-actions bot commented Nov 26, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Nov 26, 2021
@FFY00
Copy link
Member

@FFY00 FFY00 commented Nov 26, 2021

@jaraco could you have a look? Thanks!

copy_function=copy2, ignore_dangling_symlinks=False, \
dirs_exist_ok=False)
copy_stat=True, copy_function=copy2, \
ignore_dangling_symlinks=False, dirs_exist_ok=False)
Copy link
Member

@jaraco jaraco Nov 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting this new parameter before existing ones will break existing uses if passed positionally. Was this position chosen for a reason?

Copy link
Author

@doronbehar doronbehar Nov 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The position was chosen so it'd be near the copy_function parameter.

Copy link
Member

@jaraco jaraco left a comment

As proposed, this change need some work.

Added complexity

This change adds complexity to the already fairly complex implementation, increasing the mccabe complexity of _copytree from 18 to 20.

Generally, a mccabe complexity of > 10 is undesirable.

This suggests that the implementation is already too complex to take on additional complexity and a refactoring/redesign should be considered before bolting on more complexity.

Interacting parameters

This implementation requires the user to manage the interaction of two parameters, copy_function and copy_stat. That is, if copy_stat is used, the user is also advised to use copy_function=copyfile. It's a bad design if using one parameter implies the need to change another parameter. Instead, the caller should be able to specify each parameter independently. If there are invalid combinations of parameters, those should be explicitly disallowed. As implemented, the docs should read something like:

To disable the copying of stat for files, pass copy_function=copyfile. To disable the copying of stat for symlinks and directories, pass copy_stat=False.

It's not enough to give the user the opportunity to pass the correct arguments, it's the duty of the API designer to make it difficult if not impossible for the user to specify the wrong thing.

Passing boolean flags

I discourage the use of boolean flags to functions. Instead, it's preferable to pass the desired behavior. Similar to how copy_function allows the caller to specify the copy behavior, I'd consider if the caller could instead pass a copystat_function that defaults to copystat. Doing so would help keep _copytree more simple and would mirror the existing customization behavior.

No tests

This new behavior doesn't appear to have any tests. Covering the new behavior with tests is essential, especially the expected behavior of different combinations of interacting parameters.


I don't blame you for the approach you've taken and of course can't hold you accountable for the complexity of the current implementation. At first blush, these changes are reasonable, but they'll also introduce a lasting maintenance burden, so I'd prefer to see something of a redesign to support the necessary use-case.

I'd like to see something where copy_tree accepts a CopyHandler object, a protocol that encapsulates symlink, ignore, copy_function, ignore_dangling_symlinks, and dirs_exist_ok behaviors (and now copystat) and provides a DefaultCopyHandler that implements all of the default behaviors, but would allow a caller to readily override any of those behaviors, and most importantly would help avoid weaving the boolean values through the implementation.

If such a thing existed, the design for customizing copystat could be something like this:

class DefaultCopyHandler:
    def copy_stat(self, src, dst):
        return copystat(src, dst)

class NoStatCopyHandler(DefaultCopyHandler):
    def copy_stat(self, src, dst):
        return

copytree(..., copy_handler=NoStatCopyHandler())

I realize it's probably unfair to ask someone to take a small feature tweak and instead redesign the interface (with all of the compatibility concerns that will introduce), but I hope I've conveyed why the proposed approach is all but nonviable. If you can add tests, it's possibly acceptable, but I'd defer to the owner of shutil or another core dev to approve.

@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Nov 26, 2021

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@arhadthedev
Copy link
Contributor

@arhadthedev arhadthedev commented Nov 26, 2021

@jaraco I believe complexity can be reduced back in the following way:

--- a/Lib/shutil.py
+++ b/Lib/shutil.py
@@ -449,6 +449,16 @@ def _ignore_patterns(path, names):
         return set(ignored_names)
     return _ignore_patterns

+def _is_symlink(entry):
+    is_symlink = entry.is_symlink()
+    if is_symlink and os.name == 'nt':
+        # Special check for directory junctions, which appear as symlinks
+        # but we want to recurse.
+        lstat = entry.stat(follow_symlinks=False)
+        return lstat.st_reparse_tag != stat.IO_REPARSE_TAG_MOUNT_POINT
+    else:
+        return is_symlink
+
 def _copytree(entries, src, dst, symlinks, ignore, copy_function,
               ignore_dangling_symlinks, dirs_exist_ok=False):
     if ignore is not None:
@@ -460,21 +470,12 @@ def _copytree(entries, src, dst, symlinks, ignore, copy_function,
     errors = []
     use_srcentry = copy_function is copy2 or copy_function is copy

-    for srcentry in entries:
-        if srcentry.name in ignored_names:
-            continue
+    for srcentry in filter(entries, lambda x: x.name not in ignored_names):
         srcname = os.path.join(src, srcentry.name)
         dstname = os.path.join(dst, srcentry.name)
         srcobj = srcentry if use_srcentry else srcname
         try:
-            is_symlink = srcentry.is_symlink()
-            if is_symlink and os.name == 'nt':
-                # Special check for directory junctions, which appear as
-                # symlinks but we want to recurse.
-                lstat = srcentry.stat(follow_symlinks=False)
-                if lstat.st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT:
-                    is_symlink = False
-            if is_symlink:
+            if _is_symlink(srcentry):
                 linkto = os.readlink(srcname)
                 if symlinks:
                     # We can't just leave it to `copy_function` because legacy

In addition os.path.join(src, srcentry.name) may be replaced with srcentry.path as documentation says

/cc: @doronbehar

@github-actions github-actions bot removed the stale label Nov 27, 2021
@doronbehar
Copy link
Author

@doronbehar doronbehar commented Nov 27, 2021

I can merge @arhadthedev 's suggestion and move the copy_stat parameter to the end of the list of parameters, but indeed it'd have been nicer if there was a better solution to this issue.

It's a bad design if using one parameter implies the need to change another parameter. Instead, the caller should be able to specify each parameter independently. If there are invalid combinations of parameters, those should be explicitly disallowed.

Intuitively, before writing this patch, I expected that using copy_function=shutil.copyfile would have done the job, because shutil.copy2 runs copystat unconditionally. But TBH I don't understand why does copytree needs to run copystat no matter what copy_function is used:

cpython/Lib/shutil.py

Lines 507 to 509 in 311910b

try:
copystat(src, dst)
except OSError as why:

And here:

cpython/Lib/shutil.py

Lines 480 to 485 in 311910b

# We can't just leave it to `copy_function` because legacy
# code with a custom `copy_function` may rely on copytree
# doing the right thing.
os.symlink(linkto, dstname)
copystat(srcobj, dstname, follow_symlinks=not symlinks)
else:

I also not sure I understand what scenario that comment describes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants