New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45558: shutil.copytree: Allow disabling copystat #29130
base: main
Are you sure you want to change the base?
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA MissingOur records indicate the following people have not signed the CLA: For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
The implementation looks alright, but the docs need some fixes.
I am a bit unsure about this feature, but let's wait until a core dev replies to the bpo
Misc/NEWS.d/next/Library/2021-10-23-07-16-33.bpo-45558.jGmQIV.rst
Outdated
Show resolved
Hide resolved
@@ -450,7 +450,7 @@ def _ignore_patterns(path, names): | |||
return _ignore_patterns | |||
|
|||
def _copytree(entries, src, dst, symlinks, ignore, copy_function, | |||
ignore_dangling_symlinks, dirs_exist_ok=False): | |||
copy_stat, ignore_dangling_symlinks, dirs_exist_ok=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy_stat
should be added to the docs too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy_stat
should be added to the docs too.
Do you mean to add this to the docs of _copytree
? I thought it is a "hidden" function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry, I commented on the wrong location. This should be added to the docs of shutil.copytree
, it still does not list the new copy_stat
argument in the function definition (should be line 228), it only mentions it in the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think understood you correctly.
f73285b
to
1f0f8a3
Compare
1f0f8a3
to
d054024
Compare
This PR is stale because it has been open for 30 days with no activity. |
@jaraco could you have a look? Thanks! |
copy_function=copy2, ignore_dangling_symlinks=False, \ | ||
dirs_exist_ok=False) | ||
copy_stat=True, copy_function=copy2, \ | ||
ignore_dangling_symlinks=False, dirs_exist_ok=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inserting this new parameter before existing ones will break existing uses if passed positionally. Was this position chosen for a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The position was chosen so it'd be near the copy_function
parameter.
As proposed, this change need some work.
Added complexity
This change adds complexity to the already fairly complex implementation, increasing the mccabe complexity of _copytree
from 18 to 20.
Generally, a mccabe complexity of > 10 is undesirable.
This suggests that the implementation is already too complex to take on additional complexity and a refactoring/redesign should be considered before bolting on more complexity.
Interacting parameters
This implementation requires the user to manage the interaction of two parameters, copy_function
and copy_stat
. That is, if copy_stat
is used, the user is also advised to use copy_function=copyfile
. It's a bad design if using one parameter implies the need to change another parameter. Instead, the caller should be able to specify each parameter independently. If there are invalid combinations of parameters, those should be explicitly disallowed. As implemented, the docs should read something like:
To disable the copying of stat for files, pass
copy_function=copyfile
. To disable the copying of stat for symlinks and directories, passcopy_stat=False
.
It's not enough to give the user the opportunity to pass the correct arguments, it's the duty of the API designer to make it difficult if not impossible for the user to specify the wrong thing.
Passing boolean flags
I discourage the use of boolean flags to functions. Instead, it's preferable to pass the desired behavior. Similar to how copy_function
allows the caller to specify the copy behavior, I'd consider if the caller could instead pass a copystat_function
that defaults to copystat
. Doing so would help keep _copytree more simple and would mirror the existing customization behavior.
No tests
This new behavior doesn't appear to have any tests. Covering the new behavior with tests is essential, especially the expected behavior of different combinations of interacting parameters.
I don't blame you for the approach you've taken and of course can't hold you accountable for the complexity of the current implementation. At first blush, these changes are reasonable, but they'll also introduce a lasting maintenance burden, so I'd prefer to see something of a redesign to support the necessary use-case.
I'd like to see something where copy_tree
accepts a CopyHandler
object, a protocol that encapsulates symlink
, ignore
, copy_function
, ignore_dangling_symlinks
, and dirs_exist_ok
behaviors (and now copystat
) and provides a DefaultCopyHandler
that implements all of the default behaviors, but would allow a caller to readily override any of those behaviors, and most importantly would help avoid weaving the boolean values through the implementation.
If such a thing existed, the design for customizing copystat could be something like this:
class DefaultCopyHandler:
def copy_stat(self, src, dst):
return copystat(src, dst)
class NoStatCopyHandler(DefaultCopyHandler):
def copy_stat(self, src, dst):
return
copytree(..., copy_handler=NoStatCopyHandler())
I realize it's probably unfair to ask someone to take a small feature tweak and instead redesign the interface (with all of the compatibility concerns that will introduce), but I hope I've conveyed why the proposed approach is all but nonviable. If you can add tests, it's possibly acceptable, but I'd defer to the owner of shutil or another core dev to approve.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@jaraco I believe complexity can be reduced back in the following way: --- a/Lib/shutil.py
+++ b/Lib/shutil.py
@@ -449,6 +449,16 @@ def _ignore_patterns(path, names):
return set(ignored_names)
return _ignore_patterns
+def _is_symlink(entry):
+ is_symlink = entry.is_symlink()
+ if is_symlink and os.name == 'nt':
+ # Special check for directory junctions, which appear as symlinks
+ # but we want to recurse.
+ lstat = entry.stat(follow_symlinks=False)
+ return lstat.st_reparse_tag != stat.IO_REPARSE_TAG_MOUNT_POINT
+ else:
+ return is_symlink
+
def _copytree(entries, src, dst, symlinks, ignore, copy_function,
ignore_dangling_symlinks, dirs_exist_ok=False):
if ignore is not None:
@@ -460,21 +470,12 @@ def _copytree(entries, src, dst, symlinks, ignore, copy_function,
errors = []
use_srcentry = copy_function is copy2 or copy_function is copy
- for srcentry in entries:
- if srcentry.name in ignored_names:
- continue
+ for srcentry in filter(entries, lambda x: x.name not in ignored_names):
srcname = os.path.join(src, srcentry.name)
dstname = os.path.join(dst, srcentry.name)
srcobj = srcentry if use_srcentry else srcname
try:
- is_symlink = srcentry.is_symlink()
- if is_symlink and os.name == 'nt':
- # Special check for directory junctions, which appear as
- # symlinks but we want to recurse.
- lstat = srcentry.stat(follow_symlinks=False)
- if lstat.st_reparse_tag == stat.IO_REPARSE_TAG_MOUNT_POINT:
- is_symlink = False
- if is_symlink:
+ if _is_symlink(srcentry):
linkto = os.readlink(srcname)
if symlinks:
# We can't just leave it to `copy_function` because legacy In addition /cc: @doronbehar |
I can merge @arhadthedev 's suggestion and move the
Intuitively, before writing this patch, I expected that using Lines 507 to 509 in 311910b
And here: Lines 480 to 485 in 311910b
I also not sure I understand what scenario that comment describes. |
Sometimes it is desirable to not copy any permissions and attributes from the source directory. This change allows disable running
copystat
after a copy.https://bugs.python.org/issue45558