Check if input_file is directory and then add all files within that directory #336
Conversation
Adding links as extra context/breadcrumb:
(Unfortunately these links will decay as the Slack workspace is on a free Slack plan with limited message history). |
# it will be passed in as an input file and we need to | ||
# ignore. | ||
if os.path.isdir(real_filename): | ||
return | ||
arcname = arcname_from(package_filename) | ||
|
||
self._zipfile.write(real_filename, arcname=arcname) |
thundergolfer
Jul 10, 2020
Collaborator
Double checking that we actually don't want to write the directory as a file? That seems right.
Double checking that we actually don't want to write the directory as a file? That seems right.
AutomatedTester
Jul 10, 2020
Author
Contributor
I don't think we do as we would likely be creating the directory in another test with files in it. I am not sure people would want an empty directory placed into the wheel but have been suprised before by what people do.
I don't think we do as we would likely be creating the directory in another test with files in it. I am not sure people would want an empty directory placed into the wheel but have been suprised before by what people do.
# it will be passed in as an input file and we need to | ||
# ignore. | ||
if os.path.isdir(real_filename): | ||
return |
alexeagle
Jul 10, 2020
Collaborator
shouldn't there be a TODO here to actually copy in the directory contents? this PR is just a temp workaround right?
"we need to ignore" seems surprising from a user perspective
shouldn't there be a TODO here to actually copy in the directory contents? this PR is just a temp workaround right?
"we need to ignore" seems surprising from a user perspective
AutomatedTester
Oct 27, 2020
Author
Contributor
I can't see a case where people will want an empty directory as part of their build. They may have another step that populates it but that case is already taken care of
I can't see a case where people will want an empty directory as part of their build. They may have another step that populates it but that case is already taken care of
I can copy it but what happens when we already have copied the contents into that directory? Do we want to overwrite the contents?
We should probably print a warning that a directory has been passed in from a previous point
…On Jul 10, 2020, 3:13 PM +0100, Alex Eagle ***@***.***>, wrote:
@alexeagle commented on this pull request.
In experimental/rules_python/wheelmaker.py:
> @@ -101,6 +101,11 @@ def arcname_from(name):
return normalized_arcname
+ # If anywhere in the dependency tree a directory is set
+ # it will be passed in as an input file and we need to
+ # ignore.
+ if os.path.isdir(real_filename):
+ return
shouldn't there be a TODO here to actually copy in the directory contents? this PR is just a temp workaround right?
"we need to ignore" seems surprising from a user perspective
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I don't think there should be a collision because the bazel-out folder doesn't have one |
The reason I hit this situation is I have a When we get to the |
Another option would be to sort the input files (so directory would always
be before files in it) and use isdir to check for directories. This way
empty directories could be added to a wheel too.
pt., 10 lip 2020 o 17:50 David Burns <[email protected]> napisał(a):
… The reason I hit this situation is I have a py_library that has a step
that generates code and is copied in it's data attribute.
When we get to the py_wheel stage, according to --sandbox_debug the files
in the directory have already been added as --input_file and then near
the bottom of the list there is a
--input_file=OUTS/dir/from/another/target. Since this is after the files,
in my case, it feels like I would overwrite them. If you remove this patch
from the WORKSPACE and then run
https://github.com/SeleniumHQ/selenium/blob/trunk/py/BUILD.bazel#L69-L87
you will be able to recreate my issue.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#336 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKU4J4LZSA6ZUGNK4DPJJCDR242D7ANCNFSM4OV7RAKQ>
.
--
Paweł Stradomski
|
acb10ad
to
aee2615
@alexeagle or @thundergolfer could you have a look and see what is left for me to do here? |
From a user's perspective, silently dropping directories seems like a pretty big footgun; either it should add support for directories or the action should fail completely with a descriptive error. Also, a directory is a perfectly valid output from a rule using
Alternatively, you can pass the directory to the action script and unpack the directory yourself, as was previously suggested. Technically a path collision shouldn't be possible, but perhaps with the addition of prefix stripping/prepending it could happen: perhaps that case is rare enough that we just bail out with an error at that point. |
@aaliddell Could you give me an example rule for this? As mentioned before I am not sure why someone would want an empty directory in a wheel. It's always going to have some file in the directory likely so it will be created anyway. If we are wanting to accept empty directories, should we be adding a new argument to the wheelmaker because this issue was because a directory was being passed to |
I don't think we strictly need a different flag, we can just stat and treat
differently if it's a directory.
I'm not 100% sure about the wheel requirements for directories, in
particular whether it should have an entry in RECORD (what's the checksum
of a directory)?
It might be worth asking on distutils-sig.
czw., 29 paź 2020 o 10:56 David Burns <[email protected]> napisał(a):
… @aaliddell <https://github.com/aaliddell> Could you give me an example
rule for this?
As mentioned before I am not sure why someone would want an empty
directory in a wheel. It's always going to have some file in the directory
likely so it will be created anyway.
If we are wanting to accept empty directories, should we be adding a new
argument to the wheelmaker because this issue was because a directory was
being passed to input_file. We could add a new one called input and
input_list
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#336 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKU4J4IGRE24QUXGMIBLJ6DSNE34DANCNFSM4OV7RAKQ>
.
--
Paweł Stradomski
|
I agree that a completely empty directory may be valid to skip within a wheel; it's the dropping all directories regardless of contents that's my concern. For an example of where this sort of thing comes up, I often run into this when dealing with gRPC protoc plugins, where the output of a plugin is non-deterministic from file names, which requires capturing the output as a directory rather than a set of files. As an example, we may capture a folder full of generated From Bazel's perspective, it treats the directory as a single Using |
It appears a
So |
This patch only drops if there is
This patch handles this case. If you see this Without this patch, I can't generate a wheel because somewhere it passes We can (and this is only for wheels)
OR
|
After re-reading PEP 427, I think we should be safe to just ignore all
directories when build a wheel.
Every file that's present in the archive (aside from RECORD and signature
files) must have it's contents hash in RECORD.
Since it's not possible to calculate a hash of a directory (whether empty
or not), this pretty much means we can't add directories to wheels. So
packaging an empty directory is regrettably not possible - and if installer
implementations extract empty directories, this would mean they can be
tricked into creating empty directories that were not included in the
signatures.
So, just ignoring dirs in wheelmaker LGTM.
czw., 29 paź 2020 o 15:22 David Burns <[email protected]> napisał(a):
… I agree that a completely empty directory may be valid to skip within a
wheel; it's the dropping all directories regardless of contents that's my
concern.
This patch only drops if there is --input-file=some/directory but if you
then pass in --input-file=some/directory/some_file.py that file, and the
directory that was dropped originally, will be added.
For an example of where this sort of thing comes up, I often run into this
when dealing with gRPC protoc plugins, where the output of a plugin is
non-deterministic from file names, which requires capturing the output as a
directory rather than a set of files. As an example, we may capture a
folder full of generated .py files that should all be included in the
py_library, wheel etc (the python plugin is actually well behaved, but
this is just an example).
This patch handles this case. If you see this genrule
<https://github.com/SeleniumHQ/selenium/blob/trunk/py/BUILD.bazel#L132-L143>
creates a directory and populates it with .py files and is added to a
py_package via a py_library
<https://github.com/SeleniumHQ/selenium/blob/trunk/py/BUILD.bazel#L37-L55>
.
Without this patch, I can't generate a wheel because somewhere it passes
--input-file=directory/created/by/genrule and then add_file in
wheelmaker.py tries to write the directory as a file.
We can (and this is only for wheels)
- ignore the directory creation but they will be created if there are
files within that directory (this PR)
OR
- Add the potentially empty directory to the wheel and don't care that
there may be a random empty directory
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#336 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKU4J4MEMBS6WHD4RRGJNF3SNF3CPANCNFSM4OV7RAKQ>
.
--
Paweł Stradomski
|
Here's a demo of what I'm getting at, based on this PR's latest commit: aaliddell@7b1f2cd It declares a directory that contains a single python file called
As you can see, the file list contains the generated directory and the generated file, but Prior to the commits in the PR, running the same patch causes the error you mentioned in the initial PR comment, which is at least a more visible failure than silently missing files in the wheel.
I think it would be useful to establish why both the file and the directory are being passed to wheelmaker. That seems like the source of this issue for your use case. If the files are being passed, the directory containing them shouldn't be in the input file list in the first place.
As mentioned above, a genrule that creates a dir without enumerating the contained files is not recommended behaviour, which might be the source of you seeing the overlapping file and dir in the input file list.
The dirs themselves shouldn't appear in the RECORD file, but the contents of these dirs should be enumerated there and copied to the wheel. |
If in a dependency tree a directory is used as out this is passed as input_file to wheelmaker which then generates an unhandled error as it tries to write the directory as a file.
7d696c8
to
1bb9080
I have changed this patch to now walk the directory it is given and add those files. It should do this recursively. I have tested this out on the Selenium Project and it works |
If in a dependency tree a directory is used as out this is passed as
input_file to wheelmaker which then generates an unhandled error as
it tries to write the directory as a file.