Convert num_kernels to int64 before calling into CUDA GET_BLOCKS #44472

Open
walterddr opened this issue Sep 10, 2020 · 8 comments · May be fixed by #44789

Comments

@walterddr (Contributor) commented Sep 10, 2020

🐛 Bug

Follow up on #43476.
Scanning the code shows that CUDA KernelUtils' GET_BLOCKS may be called with overflowed 32-bit integers; num_kernels should be converted to int64 before the call.

Additional context

See examples such as:

int num_kernels = channels * depth_col * height_col * width_col;

int n = THCTensor_(nElement)(state, output);

int n = THCTensor_(nElement)(state, gradInput);

int count = self.numel();
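
To make the overflow concrete, here is a minimal standalone sketch of the pattern in the examples above (the dimension values are hypothetical, chosen only so the product just exceeds INT32_MAX):

```cpp
// Illustrative only: shows how the int product in the examples above exceeds
// INT32_MAX for plausible sizes, and how widening to int64_t before the
// multiplication keeps the value intact.
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical sizes chosen so the product is exactly 2^31, one past INT32_MAX.
  int channels = 64, depth_col = 128, height_col = 512, width_col = 512;

  // Proposed pattern: widen the first factor so the whole product is computed
  // in 64-bit arithmetic. Doing `channels * depth_col * height_col * width_col`
  // directly in int, as the current code does, overflows here.
  int64_t num_kernels =
      static_cast<int64_t>(channels) * depth_col * height_col * width_col;

  std::printf("num_kernels = %lld (INT32_MAX = %d)\n",
              static_cast<long long>(num_kernels), INT32_MAX);
  return 0;
}
```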

cc @ngimel

@dipanshug124 commented Sep 11, 2020

@walterddr In Python 3.x the size of int is not fixed, so it would work for 64-bit values as well.
I'm not following the issue; can you explain it with an example?

@kshitij12345 (Contributor) commented Sep 11, 2020

@dipanshug124 You can look at #43476 (comment) for more context.

@Fowley-P commented Sep 12, 2020

Hey, I'm looking at fixing this issue, but as someone without much experience with the data sizes involved here, would a u32 suffice, or does it have to be i64? I'd like to avoid a full i64 if possible, especially for something like CUDA kernel counts, which can never be negative.

@Randl (Contributor) commented Sep 13, 2020

I don't think u32 is enough (or it soon won't be). I'm not sure whether negative values (say, -1) are used for some purpose; otherwise you could use u64.

@ngimel (Contributor) commented Sep 13, 2020

The i64 in this case won't be passed to the CUDA kernels; it is only used to compute the number of blocks without overflow.
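
A minimal sketch of that split, assuming a GET_BLOCKS-style helper with a fixed threads-per-block constant (the names and the 1024 figure here are illustrative, not the exact PyTorch definitions): the 64-bit element count is consumed only to size the grid, and what comes out is an ordinary int block count.

```cpp
// Illustrative sketch, not PyTorch's actual GET_BLOCKS: the element count is
// taken as int64_t so the rounding-up division cannot overflow, but the value
// returned (and ultimately handed to the kernel launch) is a plain int.
#include <cstdint>
#include <cassert>

constexpr int kNumThreads = 1024;  // assumed threads per block

inline int get_blocks(int64_t num_kernels) {
  assert(num_kernels > 0);
  int64_t blocks = (num_kernels + kNumThreads - 1) / kNumThreads;  // 64-bit math
  // CUDA grid dimensions are 32-bit, so the block count must still fit in int.
  assert(blocks <= INT32_MAX);
  return static_cast<int>(blocks);
}

int main() {
  int64_t n = int64_t{3} * 1024 * 1024 * 1024;  // ~3.2e9 elements, > INT32_MAX
  return get_blocks(n) == 3145728 ? 0 : 1;      // the block count itself is small
}
```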

@Fowley-P commented Sep 14, 2020

Yeah, I mentioned the negative-value aspect because this value is specifically used to compute the number of kernels needed, and that can never be negative, so we might as well use uints instead of ints to get the extra range at almost no cost. As for the extra space, I try to avoid using unnecessary space no matter which part of the code I'm working on. I'll get started on the changes, but it will take me a while to cover all the places that need them. Thanks for your input.

@walterddr (Contributor, Author) commented Sep 14, 2020

@Fowley-P Sorry for the late reply. This issue is about the legacy int32 computation: the goal is to prevent an overflowed value from being used to compute the number of blocks needed. So (1) signed vs. unsigned doesn't matter, since there can never be more than MAX_INT32 blocks; it will error out well before that. And (2) don't worry about covering every single instance; please feel free to publish a PR once you've addressed at least the 4 examples above. --Rong
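
A back-of-the-envelope check of that bound, assuming 1024 threads per block (an assumption, not a figure quoted from the patch): a grid of MAX_INT32 blocks would already cover roughly 2.2e12 elements, far more than these kernels are handed, so the block count itself fits in a signed int regardless of signedness.

```cpp
// Illustrative arithmetic only: how many elements INT32_MAX blocks would cover
// at an assumed 1024 threads per block.
#include <cstdint>
#include <cstdio>

int main() {
  constexpr int kNumThreads = 1024;
  int64_t max_elements = static_cast<int64_t>(INT32_MAX) * kNumThreads;
  std::printf("elements covered by INT32_MAX blocks: %lld\n",
              static_cast<long long>(max_elements));  // 2199023254528
  return 0;
}
```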

soulitzer added a commit that referenced this issue Sep 15, 2020
…BLOCKS"

this addresses the example int64 issue in #44472.


Differential Revision: [D23699819](https://our.internmc.facebook.com/intern/diff/D23699819)

[ghstack-poisoned]
@walterddr (Contributor, Author) commented Sep 15, 2020

Only partially closed via #44688.

@walterddr walterddr reopened this Sep 15, 2020
@walterddr walterddr linked a pull request that will close this issue Sep 16, 2020