Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upConvert num_kernels to int64 before calling into CUDA GET_BLOCKS #44472
Comments
@walterddr in the python 3.x the size of int is not defined, it would work as well for 64 bit. |
@dipanshug124 You can look at #43476 (comment) for more context. |
Hey, I'm looking at fixing this issue, but as someone who doesn't have a lot of experience with the size of datasets used for this, would a u32 suffice, or does it have to be i64? I'd like to avoid using the full i64 if possible, especially for something like CUDA kernels, which can't have a negative value. |
I don't think u32 is enough (or will soon be not enough). Not sure whether negative values (say, -1) are used for some purposes, else you could use u64. |
i64 in this case won't get passed to cuda kernels, it is used to compute number of blocks without overflow. |
Yeah I mentioned the negative value aspect of it because it's specifically to compute the number of kernels needed, and you'll never have negative values for the kernels so might as well use uints instead of ints to increase the capacity for almost no cost. And with respect to the extra space, I try to avoid using unnecessary space no matter which part of the code I'm working on. I'll get to work making the changes, but it'll take me a bit to make the changes in all the places I need to. Thanks for your input. |
@Fowley-P sorry for the late reply. this issue is to address the legacy int32 computation to prevent using overflowed value to compute # of blocks needed. so (1) doesnt matter whether it is unsign/sign since it cannot be more than MAX_INT32 # of blocks. it will error out way sooner. (2) dont worry about covering every single instance. please feel free to publish a PR when you at least address the 4 examples above. --Rong |
…BLOCKS" this addresses the example int64 issue in #44472. Differential Revision: [D23699819](https://our.internmc.facebook.com/intern/diff/D23699819) [ghstack-poisoned]
only partially closed via #44688 |
Follow up on #43476.
Scanning CUDA KernelUtils.GET_BLOCKS might be called with overflowed 32bit integers. should convert to int64 to hold num_kernels
Additional context
See examples such as:
pytorch/aten/src/ATen/native/cuda/vol2col.cuh
Line 107 in c010ef7
pytorch/aten/src/THCUNN/generic/SpatialDepthwiseConvolution.cu
Line 71 in c010ef7
pytorch/aten/src/THCUNN/generic/SpatialDepthwiseConvolution.cu
Line 154 in c010ef7
pytorch/aten/src/ATen/native/cuda/MaxUnpooling.cu
Line 431 in c010ef7
cc @ngimel