MinMax based observers: respect device affinity for state_dict #44537
Conversation
Summary:

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved. At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug:

* create model A, move it to a device (cpu/cuda), and save its state_dict
* create model B, load its state_dict
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:

* model A is on CPU, the state_dict is saved
* model B is created and moved to GPU, the state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special-case `PerChannelMinMaxObserver` and its children to allow for loading buffers of a different size, which is normal. There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
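A minimal sketch of the special-cased load pattern described above, assuming a simplified stand-in for `PerChannelMinMaxObserver` (the class and buffer names below are illustrative, not the exact PyTorch source): resize the destination buffers to the incoming shape, then let the default `nn.Module` loading code copy the values in place, so the data comes from the state_dict while the device stays the destination module's.

```python
import torch
import torch.nn as nn


class PerChannelStatsObserver(nn.Module):
    """Hypothetical stand-in for a per-channel observer: its min/max buffers
    can have a different number of channels at load time than at construction."""

    def __init__(self):
        super().__init__()
        # per-channel stats start empty; their length depends on the data seen
        self.register_buffer('min_vals', torch.tensor([]))
        self.register_buffer('max_vals', torch.tensor([]))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Resize the local buffers to the saved shape so the default nn.Module
        # copy below succeeds; the in-place copy keeps *this* module's device.
        for name in ('min_vals', 'max_vals'):
            key = prefix + name
            if key in state_dict:
                getattr(self, name).resize_(state_dict[key].shape)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)
```

With this pattern, a state_dict saved from an instance that has seen, say, 8 channels loads cleanly into a freshly constructed instance, even if that instance has already been moved to a different device.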
Summary identical to the above. ghstack-source-id: b3af9e8050e8d5f9d63f212ac81a020fee1215cc Pull Request resolved: #44537
Good catch! Curious as to what was wrong with the original code, though?
The original code did roughly the equivalent of the following (an illustrative sketch, using a `MinMaxObserver` for concreteness):
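```python
import torch
from torch.quantization import MinMaxObserver

# "Model A": calibrate an observer on the CPU and grab its state_dict.
saved = MinMaxObserver()
saved(torch.randn(2, 3))
state_dict = saved.state_dict()

# "Model B": a fresh observer that has already been moved to the GPU
# (running this requires a CUDA build).
loaded = MinMaxObserver().cuda()

# Old behavior, roughly: assign the saved tensor wholesale, replacing the
# buffer. The value keeps model A's (CPU) device even though `loaded` is on GPU.
loaded.min_val = state_dict['min_val']
print(loaded.min_val.device)  # cpu -- mismatched with the rest of the module
```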
and the expected behavior is (continuing the sketch above):
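```python
# What the default nn.Module loading code does, starting from a fresh GPU
# module: copy into the existing buffer in place, so the value comes from the
# state_dict but the device stays the destination module's.
loaded = MinMaxObserver().cuda()
loaded.min_val.resize_(state_dict['min_val'].shape)
loaded.min_val.copy_(state_dict['min_val'])
print(loaded.min_val.device)  # cuda:0 -- matches the rest of the module
```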
…dict" Summary: Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved. At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug: * create model A, move it to a device (cpu/cuda) and save its state_dict * create model B, load its state dict. * `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device * the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices) In practice, the case people would sometimes hit is: * model A is on CPU, state dict is saved * model B is created and moved to GPU, state_dict from model A is loaded * assertions throw when operations are attempted across different devices This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special case `PerChannelMinMaxObserver` and its children to allow for loading buffers or different size, which is normal. There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity. Test Plan: ``` python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D23644493](https://our.internmc.facebook.com/intern/diff/D23644493) [ghstack-poisoned]
Summary: Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved. At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug: * create model A, move it to a device (cpu/cuda) and save its state_dict * create model B, load its state dict. * `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device * the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices) In practice, the case people would sometimes hit is: * model A is on CPU, state dict is saved * model B is created and moved to GPU, state_dict from model A is loaded * assertions throw when operations are attempted across different devices This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special case `PerChannelMinMaxObserver` and its children to allow for loading buffers or different size, which is normal. There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity. Test Plan: ``` python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 306ab0a784f25ed415feaf30e61e4c5680cf9dc6 Pull Request resolved: #44537
This pull request has been merged in 70dfeb4.
Summary: Pull Request resolved: #44537 (summary identical to the above). Imported from OSS. Reviewed By: raghuramank100 Differential Revision: D23644493 fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
Summary:

Ensures that `FakeQuantize` respects device affinity when loading from state_dict, and knows how to resize scale and zero_point values (which is necessary for FQ classes wrapping per-channel observers).

This is the same as #44537, but for `FakeQuantize`.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
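For context, a sketch of the per-channel scenario this covers, using the stock `torch.quantization` classes; the specific quantization parameters below are illustrative, not something this PR prescribes.

```python
import torch
from torch.quantization import FakeQuantize, MovingAveragePerChannelMinMaxObserver

# A FakeQuantize wrapping a per-channel observer starts with size-1
# scale/zero_point buffers; after observing an (N, ...) tensor they grow to
# length N, so loading that state into a fresh instance needs both a resize
# and the destination module's device to be respected.
fq_kwargs = dict(observer=MovingAveragePerChannelMinMaxObserver,
                 quant_min=-128, quant_max=127,
                 dtype=torch.qint8, qscheme=torch.per_channel_symmetric)

fq_a = FakeQuantize(**fq_kwargs)
fq_a(torch.randn(8, 4))           # calibrate: scale/zero_point become length-8
state = fq_a.state_dict()

fq_b = FakeQuantize(**fq_kwargs)  # fresh instance, scale/zero_point still length-1
fq_b.load_state_dict(state)       # works once this change is in; move fq_b to the
                                  # GPU before loading and the buffers stay there
```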
Summary identical to the above. ghstack-source-id: 737eefdab061a9ab923d37c3bbe1c2bae13ef556 Pull Request resolved: #50868
…ate_dict" Summary identical to the above. Differential Revision: [D25991570](https://our.internmc.facebook.com/intern/diff/D25991570) [ghstack-poisoned]
Summary identical to the above. ghstack-source-id: 317bd8b5fb9a0c233f75c7c838afcc66f34064d6 Pull Request resolved: #50868
…50868) Summary: Pull Request resolved: #50868 (summary identical to the above). Imported from OSS. Reviewed By: jerryzh168 Differential Revision: D25991570 fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
Stack from ghstack:
Summary:

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals` attributes of observers were Tensors but not buffers. They had custom state_dict save/load code to ensure their state was saved. At some point, these attributes became buffers, and the custom save/load code remained. This introduced a subtle bug:

* create model A, move it to a device (cpu/cuda), and save its state_dict
* create model B, load its state_dict
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:

* model A is on CPU, the state_dict is saved
* model B is created and moved to GPU, the state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where possible and letting the default `nn.Module` save/load code handle device assignment. We special-case `PerChannelMinMaxObserver` and its children to allow for loading buffers of a different size, which is normal.

There are some followups to also enable this for HistogramObserver and FakeQuantize, which can be done in separate PRs due to higher complexity.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D23644493
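For reference, an end-to-end sketch of the scenario the test plan above exercises (illustrative; it needs a CUDA build to run, and the file name is arbitrary):

```python
import torch
from torch.quantization import MinMaxObserver

# Calibrate an observer on the CPU and save its state: this plays "model A".
obs_a = MinMaxObserver()
obs_a(torch.randn(16, 8))
torch.save(obs_a.state_dict(), 'observer_state.pt')

# Build "model B" and move it to the GPU *before* loading. With this change,
# the loaded min_val/max_val follow obs_b's device rather than obs_a's.
obs_b = MinMaxObserver().cuda()
obs_b.load_state_dict(torch.load('observer_state.pt'))
assert obs_b.min_val.device.type == 'cuda'
assert obs_b.max_val.device.type == 'cuda'
```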