MinMax based observers: respect device affinity for state_dict #44537

Closed
wants to merge 2 commits

Conversation

vkuzo (Contributor) commented Sep 11, 2020

Stack from ghstack:

  • #44537 MinMax based observers: respect device affinity for state_dict

Summary:

Originally, the `min_val`, `max_val`, `min_vals`, and `max_vals`
attributes of observers were Tensors but not buffers. They had custom
state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:

* create model A, move it to a device (cpu/cuda), and save its state_dict
* create model B, and load model A's state_dict into it
* `min_val|min_vals|max_val|max_vals` would always be loaded onto model A's device, even if the rest of model B was on a different device
* this is inconsistent with how saving/loading across devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:

* model A is on CPU, and its state dict is saved
* model B is created and moved to GPU, and the state_dict from model A is loaded
* assertions throw when operations are attempted across the different devices

This PR fixes the behavior by removing the custom save/load code where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special-case `PerChannelMinMaxObserver` and its
children to allow loading buffers of a different size, which is normal
for per-channel observers.

There are followups to also enable this for `HistogramObserver`
and `FakeQuantize`, which can be done in separate PRs due to their higher
complexity.

Test Plan:

python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D23644493](https://our.internmc.facebook.com/intern/diff/D23644493)
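
To illustrate the expected behavior described above, here is a minimal sketch using a hypothetical toy module (not the real observer classes): with the default `nn.Module` load path, buffer values are copied in place, so a state_dict saved on CPU loads cleanly into a model that lives on GPU.

```
import torch
import torch.nn as nn

class ToyObserver(nn.Module):
    """Hypothetical stand-in for a MinMax-style observer with buffer state."""
    def __init__(self):
        super().__init__()
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        self.min_val.copy_(torch.min(self.min_val, x.min()))
        self.max_val.copy_(torch.max(self.max_val, x.max()))
        return x

# Model A: calibrated and saved on CPU.
model_a = ToyObserver()
model_a(torch.randn(16))
state = model_a.state_dict()

# Model B: created and moved to GPU (falls back to CPU so the snippet runs anywhere),
# then loaded with model A's CPU state_dict.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_b = ToyObserver().to(device)
model_b.load_state_dict(state)

# Because the default load path copies values into the existing buffers,
# the buffers keep model B's device instead of moving back to CPU.
print(model_b.min_val.device, model_b.max_val.device)
```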

vkuzo added a commit that referenced this issue Sep 11, 2020
ghstack-source-id: b3af9e8050e8d5f9d63f212ac81a020fee1215cc
Pull Request resolved: #44537
dr-ci bot commented Sep 11, 2020

💊 CI failures summary and remediations

As of commit f1c41b7 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 11 19:40:52 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 11 19:40:52 At: 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 11 19:40:52  
Sep 11 19:40:52 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 11 19:40:52  
Sep 11 19:40:52 At: 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 11 19:40:52  
Sep 11 19:40:52 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 11 19:40:52  
Sep 11 19:40:52 At: 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(93): serialize 
Sep 11 19:40:52   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(145): serialize 
Sep 11 19:40:52  
Sep 11 19:40:52 [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 11 19:40:52 [W tensorpipe_agent.cpp:576] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 11 19:40:52 ok (1.737s) 
Sep 11 19:40:54   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 11 19:40:54 [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 


raghuramank100 (Contributor) left a comment

Good catch! Curious as to what was wrong with the original code, though?

vkuzo (Contributor, Author) commented Sep 11, 2020

> Good catch! Curious as to what was wrong with the original code, though?

The original code did the equivalent of

# this makes self.min_val have the device of state_dict['min_val']
self.min_val = state_dict['min_val']

the expected behavior is

# this preserves the device of `self.min_val`
self.min_val.copy_(state_dict['min_val'])
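
As a standalone illustration of that difference (generic tensors, not the observer code itself): rebinding the attribute makes it follow the incoming tensor's device, while `copy_` writes into the existing tensor and keeps its device.

```
import torch

# A "buffer" living on GPU if available (falls back to CPU so the snippet runs anywhere).
device = "cuda" if torch.cuda.is_available() else "cpu"
buf = torch.zeros(1, device=device)

# A value coming from a state_dict that was saved on CPU.
incoming = torch.tensor([3.0])  # cpu

# Rebinding the name: `buf` now *is* the CPU tensor, so its device follows the source.
buf = incoming
print(buf.device)  # cpu

# In-place copy: the original tensor keeps its device; only the value changes.
buf2 = torch.zeros(1, device=device)
buf2.copy_(incoming)
print(buf2.device, buf2.item())  # <device> 3.0
```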

vkuzo added a commit that referenced this issue Sep 11, 2020
ghstack-source-id: 306ab0a784f25ed415feaf30e61e4c5680cf9dc6
Pull Request resolved: #44537
facebook-github-bot (Contributor) commented Sep 12, 2020

This pull request has been merged in 70dfeb4.

facebook-github-bot deleted the gh/vkuzo/144/head branch Sep 15, 2020
xuzhao9 added a commit that referenced this issue Sep 18, 2020
Summary:
Pull Request resolved: #44537


Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
vkuzo added a commit that referenced this issue Jan 21, 2021
Summary:

Ensures that `FakeQuantize` respects device affinity when loading from
state_dict, and knows how to resize scale and zero_point values
(which is necessary for FQ classes wrapping per channel observers).

This is the same as #44537, but for `FakeQuantize`.

Test Plan:

```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
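
The follow-up above needs to resize `scale` and `zero_point` before copying, since the number of channels is only known at load time. A rough, hypothetical sketch of that resize-then-default-copy pattern (illustrative class and buffer names, not the actual `FakeQuantize` implementation):

```
import torch
import torch.nn as nn

class PerChannelStub(nn.Module):
    """Illustrative module whose per-channel buffers may change size on load."""
    def __init__(self):
        super().__init__()
        # Start empty: the real channel count is only known after calibration.
        self.register_buffer("scale", torch.tensor([]))
        self.register_buffer("zero_point", torch.tensor([], dtype=torch.long))

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Resize local buffers to the incoming shapes, then let the default
        # nn.Module machinery copy values in place (preserving this module's device).
        for name in ("scale", "zero_point"):
            key = prefix + name
            if key in state_dict:
                getattr(self, name).resize_(state_dict[key].shape)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)

# Usage: a stub calibrated elsewhere saved 4 per-channel values; a fresh,
# still-empty stub can load them without a size-mismatch error.
saved = {"scale": torch.rand(4), "zero_point": torch.zeros(4, dtype=torch.long)}
fresh = PerChannelStub()
fresh.load_state_dict(saved)
print(fresh.scale.shape)  # torch.Size([4])
```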
vkuzo added a commit that referenced this issue Jan 21, 2021
ghstack-source-id: 737eefdab061a9ab923d37c3bbe1c2bae13ef556
Pull Request resolved: #50868
vkuzo added a commit that referenced this issue Jan 25, 2021
Differential Revision: [D25991570](https://our.internmc.facebook.com/intern/diff/D25991570)

[ghstack-poisoned]
vkuzo added a commit that referenced this issue Jan 25, 2021
ghstack-source-id: 317bd8b5fb9a0c233f75c7c838afcc66f34064d6
Pull Request resolved: #50868
facebook-github-bot added a commit that referenced this issue Jan 25, 2021
Summary:
Pull Request resolved: #50868


Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25991570

fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1