Replies: 7 comments
-
Thank you @profPlum for the detailed issue - we simplified the API of the losses in #486. In short: to support arbitrary inputs (and not just regular grids), we allow users to optionally provide custom quadratures. In the absence of a quadrature, we assume a regular grid over a domain of a given measure (by default, now set to 1), which users can override (e.g. 2*pi for Navier-Stokes). This allows the loss to converge to the integral as we refine the discretization. So by default, the loss averages over the spatial dimensions (the sum over grid points is scaled by measure/num_points), and the user can choose whether to reduce over channels and batch via sum or mean, as in PyTorch. We also average over the batch dimension in the trainer (not in the loss directly), which lets us efficiently compute averages over multiple mini-batches in a distributed manner (e.g. for the validation set). With the new changes, the API should be clearer and default to what you would expect. Let us know if you have any other feedback or suggestions!
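For concreteness, a sketch of the intended usage (the parameter names follow the description above, not a verified signature; check the #486 diff for the exact API):

```python
import math
import torch
from neuralop.losses import LpLoss  # import path may differ by version

# measure: total measure of the domain; defaults to 1, i.e. a plain
# spatial average (e.g. 2 * math.pi for Navier-Stokes).
# reductions: how to reduce the remaining channel/batch dims,
# 'sum' or 'mean', as in PyTorch.
loss_fn = LpLoss(d=2, p=2, measure=2 * math.pi, reductions='mean')

y_pred = torch.randn(8, 1, 64, 64)  # (batch, channels, height, width)
y_true = torch.randn(8, 1, 64, 64)
loss = loss_fn(y_pred, y_true)
```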
-
I still think it would be more intuitive if metrics weren't divided by batch size in the Trainer (and if losses defaulted to mean reduction, like torch.nn.L1Loss). If you consider custom loss/metric functions (e.g. lambdas), the warning won't work anymore. And generally, when users define a loss/metric a particular way, they expect it to be reported as-is, without silent modification behind the scenes. P.S. For me the original motivating example was trying to debug a wrapped version of...
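For reference, the PyTorch convention being pointed to: torch.nn.L1Loss defaults to reduction='mean', so the reported value is already comparable across batch sizes:

```python
import torch

# The default reduction is 'mean': the loss is averaged over *all*
# elements, so its scale does not depend on the batch size.
pred, target = torch.randn(4, 10), torch.randn(4, 10)
assert torch.allclose(
    torch.nn.L1Loss()(pred, target),  # default: reduction='mean'
    torch.nn.L1Loss(reduction='sum')(pred, target) / pred.numel(),
)
```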
-
I agree in principle, but then it becomes trickier to accumulate results over multiple devices and mini-batches (e.g. for a large validation set). One option is to instantiate a distributed meter handler that properly accumulates averages and assumes the batch size of the input corresponds to the actual number of samples being averaged over (an assumption that may not always hold, e.g. if a model augments the batch dim by mirroring inputs).
-
I don't understand how that becomes tricky; you can just average all the mini-batch metrics. It is already done in many libraries, e.g. Keras. Also, I don't think it matters if a user does data augmentation: augmented data points can be treated as regular data points. Just sum all the mini-batch metrics, count how many you summed (N), then do sum(metrics)/N, as in the sketch below.
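A minimal sketch of that accumulation (metric_fn and loader are placeholders; see the next reply for the caveat about unequal batch sizes):

```python
# Mean of per-batch means: sum each mini-batch metric, count how many
# were summed, then divide.
metrics = [metric_fn(model(x), y).item() for x, y in loader]
epoch_metric = sum(metrics) / len(metrics)
```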
-
The mean of means is not in general equal to the global mean: we are not guaranteed to have the same mini-batch size on each device/node using DDP, especially during evaluation, where we care mostly about accuracy. We'd need to track the mini-batch size on each device and accumulate these properly through an incremental mean handler, as in the sketch below.
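A minimal single-process sketch of such a handler (a DDP version would additionally all-reduce total and count across ranks):

```python
class IncrementalMean:
    """Batch-size-weighted running mean of a metric."""

    def __init__(self):
        self.total = 0.0  # sum of batch_mean * batch_size over batches
        self.count = 0    # total number of samples seen

    def update(self, batch_mean, batch_size):
        self.total += batch_mean * batch_size
        self.count += batch_size

    def compute(self):
        return self.total / self.count

# With batch sizes 3 and 1, the plain mean-of-means is biased:
meter = IncrementalMean()
meter.update(batch_mean=2.0, batch_size=3)  # samples: 2, 2, 2
meter.update(batch_mean=6.0, batch_size=1)  # samples: 6
assert meter.compute() == 3.0  # global mean; mean-of-means gives 4.0
```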
-
Hmm, that sounds like a strange customization option to support; I'm not sure how many people use it. But even so, I think it's possible to overcome: you could multiply each metric by its batch size, then divide by the total dataset size. That is my current hacky workaround.
Beta Was this translation helpful? Give feedback.
-
I think this makes a better discussion than an issue, since we have a convention for our problems.
-
Hi, I realize you guys are trying to establish your own conventions.
But I think this particular convention of expecting all losses to sum across the batch dim is too surprising for regular PyTorch users... Especially considering that it produces silent errors, which are the worst kind!
Examples where this sum-reduction convention is problematic:
Proposed Solution(s):
Minimal Solution: For the Lp and H1 (etc.) losses, just divide the output by the size of the batch dimension (inside the loss function). This would at least fix problems 1 & 2.
More General Solution: Alternatively, you could just do what PyTorch itself does (e.g. torch.nn.L1Loss and torch.nn.NLLLoss) and adopt the mean-reduction convention for loss functions. After all, the purpose of neural operators is resolution invariance, isn't it? Sum-reduced Lp and H1 (etc.) losses are not resolution invariant, so why not adopt mean reduction? (See the sketch below.)
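To illustrate the resolution-invariance point (a standalone sketch, not the library's actual loss): a sum-reduced discrete Lp norm grows with the number of grid points, while a mean-reduced one converges as the grid is refined:

```python
import math
import torch

def lp_sum(err, p=2):
    # Sum reduction over grid points: grows with resolution.
    return (err.abs() ** p).sum() ** (1 / p)

def lp_mean(err, p=2):
    # Mean reduction: approximates the continuous Lp norm on a unit
    # domain, so it is stable under grid refinement.
    return (err.abs() ** p).mean() ** (1 / p)

f = lambda x: torch.sin(2 * math.pi * x)  # fixed "error" function
for n in (64, 256, 1024):
    err = f(torch.linspace(0, 1, n))
    print(n, lp_sum(err).item(), lp_mean(err).item())
# lp_sum grows like sqrt(n); lp_mean stays ~0.707 at every resolution.
```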