Dear authors,
In the paper, the final attention-supervision loss is described as the average of the cross-entropy losses of the attention weights over the attention heads. However, in HateXplain/Models/bertModels.py (line 57 at 01d7422), it looks to me as if the per-head losses are simply accumulated into the total loss, i.e. summed rather than averaged over the heads (unless I am misreading the code). I am concerned about this detail because summing and averaging differ by a factor of the number of supervised heads, which changes the effective strength of the attention supervision relative to the classification loss. Did I get it right? I would appreciate any clarification on this matter.
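For concreteness, here is a minimal sketch of the two variants I have in mind. The function and argument names (`attention_supervision_loss`, `att_weights`, `average_heads`, ...) are my own placeholders and not taken from the repo; the snippet only illustrates averaging versus summing the per-head cross-entropy losses, not your actual implementation:

```python
import torch

def attention_supervision_loss(att_weights, att_target, att_mask,
                               average_heads=True, eps=1e-8):
    """Cross-entropy between each supervised head's attention distribution
    and the rationale-based target distribution, combined across heads.

    att_weights: (batch, num_heads, seq_len) attention of the supervised heads
    att_target:  (batch, seq_len) target attention built from the rationales
    att_mask:    (batch, seq_len) 1 for real tokens, 0 for padding
    average_heads: True -> average per-head losses (as described in the paper),
                   False -> sum them (what line 57 looks like to me).
    """
    # Normalise the target over non-padded tokens so it is a distribution.
    target = att_target * att_mask
    target = target / (target.sum(dim=-1, keepdim=True) + eps)

    per_head = []
    for h in range(att_weights.size(1)):
        att = att_weights[:, h, :] * att_mask
        att = att / (att.sum(dim=-1, keepdim=True) + eps)
        # Cross-entropy between the two distributions over tokens,
        # averaged over the batch.
        ce = -(target * torch.log(att + eps)).sum(dim=-1).mean()
        per_head.append(ce)

    per_head = torch.stack(per_head)
    return per_head.mean() if average_heads else per_head.sum()
```

With N supervised heads, the summed variant is N times larger than the averaged one, so it effectively multiplies the regularisation coefficient by the number of heads.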
Thank you very much! 😊