@codertimo the BERT positional embedding method is simply to learn an embedding for each position. So you can use nn.Embedding with a constant input sequence [0, 1, 2, ..., L-1], where L is the maximum sequence length.
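For illustration, here is a minimal sketch of what that could look like in PyTorch; the module name `LearnedPositionalEmbedding` and the parameters `max_len` / `d_model` are just placeholders for this example, not names from this repo:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """BERT-style learned positional embeddings (illustrative sketch)."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        # One learnable vector per position, trained jointly with the rest of the model.
        self.pos_embedding = nn.Embedding(max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, d_model) token embeddings
        seq_len = x.size(1)
        # Constant position ids [0, 1, ..., seq_len - 1], broadcast over the batch.
        positions = torch.arange(seq_len, device=x.device).unsqueeze(0)  # (1, seq_len)
        return x + self.pos_embedding(positions)
```

This just replaces the fixed sinusoidal encoding of the original Transformer with a trainable lookup table, which is the difference being discussed here.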
@codertimo
Since BERT uses learned positional embeddings, and this is one of the biggest differences between the original Transformer and BERT, I think it is quite urgent to modify the positional embedding part.
The position embedding in BERT is not the same as in the original Transformer. Why not use the form from BERT?