
PyTorch • Multi30k • Full encoder–decoder architecture
Implemented a full Transformer model from scratch in PyTorch, including multi-head attention, positional encodings, custom tokenizers, and a training loop on the Multi30k dataset. Focused on reproducing the architectural details of "Attention Is All You Need" and understanding every component end-to-end.
Transformer Project GitHub Link
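
Below is a minimal, illustrative sketch of two of the components mentioned above, sinusoidal positional encoding and multi-head attention, following the formulation in the paper. Class names and hyperparameters here are placeholders for illustration, not taken from the project's repository.

```python
# Illustrative sketch only; names and defaults are assumptions, not the project's code.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]


class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention computed over several heads in parallel."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        b = query.size(0)
        # Project, then split into heads: (batch, heads, seq_len, d_head)
        q = self.w_q(query).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        # Attention(Q, K, V) = softmax(QK^T / sqrt(d_head)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)

        # Merge heads back to (batch, seq_len, d_model) and apply output projection
        out = (attn @ v).transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_head)
        return self.w_o(out)
```

In the full encoder–decoder model these blocks are stacked with residual connections, layer normalization, and position-wise feed-forward layers, as described in the paper.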