I do fun stuff.
Once in a while, it's also useful.

Projects

This is a PyTorch implementation of GPT/GPT-2 from the original papers "Improving Language Understanding by Generative Pre-Training" and "Language Models are Unsupervised Multitask Learners" (Alec Radford et al.). GPT is coded from scratch in "vanilla" PyTorch, without using PyTorch's transformer classes. The model was trained for a single epoch on a subset of The Pile dataset comprising 21.5 billion tokens (the process took about two months on a single 8 GB GPU). Even after one epoch of training, the model is able to generate sensible prompt completions, albeit clearly well below human level. The model achieves a perplexity of 19.35 on the validation set.
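To give a flavor of what "from scratch, without PyTorch's transformer classes" means, here is a minimal sketch of a masked (causal) self-attention block built only from nn.Linear layers. The class name, dimensions, and hyperparameters are illustrative assumptions, not the actual code from the repository.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Masked multi-head self-attention written directly with Linear layers,
    i.e. without nn.MultiheadAttention or nn.TransformerDecoderLayer.
    Names and dimensions are illustrative, not those of the repo."""

    def __init__(self, d_model=768, n_heads=12, max_len=1024):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        # lower-triangular mask so each position attends only to the past
        mask = torch.tril(torch.ones(max_len, max_len)).view(1, 1, max_len, max_len)
        self.register_buffer("mask", mask)

    def forward(self, x):                            # x: (batch, seq, d_model)
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, seq, d_head)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                  # (batch, heads, seq, d_head)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```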



Full article

This is a PyTorch implementation of the Transformer from the original paper "Attention Is All You Need". The Transformer is coded from scratch in "vanilla" PyTorch, without using PyTorch's transformer classes. The model was trained on the UN English-French parallel corpus and can be used to translate formal, official documents from English to French. It achieves a BLEU score of 0.43 on the validation set and produces close-to-human-quality translations on 15 long and convoluted test sentences I thought up myself. With Google Translate's English-to-French output serving as the reference for those 15 sentences, the model achieves a BLEU score of 0.57.
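As one small example of a piece that has to be hand-coded when you avoid the built-in transformer classes, here is a sketch of the fixed sinusoidal positional encoding from "Attention Is All You Need". The hyperparameters below are illustrative assumptions, not the settings used for the UN-corpus model.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sin/cos positional encoding added to token embeddings so the
    model can use word order. Hyperparameters are illustrative only."""

    def __init__(self, d_model=512, max_len=5000, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)          # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)            # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)            # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))             # (1, max_len, d_model)

    def forward(self, x):                                       # x: (batch, seq, d_model)
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)
```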



Full article

This exercise was driven by pure curiosity rather than practical need. I was curious whether I could take a relatively simple CNN, calculate the partial derivatives of the cost function with respect to all learnable parameters analytically by hand, and then program it without any ML framework or autograd library. I did exactly that, trained an image classifier, and got the same accuracy as in TensorFlow. I'm publishing the calculations here in case anybody, for whatever reason, is interested in following along.
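To illustrate the spirit of "backprop by hand, no autograd", here is a minimal sketch for a single fully-connected layer with softmax cross-entropy loss, where every gradient is written out analytically in plain NumPy. The shapes, names, and learning rate are illustrative assumptions, not the actual network or derivation from the article.

```python
import numpy as np

def forward(X, W, b, y):
    """X: (n, d) inputs, W: (d, k) weights, b: (k,) biases, y: (n,) labels."""
    logits = X @ W + b                                   # (n, k)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)         # softmax
    loss = -np.log(probs[np.arange(len(y)), y]).mean()   # cross-entropy
    return loss, probs

def backward(X, probs, y):
    """Analytic gradients: dL/dlogits = (probs - one_hot(y)) / n,
    then dL/dW = X^T @ dL/dlogits and dL/db = column sums of dL/dlogits."""
    n = len(y)
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    return X.T @ dlogits, dlogits.sum(axis=0)

# One plain SGD step on random data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
y = rng.integers(0, 3, size=32)
W = rng.normal(scale=0.1, size=(10, 3))
b = np.zeros(3)
loss, probs = forward(X, W, b, y)
dW, db = backward(X, probs, y)
W -= 0.1 * dW
b -= 0.1 * db
```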



Full article