This is a PyTorch implementation of GPT/GPT-2 from the original papers "Improving Language Understanding by Generative Pre-Training" and "Language Models are Unsupervised Multitask Learners" (Alec Radford et al.). GPT is coded from scratch in "vanilla" PyTorch, without using PyTorch's built-in transformer classes. The model was trained on a subset of The Pile dataset comprising 21.5 billion tokens for a single epoch (the process took about two months on one 8 GB GPU). Even after one epoch of training, the model shows an ability, albeit clearly well below human level, to generate sensible prompt completions. The model achieves a perplexity of 19.35 on the validation set.
Full article
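
For illustration, below is a minimal sketch of what a "from scratch" GPT-style decoder block looks like in vanilla PyTorch, i.e. built only from `nn.Linear`, `nn.LayerNorm`, and a causal mask rather than `nn.TransformerDecoder`. Module names and hyperparameter defaults here are illustrative placeholders, not taken from this repository's code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask, built from nn.Linear only.
    Hyperparameter defaults are illustrative, not this repo's actual config."""
    def __init__(self, n_embd=768, n_head=12, block_size=1024, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.dropout = nn.Dropout(dropout)
        # lower-triangular mask prevents attending to future tokens
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))  # scaled dot-product
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)  # merge heads
        return self.proj(y)

class Block(nn.Module):
    """One GPT-2 style decoder block: pre-LayerNorm, attention, MLP, residuals."""
    def __init__(self, n_embd=768, n_head=12, block_size=1024, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size, dropout)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```

The full model stacks a number of such blocks on top of token and positional embeddings, followed by a final LayerNorm and a linear head tied to the vocabulary; see the repository code for the actual implementation details.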