I do fun stuff.
Once in a while, it's also useful.
Who I am

Machine Learning Engineer and Entrepreneur

I received my PhD in Theoretical Physics (impurities in Luttinger liquids, decoherence in qubits) from the University of Birmingham, UK, in 2005. I then worked briefly as a postdoctoral researcher at the University of Birmingham (decoherence and relaxation in qubits) and at Lancaster University, UK (photoemission in graphene).



In 2007 I left academia and co-founded a dating start-up, ZeSecret, in Santa Clara, California. I screwed this one up big time, and in 2010 it had to be closed. Between 2011 and 2017 I ran two online projects in Russia: the first, in banking, was successful to some degree; the other, in healthcare, failed.



In 2017 I moved to Canada and switched from web development to Machine Learning, teaching myself through online courses, personal projects, and contract work. My ultimate goal is to found a Machine Learning start-up that would grow to a decent size. Or bigger. I'm currently looking for ML start-up ideas while continuing to educate myself. My main focus is on transformers in their various forms.



More about me
Learning Projects — Featured

This is a PyTorch implementation of GPT/GPT-2 from the original papers "Improving Language Understanding by Generative Pre-Training" and "Language Models are Unsupervised Multitask Learners" (Alec Radford et al.). GPT is coded from scratch in "vanilla" PyTorch, without using PyTorch's transformer classes. The model was trained for a single epoch on a 21.5-billion-token portion of The Pile dataset (the process took about two months on one 8 GB GPU). Even after one epoch of training, the model exhibits the ability (albeit clearly well below human level) to generate sensible prompt completions. The model achieves a perplexity of 19.35 on the validation set.
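For readers unfamiliar with the metric: perplexity is just the exponential of the mean per-token cross-entropy (negative log-likelihood). A minimal illustrative sketch in plain Python — not the project's code, and the token probabilities below are made up:

```python
import math

def perplexity(token_log_probs):
    # Perplexity = exp of the mean negative log-probability per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical example: if the model assigns every token probability 0.05,
# perplexity is exactly 1 / 0.05 = 20.
log_probs = [math.log(0.05)] * 4
print(perplexity(log_probs))  # → 20.0 (up to floating-point error)
```

Intuitively, a perplexity of ~19 means the model is, on average, about as uncertain as if it were choosing uniformly among 19 tokens at each step.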



Full article

This is a PyTorch implementation of the Transformer from the original paper "Attention Is All You Need". The Transformer is coded from scratch in "vanilla" PyTorch, without using PyTorch's transformer classes. The model was trained on the UN English-French parallel corpus and can be used to translate formal, official documents from English to French. The model achieves a BLEU score of 0.43 on the validation set and exhibits close-to-human translation quality on 15 long and convoluted test sentences I thought up myself. With Google Translate's English-to-French output serving as the reference for those 15 sentences, the model achieves a BLEU score of 0.57.
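The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The project itself uses PyTorch tensors; purely as an illustration of the formula, here is a dependency-free sketch over lists of vectors:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    # Q, K, V are lists of vectors, one per sequence position.
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a query that strongly matches the first key, the output is essentially the first value vector; in the real model this runs batched over matrices, with multiple heads and learned projections.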



Full article
Let's Talk

If you are running an ML start-up, or are thinking about founding one, I would love to talk to you.



More about that