
Daniil Litvinov

Bioinformatician | Data scientist

About

I am a computational biologist with commercial experience in Python programming and machine learning.

My industry experience includes applying machine learning and statistics to omics data analysis, and computer vision methods to cryo-EM data analysis. I have also taught statistics and machine learning to biology and medical students.

My primary goal is to continue to develop my skills in the field of AI/ML/DL applied to biology.

Experience

Mid-level bioinformatician

  • Developed libraries for bulk RNA-seq data deconvolution and tumor profiling
  • Created a custom cross-validation technique and several feature selection approaches, e.g., a genetic algorithm with regularization on the number of features
  • Designed the full model life cycle (feature selection, hyperparameter selection, and validation) as a pipeline running in Kubernetes (K8s)
June 2022 - Present
Yerevan, Armenia
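Genetic-algorithm feature selection with a penalty on subset size can be sketched as follows. This is a minimal illustration, not the actual library code: it assumes a 0/1 mask encoding, truncation (elitist) selection, single-point crossover, bit-flip mutation, and a linear penalty per selected feature. The function name, parameters, and toy scoring function are hypothetical; in practice `score_fn` would return a cross-validated model score.

```python
import random
from typing import Callable, List, Sequence


def ga_feature_selection(
    n_features: int,
    score_fn: Callable[[Sequence[int]], float],
    penalty: float = 0.01,
    pop_size: int = 30,
    n_generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Hypothetical sketch: return the 0/1 mask maximizing score_fn(mask) - penalty * n_selected."""
    rng = random.Random(seed)

    def fitness(mask: Sequence[int]) -> float:
        # The penalty term regularizes the number of selected features.
        return score_fn(mask) - penalty * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the best half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    informative = {0, 1, 2}  # toy ground truth: only these features carry signal

    def toy_score(mask: Sequence[int]) -> float:
        return float(sum(mask[i] for i in informative))

    print(ga_feature_selection(10, toy_score, penalty=0.1))
```

Because the best half of each generation survives unchanged, the best fitness never decreases; the penalty weight controls how strongly smaller subsets are preferred over marginal score gains.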

Junior bioinformatician

  • Participated in the development of a platform for omics data analysis (single-cell RNA-seq, CITE-seq, bulk RNA-seq)
  • Created a pipeline for semi-automatic cell type annotation
  • Researched statistical and ML approaches for detecting aneuploid cells from transcriptomic data
February 2021 - May 2022
Moscow, Russia

Laboratory assistant

  • Created a CLI tool and a web application for estimating the resolution of cryo-EM maps using deep learning
  • Performed structural studies of various viruses using cryo-EM and molecular modeling
  • Built a full atomic model of the new bacteriophage TaPaz
October 2020 - May 2022
Moscow, Russia

Laboratory assistant

  • Performed molecular identification and phylogenetic analysis of a new strain of carotenogenic microalgae
  • Selected cultivation conditions and analyzed the carotenoid composition of the studied strain
  • Performed metagenomic analysis of microbial community composition based on 16S rRNA data
June 2020 - May 2020
Moscow, Russia

Projects

Below are some of my projects that have a visual component, either a CLI or a presentation.
LoreNNMap

September 2022

Created a CLI tool and a web application for estimating the resolution of cryo-EM maps using deep learning.

This tool is based on the 3D U-Net architecture. U-Net is commonly used for image segmentation, which is essentially per-pixel classification; in this case, I used the model for regression instead.
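To illustrate what "U-Net for regression" means, here is a minimal PyTorch sketch, not LoreNNMap's actual model: a one-level 3D U-Net whose final 1×1×1 convolution outputs a single unbounded value per voxel, trained with mean squared error rather than a classification loss. The class name, channel counts, box size, and the per-voxel resolution target are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class TinyUNet3D(nn.Module):
    """Hypothetical one-level 3D U-Net predicting one continuous value per voxel."""

    def __init__(self, in_ch: int = 1, base: int = 8):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.pool = nn.MaxPool3d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec = conv_block(base * 2, base)  # skip connection doubles the channels
        # Regression head: one output channel, no sigmoid/softmax.
        self.head = nn.Conv3d(base, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc(x)
        x = self.bottleneck(self.pool(skip))
        x = self.up(x)
        x = self.dec(torch.cat([x, skip], dim=1))
        return self.head(x)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyUNet3D()
    density = torch.randn(2, 1, 16, 16, 16)      # synthetic map boxes
    target = torch.rand(2, 1, 16, 16, 16) * 10   # hypothetical per-voxel resolution in Å
    pred = model(density)
    loss = F.mse_loss(pred, target)  # regression loss instead of cross-entropy
    loss.backward()
    print(pred.shape, float(loss))
```

The only changes from a segmentation U-Net are the single-channel linear output and the regression loss; the encoder/decoder with skip connections is unchanged.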

Python Django PyTorch Keras EMAN2 RELION-3 scikit-learn Bash
Portfolio website

October 2022

This is a personal portfolio website built with Python and Django.

The website uses Wagtail, a content management system (CMS) built on Django.

All content (personal information, portfolio projects, social media links, etc.) can be edited in the Wagtail admin interface.

Python Docker JavaScript CSS HTML Django Wagtail SQL
scRNA-seq data integration

June 2021

The goal of this project is to identify the best approaches for a complex analysis workflow. Single-cell transcriptomic analysis involves multiple steps; we focused on data integration, a crucial step when working with clinical data from patients.

Python R scikit-learn Scanpy BBKNN MNN Scanorama Cell Ranger Bash
Sky runners

February 2021

This project studies differential gene expression in 19 athletes under physical and psychological stress before and after running in extreme highland conditions (2450-3450 m, Mount Elbrus), and at the "start" point before arrival at the competition (St. Petersburg).

Python R scikit-learn DESeq2 FastQC Bash STAR RSEM MSigDB GeneQuery
Recommender system

October 2022

A content-based recommender system API that uses post text and user data.

Developed models based on text features obtained with TF-IDF, BERT, RoBERTa, and DistilBERT.

Created an A/B testing system to compare models using the hit rate metric.
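Hit rate@k counts the share of users for whom at least one of the top-k recommended posts was actually liked. The sketch below shows that metric together with a deterministic hash-based 50/50 user split, a common way to assign users to A/B groups. It assumes per-user dictionaries of ranked recommendations and liked posts; all names, the salt, and the toy data are hypothetical, not the project's actual code.

```python
import hashlib
from typing import Dict, List, Set


def assign_group(user_id: str, salt: str = "exp_1") -> str:
    """Deterministically split users 50/50 by hashing the id with a (hypothetical) experiment salt."""
    bucket = int(hashlib.md5(f"{user_id}{salt}".encode()).hexdigest(), 16) % 100
    return "control" if bucket < 50 else "test"


def hit_rate_at_k(recs: Dict[str, List[str]], liked: Dict[str, Set[str]], k: int = 5) -> float:
    """Share of users with at least one liked post among their top-k recommendations."""
    users = [u for u in recs if u in liked]
    if not users:
        return 0.0
    hits = sum(1 for u in users if set(recs[u][:k]) & liked[u])
    return hits / len(users)


if __name__ == "__main__":
    liked = {"u1": {"p3"}, "u2": {"p9"}, "u3": {"p1", "p2"}}
    recs = {"u1": ["p1", "p2", "p3"], "u2": ["p4", "p5", "p6"], "u3": ["p7", "p8", "p1"]}
    print(hit_rate_at_k(recs, liked, k=3))  # u1 and u3 hit, u2 misses: 2/3
    print({u: assign_group(u) for u in recs})
```

In an A/B test, each model serves only the users in its group, and the two groups' hit rates are then compared; salting the hash lets each experiment reshuffle users independently.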

Python Docker SQL PyTorch scikit-learn CatBoost FastAPI Optuna

Skills

PYTHON
PYTORCH
LINUX
R

Programming

  • Python (NumPy, pandas, Matplotlib, Seaborn, scikit-learn, PyTorch, Keras, FastAPI, Django)

  • R (ggplot2, Seurat, DESeq2, dplyr)

  • Linux, Bash, Git, Docker, Kubernetes

  • JavaScript

Machine Learning Methods

  • Classical machine learning (linear models, tree-based approaches, CatBoost, LightGBM, XGBoost, Bayesian methods)

  • Deep learning (MLP, CNN, image segmentation, detection, RNN, LSTM, Transformers, AE, VAE, GAN, TabNet)

  • Model tuning (Optuna, genetic algorithm, Boruta)

  • Interpretable machine learning (SHAP, LIME, Pixel Attribution)

Bioinformatics

  • Databases (NCBI, UniProt, PDB, MSigDB, SILVA)

  • Command-line tools (Cell Ranger, cellSNP, Picard, BLAST, GATK, STAR, SPAdes)

  • Protein sequence analysis tools (MAFFT, MUSCLE, HMMER, ESM)

  • Protein structure analysis tools (Rosetta, Phenix, Coot, AlphaFold)

Languages

  • Russian – Native

  • English – Full professional proficiency

  • German – Elementary proficiency

Life Sciences

  • A background in biology, which helps me understand specialized biological and medical literature

  • Work experience in molecular, microbiological, and biochemical labs

Soft skills

  • Agile software development methods

  • Presentation skills

Education

Structural biology, Master of Science

September 2020 - May 2022
Moscow, Russia

Bioinformatics for Biologists, Retraining program

September 2020 - May 2021
St. Petersburg, Russia

Biology, Bachelor of Science

September 2016 - May 2020
Moscow, Russia

Publications

Below are publications I have co-authored.

Structural characterization of propiolactone inactivated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) particles

2022

Here, we characterized the β-propiolactone inactivated SARS-CoV-2 virions using transmission electron microscopy (TEM) and atomic force microscopy (AFM).

TEM showed that the spike (S) proteins were in the pre-fusion conformation. Notably, the S proteins could be recognized by specific monoclonal antibodies. Analytical TEM showed that the inactivated virions retained nucleic acid. Altogether, we demonstrated that the inactivated SARS-CoV-2 virions retain the structural features of native viruses and provide a prospective vaccine candidate.


Combined Production of Astaxanthin and Carotene in a New Strain of the Microalga Bracteacoccus aggregatus BM5/15 (IPPAS C-2045)

2021

Possible pathways of astaxanthin synthesis are proposed based on the carotenoid composition obtained in this work.

Collectively, the new strain B. aggregatus BM5/15 is a potential biotechnological source of two natural antioxidants, astaxanthin and β-carotene. The results provide a basis for further work on optimizing B. aggregatus cultivation on an industrial scale.


Molecular Modeling of the HR2 and Transmembrane Domains of the SARS-CoV-2 S Protein in the Prefusion State

2021

The paper reports molecular modeling of the S protein fragment corresponding to its coiled-coil HR2 domain and fully palmitoylated transmembrane domain.

Model stability in the lipid bilayer was confirmed by all-atom and coarse-grained molecular dynamics simulations. It has been demonstrated that palmitoylation leads to a significant decrease in transmembrane domain mobility and local bilayer thickening, which may be relevant for protein trimerization.


Structure of A. Baumannii Phage Tapaz, Revealed with Cryo-Electron Microscopy

2021

The cryo-EM map was reconstructed with single particle analysis independently for the capsid, tail, and baseplate regions.

The capsid was reconstructed at 3.9 Å resolution with I3 symmetry applied. The baseplate region of the phage was reconstructed at 3.5 Å resolution with C3 symmetry. The tail region was reconstructed at 2.6 Å resolution with helical symmetry (rise 36.4 Å, twist 25.7°).
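The helical parameters can be read as a symmetry operator: each successive asymmetric unit is rotated by the twist about the helix axis and translated by the rise along it. The short worked sketch below uses the reported tail values to derive the number of units per 360° turn and the corresponding pitch of that helical lattice; the function name and the unit radius are illustrative assumptions, not part of the reconstruction.

```python
import math
from typing import List, Tuple


def helical_positions(rise: float, twist_deg: float, n_units: int,
                      radius: float = 1.0) -> List[Tuple[float, float, float]]:
    """Place units on a helix: unit n is rotated by n*twist about z and shifted by n*rise along z."""
    positions = []
    for n in range(n_units):
        phi = math.radians(n * twist_deg)
        positions.append((radius * math.cos(phi), radius * math.sin(phi), n * rise))
    return positions


rise, twist = 36.4, 25.7            # reported tail parameters (Å, degrees)
units_per_turn = 360.0 / twist      # asymmetric units per full 360° turn
pitch = rise * units_per_turn       # axial distance covered by one full turn (Å)
print(f"{units_per_turn:.2f} units/turn, pitch {pitch:.1f} Å")  # 14.01 units/turn, pitch 509.9 Å
```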

The initial atomic model for the tail region was built from the sequence with DeepTracer and then refined in Coot.


Diversity of Carotenogenic Microalgae in the White Sea Polar Region

2020

We isolated several strains of carotenogenic microalgae from the coastal zone of the White Sea, where they were abundant.

The isolated microalgae belonged to four species of Chlorophyta: Haematococcus lacustris, H. rubicundus, Coelastrella aeroterrestrica, and Bracteacoccus aggregatus. The last three species were reported from polar latitudes for the first time.

Most likely, carotenogenic algae on the White Sea coast are abundant due to their high physiological and metabolic plasticity, which is essential for surviving under adverse conditions of the northern regions.

Contact Me