We are seeking a highly experienced and motivated Senior Data Scientist to join our cell-free mRNA (cf-mRNA) data science team. This role is pivotal in driving innovation in secondary and tertiary analyses of cf-RNA sequencing, with a focus on developing predictive and prognostic models in AD research. The successful candidate will play a key role in advancing analysis techniques such as normalization, batch correction, differential gene expression analysis, pathway and enrichment analysis, feature selection/engineering, and model development optimized for generalizable validation.
We are looking for someone with a deep understanding of gene expression analysis, statistics, data science, and machine learning modeling, coupled with a passion for producing rigorous, translational results. This role requires a strong background in first-principles data exploration, excellent communication skills, a collaborative, data-driven mindset, and a commitment to best-practice, clinical-grade implementation.
*This is not a remote position; the role is fully on-site.
Key Responsibilities:
- Lead innovation in secondary and tertiary analysis of cf-RNA sequencing data, focusing on delivering rigorous and reproducible results
- Develop and implement advanced methods for differential gene expression analysis, pathway analysis, and enrichment analysis, optimizing for accuracy and biological insights
- Build, train, test, and validate predictive models, including logistic regression, random forests, and neural networks, and leverage existing RNA-seq large language models (LLMs) for inference and analysis
- Design and build scalable, efficient data analysis pipelines
- Engage in hypothesis-driven research, rigorously testing and validating new methods and models
- Critically evaluate results, ensuring robust models that are applicable in real-world clinical contexts beyond academic publications
- Visualize complex datasets and create compelling narratives to communicate findings to both scientific and executive audiences
- Collaborate with cross-functional teams, contributing to the company’s overall scientific and technical strategy
Qualifications:
- PhD in a quantitative field with a strong focus on biological sciences (e.g., Applied Statistics, Biophysics, Computational Biology)
- Postdoctoral experience is highly desirable
- 5+ years of biotech industry experience with a proven track record of leading successful projects
- Expertise in gene expression data analysis, including count table filtering, normalization strategies, noise quantification, differential expression analysis, and dimensionality reduction
- Strong foundation in statistical principles and their rigorous application, including, but not limited to, hypothesis testing, p-value corrections, Bayesian approaches, bootstrapping, and permutation testing
- Extensive experience in building, training, testing, and validating machine learning and deep learning models, including model selection based on comparative analysis and performance metrics. Proficient in feature set development (selection, engineering, etc.) and skilled in updating and performing inference with RNA-seq-specific large language models (LLMs)
- Ability to innovate both in applying library methods and developing algorithms from scratch
- Experience with common data science infrastructure, including pipelines, clusters, databases, and feature stores. Direct experience with cloud platforms (AWS preferred) for scaling, deploying, and managing data workflows is a strong advantage
- Proficient in Python and Unix/Linux environments; additional proficiency in other languages (e.g., R, Julia, Rust) is a strong plus
- Strong coding skills across the software development lifecycle
- Deep scientific curiosity and a solid grasp of the scientific method, hypothesis testing, and model validation
- Passion for building predictive and prognostic models that perform effectively in real-world applications
- Independent research capabilities, with the ability to drive projects with minimal supervision
- Exceptional data visualization skills and the ability to translate complex datasets into actionable insights
- Excellent communication skills, with the ability to tailor messaging to both technical and executive-level audiences