Data Scientist

Greenfield Source • Full-time • Cambridge, MA, US • $100k - $120k / year • 1m ago

In this entry-level role you will work on-site at least 3-4 days per week, providing data engineering and software development to interdisciplinary projects ranging from basic research & discovery to clinical sample processing and iterative biological analysis. Motivated by the chance to help decipher fundamental biological processes that directly impact patient outcomes, the ideal candidate will exude a deep passion for science and through this will:

• integrate, co-develop, and/or maintain analysis pipelines, software, databases and/or UIs
• embrace collecting, annotating & shepherding access to internal & external data
• prioritize data standardization & accuracy towards the extraction of actionable scientific insight
• continuously optimize scalability, software & data quality, and internal SOPs
• help to harden, benchmark, and deploy cutting-edge bioinformatic methods
• interface and iterate with stakeholders on data collection and requirements gathering
• identify & fill gaps in data, software, and documentation
• demonstrate strong drive, flexibility, resilience, and a positive attitude in the face of challenges

This is an excellent opportunity for anyone considering graduate or medical school to learn about different careers, gain valuable experience and advice from a wide range of biomedical researchers, and develop a positive reputation across faculty at many of the top cancer centers in the country.

The Ideal Candidate Would Possess Most of These Experiences & Skills

• B.S. or M.S. in Bioinformatics, CS, or Math; biologists with strong coding skills may be considered
• 0-2 years of experience in a related capacity
• Coding proficiency in Python and/or R, UNIX shell
• AWS and/or Google Cloud infrastructure
• Proficiency in SQL databases (Postgres preferred)
• Effective communicator with very strong oral & written skills and attention to detail
• Highly motivated thinker who wants to put their own stamp on projects & responsibilities
• Capable of working from incomplete information without micromanagement
• Able to systematically prioritize deliverables across multiple projects
• Proficiency in using APIs to drive systems to collect data, process, and compute upon it
• Experience with public datasets used in biomedical research
• Portals for data visualization & dashboarding with Jupyter, pandas, Streamlit, R/Shiny

Proficiency In or Exposure to the Following Would Be Strong Pluses

• University, clinical, biotech, pharma settings
• Nextflow, WDL, CWL workflow orchestration languages
• Seven Bridges, Terra, Synapse, Code Ocean, Foundry data management & analysis systems
• NoSQL, HTML, JavaScript, and Java tools / programming languages
• Integration of external software tools, data repositories or research publications into concrete deliverables aligned with organizational goals
• Performing root-cause failure analysis on data & processes to answer specific research questions or identify opportunities for improvement
• Storage, pipelined analysis, and interpretation of large-scale:
  • DNA and RNA sequence data (bulk and single cell)
  • spatial profiling and pathology image data
  • clinical data elements
• Multi-omic bioinformatics, statistics, data cleansing, integration, and analysis
• Biomarker discovery, clinical trial sample processing and analysis
• Cloud / big data toolchains: Docker, Kubernetes, Spark, Kafka, Parquet, HDF5
• Oncology, immunology, immunotherapies for cancer