We are looking for a passionate and innovative Data Scientist who will focus on NLP to research and build the next generation of our Machine Learning technology. Your primary focus will be researching and rapidly prototyping models and code that solve interesting and difficult data science problems in the NLP domain. You will also work with other data scientists, engineers, and architects to deploy your models and code to production.
Expect to spend most of your time solving complex problems and building high-fidelity models and highly scalable back-end services. You will work in an environment where productivity is fostered. You will spend very little time in meetings, and you won't be distracted by processes that get in the way of high-quality results.
Responsibilities
- Own the Applied Research team architecture.
- Drive model training methodologies and systems to speed up model creation and improve model accuracy.
- Represent Applied Research in architecture discussions to facilitate a unified AutoQL architecture.
- Collaborate with machine learning researchers on software engineering techniques and methodologies.
- Participate in the entire application lifecycle, focusing on implementation and troubleshooting.
- Research and apply emerging technologies and drive continuous improvement of our software product.
- Research and apply emerging methods, techniques and technologies. Drive continuous improvement of our software product. Build on and adapt existing methods in NLP for our problem domains.
- Research, develop and use deep learning and other ML models for domain-specific NLP deployment.
- Research, develop and use graph representations, knowledge bases, and graph embeddings to condition machine learning models and to augment data.
- Research, develop and use data augmentation techniques for machine learning training.
- Design and implement high-quality, high-performance, scalable cloud applications.
- Collaborate with ML Engineers, Architects and Back-End developers to integrate your code and models into our tech stack and infrastructure.
Minimum Requirements
- Master's or PhD degree in Computer Science, Computer & Electrical Engineering, Math, Physics, Statistics, or equivalent. Thesis research should include significant Machine Learning content.
- Working experience with deep learning frameworks such as PyTorch and TensorFlow.
- Working experience with general Python ML and data packages such as scikit-learn, spaCy, and NLTK.
- Fluent in Python, including object-oriented and functional programming.
- Knowledge of the latest NLP research literature, in particular deep learning transformer architectures relevant to compositional learning, machine translation, generative models, and retrieval-augmented generation (RAG).
- Good understanding of relational databases, graph databases, and vector stores, along with their respective query languages.
- Strong testing and debugging skills. Solid understanding of code versioning tools such as Git.
- Intellectual curiosity, self-motivation, creativity, critical thinking, and innate problem-solving skills with a strong desire to learn, innovate, and continuously challenge yourself.
- A positive, team-focused, results-oriented attitude, and strong collaboration skills.
Preferred Requirements
- Published papers in NLP, ML, or another quantitative computing discipline are an asset.
- Familiarity with REST API design and implementation, along with knowledge of a Python web framework such as Flask, is highly valued.
- Familiarity with cloud computing development and platforms such as Google Cloud, Azure or AWS is an asset.
- Experience with distributed model training (e.g., Hugging Face Accelerate, PyTorch Distributed).
- Knowledge of containerization and code deployments via CI/CD pipelines.
- Familiarity with Agile methodologies.