Data Engineer

Tomato.ai • Full-time • Remote (United States, US) • 2m ago

Company Overview

Tomato.ai softens accents on calls. The company has raised over $12M and is led by two ex-Googlers who worked in the speech space for years. The founders previously sold four tech startups. The company is remote-first, based in the US, and hiring for this role worldwide.

Pay range

Highly competitive compensation and benefits. Exact compensation may vary based on skills, experience, and location.

Location

Fully remote.

Responsibilities

Develop and operate pipelines for large-scale speech data processing using Apache Beam and Google Cloud Dataflow.
Develop algorithms for training data selection and augmentation for speech ML models.
Collaborate closely with researchers to achieve model performance goals.

Required Qualifications

Minimum 5 years of experience in data engineering.
Experienced in web scraping and data processing.
Extensive hands-on experience with large-scale data processing.
Proficient in at least one of Apache Beam, Spark, or Flink.
Passionate about and skilled at data analysis; able to produce practical insights.
Good understanding of state-of-the-art deep learning techniques.
Proficient in Python and PyTorch.
Hands-on experience with ML model training.
Attention to detail.
Effective communication skills.
Ability to work independently in a remote-first environment.

Preferred Qualifications

Experience with audio data processing.
Familiarity with GCP.
Experience with speech or audio ML models is optional but a big plus.