CTG is seeking to fill a Data Engineer opening for our client in Morrisville, NC.
Location: Morrisville, NC (hybrid remote)
Duration: 12+ months, with the possibility of extension
Overview:
We are excited to offer a few data engineer positions in the field of large language models (LLMs) and multimodal LLMs, with a focus on European languages. The goal is to work with the team on the data side to help build strong multilingual AI models. In addition to English, candidates must be proficient in one or more of the following languages: German, Italian, French, and Portuguese.
Duties:
- Develop and maintain web scraping and data extraction processes to gather large-scale text and image data from diverse sources.
- Clean, preprocess, and tag text and image data to ensure data quality and usability.
- Work with different data formats such as Parquet, JSONL, and CSV, ensuring efficient data storage and retrieval.
- Collaborate with data scientists and machine learning engineers to support the evaluation and improvement of large language models.
- Stay up to date with the latest research and advancements in data engineering, web scraping, and machine learning. Actively participate in academic research and reading groups.
- Implement and optimize data pipelines for high-volume data processing.
Education & Experience:
- Strong proficiency in Python and a solid understanding of HTML, JSON, and web technologies.
- Master's degree required
- 2-4 years of experience
Excellent verbal and written English communication skills and the ability to interact professionally with a diverse group are required.
CTG does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services for this role.
To Apply:
To be considered, please apply directly to this requisition using the link provided. For additional information, please contact Jamie Robinson at Jamie.Robinson@ctg.com. Kindly forward this to any other interested parties. Thank you!