What you will do:
Entrepreneurial Mindset
- Problem-Solving: Approach technical challenges with an entrepreneurial mindset, unblocking the team to enable faster prototyping and experimentation with data solutions.
- Strive for Efficiency: Evaluate new tools and technologies to generate ideas for AI/ML solutions, enhancing the team's capacity for rapid experimentation.
Data Pipeline Design
- Data Engineering Expertise: Design, build, and maintain scalable data pipelines that are robust, reliable, and efficient.
- ETL Processes: Implement ETL processes to acquire, transform, and load data from various sources into data storage solutions.
- Scalability: Ensure data pipelines are scalable to handle large volumes of data and perform efficiently under varying workloads.
Data Acquisition and Cleaning
- Data Integration: Integrate data from diverse sources (SQL, NoSQL, APIs, flat files, etc.) while maintaining data integrity and quality.
- Data Cleansing: Clean and preprocess data to address inconsistencies, missing values, and outliers, ensuring data readiness for analysis and modeling.
- Automated Data Processing: Utilize tools and techniques for automating data acquisition and cleaning processes to streamline workflows and reduce manual effort.
ML Ops Management
- Model Deployment: Deploy machine learning models into production environments, ensuring versioning, deployment automation, and integration with ML pipelines.
- Monitoring and Logging: Set up monitoring systems to track model performance metrics, log outputs, and detect anomalies or issues.
Dashboard and API Development
- Front-End Development: Develop interactive dashboards and user interfaces to visualize data insights and model outputs.
- API Development: Design and implement APIs to expose data and ML models, enabling seamless integration with other applications and systems.
Collaboration with AI Engineers
- Cross-Functional Collaboration: Work effectively with AI Engineers, Software Engineers, and business stakeholders to understand their data requirements, translate them into technical solutions, and optimize data access for AI/ML model development.
- Communication: Convey complex technical concepts and solutions effectively to non-technical stakeholders and team members.
Our Technology Stack
- Programming Languages: Proficiency in Python and SQL for data manipulation, extraction, and analysis. Experience in at least one object-oriented language for developing web applications.
- Databases: Experience with both SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Snowflake) and data formats such as Apache Parquet for data storage and optimization.
- ETL Tools and Data Pipelines: Familiarity with ETL processes and tools such as Apache Airflow or AWS Glue.
- Cloud Platforms: Expertise in cloud platforms and services such as AWS for deploying and managing data infrastructure.
- MLOps Tools: Experience with MLOps tools like AWS SageMaker for managing machine learning workflows.
What you bring:
- Bachelor's degree or higher in Computer Science, Engineering, Mathematics, Statistics, or a related field.
- 9+ years of overall experience, including graduate studies.
- At least 5 years of experience in a data engineering or software engineering role, with a proven track record of designing and implementing data solutions.
- Proficiency in Python and SQL for data manipulation, extraction, and analysis.
- Experience with both SQL and NoSQL databases, with the ability to design and optimize database schemas.
- Experience as a full-stack software engineer is highly advantageous.
- Familiarity with cloud platforms and services such as AWS, Azure, GCP, etc., for deploying and managing data infrastructure.
- Knowledge of capital markets and financial data is a significant plus.
- Experience in MLOps and familiarity with tools like AWS SageMaker, MLflow, etc., is advantageous.