Company Description
StackOverdrive.io is a DevOps, Data Engineering, and Web Development consultancy headquartered in New York City. We help companies build better software and shorten the time from development to production by replacing manual processes with infrastructure-as-code. We embrace an automated, collaborative, and agile way of working to increase your organization’s speed, reliability, and efficiency at scale with minimal bureaucracy.
Role Description
We are seeking a highly skilled and experienced Data Infrastructure Engineer to join our innovative team. The ideal candidate will have a robust background in building and managing data infrastructure and pipelines. You will be responsible for designing, developing, and optimizing data workflows using cutting-edge tools and technologies such as Apache Airflow, Kubernetes, Apache Spark, Amazon Redshift, and various Business Intelligence (BI) tools. Your work will be pivotal in ensuring the efficient processing and storage of large volumes of data, enabling data-driven decision-making across the organization.
Key Responsibilities
Infrastructure Design and Development:
- Design and implement scalable and efficient data architectures, including databases and large-scale processing systems.
- Develop robust data pipelines using Apache Airflow for workflow orchestration.
- Leverage Kubernetes for container orchestration to ensure seamless deployment and management of data services.
Data Integration and ETL:
- Implement Extract, Transform, Load (ETL) processes to integrate data from various sources into data warehouses like Amazon Redshift.
- Ensure high data quality, integrity, and reliability throughout the data lifecycle.
Performance Optimization:
- Optimize and tune data processing jobs for performance and cost-efficiency using Apache Spark.
- Continuously monitor and improve the performance of data pipelines and infrastructure.
Collaboration:
- Work closely with data engineers, data scientists, analysts, and other stakeholders to understand data requirements and deliver effective solutions.
- Collaborate with DevOps and IT teams to deploy and manage data infrastructure in cloud environments.
Data Management and Security:
- Implement data governance practices to ensure compliance with data privacy regulations.
- Manage data security and access controls to safeguard sensitive information.
Reporting and Visualization:
- Use BI tools to build reports and dashboards that give business users actionable insights.
- Support ad-hoc data analysis requests and provide expertise on data querying and visualization.
Required Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Data Infrastructure Engineer or in a similar role.
- Proficiency with Apache Airflow, Kubernetes, Apache Spark, and Amazon Redshift.
- Experience with BI tools such as Tableau, Domo, Power BI, or Looker.
- Strong knowledge of SQL and experience with database management systems.
- Familiarity with cloud platforms such as AWS, GCP, or Azure.
- Understanding of data warehousing concepts and ETL processes.
- Strong problem-solving skills and the ability to work in a fast-paced environment.
- Excellent communication and collaboration skills.
Preferred Qualifications
- Experience with programming languages such as Python, Java, or Scala.
- Knowledge of data governance and data privacy regulations.
- Experience with big data technologies like Hadoop.
- Familiarity with machine learning workflows and tools.
Location
- We are located in NYC.
- 100% remote position.
- U.S. citizens only.
- We are an Equal Opportunity Employer.