Onsite role; candidates must be local.
A valid LinkedIn profile is required.
We are seeking an experienced Data Engineer with 7+ years of expertise in building scalable data pipelines, data streaming, and batch processing solutions.
Key Responsibilities
- Design, build, and maintain real-time data pipelines using Apache Kafka and Spark Streaming for processing large volumes of streaming data.
- Develop and optimize scalable data ingestion and data processing pipelines leveraging AWS data services such as Kinesis, Glue, Redshift, and S3 for both batch and streaming data architectures.
- Use AWS Kinesis to capture, process, and analyze real-time streaming data, integrating it with Kafka and Spark pipelines.
- Build and manage data lakes on AWS S3, designing storage layers that support both structured and unstructured data in real time.
- Implement event-driven architectures using Kafka and AWS Lambda to trigger processing pipelines based on incoming data events.
- Collaborate with data scientists, data analysts, and backend developers to ensure proper data modeling and pipeline design for real-time analytics and business intelligence on AWS.
- Use AWS Redshift for data warehousing and Athena for serverless querying of data, integrating with streaming data pipelines.
- Monitor and optimize the performance of streaming applications and data pipelines to ensure high efficiency, scalability, and low-latency processing, using AWS cloud-native monitoring tools such as CloudWatch and AWS X-Ray.
- Ensure data governance, security, and compliance within AWS services, particularly when working with sensitive data and real-time processing.
- Deploy and manage infrastructure as code using AWS CloudFormation or Terraform to automate the provisioning of data streaming environments.
Requirements
- Strong experience with Apache Kafka, Kafka Streams, and Spark Streaming for building real-time data pipelines.
- Expertise in AWS data services such as Kinesis, S3, Glue, Lambda, Redshift, and Athena for building and scaling data solutions in the cloud.
- Hands-on experience with big data frameworks like Apache Spark for processing large-scale data sets.
- Proficiency in Python, PySpark, and Java for building and maintaining data pipelines.
- Strong understanding of real-time data processing architectures and distributed systems.
- Experience with stream processing frameworks such as Apache Flink, Storm, or NiFi.
- Familiarity with AWS data lake architectures and best practices for storing and querying data in S3.
- Knowledge of database technologies, including NoSQL (e.g., DynamoDB, Cassandra) and SQL databases.
- Experience with containerization and orchestration tools like Docker and Kubernetes for deploying applications in AWS environments.
- Strong experience in monitoring, troubleshooting, and optimizing real-time streaming pipelines using AWS services such as CloudWatch, X-Ray, and Step Functions.
- Experience with data governance, security, and compliance when working with real-time data in cloud environments.