Job Title: Data Analyst
Location: Columbus, Ohio
W2 Contract Only
Job Description:
Responsibilities:
- Develop and maintain data platforms using Python, Spark, and PySpark.
- Migrate existing data workloads to PySpark on AWS.
- Design and implement data pipelines.
- Work with AWS services and big-data technologies.
- Produce unit tests for Spark transformations and helper methods.
- Create Scala/Spark jobs for data transformation and aggregation.
- Write Scaladoc-style documentation for code.
- Optimize Spark queries for performance.
- Integrate with SQL databases (e.g., Microsoft SQL Server, Oracle, PostgreSQL, MySQL).
- Understand distributed systems concepts (CAP theorem, partitioning, replication, consistency, and consensus).
Skills:
- Proficiency in Python, Scala (with a focus on functional programming), and Spark.
- Familiarity with Spark APIs, including RDD, DataFrame, MLlib, GraphX, and Streaming.
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB.
- Deep understanding of distributed systems.
- Experience with building or maintaining cloud-native applications.
- Familiarity with serverless approaches using AWS Lambda is a plus.
Please send your updated resume to bhavya@vmcsofttech.com