Job Title: ML Ops Engineer
Duration: Long Term Contract
Location: Bentonville, AR (Onsite)
Key Responsibilities:
• Work with the AI/ML Platform Enablement team within the eCommerce Analytics team. The broader team is currently on a transformation path, and this role will be instrumental in enabling its vision.
• Work closely with data scientists to move models into production and maintain them there.
• Deploy and configure Kubernetes components for the production cluster, including API Gateway, Ingress, Model Serving, Logging, Monitoring, Cron Jobs, etc.
• Improve the model deployment process for MLE for faster builds and simplified workflows.
• Be a technical leader on various projects across platforms and a hands-on contributor to the entire platform's architecture.
• System administration, security compliance, and internal tech audits
• Lead operational excellence initiatives in the AI/ML space, including efficient use of resources, identifying optimization opportunities, forecasting capacity, etc.
• Design and implement different flavors of architecture to deliver better system performance and resiliency.
• Develop capability requirements and a transition plan for the next generation of AI/ML enablement technology, tools, and processes to enable Walmart to efficiently improve performance at scale.
Tools/Skills (hands-on experience is a must):
• Kubernetes administration: ability to create, maintain, scale, and debug production Kubernetes clusters as a Kubernetes administrator, plus in-depth knowledge of Docker.
• Ability to transform designs from the ground up and lead innovation in system design
• Deep understanding of data center architectures, networking, storage solutions, and system performance at scale
• Have worked on at least one Kubernetes cloud offering (EKS/GKE/AKS) or on-prem Kubernetes (native Kubernetes, Gravity, MetalK8s)
• Programming experience in Python, Node, Golang, or bash
• Ability to use observability tools (Splunk, Prometheus, and Grafana) to look at logs and metrics to diagnose issues within the system.
• Experience with Seldon Core, MLflow, Istio, Jaeger, Ambassador, Triton, PyTorch, TensorFlow/TF Serving is a plus.
• Experience with distributed computing and deep learning technologies such as Apache MXNet, CUDA, cuDNN, TensorRT
• Experience hardening a production-level Kubernetes environment (memory/CPU/GPU limits, node taints, annotations/labels, etc.)
• Experience with Kubernetes cluster networking and Linux host networking
• Experience scaling infrastructure to support high-throughput data-intensive applications
• Background with automation and monitoring platforms, MLOps, and configuration management platforms
• Flask
Education & Experience:
• Experience in roles with responsibility over data platforms and data operations dealing with large volumes of data in cloud-based distributed computing environments.
• Graduate degree preferred in a quantitative discipline (e.g., computer engineering, computer science, economics, math, operations research).
• Proven ability to solve enterprise-level data operations problems at scale that require cross-functional collaboration for solution development, implementation, and adoption.
Must have skills:
• Python, Node, Golang, or bash
• Experience with Seldon Core, MLflow, Istio, Jaeger, Ambassador, Triton, PyTorch, TensorFlow/TF Serving is a plus.
• Experience with distributed computing and Apache MXNet, CUDA, cuDNN, TensorRT
• Experience hardening a production-level Kubernetes environment (memory/CPU/GPU limits, node taints, annotations/labels, etc.)
Lovepreet Singh
Account Manager
m: +1 669-309-1773
w: www.e-solutionsinc.com
e: Love.s@e-solutionsinc.com
LinkedIn: https://www.linkedin.com/in/singhlovepreet/
“Disclaimer: E-Solutions Inc. provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. We especially invite women, minorities, veterans, and individuals with disabilities to apply. EEO/AA/M/F/Vet/Disability.”