About the Role
Dexterity is hiring an HPC Engineer to design and optimize our physical datacenter infrastructure spanning tens of thousands of cores. We are looking for an engineer with experience designing, deploying, and scaling large HPC deployments.
This position requires excellent project management and technical know-how. You'll have broad responsibilities and freedom to analyze and solve problems, and your solutions will have an immediate impact on our operations worldwide.
Responsibilities
- Build and deploy new machines in our datacenter
- Optimize our distributed storage layer, including NFS configuration and local caching
- Maintain our distributed workload manager
- Manage projects: gather requirements, track updates, and provide detailed documentation
- Debug hardware failures
- Work with the networking teams to upgrade hardware, cabling, and optics
- Travel to data centers to ensure successful project completion
- Perform firmware upgrades on cluster machines
Qualifications
- BS in Computer Science, Electrical Engineering, or a related IT field, or 5+ years of relevant experience
- 5+ years of experience in HPC-like environments with large-scale deployments of more than 10k cores
- Extensive experience with Linux troubleshooting and NFS
- Experience with distributed workload managers / job schedulers such as SLURM
- Proficiency in Python and shell scripting
- Deep knowledge of CPU hardware and the differences between CPU models
- Experience with Unix/Linux command-line utilities and the Linux networking stack
Compensation and Benefits
- Annual base salary range of $175,000 to $225,000
- Participation in company bonus pool
- Health insurance and wellness benefits
- Unlimited vacation policy
- This role can be either remote or in person at our San Francisco office