Senior AI Infrastructure Management Engineer
Roles and Responsibilities
Linux Expertise:
Possess in-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration.
Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration.
Cloud Expertise (AWS/Azure):
Demonstrate hands-on experience with a wide range of AWS and Azure services, including but not limited to EC2, S3, Lambda, RDS, Azure VMs, Azure Blob Storage, and Azure Functions.
Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness.
Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services.
Infrastructure as Code (IaC):
Develop and maintain Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation, defining infrastructure components as code for automated provisioning and configuration.
Establish version control practices for IaC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes.
AI/ML Infrastructure Management:
Experience setting up the cloud infrastructure stack, databases, and service endpoints, as well as scaling and optimizing GPU and CPU resources.
Should have worked on AIOps/MLOps.
Should have worked on deploying AI/ML applications using Docker and Kubernetes.
Should have worked on scaling, high-availability, and reliability tasks for AI applications.
Should have worked on deploying and maintaining GPU clusters for AI/ML training and inference.
Qualifications Required
Bachelor's degree in Computer Science, Engineering, or related field.
Skills and Experience Required
6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred).
Hands-on experience with DevSecOps principles and best practices.
Proficiency in scripting languages such as Python, PowerShell, or Bash.
Excellent communication and collaboration skills.
Certifications such as AWS Certified Solutions Architect Associate, AWS Certified Cloud Practitioner, Azure DevOps Engineer Expert, Azure Administrator Associate, Certified Kubernetes Administrator (CKA), or other relevant industry certifications are a plus.
Why you'll love working with us:
Opportunity to work on impactful technical challenges with global reach.
Vast opportunities for self-development, including online university access and knowledge sharing.
Sponsored Tech Talks & Hackathons to foster innovation and learning.
Generous benefits packages including health insurance, retirement benefits, flexible work hours, and more.
Supportive work environment with forums to explore passions beyond work.
This role presents a unique opportunity to contribute to the future of impactful business solutions while advancing your career in a collaborative and innovative environment.