305 King St W
Suite 1100
Kitchener, ON N2G 1B9
Canada
Site Reliability Engineer Remote
Description
Join our dynamic team as a Site Reliability Engineer and lead the way in optimizing and automating our Linux-based infrastructure.
With 3 to 5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure, you will play a crucial role in elevating our capabilities and ensuring high-impact, internet-facing production services run smoothly.
If you are passionate about driving innovation and efficiency in a tech-forward environment, we want you on our team.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Optimize Linux operating systems for high-impact, internet-facing production services and distributed systems
- Implement and coordinate advanced telemetry using tools like Splunk, Grafana, and Prometheus
- Address complex issues in Kubernetes, setting high standards and best practices for the team
- Create and maintain cutting-edge automation scripts using Bash and Python to enhance operational efficiency
- Build and operate advanced systems such as Kubernetes or EKS, sharing knowledge and experience with the team
- Design and maintain robust, high-performance cloud infrastructure with AWS, ensuring top levels of availability and reliability
- Champion innovative automation solutions to minimize manual work and drive efficiency
- Lead deployment automation using tools like Terraform or CloudFormation to boost productivity and reliability
Requirements
- 3 to 5 years of experience in SRE roles
- Advanced experience with Linux system administration & IIS
- Advanced experience with Kubernetes, Docker, and containerization
- Proficient in managing Infrastructure as Code via tools such as Terraform or CloudFormation
- Proficient in Python & Bash
- Proficient in Splunk, Prometheus & Grafana
We Offer
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
- Relocation bonus: transportation, 2 weeks of accommodation for you and your family, and more
- Monthly non-taxable amount for the electricity and internet bills
Conditions
- By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy