Senior Site Reliability Engineer (Remote)

We are seeking a skilled Senior Site Reliability Engineer to join our team and contribute to the development and maintenance of highly reliable and scalable systems. This role will involve optimizing infrastructure, automating processes, and ensuring system performance across cloud platforms and distributed systems. You will collaborate with cross-functional teams, lead technical initiatives, and provide mentorship to team members, fostering a culture of continuous improvement and operational excellence.

Responsibilities

Optimize Linux-based operating systems to ensure high performance for production services and distributed systems
Implement advanced telemetry solutions using tools like Grafana, Prometheus, and Splunk to enhance monitoring and organizational capabilities
Troubleshoot complex issues in Kubernetes, establishing best practices and standards for the team
Create and maintain automation scripts using Bash and Python to improve operational workflows
Develop and manage container orchestration systems such as Kubernetes or EKS, sharing expertise with the team
Design and maintain high-performance cloud infrastructure with AWS to ensure availability and reliability
Lead automation initiatives to reduce manual processes and enhance team efficiency
Provide strong leadership and foster a collaborative team environment through effective communication and ownership
Encourage continuous learning and professional growth within the team, cultivating a culture of improvement and curiosity
Offer technical mentorship and guidance to team members, ensuring clarity and efficiency in communication
Strategically manage disaster recovery and capacity planning to maintain system scalability and resilience
Automate deployment processes using tools like Terraform or CloudFormation to increase productivity and reliability
Integrate open-source technologies such as Cassandra, Kafka, Postgres, Solr, and Redis to strengthen SRE practices

Requirements

Bachelor's degree in Computer Science or a related field involving coding (e.g., physics or mathematics), or equivalent practical experience
At least three years of hands-on experience as a Site Reliability Engineer
Proficiency in Bash for scripting and automation tasks
Experience using Grafana for monitoring and visualization
Strong understanding of Linux systems and their optimization for production environments
Familiarity with Microsoft Internet Information Services (IIS) for managing web server infrastructure
Knowledge of Prometheus for monitoring and alerting in distributed systems
Proficiency in Python for developing automation and improving operational workflows
English language proficiency at a B2 level or higher, with excellent written and verbal communication skills

Nice to have

Experience working with Amazon Web Services (AWS) and designing scalable cloud solutions
Familiarity with cloud platforms and their integration into system architecture
Expertise in Kubernetes for container orchestration and management
Experience with Splunk for advanced telemetry and log management
Knowledge of Terraform and Terraform Cloud for infrastructure as code and deployment automation
Strong troubleshooting skills for identifying and resolving complex system issues

We offer

Career plan and real growth opportunities
Unlimited access to LinkedIn Learning solutions
International Mobility Plan within 25 countries
Constant training, mentoring, online corporate courses, eLearning and more
English classes with a certified teacher
Support for employee initiatives (Algorithms club, Toastmasters, agile club and more)
Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
Flexible work schedule and dress code
Collaborate in a multicultural environment and share best practices from around the globe
Hired directly by EPAM & 100% under payroll
Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
13% employee savings fund, capped at the legal limit
Grocery coupons
30-day December bonus
Employee Stock Purchase Plan
12 vacation days plus 4 floating days
Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th and 31st)
Monthly non-taxable amount for the electricity and internet bills

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.
