Senior Site Reliability Engineer Pune, India

Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver creative and cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.

The experienced SRE will play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.


Responsibilities

  • Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues
  • Design, build and maintain efficient, reliable, and scalable cloud-based infrastructure and services
  • Automate repetitive tasks and workflows using scripting and programming languages to improve efficiency and reduce errors
  • Implement and manage observability tools for comprehensive monitoring, alerting, and logging
  • Develop and execute automation strategies using tools like Jenkins, GitLab, and Ansible/Chef
  • Define and oversee SLIs, SLOs, SLAs, and error budgets to maintain service quality
  • Provide on-call support for incident management and participate actively in response activities

Requirements

  • Should have 5 to 8 years of experience
  • Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) to automate manual work, particularly within cloud environments
  • Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure
  • Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes to streamline deployment, monitoring, and management of systems and applications in the cloud
  • Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments
  • Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation; willingness to provide on-call support and participate in incident management and response activities as needed

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
