
Senior Site Reliability Engineer Remote

Senior Site Reliability Engineer Description

We are in search of a Senior Site Reliability Engineer with a focus on cost savings and advanced system maintenance to join our team.

As a Senior Site Reliability Engineer, you will play a pivotal role in building, supporting, and optimizing high-capacity systems that are both efficient and cost-effective. You will be responsible for sophisticated tasks within our AWS infrastructure, working closely with product development teams to enhance automation, improve system performance, and ensure the reliability of our systems while optimizing costs and resources effectively.

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.



Responsibilities

  • Develop and implement advanced cloud cost optimization strategies through in-depth analysis and resizing recommendations
  • Collaborate with engineering and product teams to engineer sophisticated cost-aware architectural solutions
  • Design, maintain, and optimize comprehensive dashboards for monitoring cloud expenditures in real time
  • Identify and implement AWS cost-saving strategies including Reserved Instances and Savings Plans with a focus on maximizing financial efficiency
  • Foster a culture of financial prudence and accountability regarding cloud resource usage across engineering teams
  • Design, analyze, and manage troubleshooting strategies for highly distributed large-scale production systems and cloud-based services
  • Lead continuity planning efforts, including failure injections and the validation of effective monitoring configurations
  • Propose and integrate infrastructure scalability enhancements to manage at least double the current expected load
  • Coordinate middleware, network, storage, database, and server components at large, complex scale
  • Perform advanced performance testing and tuning to ensure optimized system responsiveness
  • Develop, refine, and oversee telemetry processes to monitor key operational metrics for better decision-making

Requirements

  • 3+ years of experience as a software engineer developing, debugging, and deploying enterprise applications in high-demand environments
  • Strong track record of managing cloud infrastructure costs using tools like AWS Cost Explorer
  • Advanced proficiency in infrastructure automation technologies such as Terraform
  • Expertise in managing container orchestration using ECS or Kubernetes on a large scale
  • Expert troubleshooting skills across hosting technologies including web servers, operating systems, and network components
  • Advanced skills in continuous deployment frameworks and lifecycle management (e.g., CI/CD)
  • Deep understanding of database operations and deployment with cloud databases like RDS MySQL, Postgres, and Aurora
  • Expert knowledge of caching strategies for high concurrency workloads
  • Mastery of Lean/Agile deployment processes such as Blue/Green, zero-downtime (ZDT), and Canary
  • Expertise with telemetry SaaS systems including New Relic products like APM and Synthetics
  • Exceptional problem-solving and root cause analysis capabilities with a track record of high-impact solutions
  • Excellent communication skills and ability to manage culturally aligned escalation response plans across different teams
  • English level B2+ for effective communication across global teams

Nice to have

  • Bachelor's or Master's Degree in Computer Science or an equivalent field
  • Advanced ability to communicate across a wide range of technical and non-technical stakeholders
  • Proficiency in multiple programming languages including JavaScript, Python, and PHP, among others

We offer

  • Connectivity Bonus (ARS 15,000 paid with the salary receipt at the end of each month as a non-salary item)
  • Medicina Prepaga (private health coverage for the employee and their immediate family)
  • Paternity Leave (two days added to the leave established by law, for a total of 4 days)
  • Discount card
  • English Training (English lessons, twice per week)
  • Training Program (Access to multiple customized training plans according to the needs of each role within the company)
  • Marriage Bonus (the company doubles the statutory allowance paid by ANSES)
  • Referral Program (a referral bonus is paid when a candidate referred by an employee joins the company)
  • External Agreements and Discounts
  • Vacations: 14 calendar days a year

By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.
