305 King St W
Suite 1100
Kitchener, ON N2G 1B9
Canada
Senior Site Reliability Engineering (Microsoft Azure) Remote
Description
We are seeking an experienced Senior Site Reliability Engineer who will focus on maintaining a large Data Platform on Microsoft Azure.
The ideal candidate will possess strong analytical skills and should have a background in managing system reliability, availability, and scalability in a demanding production environment.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Set up and configure monitoring tools such as Datadog
- Monitor system availability, reliability, and capacity
- Track application usage and its impact on production systems
- Scale the environment to meet evolving demands
- Create run books and other system troubleshooting documentation
- Oversee release and deployment of applications/components
- Manage incidents and adhere to change management processes
- Administer and deploy CI/CD tools including Git, Jira, GitLab, and Jenkins
- Develop infrastructure scripting solutions using PowerShell or Python
- Present and communicate architecture visually to stakeholders
- Maintain and enhance the Microsoft Azure platform to ensure optimal performance
Requirements
- Minimum of 3 years of experience as a Site Reliability Engineer
- Proficiency in setting up and configuring monitoring tools
- Expertise in capacity planning, scaling, and system troubleshooting
- Background in Release and Deployment management
- Familiarity with Incident and Change Management processes
- Competency in administering and deploying CI/CD tools
- Skills in infrastructure scripting with PowerShell or Python
- In-depth knowledge of Microsoft Azure
- Ability to effectively communicate technical concepts and architecture visually
- Excellent interpersonal skills with high emotional intelligence
Nice to have
- Knowledge of Microsoft Azure Data Factory and Databricks
We offer
- Connectivity Bonus (ARS 15,000 paid with the salary receipt at the end of each month as a non-wage item)
- Medicina Prepaga (private health plan covering the employee and their immediate family)
- Paternity Leave (two days added to the legal entitlement, for a total of 4 days)
- Discount card
- English Training (English lessons, twice per week)
- Training Program (Access to multiple customized training plans according to the needs of each role within the company)
- Marriage bonus (The company doubles the allowance established by law that ANSES offers)
- Referral Program (a referral bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacations: 14 calendar days a year
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.