305 King St W
Suite 1100
Kitchener, ON N2G 1B9
Canada
Site Reliability Engineer (Remote)
Description
We are seeking a Site Reliability Engineer to join our team and independently handle complex tasks ranging from infrastructure improvements to development and deployment automation.
The ideal candidate has at least two years of experience, can troubleshoot in depth, and is proficient in resolving platform issues. The role includes taking part in sprint planning and story grooming, where you will provide insight into implementation complexity. Reporting to the Engineering Manager, this position is pivotal in maintaining operational standards and improving our engineering practices.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
Responsibilities
- Investigate and troubleshoot platform issues
- Analyze, develop, and enhance deployment automation independently
- Write scripts to automate various tasks
- Participate actively in sprint meetings and story estimations
- Monitor production APM tools such as Datadog
- Manage application log collection and analysis
- Oversee application and instance alerts related to site reliability
- Contribute to infrastructure architecture discussions
- Maintain applications and libraries essential to the platform
- Control code deployment servers and methods
- Mentor and assist other engineers
- Perform code reviews
Requirements
- 2+ years of experience as a Site Reliability Engineer or in a similar role
- Proficiency in TypeScript, NodeJS/NestJS, React Native
- Strong background in Python/Django and familiarity with PostgreSQL, Redis
- Competency in CircleCI, Spinnaker, Expo
- Experience administering production application workloads on AWS
- Understanding of public Cloud networks and VPC peering
- Skills in Cloud computing including EC2, SNS/SQS, RDS
- Knowledge of containers and orchestration using Docker, Kubernetes, EKS
- Experience with provisioning and configuration management tools like Terraform, Ansible
- Linux or Windows server administration skills
- Demonstrated ability to integrate monitoring, logging, and alerting into built systems
- Ability to debug complex issues collaboratively
- Experience working under HIPAA and other compliance standards
- Flexibility to learn quickly and adapt to changing requirements
Nice to have
- Familiarity with monitoring tools like Datadog
- Knowledge of scripting languages such as Python, Groovy, PowerShell, or Ruby
- Background in or passion for working in health services
We offer
- Connectivity Bonus (ARS 15,000 paid at the end of each month as a non-wage item on the salary receipt)
- Prepaid Health Plan (Medicina Prepaga; covers the employee and their immediate family)
- Paternity Leave (two days added to the statutory leave, for a total of 4 days)
- Discount card
- English Training (English lessons, twice per week)
- Training Program (Access to multiple customized training plans according to the needs of each role within the company)
- Marriage Bonus (the company doubles the statutory allowance paid by ANSES)
- Referral Program (a referral bonus is paid when a candidate referred by an employee joins the company)
- External Agreements and Discounts
- Vacations: 14 calendar days a year
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.