We are seeking a Senior Site Reliability Engineer to join our client’s team. Our client is a US financial services software provider.
Responsibilities
As a Senior Site Reliability Engineer, you will collaborate across teams to consistently enhance and maintain a scalable, reliable production environment for operating the next-generation capital markets platform for alternative investments.
• Champion a culture of site reliability: Instill best practices and a robust reliability culture within your team.
• Drive technical excellence across your team, ensuring a high standard of reliability practices.
• Collaborate seamlessly with cross-functional teams to identify and resolve system or product reliability issues.
• Strengthen iownit’s tech stack and deployments by fortifying our observability and monitoring pipelines, scaling systems, and advancing our release process.
• Lead the investigation and resolution of incidents with a focus on proactive monitoring and automation, ensuring postmortem actions are implemented to prevent recurrence.
• Guarantee the stability of releases to client environments, upholding a seamless user experience.
• Collaborate with team members and stakeholders to define clear and measurable service level agreements (SLAs) and service level indicators (SLIs), and establish realistic and achievable service level objectives (SLOs).
• Document and share knowledge within the organization through internal forums and communities of practice.
• Showcase expertise in reliability, scalability, performance, security, enterprise system architecture, and toil reduction, and implement these practices within applications or platforms.
• You will get an opportunity to work in a team that keeps growing, innovating, and giving you room to be proactive and creative.
• You’ll also work alongside a professional infrastructure team, which provides a solid baseline for the SRE role.
• With a highly automated infrastructure in place, your focus will be on enhancing and building rather than starting from scratch.
• You won’t be alone in this journey.
Technical Requirements
• At least 2 years of experience supporting and maintaining AWS infrastructure (VPC, EC2, ECS, CloudWatch, Security Hub, IAM, S3, etc.).
• Proficient with infrastructure-as-code approach using CloudFormation (our main tool) or Terraform (acceptable alternative).
• Can configure consolidated logging, alarms, and dashboards (core aspect of the role).
• Can easily navigate logs in CloudWatch (one of our main tools) across multiple application layers to identify the root cause of an issue.
• Strong experience with modern container orchestration systems such as AWS ECS (running Docker containers), with Kubernetes knowledge as a plus.
• Good understanding of network and security-related configurations in AWS environments; an AWS Certified DevOps Engineer certification is a big plus.
• Accustomed to creating and executing playbooks to remediate standard issues.
• Some experience with scripting or programming languages (Bash, Python, TypeScript).
• Strong bias for action and ownership, with the ability to take initiative in solving issues.
Nice to Have
• Experience with AWS Site-to-Site VPN or other VPN services.
• Experience with large-scale distributed systems is highly appreciated.
• Practical experience in the identification and mitigation of DDoS attacks together with cybersecurity teams.
• Can work with databases using SQL and performance monitoring tools.