Senior Site Reliability Engineer
We are working with a really exciting software company that has developed an impressive digital platform to pioneer and transform the world of diamonds through the use of blockchain! The business was founded in 2018 and has received lots of praise from the likes of Forbes over the last few years. Our client offers a start-up environment, but is part of a much larger group and is very well funded.
We are looking for a Senior Site Reliability Engineer with experience in chaos testing/engineering to help the existing team continue with their success and growth plans. This role will work as part of a multi-disciplinary team to build solutions for their leading-edge platform, which provides end-to-end traceability of diamonds across the value chain.
The role will report into the DevOps Lead, and will work with Software Developers, Data Scientists, Blockchain Developers, and Support/Operations teams.
Responsibilities
- Continually assess, collect, and automate impact points across the stack to provide real-time insights into error rates, performance degradation, and mean time to recovery (MTTR)
- Develop error budgets, SLOs and alerting
- Take ownership of disaster recovery plans and verify them regularly
- Collaborate on Terraform code to enable best practices, compliance, and security
- Champion Observability, Monitoring, and Alerting
- Automation
- Chaos Testing / Engineering
Criteria
- Hands-on experience working in AWS
- Production experience running kubernetes (EKS) clusters
- Creating SLOs and error budgets
- Scripting languages as they relate to automation (Python)
- Extensive exposure to tools like Chaos Monkey, Chaoskube, or Gremlin
- Experience creating hypotheses for chaos experiments and running game days
- Deep understanding of cloud native deployment
- Solid understanding of modern SRE techniques and deployment of reactive operations
- Previous experience within a Start-up / Tech culture
Core Stack
- AWS
- Kubernetes
- PostgreSQL (Aurora RDS)
- Sentry
- PagerDuty
- Elastic Cloud
- Terraform
- GitLab
- Python, Go, Rust, NodeJS
This is a fantastic opportunity to join a leading-edge start-up that is part of a global group, with plenty of funding. If you would like to learn more, please reach out to jared.wolfaardt@robertwalters.com for a confidential discussion.
Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.