Site Reliability Engineer in New York, NY at Open Systems Technologies

Date Posted: 9/3/2020

Job Description

This role will not sponsor candidates.

This role is only open to candidates with at least five years of post-graduate experience who are able to start working immediately.

A top alternative investment fund is seeking a Site Reliability Engineer to join their team in New York. This candidate will join a newly created internal development group that will help develop cloud-native applications and deliver them as Software-as-a-Service. The candidate will work as part of a cross-functional Cloud Operations team and will be responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning of our services.

You will collaborate closely with the Product, Application Development, Engineering and Security teams and will be an embedded member of application-specific teams during the build and deployment phases, actively contributing to the development process. You will work with a technology stack that includes KMS/HSM, core AWS services (EC2/S3/ELB…), Code [Commit | Deploy | Pipeline], CloudFormation, Jenkins, Fargate, Docker, Ansible, Aurora and RDS [ MySQL | Postgres | Oracle ], Glue, TB-scale Redshift, Dynamo, Batch and Step Functions, Lambda, Cognito and Machine Learning.


  • Operational management of product and application suites following concepts and ideas from Google’s Site Reliability Engineering (SRE).
  • Incident management, response and support under our mature incident management framework utilizing follow-the-sun methodology.
  • Drive Well-Architected and Application In Service Reviews for new applications, with a focus on the reliability and operational excellence pillars.
  • Software development of a shared job and workflow control solution utilizing AWS Batch, Step Functions and Lambda.
  • Software development during build and rollout phases utilizing project-specific languages including Node.js, Java and Python.
  • Provide standardized offerings to facilitate the successful deployment of stacks, including continuous build, test, integration, and deployment platforms and pipelines.


  • Undergraduate degree in computing or related area, Master's is a plus
  • 5+ years of experience in software development, engineering or operations, operationalizing and preferably supporting highly available and scalable applications
  • 5+ years of experience with professional software development and the SDLC using one or more of the following: C, C++, Java, Node.js, SQL, JavaScript or similar programming languages
  • 5+ years of experience with at least one of the following programming and scripting languages: Ruby, Go, Python, Perl, bash, ksh
  • 5+ years of hands-on experience with at least three of the following areas:
    • Infrastructure-as-a-service platforms: AWS, Google Compute Engine, Azure, SoftLayer, Linux OpenStack, etc.
    • Configuration management and automation tools such as: Chef, Puppet, and Ansible
    • Orchestration template technologies such as: OpenStack Heat, AWS CloudFormation, Azure Resource Manager, Google Cloud Deployment Manager, and HashiCorp Terraform
    • Development using GitHub or Bitbucket
    • Containers and container scheduling and management platforms such as: Docker, rkt, Mesos, or Kubernetes
    • Managing traditional enterprise platforms for compute, network, and storage
    • Managing traditional enterprise platforms for application runtimes, integration middleware, and relational databases
  • Cloud certification from AWS, Google or Azure
  • Linux certification (LPIC-2, LFCE, RHCE) 
  • Scrum leadership and agile development experience
  • Expertise with hedge funds, investor relations, private equity and / or real estate
Job category:
  • Information Technology
Job keywords:
  • Reliability Engineer
  • C++
  • Scripting