Site Reliability Engineer

Date Posted: 5/25/2023

Job #1628712
Houston, Texas

Our Information Systems group builds and maintains more than 100 applications that enable real-time, data-driven decisions across the company. As a Site Reliability Engineer, you will be responsible for maintaining and improving the reliability of several mission-critical applications that must run 24x7x365 in the data center.

Responsibilities

• Monitor and maintain the agreed-upon uptime SLAs for mission-critical applications
• Document downtime incidents and the response to each, and meet with various teams to identify root causes and establish remediation procedures
• Understand, operate, and document complex production execution platforms in order to triage downtime incidents and develop a documented triage process
• Proactively monitor the health of software and hardware
• Design and develop solutions that support high availability, reliability, and security for existing applications
• Develop and maintain automation tools and processes for deployment, configuration, monitoring, and alerting of existing systems
• Collaborate with development and operations teams to identify and resolve software and infrastructure issues
• Identify best practices for system scalability, security, and performance, and influence or implement their adoption
• Conduct research and development to identify new tools and technologies that can improve existing software systems
• Participate in an on-call rotation to respond to critical incidents and ensure system availability and reliability

Qualifications

• Bachelor's degree in Computer Science, Computer Engineering, or a related field
• 5+ years of experience in software engineering or site reliability engineering
• Proficiency in Linux
• Knowledge of containerization and container orchestration technologies such as Docker and Kubernetes
• Experience debugging production systems using instrumentation and monitoring
• Experience with observability platforms
• Python development experience preferred
