Jobs at Waypoint Human Capital

Site Reliability Engineer

Annapolis Junction, MD · Information Technology

Position Type: On-Site
Location: Annapolis Junction, MD
Clearance: Active TS/SCI w/ Poly

Description:
Waypoint’s client is seeking a highly skilled Site Reliability Engineer with extensive development and system administration experience on large systems. The ideal candidate will design and implement automation solutions that support the monitoring and system administration teams. The goal is to streamline tasks that are risky, error-prone, labor-intensive, time-consuming, or repetitive. This work may involve following existing SOPs or developing new ones to ensure consistent execution.

Responsibilities:

Develop and implement automation tools to support monitoring and system administration, reducing the risk and labor associated with manual tasks.
Create sustainable tools that match or exceed the performance of manual methods, enhancing the efficiency and reliability of operations.
Provide technical direction for system monitoring, health checks, and performance optimization.
Work on tasks of varying complexity, including developing GUIs that simplify cluster management and automation.
Formulate and enforce Standard Operating Procedures to ensure consistency and reliability in system administration tasks.
Use knowledge of tools like Salt and Puppet to determine the best automation approach for different tasks.

Requirements:

Bachelor's Degree in Computer Science or a related technical field (equivalent to two years of experience).
Master's Degree in a Technical Field (equivalent to four years of experience).
Degrees in Mathematics, Information Systems, Engineering, or similar fields will be considered.
AWS Certified Solutions Architect - Professional.
AWS Certified DevOps Engineer - Professional.
Fourteen (14) years in software development/engineering, covering requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution.
Ten (10) years in system engineering/architecture.
Ten (10) years with products supporting highly distributed, massively parallel computation (e.g., HBase, Hadoop, Accumulo, Bigtable, Cassandra, Scality).
Ten (10) years writing software scripts for automation using languages like Perl, Python, or Ruby.
Four (4) years managing and monitoring large Cloud Systems.
Experience in providing technical direction for the development, engineering, interfacing, integration, and testing of complete hardware/software systems, including postmortem analysis and incident management.
Active TS/SCI security clearance with a current full scope polygraph.

Desired:

Proficient in developing automation tools to improve system administration efficiency.
Experienced in handling and optimizing large-scale, distributed systems.
Skilled in writing and maintaining scripts for software automation.
Capable of providing technical direction and improving organizational processes.
Able to work effectively in a fast-paced, dynamic environment with shifting priorities.
