Site Reliability Engineer with Security Clearance

Apex Systems

2024-11-08 07:43:10

Job location: Chantilly, Virginia, United States

Job type: Full-time

Job industry: I.T. & Communications

Job description

We are seeking talented professionals to join our successful and growing team in building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better awareness of and visibility into their security posture and cyber threats. The CDM Data Services product is an integrated suite of multiple Commercial Off-the-Shelf (COTS) products, software configuration packages, and custom code that work together as a single solution tailored to meet Department of Homeland Security (DHS) requirements.

We are looking for a talented Site Reliability Engineer (SRE) to play a key role in defining, implementing, and growing our SRE practice to ensure the reliability, availability, and performance of our critical production environments. The SRE will contribute to a culture of continuous improvement by identifying areas for enhancement and driving initiatives to improve system reliability, scalability, and efficiency. The successful candidate will have demonstrated hands-on experience designing, implementing, and maintaining solutions that keep systems, including infrastructure and applications, resilient, highly available, and performant.

The SRE will also play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution. The SRE will be responsible for setting up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services. Additionally, the SRE will respond to incidents, perform root cause analyses, and implement solutions to prevent recurrences.

The Journeyperson SRE will work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle.

Required Skills
US citizenship with ability to obtain Public Trust Suitability
4+ years of experience as a Site Reliability Engineer (SRE) or equivalent
4+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting
4+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk)
2+ years defining and measuring SLOs and SLIs
2+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
2+ years of hands-on programming or scripting (e.g., Python, Bash)
Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle
Strong problem-solving and analytical skills
Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements

Desired Skills
Bachelor's degree in Computer Science, Engineering, or a related field (or 4 additional years of related experience)
Experience working in an Agile/SAFe environment using ALM tools (Jira, Confluence, or similar)
Strong understanding of CI/CD principles and platforms (Jenkins, CircleCI, GitLab, GitHub Actions, Argo, Travis CI, etc.)
Expertise in configuration management tools (Ansible, Puppet, Chef)
Experience with infrastructure as code (Terraform, CloudFormation)
In-depth understanding of networking, security, and system administration of Linux operating systems
Knowledge of version control platforms and branching strategies
Knowledge of disaster recovery planning, backup strategies, and data replication
Experience supporting large Federal programs ($200M+)
