Site Reliability Engineer Job at iCIMS, Holmdel, NJ

  • iCIMS
  • Holmdel, NJ

Job Description

Job Summary

We are seeking a skilled Site Reliability Engineer (SRE) to contribute to the reliability, scalability, and performance of our multi-cloud SaaS platform, which serves thousands of customers worldwide. This role involves hands-on technical work in incident response, system monitoring, automation, and continuous improvement of platform reliability. The successful candidate will work within a global SRE team to ensure optimal system performance and customer satisfaction.

Responsibilities

  • System Monitoring & Reliability:
    • Monitor multi-cloud infrastructure (AWS, Azure, GCP) using New Relic, Grafana, and Sumo Logic
    • Maintain reliability of AWS resources, Auth0/Okta authentication, databases, and legacy applications
    • Implement monitoring, alerting, and dashboards for assigned systems
  • Incident Management & Response:
    • Respond to alerts and incidents within SLA timeframes
    • Perform root cause analysis and document findings
    • Create and maintain runbooks and troubleshooting procedures
    • Participate in 24/7 on-call rotation
  • Automation & Improvement:
    • Develop scripts to reduce manual operational overhead
    • Build monitoring and alerting solutions
    • Support infrastructure-as-code initiatives
    • Implement automated remediation where possible
  • Success Metrics:
    • Customer Impact: Reduced MTTR and improved customer satisfaction scores
    • Reliability: Achievement of 99.9%+ uptime SLAs across all products and regions
    • Proactive Prevention: Reduction in incident frequency through automated detection and prevention
    • Cross-functional Collaboration: Improved partnership metrics with Product, Engineering, and Customer Success teams
    • Automation Delivery: Completion of assigned automation projects that reduce manual tasks
    • Knowledge Sharing: Contributions to the team knowledge base and mentoring of junior engineers

Qualifications

  • 4+ years of experience in SRE, DevOps, or Infrastructure Engineering
  • Hands-on experience with AWS (required) and Azure (preferred)
  • Strong Linux system administration skills
  • Experience with monitoring tools (New Relic, Grafana, Prometheus)
  • Scripting skills in Python, Bash, or similar
  • Knowledge of databases (SQL Server, PostgreSQL, MongoDB)

Job Tags

Worldwide
