TekChronicles
Lead Site Reliability Engineer
Jersey City, NJ
$70.00 Per Hour (Employer provided)
Easy Apply
We are looking for someone who can go deep into application architecture, workflows, design, code behavior, infrastructure dependencies, and production support……
24h
ATG USA
3.3
DevOps Engineer
Remote
Easy Apply
You will focus on modular design and remote state management to keep our multi-cloud footprint organized and reproducible.…
24h
iRhythm Technologies
3.3
Sr. Integration Engineer II
San Francisco, CA
$138K - $180K (Employer provided)
Easy Apply
Bachelor's degree or Master's degree in Computer Science, Engineering, or related technical field, or 10 years of equivalent experience. What You Will Be Doing.…
24h
Beyond Finance
3.6
Senior DevOps Engineer
Chicago, IL
$120K - $160K (Employer provided)
Easy Apply
Automate infrastructure provisioning and configuration using tools like Terraform or CloudFormation. We leverage best in class tools, and press to eliminate……
24h
FanDuel
3.8
Senior Observability Engineer
Atlanta, GA
$149K - $186K (Employer provided)
Easy Apply
Good communication and collaboration skills, with the ability to work effectively with both technical and non-technical stakeholders.…
24h
SRE Building Associates
Construction Project Manager (3+ yrs. exp req)
Vail, CO
$100K (Employer provided)
Easy Apply
Company-provided cell phone and iPad. SRE Building Associates is seeking an experienced *Project Manager* to oversee construction projects from initial……
24h
Vantor
3.1
GIS DevOps Engineer
Herndon, VA
$145K - $215K (Employer provided)
Easy Apply
Degree or equivalent demonstrated experience in a technical field. CompTIA Security+ or comparable certification for privileged user access.…
30d+
Ron Turley Associates
4.1
Database Engineer/Architect
Glendale, AZ
$155K - $165K (Employer provided)
Easy Apply
Bachelor's Degree not required but preferred, especially in computer science or a related field. I'm the Manager of Infrastructure, Cloud Operations and SRE at……
7d
Navy Federal Credit Union
3.6
Platform Engineer
Pensacola, FL
$78K - $123K (Employer provided)
BS degree in Computer Science or a related technical field or equivalent experience. Virtual Server Engineering uses Site Reliability Engineering (SRE)……
24h
CardWorks
3.2
Lead Site Reliability Engineer
Pittsburgh, PA
$146K - $162K (Employer provided)
Easy Apply
Participates in incident and problem management by serving as incident coordinator for high-severity events, driving cross-functional responses, conducting……
30d+
William O'Neil Securities
Senior Monitoring & Observability Engineer, Los Angeles
Los Angeles, CA
$115K - $125K (Employer provided)
Easy Apply
Please answer yes or no and briefly describe your experience with incident response, outage troubleshooting, escalation support, root cause analysis, or post-……
13d
United States Cold Storage
3.5
Software Development Manager - Modernization
Camden, NJ
$160K - $180K (Employer provided)
Easy Apply
You will collaborate with architecture, site reliability, QA, and platform teams to drive alignment, contribute to US Cold's technical standards, and own team……
30d+
Sterling St. James
4.0
Director of Cloud Operations
Wauwatosa, WI
$165K (Employer provided)
Easy Apply
Up to $165,000 Base Salary • Milwaukee, WI Area • Hybrid (3 Days Onsite / 2 Days Remote) Join a global pioneer in industrial machine software and AI automation……
20d
InsureMyTrip
Principal Software Engineer
Warwick, RI
$131K - $197K (Employer provided)
Easy Apply
Manage and develop a small team of engineers, driving accountability, ownership, and consistent delivery standards.…
30d+
Epitec
3.3
Cloud Database Engineer
Dearborn, MI
$65.00 - $75.00 Per Hour (Employer provided)
Easy Apply
This role extends beyond traditional database administration, acting as a technical leader and pattern owner to shape the vision and roadmap while embedding AI/……
4d
ASRC Federal Holding Company
3.7
Junior Platform Engineer
Dayton, OH
$74K - $118K (Glassdoor est.)
Easy Apply
Experience: Minimum 1 year of professional experience in DevOps, platform engineering, infrastructure operations, or related field; recent graduates with……
30d+
William O'Neil Securities
Senior Monitoring & Observability Engineer, Los Angeles
Plano, TX
$110K - $120K (Employer provided)
Easy Apply
Please answer yes or no and briefly describe your experience with incident response, outage troubleshooting, escalation support, root cause analysis, or post-……
13d
System One
3.8
Site Reliability Engineer (SRE)
McLean, VA
$107K - $152K (Glassdoor est.)
Easy Apply
Experience mentoring technical teams or helping promote DevOps/SRE practices across engineering groups. Support system design reviews to identify reliability……
24h
Synchrony
4.1
AVP, Reliability Engineer - OnePay
Alpharetta, GA
$100K - $170K (Employer provided)
Best Led Companies
Experience driving reliability improvements through resiliency patterns, performance tuning, and operational readiness practices in partner-integrated……
20d
Navy Federal Credit Union
3.6
Platform Engineer
Vienna, VA
$78K - $123K (Employer provided)
BS degree in Computer Science or a related technical field or equivalent experience. Virtual Server Engineering uses Site Reliability Engineering (SRE)……
24h
HTC Global Services
3.2
Senior Site Reliability Engineer – Observability, SLOs & Kubernetes Reliability
Celebration, FL
$92K - $136K (Glassdoor est.)
Easy Apply
Contribute to chaos engineering and resilience testing efforts using tools such as Gremlin, Harness Chaos Engineering, or similar.…
24h
Axle Informatics
3.6
Computer Programmer
Rockville, MD
$75K - $80K (Employer provided)
Easy Apply
With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower……
30d+
Tentek
4.1
Linux Systems Engineer
Glendale, CA
$95.00 - $100.00 Per Hour (Employer provided)
Easy Apply
Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications at scale.…
30d+
Tentek
4.1
Senior Site Reliability Engineer
Glendale, CA
$95.00 - $100.00 Per Hour (Employer provided)
Easy Apply
Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications at scale.…
30d+
The Maxis Group
3.8
DevOps Engineer
Austin, TX
$104K - $151K (Glassdoor est.)
Easy Apply
We need to hire a DevOps engineer to scale our deployments across AWS, Azure, GCP, Oracle Cloud, and on-premises environments. Infrastructure as code with Helm.…
30d+
System One
3.8
Senior Data Dog Cloud Engineer (Observability)
United States
$63.00 Per Hour (Employer provided)
Easy Apply
Instrument applications and services using agents, OpenTelemetry, and language-specific APM tooling (Java, . 8+ years in infrastructure/platform engineering (or……
6d
Prosper.com
2.7
Sr. Site Reliability Engineer
San Francisco, CA
$163K - $203K (Employer provided)
Easy Apply
Bachelor's degree in a technical field, or equivalent work experience. Strong written communication — your documentation will be consumed by humans and AI……
30d+
VoltaGrid
3.8
DevOps & Site Reliability Engineer
Houston, TX
$107K - $159K (Glassdoor est.)
Easy Apply
You'll work closely with engineering teams to build scalable, observable, and resilient infrastructure while driving a culture of operational excellence.…
30d+
Software Resources
4.6
Senior Software Engineer
Lake Buena Vista, FL
$103K - $143K (Glassdoor est.)
Easy Apply
2-5 years of experience in software engineering or related technical field. Implement technical design for an enterprise-scale network automation / AI platform……
15d
CGI
3.7
Forward Deployed Engineer
Arlington, VA
$90K - $218K (Employer provided)
This is a client-facing engineering role for builders who can translate ambiguous needs into working software, iterate quickly with end users, and drive……
21d

TekChronicles

Lead Site Reliability Engineer

Jersey City, NJ

$70.00 Per Hour (Employer provided)


Lead Site Reliability Engineer

We are looking for a highly experienced Lead Site Reliability Engineer to drive SRE outcomes for business-critical applications in the Risk Technology space. This role requires a strong application, infrastructure, and engineering mindset, with the ability to work closely with application support, development, observability, and technology teams to improve reliability, resiliency, operational readiness, and automation maturity.

The ideal candidate will be responsible for defining and enabling SRE goals, establishing reliability requirements, evaluating SLAs, SLOs, SLIs, and error budgets, identifying critical user journeys, and ensuring that business-critical applications are supported with the right monitoring, alerting, automation, and operational practices.

This is not a traditional DevOps role. We are looking for someone who can go deep into application architecture, workflows, design, code behavior, infrastructure dependencies, and production support challenges, while helping teams mature their reliability engineering practices.

Key Responsibilities

Lead SRE enablement for business-critical applications across Risk Technology.
Partner closely with application support, development, infrastructure, and observability teams to improve reliability and resiliency.
Define SRE priorities, goals, standards, and measurable outcomes for application teams.
Establish and evaluate SLAs, SLOs, SLIs, error budgets, and service health indicators.
Identify and document critical user journeys, application dependencies, failure points, and recovery expectations.
Drive observability improvements by ensuring the right monitors, alerts, dashboards, logs, traces, and metrics are in place.
Review application architecture, workflows, design patterns, and production support processes to identify reliability gaps.
Support code-level analysis, code review discussions, and engineering recommendations from an SRE perspective.
Improve incident management, post-incident reviews, root cause analysis, runbooks, and operational readiness.
Build automation using Python, Ansible, and Terraform to reduce manual effort and improve operational efficiency.
Leverage Amazon/AWS products, including AI-based solutions, to improve SRE efficiency, automation, monitoring, and operational outcomes.
Help application teams adopt industry-standard SRE practices inspired by mature engineering organizations.
Work hands-on with teams to improve production stability, resiliency, scalability, and supportability.
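
As a rough illustration of the SLI, SLO, and error budget evaluation mentioned in the responsibilities above, the arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions, not the employer's method: the function name `error_budget`, the request-based availability SLI, and the sample numbers are all hypothetical.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute a request-based availability SLI and the remaining error budget.

    All names and the request-based SLI definition are illustrative assumptions.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: fraction of requests that succeeded in the measurement window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in the same window.
    allowed_failures = (1 - slo_target) * total_requests
    # Fraction of the budget still unspent; negative means the budget is exhausted.
    remaining_fraction = 1 - failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_fraction": remaining_fraction,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO target.
result = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(result)
```

With a 99.9% target, roughly 1,000 failures per million requests are tolerated, so 250 failures would leave about three quarters of the budget unspent.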

Required Skills and Experience

Minimum 10 years of experience in SRE, production engineering, application reliability, infrastructure engineering, or related technology roles.
Strong understanding of SRE principles, including SLIs, SLOs, SLAs, error budgets, toil reduction, incident management, and reliability engineering.
Deep experience supporting business-critical applications in production environments.
Strong application architecture knowledge with the ability to understand design, workflows, dependencies, and failure scenarios.
Hands-on experience with Python automation.
Hands-on experience with Ansible and Terraform automation.
Strong knowledge of observability practices, including metrics, logs, traces, dashboards, alerting, and service health monitoring.
Ability to partner with application support and development teams to improve reliability from both operational and engineering perspectives.
Strong understanding of cloud, infrastructure, networking, databases, middleware, and application runtime environments.
Experience reviewing code, supporting code quality discussions, and identifying reliability risks in application changes.
Strong problem-solving skills with the ability to deep dive into complex technical issues.
Excellent communication skills with the ability to translate technical risks into business-impacting outcomes.

Preferred Qualifications

Experience in financial services, banking, risk technology, regulatory platforms, or other high-criticality environments.
Exposure to AWS/Amazon services and AI-enabled automation or operational intelligence capabilities.
Experience with Prometheus, Grafana, OpenTelemetry, Splunk, Datadog, Dynatrace, New Relic, or similar observability platforms.
Knowledge of Kubernetes, OpenShift, containers, CI/CD pipelines, and modern distributed systems.
Experience building reliability scorecards, operational readiness reviews, service maturity assessments, and production support standards.
Strong understanding of resiliency patterns, failover, disaster recovery, capacity planning, and performance engineering.

Ideal Candidate Profile

The ideal candidate is a hands-on SRE leader who can think like an engineer, operate like a production owner, and partner like a trusted advisor to application teams. They should be comfortable going deep into application behavior, understanding business workflows, challenging reliability gaps, and enabling practical SRE outcomes that improve stability, resiliency, and operational excellence.

Pay: $70.00 per hour

Work Location: In person



3,115 Engineer - SRE jobs in United States
