Location: ALEXANDRIA, VA (Onsite)
Duration: Long Term Contract
Essential Skills:
Test lead and senior support engineer experience, some change management experience, and some expertise in DevOps and security.
Role Description:
The Software Support Engineers will be responsible for a comprehensive range of monitoring and checking activities on newly deployed systems. These responsibilities are categorized as follows:
2.1 Pre-Deployment Collaboration – Collaborate with Engineering, Software Development and DevOps teams to understand the architecture and dependencies of the new system.
Review deployment plans and identify potential post-deployment risks.
2.2 Post-Deployment Monitoring and Health Checks – System Health Monitoring: Continuously monitor key system metrics using automated tools or, in some instances, by performing daily checks.
Error Rate Analysis: Monitor application and system logs for errors, with a focus on identifying new or recurring issues post-deployment.
Dependency Verification: Perform checks to ensure that all system dependencies, such as databases, caches, and third-party APIs, are functioning correctly.
Configuration Validation: Verify that all system configurations are correctly applied in the production environment.
Security and Compliance Checks: Conduct initial security and compliance scans to identify any immediate vulnerabilities or misconfigurations.
2.3 Incident Management and Reporting – Act as the first point of contact for alerts and anomalies detected in newly deployed systems.
Perform initial troubleshooting to diagnose and resolve issues.
Escalate complex problems to the appropriate development or operations teams.
Document all incidents, including their resolution, to build a knowledge base.
Provide regular reports on system health and performance to stakeholders.
Ensure the change management process is being followed by the team.
2.4 Automation and Tooling – Utilize and enhance existing monitoring dashboards and alerting systems.
Develop and maintain scripts to automate repetitive monitoring tasks and checks. Python proficiency is highly desirable for this task.
2.5 Knowledge Management and Process Improvement – Review existing operational processes and technical documentation for accuracy, clarity, and completeness.
Identify gaps and author new documentation and standard operating procedures (SOPs).
Propose and contribute to changes in the incident management, monitoring, and deployment processes, and the associated documentation, to improve efficiency and system reliability.
Maintain a centralized and up-to-date knowledge base for supported systems.
Provide an after-action report after each gantry deployment.
Thanks & regards,
Sonu Chauhan
Sr. Technical Recruiter
571-678-0927