Get an overview of the steps we take to help keep your platform up and running
At Expansive, we know reliable IT operations are the backbone of our platform’s stability and your business continuity. We follow best practices to ensure seamless day-to-day operations, including regular backups, monitoring scheduled jobs, and a robust incident response process.
Key Components of Our IT Operations
1. Regular Backups
Data integrity and availability are critical to our service. To safeguard your data, we take regular backups as part of our IT operations strategy.
- Backup Frequency: We perform automated backups daily, including all critical customer data and system configurations.
- Backup Storage: Backups are securely stored in multiple, geographically distributed, encrypted environments to ensure resilience against data loss.
- Testing Backups: We routinely test our backup restoration processes to confirm data can be reliably restored if needed. Each client's data is included in a restoration test at least annually.
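As an illustration only, one common way to confirm that a restored file matches what was backed up is to compare checksums. The sketch below assumes file-level backups and uses hypothetical helper names (sha256_of, restore_matches); it does not describe the tooling we actually run.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read 64 KiB at a time so large backup files do not fill memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """Hypothetical check: True when the restored copy is byte-identical."""
    return sha256_of(original) == sha256_of(restored)
```

A restoration test built this way would restore a backup to a scratch location and treat any checksum mismatch as a failed test.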
See more details of backups here.
2. Monitoring Scheduled Jobs
Scheduled jobs, such as automated tasks and data processing workflows, play a vital role in the smooth operation of our platform. We use monitoring tools to track these tasks in real time.
- Job Types Monitored: Includes database maintenance, report generation, email delivery, and system health checks.
- Automated Monitoring: All scheduled jobs are tracked through monitoring tools that alert us to delays, failures, or abnormal execution times.
- Performance Metrics: We capture key metrics like execution duration, resource utilization, and error rates to identify and resolve issues proactively.
Handling Issues in Scheduled Jobs
If a scheduled job fails, our system generates an alert, and our IT team investigates the issue immediately. Based on severity, the issue is escalated to ensure timely resolution.
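To make the three alert conditions mentioned above (delays, failures, and abnormal execution times) concrete, here is a minimal sketch. The JobRun record, the check_job function, and the threshold values (a 300-second delay limit and a 2x duration factor) are all assumptions chosen for illustration, not our production configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRun:
    """Hypothetical record of one scheduled job execution."""
    name: str
    scheduled_start: float          # epoch seconds
    actual_start: Optional[float]   # None if the job has not started
    duration: Optional[float]       # seconds; None if not finished
    succeeded: Optional[bool]       # None if not finished


def check_job(run: JobRun, now: float, typical_duration: float,
              max_delay: float = 300.0, slow_factor: float = 2.0) -> List[str]:
    """Return the alert reasons for a run; an empty list means healthy."""
    alerts: List[str] = []
    if run.actual_start is None:
        # Job never started: alert only once it is overdue.
        if now - run.scheduled_start > max_delay:
            alerts.append("delayed")
        return alerts
    if run.actual_start - run.scheduled_start > max_delay:
        alerts.append("delayed")
    if run.succeeded is False:
        alerts.append("failed")
    if run.duration is not None and run.duration > slow_factor * typical_duration:
        alerts.append("abnormal duration")
    return alerts
```

In this sketch, a non-empty result would be what triggers an alert, and the list of reasons could feed the severity decision used for escalation.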
3. Incident Response Process
Despite robust systems, incidents such as job failures or system anomalies may occur. Our incident response process, which complements our Disaster Recovery plans, is designed to address these situations efficiently and transparently.
Our Incident Response Workflow:
- Detection: Automated monitoring tools identify and log incidents in real time.
- Prioritization: Incidents are classified based on impact and urgency, ensuring critical issues are resolved first.
- Communication: Customers are promptly notified of incidents affecting their operations, including progress updates and expected resolution times.
- Resolution: The IT team investigates root causes, applies fixes, and monitors the system to confirm normal operations.
- Post-Incident Review: After resolution, we conduct a review to understand the cause and prevent recurrence, documenting outcomes for continuous improvement.
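The prioritization step above can be pictured as a simple impact and urgency matrix. The sketch below is an assumption for illustration: the Level scale, the P1 to P4 labels, and the scoring rule are hypothetical and are not our published classification scheme.

```python
from enum import IntEnum
from typing import List, Tuple


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to a hypothetical priority label."""
    score = impact + urgency  # ranges from 2 (low/low) to 6 (high/high)
    if score >= 5:
        return "P1"
    if score == 4:
        return "P2"
    if score == 3:
        return "P3"
    return "P4"


def triage(incidents: List[Tuple[str, Level, Level]]) -> List[str]:
    """Order incident names so the highest-priority ones come first."""
    ranked = sorted(incidents, key=lambda item: priority(item[1], item[2]))
    return [name for name, _, _ in ranked]
```

For example, under these assumed rules a high-impact, high-urgency outage is labeled P1 and is placed ahead of a low-impact, low-urgency cosmetic issue labeled P4.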
Example Incident Handling
If a scheduled backup fails, our monitoring system generates an alert. The incident is immediately escalated to our IT team, who:
- Assess the failure's root cause (e.g., connectivity issue or disk space).
- Take corrective action to rerun the backup or resolve the issue.
- Verify backup integrity and provide updates to affected stakeholders.
How This Benefits You
Our comprehensive approach to IT operations ensures:
- Data Safety: Regular, verified backups safeguard your business-critical information.
- Operational Stability: Proactive monitoring of scheduled jobs minimizes disruptions.
- Fast Recovery: Incidents are handled swiftly to restore services with minimal downtime.
By combining proactive monitoring, robust backup practices, and a structured incident response process, we maintain a secure, reliable, and resilient platform to support your operations.