What is failover for business continuity?
Network disruptions caused by cyberattacks, natural disasters, and other challenges are a big risk to business operations. What can you do to protect your business from this risk? When you have a failover strategy, the answer is “plenty.” Failover helps you avoid system downtime and the revenue loss that goes with it. But what is failover?
What is failover?
Failover is the process of automatically switching to a redundant backup system when the primary equipment or system fails.
Failover systems monitor software, networks, and hardware (including servers, storage devices, network switches, and routers) and initiate the failover process when any of the system components fail or show signs of reduced performance. When systems are in failover mode, applications continue to function normally or close to normal. For you, the transition should be seamless, with zero to minimal downtime.
Failover happens in a data center or in the cloud. Changing a network system configuration, like adding a server, can also trigger failover.
As an example, imagine that a hurricane causes a power outage and the data center’s server shuts down. Failover, which can be automatic or manually activated, switches production and data backup to a data center in another geographical location.
Failover versus redundancy
Failover and redundancy are similar, but not the same. A successful failover requires redundant backup systems that can perform the same functions as the primary system. Simply stated, redundancy is a characteristic of the backup system.
Failover versus failback
While failover switches operations to a backup component or system, failback returns processing to the primary device or system after the problem is resolved.
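The relationship between the two operations can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Service` class and the site names `dc-east` and `dc-west` are hypothetical and exist only to show the direction of each switch.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Tracks which system is currently serving traffic (hypothetical model)."""
    primary: str
    backup: str
    active: str = field(init=False)

    def __post_init__(self) -> None:
        # Traffic starts on the primary system.
        self.active = self.primary

    def failover(self) -> str:
        """Switch processing to the backup system."""
        self.active = self.backup
        return self.active

    def failback(self) -> str:
        """Return processing to the primary once the problem is resolved."""
        self.active = self.primary
        return self.active


svc = Service(primary="dc-east", backup="dc-west")
svc.failover()  # primary fails: traffic moves to dc-west
svc.failback()  # primary repaired: traffic returns to dc-east
```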
Why failovers are needed
Without failover, potentially thousands of devices, applications, and servers that are connected to your network cannot function. Not only are your internal operations interrupted but you risk not being able to service your customers, which can be costly to your reputation and your revenue.
Backing up data to the cloud is not enough. You must back up your network operations as well. The need for failover is increased by several factors. First, using software and equipment for several operational tasks multiplies the negative effect of a single component’s failure. For example, many of us experienced the vulnerability of our network systems firsthand during the July 2024 worldwide outage. A flawed update to widely used security software crashed approximately 8.5 million Windows devices. Some businesses came to a standstill. The outage grounded flights, interrupted banking, shut down hospital appointment systems, and took news broadcasts off the air. In short, it wreaked havoc.
Second, the growing ability of bad actors to exploit technologies and cybersecurity vulnerabilities has increased the likelihood of sophisticated security breaches and ransomware attacks. In the event of a security lapse, failover is crucial. For example, if a ransomware attack shuts down a server, failover to another server can help keep your business operational.
But note the challenge: the bug or other issue that enabled the attack might be in the backup as well. It’s imperative to identify the issue through a comprehensive security evaluation so that failover does not replicate the problem.
How does failover work?
The failover process begins with continuous monitoring of the primary network and its components (such as servers, routers, or internet connections). For example, servers can monitor each other using heartbeat cables: Ethernet cables that connect the primary and backup servers and maintain communication between them. The backup server monitors the “pulse” on the primary server. If the pulse stops or becomes irregular, the backup server takes over. The backup server then automatically sends a message to the data center or technician so they know that they need to bring the primary server back online.
Note that administrators can manually start failover mode for scheduled maintenance or a security issue.
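The heartbeat check and the manual option can be sketched as follows. This is a simplified illustration under stated assumptions: the `HeartbeatMonitor` class, the three-second timeout, and the alert list (a stand-in for notifying the data center or a technician) are all hypothetical, and a real cluster would receive heartbeats over a network link rather than through a method call.

```python
import time
from typing import Callable, List


class HeartbeatMonitor:
    """Backup-side monitor that takes over when the primary's pulse goes quiet."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s      # assumed threshold, tune per system
        self.clock = clock              # injectable so the sketch is testable
        self.last_beat = clock()
        self.backup_active = False
        self.alerts: List[str] = []

    def record_heartbeat(self) -> None:
        """Called whenever a pulse arrives from the primary."""
        self.last_beat = self.clock()

    def check(self) -> bool:
        """Trigger failover if no pulse arrived within the timeout."""
        silent_for = self.clock() - self.last_beat
        if not self.backup_active and silent_for > self.timeout_s:
            self._take_over("heartbeat lost")
        return self.backup_active

    def manual_failover(self, reason: str) -> None:
        """Administrator-initiated failover, e.g. for scheduled maintenance."""
        if not self.backup_active:
            self._take_over(reason)

    def _take_over(self, reason: str) -> None:
        self.backup_active = True
        # Stand-in for messaging the data center or a technician.
        self.alerts.append(f"failover triggered: {reason}")


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: now[0])
now[0] = 2.0
monitor.record_heartbeat()   # pulse received at t=2
now[0] = 5.5                 # 3.5 seconds of silence
monitor.check()              # backup takes over and records an alert
```

Using a timeout rather than a single missed pulse is a common way to avoid failing over on a brief network hiccup; the right threshold depends on how much downtime the system can tolerate.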
Redundant equipment, software, networks, and servers are crucial for successful failover. When we talk about redundant servers, our discussion leads us to the term, “failover cluster.”
What is a failover cluster?
A failover cluster is two or more independent servers working together.
When a server fails, the system automatically sends the data processing workload to another server in the cluster.
For example, your failover cluster could have two servers – a primary and a backup. These servers would run the resources and applications for the cluster. The servers communicate with one another and with the users across the network. A storage system is typically shared between servers that are part of a cluster.
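As a rough sketch of how a cluster might move work off a failed server, the example below tracks which node runs each workload and reassigns the affected workloads when a node is marked as failed. The `FailoverCluster` class, the least-loaded placement rule, and the node and workload names are assumptions made for illustration; real cluster software uses its own placement policies and relies on shared storage to keep data reachable.

```python
from typing import Dict, List


class FailoverCluster:
    """Toy model of a cluster that reassigns workloads when a node fails."""

    def __init__(self, nodes: List[str]) -> None:
        self.healthy: List[str] = list(nodes)
        self.assignments: Dict[str, str] = {}  # workload -> node

    def _load(self, node: str) -> int:
        # Number of workloads currently placed on this node.
        return sum(1 for n in self.assignments.values() if n == node)

    def assign(self, workload: str) -> str:
        """Place a workload on the healthy node with the fewest workloads."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes left in the cluster")
        node = min(self.healthy, key=self._load)
        self.assignments[workload] = node
        return node

    def mark_failed(self, node: str) -> List[str]:
        """Remove a node and move its workloads to the surviving nodes."""
        self.healthy.remove(node)
        moved = [w for w, n in self.assignments.items() if n == node]
        for workload in moved:
            self.assign(workload)
        return moved


cluster = FailoverCluster(["primary", "backup"])
cluster.assign("database")
cluster.assign("web")
cluster.mark_failed("primary")  # workloads on "primary" move to "backup"
```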
Failover clusters provide continuous or high availability (HA). A continuous availability failover cluster eliminates data loss and downtime when the primary system fails. High availability clusters, on the other hand, also prevent data loss, but you may experience minimal downtime.
You can scale your network configuration to include more servers as needed. The failover strategy you choose is also based on your needs.
Types of failover strategies
Just as the number of servers in a failover cluster affects your network configuration, the failover strategy you choose also affects the configuration. There are different types of failover depending on how you want to implement the failover process. Which strategy your IT team chooses can be based on a number of factors: how many sites your business is supporting, your system’s criticality, the failover complexity, cost, resources required, user-facing service considerations (such as latency) and more. Popular failover strategies include:
Active-active
With an active-active cluster, two or more servers simultaneously run a service. The servers are set up identically so they can perform the same tasks. Active-active failover distributes workloads evenly across devices.
If there’s an outage, downtime is virtually zero since both servers are active. Active-active provides the optimal user experience and is a common system configuration for high availability failover.
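A minimal sketch of active-active routing is shown below, assuming simple round-robin distribution. The `ActiveActivePool` class and the server names are hypothetical; production load balancers typically add health checks, weighting, and session handling.

```python
from typing import List


class ActiveActivePool:
    """Round-robin routing across servers that are all active."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self._next = 0

    def route(self) -> str:
        """Return the server that should handle the next request."""
        if not self.servers:
            raise RuntimeError("no active servers available")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def mark_failed(self, server: str) -> None:
        """Stop routing to a failed server; the others keep serving."""
        self.servers.remove(server)


pool = ActiveActivePool(["server-a", "server-b"])
first_four = [pool.route() for _ in range(4)]  # alternates between servers
pool.mark_failed("server-a")
after_failure = pool.route()                   # only server-b remains
```

Because the remaining server was already handling live traffic, there is no standby to start up, which is why the switch can be close to instantaneous.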
Active-passive
With an active-passive cluster, there’s also a minimum of two identically configured servers running each service, but the backup device is inactive. The standby server is available if the primary server fails. You’ll connect to the active server—unless there’s an outage. With an active-passive configuration, outage time can be longer because of the time it takes to switch from one device to another. Active-passive is a popular high-availability configuration when reliability, simplicity, and cost-efficiency are important considerations.
Since the active node of an active-passive cluster handles all client requests, an active-passive cluster won’t be able to provide the same level of throughput and response time as an active-active HA cluster.
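The throughput difference can be made concrete with a back-of-the-envelope model. The function below is a simplification under stated assumptions: every node is assumed to handle the same number of requests per second, the figures are hypothetical, and switchover time is ignored.

```python
def serving_capacity(mode: str, nodes: int, per_node_rps: int,
                     failed: int = 0) -> int:
    """Requests per second a cluster can serve under a simplified model."""
    if nodes < 2:
        raise ValueError("a failover cluster needs at least two nodes")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    surviving = nodes - failed
    if mode == "active-active":
        # Every surviving node serves traffic.
        return surviving * per_node_rps
    if mode == "active-passive":
        # Only one node serves traffic at a time.
        return per_node_rps if surviving >= 1 else 0
    raise ValueError(f"unknown mode: {mode}")


# Two nodes, each assumed to handle 1,000 requests per second.
healthy_aa = serving_capacity("active-active", 2, 1000)             # 2000
healthy_ap = serving_capacity("active-passive", 2, 1000)            # 1000
degraded_aa = serving_capacity("active-active", 2, 1000, failed=1)  # 1000
degraded_ap = serving_capacity("active-passive", 2, 1000, failed=1) # 1000
```

Under this model, active-active offers more capacity while all nodes are healthy but drops to the capacity of the surviving nodes after a failure, so normal load should be sized with that in mind. Active-passive capacity stays the same before and after the switch.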
Hot failover
Hot failover is like active-active. The difference is that an active-active configuration can include multiple data centers. Hot failover is the configuration that’s used when you want to distribute workloads evenly across devices and there are only two data centers.
Cold failover
Earlier we defined failover as the process of automatically switching to a redundant backup system, but there is a manual option. With cold failover, an operator manually brings the backup system online when the primary system fails. Because this takes time, there’s downtime and the possibility of data loss. The advantage of cold failover is that it requires fewer resources and is much less expensive.
Strategy summary
Choosing the best failover strategy depends on many factors. Two major factors are your budget and the cost of downtime, measured not only in lost revenue and production but also in your system’s criticality: the negative impact a shutdown would have on users.
SD-WAN role in failover
One other type of failover strategy is an SD-WAN failover. Software-Defined Wide Area Networking (or SD-WAN) is a Wide Area Network (WAN) architecture that’s virtual. It uses software to configure your network architecture to help keep your network reliable and secure. Through a centralized control function, it directs traffic across your WAN to connect users to applications, which are in a data center or the cloud.
In SD-WAN failover, there are multiple connections across your network. When a failover is triggered, the SD-WAN intelligently scans those connections to determine the best path for routing data traffic, while managing bandwidth to keep your network up and running.
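Path selection of this kind can be sketched as choosing the healthiest available link. The example is illustrative only: the `Link` fields, the 2% packet-loss threshold, and the link names and measurements are assumptions, and commercial SD-WAN products apply their own, more sophisticated policies.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Link:
    """One WAN connection and its latest measurements (hypothetical fields)."""
    name: str
    up: bool
    latency_ms: float
    loss_pct: float


def best_path(links: Iterable[Link], max_loss_pct: float = 2.0) -> Optional[Link]:
    """Pick the usable link with the lowest latency, then the lowest loss."""
    usable = [l for l in links if l.up and l.loss_pct <= max_loss_pct]
    if not usable:
        return None  # no healthy path is available
    return min(usable, key=lambda l: (l.latency_ms, l.loss_pct))


links = [
    Link("fiber", up=False, latency_ms=8.0, loss_pct=0.0),      # primary is down
    Link("broadband", up=True, latency_ms=20.0, loss_pct=5.0),  # too lossy
    Link("lte", up=True, latency_ms=45.0, loss_pct=0.5),
]
chosen = best_path(links)  # selects the "lte" link
```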
Summary
The prevalence of cybercrime means that you need to prepare for bad actors, bad weather, and everything in between that could interrupt your server connections. Failover helps you avoid the potentially catastrophic results of a total shutdown. You may have thousands of devices, applications, and servers at risk. For this reason, failover is more than a “nice to have”; it’s a necessity. Instead of production loss, revenue loss, and customer dissatisfaction, your business can continue to operate.
By duplicating your data center infrastructure and cloud availability, you’ll help ensure your applications and data are available and secure. Failover begins with continuous monitoring of your software, networks, and hardware. When the primary equipment or system fails, you’ll automatically switch to the duplicate or redundant backup system.
It cannot be overstated: for business continuity, a sound failover strategy is essential to the future of your business.
Learn more about AT&T Business solutions at business.att.com or specifically about business continuity services and disaster recovery.
To connect with an expert who knows business, contact your AT&T Business representative.