Dispatch: Dual WAN Failover on pfSense

Why Dual WAN?

When a new ISP popped up in the neighborhood last year, it caught my attention for a few reasons. Chief among them: it is the only provider willing to run fiber directly to my apartment. Also, it is dirt cheap. They are clearly making a move to capture market share. So, as a man with dorky priorities, I asked myself… ¿Por qué no los dos? (Why not both?)

See, my lab is maturing. I am now hosting multiple services locally that I’ve come to rely on. For example, my primary repository for this website’s source code lives in a locally hosted GitLab environment. The runners that execute my CI/CD pipelines are local too. As I come to rely on services like this more and more, my interest in reliability and access increases. In other words, my motivation isn’t just:

“Hey, why is YouTube buffering?”

It’s also

“Without this I cannot push my code.”

and

“If I’m not on the LAN, I can’t reach my NAS.”

Hardware Requirements

To make this work I needed a couple of pieces of equipment in place.

My existing service is high-speed cable that terminates at a Surfboard modem. The new fiber service adds an ONT to the mix as well.
The key piece of infrastructure is a router that supports multiple WAN connections. I have a Netgate 4100, so it’s just a matter of keeping my interfaces straight so I don’t get confused.

Approach

There are a few ways the router can handle these WAN links. I chose a tiered failover. If one ISP goes down or otherwise becomes unhealthy, traffic switches over to the other. It’s important to me that this is automatic, quick, and inflicts the least amount of pain on the users (hi, it’s me, the user).

In practice it is quite fast, but a failover can still disrupt stateful sessions. If you are connected to your bank when the gateway changes, your public IP changes mid-session, and that will probably introduce friction.

Housekeeping

First things first, I named the interfaces clearly. Welcome to the team, WAN_CABLE and WAN_FIBER! This is one of those small steps that will pay off in the future, time and time again. When I write firewall rules. When I sniff traffic for testing. When I go to set up Dynamic DNS. I want to be certain that I am working with the intended interface.

Concept

Let’s briefly review how this whole thing works. How does pfSense know the health of each WAN link to determine if it should fail over? It routinely monitors both gateways with a program called dpinger. This is a daemon that repeatedly pings a target address. It measures the latency and packet loss and averages those results over a configurable time window. It generates a report that pfSense uses to make failover decisions.
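To make the rolling-average idea concrete, here is a minimal sketch of a monitor that keeps the last few probe results and reports average latency, standard deviation, and loss. It is a toy model, not dpinger’s actual code: the class name `GatewayMonitor`, the window size, and both thresholds are assumptions I picked for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class GatewayMonitor:
    """Toy rolling-window latency monitor (illustrative, not dpinger itself)."""

    def __init__(self, window=60, latency_down_ms=500.0, loss_down_pct=20.0):
        # All three values are assumptions chosen for this example.
        self.samples = deque(maxlen=window)  # only the newest `window` probes are kept
        self.latency_down_ms = latency_down_ms
        self.loss_down_pct = loss_down_pct

    def record(self, rtt_ms):
        """Record one probe; pass None when no reply came back."""
        self.samples.append(rtt_ms)

    def report(self):
        """Summarize the current window into the numbers a failover decision needs."""
        if not self.samples:
            return {"rtt_ms": None, "rttsd_ms": None, "loss_pct": 0.0, "status": "pending"}
        replies = [s for s in self.samples if s is not None]
        loss_pct = 100.0 * (len(self.samples) - len(replies)) / len(self.samples)
        avg = mean(replies) if replies else None
        sd = pstdev(replies) if replies else None
        down = (
            avg is None
            or avg >= self.latency_down_ms
            or loss_pct >= self.loss_down_pct
        )
        return {
            "rtt_ms": avg,
            "rttsd_ms": sd,
            "loss_pct": loss_pct,
            "status": "down" if down else "online",
        }
```

Feeding it `10.0, 12.0, None, 14.0` with `window=4` gives 25% loss, which crosses the assumed 20% threshold and marks the gateway as down even though the replies that did arrive were fast.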

The status of each gateway can be found under Status -> Gateways.

Figure: Gateway monitoring settings in pfSense.

In this case, I have each gateway pinging popular DNS resolvers run by Cloudflare and Google. Rather than stopping at my ISP, I like making sure I can actually reach out to the big boys. We can see that both links are happy. There is no packet loss. Both are showing acceptable latency.

I’d like to draw your attention to RTTsd, the standard deviation of the round-trip time. What we see is that the fiber latency is more consistent, even though the most recent samples look similar.
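A quick way to see why the standard deviation matters: two links can share the same average latency and still feel very different. The sample values below are invented for illustration and are not measurements from my gateways.

```python
from statistics import mean, pstdev

# Hypothetical RTT samples in milliseconds, made up for this example.
cable_rtts = [8.0, 22.0, 9.0, 25.0, 11.0]
fiber_rtts = [14.0, 15.0, 16.0, 15.0, 15.0]


def summarize(rtts):
    """Return (average RTT, standard deviation) for a list of samples."""
    return mean(rtts), pstdev(rtts)


for name, rtts in (("cable", cable_rtts), ("fiber", fiber_rtts)):
    avg, sd = summarize(rtts)
    print(f"{name}: avg={avg:.1f} ms  sd={sd:.1f} ms")
```

Both lists average 15 ms, but the made-up cable samples swing far more from probe to probe. That spread is the jitter a video call or SSH session actually notices.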

This is one of the ways advertised speeds can mislead us. On paper, the cable plan touts higher downstream speeds, but the fiber is consistently better: lower latency, less jitter, and there is no comparison on upload speed.

Configuring the Failover

pfSense applies its failover logic to a Gateway Group. The Gateway Group includes the gateways for both WAN interfaces.

Figure: pfSense gateway group settings. Note that my naming convention could be improved! My cable gateway was created first, so it has a generic name.

When you create the Gateway Group, you make a determination about which ISP is your primary. Here you can see my cable is currently my Tier 1 gateway. When I first configured this I had years of experience with the cable ISP and the fiber was an unknown. I will be changing this now that I am confident in the reliability of both links.
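The tier logic boils down to "use the lowest-numbered tier that still has a healthy gateway." Here is a minimal sketch of that rule. The function name `pick_active_gateways` is hypothetical, and I am using my interface names as stand-ins for the gateway names; this models the behavior and is not how pfSense implements it internally.

```python
def pick_active_gateways(tiers, status):
    """Return the online gateways in the lowest-numbered tier that has any.

    tiers:  dict mapping gateway name -> tier number (1 is most preferred)
    status: dict mapping gateway name -> "online" or "down"
    """
    for tier in sorted(set(tiers.values())):
        online = [
            gw for gw, t in tiers.items()
            if t == tier and status.get(gw) == "online"
        ]
        if online:
            return online
    return []  # every gateway in the group is down


# Stand-in names; the tier numbers mirror the setup described above.
tiers = {"WAN_CABLE": 1, "WAN_FIBER": 2}

print(pick_active_gateways(tiers, {"WAN_CABLE": "online", "WAN_FIBER": "online"}))
print(pick_active_gateways(tiers, {"WAN_CABLE": "down", "WAN_FIBER": "online"}))
```

Swapping the primary later is just a matter of flipping the two tier numbers.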

How do I call home?

One challenge that comes with the dual-WAN territory is figuring out how to reliably call home when you have two active WAN IP addresses but only one is routing traffic. Resolving this is surprisingly easy with Dynamic DNS because pfSense will report the latest IP regardless of which gateway it is associated with.

Validate our work

This is the fun part. It’s our turn to embody chaos. Start unplugging! The moment a WAN interface loses link, the status is updated. Within a couple of seconds of the Tier 1 gateway going down, traffic begins flowing across the other interface. It’s smooth and it just works.
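If you want to put a number on "a couple of seconds," one option is to run a continuous ping from a LAN client while you pull the cable, then measure the gap afterwards. The sketch below assumes you have already collected the results as `(timestamp, ok)` pairs; the function name `longest_outage` and the sample data are hypothetical.

```python
def longest_outage(probes):
    """Return the longest outage in seconds seen in a probe log.

    probes: list of (timestamp_seconds, ok) tuples in time order.
    An outage runs from the first failed probe to the next successful one.
    """
    longest = 0.0
    outage_start = None
    for ts, ok in probes:
        if not ok and outage_start is None:
            outage_start = ts  # first failure of a new outage
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # connectivity is back
    return longest


# Made-up log: one probe per second, two failures while failing over.
sample_log = [(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.0, True), (5.0, True)]
print(longest_outage(sample_log))
```

The resolution is only as good as the probe interval, so a faster ping gives a tighter estimate.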

Future steps

One of my upcoming goals for the lab is implementing a consolidated metrics dashboard. I plan to collect more information about these connections. I’d like to keep an eye on how often we are actually failing over. It’s been working on its own in the background for months, but hey… we aren’t here for “it works,” are we?
