CARP is a wonderful way to set up a high-availability environment with two firewalls. The OPNsense documentation covers it pretty well, in case you don’t want to annoy your family during upgrades, tests,…
But of course there are limitations: you need a fixed IP for the WAN interface, and it must be assigned statically. While I have a public IP, my provider insists on using DHCP to assign it so that the corresponding routes are announced properly.
While it’s far from perfect, I came up with the following idea, which works well enough to keep the family happy at least:
- forget about setting up CARP on the WAN interface
- ensure the WAN interfaces on both firewalls have the same MAC address
-> I run both firewalls on separate Proxmox environments
-> otherwise my provider needs a few minutes to adjust to the new MAC
-> this leads to a new problem with ARP, but there’s a way around it
- use a script to ensure the backup firewall is acting “correctly”
-> the WAN interface needs to be down (to make ARP happy)
-> RADVD shouldn’t be running (in case you use IPv6); the priority setting never worked for me and the backup host was still used by many clients
-> HAPROXY should be stopped
- set up a default gateway which points to the CARP address of the LAN interface (DHCP always has higher priority)
-> this ensures that the backup firewall still has Internet access
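
Since both firewalls are Proxmox guests, the shared WAN MAC can be pinned on the virtual NIC. This is a minimal sketch and not a definitive recipe: the VM IDs 101 and 102, the bridge name vmbr1 and the MAC address are placeholders, and it assumes the WAN NIC is net0 (which would match vtnet0 as the first VirtIO NIC). The hypothetical helper `print_wan_nic_cmd` only prints the `qm set` commands so they can be reviewed before running them on each Proxmox node.

```shell
#!/bin/sh
# Placeholders (assumptions, not real values): adjust to your environment.
WAN_MAC="02:00:00:AA:BB:01"   # locally administered MAC shared by both WAN NICs
WAN_BRIDGE="vmbr1"            # Proxmox bridge connected to the provider side

# Print the qm command that pins net0 of the given VM to the shared MAC.
# Note: "qm set --net0" replaces the whole net0 definition, so add any other
# options you use (VLAN tag, firewall flag, ...) to the printed command.
print_wan_nic_cmd() {
    vmid="$1"
    printf 'qm set %s --net0 virtio=%s,bridge=%s\n' "$vmid" "$WAN_MAC" "$WAN_BRIDGE"
}

print_wan_nic_cmd 101   # run the printed command on the node hosting the master
print_wan_nic_cmd 102   # run the printed command on the node hosting the backup
```

Both guests must never have the WAN interface up at the same time with this MAC, which is exactly what the CARP hook script is for.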
When I put the master firewall into maintenance mode, I usually lose no pings or perhaps 1-2, and nobody notices anything. The same goes for switching back.
The script below is stored as 30-wan in /usr/local/etc/rc.syshook.d/carp/:
#!/usr/local/bin/php
<?php
require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");
//require_once("plugins.inc.d/openvpn.inc");

// argv[1]: CARP subsystem (contains an '@'), argv[2]: new state
$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';
log_error("STEFF: {$subsystem}");

// only MASTER and BACKUP transitions are handled
if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

// ignore events that do not come from a CARP subsystem
if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

// vtnet0 is the WAN interface
if ($type == 'MASTER') {
    // became master: bring the WAN up and start the services
    //interface_bring_up('vtnet0');
    shell_exec("/usr/local/etc/rc.linkup start vtnet0");
    shell_exec("/sbin/ifconfig vtnet0 up");
    shell_exec("/usr/local/sbin/pluginctl -s radvd start");
    shell_exec("/usr/local/sbin/pluginctl -s haproxy start");
    log_error("STEFF vtnet0, radvd and haproxy up");
    exit(0); // success
}

if ($type == 'BACKUP') {
    // became backup: take the WAN down and stop the services
    //interface_bring_down('vtnet0');
    shell_exec("/usr/local/etc/rc.linkup stop vtnet0");
    shell_exec("/sbin/ifconfig vtnet0 down");
    shell_exec("/usr/local/sbin/pluginctl -s radvd stop");
    shell_exec("/usr/local/sbin/pluginctl -s haproxy stop");
    log_error("STEFF vtnet0, radvd and haproxy down");
    exit(0); // success
}
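
To try the hook without forcing a real failover, it can be called by hand with the same two arguments the script reads from `$argv`. This is a rough sketch, assuming the file is saved under the path above and is executable. The subsystem value `1@vtnet1` is a made-up example of the `vhid@interface` form (the script only checks that it contains an `@`), and `simulate_carp_event` is a hypothetical helper name.

```shell
#!/bin/sh
# Path from the post; set HOOK to point at a copy if you want to test safely.
HOOK="${HOOK:-/usr/local/etc/rc.syshook.d/carp/30-wan}"

# Call the hook the way a CARP state change would: <subsystem> <MASTER|BACKUP>.
simulate_carp_event() {
    state="$1"
    subsystem="${2:-1@vtnet1}"   # assumed example value, must contain '@'
    if [ ! -x "$HOOK" ]; then
        echo "hook $HOOK is missing or not executable (chmod +x it)" >&2
        return 1
    fi
    "$HOOK" "$subsystem" "$state"
}

# Usage on the backup firewall (this really takes the WAN down there):
#   simulate_carp_event BACKUP; ifconfig vtnet0
```

Afterwards the system log should contain the STEFF lines, and `ifconfig vtnet0` should no longer show the interface as up.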
There are still some problems:
- rarely, the master goes offline and the backup kicks in too quickly; this mainly seems to happen when the master doesn’t get a reply right away while checking for a changed WAN address
- sometimes I have to manually disable and re-enable the DHCP IPv6 gateway on the backup to restore its Internet access
-> a reboot also helps here
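
To notice the lost Internet access on the backup before an update fails, a small check can be run there from time to time. This is a minimal sketch, assuming `ping -4` and `ping -6` are available (they should be on recent FreeBSD releases). The targets are example addresses you should replace, and `check_uplink` is a hypothetical helper, not part of OPNsense.

```shell
#!/bin/sh
# Example targets (assumptions): any reliably reachable hosts will do.
V4_TARGET="${V4_TARGET:-1.1.1.1}"
V6_TARGET="${V6_TARGET:-2606:4700:4700::1111}"

# Report per address family whether this host can still reach the Internet.
# Returns non-zero if at least one family fails.
check_uplink() {
    rc=0
    if ping -4 -c 2 "$V4_TARGET" >/dev/null 2>&1; then
        echo "IPv4 ok"
    else
        echo "IPv4 FAILED"
        rc=1
    fi
    if ping -6 -c 2 "$V6_TARGET" >/dev/null 2>&1; then
        echo "IPv6 ok"
    else
        echo "IPv6 FAILED (try re-enabling the DHCPv6 gateway)"
        rc=1
    fi
    return "$rc"
}

# Usage: check_uplink || echo "backup firewall has lost (part of) its uplink"
```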