CARP with DHCP (kind of…)

CARP is a wonderful way to set up a high-availability environment with two firewalls, in case you don’t want to annoy your family during upgrades, tests, and so on. The OPNsense documentation covers the setup pretty well.

But of course there are limitations: you need a fixed IP for the WAN interface, and it must be assigned statically. While I do have a public IP, my provider insists on assigning it via DHCP so that the corresponding routes get propagated.

It’s far from perfect, but I came up with the following idea, which works well enough to keep the family happy at least:

  1. forget about setting up CARP on the WAN interface
  2. ensure the WAN interfaces on both firewalls have the same MAC address
    -> I run both firewalls on separate Proxmox environments
    -> otherwise my provider needs a few minutes to adjust to the new MAC
    -> this leads to a new problem with ARP, but there’s a way around it (step 3)
  3. use a script to ensure the backup firewall behaves “correctly”
    • the WAN interface needs to be down (keeps ARP happy)
    • radvd shouldn’t be running (in case you use IPv6)
      the priority setting never worked for me, and many clients still used the backup host
    • HAProxy should be stopped
  4. set up a default gateway that points to the CARP address of the LAN interface (the DHCP gateway always has higher priority)
    -> this ensures that the backup firewall still has Internet access
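The reasoning behind step 4 can be illustrated with a simplified model of default-gateway selection. This is a sketch, not OPNsense’s gateway code: it assumes, as in the OPNsense gateway settings, that a lower priority number means more important and that only online gateways are considered. The gateway names WAN_DHCP and LAN_CARP and their priorities are hypothetical.

```php
<?php
// Simplified model of step 4, NOT OPNsense's actual gateway code.
// Assumption: a lower priority number means "more important"
// (as in the OPNsense gateway settings), and only online gateways count.

/**
 * Pick the default gateway: the online gateway with the lowest
 * priority number. Returns null if no gateway is online.
 *
 * @param array<int, array{name: string, priority: int, online: bool}> $gateways
 */
function pick_default_gateway(array $gateways): ?string
{
    $best = null;
    foreach ($gateways as $gw) {
        if (!$gw['online']) {
            continue; // e.g. the DHCP gateway while the WAN is down on the backup
        }
        if ($best === null || $gw['priority'] < $best['priority']) {
            $best = $gw;
        }
    }
    return $best === null ? null : $best['name'];
}

// Hypothetical gateway names and priorities, for illustration only.
$gateways = [
    ['name' => 'WAN_DHCP', 'priority' => 1,   'online' => true],
    ['name' => 'LAN_CARP', 'priority' => 255, 'online' => true],
];

echo pick_default_gateway($gateways), PHP_EOL; // master: WAN_DHCP

$gateways[0]['online'] = false;                // backup: WAN is down
echo pick_default_gateway($gateways), PHP_EOL; // backup: LAN_CARP
```

On the master the DHCP gateway wins; on the backup the WAN interface is down, so the gateway pointing at the LAN CARP address (that is, at the master) takes over.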

When I put the master firewall into maintenance mode, I usually lose no pings, or perhaps one or two, and nobody notices any impact. The same goes for switching back.

The script below is saved as 30-wan in /usr/local/etc/rc.syshook.d/carp/:

#!/usr/local/bin/php
<?php

// CARP syshook: keep the WAN side of the backup firewall quiet.
// Arguments: CARP subsystem ("<vhid>@<interface>") and the new state.

require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");

// WAN device; both firewalls share its MAC address
$wan_if = 'vtnet0';

$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

log_error("30-wan: CARP event '{$type}' from '{$subsystem}'");

if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

if ($type == 'MASTER') {
    // Became master: bring the WAN back up and start the services
    // only the active firewall should run.
    shell_exec("/usr/local/etc/rc.linkup start {$wan_if}");
    shell_exec("/sbin/ifconfig {$wan_if} up");
    shell_exec("/usr/local/sbin/pluginctl -s radvd start");
    shell_exec("/usr/local/sbin/pluginctl -s haproxy start");
    log_error("30-wan: {$wan_if}, radvd and haproxy up");
} else {
    // Became backup: take the WAN down so the shared MAC doesn't confuse ARP,
    // and stop radvd/haproxy so clients keep using the master.
    shell_exec("/usr/local/etc/rc.linkup stop {$wan_if}");
    shell_exec("/sbin/ifconfig {$wan_if} down");
    shell_exec("/usr/local/sbin/pluginctl -s radvd stop");
    shell_exec("/usr/local/sbin/pluginctl -s haproxy stop");
    log_error("30-wan: {$wan_if}, radvd and haproxy down");
}

// Success; exit(1) is reserved for the rejected events above
exit(0);
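A minimal sketch, assuming you want to check the decision logic without touching any interface: the MASTER/BACKUP handling of the script is pulled out into a pure function that returns the commands instead of running them. wan_commands() is a hypothetical helper; the default interface vtnet0 and the command strings mirror the script above, and 1@vtnet1 is a made-up CARP subsystem (VHID 1 on a hypothetical LAN interface vtnet1).

```php
<?php
// Dry-run sketch of the 30-wan decision logic as a pure function.
// wan_commands() is a hypothetical helper, not part of OPNsense;
// the command strings mirror the ones in the 30-wan script.

/**
 * Return the commands to run for a CARP event, or null when the event
 * should be ignored (unknown type or a subsystem without "@").
 *
 * @return string[]|null
 */
function wan_commands(string $subsystem, string $type, string $wanIf = 'vtnet0'): ?array
{
    if ($type !== 'MASTER' && $type !== 'BACKUP') {
        return null; // e.g. INIT
    }
    if (strpos($subsystem, '@') === false) {
        return null; // CARP subsystems look like "<vhid>@<interface>"
    }
    if ($type === 'MASTER') {
        // Bring the WAN back and start the services the backup must not run.
        return [
            "/usr/local/etc/rc.linkup start {$wanIf}",
            "/sbin/ifconfig {$wanIf} up",
            '/usr/local/sbin/pluginctl -s radvd start',
            '/usr/local/sbin/pluginctl -s haproxy start',
        ];
    }
    // BACKUP: take the WAN down (shared MAC) and stop the services.
    return [
        "/usr/local/etc/rc.linkup stop {$wanIf}",
        "/sbin/ifconfig {$wanIf} down",
        '/usr/local/sbin/pluginctl -s radvd stop',
        '/usr/local/sbin/pluginctl -s haproxy stop',
    ];
}

// Dry run: print what a BACKUP event on the hypothetical "1@vtnet1" would execute.
foreach (wan_commands('1@vtnet1', 'BACKUP') ?? [] as $cmd) {
    echo $cmd, PHP_EOL;
}
```

In a hook you could call it as wan_commands($argv[1] ?? '', $argv[2] ?? '') and pass each returned command to shell_exec(); keeping the decision separate from the side effects is what makes the dry run possible.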

There are still some problems:

  • rarely, the master goes offline and the backup kicks in too quickly; this mainly seems to happen when the master doesn’t get a reply right away while checking for a changed WAN address
  • sometimes I have to manually disable and re-enable the DHCP IPv6 gateway on the backup to restore its Internet access
    -> a reboot also helps here