Avahi is the equivalent of Apple’s “Bonjour” zeroconf network service. It installs by default with the ubuntu-desktop meta-package, which I generally use to get, you guessed it, a full desktop on virtualization host servers. This never caused me any issues until today.
Today, though – on a server with dual network interfaces, both used as bridge ports on its br0 adapter – Avahi apparently decided “screw the configuration you specified in /etc/network/interfaces, I’m going to give your production virt host bridge an autoconf address. Because I want to be helpful.”
When it did so, the host dropped off the network, I got alarms on my monitoring service, and I couldn’t so much as arp the host, much less log into it. So I drove down to the affected office and did an ifconfig br0, which showed me the following damning bit of evidence:
me@box:~$ ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
          inet6 addr: fe80::20a:e4ff:feae:7e4c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:96 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3927 (3.8 KB)  TX bytes:6970 (6.8 KB)
br0:avahi Link encap:Ethernet  HWaddr 00:0a:e4:ae:7e:4c
          inet addr:169.254.6.229  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
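The giveaway in that output is the 169.254.x.x address, which falls in the IPv4 link-local range (169.254.0.0/16). As a minimal sketch, a check like the following could be wired into a monitoring script to flag the condition. The function names `is_link_local` and `find_link_local` are made up for illustration and are not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any package).

# Succeed (exit 0) if the given IPv4 address is in 169.254.0.0/16.
is_link_local() {
    case "$1" in
        169.254.*.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Read interface output on stdin (old-style `ifconfig` or `ip -4 addr show`)
# and print any link-local address that appears as the interface's own
# "inet" address. Broadcast addresses and inet6 lines are ignored.
find_link_local() {
    sed -nE 's/.*inet (addr:)?(169\.254\.[0-9]+\.[0-9]+).*/\2/p'
}
```

Usage would be along the lines of `ifconfig br0:avahi | find_link_local` or `ip -4 addr show br0 | find_link_local`; any output at all means the interface has picked up an autoconf address.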
Oh, Avahi, you son-of-a-bitch. Was there anything wrong with the actual NIC? Certainly didn’t look like it – had link lights on the NIC and on the switch, and sure enough, ifdown br0 ; ifup br0 brought it right back online again.
Can we confirm that avahi really was the culprit?
/var/log/syslog:Jan 9 09:10:58 virt0 avahi-daemon[1357]: Withdrawing address record for [redacted IP] on br0.
/var/log/syslog:Jan 9 09:10:58 virt0 avahi-daemon[1357]: Leaving mDNS multicast group on interface br0.IPv4 with address [redacted IP].
/var/log/syslog:Jan 9 09:10:58 virt0 avahi-daemon[1357]: Interface br0.IPv4 no longer relevant for mDNS.
/var/log/syslog:Jan 9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Found user 'avahi-autoipd' (UID 111) and group 'avahi-autoipd' (GID 121).
/var/log/syslog:Jan 9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully called chroot().
/var/log/syslog:Jan 9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Successfully dropped root privileges.
/var/log/syslog:Jan 9 09:10:59 virt0 avahi-autoipd(br0)[12460]: Starting with address 169.254.6.229
/var/log/syslog:Jan 9 09:11:03 virt0 avahi-autoipd(br0)[12460]: Callout BIND, address 169.254.6.229 on interface br0
/var/log/syslog:Jan 9 09:11:03 virt0 avahi-daemon[1357]: Joining mDNS multicast group on interface br0.IPv4 with address 169.254.6.229.
/var/log/syslog:Jan 9 09:11:03 virt0 avahi-daemon[1357]: New relevant interface br0.IPv4 for mDNS.
/var/log/syslog:Jan 9 09:11:03 virt0 avahi-daemon[1357]: Registering new address record for 169.254.6.229 on br0.IPv4.
/var/log/syslog:Jan 9 09:11:07 virt0 avahi-autoipd(br0)[12460]: Successfully claimed IP address 169.254.6.229
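For anyone who wants to pull just the smoking gun out of a noisy syslog, here is a rough sketch that extracts the interface and address from avahi-autoipd "Successfully claimed" lines like the one above. The function name `autoipd_claims` is hypothetical, and the pattern is only based on the log format shown in this post, so other versions may log differently.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog text on stdin and prints
# "<interface> <address>" for every avahi-autoipd claim it finds.
# The pattern assumes the message format seen in the log excerpt above.
autoipd_claims() {
    sed -nE 's/.*avahi-autoipd\(([^)]+)\)\[[0-9]+\]: Successfully claimed IP address ([0-9.]+).*/\1 \2/p'
}
```

Something like `grep avahi-autoipd /var/log/syslog | autoipd_claims` would then list each interface that was handed a link-local address, one per line.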
I know I said this already, but – oh, avahi, you worthless son of a bitch!
Next step was to kill it and disable it.
me@box:~$ sudo stop avahi-daemon
me@box:~$ echo manual | sudo tee /etc/init/avahi-daemon.override
Grumble grumble grumble. Now I’m just wondering why I’ve never had this problem before… I suspect it’s something to do with having dual NICs on the bridge, and one of them not being plugged in (I only added them both so it wouldn’t matter which one actually got plugged in if the box ever got moved somewhere).
Thanks, I was having a lot of trouble with this one. I was losing my br0 config about once a day, around when the lease would expire. I wasn’t sure which stupid network thing was doing it: systemd-resolved, networkd-dispatcher, network-manager, netplan? It started happening recently, when I upgraded from 18.04 to 20.04. Maybe I can find the upgrade log and confirm that it installed and/or activated Avahi at that time. For now, they seem to deny that it is Avahi’s fault: https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1586528
I’m not running network-manager now, so I know it’s not that. It was the first thing I removed; I think it was changing my MAC on br0, which is weird. I also don’t get an address in resolved from dhclient, and had to set that manually in the resolved conf file, /etc/systemd/resolved.conf. My guesses are 1) something to do with running multiple dhclient instances, or 2) something coming from resolved.
May 31 02:24:25 HOSTNAME avahi-daemon[3505]: Registering new address record for 169.254.5.81 on br0.IPv4.
May 31 02:24:25 HOSTNAME avahi-daemon[3505]: New relevant interface br0.IPv4 for mDNS.
May 31 02:24:25 HOSTNAME avahi-daemon[3505]: Joining mDNS multicast group on interface br0.IPv4 with address 169.254.5.81.
May 31 02:24:20 HOSTNAME systemd[1]: Finished resolvconf-pull-resolved.service.
May 31 02:24:20 HOSTNAME systemd[1]: resolvconf-pull-resolved.service: Succeeded.
May 31 02:24:20 HOSTNAME systemd[1]: Starting resolvconf-pull-resolved.service…
May 31 02:24:20 HOSTNAME systemd[1]: Started Network Name Resolution.
May 31 02:24:20 HOSTNAME systemd-resolved[2958315]: Using system hostname 'Tiamat'.
May 31 02:24:20 HOSTNAME systemd-resolved[2958315]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.i>
May 31 02:24:20 HOSTNAME systemd-resolved[2958315]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
May 31 02:24:20 HOSTNAME systemd-resolved[2958315]: Positive Trust Anchors:
May 31 02:24:19 HOSTNAME systemd[1]: Starting Network Name Resolution…
May 31 02:24:19 HOSTNAME systemd[1]: Stopped Network Name Resolution.
May 31 02:24:19 HOSTNAME systemd[1]: systemd-resolved.service: Succeeded.
May 31 02:24:19 HOSTNAME systemd[1]: Stopping Network Name Resolution…
The lack of a resolved address from dhclient probably has to do with this AppArmor error, which I thought was supposed to be fixed:
May 31 04:35:10 HOSTNAME audit[2997513]: AVC apparmor="DENIED" operation="sendmsg" profile="/{,usr/}sbin/dhclient" name="/log" pid=2997513 comm="dhclient" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1413232
Probably unrelated to the actual IP change though.
Arch has an interesting note: https://wiki.archlinux.org/index.php/Systemd-resolved
“Note: If Avahi has been installed, consider disabling avahi-daemon.service and avahi-daemon.socket to prevent conflicts with systemd-resolved.”
Disabling those seems to have fixed the problem (I set the lease to one minute so I could check over several cycles). Re-enabling those services causes it to change IP again, and it didn’t even log it that time. You definitely set me on the right track, so thanks again.
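For reference, on a systemd-based release the disabling step could be scripted roughly like this. It is only a sketch: the unit names come from the Arch wiki note quoted above, while the `run` and `disable_avahi` helpers and the `DRY_RUN` variable are invented here so the command can be previewed before touching a production host.

```shell
#!/bin/sh
# Units named in the Arch wiki note; adjust if your distro names them differently.
AVAHI_UNITS="avahi-daemon.socket avahi-daemon.service"

# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop the units now and keep them from starting at boot.
disable_avahi() {
    # Word splitting of $AVAHI_UNITS is intentional: one argument per unit.
    run sudo systemctl disable --now $AVAHI_UNITS
}
```

Running `DRY_RUN=1 disable_avahi` prints the systemctl command without executing it; calling `disable_avahi` on its own actually stops and disables both units.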