KVM QEMU Guest VMs randomly lose network connection

Posted on 16 Aug 2022.

Answers (2)

manpreet · Best Answer · 2 years ago

I'm working on setting up a server with KVM/QEMU and all Linux servers. We are going to use this server for web development, git, VoIP PBX, etc. (We were using XenServer and Windows Server 2016, but I'm a Linux fan.) I've come across some issues with virtual machines seemingly randomly losing network connection or going to sleep or something like that. I can't seem to pin down what the issue is.

I've looked through a lot of forums and posts, even here on Server Fault, but nothing quite fits what I'm trying to do. I'll attach an image below of our network setup. We have 2 locations and a VPN between them, with firewalls. The machine in question is a Dell PowerEdge R710. I've successfully installed Ubuntu 18.10 and KVM/QEMU on it as the host OS (18.10 because of an issue with Virtual Machine Manager not showing all network connections on 18.04). I use virt-manager to install and monitor new VMs from my laptop (Dev Computer 1) over ssh.

I have 6 guest VMs all installed with either Ubuntu 18.04 or Debian 9 (our VoIP PBX) and they all work great except for the occasional network hiccup. All are connected through a bonded bridge in the host machine (including the host itself). There are 4 NICs all bonded and I've used the bond as an interface for the bridge. I'm using netplan for the network configuration and I'll post the config yaml below. I'm using static IP configurations for all the guest VMs that simply set an IP for the default "ens3" interface through netplan, but I can post that too if it will help.
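One thing worth checking, purely as a troubleshooting sketch: the netplan config below sets no bonding mode, and when none is set the kernel default is balance-rr, which needs matching switch-side port configuration; a mismatch there can look exactly like random, self-healing connectivity loss. This snippet (using the standard Linux bonding procfs path, with bond0 matching the config in this post) shows what the kernel actually negotiated:

```shell
#!/bin/sh
# Confirm which bonding mode the kernel is actually running for bond0.
# /proc/net/bonding/<name> is the standard Linux bonding status file.
if [ -r /proc/net/bonding/bond0 ]; then
    # Mode, link state, and per-slave status are the interesting lines
    grep -E 'Bonding Mode|MII Status|Slave Interface' /proc/net/bonding/bond0
else
    echo "bond0 not present on this machine"
fi
```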

Some interesting things I've noticed:

  1. I can always ssh into the host machine; it never seems to lose connection.
  2. When one of the 6 machines loses network connection, I can still ssh into it from the host machine, but it will sometimes hang for a bit while reestablishing connection.
  3. If I ssh into the offending VM from the host and do a ping to the gateway (firewall) it will snap out of it and we can connect to it again.
  4. Occasionally the guest VMs will be unable to see each other, but if I ssh into whichever can't see the other and run a ping it will usually start working after a few "Destination Host Unreachable" messages.
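Symptoms 2-4 look consistent with bridge forwarding-table or ARP entries aging out and only being relearned once the guest itself transmits (hence the ping "snapping it out of it"). As a hedged diagnostic sketch, these are the host-side tables one could inspect while a guest is unreachable (br0 is the bridge from the netplan config; the guest MACs to look for come from `virsh domiflist <vm>`):

```shell
#!/bin/sh
# Diagnostic sketch for a "guest went quiet" event, run on the KVM host.
ip -br link show          # carrier/state of every host interface at a glance
ip neigh show             # host ARP cache: STALE/FAILED entries are suspects
bridge fdb show || true   # learned MAC table; a guest MAC missing here fits
                          # the "ping from inside snaps it back" symptom
bridge link show || true  # per-port bridge state (should all be forwarding)
```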

I can get any other command outputs or logs that would be necessary to further diagnose this, and I'd really appreciate anyone who may know more about this looking into it. I'm a huge Linux fan, and want this to work the way I know it can, but these random disconnects are not making this solution look very good. Thanks to any who take time to read this!

Network Map (diagram not reproduced in this copy)

Host machine netplan configuration:

network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: false
            dhcp6: false
        eno2:
            dhcp4: false
            dhcp6: false
        eno3:
            dhcp4: false
            dhcp6: false
        eno4:
            dhcp4: false
            dhcp6: false
    bonds:
        bond0:
            interfaces:
                - eno1
                - eno2
                - eno3
                - eno4
            addresses: [192.168.5.20/24]
            dhcp4: false
            gateway4: 192.168.5.1
            nameservers:
                addresses: [192.168.1.6,1.1.1.1]
    bridges:
        br0:
            addresses: [192.168.5.21/24]
            dhcp4: false
            gateway4: 192.168.5.1
            nameservers:
                addresses: [192.168.1.6,1.1.1.1]
            interfaces:
                - bond0
manpreet · 2 years ago

I have an almost identical configuration currently in production. Ubuntu 18.04+KVM/QEMU on an R710 and I have not experienced this issue.

While it's possible that it's a difference of Ubuntu versions, with you being on 18.10, or an actual hardware issue you're having, the only notable difference I see in this configuration is the bond - which I am not using. My bridge configuration looks like the one below:

    bridges:
        br0:
            dhcp4: yes
            interfaces:
                - eno1

It's only using eno1, as that's the only interface with a cable running to it. It may be worthwhile, purely for troubleshooting purposes, to try a similar configuration to see if it resolves the issue.

If that is the issue, the thing that sticks out to me as potentially flawed in your configuration is the redundant addressing on the bond and the bridge. Once bond0 is enslaved to br0, the address, gateway, and nameservers should live on the bridge alone; giving the bond its own address and a second default gateway leaves the host with two routes out on the same subnet, which can cause exactly this sort of intermittent flapping. Try setting all of these on the bridge only and leaving the bond bare.
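As a sketch of that single-owner layout (reusing the addresses from the original post; the explicit bond mode is an assumption, since the original config sets none):

```yaml
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1: {dhcp4: false, dhcp6: false}
        eno2: {dhcp4: false, dhcp6: false}
        eno3: {dhcp4: false, dhcp6: false}
        eno4: {dhcp4: false, dhcp6: false}
    bonds:
        bond0:
            interfaces: [eno1, eno2, eno3, eno4]
            parameters:
                mode: active-backup   # assumption: safe without switch-side config
            # no addresses/gateway here; the bridge owns the addressing
    bridges:
        br0:
            interfaces: [bond0]
            addresses: [192.168.5.21/24]
            gateway4: 192.168.5.1
            nameservers:
                addresses: [192.168.1.6, 1.1.1.1]
            dhcp4: false
```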

Lastly, considering we appear to be on near-identical hardware, it may be worth running some sort of test on the VM host to confirm that the network card itself is not bad.
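For that hardware check, one hedged starting point is ethtool on each physical port (interface names follow the netplan config in the question; exact counter names vary by driver, so treat this as a sketch):

```shell
#!/bin/sh
# Basic health checks on each physical NIC of the host.
for nic in eno1 eno2 eno3 eno4; do
    [ -e "/sys/class/net/$nic" ] || continue
    echo "== $nic =="
    # Negotiated link state: a port flapping between speeds is a red flag
    ethtool "$nic" | grep -E 'Speed|Duplex|Link detected'
    # Show only non-zero error/drop counters from the driver statistics
    ethtool -S "$nic" | grep -iE 'err|drop' | grep -v ': 0$' || true
done
echo "NIC check complete"
```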

Hope this helps!

