OpenStack Nova error: libvirt cannot get CPU affinity of process 30619

nova-compute would not start, and its log contained the following:

2015-02-11 15:34:05.511 30827 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 187, in doit
2015-02-11 15:34:05.511 30827 TRACE nova     result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-02-11 15:34:05.511 30827 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 147, in proxy_call
2015-02-11 15:34:05.511 30827 TRACE nova     rv = execute(f,*args,**kwargs)
2015-02-11 15:34:05.511 30827 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 76, in tworker
2015-02-11 15:34:05.511 30827 TRACE nova     rv = meth(*args,**kwargs)
2015-02-11 15:34:05.511 30827 TRACE nova   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 2096, in vcpus
2015-02-11 15:34:05.511 30827 TRACE nova     if ret == -1: raise libvirtError ('virDomainGetVcpus() failed', dom=self)
2015-02-11 15:34:05.511 30827 TRACE nova libvirtError: cannot get CPU affinity of process 30619: No such process
2015-02-11 15:34:05.511 30827 TRACE nova

I don't fully understand why this error appears, but my guess is that it is related to the forced reboot of the compute node this morning.

Running virsh list showed a large number of VMs stuck in the "runningshutdown" state, yet no KVM process was actually running on the host.
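
The mismatch described above (domains listed by libvirt, but no hypervisor process behind them) can be checked with a short script. This is only a sketch: the helper name count_qemu_procs is made up for illustration, and it assumes a Linux host where guest processes have a name starting with "qemu" (for example qemu-system-x86_64).

```shell
#!/bin/sh
# Count live processes whose command name starts with "qemu" by reading /proc.
# This is roughly what `pgrep -c qemu` reports, without depending on procps.
# count_qemu_procs is a hypothetical helper name, not part of libvirt or Nova.
count_qemu_procs() {
    n=0
    for f in /proc/[0-9]*/comm; do
        [ -r "$f" ] || continue              # process vanished or glob did not match
        case "$(cat "$f" 2>/dev/null)" in
            qemu*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

echo "live qemu processes: $(count_qemu_procs)"

# Show every domain libvirt believes exists, including ones that are not running.
if command -v virsh >/dev/null 2>&1; then
    virsh list --all
fi
```

If the count is 0 while virsh still lists domains as active, libvirt is holding state for guests that no longer exist, which matches the "No such process" error in the trace.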

I can't say exactly why, but my instinct was that fixing this stale libvirt state would also fix the nova-compute problem.

Solution

After some digging, I found that the directory /var/run/libvirt/qemu/ contained many instance-XXXXX.pid and instance-XXXXX.xml files:

root@compute-02: /var/run/libvirt/qemu# ls
instance-00000093.pid  instance-000002bb.xml  instance-00000307.pid  instance-00000327.xml  instance-00000340.pid
instance-00000093.xml  instance-000002be.pid  instance-00000307.xml  instance-0000032b.pid  instance-00000340.xml
instance-000000ac.pid  instance-000002be.xml  instance-00000322.pid  instance-0000032b.xml  instance-00000348.pid
instance-000000ac.xml  instance-000002c5.pid  instance-00000322.xml  instance-00000333.pid  instance-00000348.xml
instance-00000271.pid  instance-000002c5.xml  instance-00000325.pid  instance-00000333.xml  instance-00000357.pid
instance-00000271.xml  instance-000002c8.pid  instance-00000325.xml  instance-0000033d.pid  instance-00000357.xml
instance-000002bb.pid  instance-000002c8.xml  instance-00000327.pid  instance-0000033d.xml

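These files are supposed to be generated by libvirt itself while an instance is running. My guess is that virsh list reads its state from this directory. Since no VM was running at this point anyway, I simply deleted all of them.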

root@compute-02: /var/run/libvirt/qemu# rm -rf *
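
Deleting everything was safe here only because no guest was running. On a host where some guests might still be alive, a more cautious variant could remove only the files whose recorded PID is dead. The following is a sketch rather than a tested procedure: the function name clean_stale_libvirt_state is made up, and it assumes each .pid file holds a single numeric PID and that the script runs as root.

```shell
#!/bin/sh
# Remove libvirt runtime files only for domains whose recorded PID is gone.
# clean_stale_libvirt_state is a hypothetical helper, not a libvirt tool.
# Usage: clean_stale_libvirt_state [DIR]   (DIR defaults to /var/run/libvirt/qemu)
clean_stale_libvirt_state() {
    dir="${1:-/var/run/libvirt/qemu}"
    for pidfile in "$dir"/*.pid; do
        [ -e "$pidfile" ] || continue        # the glob matched nothing
        pid=$(cat "$pidfile")
        # kill -0 delivers no signal; it only tests whether the PID exists.
        # Run as root, otherwise a live process owned by another user looks dead.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "keeping  $pidfile (process $pid is alive)"
        else
            echo "removing $pidfile and its .xml (process ${pid:-unknown} is gone)"
            rm -f "$pidfile" "${pidfile%.pid}.xml"
        fi
    done
}
```

Calling clean_stale_libvirt_state with no argument works on /var/run/libvirt/qemu. One caveat: after a reboot a recorded PID may have been reused by an unrelated process, in which case this check keeps a file that is in fact stale, so the output is worth reading before restarting libvirt.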

Next, restart libvirt. After the restart, virsh list was back to its normal state, with no stale domains listed.

root@compute-02:~# service libvirt-bin restart
libvirt-bin stop/waiting
libvirt-bin start/running, process 29451
root@compute-02:~# virsh list
 Id    Name                           State
----------------------------------------------------

root@compute-02:~#

After that, starting nova-compute brought it back into service.

root@compute-02:~# service nova-compute start
nova-compute start/running, process 32664
root@compute-02:~# service nova-compute status
nova-compute start/running, process 32664