Debugging Saltstack States

利用 Saltstack 安裝 OpenStack Keystone。按照 OpenStack 官方安裝手冊,裝完 Keystone 之後,必須要為 Keystone 自己註冊 service 與 endpoint。

這一個步驟,我把他寫成 Saltstack states 如下:

waiting-keystone-startup:
  cmd.run:
    - name: sleep 1

registry-keystone-service:
  keystone.service_present:
    - name: keystone
    - service_type: identity
    - description: "OpenStack Identity"
    - require:
      - cmd: waiting-keystone-startup

registry-keystone-endpoint:
  keystone.endpoint_present:
    - name: keystone
    - publicurl: http://host-01:5000/v2.0
    - internalurl: http://host-01:5000/v2.0
    - adminurl: http://host-01:35357/v2.0
    - region: regionOne
    - require:
      - keystone: registry-keystone-service

結果執行時發生 KeyError:

user@host-01:/srv/salt/openstack/keystone$ salt '*' state.sls openstack.keystone.registry
host-01:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1161, in _thread_return
        return_data = func(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/salt/modules/state.py", line 681, in sls
        ret = st_.state.call_high(high_)
      File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2067, in call_high
        ret = dict(list(disabled.items()) + list(self.call_chunks(chunks).items()))
      File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1623, in call_chunks
        running = self.call_chunk(low, running, chunks)
      File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1769, in call_chunk
        self._mod_init(low)
      File "/usr/lib/python2.7/dist-packages/salt/state.py", line 612, in _mod_init
        self.states['{0}.{1}'.format(low['state'], low['fun'])]  # pylint: disable=W0106
      File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 90, in __getitem__
        raise KeyError(key)
    KeyError: 'keystone.service_present'

問題主因

先說結果: Keystone 本身是安裝在其他 node 上,安裝後我沒有去確認 Keystone 是否正常運作。問題的主因在於 Saltstack minion 在執行 keystone.service_present,要先載入 keystone.auth。而 Keystone 並沒有真正運行,造成 module 載入失敗。

解決辦法只要確認 Keystone 有正常運作,就可以避免此問題。


以下是我在追尋問題的過程與一些心得。

通常 Saltatck 發生 KeyError,在過往的經驗,都是帶入了無法錯誤的參數,因此我仔細檢查我的 code。 但是不論是 Saltstack 官方的文件或是看 source code(/usr/lib/python2.7/dist-packages/salt/states/keystone.py),都找不到有任何我帶錯參數的地方。

因此,只能開啟 Debug mode 看看能不能抓到任何蛛絲馬跡.

Trun on Debug mode

編輯 /etc/salt/minion,設定:log_level: debug,然後重新啟動 salt-minion。

接下來我們就要觀察 /var/log/salt/minion

Start debuging

首先,重複執行 job (ex: salt '*' state.sls openstack.keystone.registry )

等待錯誤回報之後,打開 /var/log/salt/minion

Job 執行時,log 的開頭是:

2015-10-16 12:37:17,375 [salt.minion      ][INFO    ][19647] User root Executing command state.sls with jid 20151016043717370854
2015-10-16 12:37:17,375 [salt.minion      ][DEBUG   ][19647] Command details {'tgt_type': 'glob', 'jid': '20151016043717370854', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['openstack.keystone.registry'], 'fun': 'state.sls'}
2015-10-16 12:37:17,380 [salt.minion      ][INFO    ][21811] Starting a new job with PID 21811
2015-10-16 12:37:17,387 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded state.sls
2015-10-16 12:37:17,395 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded saltutil.is_running
2015-10-16 12:37:17,396 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded grains.get
2015-10-16 12:37:17,397 [salt.crypt       ][DEBUG   ][21811] Re-using SAuth for ('/etc/salt/pki/minion', 'host-01', 'tcp://192.168.0.1:4506')
2015-10-16 12:37:17,414 [salt.config      ][DEBUG   ][21811] Reading configuration from /etc/salt/minion

可以看到 Job ID, … 資訊。接著往下看,可以發現,minion 的運作流程:

  1. 驗證
  2. 載入所需要的模組(函式)
2015-10-16 12:37:17,387 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded state.sls
2015-10-16 12:37:17,395 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded saltutil.is_running
2015-10-16 12:37:17,396 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded grains.get
  1. 載入 configuration
2015-10-16 12:37:17,414 [salt.config      ][DEBUG   ][21811] Reading configuration from /etc/salt/minion
2015-10-16 12:37:17,450 [salt.config      ][DEBUG   ][21811] Including configuration from '/etc/salt/minion.d/_schedule.conf'
2015-10-16 12:37:17,450 [salt.config      ][DEBUG   ][21811] Reading configuration from /etc/salt/minion.d/_schedule.conf
2015-10-16 12:37:17,451 [salt.config      ][DEBUG   ][21811] Including configuration from '/etc/salt/minion.d/keystone.conf'
2015-10-16 12:37:17,451 [salt.config      ][DEBUG   ][21811] Reading configuration from /etc/salt/minion.d/keystone.conf
  1. 從 master 取得所需的檔案
2015-10-16 12:37:17,645 [salt.fileclient  ][DEBUG   ][21811] Fetching file from saltenv 'base', ** attempting ** 'salt://openstack/keystone/registry.sls'
2015-10-16 12:37:17,649 [salt.fileclient  ][INFO    ][21811] Fetching file from saltenv 'base', ** done ** 'openstack/keystone/registry.sls'
  1. 從檔案, piller value, grains value 中,render data
2015-10-16 12:37:17,662 [salt.template    ][DEBUG   ][21811] Rendered data from file: /var/cache/salt/minion/files/base/openstack/keystone/registry.sls:
waiting-keystone-startup:
  cmd.run:
    \- name: sleep 1

registry-keystone-service:
  keystone.service\_present:
    \- service_type: identity
    \- description: "OpenStack Identity"
    \- require:
      \- cmd: waiting-keystone-startup

registry-keystone-endpoint:
  keystone.endpoint\_present:
    - name: keystone
    - publicurl: http://host-01:5000/v2.0
    - internalurl: http://host-01:5000/v2.0
    - adminurl: http://host-01:35357/v2.0
    - region: regionOne
    - require:
      - keystone: registry-keystone-service
  1. 接下來就開始依照順序/相依性,先 load module/function,再執行 job。
2015-10-16 12:37:17,669 [salt.utils.lazy  ][DEBUG   ][21811] LazyLoaded cmd.run
2015-10-16 12:37:17,669 [salt.state       ][INFO    ][21811] Running state [sleep 1] at time 12:37:17.669452
2015-10-16 12:37:17,670 [salt.state       ][INFO    ][21811] Executing state cmd.run for sleep 1
2015-10-16 12:37:17,670 [salt.loaded.int.module.cmdmod][INFO    ][21811] Executing command 'sleep 1' in directory '/root'
2015-10-16 12:37:18,674 [salt.state       ][INFO    ][21811] {'pid': 21834, 'retcode': 0, 'stderr': '', 'stdout': ''}
2015-10-16 12:37:18,675 [salt.state       ][INFO    ][21811] Completed state [sleep 1] at time 12:37:18.674880

透過這樣一連串的步驟,就發現了問題的所在。 按照我寫的 states, cmd.run 執行完之後,就要執行 keystone.service_present。 在 log 中看到接下來發生的錯誤:

2015-10-16 12:37:18,727 [salt.loader      ][DEBUG   ][21811] Error loading module.tls: ['PyOpenSSL version 0.14 or later must be installed before  this module can be used.']
2015-10-16 12:37:18,818 [salt.loader      ][DEBUG   ][21811] Error loading module.nacl: libnacl import error, perhaps missing python libnacl package
2015-10-16 12:37:18,920 [salt.loader      ][DEBUG   ][21811] Error loading module.npm: npm execution module could not be loaded because the npm binary could not be located
2015-10-16 12:37:18,943 [salt.utils.lazy  ][DEBUG   ][21811] Could not LazyLoad keystone.auth

以及

2015-10-16 12:37:19,081 [salt.loader      ][DEBUG   ][21811] Error loading states.npm: 'npm' binary not found on system
2015-10-16 12:37:19,087 [salt.utils.lazy  ][DEBUG   ][21811] Could not LazyLoad keystone.service_present
2015-10-16 12:37:19,087 [salt.minion      ][WARNING ][21811] The minion function caused an exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1161, in _thread_return
    return_data = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/salt/modules/state.py", line 681, in sls
    ret = st_.state.call_high(high_)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2067, in call_high
    ret = dict(list(disabled.items()) + list(self.call_chunks(chunks).items()))
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1623, in call_chunks
    running = self.call_chunk(low, running, chunks)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1769, in call_chunk
    self._mod_init(low)
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 612, in _mod_init
    self.states['{0}.{1}'.format(low['state'], low['fun'])]  # pylint: disable=W0106
  File "/usr/lib/python2.7/dist-packages/salt/utils/lazy.py", line 90, in __getitem__
    raise KeyError(key)
KeyError: 'keystone.service_present'

Log 中出現了三個 error :

解決了以上三個問題之後,再次執行了同樣的 job,雖然已經沒有上面三個錯誤,但結果還是得到 KeyError。 後來才發現是因為 Keystone 並沒有正常運行,造成 keystone.auth 載入失敗,keystone.service_present 也因此連帶無法載入。


Reference