25/03/15

Linux Cluster - Debugging Resource Failures

If a resource fails to start and there's nothing obvious in the logs (search for "lrmd", "LRM operation", etc.), you can try starting it manually to diagnose the problem further. The same approach works for failed stop and monitor operations.

First, unmanage the resource so that Pacemaker won't try to do anything with it:

# crm resource unmanage <resource>
Configure environment:
# export OCF_ROOT=/usr/lib/ocf
# export OCF_RESKEY_<param>=<value>
# ... (likewise for all other resource parameters; run
       "crm configure show <resource>" to check which
       params you need to set here)
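When a resource has many parameters, the exports can be generated instead of typed one by one. The following is a minimal sketch, not part of the original procedure: the helper name export_reskeys and the example parameters (ip, cidr_netmask) are hypothetical and stand in for whatever "crm configure show <resource>" reports.

```shell
# Hypothetical helper: export each name=value argument as OCF_RESKEY_<name>.
export_reskeys() {
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first '='
        value=${pair#*=}    # text after the first '='
        export "OCF_RESKEY_${name}=${value}"
    done
}

export OCF_ROOT=/usr/lib/ocf
# Example values only; substitute the real parameters of your resource.
export_reskeys ip=192.168.1.10 cidr_netmask=24

# Show what the resource agent will see in its environment.
env | grep '^OCF_'
```

Running it in the same shell session that later invokes the RA keeps the variables visible to the agent.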
Run the op:
# /usr/lib/ocf/resource.d/heartbeat/<ra> start ; echo $? 
Look for helpful error messages, and check the return code.
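The return code is easier to read with its OCF name next to it. The sketch below maps the standard OCF exit codes to their names; the function name ocf_rc_name is made up for this example, and the mapping covers only the generic codes 0 to 9.

```shell
# Map an OCF resource agent exit code to its standard name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Typical use right after running an op:
#   /usr/lib/ocf/resource.d/heartbeat/<ra> start ; ocf_rc_name $?
ocf_rc_name 5
```

For instance, a start that returns 5 (OCF_ERR_INSTALLED) usually points at a missing binary or package on that node rather than a wrong parameter, which would more likely show up as 2 or 6.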
If that doesn't help, try using sh -x or bash -x to see exactly what the RA is doing. Do a stop first just in case, then try the start again:
# /usr/lib/ocf/resource.d/heartbeat/<ra> stop
# sh -x /usr/lib/ocf/resource.d/heartbeat/<ra> start ; echo $?
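The -x trace goes to stderr and can be long, so it may help to save it to a file and read it afterwards. The sketch below shows the idea with a tiny stand-in script in place of a real resource agent; the stand-in and the temporary file names are assumptions for demonstration only, and in practice RA would point at /usr/lib/ocf/resource.d/heartbeat/<ra>.

```shell
# Stand-in for a resource agent, for demonstration only.
RA=$(mktemp)
cat > "$RA" <<'EOF'
#!/bin/sh
case "$1" in
    start) echo "starting"; exit 0 ;;
    *)     exit 3 ;;
esac
EOF

TRACE=$(mktemp)
# sh -x writes its trace to stderr; redirect it to a file for later reading.
sh -x "$RA" start 2> "$TRACE"
rc=$?
echo "return code: $rc"

# The last traced commands are usually the ones closest to the failure.
tail -n 5 "$TRACE"
```

The return code has to be captured immediately after the traced command, before any other command overwrites $?.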
Once you've figured out what the problem is and solved it, give the resource back to Pacemaker:
# crm resource manage <resource>

Ref: http://clusterlabs.org/wiki/Debugging_Resource_Failures
