Revisions of warewulf-nhc
Ana Guerrero (anag+factory) accepted request 1127173 from Christian Goll (mslacken) (revision 2)
- updated to 1.4.3 with following new features:
  * toggle BASH tracing or NHC debugging via SIGUSR1/SIGUSR2, respectively
  * check_nvsmi_healthmon(): New check from CSC for GPU health monitoring via nvidia-smi
  * Provide added detail to tracing info (-x mode)
  * Based on feedback from Moe Jette of SchedMD, pull node job data directly from Slurm via squeue instead of the previous method that only worked for single-node jobs.
  * Support for recent additions to the Slurm node states (e.g., "planned")
  * Pathname expansion has been disabled on startup, and re-enabled only when being actively used, to avoid "unintended" expansions of wildcards at random points throughout the code.
  * Correct clobbering of BASH built-in variables and add tests to prevent future recurrence
  * Switch "system UID" boundary handling to a more accurate source of truth, and ensure that the code matches the math, naming, and intent.
  * Reorder resource manager detection to improve accurate detection, especially with respect to Slurm vs. PBS (all variants)
- removed test-test_lbnl_file.nhc-Put-all-process-substitution.patch
Dominique Leuenberger (dimstar_suse) accepted request 786942 from Christian Goll (mslacken) (revision 1)
node health checker which can be used by slurm