Revisions of slurm

Dominique Leuenberger (dimstar_suse) accepted request 1076522 from Egbert Eich (eeich) (revision 88)
- updated to 23.02.1 with the following changes:
  * job_container/tmpfs - cleanup job container even if namespace mount is
    already unmounted.
  * openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
    associations fails.
  * Fix nodes remaining as PLANNED after slurmctld save state recovery.
  * Add cgroup.conf EnableControllers option for cgroup/v2.
  * Get correct cgroup root to allow slurmd to run in containers like Docker.
  * slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
    requests originated from 'scontrol show step container-id=<id>' or certain
    scrun operations when container state can't be directly queried.
  * Fix nodes un-draining after being drained due to unkillable step.
  * Fix remote licenses allowed percentages reset to 0 during upgrade.
  * sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
    the --parsable option.
  * Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
  * openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
    CPUs than tasks when requesting multiple tasks.
  * Fix job not being scheduled on valid nodes and potentially being rejected
    when using parentheses at the beginning of square brackets in a feature
    request, for example: "feat1&[(feat2|feat3)]".
  * Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
    longer enforce optimal core-gpu job placement.
  * mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
    lib path.
  * data_parser/v0.0.39 - fix regression where "memory_per_node" would be
    rejected for job submission.
  * data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
    rejected for job submission.
  * slurmctld - add an assert to check for magic number presence before deleting
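The sacct entry above concerns pipe-delimited output from the --parsable option. As a minimal sketch of how such output can be consumed, the helper below splits rows on '|' and drops the trailing delimiter that --parsable adds to each row. The function name and the sample text are hypothetical and are not taken from Slurm.

```python
def parse_parsable(text):
    """Parse '|'-delimited text (header row first) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    def split_row(row):
        # Each row ends with a trailing '|'; drop it before splitting.
        if row.endswith("|"):
            row = row[:-1]
        return row.split("|")

    header = split_row(lines[0])
    return [dict(zip(header, split_row(row))) for row in lines[1:]]


# Hypothetical sample, shaped like pipe-delimited accounting output.
SAMPLE = (
    "JobID|Start|End|\n"
    "123|2023-03-01T10:00:00|2023-03-01T10:05:00|\n"
)
```

Calling parse_parsable(SAMPLE) yields one dict per job row, keyed by the header fields, so time strings can be read back whole regardless of their length.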
Dominique Leuenberger (dimstar_suse) accepted request 1072592 from Christian Goll (mslacken) (revision 87)
added: right-pmix-path.patch (forwarded request 1072591 from mslacken)
Dominique Leuenberger (dimstar_suse) accepted request 1072087 from Christian Goll (mslacken) (revision 86)
- slurm-plugins need to require pmix-pluginlib (bsc#1209260) (forwarded request 1072084 from mslacken)
Dominique Leuenberger (dimstar_suse) accepted request 1070214 from Egbert Eich (eeich) (revision 85)
- Fixing dependencies for slurm-plugin-ext-sensors-rrd again. (forwarded request 1070212 from eeich)
Dominique Leuenberger (dimstar_suse) accepted request 1068523 from Egbert Eich (eeich) (revision 84)
- Add missing Provides: and Obsoletes: to slurm-cray, slurm-hdf5
  and slurm-testsuite to avoid package conflicts.
- Add dependency for the general plugin package to the
  AcctGatherProfile HDF5 plugin.
- Adjust node RealMemory in slurm.conf of test suite for 8G test
  nodes. (forwarded request 1068522 from eeich)
Dominique Leuenberger (dimstar_suse) accepted request 1068320 from Egbert Eich (eeich) (revision 83)
- updated to 23.02.0
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was the
      behavior in cgroup/v2 but not in cgroup/v1, where by default ownership of
      the step directories was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after nodes
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously was only happening when
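The highlights above advise switching log-rotation scripts from SIGHUP to SIGUSR2. A minimal sketch of such a script step follows; the helper names are hypothetical, and the PID file location is an assumption that depends on the local SlurmctldPidFile setting.

```python
import os
import signal


def read_pid(pidfile_path):
    """Read a process ID from a PID file (the path is site-specific)."""
    with open(pidfile_path) as fh:
        return int(fh.read().strip())


def request_log_rotation(pid):
    """Send SIGUSR2 to the given process instead of SIGHUP."""
    os.kill(pid, signal.SIGUSR2)
```

A logrotate postrotate hook could then call request_log_rotation(read_pid("/var/run/slurmctld.pid")), where that path is only an example and should be checked against the actual configuration.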
Dominique Leuenberger (dimstar_suse) accepted request 1063957 from Egbert Eich (eeich) (revision 82)
- testsuite: on later SUSE versions claim ownership of directory
  /etc/security/limits.d. (forwarded request 1063954 from eeich)
Dominique Leuenberger (dimstar_suse) accepted request 1031255 from Egbert Eich (eeich) (revision 80)
- Test Suite fixes:
  * Update README_Testsuite.md.
  * Clean up left over files when de-installing test suite.
  * Adjustment to test suite package: for SLE mark the openmpi4
    devel package and slurm-hdf5 optional.
  * Add -ffat-lto-objects to the build flags when LTO is set to
    make sure the object files we ship with the test suite still
    work correctly.
  * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes.

- set environment variable SUSE_ZNOW to 0 in %build to avoid module load
  failures due to unresolved symbols as modules take advantage of lazy
  bindings (bsc#1200030).
Dominique Leuenberger (dimstar_suse) accepted request 1030432 from Egbert Eich (eeich) (revision 79)
- updated to 22.05.5
- NOTE: Slurm validates that libraries are of the same version. Unfortunately,
  due to an oversight, we failed to notice that the slurmstepd loads the
  hash_k12 library only after a job has completed. This means that if the
  hash_k12 library is upgraded before a job finishes, the slurmstepd will load
  the new library when the job finishes, and will fail due to a mismatch of
  versions.  This results in nodes with slurmstepd processes stuck
  indefinitely. These processes require manual intervention to clean up. There
  is no clean way to resolve these hung slurmstepd processes.
  The only recommended way to upgrade between minor versions of 22.05 with
  RPMs or upgrades that replace current binaries and libraries is to drain the
  nodes of running jobs first.
- Fixes a number of moderate severity issues, notable are:
  * Load hash plugin at slurmstepd launch time to prevent issues loading the
    plugin at step completion if the Slurm installation is upgraded.
  * Update nvml plugin to match the unique id format for MIG devices in new
    Nvidia drivers.
  * Fix multi-node step launch failure when nodes in the controller aren't in
    natural order. This can happen with inconsistent node naming (such as
    node15 and node052) or with dynamic nodes which can register in any order.
  * job_container/tmpfs - cleanup containers even when the .ns file isn't
    mounted anymore.
  * Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or timeout. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts. (forwarded request 1010642 from mslacken)
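The upgrade note above recommends draining nodes of running jobs before replacing binaries and libraries. As a minimal sketch, the helper below builds an scontrol command line that marks nodes as draining; the function name, node names, and reason text are hypothetical, and the command is only constructed here, not executed.

```python
def drain_command(nodes, reason):
    """Build the argv for draining the given nodes with scontrol."""
    if not nodes:
        raise ValueError("at least one node name is required")
    return [
        "scontrol",
        "update",
        "NodeName=" + ",".join(nodes),  # comma-separated node list
        "State=DRAIN",
        "Reason=" + reason,
    ]
```

Passing the resulting list to subprocess.run() avoids shell quoting problems when the reason contains spaces. Draining lets running jobs finish while no new work is scheduled, so the upgrade can start once the nodes report as drained.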
Dominique Leuenberger (dimstar_suse) accepted request 1006180 from Egbert Eich (eeich) (revision 78)
- Do not deduplicate files of testsuite Slurm configuration.
  This directory is supposed to be mounted over /etc/slurm
  therefore it must not contain softlinks to the files in
  this directory.
- Improve .a and .o file collection for test suite: find these
  files even if there are multiple ones in a single line. (forwarded request 1005746 from eeich)
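The second entry above is about picking up every .a and .o file name even when several appear on one line. The packaging script itself is not shown here, so the following is only an illustrative sketch of that idea, with a hypothetical function name and sample input.

```python
def collect_objects(lines):
    """Return every whitespace-separated token that ends in .a or .o."""
    found = []
    for line in lines:
        # Check every token on the line, not just the first match.
        for token in line.split():
            if token.endswith((".a", ".o")):
                found.append(token)
    return found
```

Scanning all tokens per line, rather than stopping at the first hit, is the point of the fix described in the entry.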
Dominique Leuenberger (dimstar_suse) accepted request 1005247 from Egbert Eich (eeich) (revision 77)
- Fix build for older product version. (forwarded request 1005246 from eeich)
Dominique Leuenberger (dimstar_suse) accepted request 992362 from Egbert Eich (eeich) (revision 76)
- Fix a potential security vulnerability in the test package
  (bsc#1201674, CVE-2022-31251).

- Patch NOFILE Limit in the slurmd.service copy for the testsuite. (forwarded request 992353 from eeich)
Richard Brown (RBrownFactory) accepted request 990643 from Factory Maintainer (factory-maintainer) (revision 75)
Automatic submission by obs-autosubmit
Dominique Leuenberger (dimstar_suse) accepted request 988733 from Egbert Eich (eeich) (revision 74)
- Package the Slurm testsuite for QA purposes.
  * Fixes for test suite:
    Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
    Fix-test-21.41.patch
    Fix-test-38.11.patch
    Fix-test-32.8.patch
    Fix-test-3.13.patch
    Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch
  * Add documentation:
    README_Testsuite.md
- Allow log in as user 'slurm'. This allows admins to run certain
  privileged commands more easily without becoming root. (forwarded request 988732 from eeich)
Dominique Leuenberger (dimstar_suse) accepted request 976280 from Egbert Eich (eeich) (revision 72)
- Add a comment about the CommunicationParameters=block_null_hash
  option warning users who migrate - just in case.

- Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278),
  CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281).
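The entry above mentions the CommunicationParameters=block_null_hash option for users who migrate. A minimal sketch for checking whether a slurm.conf text already sets it is shown below; the function name and sample configuration are hypothetical, and the parser assumes one Key=Value pair per line with '#' comments.

```python
def has_comm_param(conf_text, param="block_null_hash"):
    """Return True if CommunicationParameters lists the given option."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() != "communicationparameters":
            continue
        options = [opt.strip().lower() for opt in value.split(",")]
        if param.lower() in options:
            return True
    return False


# Hypothetical configuration snippet for illustration only.
SAMPLE_CONF = """\
ClusterName=test
# CommunicationParameters=block_null_hash
CommunicationParameters=NoAddrCache,block_null_hash
"""
```

Commented-out lines are ignored, so an option that only appears in a comment is reported as missing.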