openSUSE:Leap:15.5:Update
File _patchinfo of Package patchinfo.30856
<patchinfo incident="30856">
  <issue tracker="bnc" id="1215437">[RN, Slurm] Release notes for an update to 23.02.5</issue>
  <packager>eeich</packager>
  <rating>moderate</rating>
  <category>recommended</category>
  <summary>Recommended update for slurm_23_02</summary>
  <description>This update for slurm_23_02 fixes the following issues:

- Updated to version 23.02.5 with the following changes:
  * Bug Fixes:
    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the job's environment when `--ntasks-per-node` was requested. The way it is set, however, is different and should be more accurate in more situations.
    + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the new behavior of the pmix plugin in 23.02.0. Note that neither of these plugins makes use of the `MpiParams=ports=` option, and previously both were limited only by the system's ephemeral port range.
    + Fix regression in 23.02.2 that caused `slurmctld -R` to crash on startup if a node features plugin is configured.
    + Fix and prevent reoccurring reservations from overlapping.
    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and `--mem-per-cpu`.
    + Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks.
    + Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over.
    + Fix `slurmctld` segfault when a node registers with a configured `CpuSpecList` while `slurmctld` configuration has the node without `CpuSpecList`.
    + Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after not registering by `ResumeTimeout`.
    + `slurmstepd` - Avoid cleanup of `config.json`-less containers' spooldir getting skipped.
    + Fix `scontrol` segfault when the 'completing' command is requested repeatedly in interactive mode.
    + Properly handle a race condition between `bind()` and `listen()` calls in the network stack when running with `SrunPortRange` set.
    + Federation - Fix revoked jobs being returned regardless of the `-a`/`--all` option for privileged users.
    + Federation - Fix canceling pending federated jobs from non-origin clusters which could leave federated jobs orphaned from the origin cluster.
    + Fix `sinfo` segfault when printing multiple clusters with `--noheader` option.
    + Federation - Fix clusters not syncing if clusters are added to a federation before they have registered with the dbd.
    + `node_features/helpers` - Fix node selection for jobs requesting changeable features with the `|` operator, which could prevent jobs from running on some valid nodes.
    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`, where an AND'd feature was sometimes AND'd to all sets of features instead of just the current set. E.g. `foo|bar&baz` was interpreted as `{foo,baz}` or `{bar,baz}` instead of how it is documented: `{foo} or {bar,baz}`.
    + Fix job accounting so that when a job is requeued its allocated node count is cleared. After the requeue, `sacct` will correctly show that the job has 0 `AllocNodes` while it is pending or if it is canceled before restarting.
    + `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet received an allocation or if the job was canceled before getting one.
    + Fix Intel oneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs, and do not detect `/dev/dri/card[0-9]+`.
    + Fix node selection for jobs that request `--gpus` and a number of tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs.
    + Remove `MYSQL_OPT_RECONNECT` completely.
    + Fix cloud nodes in `POWERING_UP` state disappearing (getting set to `FUTURE`) when an `scontrol reconfigure` happens.
    + `openapi/dbv0.0.39` - Avoid assert / segfault on missing coordinators list.
    + `slurmrestd` - Correct memory leak while parsing OpenAPI specification templates with server overrides.
    + Fix overwriting user node reason with system message.
    + Prevent deadlock when `rpc_queue` is enabled.
    + `slurmrestd` - Correct OpenAPI specification generation bug where fields with overlapping parent paths would not get generated.
    + Fix memory leak as a result of a partition info query.
    + Fix memory leak as a result of a job info query.
    + For step allocations, fix `--gres=none` sometimes not ignoring gres from the job.
    + Fix `--exclusive` jobs incorrectly gang-scheduling where they shouldn't.
    + Fix allocations with `CR_SOCKET`, gres not assigned to a specific socket, and block core distribution potentially allocating more sockets than required.
    + Revert a change in 23.02.3 where Slurm would kill a script's process group as soon as the script ended instead of waiting as long as any process in that process group held the stdout/stderr file descriptors open. That change broke some scripts that relied on the previous behavior. Setting time limits for scripts (such as `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm waiting indefinitely for scripts to finish.
    + Fix `slurmdbd -R` not returning an error under certain conditions.
    + `slurmdbd` - Avoid potential NULL pointer dereference in the mysql plugin.
    + Fix regression in 23.02.3 which broke X11 forwarding for hosts when MUNGE sends a localhost address in the encode host field. This is caused when the node hostname is mapped to 127.0.0.1 (or similar) in `/etc/hosts`.
    + `openapi/[db]v0.0.39` - Fix memory leak on parsing error.
    + `data_parser/v0.0.39` - Fix updating qos for associations.
    + `openapi/dbv0.0.39` - Fix updating values for associations with null users.
    + Fix minor memory leak with `--tres-per-task` and licenses.
    + Fix cyclic socket cpu distribution for tasks in a step where `--cpus-per-task` < usable threads per core.
    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of node's energy field `current_watts` to a dictionary to account for unset value instead of dumping 4294967294.
    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's field "priority" to a dictionary to account for unset value instead of dumping 4294967294.
    + `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the 'return code' field in `v0.0.39_job_exit_code` will be set to -127 instead of being left unset when the job does not have a relevant return code.
  * Other Changes:
    + Remove `--uid` / `--gid` options from `salloc` and `srun` commands. These options did not work correctly since the CVE-2022-29500 fix in combination with some changes made in 23.02.0.
    + Add the `JobId` to `debug()` messages indicating when `cpus_per_task/mem_per_cpu` or `pn_min_cpus` are being automatically adjusted.
    + Change the log message warning for rate limited users from verbose to info.
    + `slurmstepd` - Cleanup per task generated environment for containers in spooldir.
    + Format batch, extern, interactive, and pending step ids into strings that are human readable.
    + `slurmrestd` - Reduce memory usage when printing out job CPU frequency.
    + `data_parser/v0.0.39` - Add `required/memory_per_cpu` and `required/memory_per_node` to `sacct --json` and `sacct --yaml` and `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
    + `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
    + Allow `slurmdbd -R` to work if the root assoc id is not 1.
    + Limit periodic node registrations to 50 instead of the full `TreeWidth`. Since unresolvable `cloud/dynamic` nodes must disable fanout by setting `TreeWidth` to a large number, the previous behavior would cause all nodes to register at once.
  </description>
</patchinfo>
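The `node_features/helpers` entry above describes the documented reading of `foo|bar&baz` as `{foo} or {bar,baz}`, that is, `&` binds more tightly than `|`. The following Python sketch illustrates only that precedence rule. It is not Slurm's implementation: the function names `parse_feature_expression` and `node_matches` are hypothetical, and the sketch assumes a simplified syntax with no parentheses, counts, or other operators.

```python
def parse_feature_expression(expr: str) -> list[set[str]]:
    """Split a feature expression into alternative sets of required features.

    Assumption for illustration: '|' separates alternatives and '&' joins
    features within a single alternative, so '&' binds tighter than '|'.
    """
    alternatives = []
    for alternative in expr.split("|"):
        # Each '&'-joined group applies only to the current alternative.
        features = {name.strip() for name in alternative.split("&") if name.strip()}
        if features:
            alternatives.append(features)
    return alternatives


def node_matches(node_features: set[str], expr: str) -> bool:
    """Return True if the node provides every feature of at least one alternative."""
    return any(
        required <= node_features for required in parse_feature_expression(expr)
    )


if __name__ == "__main__":
    # A node with only 'foo' satisfies the first alternative.
    print(node_matches({"foo"}, "foo|bar&baz"))
    # A node with only 'baz' satisfies neither alternative.
    print(node_matches({"baz"}, "foo|bar&baz"))
```

Under the faulty interpretation described in the release note, a node offering only `foo` would have been rejected because `baz` was required in every set; with the documented precedence it is accepted.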