Major Updates in Slurm Version 17.11
Slurm version 17.11 was released in November 2017. See the RELEASE_NOTES and NEWS files included with the distribution for a more complete description of changes.
- NOTE FOR THOSE RUNNING 17.11.[0|1]: A seeded MySQL auto_increment value was found to be lost eventually if used. This affects the tres_table, which reserves the ids under 1001 for static TRES. The problem was fixed in MariaDB >=10.2.4 but, at the time of writing, was still present in MySQL. Regardless, if you are tracking licenses or GRES in the database (e.g. AccountingStorageTRES=gres/gpu) you might have this issue. It would mean that the id issued for gres/gpu could have been 5 instead of 1001. This was harmless until 17.11, which added a new static TRES that takes id 5. If you are already running 17.11 you can check whether you hit this problem by running 'sacctmgr list tres'. If you see any Name for the Type 'billing' TRES (id=5), you have unfortunately been hit by the bug. The fix requires manual intervention in the database. Most likely, if you started a slurmctld against the slurmdbd, the overwritten TRES now has a different id. You can fix both problems by altering all the tables that use the new TRES id back to 5, removing that entry from the tres_table, changing the Type of billing back to the original Type, and then restarting the slurmdbd, which should finish the conversion. SchedMD can assist with this. Supported sites, please open a ticket at https://bugs.schedmd.com/. Non-supported sites, please contact SchedMD at email@example.com if you would like to discuss commercial support options.
- NOTE: The slurm.spec file used to build RPM packages has been aggressively refactored, and some package names may now be different. Notably, the three daemons (slurmctld, slurmd/slurmstepd, slurmdbd) each have their own separate package with the binary and the appropriate systemd service file, which will be installed automatically (but not enabled). The slurm-plugins, slurm-munge, and slurm-lua packages have been removed, and their contents moved into the main slurm package. The slurm-sql package has been removed, and merged into the slurm (job_comp_mysql.so) and slurm-slurmdbd (accounting_storage_mysql) packages. The example configuration files have been moved to slurm-example-configs.
- NOTE: The refactored slurm.spec file now requires systemd to build. When building on older distributions, you must use the older variant which has been preserved as contribs/slurm.spec-legacy.
- NOTE: The slurmctld is now set to fatal if there are any problems with any state files. To avoid this use the new '-i' flag.
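The 'sacctmgr list tres' check described in the 17.11.[0|1] note above can be automated. The following is a minimal sketch, not an official tool. It assumes the listing was produced with 'sacctmgr --noheader --parsable2 list tres' and that the pipe-delimited columns are Type, Name and ID in that order. The function name and the sample rows are hypothetical and only illustrate the rule stated in the note: a 'billing' TRES with id 5 that carries a Name indicates the bug.

```python
def billing_tres_clobbered(parsable_output: str) -> bool:
    """Return True if the TRES with id 5 is Type 'billing' and has a Name.

    Assumes pipe-delimited rows in the order Type|Name|ID, as expected from
    'sacctmgr --noheader --parsable2 list tres' (an assumption of this sketch).
    """
    for line in parsable_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        tres_type, name, tres_id = fields[0], fields[1], fields[2]
        if tres_id == "5" and tres_type == "billing" and name:
            return True
    return False


# Hypothetical sample rows, for illustration only (not real site data).
HEALTHY_SAMPLE = "cpu||1\nmem||2\nenergy||3\nnode||4\nbilling||5\ngres|gpu|1001"
AFFECTED_SAMPLE = "cpu||1\nmem||2\nenergy||3\nnode||4\nbilling|gpu|5"
```

To use it, capture the command output as text (for example with subprocess.run) and pass it to billing_tres_clobbered. A True result only flags the symptom; the database repair itself remains the manual procedure described in the note.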
NOTE: systemd service files are installed automatically, but not enabled. You will need to manually enable them on the appropriate systems:
- Controller: systemctl enable slurmctld
- Database: systemctl enable slurmdbd
- Compute Nodes: systemctl enable slurmd
- NOTE: If you are not using Munge, but are using the "service" scripts to start Slurm daemons, then you will need to remove this check from the etc/slurm*service scripts.
- NOTE: If you are upgrading with any jobs from 14.03 or earlier (i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need to wait until after those jobs are gone before you upgrade to 17.02 or 17.11.
- NOTE: If you interact with any memory values in a job_submit plugin, you will need to test against NO_VAL64 instead of NO_VAL, and change your printf format as well.
- NOTE: The SLURM_ID_HASH used for Cray systems has changed to fully use the entire 64 bits of the hash. Previously the stepid was multiplied by 10,000,000,000 to make it easy to read both the jobid and the stepid in the hash, separated by at least a couple of zeros, but this led to overflow of the hash for steps like the batch and extern steps, which use all 32 bits to represent the step. While the new method ruins the easy readability, it fixes the more important overflow issue. Most sites will not notice this change; it is noted here for completeness.
- NOTE: Starting in 17.11 the slurm commands and daemons dynamically link to libslurmfull.so instead of statically linking. This dramatically reduces the footprint of Slurm. If for some reason this creates issues with your build you can configure slurm with --without-shared-libslurm.
- NOTE: Spank options handled in local and allocator contexts should be able to handle being called multiple times. An option could be set multiple times through environment variables and command line options. Environment variables are processed first.
- NOTE: IBM BlueGene/Q and Cray/ALPS modes are deprecated and will be removed in an upcoming release. You must add the --enable-deprecated option to configure to build these targets.
- NOTE: Built-in BLCR support is deprecated, no longer built automatically, and will be removed in an upcoming release. You must add --with-blcr and --enable-deprecated options to configure to build this plugin.
- NOTE: srun will now only read in the environment variables SLURM_JOB_NODES and SLURM_JOB_NODELIST instead of SLURM_NNODES and SLURM_NODELIST. The latter variables have been obsolete for some time; please update any scripts still using them.
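The SLURM_ID_HASH overflow described in the Cray note above comes down to simple arithmetic, sketched here in Python. The multiplier is taken from the note. Everything else is an assumption made for illustration: the step id value 0xFFFFFFFE stands in for "a step that uses all 32 bits", and the new layout is assumed to place the step id in the upper 32 bits and the job id in the lower 32 bits. The actual macro in the Slurm source may differ.

```python
U64_MAX = (1 << 64) - 1

# From the note: the step id used to be multiplied by 10,000,000,000.
OLD_MULTIPLIER = 10_000_000_000

# Assumed example of a special step id that uses all 32 bits.
FULL_WIDTH_STEP = 0xFFFFFFFE


def old_hash(jobid: int, stepid: int) -> int:
    """Old scheme, truncated to 64 bits the way unsigned C arithmetic would."""
    return (stepid * OLD_MULTIPLIER + jobid) & U64_MAX


def old_hash_overflows(jobid: int, stepid: int) -> bool:
    """True if the untruncated old hash does not fit in 64 bits."""
    return stepid * OLD_MULTIPLIER + jobid > U64_MAX


def new_hash(jobid: int, stepid: int) -> int:
    """Assumed new scheme: one 32-bit id per half of the 64-bit value."""
    return ((stepid & 0xFFFFFFFF) << 32) | (jobid & 0xFFFFFFFF)


def unpack_new_hash(value: int) -> tuple:
    """Recover (jobid, stepid) from the assumed new layout."""
    return value & 0xFFFFFFFF, value >> 32
```

With an ordinary step the old hash is readable in decimal: old_hash(1234, 2) is 20000001234. With a full-width step id the product exceeds 64 bits, so the truncated value no longer identifies the job and step, whereas the packed form always round-trips through unpack_new_hash.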
- Support for federated clusters to manage a single work-flow across a set of clusters.
- Support for heterogeneous job allocations (various processor types, memory sizes, etc. by job component). Heterogeneous job steps within a single MPI_COMM_WORLD are not yet supported for most configurations.
- X11 support is now fully integrated with the main Slurm code. Remove any X11 plugin configured in your plugstack.conf file to avoid errors being logged about conflicting options.
- Added new advanced reservation flag of "flex", which permits jobs requesting the reservation to begin prior to the reservation's start time and use resources inside or outside of the reservation. A typical use case is to prevent jobs not explicitly requesting the reservation from using those reserved resources rather than forcing jobs requesting the reservation to use those resources in the time frame reserved.
- The sprio command has been modified to report a job's priority information for every partition the job has been submitted to.
- Group ID lookup performed at job submit time to avoid lookup on all compute nodes. Enable with LaunchParameters=send_gids configuration parameter.
- Slurm commands and daemons dynamically link to libslurmfull.so instead of statically linking. This dramatically reduces the footprint of Slurm. If for some reason this creates issues with your build you can configure slurm with --without-shared-libslurm.
- In switch plugin, added plugin_id symbol to plugins and wrapped switch_jobinfo_t with dynamic_plugin_data_t in interface calls in order to pass switch information between clusters with different switch types.
- Changed default ProctrackType to cgroup.
- Changed default sched_min_interval from 0 to 2 microseconds.
- CRAY: --enable-native-cray is no longer an option and is on by default. If you want to run with ALPS please configure with --disable-native-cray.
- Added new 'scontrol write batch_script' command to fetch a job's batch script. Removed the ability to see the script as part of the 'scontrol -dd show job' command.
- Add new "billing" TRES which allows jobs to be limited based on the job's billable TRES calculated by the job's partition's TRESBillingWeights.
- Regular user use of "scontrol top" command is now disabled. Use the configuration parameter "SchedulerParameters=enable_user_top" to enable that functionality. The configuration parameter "SchedulerParameters=disable_user_top" will be silently ignored.
- Changed the default handling of pending jobs once their reservation is gone: instead of being allowed to run outside of the reservation, such jobs are now put in a held state. Added NO_HOLD_JOBS_AFTER_END reservation flag to use the old default.
- Support for PMIx v2.0 as well as UCX.
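The "billing" TRES described above is derived from a partition's TRESBillingWeights. The following is a minimal sketch of that calculation, assuming the billable value is the sum of each allocated TRES amount multiplied by its weight. It does not parse the TRESBillingWeights string or handle unit suffixes, and the function name, weights and allocation below are hypothetical values chosen for illustration.

```python
def billable_tres(weights: dict, allocated: dict) -> float:
    """Sum of weight * allocated amount over the TRES types that have a weight.

    TRES types without a configured weight contribute nothing. Units must
    already match (e.g. memory expressed in the unit the weight applies to).
    """
    return sum(weight * allocated.get(tres, 0) for tres, weight in weights.items())


# Hypothetical partition weights and job allocation, for illustration only.
EXAMPLE_WEIGHTS = {"cpu": 1.0, "mem": 0.25, "gres/gpu": 2.0}
EXAMPLE_ALLOCATION = {"cpu": 4, "mem": 16, "gres/gpu": 1}  # mem in GB here

EXAMPLE_BILLING = billable_tres(EXAMPLE_WEIGHTS, EXAMPLE_ALLOCATION)
```

For the example values this gives 1.0*4 + 0.25*16 + 2.0*1 = 10.0, which is the number a limit on the billing TRES would be compared against under the stated assumption.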
Last modified 22 January 2018