Node Features Plugin Programmer Guide

Overview

This document describes the node features plugin that is responsible for managing a node's active features. This is typically used for changing a node's characteristics at boot time. For example, an Intel Knights Landing (KNL) processor can be booted in various MCDRAM and NUMA modes. This document is intended as a resource to programmers wishing to write their own node features plugin.

const char plugin_name[]="launch Slurm plugin"

const char plugin_type[]="node_features/[knl_cray|knl_generic]"

  • knl_cray — Use Cray's capmc command to manage an Intel KNL processor.
  • knl_generic — Use Intel commands to manage KNL processor.

const uint32_t plugin_version=SLURM_VERSION_NUMBER
If specified, identifies the version of Slurm used to build this plugin; any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
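As an illustration of the three identification symbols described above, the following is a minimal sketch rather than code taken from an existing plugin. The plugin name text and the "example" type suffix are hypothetical, and the fallback definition of SLURM_VERSION_NUMBER is only a stand-in so the fragment compiles outside the Slurm source tree.

```c
#include <stdint.h>

/* Stand-in so the sketch compiles on its own; a real plugin gets
 * SLURM_VERSION_NUMBER from the Slurm build headers. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0
#endif

/* Free-form description of the plugin (hypothetical text). */
const char plugin_name[] = "node_features example plugin";

/* Has the form node_features/<name>; "example" is a hypothetical name. */
const char plugin_type[] = "node_features/example";

/* Records the Slurm version the plugin was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```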

The programmer is urged to study src/plugins/node_features/knl_cray/node_features_knl_cray.c for a sample implementation of a Slurm node features plugin.

API Functions

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
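A minimal sketch of an init/fini pair is shown here, assuming the plugin keeps a single piece of global state. The feature_buf buffer and its size are hypothetical, and the SLURM_SUCCESS and SLURM_ERROR definitions are stand-ins for the values normally supplied by Slurm's headers.

```c
#include <stdlib.h>

/* Stand-ins for the return codes defined in Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical plugin-global state: a scratch buffer for feature names. */
#define FEATURE_BUF_SIZE 256
static char *feature_buf = NULL;

int init(void)
{
    if (feature_buf)            /* tolerate being called more than once */
        return SLURM_SUCCESS;
    feature_buf = calloc(FEATURE_BUF_SIZE, 1);
    return feature_buf ? SLURM_SUCCESS : SLURM_ERROR;
}

int fini(void)
{
    free(feature_buf);          /* free(NULL) is a harmless no-op */
    feature_buf = NULL;
    return SLURM_SUCCESS;
}
```

Resetting the pointer in fini() keeps a later init() call from reusing freed storage.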

int node_features_p_reconfig(void)

Description:
Notify the plugin that the configuration has changed so that it reads its configuration parameters again.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

uint32_t node_features_p_boot_time(void)

Description:
Return the estimated node reboot time in units of seconds. Used as a basis for optimizing scheduling decisions.

Returns:
Estimated boot time in seconds.
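The relationship between node_features_p_reconfig() and node_features_p_boot_time() can be sketched as follows. This is an illustration under stated assumptions, not the behavior of an existing plugin: example_conf_line is a hypothetical stand-in for the contents of the plugin's configuration file, and the 300 second default is likewise an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the return codes defined in Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical stand-in for the text of the plugin's configuration file. */
static const char *example_conf_line = "BootTime=300";

/* Cached estimate in seconds; 300 is an assumed default. */
static uint32_t boot_time = 300;

int node_features_p_reconfig(void)
{
    unsigned int secs;

    /* Keep the previously cached value if the line cannot be parsed. */
    if (sscanf(example_conf_line, "BootTime=%u", &secs) != 1)
        return SLURM_ERROR;
    boot_time = secs;
    return SLURM_SUCCESS;
}

uint32_t node_features_p_boot_time(void)
{
    return boot_time;   /* cheap to call from scheduling code */
}
```

Caching the value means the scheduler never waits on configuration parsing when it asks for the estimate.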

int node_features_p_get_node(char *node_list)

Description:
Update active and available features on specified nodes. Executed from the slurmctld daemon only and directly updates internal node data structures.

Arguments:
node_list: Regular expression identifying the nodes to be updated. Update information about all nodes if value is NULL.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int node_features_p_job_valid(char *job_features)

Description:
Determine if the user's job constraint string is valid. This may be used to limit the type of operators supported (Slurm's active feature logic only supports the AND operator) and prevent illegal combinations of node features (e.g. multiple NUMA modes). Executed from the slurmctld daemon only when either the job submit or modify operation is invoked.

Arguments:
job_features: Job constraints specified by the user (-C/--constraint options).

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
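The two checks described above (AND-only operators and no conflicting modes) might be sketched as shown here. The mode lists are taken from the KNL example elsewhere in this document, the helper token_in_list() is hypothetical, and the return code definitions are stand-ins for those in Slurm's headers. Treating an empty constraint as valid is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the return codes defined in Slurm's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

static const char *const mcdram_modes[] = { "cache", "flat", "equal", NULL };
static const char *const numa_modes[] =
    { "a2a", "quad", "hemi", "snc2", "snc4", NULL };

/* Hypothetical helper: is the len-byte token at tok one of the list names? */
static int token_in_list(const char *tok, size_t len, const char *const *list)
{
    for (int i = 0; list[i]; i++) {
        if (strlen(list[i]) == len && strncmp(tok, list[i], len) == 0)
            return 1;
    }
    return 0;
}

int node_features_p_job_valid(char *job_features)
{
    int mcdram_cnt = 0, numa_cnt = 0;
    const char *p = job_features;

    if (!p || p[0] == '\0')
        return SLURM_SUCCESS;   /* assumption: no constraint is valid */
    if (strpbrk(p, "[]()|*"))
        return SLURM_ERROR;     /* only the AND operator is accepted */

    while (*p) {
        size_t len = strcspn(p, "&");   /* length of the next token */
        if (token_in_list(p, len, mcdram_modes))
            mcdram_cnt++;
        else if (token_in_list(p, len, numa_modes))
            numa_cnt++;
        p += len;
        if (*p)
            p++;                        /* skip the '&' separator */
    }
    /* A node boots in one MCDRAM mode and one NUMA mode at a time. */
    return (mcdram_cnt > 1 || numa_cnt > 1) ? SLURM_ERROR : SLURM_SUCCESS;
}
```

With these rules, "cache&quad&knl" is accepted, while "quad&hemi" (two NUMA modes) and "cache|flat" (OR operator) are rejected.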

char *node_features_p_job_xlate(char *job_features)

Description:
Translate a job's feature request to the node features needed at boot time. Job features not required by this plugin (e.g. rack number) will not be returned. For example, a user's requested features may be "cache&quad&knl&rack1". Since "knl" and "rack1" represent physical characteristics of the node and are not used by the node features plugin to boot the node, this function's return value will be "cache,quad". Executed from the slurmctld daemon only.

Arguments:
job_features: Job constraints specified by the user (-C/--constraint options).

Returns:
Node features used by this plugin when configuring or booting a node. A string with its memory allocated by xmalloc (i.e. the return value must be released using Slurm's xfree function).
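The translation from "cache&quad&knl&rack1" to "cache,quad" can be sketched as follows. To keep the fragment self-contained, it uses the C library's malloc and free as stand-ins for Slurm's xmalloc and xfree. The helper is_plugin_mode() is hypothetical, and returning NULL when no feature belongs to the plugin is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Features this sketch treats as belonging to the plugin (KNL modes). */
static const char *const plugin_modes[] = {
    "cache", "flat", "equal", "a2a", "quad", "hemi", "snc2", "snc4", NULL
};

/* Hypothetical helper: is the len-byte token at tok a plugin mode? */
static int is_plugin_mode(const char *tok, size_t len)
{
    for (int i = 0; plugin_modes[i]; i++) {
        if (strlen(plugin_modes[i]) == len &&
            strncmp(tok, plugin_modes[i], len) == 0)
            return 1;
    }
    return 0;
}

char *node_features_p_job_xlate(char *job_features)
{
    const char *p = job_features;
    size_t used = 0;
    char *result;

    if (!p || p[0] == '\0')
        return NULL;
    /* The output is never longer than the input. */
    result = malloc(strlen(p) + 1);     /* stand-in for xmalloc */
    if (!result)
        return NULL;
    result[0] = '\0';

    while (*p) {
        size_t len = strcspn(p, "&");   /* length of the next token */
        if (is_plugin_mode(p, len)) {
            if (used)
                result[used++] = ',';
            memcpy(result + used, p, len);
            used += len;
            result[used] = '\0';
        }
        p += len;
        if (*p)
            p++;                        /* skip the '&' separator */
    }
    if (!used) {                        /* nothing relevant to this plugin */
        free(result);
        return NULL;
    }
    return result;
}
```

The caller owns the returned string and releases it (with free here, with xfree in a real plugin).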

bool node_features_p_node_power(void)

Description:
Report if the PowerSave mode is required to boot nodes. Executed from the slurmctld daemon only.

Returns:
True if the plugin requires PowerSave mode for booting nodes.

void node_features_p_node_state(char **avail_modes, char **current_mode)

Description:
Get this node's available and current features (e.g. MCDRAM and NUMA settings from BIOS for a KNL processor). For example, avail_modes="cache,flat,equal,a2a,quad,hemi,snc2,snc4" and current_mode="cache,quad". Executed from the slurmd daemon only.

Arguments:
avail_modes: Node state features which are available. Value is allocated or appended to as appropriate with xmalloc functions.
current_mode: Node state features which are currently in effect. Value is allocated or appended to as appropriate with xmalloc functions.
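The "allocated or appended to" behavior of the two output arguments might look like the sketch below. It hard-codes the example values from the description instead of querying the BIOS, uses realloc as a stand-in for Slurm's xmalloc family, and relies on a hypothetical helper named append_feature().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append feature to *str, inserting a comma when
 * *str already holds text. A NULL *str is allocated from scratch. */
static void append_feature(char **str, const char *feature)
{
    size_t old_len = *str ? strlen(*str) : 0;
    size_t add_len = strlen(feature);
    /* Room for the old text, one comma, the new text, and the NUL. */
    char *tmp = realloc(*str, old_len + add_len + 2);

    if (!tmp)
        return;                 /* leave *str untouched on failure */
    if (old_len)
        tmp[old_len++] = ',';
    memcpy(tmp + old_len, feature, add_len + 1);
    *str = tmp;
}

void node_features_p_node_state(char **avail_modes, char **current_mode)
{
    if (!avail_modes || !current_mode)
        return;
    /* Assumption: fixed values stand in for settings read from the BIOS. */
    append_feature(avail_modes, "cache,flat,equal,a2a,quad,hemi,snc2,snc4");
    append_feature(current_mode, "cache,quad");
}
```

Appending rather than overwriting preserves any features already placed in the strings by the caller.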

char *node_features_p_node_xlate(char *new_features, char *orig_features)

Description:
Translate a node's new feature specification as needed to preserve any original features (i.e. features outside of the domain of this plugin). For example, a node's new features may be "cache,quad", while its original features may have been "flat,hemi,knl,rack1". The available features with respect to this plugin are "flat,hemi", while features outside of the domain of this plugin are "knl,rack1". In this case, this function's return value will be "cache,quad,knl,rack1". Executed from the slurmctld daemon only.

Arguments:
new_features: Node's reported features.
orig_features: Node's previous feature state.

Returns:
Node's current features value. A string with its memory allocated by xmalloc (i.e. the return value must be released using Slurm's xfree function).
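The merge described above can be sketched as follows: copy the new features, then append each original feature that lies outside the plugin's domain. As before, malloc and free stand in for xmalloc and xfree, and is_plugin_mode() is a hypothetical helper. This sketch does not remove duplicates, which a production plugin may want to do.

```c
#include <stdlib.h>
#include <string.h>

/* Features this sketch treats as belonging to the plugin (KNL modes). */
static const char *const plugin_modes[] = {
    "cache", "flat", "equal", "a2a", "quad", "hemi", "snc2", "snc4", NULL
};

/* Hypothetical helper: is the len-byte token at tok a plugin mode? */
static int is_plugin_mode(const char *tok, size_t len)
{
    for (int i = 0; plugin_modes[i]; i++) {
        if (strlen(plugin_modes[i]) == len &&
            strncmp(tok, plugin_modes[i], len) == 0)
            return 1;
    }
    return 0;
}

char *node_features_p_node_xlate(char *new_features, char *orig_features)
{
    size_t new_len = new_features ? strlen(new_features) : 0;
    size_t orig_len = orig_features ? strlen(orig_features) : 0;
    size_t used = 0;
    const char *p = orig_features;
    /* Worst case: both strings in full, one joining comma, and the NUL. */
    char *result = malloc(new_len + orig_len + 2);  /* stand-in for xmalloc */

    if (!result)
        return NULL;
    if (new_len) {
        memcpy(result, new_features, new_len);
        used = new_len;
    }
    result[used] = '\0';

    while (p && *p) {
        size_t len = strcspn(p, ",");   /* length of the next token */
        /* Keep only original features outside this plugin's domain. */
        if (len && !is_plugin_mode(p, len)) {
            if (used)
                result[used++] = ',';
            memcpy(result + used, p, len);
            used += len;
            result[used] = '\0';
        }
        p += len;
        if (*p)
            p++;                        /* skip the ',' separator */
    }
    return result;
}
```

For the inputs in the description, the stale "flat" and "hemi" modes are dropped and "knl,rack1" is carried over after "cache,quad".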

void node_features_p_step_config(bool mem_sort, bitstr_t *numa_bitmap)

Description:
Perform any desired initialization operations prior to launching a job step.

Arguments:
mem_sort: If true, run zonesort before launching a job step.
numa_bitmap: Identify NUMA nodes on which to execute zonesort. If NULL, then execute zonesort on all NUMA nodes.

bool node_features_p_user_update(uid_t uid)

Description:
Determine if the specified user can modify the currently available node features.

Arguments:
uid: User ID of user making request.

Returns:
True if the user can change a node's active features to other available features.
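One possible shape for this check is sketched below, returning a boolean as the Returns text describes. The allow-list contents are hypothetical, and both the list itself and the rule that root is always permitted are assumptions of this sketch; a real plugin would load its policy from its configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical allow-list of user IDs permitted to change features. */
static const uid_t allowed_uids[] = { 1000, 1001 };
static const size_t allowed_uid_cnt =
    sizeof(allowed_uids) / sizeof(allowed_uids[0]);

bool node_features_p_user_update(uid_t uid)
{
    if (uid == 0)
        return true;    /* assumption: root may always change features */
    for (size_t i = 0; i < allowed_uid_cnt; i++) {
        if (allowed_uids[i] == uid)
            return true;
    }
    return false;
}
```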

Last modified 13 January 2017