Slurm Generic Resource (GRES) Plugin API

Overview

This document describes Slurm generic resource plugins and the API that defines them. It is intended as a resource for programmers wishing to write their own Slurm generic resource plugins.

Slurm generic resource plugins must conform to the Slurm Plugin API with the following specifications:

const char gres_name[]="gres_name"

The gres_name should match the minor component of plugin_type, described below.

const char plugin_type[]="major/minor"

The major type must be "gres". The minor type can be any suitable name for the type of generic resource.

const char plugin_name[]
Some descriptive name for the plugin. There is no requirement with respect to its format.

const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin; any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
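
As an illustration, the identification symbols for a hypothetical plugin named "widget" might be declared as in the sketch below. The name "widget", the descriptive string, the EXAMPLE_SLURM_VERSION placeholder, and the names_are_consistent() helper are all assumptions made so that the sketch compiles on its own. A real plugin would take its version constant from the Slurm headers it is built against.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder only: a real plugin would use the version constant
 * supplied by the Slurm headers it is built against. */
#define EXAMPLE_SLURM_VERSION 0u

/* gres_name must match the minor part of plugin_type. */
const char gres_name[]   = "widget";                      /* hypothetical */
const char plugin_type[] = "gres/widget";                 /* major/minor  */
const char plugin_name[] = "Example widget GRES plugin";  /* free format  */
const uint32_t plugin_version = EXAMPLE_SLURM_VERSION;

/* Hypothetical self-check: returns 1 if plugin_type is "gres/"
 * followed by exactly gres_name, otherwise 0. */
int names_are_consistent(void)
{
    const char prefix[] = "gres/";
    size_t plen = sizeof(prefix) - 1;

    return strncmp(plugin_type, prefix, plen) == 0 &&
           strcmp(plugin_type + plen, gres_name) == 0;
}
```

The helper is not part of the plugin API; it only demonstrates the naming rule stated above.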

We include samples in the Slurm distribution for

  • gpu — Manage GPUs (Graphics Processing Units).
  • nic — Manage NICs (Network Interface Cards, this plugin does nothing today).

API Functions

All of the following functions are required. Functions which are not implemented must be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
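
A minimal sketch of an init/fini pair is shown below. The device_table state and its size are hypothetical, and the SLURM_SUCCESS and SLURM_ERROR definitions are local stand-ins for the values a real plugin would obtain from the Slurm headers.

```c
#include <stdlib.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical plugin-global state. */
static int *device_table = NULL;
static const size_t device_cnt = 4;

/* Called once when the plugin is loaded: allocate global state. */
int init(void)
{
    device_table = calloc(device_cnt, sizeof(*device_table));
    return device_table ? SLURM_SUCCESS : SLURM_ERROR;
}

/* Called once when the plugin is removed: release global state. */
void fini(void)
{
    free(device_table);
    device_table = NULL;
}
```

Resetting the pointer in fini() keeps a later init() call from seeing stale state if the plugin is loaded again in the same process.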

int node_config_load(List gres_conf_list)

Description:
This function is called by the slurmd daemon after the slurm.conf and gres.conf files have been read. It can be used to validate the configuration by testing the actual hardware resources available, or simply to confirm that an entry for the resource was included in the gres.conf file.

Arguments:
gres_conf_list (input/output) a list of configuration records generated by reading the slurm.conf and gres.conf files

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
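
The simpler of the two checks described above, confirming that the resource has an entry in the configuration, can be sketched as follows. This is not the real interface: conf_rec_t is a hypothetical, simplified linked list standing in for Slurm's List of configuration records, and the function is named node_config_load_sketch to make that clear. The "widget" resource name and the return code definitions are likewise assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

static const char gres_name[] = "widget";   /* hypothetical resource */

/* Hypothetical, simplified stand-in for one configuration record. */
typedef struct conf_rec {
    const char *name;        /* resource name from the config file */
    unsigned count;          /* number of resources configured     */
    struct conf_rec *next;   /* next record, or NULL               */
} conf_rec_t;

/* Succeed if at least one record names this plugin's resource
 * with a nonzero count; fail otherwise. */
int node_config_load_sketch(const conf_rec_t *gres_conf_list)
{
    for (const conf_rec_t *r = gres_conf_list; r; r = r->next) {
        if (r->name && strcmp(r->name, gres_name) == 0 && r->count > 0)
            return SLURM_SUCCESS;
    }
    return SLURM_ERROR;
}
```

A plugin that validates real hardware would replace the count test with a probe of the device itself.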

void job_set_env(char ***job_env_ptr, void *gres_ptr, int node_inx)

Description:
This function is called by the slurmd daemon after getting a job credential and can be used to set environment variables for the job based upon GRES state information in that credential.

Arguments:
job_env_ptr (input/output) pointer to the job's environment variable structure.
gres_ptr (input) pointer to the job's GRES allocation information.
node_inx (input) zero origin node index, used to interpret node specific GRES data.
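
The sketch below shows one way a job_set_env implementation could export a per-node device list. Everything beyond the function signature is an assumption: example_job_gres_t is a hypothetical layout for the data behind gres_ptr, EXAMPLE_VISIBLE_DEVICES is an invented variable name, and env_append is a local helper written for this example rather than a Slurm function. The helper always appends and does not replace an existing entry of the same name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's per-job GRES state. */
typedef struct {
    int node_cnt;            /* nodes in the job's allocation         */
    const char **dev_list;   /* one device string per node, e.g. "0,1" */
} example_job_gres_t;

/* Append "name=value" to a NULL-terminated environment array,
 * growing it with realloc. Returns 0 on success, -1 on failure. */
static int env_append(char ***env_ptr, const char *name, const char *value)
{
    size_t n = 0;
    if (*env_ptr)
        while ((*env_ptr)[n])
            n++;

    size_t len = strlen(name) + strlen(value) + 2;   /* '=' and '\0' */
    char *entry = malloc(len);
    if (!entry)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);

    char **grown = realloc(*env_ptr, (n + 2) * sizeof(char *));
    if (!grown) {
        free(entry);
        return -1;
    }
    grown[n] = entry;
    grown[n + 1] = NULL;
    *env_ptr = grown;
    return 0;
}

void job_set_env(char ***job_env_ptr, void *gres_ptr, int node_inx)
{
    example_job_gres_t *gres = gres_ptr;

    /* Ignore calls with missing data or an out-of-range node index. */
    if (!job_env_ptr || !gres || node_inx < 0 || node_inx >= gres->node_cnt)
        return;
    (void)env_append(job_env_ptr, "EXAMPLE_VISIBLE_DEVICES",
                     gres->dev_list[node_inx]);
}
```

The node_inx argument selects which node's entry is exported, matching its description above as a zero origin index into node specific GRES data.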

void step_set_env(char ***job_env_ptr, void *gres_ptr)

Description:
This function is called by the slurmd daemon after getting a job step credential and can be used to set environment variables for the job step based upon GRES state information in that credential.

Arguments:
job_env_ptr (input/output) pointer to the job step's environment variable structure.
gres_ptr (input) pointer to the step's GRES allocation information.

extern void send_stepd(int fd)

Description:
This function is called by the slurmd daemon to send any needed information to the slurmstepd step shepherd.

Arguments:
fd (input) file descriptor to write information to.

extern void recv_stepd(int fd)

Description:
This function is called by the slurmstepd step shepherd to read any needed information from the slurmd daemon.

Arguments:
fd (input) file descriptor to read information from.
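
The send_stepd and recv_stepd functions form a matched pair: whatever one writes, the other must read in the same order and size. A minimal sketch of such a pair is given below, assuming the only state to transfer is a single hypothetical counter, dev_count. The write_all and read_all helpers are local to this example and use plain POSIX write(2) and read(2); they are not Slurm functions.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical plugin state that slurmstepd needs from slurmd. */
static uint32_t dev_count = 0;

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; treat end-of-file as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* slurmd side: send the plugin state to slurmstepd. */
void send_stepd(int fd)
{
    (void)write_all(fd, &dev_count, sizeof(dev_count));
}

/* slurmstepd side: receive the state; keep the old value on failure. */
void recv_stepd(int fd)
{
    uint32_t value;
    if (read_all(fd, &value, sizeof(value)) == 0)
        dev_count = value;
}
```

Because both processes run on the same node, the sketch sends the value in host byte order without conversion.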

extern int job_info(gres_job_state_t *job_gres_data, uint32_t node_inx, enum gres_job_data_type data_type, void *data)

Description:
This function is used to extract plugin specific data from the job's GRES data structure. Note that data types GRES_JOB_DATA_COUNT and GRES_JOB_DATA_BITMAP are processed in common code rather than within the plugin and return data types of uint32_t* and bitstr_t** respectively.

Arguments:
job_gres_data (input) Information about the job's GRES resources.
node_inx (input) Zero origin index within the job's resource allocation for which data is desired.
data_type (input) Type of information to be gathered from the data structure.
data (output) Pointer to data within job_gres_data. No data is copied or needs to be freed. Data type depends upon the value of data_type.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
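
The "pointer to data, nothing copied" convention can be illustrated with the sketch below. It does not use the real Slurm types: example_job_state_t and enum example_job_data_type are hypothetical, simplified stand-ins for gres_job_state_t and enum gres_job_data_type, and EXAMPLE_JOB_DATA_MEM is an invented plugin specific data type. The count and bitmap values appear only to show that the plugin does not handle them itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical, simplified stand-ins for the Slurm types. */
enum example_job_data_type {
    EXAMPLE_JOB_DATA_COUNT,    /* handled by common code, not here */
    EXAMPLE_JOB_DATA_BITMAP,   /* handled by common code, not here */
    EXAMPLE_JOB_DATA_MEM       /* invented plugin specific type    */
};

typedef struct {
    uint32_t node_cnt;         /* nodes in the job's allocation */
    uint64_t *mem_per_node;    /* one entry per node            */
} example_job_state_t;

int job_info_sketch(example_job_state_t *job_gres_data, uint32_t node_inx,
                    enum example_job_data_type data_type, void *data)
{
    if (!job_gres_data || !data || node_inx >= job_gres_data->node_cnt)
        return SLURM_ERROR;

    switch (data_type) {
    case EXAMPLE_JOB_DATA_MEM: {
        /* The caller passes the address of a uint64_t pointer; point
         * it at the plugin's own storage rather than copying. */
        uint64_t **out = data;
        *out = &job_gres_data->mem_per_node[node_inx];
        return SLURM_SUCCESS;
    }
    default:
        /* Unknown here, or a type processed by common code. */
        return SLURM_ERROR;
    }
}
```

Since the returned pointer refers to storage owned by the job's GRES data structure, the caller must not free it and must not use it after that structure is released.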

extern int step_info(gres_step_state_t *step_gres_data, uint32_t node_inx, enum gres_step_data_type data_type, void *data)

Description:
This function is used to extract plugin specific data from the step's GRES data structure. Note that data types GRES_STEP_DATA_COUNT and GRES_STEP_DATA_BITMAP are processed in common code rather than within the plugin and return data types of uint32_t* and bitstr_t** respectively.

Arguments:
step_gres_data (input) Information about the step's GRES resources.
node_inx (input) Zero origin index within the job's resource allocation for which data is desired.
data_type (input) Type of information to be gathered from the data structure.
data (output) Pointer to data within step_gres_data. No data is copied or needs to be freed. Data type depends upon the value of data_type.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

Last modified 27 March 2015