Launch Plugin API

Overview

This document describes Slurm launch plugins, which are responsible for launching parallel tasks, and the API that defines them. It is intended as a resource for programmers who wish to write their own launch plugin.

const char plugin_name[]="launch Slurm plugin"

const char plugin_type[]="launch/[aprun|poe|runjob|slurm]"

  • aprun — Use Cray's aprun command to launch tasks - used on Cray systems with ALPS installed.
  • poe — Use IBM's poe command to launch tasks - used on systems with IBM's Parallel Environment (PE) installed.
  • runjob — Use IBM's runjob command to launch tasks - used on BlueGene/Q systems.
  • slurm — Use Slurm's default launching infrastructure.

const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin; any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version; however, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.

The programmer is urged to study src/plugins/launch/slurm/launch_slurm.c for a sample implementation of a Slurm launch plugin.
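
As a rough illustration of the identification symbols described above, a new plugin's source file might begin as follows. This is a minimal sketch, not taken from the Slurm sources: the plugin_name string is made up, and the fallback definition of SLURM_VERSION_NUMBER is a placeholder so that the fragment compiles on its own. In a real build that macro is expected to come from the Slurm headers.

```c
#include <stdint.h>

/* ASSUMPTION: a real plugin gets this macro from the Slurm headers.
 * The placeholder value below only keeps the sketch self-contained. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0x000000
#endif

/* Human-readable description of the plugin (hypothetical text). */
const char plugin_name[] = "launch example plugin";

/* Must be "launch/" followed by one of the listed launcher names. */
const char plugin_type[] = "launch/slurm";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```

Omitting plugin_version would let any Slurm version load the plugin, at the risk of the hard-to-diagnose failures noted above.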

API Functions

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen (3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
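
The pairing of init and fini can be sketched as below. This is only an illustration under stated assumptions: plugin_scratch is a hypothetical piece of global state, and the SLURM_SUCCESS and SLURM_ERROR fallbacks stand in for the definitions that a real plugin would take from the Slurm headers.

```c
#include <stdlib.h>

/* ASSUMPTION: stand-ins for the return codes provided by the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* HYPOTHETICAL global state owned by the plugin. */
static char *plugin_scratch = NULL;

/* Called once when the plugin is loaded: allocate global state. */
int init(void)
{
    plugin_scratch = calloc(1, 256);
    if (plugin_scratch == NULL)
        return SLURM_ERROR;
    return SLURM_SUCCESS;
}

/* Called once when the plugin is removed: release what init allocated. */
void fini(void)
{
    free(plugin_scratch);   /* free(NULL) is a no-op, so this is always safe */
    plugin_scratch = NULL;
}
```

Resetting the pointer in fini keeps a second call harmless, which is a defensive choice rather than something the API requires.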

int launch_p_setup_srun_opt(char **rest)

Description:
Sets up the srun operation.

Arguments:
rest: extra parameters on the command line not processed by srun

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int launch_p_handle_multi_prog_verify(int command_pos)

Description:
Called to verify a multi-prog file, if verification is needed.

Arguments:
command_pos: index into opt.argv (in the global opt variable) at which the command is located.

Returns:
1 if handled, or
0 if not.
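
The handled/not-handled convention might look like the following sketch. Everything except the function signature is an assumption made for illustration: struct example_opt is a small stand-in for srun's global opt variable, and the actual checking of the multi-prog file is left as a comment because this document does not specify it.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for srun's global options; only the fields
 * needed by this sketch are modeled. */
struct example_opt {
    bool   multi_prog;  /* true when a multi-prog file was requested */
    int    argc;
    char **argv;
};
static struct example_opt opt;

int launch_p_handle_multi_prog_verify(int command_pos)
{
    if (!opt.multi_prog)
        return 0;   /* nothing to verify: not handled */

    if (command_pos < 0 || command_pos >= opt.argc)
        return 0;   /* no command at that position: not handled */

    /* A real plugin would open and check the file named by
     * opt.argv[command_pos] here. */
    return 1;       /* handled */
}
```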

int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus, void (*signal_function)(int), sig_atomic_t *destroy_job)

Description:
Creates the job step.

Arguments:
job: the job to run.
use_all_cpus: whether to use all CPUs.
signal_function: function that handles the signals coming in.
destroy_job: pointer to a global flag signifying if the job was canceled while allocating.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
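
One way a plugin might honor the destroy_job flag while retrying step creation is sketched below. This is not the logic of any shipped plugin: try_create_step is a hypothetical stand-in that pretends resources become available on the third attempt, create_step_with_retry is an illustrative helper rather than an API function, and the flag is declared volatile here because it is written from a signal handler.

```c
#include <signal.h>
#include <stdbool.h>

/* ASSUMPTION: stand-ins for the return codes provided by the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Flag of the kind passed as destroy_job. */
static volatile sig_atomic_t destroy_job = 0;

/* Example signal_function: only records that cancellation was requested. */
static void signal_function(int signo)
{
    (void) signo;
    destroy_job = 1;
}

/* HYPOTHETICAL single creation attempt; succeeds from the third try on. */
static bool try_create_step(int attempt)
{
    return attempt >= 3;
}

/* Retry step creation, giving up as soon as cancellation is flagged. */
static int create_step_with_retry(volatile sig_atomic_t *destroy_flag,
                                  int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (*destroy_flag)
            return SLURM_ERROR;     /* job canceled while allocating */
        if (try_create_step(attempt))
            return SLURM_SUCCESS;
    }
    return SLURM_ERROR;             /* attempts exhausted */
}
```

Checking the flag before every attempt keeps the handler itself trivial, which is the usual reason for routing cancellation through a sig_atomic_t.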

int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds, uint32_t *global_rc)

Description:
Launches the job step.

Arguments:
job: the job to launch.
cio_fds: the filled-in I/O file descriptors.
global_rc: srun global return code.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int launch_p_step_wait(srun_job_t *job, bool got_alloc)

Description:
Waits for the job to be finished.

Arguments:
job: the job to wait for.
got_alloc: true if the resource allocation was created inside srun.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int launch_p_step_terminate(void)

Description:
Terminates the job step.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void launch_p_print_status(void)

Description:
Gets and prints the status of the job.

void launch_p_fwd_signal(int signal)

Description:
Forwards a signal to any underlying tasks.

Arguments:
signal: the signal that needs to be sent.
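
For a plugin that wraps an external launcher command, signal forwarding could be as simple as the sketch below. It assumes a POSIX system and a plugin that remembers the launcher's process ID. Both launcher_pid and the helper forward_signal_to are hypothetical names introduced here, not part of the launch plugin API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes kill() under strict ISO C modes */
#endif
#include <signal.h>
#include <sys/types.h>

/* HYPOTHETICAL: process ID of the external launcher started by the
 * plugin; 0 means no launcher is currently running. */
static pid_t launcher_pid = 0;

/* Illustrative helper: send sig to pid, or return -1 if there is no target. */
static int forward_signal_to(pid_t pid, int sig)
{
    if (pid <= 0)
        return -1;
    return kill(pid, sig);
}

/* Forward the signal to the launcher, if one is running. */
void launch_p_fwd_signal(int signal)
{
    (void) forward_signal_to(launcher_pid, signal);
}
```

The guard against a non-positive pid matters because kill() treats 0 and negative values as process-group targets, which is rarely what a plugin intends.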

Last modified 11 February 2016