8.1.1. Per-App-Context Mapping Policies

8.1.1.1. Overview

Today every prte_job_t carries a single prte_job_map_t (mapping/ranking/binding policy triple) and a single resolved prte_rmaps_options_t that is passed unchanged to whichever rmaps component wins selection. All app contexts in the job are mapped by that one component under that one policy.

This document specifies the changes required to allow each prte_app_context_t to carry its own mapping, ranking, and binding directives so that different apps within the same job can be mapped by different components under different policies.

8.1.1.2. Goals

  1. Every directive expressible at job level via --map-by, --rank-by, and --bind-to must be expressible at app-context level.

  2. When an app carries no per-app directives, it inherits the job-level policy unchanged — no behaviour change for existing usage.

  3. The rmaps component selected for each app context is determined by that app’s resolved mapping policy, not by the job’s.

  4. The existing component interface (map_job(prte_job_t *, prte_rmaps_options_t *)) is preserved with a minimal, backward-compatible extension.

  5. Global rank assignment (vpid computation) remains a single coordinated pass across all apps after all apps have been placed.

8.1.1.3. Attributes Required on prte_app_context_t

Add the following new attribute keys to src/util/attr.h in the PRTE_APP_* range (next available keys after PRTE_APP_PPR = 25):

/* Mapping policy string for this app, same syntax as --map-by.
 * When present overrides the job-level mapping policy for this app. */
#define PRTE_APP_MAPBY              26  // char* - e.g. "core:pe=2:oversubscribe"

/* Ranking policy string for this app, same syntax as --rank-by. */
#define PRTE_APP_RANKBY             27  // char* - e.g. "fill"

/* Binding policy string for this app, same syntax as --bind-to. */
#define PRTE_APP_BINDTO             28  // char* - e.g. "core"

/* File to use for sequential or rankfile mapping for this app.
 * Distinct from PRTE_APP_HOSTFILE which lists nodes; this is the
 * ordering/affinity file consumed by the seq and rank_file components. */
#define PRTE_APP_MAP_FILE           29  // char* - path to seq or rankfile

/* Device name for dist mapping for this app. */
#define PRTE_APP_DIST_DEVICE        30  // char* - e.g. "mlx5_0"

/* Use hwthreads as CPUs for this app. */
#define PRTE_APP_HWT_CPUS           31  // bool

/* Use cores as CPUs for this app (explicit, not relying on absence of HWT). */
#define PRTE_APP_CORE_CPUS          32  // bool

/* PE-list (CPU set) for this app, same syntax as pe-list= modifier. */
#define PRTE_APP_CPUSET             33  // char* - comma-delimited CPU ranges

/* Max procs to bind to a target object before moving to the next. */
#define PRTE_APP_BINDING_LIMIT      34  // uint16_t

PRTE_APP_MAX_KEY must be raised to accommodate the new keys.

8.1.1.3.1. Relationship to existing attributes

The already-defined PRTE_APP_PES_PER_PROC (24) and PRTE_APP_PPR (25) are preserved as-is; their semantics are unchanged.

PRTE_APP_MAPBY supersedes PRTE_APP_PPR and PRTE_APP_PES_PER_PROC when present — the full PRTE_APP_MAPBY string is the canonical per-app mapping directive and is parsed by the same machinery as the job-level --map-by string (see §Parsing below).

8.1.1.4. Changes to prte_rmaps_options_t (src/mca/rmaps/rmaps_types.h)

Add one field:

typedef struct {
    /* ... existing fields unchanged ... */

    /* When >= 0, the component must map only the app context at this index
     * within jdata->apps and must skip all others.
     * When < 0 (default, set to -1), the component maps all app contexts
     * as it does today. */
    int app_idx;

} prte_rmaps_options_t;

The new field is initialized to -1 (map all apps) in prte_rmaps_base_map_job via the existing memset(&options, 0, ...) plus an explicit assignment immediately after:

memset(&options, 0, sizeof(prte_rmaps_options_t));
options.app_idx = -1;   /* map all apps by default */
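
The explicit assignment is needed because a zeroed struct would read as "map only the app at index 0". A minimal sketch of that distinction follows. It uses a hypothetical stand-in struct (demo_options_t is not the real prte_rmaps_options_t) and a hypothetical predicate, demo_should_map():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for prte_rmaps_options_t; only the new field
 * is modelled here. */
typedef struct {
    int app_idx;
} demo_options_t;

/* Mirrors the initialization described above: zero everything, then
 * set the "map all apps" sentinel explicitly. */
static void demo_options_init(demo_options_t *opts)
{
    assert(NULL != opts);
    memset(opts, 0, sizeof(*opts));
    opts->app_idx = -1;
}

/* True when a component should process the app at index n. */
static bool demo_should_map(const demo_options_t *opts, int n)
{
    return opts->app_idx < 0 || opts->app_idx == n;
}
```

With only the memset, demo_should_map() would accept index 0 and reject every other app, which is the per-app behaviour rather than the intended default.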

8.1.1.5. New Parsing Functions (src/mca/rmaps/base/rmaps_base_frame.c)

The existing prte_rmaps_base_set_mapping_policy(prte_job_t *jdata, char *spec) and check_modifiers() store their results into jdata->map->mapping and jdata->attributes. They must not be changed.

Add three new functions with identical parsing logic but storing results into app->attributes:

/* Parse a --map-by style string and store the result on app->attributes.
 * All mapping policy values, modifiers, and options that check_modifiers()
 * handles at the job level are handled here at the app level.  PPR pattern
 * is stored as PRTE_APP_PPR; pe count as PRTE_APP_PES_PER_PROC; cpuset as
 * PRTE_APP_CPUSET; file as PRTE_APP_MAP_FILE; hwthread/core CPU mode as
 * PRTE_APP_HWT_CPUS / PRTE_APP_CORE_CPUS.  The parsed mapping policy enum
 * value is stored as PRTE_APP_MAPBY (as a uint16_t, not the original string,
 * once resolved) — see §Resolution below for how this is read back. */
int prte_rmaps_base_set_app_mapping_policy(prte_app_context_t *app, char *spec);

/* Parse a --rank-by style string and store the result as PRTE_APP_RANKBY
 * (uint16_t ranking policy) on app->attributes.  Accepts the same ranking
 * objects as the job-level --rank-by: SLOT, NODE, FILL, and SPAN.  The
 * PRTE_RANKING_GIVEN directive bit is set so the resolve step knows the
 * value was supplied explicitly (and must not be re-derived from the app's
 * mapping policy).  An unrecognized object returns PRTE_ERR_SILENT after a
 * diagnostic. */
int prte_rmaps_base_set_app_ranking_policy(prte_app_context_t *app, char *spec);

/* Parse a --bind-to style string and store the result as PRTE_APP_BINDTO
 * (uint16_t binding policy) on app->attributes.  Accepts the same binding
 * objects as the job-level --bind-to: NONE, HWTHREAD, CORE, L1CACHE,
 * L2CACHE, L3CACHE, NUMA, and PACKAGE.  The ":"-delimited modifiers
 * if-supported, overload-allowed, no-overload, and LIMIT=N are parsed and
 * recorded: if-supported/overload directives become directive bits within
 * the PRTE_APP_BINDTO uint16_t (PRTE_BIND_IF_SUPPORTED,
 * PRTE_BIND_ALLOW_OVERLOAD / PRTE_BIND_OVERLOAD_GIVEN), while LIMIT=N is
 * stored separately as PRTE_APP_BINDING_LIMIT (uint16_t).  An unrecognized
 * object or modifier returns PRTE_ERR_BAD_PARAM (or PRTE_ERR_SILENT for a
 * malformed LIMIT value) after a diagnostic. */
int prte_rmaps_base_set_app_binding_policy(prte_app_context_t *app, char *spec);

These are declared in src/mca/rmaps/base/base.h.

The ranking and binding functions mirror the job-level prte_rmaps_base_set_ranking_policy() and the job-level binding-policy parser, respectively. They differ only in that they write their result onto app->attributes rather than jdata->map. Every ranking object and binding object/modifier expressible at the job level is therefore expressible per app, satisfying Goal 1 for --rank-by and --bind-to to the same degree as --map-by.

8.1.1.5.1. Attribute storage convention

To avoid storing both a raw string and a parsed integer for the same concept, the new attributes store the parsed policy value (uint16_t), not the original string. The string attribute keys defined in §Attributes are used only for the schizo/CLI layer to record the unparsed directive before the base layer processes it; once parsed, the uint16_t value replaces the string in the attribute.

Alternatively — and this is the recommended approach for simplicity — the string attributes are only used by the schizo/CLI parsing layer; the base layer’s new prte_rmaps_base_set_app_* functions accept the string, parse it, and store the result as additional attributes:

Parsed result        Attribute               Type
-------------------  ----------------------  -----------
Mapping policy enum  PRTE_APP_MAPBY          PMIX_UINT16
Ranking policy enum  PRTE_APP_RANKBY         PMIX_UINT16
Binding policy enum  PRTE_APP_BINDTO         PMIX_UINT16
PPR pattern string   PRTE_APP_PPR            PMIX_STRING
CPUs per rank        PRTE_APP_PES_PER_PROC   PMIX_UINT16
CPU set string       PRTE_APP_CPUSET         PMIX_STRING
Map/rankfile path    PRTE_APP_MAP_FILE       PMIX_STRING
Use hwthreads        PRTE_APP_HWT_CPUS       PMIX_BOOL
Use cores            PRTE_APP_CORE_CPUS      PMIX_BOOL
Binding limit        PRTE_APP_BINDING_LIMIT  PMIX_UINT16

8.1.1.5.1.1. These attributes must be stored with PRTE_ATTR_GLOBAL, never PRTE_ATTR_LOCAL

This is a correctness requirement, not a stylistic one, and it is easy to get wrong. The per-app directives are set on the app context while the spawn request is being processed, but the request is then serialized and relayed to the DVM master before prte_rmaps_base_map_job() runs. Only GLOBAL attributes are packed; LOCAL attributes are silently dropped during that transfer.

If the prte_rmaps_base_set_app_* helpers store these attributes as PRTE_ATTR_LOCAL, they vanish before mapping: the any_per_app scan (see below) finds nothing, the per-app dispatch path is never taken, and every app is mapped, ranked, and bound by the job-level policy regardless of its own directives — with no error reported. The single-app case can appear to “work” only because its directive coincides with the job-level policy, which masks the defect.

Store every PRTE_APP_* attribute listed above with PRTE_ATTR_GLOBAL, matching the convention already used for PRTE_APP_PPR and PRTE_APP_PES_PER_PROC in the spawn handler.
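
The failure mode can be illustrated with a toy model of the relay step. Everything in it is a hypothetical stand-in (demo_attr_t, demo_pack_attrs(), the scope enum); the real code uses the PRRTE attribute list and packing routines. The sketch only shows that a LOCAL entry disappears without any error being raised:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real scope flags are PRTE_ATTR_LOCAL and
 * PRTE_ATTR_GLOBAL, and the real container is an attribute list. */
typedef enum { DEMO_ATTR_LOCAL, DEMO_ATTR_GLOBAL } demo_scope_t;

typedef struct {
    int key;
    demo_scope_t scope;
} demo_attr_t;

/* Model of the relay step: copy only GLOBAL attributes into dst and
 * return how many survived. LOCAL entries are dropped without error. */
static size_t demo_pack_attrs(const demo_attr_t *src, size_t nsrc,
                              demo_attr_t *dst, size_t ndst)
{
    size_t n = 0;
    assert(NULL != src && NULL != dst);
    for (size_t i = 0; i < nsrc && n < ndst; i++) {
        if (DEMO_ATTR_GLOBAL == src[i].scope) {
            dst[n++] = src[i];
        }
    }
    return n;
}

/* Returns 1 if key is present among the first n entries of attrs. */
static int demo_has_attr(const demo_attr_t *attrs, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (attrs[i].key == key) {
            return 1;
        }
    }
    return 0;
}
```

A lookup on the receiving side for a key that was stored LOCAL simply returns "not found", which is indistinguishable from the app never having set the directive.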

8.1.1.6. Changes to prte_rmaps_base_map_job() (src/mca/rmaps/base/rmaps_base_map_job.c)

This is the primary structural change.

8.1.1.6.1. Step 1 — resolve job-level defaults (unchanged)

The existing inheritance, policy-resolution, and process-count logic runs as now and populates jdata->map->mapping, jdata->map->ranking, jdata->map->binding, and the job-level options struct. This path is unchanged and provides the fallback for apps that carry no per-app directives.

8.1.1.6.2. Step 2 — check whether any app has per-app directives

After job-level resolution, scan the apps array:

bool any_per_app = false;
for (n = 0; n < jdata->apps->size; n++) {
    app = pmix_pointer_array_get_item(jdata->apps, n);
    if (NULL == app) continue;
    if (prte_get_attribute(&app->attributes, PRTE_APP_MAPBY, NULL, PMIX_UINT16) ||
        prte_get_attribute(&app->attributes, PRTE_APP_RANKBY, NULL, PMIX_UINT16) ||
        prte_get_attribute(&app->attributes, PRTE_APP_BINDTO, NULL, PMIX_UINT16)) {
        any_per_app = true;
        break;
    }
}

If any_per_app is false, the existing single-dispatch path runs unchanged.

8.1.1.6.3. Step 3 — per-app dispatch loop (new path)

When any_per_app is true, replace the single component-dispatch block with a loop over app contexts:

for (n = 0; n < jdata->apps->size; n++) {
    app = pmix_pointer_array_get_item(jdata->apps, n);
    if (NULL == app) continue;

    /* Build a per-app copy of options starting from the job-level defaults */
    prte_rmaps_options_t app_options = options;   /* shallow copy */
    app_options.app_idx = n;

    /* Override with app-level directives where present */
    rc = prte_rmaps_base_resolve_app_options(jdata, app, &app_options);
    if (PRTE_SUCCESS != rc) {
        PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
        goto cleanup;
    }

    /* Compute process count for this app (if not already set) */
    rc = prte_rmaps_base_compute_nprocs(jdata, app, &app_options);
    if (PRTE_SUCCESS != rc) {
        PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
        goto cleanup;
    }

    /* Select and invoke the component appropriate for this app's policy */
    did_map = false;
    PMIX_LIST_FOREACH(mod, &prte_rmaps_base.selected_modules,
                      prte_rmaps_base_selected_module_t) {
        rc = mod->module->map_job(jdata, &app_options);
        if (PRTE_SUCCESS == rc) {
            did_map = true;
            break;
        }
        if (PRTE_ERR_RESOURCE_BUSY == rc) {
            /* oversubscription detected */
            PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
            goto cleanup;
        }
        /* PRTE_ERR_TAKE_NEXT_OPTION → try next component */
    }
    if (!did_map) {
        pmix_show_help("help-prte-rmaps-base.txt", "failed-map", true, ...);
        PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
        goto cleanup;
    }
}

8.1.1.6.4. New helper: prte_rmaps_base_resolve_app_options()

Extract into a separate static (or base-exported) function the logic for building app_options from the job-level options plus any per-app overrides:

static int prte_rmaps_base_resolve_app_options(prte_job_t *jdata,
                                               prte_app_context_t *app,
                                               prte_rmaps_options_t *opts)

This function:

  1. Reads PRTE_APP_MAPBY (uint16_t) from app->attributes; if present, stores the masked policy (PRTE_GET_MAPPING_POLICY) into opts->map and refreshes the fields that are derived from the mapping policy — opts->maptype, opts->mapdepth, opts->mapspan, opts->ordered — exactly as the job-level path does after it resolves the job map. The raw value (with its directive bits) is kept locally so the rank/bind defaults in steps 4–5 can read its SPAN directive.

  2. If opts->map is PRTE_MAPPING_PPR, reads PRTE_APP_PPR from app->attributes; if absent falls back to PRTE_JOB_PPR on jdata.

  3. Reads PRTE_APP_PES_PER_PROC, PRTE_APP_HWT_CPUS, PRTE_APP_CORE_CPUS, PRTE_APP_CPUSET, PRTE_APP_MAP_FILE, PRTE_APP_DIST_DEVICE, PRTE_APP_BINDING_LIMIT and overrides the corresponding opts fields.

  4. Ranking. If PRTE_APP_RANKBY is present, stores the masked policy (PRTE_GET_RANKING_POLICY) into opts->rank. Otherwise, if the app supplied its own PRTE_APP_MAPBY (step 1), derives the ranking default from that app’s mapping policy — mirroring the NULL-spec path of prte_rmaps_base_set_ranking_policy() (by-node map → by-node rank, by-slot map → by-slot rank, object map → by-fill, SPAN → by-span). If the app changed neither, the job-level ranking carried in opts stands.

  5. Binding. If PRTE_APP_BINDTO is present, stores the masked policy (PRTE_GET_BINDING_POLICY) into opts->bind and lifts the overload directive into opts->overload. Otherwise, if the app supplied its own PRTE_APP_MAPBY, derives the binding default from that app’s mapping policy — bind to the mapped object (numa/package/cache/core/hwthread), or to core (hwthread when hwthreads are in use) for object-less mappings such as by-node and by-slot.

The crucial point for steps 4–5: when an app overrides its mapping policy but gives no explicit ranking or binding, the defaults must follow that app’s mapping, not the job-level mapping. Inheriting opts->rank/opts->bind unchanged would silently rank and bind the app as if it had been mapped by the job-wide policy.

Two small pure helpers, prte_rmaps_base_derive_ranking(mapping) and prte_rmaps_base_derive_binding(mapping, use_hwthreads), encode the map → rank and map → bind defaults and are reused for both the explicit and defaulted cases.

Masking note: the PRTE_APP_MAPBY/RANKBY/BINDTO attributes carry the policy value with its high-bit directive flags (GIVEN, overload, IS_SET) attached. opts->map/rank/bind are compared against the bare PRTE_MAPPING_*/PRTE_RANK_BY_*/PRTE_BIND_TO_* enums elsewhere, so the resolver must mask off the directive bits (and route the overload bit to opts->overload) rather than assigning the raw attribute value.

The function must be idempotent and must not modify jdata->map — any per-app mapping policy lives only in opts and in app->attributes.
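
The masking and ranking-default rules above can be sketched as follows. All constants and names are hypothetical (the DEMO_* bit layout is invented for illustration and is not the real PRTE_* layout), and checking the SPAN directive before the policy enum is an assumption of this sketch, not something the text above fixes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real enums and masks
 * live in the rmaps headers and may differ. */
#define DEMO_POLICY_MASK   0x00ffu   /* low byte: policy enum      */
#define DEMO_DIR_SPAN      0x0100u   /* high byte: directive flags */
#define DEMO_DIR_GIVEN     0x0200u

enum { DEMO_MAP_BYNODE = 1, DEMO_MAP_BYSLOT = 2, DEMO_MAP_BYCORE = 3 };
enum { DEMO_RANK_NODE = 1, DEMO_RANK_SLOT = 2, DEMO_RANK_FILL = 3,
       DEMO_RANK_SPAN = 4 };

/* Strip directive bits so the result can be compared to bare enums. */
static uint16_t demo_mask_policy(uint16_t raw)
{
    return (uint16_t)(raw & DEMO_POLICY_MASK);
}

/* Map -> rank default: SPAN first (assumed precedence), then
 * by-node -> by-node, by-slot -> by-slot, any object -> fill. */
static uint16_t demo_derive_ranking(uint16_t raw_mapping)
{
    if (raw_mapping & DEMO_DIR_SPAN) {
        return DEMO_RANK_SPAN;
    }
    switch (demo_mask_policy(raw_mapping)) {
    case DEMO_MAP_BYNODE: return DEMO_RANK_NODE;
    case DEMO_MAP_BYSLOT: return DEMO_RANK_SLOT;
    default:              return DEMO_RANK_FILL;
    }
}

/* Step 4 in miniature: an explicit per-app ranking wins; otherwise a
 * per-app mapping drives the default; otherwise the job value stands. */
static uint16_t demo_resolve_rank(int have_rankby, uint16_t raw_rankby,
                                  int have_mapby, uint16_t raw_mapby,
                                  uint16_t job_rank)
{
    /* The job-level value is expected to be masked already. */
    assert(job_rank == demo_mask_policy(job_rank));
    if (have_rankby) {
        return demo_mask_policy(raw_rankby);
    }
    if (have_mapby) {
        return demo_derive_ranking(raw_mapby);
    }
    return job_rank;
}
```

The third branch is the only one that inherits from the job; an app that set its own mapping never reaches it, which is the property the "crucial point" paragraph asks for.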

8.1.1.7. Changes to rmaps Components

Every component must be updated to honour options->app_idx.

8.1.1.7.1. Contract

When options->app_idx >= 0, the component processes only the app context at that index in jdata->apps and returns PRTE_ERR_TAKE_NEXT_OPTION for any app it cannot handle. When options->app_idx < 0, the component processes all app contexts as today.

8.1.1.7.2. Required change in each component

In the top-level loop of each map_job function, replace:

for (n = 0; n < jdata->apps->size; n++) {
    app = pmix_pointer_array_get_item(jdata->apps, n);
    if (NULL == app) continue;
    ...
}

with:

for (n = 0; n < jdata->apps->size; n++) {
    app = pmix_pointer_array_get_item(jdata->apps, n);
    if (NULL == app) continue;
    /* honour per-app dispatch */
    if (options->app_idx >= 0 && n != options->app_idx) continue;
    ...
}

This is the only mandatory change to each component. All five components require this change: round_robin, ppr, seq, rank_file, lsf.

8.1.1.7.3. Component selection logic in the per-app path

The mapping policy in options->map determines which component accepts the app. Each component’s map_job already returns PRTE_ERR_TAKE_NEXT_OPTION when the policy is not one it handles. No changes to per-component policy checking are required beyond the app_idx guard above.

The existing component-selection priority ordering (determined by each component’s query priority) is preserved.
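
The fall-through behaviour can be modelled with mock components. The names, return codes, and policy values below are hypothetical stand-ins, and the order of the array is arbitrary; it does not reflect the real query priorities:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes and policy values, standing in for
 * PRTE_SUCCESS, PRTE_ERR_TAKE_NEXT_OPTION, and the mapping enums. */
enum { DEMO_SUCCESS = 0, DEMO_TAKE_NEXT = -1 };
enum { DEMO_MAP_PPR = 1, DEMO_MAP_SEQ = 2, DEMO_MAP_BYSLOT = 3 };

typedef int (*demo_map_fn)(int policy);

/* Each mock component accepts only the policy it implements. */
static int demo_ppr_map(int policy)
{
    return (DEMO_MAP_PPR == policy) ? DEMO_SUCCESS : DEMO_TAKE_NEXT;
}

static int demo_seq_map(int policy)
{
    return (DEMO_MAP_SEQ == policy) ? DEMO_SUCCESS : DEMO_TAKE_NEXT;
}

static int demo_rr_map(int policy)
{
    return (DEMO_MAP_BYSLOT == policy) ? DEMO_SUCCESS : DEMO_TAKE_NEXT;
}

/* Walk the modules in list order; return the index of the first one
 * that accepts the policy, or -1 if none does. */
static int demo_dispatch(const demo_map_fn *mods, size_t nmods, int policy)
{
    assert(NULL != mods);
    for (size_t i = 0; i < nmods; i++) {
        if (DEMO_SUCCESS == mods[i](policy)) {
            return (int) i;
        }
    }
    return -1;
}
```

Because the policy passed in is the app's resolved policy, two apps in the same job can land on different entries of the same module list without any change to the list itself.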

8.1.1.8. Changes to Ranking

prte_rmaps_base_compute_vpids() in rmaps_base_ranking.c runs after all apps have been placed. It currently reads jdata->map->ranking to determine the single global ranking strategy.

With per-app ranking the function signature becomes:

int prte_rmaps_base_compute_vpids(prte_job_t *jdata,
                                  prte_rmaps_options_t *options,
                                  int app_idx,
                                  uint32_t *next_vpid);

  1. app_idx selects the app context to rank. When app_idx < 0 the function ranks all apps in one pass exactly as it does today (the job-level path passes -1).

  2. When called from the per-app loop it is invoked once per app with the app’s resolved opts->rank (which honours that app’s PRTE_APP_RANKBY) and that app’s index.

  3. next_vpid carries the running global rank counter between per-app calls: each invocation begins assigning at *next_vpid and updates it to the first unassigned rank on return. Global rank assignment is therefore still monotonically increasing across apps in app-index order, so pptr->name.rank values remain contiguous and non-overlapping across the whole job.

  4. Per-app ranking controls only the order in which processes within that app are assigned their ranks relative to each other (by SLOT, NODE, FILL, or SPAN as that app requested); the starting rank for each app is the first unassigned rank after all previous apps.
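
The running-counter behaviour in items 3 and 4 can be sketched with a toy process array. demo_proc_t and demo_compute_vpids() are hypothetical; the within-app order here is plain array order, standing in for whatever order the app's ranking policy would produce:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a placed process. */
typedef struct {
    int app_idx;     /* owning app context */
    uint32_t rank;   /* global rank, assigned below */
} demo_proc_t;

/* Assign ranks to the procs of one app, in array order, starting at
 * *next_vpid, and leave *next_vpid at the first unassigned rank.
 * A negative app_idx ranks every proc in one pass. */
static void demo_compute_vpids(demo_proc_t *procs, size_t nprocs,
                               int app_idx, uint32_t *next_vpid)
{
    assert(NULL != procs && NULL != next_vpid);
    for (size_t i = 0; i < nprocs; i++) {
        if (app_idx >= 0 && procs[i].app_idx != app_idx) {
            continue;
        }
        procs[i].rank = (*next_vpid)++;
    }
}
```

Calling it once for app 0 and then once for app 1 with the same counter yields one contiguous rank range per app, in app-index order, regardless of how the two apps' processes are interleaved in the array.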

8.1.1.9. Changes to Binding

No structural changes are required. prte_rmaps_base_bind_proc() already takes the per-call options struct. Because prte_rmaps_base_setup_proc() is called from within each component’s inner loop with the current options in scope, per-app binding is automatically derived from opts->bind which was set by prte_rmaps_base_resolve_app_options().

The full binding directive is carried per app:

  • opts->bind receives the app’s binding object and the if-supported / overload directive bits decoded from PRTE_APP_BINDTO.

  • opts->limit receives the app’s PRTE_APP_BINDING_LIMIT (the LIMIT=N modifier), defaulting to the job-level value when the app does not set one.

  • opts->cpus_per_rank (PRTE_APP_PES_PER_PROC), opts->use_hwthreads (PRTE_APP_HWT_CPUS / PRTE_APP_CORE_CPUS), and opts->cpuset (PRTE_APP_CPUSET) are likewise resolved per app and feed the binder.

Thus an app may bind to a different object, with different overload/limit behaviour, than its siblings in the same job.

8.1.1.10. Command-line / PMIx-spawn wiring

Per-app directives reach the app context through the PMIx spawn machinery, not through a schizo-only path. The expected command-line representation is:

prun app1 --map-by core : app2 --map-by node --rank-by fill

where : is the MPMD separator between app contexts.

The flow, end to end:

  1. Per-app parse: src/prted/prte_app_parse.c splits the command line at each : (prte_parse_locals()) and parses each app segment in its own create_app() call. Each segment’s --map-by/--rank-by/--bind-to is recorded on that app’s pmix_app_t.info[] array as PMIX_MAPBY / PMIX_RANKBY / PMIX_BINDTO. Because each app is parsed independently, directives are already correctly scoped to their app context; no MPMD-aware schizo bookkeeping is required.

  2. Spawn — the tool builds the pmix_app_t array (one entry per app, each carrying its own info[]) and calls PMIx_Spawn. Whichever spawn assembly path is used (src/prted/prte.c for the proxy HNP, src/prted/prun_common.c for the tool), each app’s info must be converted from its own app->info list — not from the job-level info — so the per-app keys are preserved.

  3. Server-side translation: src/prted/pmix/pmix_server_dyn.c (prte_pmix_xfer_app()) walks each app’s info[] and converts the PMIX_MAPBY / PMIX_RANKBY / PMIX_BINDTO keys into the PRTE_APP_* attributes by calling:

    prte_rmaps_base_set_app_mapping_policy(app, info->value.data.string);
    prte_rmaps_base_set_app_ranking_policy(app, info->value.data.string);
    prte_rmaps_base_set_app_binding_policy(app, info->value.data.string);
    

    These helpers parse the string and store the result with PRTE_ATTR_GLOBAL (see §“These attributes must be stored with PRTE_ATTR_GLOBAL, never PRTE_ATTR_LOCAL”). This is the same path used by a third-party caller of PMIx_Spawn that supplies PMIX_MAPBY etc. in a per-app info[] array, so the CLI and the programmatic spawn API share one implementation.

  4. Job-level directives — options given before any : apply to the whole job and continue to flow through the existing job-level prte_rmaps_base_set_mapping_policy() / set_ranking_policy() / set_binding_policy() functions, which store onto jdata->map. An app that carries no per-app directive inherits these.

No changes to src/mca/schizo/prte/ are required for per-app map/rank/bind: the option definitions already exist, and the per-app association happens in prte_app_parse.c.

8.1.1.10.1. Tool-level argv pre-scan guards

Before schizo runs, the tool launchers themselves walk the raw argv to normalise option spellings. prun (src/tools/prun/prun.c) and the prte HNP launcher (src/prted/prte.c) both rename --rank-by → --rankby and --bind-to → --bindto, and both currently reject a second occurrence of either option with the multi-instances help message.

Because a per-app MPMD command line repeats --rank-by/--bind-to once per app context, this guard must be removed. --rank-by and --bind-to are made to behave like --map-by, which is already renamed unconditionally with no such guard. Detecting an erroneous duplicate (two job-level --rank-by with no intervening MPMD separator) is left to the schizo MPMD parser, which has the app-context boundaries the flat argv pre-scan lacks.

Both launchers carry an identical copy of this pre-scan loop, so the shared logic is factored into a single helper rather than relaxed twice:

/* src/mca/schizo/base/schizo_base_stubs.c */
char *prte_schizo_base_normalize_argv(char **argv);

It renames all four deprecated option spellings (--map-by, --rank-by, --bind-to, --runtime-options) in place and returns any --personality value found (a pointer into argv, NULL if none). prun and prte each replace their inline loop with a single call to it.

8.1.1.11. Migration Notes

8.1.1.11.1. Existing per-app PPR and PES_PER_PROC

PRTE_APP_PPR (25) and PRTE_APP_PES_PER_PROC (24) are already stored on prte_app_context_t and checked in the ppr component. Their handling is absorbed into the general PRTE_APP_MAPBY path:

  • A standalone PRTE_APP_PPR without PRTE_APP_MAPBY continues to be read by the ppr component as today (backward compatible).

  • PRTE_APP_MAPBY containing a ppr:N:obj spec stores into PRTE_APP_PPR in addition to setting PRTE_APP_MAPBY = PRTE_MAPPING_PPR.

8.1.1.11.2. PRTE_JOB_FILE versus PRTE_APP_MAP_FILE

The existing PRTE_JOB_FILE attribute stores the rankfile/seq file path on the job. The new PRTE_APP_MAP_FILE (29) stores it on an individual app context. The seq and rank_file components must be updated to check PRTE_APP_MAP_FILE on the current app before falling back to PRTE_JOB_FILE on the job.
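
The lookup order can be sketched as below. The two structs are hypothetical stand-ins that hold only the one attribute of interest; the real components would read PRTE_APP_MAP_FILE and PRTE_JOB_FILE through the attribute API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: each struct holds only the one attribute of
 * interest (NULL when the attribute is absent). */
typedef struct { const char *map_file; } demo_app_t;   /* PRTE_APP_MAP_FILE */
typedef struct { const char *job_file; } demo_job_t;   /* PRTE_JOB_FILE     */

/* Prefer the app's own file; fall back to the job-level file; return
 * NULL when neither is set so the caller can report the error. */
static const char *demo_select_map_file(const demo_job_t *job,
                                        const demo_app_t *app)
{
    assert(NULL != job && NULL != app);
    if (NULL != app->map_file) {
        return app->map_file;
    }
    return job->job_file;
}
```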

8.1.1.12. Files Modified

File                                                  Change
----------------------------------------------------  ------
src/util/attr.h                                       Add PRTE_APP_MAPBY through PRTE_APP_BINDING_LIMIT (keys 26–34); raise PRTE_APP_MAX_KEY
src/mca/rmaps/rmaps_types.h                           Add app_idx field to prte_rmaps_options_t
src/mca/rmaps/base/base.h                             Declare new parsing and resolution functions
src/mca/rmaps/base/rmaps_base_frame.c                 Add prte_rmaps_base_set_app_mapping_policy(), prte_rmaps_base_set_app_ranking_policy(), prte_rmaps_base_set_app_binding_policy(), storing every PRTE_APP_* attribute with PRTE_ATTR_GLOBAL (not LOCAL)
src/mca/rmaps/base/rmaps_base_map_job.c               Add per-app detection, per-app dispatch loop, per-app compute_vpids calls, and prte_rmaps_base_resolve_app_options() (with prte_rmaps_base_derive_ranking() / derive_binding() defaulting, directive-bit masking, and map-derived field refresh)
src/mca/rmaps/base/rmaps_base_ranking.c               Add app_idx parameter to prte_rmaps_base_compute_vpids()
src/mca/rmaps/round_robin/rmaps_rr.c                  Add app_idx guard in app-context loop
src/mca/rmaps/ppr/rmaps_ppr.c                         Add app_idx guard; remove duplicate per-app PPR/PES override (now handled centrally)
src/mca/rmaps/seq/rmaps_seq.c                         Add app_idx guard; check PRTE_APP_MAP_FILE before PRTE_JOB_FILE
src/mca/rmaps/rank_file/rmaps_rank_file.c             Add app_idx guard; check PRTE_APP_MAP_FILE before PRTE_JOB_FILE
src/mca/rmaps/lsf/rmaps_lsf.c                         Add app_idx guard
src/prted/prte_app_parse.c                            Record per-app --map-by/--rank-by/--bind-to as PMIX_MAPBY/PMIX_RANKBY/PMIX_BINDTO on each app’s info[] (already present)
src/prted/pmix/pmix_server_dyn.c                      Translate per-app PMIX_MAPBY/RANKBY/BINDTO info into PRTE_APP_* attributes via the set_app_*_policy helpers (already present)
src/prted/prte.c / src/prted/prun_common.c            Ensure each pmix_app_t.info is built from that app’s app->info (per-app), not the job-level info
src/mca/schizo/base/schizo_base_stubs.c (and base.h)  Add shared prte_schizo_base_normalize_argv() helper (no multi-instances guard) used by both tool launchers
src/tools/prun/prun.c                                 Replace inline argv pre-scan loop with a call to prte_schizo_base_normalize_argv()
src/prted/prte.c                                      Same replacement (the HNP launcher’s argv pre-scan was a copy of prun’s)
src/mca/rmaps/rmaps_types.h                           Rename version macro to PRTE_RMAPS_BASE_VERSION_5_0_0; retain 4_0_0 as deprecated alias
src/mca/rmaps/rmaps.h                                 Rename module struct/typedef to prte_rmaps_base_module_5_0_0_t
src/mca/rmaps/round_robin/rmaps_rr_component.c        Reference PRTE_RMAPS_BASE_VERSION_5_0_0
src/mca/rmaps/ppr/rmaps_ppr_component.c               Reference PRTE_RMAPS_BASE_VERSION_5_0_0
src/mca/rmaps/seq/rmaps_seq_component.c               Reference PRTE_RMAPS_BASE_VERSION_5_0_0
src/mca/rmaps/rank_file/rmaps_rank_file_component.c   Reference PRTE_RMAPS_BASE_VERSION_5_0_0
src/mca/rmaps/lsf/rmaps_lsf_component.c               Reference PRTE_RMAPS_BASE_VERSION_5_0_0
test/unit/rmaps/ (new directory)                      Unit test suite: eight .c files + Makefile.am
test/unit/Makefile.am (new)                           SUBDIRS = rmaps
test/Makefile.am (new)                                SUBDIRS = unit
Makefile.am                                           Add test to SUBDIRS (after src)
config/prte_config_files.m4                           Add test/Makefile, test/unit/Makefile, test/unit/rmaps/Makefile to AC_CONFIG_FILES

8.1.1.13. Resolved Design Decisions

8.1.1.13.1. Oversubscription is job-level only

OVERSUBSCRIBE and NOOVERSUBSCRIBE are not per-app-context directives. They govern whether the job as a whole is permitted to exceed node slot counts, and that decision must be consistent across all apps sharing the same nodes.

Consequence: if a --map-by string supplied for an individual app context includes an OVERSUBSCRIBE or NOOVERSUBSCRIBE modifier, prte_rmaps_base_set_app_mapping_policy() must treat it as an error, emit a diagnostic via pmix_show_help(), and return PRTE_ERR_BAD_PARAM. The mapping event aborts with PRTE_JOB_STATE_MAP_FAILED.

The PRTE_APP_MAPBY attribute must therefore never carry an oversubscription modifier; those modifiers remain valid only in the job-level --map-by string where they are stored on jdata->map->mapping as today.
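
A minimal sketch of the rejection check follows. demo_check_app_mapby() and demo_token_is() are hypothetical helpers, the spec is assumed to be ':'-delimited as in the "core:pe=2:oversubscribe" example, and tokens are matched whole and case-insensitively (the real parser may accept abbreviations):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of the token [tok, tok+len) against a
 * NUL-terminated lowercase word. */
static int demo_token_is(const char *tok, size_t len, const char *word)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if ('\0' == word[i] ||
            tolower((unsigned char) tok[i]) != word[i]) {
            return 0;
        }
    }
    return '\0' == word[i];
}

/* Return 0 if the per-app spec is acceptable, -1 if any ':'-delimited
 * token is a job-level-only oversubscription modifier. A real
 * implementation would also emit a diagnostic before returning. */
static int demo_check_app_mapby(const char *spec)
{
    const char *tok = spec;
    assert(NULL != spec);
    while ('\0' != *tok) {
        size_t len = 0;
        while ('\0' != tok[len] && ':' != tok[len]) {
            len++;
        }
        if (demo_token_is(tok, len, "oversubscribe") ||
            demo_token_is(tok, len, "nooversubscribe")) {
            return -1;
        }
        tok += len;
        if (':' == *tok) {
            tok++;   /* step over the delimiter */
        }
    }
    return 0;
}
```

Rejecting at parse time means the PRTE_APP_MAPBY attribute is never written for an offending spec, so later stages do not need to re-check it.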

8.1.1.13.2. Display map is job-level only

prte_rmaps_base_display_map() is called once after all app contexts have been mapped, and it displays the complete job map. It is not meaningful to display a partial map mid-loop.

PRTE_JOB_DISPLAY_MAP and PRTE_JOB_DISPLAY_DEVEL_MAP remain job-level attributes. There are no per-app-context display-map attributes.

If a caller (e.g. schizo, PMIx spawn) sets a display-map directive on an individual app context, prte_rmaps_base_map_job() must promote it to the job level: after resolving all per-app options but before entering the dispatch loop, scan the apps array and, if any app carries such a directive, set PRTE_JOB_DISPLAY_MAP on jdata->attributes. The per-app copy of the directive is then ignored.

8.1.1.13.3. Spawn inheritance is job-level only

INHERIT and NOINHERIT control whether a spawned child job copies its parent’s mapping/ranking/binding policies. This is a property of the job as a whole — a child either inherits from its parent or it does not.

Consequently, INHERIT and NOINHERIT modifiers are not permitted in a per-app --map-by string. prte_rmaps_base_set_app_mapping_policy() must reject them with PRTE_ERR_BAD_PARAM and a diagnostic, aborting the mapping event, exactly as it does for oversubscription modifiers.

If the CLI or PMIx spawn path presents INHERIT/NOINHERIT at the app level (e.g., via per-app info[] keys), prte_rmaps_base_map_job() must scan all apps before entering the dispatch loop and promote the directive to the job level:

  • If all apps that carry the directive agree (all INHERIT or all NOINHERIT), apply it to jdata->attributes as PRTE_JOB_INHERIT or PRTE_JOB_NOINHERIT.

  • If any two apps carry conflicting directives, emit a diagnostic and abort with PRTE_JOB_STATE_MAP_FAILED.
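
The agree-or-abort rule in the two bullets above can be sketched as a single scan. The tri-state enum and demo_promote_inherit() are hypothetical; the real code would read and set attributes rather than enum values:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tri-state for a per-app inherit directive. */
typedef enum { DEMO_UNSET = 0, DEMO_INHERIT, DEMO_NOINHERIT } demo_inherit_t;

/* Scan the per-app directives. If the apps that carry one all agree,
 * write the common value to *job_level and return 0. If no app carries
 * one, leave *job_level as it was and return 0. On conflict, return -1
 * and leave *job_level untouched. */
static int demo_promote_inherit(const demo_inherit_t *apps, size_t napps,
                                demo_inherit_t *job_level)
{
    demo_inherit_t seen = DEMO_UNSET;
    assert(NULL != apps && NULL != job_level);
    for (size_t i = 0; i < napps; i++) {
        if (DEMO_UNSET == apps[i]) {
            continue;               /* app carries no directive */
        }
        if (DEMO_UNSET != seen && seen != apps[i]) {
            return -1;              /* two apps disagree */
        }
        seen = apps[i];
    }
    if (DEMO_UNSET != seen) {
        *job_level = seen;
    }
    return 0;
}
```

In the sketch a -1 return corresponds to the diagnostic-and-abort case; the caller would then activate PRTE_JOB_STATE_MAP_FAILED.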

8.1.1.13.4. NOLOCAL may be applied per app context

The NOLOCAL (PRTE_MAPPING_NO_USE_LOCAL) directive prevents placement of an app’s processes on the HNP node. It is meaningful on a per-app basis — one app in a job may need to avoid the head node while another does not.

NOLOCAL is therefore a valid modifier in a per-app --map-by string. prte_rmaps_base_set_app_mapping_policy() stores it as a directive bit within the PRTE_APP_MAPBY uint16_t attribute (using PRTE_SET_MAPPING_DIRECTIVE).

prte_rmaps_base_resolve_app_options() propagates the bit into opts->map for the current app. prte_rmaps_base_get_target_nodes() already tests PRTE_MAPPING_NO_USE_LOCAL in the mapping policy it receives; no further changes to that function are required.

8.1.1.13.5. All --rank-by objects are valid per app

Ranking has no job-wide-consistency requirement analogous to oversubscription or inheritance: it only fixes the order in which an app’s own processes receive their global ranks. Every --rank-by object — SLOT, NODE, FILL, SPAN — is therefore accepted per app with no forbidden modifiers.

prte_rmaps_base_set_app_ranking_policy() sets the PRTE_RANKING_GIVEN directive bit when it stores PRTE_APP_RANKBY. This is what lets prte_rmaps_base_resolve_app_options() distinguish an app that explicitly requested a ranking from one that should fall back to the default derived from its mapping policy (by-node mapping → by-node ranking, etc.). An app that sets PRTE_APP_MAPBY but not PRTE_APP_RANKBY gets a ranking derived from its own per-app mapping policy, not from the job-level mapping policy.

8.1.1.13.6. All --bind-to objects and modifiers are valid per app

Binding is intrinsically a per-process property, so every --bind-to object (NONE, HWTHREAD, CORE, L1CACHE, L2CACHE, L3CACHE, NUMA, PACKAGE) and every binding modifier (if-supported, overload-allowed, no-overload, LIMIT=N) is accepted per app with no forbidden modifiers.

The no-overload modifier records PRTE_BIND_OVERLOAD_GIVEN without PRTE_BIND_ALLOW_OVERLOAD so that an app can explicitly forbid overload even when the job-level default would have permitted it; resolve_app_options() copies these bits into opts->bind for the app, overriding the job-level binding directive in its entirety rather than merging bit-by-bit.

8.1.1.14. Framework Version Increment

The app_idx field added to prte_rmaps_options_t changes the contract between the base layer and every component: components are now required to honour options->app_idx and skip apps that do not match. This is a breaking interface change for any out-of-tree component built against the previous headers.

The framework version must be incremented from 4.0.0 to 5.0.0:

/* src/mca/rmaps/rmaps_types.h */
#define PRTE_RMAPS_BASE_VERSION_5_0_0 PRTE_MCA_BASE_VERSION_3_0_0("rmaps", 5, 0, 0)

The old macro PRTE_RMAPS_BASE_VERSION_4_0_0 should be retained as a deprecated alias pointing at the new value so that out-of-tree components that have not been updated produce a link-time or runtime mismatch rather than a silent ABI violation.

All five in-tree component files must be updated to reference PRTE_RMAPS_BASE_VERSION_5_0_0 in their component struct:

Component file                           Change
round_robin/rmaps_rr_component.c         PRTE_RMAPS_BASE_VERSION_4_0_0 → PRTE_RMAPS_BASE_VERSION_5_0_0
ppr/rmaps_ppr_component.c                same
seq/rmaps_seq_component.c                same
rank_file/rmaps_rank_file_component.c    same
lsf/rmaps_lsf_component.c                same

The module struct typedef and convenience alias in rmaps.h must also be updated:

/* src/mca/rmaps/rmaps.h */
struct prte_rmaps_base_module_5_0_0_t { ... };
typedef struct prte_rmaps_base_module_5_0_0_t prte_rmaps_base_module_5_0_0_t;
typedef prte_rmaps_base_module_5_0_0_t prte_rmaps_base_module_t;

8.1.1.15. Unit Tests

PRRTE currently has no unit test suite for the rmaps framework. This work introduces non-trivial new logic in prte_rmaps_base_map_job() and prte_rmaps_base_resolve_app_options() that must be verified independently of a live DVM. A new unit test tree is required.

8.1.1.15.1. Offline end-to-end verification (no launch)

In addition to the unit tests below, the whole per-app path (CLI parse → pmix_app_t.info[] → PRTE_APP_* attributes → map_job dispatch → placement/ranking/binding) can and must be exercised end to end without launching anything:

prterun --rtos donotlaunch --display map \
        --prtemca hwloc_use_topo_file test/unit/rmaps/test-topo.xml \
        -H node0:N,node1:M,node2:L \
        --map-by node -n 4 hostname : --map-by slot --rank-by node -n 4 hostname

--rtos donotlaunch runs the mapper/ranker/binder and prints the map without forking any process; --prtemca hwloc_use_topo_file supplies a simulated node topology (so binding resolves against real objects); -H declares the simulated nodes (slot counts only need to be ≥ the procs placed on each node). The printed map shows each process’s app index, rank, and bound object. See the “Testing the mapper without launching” section of AGENTS.md for the full description.

This offline check is what catches the class of failure described in §“These attributes must be stored with PRTE_ATTR_GLOBAL”: a per-app directive that parses correctly but is silently dropped before mapping shows up immediately here as an app whose placement/rank/binding does not change when its per-app policy changes. Verification must include a multi-app MPMD case. A single-app job can pass even when per-app attributes are being lost, because its lone directive coincides with the job-level policy.

8.1.1.15.2. Location

test/unit/rmaps/
    Makefile.am
    test_rmaps_main.c          — harness: init/finalize, run all suites
    test_resolve_options.c     — prte_rmaps_base_resolve_app_options()
    test_policy_parse.c        — prte_rmaps_base_set_app_mapping_policy() etc.
    test_dispatch.c            — per-app dispatch loop in map_job
    test_round_robin.c         — round_robin component with app_idx
    test_ppr.c                 — ppr component with app_idx
    test_seq.c                 — seq component with app_idx
    test_rank_file.c           — rank_file component with app_idx

test/unit/rmaps/ is added to the SUBDIRS list in test/unit/Makefile.am (creating that file if it does not yet exist) and wired into the top-level make check target.

8.1.1.15.3. Test harness requirements

The tests link against the PRRTE library (libprrte.la) but do not require a running DVM. They use the same minimal-init pattern as the PMIx unit tests: call prte_init() in server-tool mode, bypassing daemon launch. A lightweight stub replaces PRTE_ACTIVATE_JOB_STATE for the mapping-failed path so tests can assert the failure code without triggering the state machine.
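One way such a stub could look is sketched below. This is an assumption-laden illustration: the sk_* types, the placeholder state value, and the recording helper are hypothetical, and the sketch assumes the macro takes a job pointer and a state value.

```c
#include <stddef.h>

/* Hypothetical stand-ins for prte_job_t and the job-state type. */
typedef struct { int jobid; } sk_job_t;
typedef int sk_job_state_t;
#define SK_JOB_STATE_MAP_FAILED 42   /* placeholder value, not PRRTE's */

/* Last activation recorded by the stub, inspected by the test. */
static sk_job_t *sk_last_job = NULL;
static sk_job_state_t sk_last_state = 0;
static int sk_activations = 0;

static void sk_record_state(sk_job_t *job, sk_job_state_t state)
{
    sk_last_job = job;
    sk_last_state = state;
    sk_activations++;
}

/* In a test build this definition would shadow the real macro, so the
 * code under test records the transition instead of driving the state
 * machine. The two-argument shape is an assumption of this sketch. */
#undef PRTE_ACTIVATE_JOB_STATE
#define PRTE_ACTIVATE_JOB_STATE(j, s) sk_record_state((j), (s))

/* Stand-in for a mapper path that fails and reports the failure. */
int sk_map_job_that_fails(sk_job_t *job)
{
    PRTE_ACTIVATE_JOB_STATE(job, SK_JOB_STATE_MAP_FAILED);
    return -1;
}
```

A test then calls the code under test and asserts on sk_last_state and sk_activations rather than waiting for a state-machine callback.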

8.1.1.15.4. Coverage required per test file

``test_policy_parse.c`` covers prte_rmaps_base_set_app_mapping_policy(), prte_rmaps_base_set_app_ranking_policy(), and prte_rmaps_base_set_app_binding_policy():

Mapping (set_app_mapping_policy):

  • Valid single-word policies (core, node, slot, ppr:2:core, hwthread, etc.) store the correct uint16_t in app->attributes.

  • All valid modifiers (NOLOCAL, PE=N, ORDERED, HWTCPUS, CORECPUS, FILE=path) parse and store correctly.

  • Forbidden modifiers (OVERSUBSCRIBE, NOOVERSUBSCRIBE, INHERIT, NOINHERIT) return PRTE_ERR_BAD_PARAM.

  • Malformed strings (missing value after =, unknown keyword, conflicting HWTCPUS/CORECPUS) return appropriate error codes.
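The forbidden-modifier case lends itself to a small table-driven check. The following is a rough sketch only: sk_has_forbidden_modifier is a hypothetical helper, it matches whole tokens case-insensitively, and it does not model any abbreviation rules the real parser may accept. A real parser would map a positive result to PRTE_ERR_BAD_PARAM.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Modifiers the text lists as forbidden at app level. */
static const char *sk_forbidden[] = {
    "OVERSUBSCRIBE", "NOOVERSUBSCRIBE", "INHERIT", "NOINHERIT"
};

/* Returns true when any ':'-separated token after the first one
 * matches a forbidden modifier exactly (case-insensitive). */
bool sk_has_forbidden_modifier(const char *policy)
{
    const size_t nforbidden = sizeof(sk_forbidden) / sizeof(sk_forbidden[0]);
    const char *tok = strchr(policy, ':');

    while (NULL != tok) {
        tok++;                                  /* step past ':' */
        size_t len = strcspn(tok, ":");         /* length of this token */
        for (size_t i = 0; i < nforbidden; i++) {
            if (len == strlen(sk_forbidden[i])
                && 0 == strncasecmp(tok, sk_forbidden[i], len)) {
                return true;
            }
        }
        tok = strchr(tok, ':');                 /* advance to next token */
    }
    return false;
}
```

Comparing token length before calling strncasecmp keeps INHERIT from matching as a prefix of a longer, unrelated token.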

Ranking (set_app_ranking_policy):

  • Each object (slot, node, fill, span) stores the matching PRTE_RANK_BY_* value in PRTE_APP_RANKBY with PRTE_RANKING_GIVEN set.

  • An unrecognized object returns PRTE_ERR_SILENT and leaves app->attributes unchanged.

Binding (set_app_binding_policy):

  • Each object (none, hwthread, core, l1cache, l2cache, l3cache, numa, package) stores the matching PRTE_BIND_TO_* value in PRTE_APP_BINDTO.

  • Modifiers if-supported and overload-allowed set PRTE_BIND_IF_SUPPORTED / PRTE_BIND_ALLOW_OVERLOAD (with PRTE_BIND_OVERLOAD_GIVEN); no-overload sets PRTE_BIND_OVERLOAD_GIVEN without the allow bit.

  • LIMIT=N stores N as PRTE_APP_BINDING_LIMIT; a non-numeric LIMIT value returns PRTE_ERR_SILENT.

  • An unrecognized object or modifier returns PRTE_ERR_BAD_PARAM.

``test_resolve_options.c`` covers prte_rmaps_base_resolve_app_options():

  • App with no per-app attributes: app_options is identical to the job-level options.

  • App with PRTE_APP_MAPBY set: opts->map reflects the app value, not the job value.

  • App with PRTE_APP_NOLOCAL directive bit set: opts->map carries PRTE_MAPPING_NO_USE_LOCAL.

  • App with PRTE_APP_PES_PER_PROC / PRTE_APP_HWT_CPUS / PRTE_APP_CPUSET: correct override.

  • Fallback chain: PRTE_APP_PPR absent → PRTE_JOB_PPR used.
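The fallback chain in the last bullet amounts to "app value if present, else job value". A minimal sketch, assuming a hypothetical sk_attr_t that models whether an attribute key was found and what it held (the real code would query the attribute lists on the app and the job):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute slot: "present" mirrors whether the key was
 * found in the attribute list, "value" is the stored string. */
typedef struct {
    bool present;
    const char *value;
} sk_attr_t;

/* Return the app-level value when present, else the job-level value,
 * else NULL when neither level set the attribute. */
const char *sk_resolve_attr(const sk_attr_t *app_attr, const sk_attr_t *job_attr)
{
    if (NULL != app_attr && app_attr->present) {
        return app_attr->value;
    }
    if (NULL != job_attr && job_attr->present) {
        return job_attr->value;
    }
    return NULL;
}
```

A test for the "PRTE_APP_PPR absent → PRTE_JOB_PPR used" case would pass an absent app slot and assert that the job-level string comes back unchanged.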

``test_dispatch.c`` — the per-app detection and dispatch loop in prte_rmaps_base_map_job():

  • Job with no per-app directives: single-dispatch path taken (verified by mock component call count = 1).

  • Job with at least one app carrying PRTE_APP_MAPBY: per-app path taken; mock component called once per app.

  • OVERSUBSCRIBE in a per-app --map-by string: mapping aborts with PRTE_JOB_STATE_MAP_FAILED.

  • Conflicting INHERIT/NOINHERIT across apps: mapping aborts.

  • Any app with a display-map directive: PRTE_JOB_DISPLAY_MAP is promoted to job level.

  • ``NOLOCAL`` on app[0], not on app[1], shared HNP node (see below).
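The call-count assertions in the first two bullets can be pictured with a mock component. This sketch is illustrative only: sk_dapp_t, the has_mapby flag, and the mock are hypothetical, and real detection would inspect every per-app policy attribute rather than a single flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down app context for the dispatch sketch. */
typedef struct {
    bool has_mapby;   /* stands in for "a per-app mapping policy is set" */
} sk_dapp_t;

/* Mock component entry point: counts invocations only. */
static int sk_component_calls = 0;
static void sk_mock_map_job(int app_idx)
{
    (void) app_idx;
    sk_component_calls++;
}

/* True when at least one app carries a per-app directive. */
bool sk_any_per_app(sk_dapp_t **apps, size_t napps)
{
    for (size_t i = 0; i < napps; i++) {
        if (NULL != apps[i] && apps[i]->has_mapby) {
            return true;
        }
    }
    return false;
}

/* Single dispatch when no app has directives, else one call per app. */
void sk_dispatch(sk_dapp_t **apps, size_t napps)
{
    if (!sk_any_per_app(apps, napps)) {
        sk_mock_map_job(-1);           /* one call covers all apps */
        return;
    }
    for (size_t i = 0; i < napps; i++) {
        if (NULL != apps[i]) {
            sk_mock_map_job((int) i);  /* one call per app context */
        }
    }
}
```

Resetting sk_component_calls between cases lets one test file assert "count = 1" for the legacy path and "count = number of apps" for the per-app path.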

``test_round_robin.c``, ``test_ppr.c``, ``test_seq.c``, ``test_rank_file.c``:

Each file tests its component with the app_idx field in both modes:

  • app_idx = -1: component maps all apps (baseline, existing behaviour).

  • app_idx = 0 on a two-app job: only app[0] is mapped; app[1] procs remain unplaced.

  • app_idx = 1 on a two-app job: only app[1] is mapped; app[0] procs remain unplaced.

  • app_idx set to an index with NULL in the apps array: component skips gracefully.
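The four cases above correspond to a simple filter at the top of a component's app loop. A minimal sketch under stated assumptions: sk_capp_t and sk_map_apps are hypothetical, and "mapping" is reduced to copying a counter so the filter logic stands out.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down app context; the real prte_app_context_t
 * carries many more fields. */
typedef struct {
    int num_procs;  /* processes requested by this app */
    int mapped;     /* processes placed so far */
} sk_capp_t;

/* Map the apps selected by app_idx: -1 selects every app, any other
 * value selects only the app at that array index. NULL slots are
 * skipped. Returns the number of apps that were mapped. */
int sk_map_apps(sk_capp_t **apps, size_t napps, int app_idx)
{
    int handled = 0;

    for (size_t i = 0; i < napps; i++) {
        if (app_idx >= 0 && (size_t) app_idx != i) {
            continue;                      /* belongs to another dispatch */
        }
        if (NULL == apps[i]) {
            continue;                      /* hole in the apps array */
        }
        apps[i]->mapped = apps[i]->num_procs;  /* stand-in for placement */
        handled++;
    }
    return handled;
}
```

Returning the number of apps handled gives each component test a direct value to assert on, in addition to checking which procs remain unplaced.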

8.1.1.15.5. Dedicated test: NOLOCAL on app[0] with shared HNP node

Purpose. Verify that excluding the HNP node from app[0]’s target list via NOLOCAL leaves no persistent side-effect on the prte_node_t that would prevent app[1] (which carries no NOLOCAL) from placing processes on that same node.

Setup. Construct a synthetic allocation of three nodes: the HNP node (node0, rank 0) and two worker nodes (node1, node2). Create a job with two app contexts:

  • app[0]: 4 processes, PRTE_APP_MAPBY = PRTE_MAPPING_BYSLOT with PRTE_MAPPING_NO_USE_LOCAL set in the directive bits.

  • app[1]: 2 processes, PRTE_APP_MAPBY = PRTE_MAPPING_BYSLOT, no NOLOCAL.

Execution. Run the per-app dispatch loop (the same code path exercised by prte_rmaps_base_map_job() when any_per_app is true).

Assertions.

  1. None of app[0]’s four processes are assigned to node0. All four land on node1 and node2.

  2. At least one of app[1]’s two processes is assigned to node0, confirming that app[0]’s mapping pass neither permanently marked the HNP node as excluded nor incorrectly zeroed its slot count.

  3. jdata->map->nodes contains all three nodes (the job map is the union of nodes used by any app).

  4. node0’s available slot count after both apps have been mapped reflects only the slots consumed by app[1]’s processes, not a spurious reduction from app[0].

What this catches. If prte_rmaps_base_get_target_nodes() constructs its target list by removing nodes in-place from a shared list, or if it sets a persistent flag on prte_node_t (e.g., modifying node->slots_inuse or a “do not use” flag without resetting it), app[1]’s target list will be missing node0 and assertion 2 will fail. The test thereby confirms that node list construction is stateless with respect to per-app NOLOCAL decisions.
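The stateless construction this test is meant to confirm can be sketched as a pure filter over a read-only node array. The sk_node_t type and sk_get_target_nodes helper are hypothetical; the real prte_rmaps_base_get_target_nodes() has a different signature and does considerably more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down node record. */
typedef struct {
    const char *name;
    bool is_hnp;       /* true for the node hosting the HNP */
    int slots;
    int slots_inuse;
} sk_node_t;

/* Fill "out" with pointers to the nodes this app may use. The shared
 * node array is only read, never modified, so a NOLOCAL pass leaves
 * nothing behind for the next app. "out" must have room for nnodes
 * entries. Returns the number of targets written. */
size_t sk_get_target_nodes(const sk_node_t *nodes, size_t nnodes,
                           bool no_use_local, const sk_node_t **out)
{
    size_t n = 0;

    for (size_t i = 0; i < nnodes; i++) {
        if (no_use_local && nodes[i].is_hnp) {
            continue;              /* excluded for this app only */
        }
        out[n++] = &nodes[i];
    }
    return n;
}
```

Because the input is const-qualified, the compiler rejects any accidental write to slots_inuse or a "do not use" flag inside the filter, which is the failure mode the assertions above are designed to detect at run time.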

8.1.1.15.6. Makefile.am for the test suite

# test/unit/rmaps/Makefile.am
#
# Copyright (c) 2026      Nanook Consulting  All rights reserved.
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$

AM_CPPFLAGS = \
    -I$(top_srcdir)/src \
    -I$(top_srcdir)/include \
    -I$(top_srcdir)

check_PROGRAMS = test_rmaps

test_rmaps_SOURCES = \
    test_rmaps_main.c      \
    test_policy_parse.c    \
    test_resolve_options.c \
    test_dispatch.c        \
    test_round_robin.c     \
    test_ppr.c             \
    test_seq.c             \
    test_rank_file.c

test_rmaps_LDADD = $(top_builddir)/src/libprrte.la

TESTS = test_rmaps

8.1.1.16. make check Integration

PRRTE currently has no make check target. This section specifies everything required to wire the new unit test suite into the Automake check framework so that make check builds and runs the tests from the top of the source tree.

8.1.1.16.1. 1. Add test/unit/ to the build tree

Create test/unit/Makefile.am:

# test/unit/Makefile.am
#
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$

SUBDIRS = rmaps

Create test/Makefile.am:

# test/Makefile.am
#
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$

SUBDIRS = unit

8.1.1.16.2. 2. Wire test/ into the top-level build

In Makefile.am, add test to SUBDIRS:

# Makefile.am  (excerpt — existing SUBDIRS line)
SUBDIRS = config contrib src include docs test

The test entry must come after src so that libprrte.la is built before the test programs that link against it.

8.1.1.16.3. 3. Register test/ Makefiles in the configure system

In config/prte_config_files.m4, extend the AC_CONFIG_FILES call:

AC_DEFUN([PRTE_CONFIG_FILES],[
    AC_CONFIG_FILES([
        src/Makefile
        ...existing entries...
        test/Makefile
        test/unit/Makefile
        test/unit/rmaps/Makefile
    ])
])

8.1.1.16.4. 4. Automake check mechanics

Automake’s make check target automatically builds everything listed in check_PROGRAMS and then runs everything listed in TESTS. Because TESTS = test_rmaps is set in test/unit/rmaps/Makefile.am, running make check from the top of the build tree will:

  1. Build test_rmaps (and its dependency libprrte.la if not already built).

  2. Execute ./test_rmaps.

  3. Report pass/fail based on the exit code.

No additional Automake variables or test-driver configuration are required for a simple binary-exit-code test. If a TAP-based driver is preferred in future, AM_TESTS_ENVIRONMENT and LOG_DRIVER can be added at that point without changing the test source files.
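Since pass/fail is decided purely by the exit code, the harness in test_rmaps_main.c only needs to fold the suite results into a single status. A rough sketch of that shape follows; the sk_* names and the two placeholder suites are hypothetical, and the real harness would also perform init/finalize around the loop.

```c
#include <stddef.h>
#include <stdio.h>

/* A suite returns the number of failed checks it observed. */
typedef int (*sk_suite_fn)(void);

typedef struct {
    const char *name;
    sk_suite_fn run;
} sk_suite_t;

/* Placeholder suites standing in for test_policy_parse.c and friends. */
int sk_suite_passes(void) { return 0; }
int sk_suite_fails(void)  { return 2; }

/* Run every suite, print one line per suite, and return the process
 * exit status: 0 when all suites passed, 1 otherwise. */
int sk_run_suites(const sk_suite_t *suites, size_t nsuites)
{
    int failed_suites = 0;

    for (size_t i = 0; i < nsuites; i++) {
        int failures = suites[i].run();
        printf("%-28s %s\n", suites[i].name, 0 == failures ? "PASS" : "FAIL");
        if (0 != failures) {
            failed_suites++;
        }
    }
    return 0 == failed_suites ? 0 : 1;
}
```

main() would simply return the value of sk_run_suites(). Running all suites before returning, instead of stopping at the first failure, keeps the make check log useful when more than one suite regresses.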

8.1.1.16.5. 5. autogen.pl / configure impact

Adding test/Makefile.am, test/unit/Makefile.am, and test/unit/rmaps/Makefile.am to the source tree and listing them in config/prte_config_files.m4 is sufficient. No new configure.ac macros are needed. Developers must re-run ./autogen.pl && ./configure after pulling these files for the first time.

8.1.1.16.6. 6. Isolation from normal builds

The test programs are listed under check_PROGRAMS, not bin_PROGRAMS or noinst_PROGRAMS. Automake only builds check_PROGRAMS when make check is explicitly invoked; a plain make or make install does not build them. This keeps the normal build fast and does not install test binaries.

8.1.1.16.7. 7. Developer workflow

# After configure:
make -j$(nproc)          # normal build, does not compile tests
make check               # build and run all unit tests
make check -C test/unit/rmaps   # run only the rmaps suite

8.1.1.17. Open Questions

None.