8.1.1. Per-App-Context Mapping Policies
8.1.1.1. Overview
Today every prte_job_t carries a single prte_job_map_t (mapping/ranking/binding
policy triple) and a single resolved prte_rmaps_options_t that is passed unchanged
to whichever rmaps component wins selection. All app contexts in the job are mapped
by that one component under that one policy.
This document specifies the changes required to allow each prte_app_context_t to
carry its own mapping, ranking, and binding directives so that different apps within
the same job can be mapped by different components under different policies.
8.1.1.2. Goals
1. Every directive expressible at job level via --map-by, --rank-by, and --bind-to must be expressible at app-context level.
2. When an app carries no per-app directives, it inherits the job-level policy unchanged, so existing usage sees no change in behaviour.
3. The rmaps component selected for each app context is determined by that app’s resolved mapping policy, not by the job’s.
4. The existing component entry point (map_job(prte_job_t *, prte_rmaps_options_t *)) keeps its signature; the only extension is one new field in prte_rmaps_options_t.
5. Global rank assignment (vpid computation) remains a single coordinated pass across all apps after all apps have been placed.
8.1.1.3. Attributes Required on prte_app_context_t
Add the following new attribute keys to src/util/attr.h in the
PRTE_APP_* range (next available keys after PRTE_APP_PPR = 25):
/* Mapping policy string for this app, same syntax as --map-by.
* When present overrides the job-level mapping policy for this app. */
#define PRTE_APP_MAPBY 26 // char* - e.g. "core:pe=2:nolocal"
/* Ranking policy string for this app, same syntax as --rank-by. */
#define PRTE_APP_RANKBY 27 // char* - e.g. "fill"
/* Binding policy string for this app, same syntax as --bind-to. */
#define PRTE_APP_BINDTO 28 // char* - e.g. "core"
/* File to use for sequential or rankfile mapping for this app.
* Distinct from PRTE_APP_HOSTFILE which lists nodes; this is the
* ordering/affinity file consumed by the seq and rank_file components. */
#define PRTE_APP_MAP_FILE 29 // char* - path to seq or rankfile
/* Device name for dist mapping for this app. */
#define PRTE_APP_DIST_DEVICE 30 // char* - e.g. "mlx5_0"
/* Use hwthreads as CPUs for this app. */
#define PRTE_APP_HWT_CPUS 31 // bool
/* Use cores as CPUs for this app (explicit, not relying on absence of HWT). */
#define PRTE_APP_CORE_CPUS 32 // bool
/* PE-list (CPU set) for this app, same syntax as pe-list= modifier. */
#define PRTE_APP_CPUSET 33 // char* - comma-delimited CPU ranges
/* Max procs to bind to a target object before moving to the next. */
#define PRTE_APP_BINDING_LIMIT 34 // uint16_t
PRTE_APP_MAX_KEY must be raised to accommodate the new keys.
8.1.1.3.1. Relationship to existing attributes
The already-defined PRTE_APP_PES_PER_PROC (24) and PRTE_APP_PPR (25) are
preserved as-is; their semantics are unchanged.
PRTE_APP_MAPBY supersedes PRTE_APP_PPR and PRTE_APP_PES_PER_PROC when
present — the full PRTE_APP_MAPBY string is the canonical per-app mapping
directive and is parsed by the same machinery as the job-level --map-by
string (see §Parsing below).
8.1.1.4. Changes to prte_rmaps_options_t (src/mca/rmaps/rmaps_types.h)
Add one field:
typedef struct {
/* ... existing fields unchanged ... */
/* When >= 0, the component must map only the app context at this index
* within jdata->apps and must skip all others.
* When < 0 (default, set to -1), the component maps all app contexts
* as it does today. */
int app_idx;
} prte_rmaps_options_t;
The new field is initialized to -1 (map all apps) in prte_rmaps_base_map_job
via the existing memset(&options, 0, ...) plus an explicit assignment immediately
after:
memset(&options, 0, sizeof(prte_rmaps_options_t));
options.app_idx = -1; /* map all apps by default */
8.1.1.5. New Parsing Functions (src/mca/rmaps/base/rmaps_base_frame.c)
The existing prte_rmaps_base_set_mapping_policy(prte_job_t *jdata, char *spec)
and check_modifiers() store their results into jdata->map->mapping and
jdata->attributes. They must not be changed.
Add three new functions with identical parsing logic but storing results into
app->attributes:
/* Parse a --map-by style string and store the result on app->attributes.
* All mapping policy values, modifiers, and options that check_modifiers()
* handles at the job level are handled here at the app level. PPR pattern
* is stored as PRTE_APP_PPR; pe count as PRTE_APP_PES_PER_PROC; cpuset as
* PRTE_APP_CPUSET; file as PRTE_APP_MAP_FILE; hwthread/core CPU mode as
* PRTE_APP_HWT_CPUS / PRTE_APP_CORE_CPUS. The parsed mapping policy enum
* value is stored as PRTE_APP_MAPBY (as a uint16_t, not the original string,
* once resolved) — see §Resolution below for how this is read back. */
int prte_rmaps_base_set_app_mapping_policy(prte_app_context_t *app, char *spec);
/* Parse a --rank-by style string and store the result as PRTE_APP_RANKBY
* (uint16_t ranking policy) on app->attributes. Accepts the same ranking
* objects as the job-level --rank-by: SLOT, NODE, FILL, and SPAN. The
* PRTE_RANKING_GIVEN directive bit is set so the resolve step knows the
* value was supplied explicitly (and must not be re-derived from the app's
* mapping policy). An unrecognized object returns PRTE_ERR_SILENT after a
* diagnostic. */
int prte_rmaps_base_set_app_ranking_policy(prte_app_context_t *app, char *spec);
/* Parse a --bind-to style string and store the result as PRTE_APP_BINDTO
* (uint16_t binding policy) on app->attributes. Accepts the same binding
* objects as the job-level --bind-to: NONE, HWTHREAD, CORE, L1CACHE,
* L2CACHE, L3CACHE, NUMA, and PACKAGE. The ":"-delimited modifiers
* if-supported, overload-allowed, no-overload, and LIMIT=N are parsed and
* recorded: if-supported/overload directives become directive bits within
* the PRTE_APP_BINDTO uint16_t (PRTE_BIND_IF_SUPPORTED,
* PRTE_BIND_ALLOW_OVERLOAD / PRTE_BIND_OVERLOAD_GIVEN), while LIMIT=N is
* stored separately as PRTE_APP_BINDING_LIMIT (uint16_t). An unrecognized
* object or modifier returns PRTE_ERR_BAD_PARAM (or PRTE_ERR_SILENT for a
* malformed LIMIT value) after a diagnostic. */
int prte_rmaps_base_set_app_binding_policy(prte_app_context_t *app, char *spec);
These are declared in src/mca/rmaps/base/base.h.
The ranking and binding functions mirror the job-level
prte_rmaps_base_set_ranking_policy() and the job-level binding-policy parser
exactly, differing only in that they write their result onto app->attributes
rather than jdata->map. Every ranking object and binding object/modifier
expressible at the job level is therefore expressible per app, satisfying
Goal 1 for --rank-by and --bind-to to the same degree as --map-by.
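To make the modifier handling concrete, here is a minimal, self-contained sketch of how a --bind-to style string such as core:if-supported:LIMIT=4 could be split into a policy word and a limit. It is not prte_rmaps_base_set_app_binding_policy() itself: the BP_* bit values, the error codes, and bp_parse_bindto() are hypothetical, and the result is returned to the caller instead of being stored on app->attributes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding (assumption): low byte = binding object,
 * high bits = directive flags. PRRTE's real PRTE_BIND_* values differ. */
enum { BP_BIND_NONE = 1, BP_BIND_HWTHREAD, BP_BIND_CORE, BP_BIND_L1CACHE,
       BP_BIND_L2CACHE, BP_BIND_L3CACHE, BP_BIND_NUMA, BP_BIND_PACKAGE };
#define BP_BIND_IF_SUPPORTED   0x1000u
#define BP_BIND_ALLOW_OVERLOAD 0x2000u
#define BP_BIND_OVERLOAD_GIVEN 0x4000u

#define BP_OK          0
#define BP_ERR_BAD    (-1)   /* stands in for PRTE_ERR_BAD_PARAM */
#define BP_ERR_SILENT (-2)   /* stands in for PRTE_ERR_SILENT    */

static const char *bp_objects[] = { "none", "hwthread", "core", "l1cache",
                                    "l2cache", "l3cache", "numa", "package" };

/* True when s begins with prefix, ignoring ASCII case. */
static int bp_starts_with(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return 0;
    }
    return 1;
}

/* True when a and b are equal, ignoring ASCII case. */
static int bp_equal(const char *a, const char *b)
{
    return strlen(a) == strlen(b) && bp_starts_with(a, b);
}

/* Parse "object[:modifier...]". *limit is 0 when no LIMIT=N was given. */
static int bp_parse_bindto(const char *spec, uint16_t *policy, uint16_t *limit)
{
    char buf[128];
    if (strlen(spec) >= sizeof(buf)) return BP_ERR_BAD;
    strcpy(buf, spec);

    char *tok = strtok(buf, ":");                 /* binding object */
    if (NULL == tok) return BP_ERR_BAD;
    unsigned p = 0;
    for (size_t i = 0; i < sizeof(bp_objects) / sizeof(bp_objects[0]); i++) {
        if (bp_equal(tok, bp_objects[i])) p = BP_BIND_NONE + (unsigned)i;
    }
    if (0 == p) return BP_ERR_BAD;                /* unrecognized object */

    uint16_t lim = 0;
    while (NULL != (tok = strtok(NULL, ":"))) {   /* modifiers */
        if (bp_equal(tok, "if-supported")) {
            p |= BP_BIND_IF_SUPPORTED;
        } else if (bp_equal(tok, "overload-allowed")) {
            p |= BP_BIND_ALLOW_OVERLOAD | BP_BIND_OVERLOAD_GIVEN;
        } else if (bp_equal(tok, "no-overload")) {
            p = (p & ~BP_BIND_ALLOW_OVERLOAD) | BP_BIND_OVERLOAD_GIVEN;
        } else if (bp_starts_with(tok, "limit=")) {
            char *end = NULL;
            long v = strtol(tok + 6, &end, 10);
            if (end == tok + 6 || '\0' != *end || v < 0 || v > UINT16_MAX) {
                return BP_ERR_SILENT;             /* malformed LIMIT value */
            }
            lim = (uint16_t)v;
        } else {
            return BP_ERR_BAD;                    /* unrecognized modifier */
        }
    }
    *policy = (uint16_t)p;
    *limit = lim;
    return BP_OK;
}
```

A real helper would then store the policy word as PRTE_APP_BINDTO and, only when a limit was actually given, the limit as PRTE_APP_BINDING_LIMIT, both with PRTE_ATTR_GLOBAL.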
8.1.1.5.1. Attribute storage convention
To avoid storing both a raw string and a parsed integer for the same concept, the new attributes store the parsed policy value (uint16_t), not the original string. The string attribute keys defined in §Attributes are used only for the schizo/CLI layer to record the unparsed directive before the base layer processes it; once parsed, the uint16_t value replaces the string in the attribute.
Alternatively — and this is the recommended approach for simplicity — the
string attributes are only used by the schizo/CLI parsing layer; the base
layer’s new prte_rmaps_base_set_app_* functions accept the string, parse it,
and store the result as additional attributes:
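| Parsed result | Attribute | Type |
|---|---|---|
| Mapping policy enum | PRTE_APP_MAPBY | uint16_t |
| Ranking policy enum | PRTE_APP_RANKBY | uint16_t |
| Binding policy enum | PRTE_APP_BINDTO | uint16_t |
| PPR pattern string | PRTE_APP_PPR | char* |
| CPUs per rank | PRTE_APP_PES_PER_PROC | uint16_t |
| CPU set string | PRTE_APP_CPUSET | char* |
| Map/rankfile path | PRTE_APP_MAP_FILE | char* |
| Use hwthreads | PRTE_APP_HWT_CPUS | bool |
| Use cores | PRTE_APP_CORE_CPUS | bool |
| Binding limit | PRTE_APP_BINDING_LIMIT | uint16_t |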
8.1.1.5.1.1. These attributes must be stored with PRTE_ATTR_GLOBAL, never PRTE_ATTR_LOCAL
This is a correctness requirement, not a stylistic one, and it is easy to get
wrong. The per-app directives are set on the app context while the spawn
request is being processed, but the request is then serialized and relayed to
the DVM master before prte_rmaps_base_map_job() runs. Only GLOBAL
attributes are packed; LOCAL attributes are silently dropped during that
transfer.
If the prte_rmaps_base_set_app_* helpers store these attributes as
PRTE_ATTR_LOCAL, they vanish before mapping: the any_per_app scan
(see below) finds nothing, the per-app dispatch path is never taken, and every
app is mapped, ranked, and bound by the job-level policy regardless of its own
directives — with no error reported. The single-app case can appear to “work”
only because its directive coincides with the job-level policy, which masks the
defect.
Store every PRTE_APP_* attribute listed above with PRTE_ATTR_GLOBAL,
matching the convention already used for PRTE_APP_PPR and
PRTE_APP_PES_PER_PROC in the spawn handler.
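The failure mode can be illustrated with a toy model of the relay step. The sketch below is hypothetical throughout (toy_attr_t, toy_pack_and_relay(), and the list layout are stand-ins, not PRRTE's attribute API); it models only the stated rule that LOCAL attributes are not packed. Keys 26 and 27 are the PRTE_APP_MAPBY and PRTE_APP_RANKBY numbers proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy attribute entry; NOT PRRTE's prte_attribute_t. */
typedef struct {
    int key;
    bool local;        /* true ~ PRTE_ATTR_LOCAL, false ~ PRTE_ATTR_GLOBAL */
    uint16_t value;
} toy_attr_t;

#define TOY_MAX 8
typedef struct {
    toy_attr_t items[TOY_MAX];
    size_t n;
} toy_list_t;

/* Model of serializing the spawn request for the DVM master:
 * only GLOBAL (non-local) attributes are copied across. */
static void toy_pack_and_relay(const toy_list_t *src, toy_list_t *dst)
{
    dst->n = 0;
    for (size_t i = 0; i < src->n; i++) {
        if (!src->items[i].local) {
            assert(dst->n < TOY_MAX);
            dst->items[dst->n++] = src->items[i];
        }
    }
}

/* Existence check, as the any_per_app scan would perform it. */
static bool toy_has(const toy_list_t *l, int key)
{
    for (size_t i = 0; i < l->n; i++) {
        if (l->items[i].key == key) return true;
    }
    return false;
}
```

On the receiving side, the attribute that was stored LOCAL is simply absent, which is exactly why the per-app path is skipped without any error.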
8.1.1.6. Changes to prte_rmaps_base_map_job() (src/mca/rmaps/base/rmaps_base_map_job.c)
This is the primary structural change.
8.1.1.6.1. Step 1 — resolve job-level defaults (unchanged)
The existing inheritance, policy-resolution, and process-count logic runs as
now and populates jdata->map->mapping, jdata->map->ranking,
jdata->map->binding, and the job-level options struct. This path is
unchanged and provides the fallback for apps that carry no per-app directives.
8.1.1.6.2. Step 2 — check whether any app has per-app directives
After job-level resolution, scan the apps array:
bool any_per_app = false;
for (n = 0; n < jdata->apps->size; n++) {
app = pmix_pointer_array_get_item(jdata->apps, n);
if (NULL == app) continue;
if (prte_get_attribute(&app->attributes, PRTE_APP_MAPBY, NULL, PMIX_UINT16) ||
prte_get_attribute(&app->attributes, PRTE_APP_RANKBY, NULL, PMIX_UINT16) ||
prte_get_attribute(&app->attributes, PRTE_APP_BINDTO, NULL, PMIX_UINT16)) {
any_per_app = true;
break;
}
}
If any_per_app is false, the existing single-dispatch path runs unchanged.
8.1.1.6.3. Step 3 — per-app dispatch loop (new path)
When any_per_app is true, replace the single component-dispatch block with
a loop over app contexts:
for (n = 0; n < jdata->apps->size; n++) {
app = pmix_pointer_array_get_item(jdata->apps, n);
if (NULL == app) continue;
/* Build a per-app copy of options starting from the job-level defaults */
prte_rmaps_options_t app_options = options; /* shallow copy */
app_options.app_idx = n;
/* Override with app-level directives where present */
rc = prte_rmaps_base_resolve_app_options(jdata, app, &app_options);
if (PRTE_SUCCESS != rc) {
PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
goto cleanup;
}
/* Compute process count for this app (if not already set) */
rc = prte_rmaps_base_compute_nprocs(jdata, app, &app_options);
if (PRTE_SUCCESS != rc) {
PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
goto cleanup;
}
/* Select and invoke the component appropriate for this app's policy */
did_map = false;
PMIX_LIST_FOREACH(mod, &prte_rmaps_base.selected_modules,
prte_rmaps_base_selected_module_t) {
rc = mod->module->map_job(jdata, &app_options);
if (PRTE_SUCCESS == rc) {
did_map = true;
break;
}
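if (PRTE_ERR_RESOURCE_BUSY == rc) {
/* oversubscription detected */
PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
goto cleanup;
}
if (PRTE_ERR_TAKE_NEXT_OPTION != rc) {
/* any other error is fatal: do not let a later component re-map this app */
PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
goto cleanup;
}
/* PRTE_ERR_TAKE_NEXT_OPTION → try next component */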
}
if (!did_map) {
pmix_show_help("help-prte-rmaps-base.txt", "failed-map", true, ...);
PRTE_ACTIVATE_JOB_STATE(jdata, PRTE_JOB_STATE_MAP_FAILED);
goto cleanup;
}
}
8.1.1.6.4. New helper: prte_rmaps_base_resolve_app_options()
Extract into a separate static (or base-exported) function the logic for
building app_options from the job-level options plus any per-app
overrides:
static int prte_rmaps_base_resolve_app_options(prte_job_t *jdata,
prte_app_context_t *app,
prte_rmaps_options_t *opts)
This function:
1. Reads PRTE_APP_MAPBY (uint16_t) from app->attributes; if present, stores the masked policy (PRTE_GET_MAPPING_POLICY) into opts->map and refreshes the fields that are derived from the mapping policy (opts->maptype, opts->mapdepth, opts->mapspan, opts->ordered) exactly as the job-level path does after it resolves the job map. The raw value (with its directive bits) is kept locally so the rank/bind defaults in steps 4–5 can read its SPAN directive.
2. If opts->map is PRTE_MAPPING_PPR, reads PRTE_APP_PPR from app->attributes; if absent, falls back to PRTE_JOB_PPR on jdata.
3. Reads PRTE_APP_PES_PER_PROC, PRTE_APP_HWT_CPUS, PRTE_APP_CORE_CPUS, PRTE_APP_CPUSET, PRTE_APP_MAP_FILE, PRTE_APP_DIST_DEVICE, and PRTE_APP_BINDING_LIMIT and overrides the corresponding opts fields.
4. Ranking. If PRTE_APP_RANKBY is present, stores the masked policy (PRTE_GET_RANKING_POLICY) into opts->rank. Otherwise, if the app supplied its own PRTE_APP_MAPBY (step 1), derives the ranking default from that app’s mapping policy, mirroring the NULL-spec path of prte_rmaps_base_set_ranking_policy() (by-node map → by-node rank, by-slot map → by-slot rank, object map → by-fill, SPAN → by-span). If the app changed neither, the job-level ranking carried in opts stands.
5. Binding. If PRTE_APP_BINDTO is present, stores the masked policy (PRTE_GET_BINDING_POLICY) into opts->bind and lifts the overload directive into opts->overload. Otherwise, if the app supplied its own PRTE_APP_MAPBY, derives the binding default from that app’s mapping policy: bind to the mapped object (numa/package/cache/core/hwthread), or to core (hwthread when hwthreads are in use) for object-less mappings such as by-node and by-slot.
The crucial point for steps 4–5: when an app overrides its mapping policy but
gives no explicit ranking or binding, the defaults must follow that app’s
mapping, not the job-level mapping. Inheriting opts->rank/opts->bind
unchanged would silently rank and bind the app as if it had been mapped by the
job-wide policy.
Two small pure helpers, prte_rmaps_base_derive_ranking(mapping) and
prte_rmaps_base_derive_binding(mapping, use_hwthreads), encode the
map → rank and map → bind defaults and are reused for both the explicit and
defaulted cases.
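A minimal sketch of the two defaulting helpers follows, assuming a simplified encoding in which the low byte holds the policy and a single SPAN directive bit rides on the raw mapping value. The DF_* enums, bit values, and df_* function names are hypothetical stand-ins for prte_rmaps_base_derive_ranking() and prte_rmaps_base_derive_binding(); PPR, SEQ, and the other mapping policies are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding (assumption): low byte = policy, 0x1000 = SPAN. */
#define DF_POLICY_MASK 0x00FFu
#define DF_MAP_SPAN    0x1000u

enum df_map  { DF_MAP_BYNODE = 1, DF_MAP_BYSLOT, DF_MAP_BYHWTHREAD, DF_MAP_BYCORE,
               DF_MAP_BYL1CACHE, DF_MAP_BYL2CACHE, DF_MAP_BYL3CACHE,
               DF_MAP_BYNUMA, DF_MAP_BYPACKAGE };
enum df_rank { DF_RANK_BYSLOT = 1, DF_RANK_BYNODE, DF_RANK_BYFILL, DF_RANK_BYSPAN };
enum df_bind { DF_BIND_HWTHREAD = 1, DF_BIND_CORE, DF_BIND_L1CACHE, DF_BIND_L2CACHE,
               DF_BIND_L3CACHE, DF_BIND_NUMA, DF_BIND_PACKAGE };

/* map -> rank default. Takes the RAW mapping so the SPAN bit is visible. */
static enum df_rank df_derive_ranking(uint16_t raw_mapping)
{
    switch (raw_mapping & DF_POLICY_MASK) {
    case DF_MAP_BYNODE: return DF_RANK_BYNODE;
    case DF_MAP_BYSLOT: return DF_RANK_BYSLOT;
    default:            /* object-based mapping */
        return (raw_mapping & DF_MAP_SPAN) ? DF_RANK_BYSPAN : DF_RANK_BYFILL;
    }
}

/* map -> bind default: the mapped object, else core (hwthread if in use). */
static enum df_bind df_derive_binding(uint16_t mapping, bool use_hwthreads)
{
    switch (mapping & DF_POLICY_MASK) {
    case DF_MAP_BYHWTHREAD: return DF_BIND_HWTHREAD;
    case DF_MAP_BYCORE:     return DF_BIND_CORE;
    case DF_MAP_BYL1CACHE:  return DF_BIND_L1CACHE;
    case DF_MAP_BYL2CACHE:  return DF_BIND_L2CACHE;
    case DF_MAP_BYL3CACHE:  return DF_BIND_L3CACHE;
    case DF_MAP_BYNUMA:     return DF_BIND_NUMA;
    case DF_MAP_BYPACKAGE:  return DF_BIND_PACKAGE;
    default:                /* object-less: by-node, by-slot */
        return use_hwthreads ? DF_BIND_HWTHREAD : DF_BIND_CORE;
    }
}
```

Because both helpers are pure functions of the mapping value, the resolver can call them with the app’s own PRTE_APP_MAPBY value, which is what keeps an app’s defaults from silently following the job-level mapping.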
Masking note: the PRTE_APP_MAPBY/RANKBY/BINDTO attributes carry the
policy value with its high-bit directive flags (GIVEN, overload,
IS_SET) attached. opts->map/rank/bind are compared against the
bare PRTE_MAPPING_*/PRTE_RANK_BY_*/PRTE_BIND_TO_* enums elsewhere,
so the resolver must mask off the directive bits (and route the overload bit to
opts->overload) rather than assigning the raw attribute value.
The function must be idempotent and must not modify jdata->map — any
per-app mapping policy lives only in opts and in app->attributes.
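The masking rule can be sketched as follows, under a hypothetical bit layout (the MK_* names and values are assumptions, not PRRTE’s). The app’s raw PRTE_APP_BINDTO-style word is split into a bare policy for opts->bind and separate flags; following the rule that an app’s binding overrides the job-level binding in its entirety, the sketch does not inherit the job-level overload setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout (assumption): low byte = policy, high bits = directives. */
#define MK_POLICY_MASK         0x00FFu
#define MK_BIND_GIVEN          0x0100u
#define MK_BIND_IF_SUPPORTED   0x0200u
#define MK_BIND_ALLOW_OVERLOAD 0x0400u
#define MK_BIND_OVERLOAD_GIVEN 0x0800u

/* Analogue of PRTE_GET_BINDING_POLICY(): strip the directive bits. */
#define MK_GET_BINDING_POLICY(v) ((uint16_t)((v) & MK_POLICY_MASK))

typedef struct {
    uint16_t bind;       /* bare policy, comparable with PRTE_BIND_TO_*-style enums */
    bool if_supported;
    bool overload;
} mk_opts_t;

/* Apply a raw per-app binding word. Whole-value replacement: nothing from
 * the job-level binding survives, including its overload permission. */
static void mk_apply_app_bindto(mk_opts_t *opts, uint16_t raw)
{
    opts->bind = MK_GET_BINDING_POLICY(raw);
    opts->if_supported = (0 != (raw & MK_BIND_IF_SUPPORTED));
    opts->overload = (0 != (raw & MK_BIND_ALLOW_OVERLOAD));
}
```

Assigning the raw word directly would make opts->bind compare unequal to every bare enum whenever any directive bit is set, which is the bug the masking note guards against.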
8.1.1.7. Changes to rmaps Components
Every component must be updated to honour options->app_idx.
8.1.1.7.1. Contract
When options->app_idx >= 0, the component processes only the app context
at that index in jdata->apps and returns PRTE_ERR_TAKE_NEXT_OPTION for any
app it cannot handle. When options->app_idx < 0, the component processes all
app contexts as today.
8.1.1.7.2. Required change in each component
In the top-level loop of each map_job function, replace:
for (n = 0; n < jdata->apps->size; n++) {
app = pmix_pointer_array_get_item(jdata->apps, n);
if (NULL == app) continue;
...
}
with:
for (n = 0; n < jdata->apps->size; n++) {
app = pmix_pointer_array_get_item(jdata->apps, n);
if (NULL == app) continue;
/* honour per-app dispatch */
if (options->app_idx >= 0 && n != options->app_idx) continue;
...
}
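This is the only change every component needs for per-app dispatch, and all
five components require it: round_robin, ppr, seq, rank_file, lsf. (The seq
and rank_file components additionally need the PRTE_APP_MAP_FILE lookup
described under Migration Notes.)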
8.1.1.7.3. Component selection logic in the per-app path
The mapping policy in options->map determines which component accepts the
app. Each component’s map_job already returns PRTE_ERR_TAKE_NEXT_OPTION
when the policy is not one it handles. No changes to per-component policy
checking are required beyond the app_idx guard above.
The existing component-selection priority ordering (determined by each component’s query priority) is preserved.
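The interplay between the app_idx guard and PRTE_ERR_TAKE_NEXT_OPTION can be exercised with mocks. In this sketch all names are hypothetical: each mock component accepts exactly one mapping policy, the per-app loop tries components in priority order for each app, and the guard keeps a component from touching apps other than options->app_idx.

```c
#include <assert.h>

#define MOCK_SUCCESS           0
#define MOCK_TAKE_NEXT_OPTION (-1)   /* stands in for PRTE_ERR_TAKE_NEXT_OPTION */
#define MOCK_NO_COMPONENT     (-2)   /* no component accepted the app */
#define MOCK_MAX_APPS          4

enum { MOCK_MAP_BYSLOT = 1, MOCK_MAP_PPR, MOCK_MAP_SEQ };

typedef struct { int map; int app_idx; } mock_opts_t;
typedef struct { int napps; int placed_by[MOCK_MAX_APPS]; } mock_job_t;
typedef struct { int id; int handles; } mock_component_t;

/* A component maps only policies it handles, and only the app selected by
 * opts->app_idx when that is >= 0. */
static int mock_map_job(const mock_component_t *c, mock_job_t *job,
                        const mock_opts_t *opts)
{
    if (opts->map != c->handles) return MOCK_TAKE_NEXT_OPTION;
    for (int n = 0; n < job->napps; n++) {
        if (opts->app_idx >= 0 && n != opts->app_idx) continue; /* app_idx guard */
        job->placed_by[n] = c->id;
    }
    return MOCK_SUCCESS;
}

/* Per-app dispatch: for each app, try components in priority order. */
static int mock_dispatch(const mock_component_t *comps, int ncomps,
                         mock_job_t *job, const int *app_policy)
{
    assert(job->napps <= MOCK_MAX_APPS);
    for (int n = 0; n < job->napps; n++) {
        mock_opts_t opts = { .map = app_policy[n], .app_idx = n };
        int mapped = 0;
        for (int i = 0; i < ncomps && !mapped; i++) {
            int rc = mock_map_job(&comps[i], job, &opts);
            if (MOCK_SUCCESS == rc) {
                mapped = 1;
            } else if (MOCK_TAKE_NEXT_OPTION != rc) {
                return rc;                    /* any other error is fatal */
            }
        }
        if (!mapped) return MOCK_NO_COMPONENT;
    }
    return MOCK_SUCCESS;
}
```

Removing the guard line makes the by-slot mock overwrite every app it sees, which is the cross-app clobbering the guard exists to prevent.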
8.1.1.8. Changes to Ranking
prte_rmaps_base_compute_vpids() in rmaps_base_ranking.c runs after all
apps have been placed. It currently reads jdata->map->ranking to determine
the single global ranking strategy.
With per-app ranking the function signature becomes:
int prte_rmaps_base_compute_vpids(prte_job_t *jdata,
prte_rmaps_options_t *options,
int app_idx,
uint32_t *next_vpid);
app_idx selects the app context to rank. When app_idx < 0, the function ranks all apps in one pass exactly as it does today (the job-level path passes -1).
When called from the per-app loop, it is invoked once per app with the app’s resolved opts->rank (which honours that app’s PRTE_APP_RANKBY) and that app’s index. next_vpid carries the running global rank counter between per-app calls: each invocation begins assigning at *next_vpid and updates it to the first unassigned rank on return. Global rank assignment therefore still increases monotonically across apps in app-index order, so pptr->name.rank values remain contiguous and non-overlapping across the whole job.
Per-app ranking controls only the order in which processes within that app are assigned their ranks relative to each other (by SLOT, NODE, FILL, or SPAN as that app requested); the starting rank for each app is the first unassigned rank after all previous apps.
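The running-counter contract for next_vpid can be sketched independently of PRRTE’s data structures. vp_app_t and vp_rank_app() are hypothetical; the order array stands in for whatever within-app order that app’s SLOT/NODE/FILL/SPAN policy produces.

```c
#include <assert.h>
#include <stdint.h>

#define VP_MAX_PROCS 8

typedef struct {
    int nprocs;
    uint32_t rank[VP_MAX_PROCS];   /* global rank assigned to each local proc */
} vp_app_t;

/* Rank one app: order[i] is the local index of the proc that receives the
 * i-th rank. Starts at *next_vpid and leaves it at the first unused rank. */
static void vp_rank_app(vp_app_t *app, const int *order, uint32_t *next_vpid)
{
    for (int i = 0; i < app->nprocs; i++) {
        assert(order[i] >= 0 && order[i] < VP_MAX_PROCS);
        app->rank[order[i]] = (*next_vpid)++;
    }
}
```

Calling this for app 0 (three procs) and then app 1 (two procs, reversed order) gives app 0 ranks 0 through 2 and app 1 ranks 3 and 4, with the reversal visible only inside app 1.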
8.1.1.9. Changes to Binding
No structural changes are required. prte_rmaps_base_bind_proc() already
takes the per-call options struct. Because prte_rmaps_base_setup_proc()
is called from within each component’s inner loop with the current options
in scope, per-app binding is automatically derived from opts->bind which
was set by prte_rmaps_base_resolve_app_options().
The full binding directive is carried per app:
- opts->bind receives the app’s binding object and the if-supported / overload directive bits decoded from PRTE_APP_BINDTO.
- opts->limit receives the app’s PRTE_APP_BINDING_LIMIT (the LIMIT=N modifier), defaulting to the job-level value when the app does not set one.
- opts->cpus_per_rank (PRTE_APP_PES_PER_PROC), opts->use_hwthreads (PRTE_APP_HWT_CPUS / PRTE_APP_CORE_CPUS), and opts->cpuset (PRTE_APP_CPUSET) are likewise resolved per app and feed the binder.
Thus an app may bind to a different object, with different overload/limit behaviour, than its siblings in the same job.
8.1.1.10. Command-line / PMIx-spawn wiring
Per-app directives reach the app context through the PMIx spawn machinery, not through a schizo-only path. The expected command-line representation is:
prun --map-by core app1 : --map-by node --rank-by fill app2
where : is the MPMD separator between app contexts.
The flow, end to end:
1. Per-app parse: src/prted/prte_app_parse.c splits the command line at each : (prte_parse_locals()) and parses each app segment in its own create_app() call. Each segment’s --map-by / --rank-by / --bind-to is recorded on that app’s pmix_app_t.info[] array as PMIX_MAPBY / PMIX_RANKBY / PMIX_BINDTO. Because each app is parsed independently, directives are already correctly scoped to their app context; no MPMD-aware schizo bookkeeping is required.
2. Spawn: the tool builds the pmix_app_t array (one entry per app, each carrying its own info[]) and calls PMIx_Spawn. Whichever spawn assembly path is used (src/prted/prte.c for the proxy HNP, src/prted/prun_common.c for the tool), each app’s info must be converted from its own app->info list, not from the job-level info, so the per-app keys are preserved.
3. Server-side translation: src/prted/pmix/pmix_server_dyn.c (prte_pmix_xfer_app()) walks each app’s info[] and converts the PMIX_MAPBY / PMIX_RANKBY / PMIX_BINDTO keys into the PRTE_APP_* attributes by calling the helper that matches each key:
   prte_rmaps_base_set_app_mapping_policy(app, info->value.data.string);  /* PMIX_MAPBY */
   prte_rmaps_base_set_app_ranking_policy(app, info->value.data.string);  /* PMIX_RANKBY */
   prte_rmaps_base_set_app_binding_policy(app, info->value.data.string);  /* PMIX_BINDTO */
   These helpers parse the string and store the result with PRTE_ATTR_GLOBAL (see §“These attributes must be stored with PRTE_ATTR_GLOBAL”). This is the same path used by a third-party caller of PMIx_Spawn that supplies PMIX_MAPBY etc. in a per-app info[] array, so the CLI and the programmatic spawn API share one implementation.
4. Job-level directives: options given before any : apply to the whole job and continue to flow through the existing job-level prte_rmaps_base_set_mapping_policy() / set_ranking_policy() / set_binding_policy() functions, which store onto jdata->map. An app that carries no per-app directive inherits these.
No changes to src/mca/schizo/prte/ are required for per-app map/rank/bind:
the option definitions already exist, and the per-app association happens in
prte_app_parse.c.
8.1.1.10.1. Tool-level argv pre-scan guards
Before schizo runs, the tool launchers themselves walk the raw argv to
normalise option spellings. prun (src/tools/prun/prun.c) and the prte
HNP launcher (src/prted/prte.c) both rename --rank-by → --rankby and
--bind-to → --bindto, and both currently reject a second occurrence of
either option with the multi-instances help message.
Because a per-app MPMD command line repeats --rank-by/--bind-to once per
app context, this guard must be removed. --rank-by and --bind-to are made
to behave like --map-by, which is already renamed unconditionally with no
such guard. Detecting an erroneous duplicate (two job-level --rank-by with
no intervening MPMD separator) is left to the schizo MPMD parser, which has the
app-context boundaries the flat argv pre-scan lacks.
Both launchers carry an identical copy of this pre-scan loop, so the shared logic is factored into a single helper rather than relaxed twice:
/* src/mca/schizo/base/schizo_base_stubs.c */
char *prte_schizo_base_normalize_argv(char **argv);
It renames all four deprecated option spellings (--map-by, --rank-by,
--bind-to, --runtime-options) in place and returns any --personality
value found (a pointer into argv, NULL if none). prun and prte
each replace their inline loop with a single call to it.
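A sketch of the shared pre-scan follows, assuming the argv strings are writable. The --rank-by → --rankby and --bind-to → --bindto renames come from the text above; the targets shown for --map-by and --runtime-options are assumptions, sketch_normalize_argv() is a stand-in rather than the actual prte_schizo_base_normalize_argv(), and only the "--personality value" form (not "--personality=value") is recognized. Note that it deliberately has no duplicate-option check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rename table. The --map-by and --runtime-options targets are ASSUMED. */
static const struct { const char *old; const char *new_name; } ng_renames[] = {
    { "--map-by",          "--mapby"  },   /* assumed target */
    { "--rank-by",         "--rankby" },
    { "--bind-to",         "--bindto" },
    { "--runtime-options", "--rtos"   },   /* assumed target */
};

/* Rename every occurrence in place (repeats are fine: MPMD lines repeat
 * options per app) and return the value after the first --personality.
 * Each new spelling is no longer than the old, so it fits in place. */
static char *sketch_normalize_argv(char **argv)
{
    char *personality = NULL;
    for (size_t i = 0; NULL != argv[i]; i++) {
        char *arg = argv[i];
        for (size_t r = 0; r < sizeof(ng_renames) / sizeof(ng_renames[0]); r++) {
            size_t olen = strlen(ng_renames[r].old);
            size_t nlen = strlen(ng_renames[r].new_name);
            if (0 == strncmp(arg, ng_renames[r].old, olen) &&
                ('\0' == arg[olen] || '=' == arg[olen])) {
                memcpy(arg, ng_renames[r].new_name, nlen);
                /* shift any "=value" suffix and the terminator left */
                memmove(arg + nlen, arg + olen, strlen(arg + olen) + 1);
                break;
            }
        }
        if (NULL == personality && 0 == strcmp(arg, "--personality") &&
            NULL != argv[i + 1]) {
            personality = argv[i + 1];
        }
    }
    return personality;
}
```

Putting the loop in one function means the guard removal described above happens once, and prun and prte cannot drift apart again.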
8.1.1.11. Migration Notes
8.1.1.11.1. Existing per-app PPR and PES_PER_PROC
PRTE_APP_PPR (25) and PRTE_APP_PES_PER_PROC (24) are already
stored on prte_app_context_t and checked in the ppr component.
Their handling is absorbed into the general PRTE_APP_MAPBY path:
- A standalone PRTE_APP_PPR without PRTE_APP_MAPBY continues to be read by the ppr component as today (backward compatible).
- A PRTE_APP_MAPBY containing a ppr:N:obj spec stores into PRTE_APP_PPR in addition to setting PRTE_APP_MAPBY = PRTE_MAPPING_PPR.
8.1.1.11.2. PRTE_JOB_FILE versus PRTE_APP_MAP_FILE
The existing PRTE_JOB_FILE attribute stores the rankfile/seq file path on
the job. The new PRTE_APP_MAP_FILE (29) stores it on an individual app
context. The seq and rank_file components must be updated to check
PRTE_APP_MAP_FILE on the current app before falling back to PRTE_JOB_FILE
on the job.
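The lookup order reduces to a two-step fallback. The sketch below uses hypothetical fb_app_t / fb_job_t views in place of the real attribute lookups on app->attributes and jdata->attributes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened views of the two attribute lookups. */
typedef struct { const char *map_file; } fb_app_t;  /* PRTE_APP_MAP_FILE or NULL */
typedef struct { const char *job_file; } fb_job_t;  /* PRTE_JOB_FILE or NULL     */

/* The app-level file wins; otherwise fall back to the job-level file. */
static const char *fb_resolve_map_file(const fb_app_t *app, const fb_job_t *job)
{
    if (NULL != app->map_file) {
        return app->map_file;
    }
    return job->job_file;   /* may itself be NULL: no file given anywhere */
}
```

The same shape applies to the PRTE_APP_PPR → PRTE_JOB_PPR fallback performed during option resolution.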
8.1.1.12. Files Modified
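| File | Change |
|---|---|
| | Add |
| | Add |
| src/mca/rmaps/base/base.h | Declare new parsing and resolution functions |
| | Add |
| src/mca/rmaps/base/rmaps_base_map_job.c | Add per-app detection, per-app dispatch loop, per-app |
| | Add |
| | Add |
| | Add |
| | Add |
| | Add |
| | Add |
| src/prted/prte_app_parse.c | Record per-app |
| src/prted/pmix/pmix_server_dyn.c | Translate per-app |
| | Ensure each |
| src/mca/schizo/base/schizo_base_stubs.c | Add shared prte_schizo_base_normalize_argv() |
| src/tools/prun/prun.c | Replace inline argv pre-scan loop with a call to prte_schizo_base_normalize_argv() |
| src/prted/prte.c | Same replacement (the HNP launcher’s argv pre-scan was a copy of prun’s) |
| src/mca/rmaps/rmaps_types.h | Rename version macro to PRTE_RMAPS_BASE_VERSION_5_0_0 |
| src/mca/rmaps/rmaps.h | Rename module struct/typedef to prte_rmaps_base_module_5_0_0_t |
| | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 |
| | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 |
| | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 |
| | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 |
| | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 |
| test/unit/rmaps/ | Unit test suite: eight |
| | |
| | |
| | Add |
| | Add |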
8.1.1.13. Resolved Design Decisions
8.1.1.13.1. Oversubscription is job-level only
OVERSUBSCRIBE and NOOVERSUBSCRIBE are not per-app-context directives.
They govern whether the job as a whole is permitted to exceed node slot counts,
and that decision must be consistent across all apps sharing the same nodes.
Consequence: if a --map-by string supplied for an individual app context
includes an OVERSUBSCRIBE or NOOVERSUBSCRIBE modifier,
prte_rmaps_base_set_app_mapping_policy() must treat it as an error, emit a
diagnostic via pmix_show_help(), and return PRTE_ERR_BAD_PARAM. The
mapping event aborts with PRTE_JOB_STATE_MAP_FAILED.
The PRTE_APP_MAPBY attribute must therefore never carry an oversubscription
modifier; those modifiers remain valid only in the job-level --map-by
string where they are stored on jdata->map->mapping as today.
8.1.1.13.2. Display map is job-level only
prte_rmaps_base_display_map() is called once after all app contexts have
been mapped, and it displays the complete job map. It is not meaningful to
display a partial map mid-loop.
PRTE_JOB_DISPLAY_MAP and PRTE_JOB_DISPLAY_DEVEL_MAP remain job-level
attributes. There are no per-app-context display-map attributes.
If a caller (e.g. schizo, PMIx spawn) sets a display-map directive on an
individual app context, prte_rmaps_base_map_job() must promote it to the
job level: after resolving all per-app options but before entering the
dispatch loop, scan the apps array and, if any app carries such a directive,
set PRTE_JOB_DISPLAY_MAP on jdata->attributes. The per-app copy of the
directive is then ignored.
8.1.1.13.3. Spawn inheritance is job-level only
INHERIT and NOINHERIT control whether a spawned child job copies its
parent’s mapping/ranking/binding policies. This is a property of the job as
a whole — a child either inherits from its parent or it does not.
Consequently, INHERIT and NOINHERIT modifiers are not permitted in a
per-app --map-by string. prte_rmaps_base_set_app_mapping_policy() must
reject them with PRTE_ERR_BAD_PARAM and a diagnostic, aborting the mapping
event, exactly as it does for oversubscription modifiers.
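The rejection of job-only modifiers in a per-app --map-by string can be sketched as a scan of the ":"-delimited fields after the policy name. fm_check_app_mapby() and the FM_* codes are hypothetical; a real implementation would also emit the pmix_show_help() diagnostic and parse the permitted modifiers rather than merely skipping them.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define FM_OK          0
#define FM_BAD_PARAM (-1)   /* stands in for PRTE_ERR_BAD_PARAM */

/* Modifiers that are valid only in the job-level --map-by string. */
static const char *fm_job_only[] = {
    "OVERSUBSCRIBE", "NOOVERSUBSCRIBE", "INHERIT", "NOINHERIT"
};

/* Case-insensitive string equality. */
static int fm_equal_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
    }
    return *a == *b;
}

/* Reject any job-only modifier appearing after the policy name. */
static int fm_check_app_mapby(const char *spec)
{
    char buf[256];
    if (strlen(spec) >= sizeof(buf)) return FM_BAD_PARAM;
    strcpy(buf, spec);
    strtok(buf, ":");                                   /* skip the policy */
    for (char *tok = strtok(NULL, ":"); NULL != tok; tok = strtok(NULL, ":")) {
        for (size_t i = 0; i < sizeof(fm_job_only) / sizeof(fm_job_only[0]); i++) {
            if (fm_equal_nocase(tok, fm_job_only[i])) return FM_BAD_PARAM;
        }
    }
    return FM_OK;
}
```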
If the CLI or PMIx spawn path presents INHERIT/NOINHERIT at the app level
(e.g., via per-app info[] keys), prte_rmaps_base_map_job() must scan all
apps before entering the dispatch loop and promote the directive to the job
level:
- If all apps that carry the directive agree (all INHERIT or all NOINHERIT), apply it to jdata->attributes as PRTE_JOB_INHERIT or PRTE_JOB_NOINHERIT.
- If any two apps carry conflicting directives, emit a diagnostic and abort with PRTE_JOB_STATE_MAP_FAILED.
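The promotion rule for INHERIT/NOINHERIT amounts to an agreement check across apps. The sketch below uses hypothetical names and returns either the single value to promote or a conflict; the display-map promotion described earlier is the simpler case, where any app carrying the directive is enough.

```c
#include <assert.h>

enum ih { IH_NONE = 0, IH_INHERIT, IH_NOINHERIT };
#define IH_OK        0
#define IH_CONFLICT (-1)   /* caller aborts with PRTE_JOB_STATE_MAP_FAILED */

/* per_app[n] is IH_NONE when app n carries no directive. On success,
 * *job_value is the value to promote (IH_NONE means nothing to promote). */
static int ih_promote(const enum ih *per_app, int napps, enum ih *job_value)
{
    enum ih seen = IH_NONE;
    for (int n = 0; n < napps; n++) {
        if (IH_NONE == per_app[n]) continue;        /* app carries no directive */
        if (IH_NONE == seen) {
            seen = per_app[n];                      /* first directive found */
        } else if (seen != per_app[n]) {
            return IH_CONFLICT;                     /* INHERIT vs NOINHERIT */
        }
    }
    *job_value = seen;
    return IH_OK;
}
```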
8.1.1.13.4. NOLOCAL may be applied per app context
The NOLOCAL (PRTE_MAPPING_NO_USE_LOCAL) directive prevents placement of an
app’s processes on the HNP node. It is meaningful on a per-app basis — one
app in a job may need to avoid the head node while another does not.
NOLOCAL is therefore a valid modifier in a per-app --map-by string.
prte_rmaps_base_set_app_mapping_policy() stores it as a directive bit within
the PRTE_APP_MAPBY uint16_t attribute (using PRTE_SET_MAPPING_DIRECTIVE).
prte_rmaps_base_resolve_app_options() propagates the bit into opts->map
for the current app. prte_rmaps_base_get_target_nodes() already tests
PRTE_MAPPING_NO_USE_LOCAL in the mapping policy it receives; no further
changes to that function are required.
8.1.1.13.5. All --rank-by objects are valid per app
Ranking has no job-wide-consistency requirement analogous to oversubscription
or inheritance: it only fixes the order in which an app’s own processes receive
their global ranks. Every --rank-by object — SLOT, NODE, FILL,
SPAN — is therefore accepted per app with no forbidden modifiers.
prte_rmaps_base_set_app_ranking_policy() sets the PRTE_RANKING_GIVEN
directive bit when it stores PRTE_APP_RANKBY. This is what lets
prte_rmaps_base_resolve_app_options() distinguish an app that explicitly
requested a ranking from one that should fall back to the default derived from
its mapping policy (by-node mapping → by-node ranking, etc.). An app that sets
PRTE_APP_MAPBY but not PRTE_APP_RANKBY gets a ranking derived from its
own per-app mapping policy, not from the job-level mapping policy.
8.1.1.13.6. All --bind-to objects and modifiers are valid per app
Binding is intrinsically a per-process property, so every --bind-to object
(NONE, HWTHREAD, CORE, L1CACHE, L2CACHE, L3CACHE,
NUMA, PACKAGE) and every binding modifier (if-supported,
overload-allowed, no-overload, LIMIT=N) is accepted per app with no
forbidden modifiers.
The no-overload modifier records PRTE_BIND_OVERLOAD_GIVEN without
PRTE_BIND_ALLOW_OVERLOAD so that an app can explicitly forbid overload even
when the job-level default would have permitted it; resolve_app_options()
copies these bits into opts->bind for the app, overriding the job-level
binding directive in its entirety rather than merging bit-by-bit.
8.1.1.14. Framework Version Increment
The app_idx field added to prte_rmaps_options_t changes the contract
between the base layer and every component: components are now required to
honour options->app_idx and skip apps that do not match. This is a
breaking interface change for any out-of-tree component built against the
previous headers.
The framework version must be incremented from 4.0.0 to 5.0.0:
/* src/mca/rmaps/rmaps_types.h */
#define PRTE_RMAPS_BASE_VERSION_5_0_0 PRTE_MCA_BASE_VERSION_3_0_0("rmaps", 5, 0, 0)
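The old macro PRTE_RMAPS_BASE_VERSION_4_0_0 should be retained, deprecated,
with its original 4.0.0 value rather than aliased to the new one. An alias would
let an out-of-tree component that is merely recompiled, without being updated
to honour app_idx, advertise 5.0.0 and load silently. Keeping the old value
means such a component still advertises 4.0.0 and produces a version mismatch
rather than a silent ABI violation.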
All five in-tree component files must be updated to reference
PRTE_RMAPS_BASE_VERSION_5_0_0 in their component struct:
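| Component | Change |
|---|---|
| round_robin | Reference PRTE_RMAPS_BASE_VERSION_5_0_0 in the component struct |
| ppr | same |
| seq | same |
| rank_file | same |
| lsf | same |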
The module struct typedef and convenience alias in rmaps.h must also be
updated:
/* src/mca/rmaps/rmaps.h */
struct prte_rmaps_base_module_5_0_0_t { ... };
typedef struct prte_rmaps_base_module_5_0_0_t prte_rmaps_base_module_5_0_0_t;
typedef prte_rmaps_base_module_5_0_0_t prte_rmaps_base_module_t;
8.1.1.15. Unit Tests
PRRTE currently has no unit test suite for the rmaps framework. This work
introduces non-trivial new logic in prte_rmaps_base_map_job() and
prte_rmaps_base_resolve_app_options() that must be verified independently
of a live DVM. A new unit test tree is required.
8.1.1.15.1. Offline end-to-end verification (no launch)
In addition to the unit tests below, the whole per-app path — CLI parse →
pmix_app_t.info[] → PRTE_APP_* attributes → map_job dispatch →
placement/ranking/binding — can and must be exercised end to end without
launching anything:
prterun --rtos donotlaunch --display map \
--prtemca hwloc_use_topo_file test/unit/rmaps/test-topo.xml \
-H node0:N,node1:M,node2:L \
--map-by node -n 4 hostname : --map-by slot --rank-by node -n 4 hostname
--rtos donotlaunch runs the mapper/ranker/binder and prints the map without
forking any process; --prtemca hwloc_use_topo_file supplies a simulated
node topology (so binding resolves against real objects); -H declares the
simulated nodes (slot counts only need to be ≥ the procs placed on each node).
The printed map shows each process’s app index, rank, and bound object. See
the “Testing the mapper without launching” section of AGENTS.md for the
full description.
This offline check is what catches the class of failure described in
§“These attributes must be stored with PRTE_ATTR_GLOBAL”: a per-app
directive that parses correctly but is silently dropped before mapping shows up
immediately here as an app whose placement/rank/binding does not change when
its per-app policy changes. Verification must include a multi-app MPMD
case — a single-app job can pass even when per-app attributes are being lost,
because its lone directive coincides with the job-level policy.
8.1.1.15.2. Location
test/unit/rmaps/
Makefile.am
test_rmaps_main.c — harness: init/finalize, run all suites
test_resolve_options.c — prte_rmaps_base_resolve_app_options()
test_policy_parse.c — prte_rmaps_base_set_app_mapping_policy() etc.
test_dispatch.c — per-app dispatch loop in map_job
test_round_robin.c — round_robin component with app_idx
test_ppr.c — ppr component with app_idx
test_seq.c — seq component with app_idx
test_rank_file.c — rank_file component with app_idx
test/unit/rmaps/ is added to the SUBDIRS list in test/unit/Makefile.am
(creating that file if it does not yet exist) and wired into the top-level
make check target.
8.1.1.15.3. Test harness requirements
The tests link against the PRRTE static libraries but do not require a running
DVM. They use the same minimal-init pattern as the PMIx unit tests: call
prte_init() in server-tool mode, bypassing daemon launch. A lightweight
stub replaces PRTE_ACTIVATE_JOB_STATE for the mapping-failed path so tests
can assert the failure code without triggering the state machine.
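One way to build such a stub is sketched below in C. It assumes the test build can redirect the state-activation macro to a recording function; the macro, types, and functions shown (SKETCH_ACTIVATE_JOB_STATE, sketch_job_t, stub_activate_job_state, sketch_map_job) are hypothetical stand-ins for PRTE_ACTIVATE_JOB_STATE, prte_job_t, and the mapping code under test.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles on its own; in the real
 * tests these come from PRRTE headers. */
typedef struct { int state; } sketch_job_t;
enum { SKETCH_JOB_STATE_MAP_FAILED = 1 }; /* stands in for PRTE_JOB_STATE_MAP_FAILED */

/* Recorded by the stub instead of driving the state machine. */
static int stub_last_state = -1;
static int stub_activations = 0;

static void stub_activate_job_state(sketch_job_t *jdata, int state)
{
    (void) jdata; /* the real state machine would act on this job */
    stub_last_state = state;
    stub_activations++;
}

/* In the test build the activation macro is redirected to the stub
 * (assumption: e.g. via #undef/#define after the PRRTE state header). */
#define SKETCH_ACTIVATE_JOB_STATE(j, s) stub_activate_job_state((j), (s))

/* Hypothetical code under test: reports a mapping failure via the macro. */
static int sketch_map_job(sketch_job_t *jdata, int fail)
{
    if (fail) {
        SKETCH_ACTIVATE_JOB_STATE(jdata, SKETCH_JOB_STATE_MAP_FAILED);
        return -1;
    }
    return 0;
}
```

A test drives the mapping path with inputs that must fail and then asserts on stub_last_state, rather than waiting for the state machine to react.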
8.1.1.15.4. Coverage required per test file
test_policy_parse.c: prte_rmaps_base_set_app_mapping_policy(),
prte_rmaps_base_set_app_ranking_policy(), prte_rmaps_base_set_app_binding_policy():
Mapping (set_app_mapping_policy):
Valid policies (core, node, slot, ppr:2:core, hwthread, etc.) store the correct uint16_t in app->attributes.
All valid modifiers (NOLOCAL, PE=N, ORDERED, HWTCPUS, CORECPUS, FILE=path) parse and store correctly.
Forbidden modifiers (OVERSUBSCRIBE, NOOVERSUBSCRIBE, INHERIT, NOINHERIT) return PRTE_ERR_BAD_PARAM.
Malformed strings (missing value after =, unknown keyword, conflicting HWTCPUS/CORECPUS) return an error code rather than PRTE_SUCCESS.
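A minimal C sketch of the modifier validation these mapping tests target, assuming colon-separated --map-by syntax and case-insensitive keywords. sk_check_app_mapby and the SK_* codes are hypothetical; the sketch returns one stand-in error code for every rejection, does not validate the policy name itself, and omits storing results in app->attributes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for PRTE_SUCCESS / PRTE_ERR_BAD_PARAM; values are arbitrary. */
enum { SK_SUCCESS = 0, SK_ERR_BAD_PARAM = -1 };

static bool sk_ieq(const char *a, const char *b) /* ASCII case-insensitive == */
{
    for (; *a && *b; a++, b++) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return false;
        }
    }
    return *a == *b;
}

/* Validate the modifiers of one per-app --map-by string. */
static int sk_check_app_mapby(const char *spec)
{
    static const char *const forbidden[] = {
        "oversubscribe", "nooversubscribe", "inherit", "noinherit"
    };
    char buf[256];
    bool hwtcpus = false, corecpus = false;

    if (strlen(spec) >= sizeof(buf)) {
        return SK_ERR_BAD_PARAM;
    }
    strcpy(buf, spec);

    char *tok = strtok(buf, ":"); /* policy name, e.g. "core" or "ppr" */
    if (NULL == tok) {
        return SK_ERR_BAD_PARAM;
    }
    if (sk_ieq(tok, "ppr")) { /* ppr:N:resource consumes two more fields */
        if (NULL == strtok(NULL, ":") || NULL == strtok(NULL, ":")) {
            return SK_ERR_BAD_PARAM;
        }
    }
    while (NULL != (tok = strtok(NULL, ":"))) {
        for (size_t i = 0; i < sizeof(forbidden) / sizeof(forbidden[0]); i++) {
            if (sk_ieq(tok, forbidden[i])) {
                return SK_ERR_BAD_PARAM; /* job-level only */
            }
        }
        char *eq = strchr(tok, '=');
        if (NULL != eq) {
            *eq = '\0';
            const char *val = eq + 1;
            if ('\0' == *val) {
                return SK_ERR_BAD_PARAM; /* "pe=" or "file=" without value */
            }
            if (sk_ieq(tok, "pe")) {
                for (const char *p = val; *p; p++) {
                    if (!isdigit((unsigned char) *p)) {
                        return SK_ERR_BAD_PARAM;
                    }
                }
            } else if (!sk_ieq(tok, "file")) {
                return SK_ERR_BAD_PARAM; /* unknown key=value modifier */
            }
        } else if (sk_ieq(tok, "hwtcpus")) {
            hwtcpus = true;
        } else if (sk_ieq(tok, "corecpus")) {
            corecpus = true;
        } else if (!sk_ieq(tok, "nolocal") && !sk_ieq(tok, "ordered")) {
            return SK_ERR_BAD_PARAM; /* unknown keyword */
        }
    }
    return (hwtcpus && corecpus) ? SK_ERR_BAD_PARAM : SK_SUCCESS;
}
```

For example, "core:pe=2:oversubscribe" is rejected because OVERSUBSCRIBE is job-level only, while "ppr:2:core:pe=2" passes because the two fields after ppr belong to the policy.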
Ranking (set_app_ranking_policy):
Each object (slot, node, fill, span) stores the matching PRTE_RANK_BY_* value in PRTE_APP_RANKBY with PRTE_RANKING_GIVEN set.
An unrecognized object returns PRTE_ERR_SILENT and leaves app->attributes unchanged.
Binding (set_app_binding_policy):
Each object (none, hwthread, core, l1cache, l2cache, l3cache, numa, package) stores the matching PRTE_BIND_TO_* value in PRTE_APP_BINDTO.
The if-supported modifier sets PRTE_BIND_IF_SUPPORTED; overload-allowed sets PRTE_BIND_ALLOW_OVERLOAD together with PRTE_BIND_OVERLOAD_GIVEN; no-overload sets PRTE_BIND_OVERLOAD_GIVEN without the allow bit.
LIMIT=N stores N as PRTE_APP_BINDING_LIMIT; a non-numeric LIMIT value returns PRTE_ERR_SILENT.
An unrecognized object or modifier returns PRTE_ERR_BAD_PARAM.
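The binding-modifier rules above can be illustrated with a small C sketch. sk_bind_modifier and the SK_BIND_* bits are hypothetical stand-ins for the real parser and for PRTE_BIND_IF_SUPPORTED, PRTE_BIND_ALLOW_OVERLOAD, and PRTE_BIND_OVERLOAD_GIVEN; the bit values are arbitrary, and LIMIT=N is returned through an out-parameter instead of being stored as PRTE_APP_BINDING_LIMIT.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for PRTE_SUCCESS / PRTE_ERR_BAD_PARAM / PRTE_ERR_SILENT. */
enum { SK_SUCCESS = 0, SK_ERR_BAD_PARAM = -1, SK_ERR_SILENT = -2 };

/* Hypothetical flag bits; the real PRTE_BIND_* values differ. */
#define SK_BIND_IF_SUPPORTED   0x01u
#define SK_BIND_ALLOW_OVERLOAD 0x02u
#define SK_BIND_OVERLOAD_GIVEN 0x04u

/* Parse one binding modifier: OR bits into *flags, or set *limit. */
static int sk_bind_modifier(const char *mod, uint16_t *flags, long *limit)
{
    char lc[64];
    size_t len = strlen(mod);

    if (len >= sizeof(lc)) {
        return SK_ERR_BAD_PARAM;
    }
    for (size_t i = 0; i <= len; i++) { /* lower-case copy, including NUL */
        lc[i] = (char) tolower((unsigned char) mod[i]);
    }

    if (0 == strcmp(lc, "if-supported")) {
        *flags |= SK_BIND_IF_SUPPORTED;
    } else if (0 == strcmp(lc, "overload-allowed")) {
        *flags |= SK_BIND_ALLOW_OVERLOAD | SK_BIND_OVERLOAD_GIVEN;
    } else if (0 == strcmp(lc, "no-overload")) {
        *flags |= SK_BIND_OVERLOAD_GIVEN; /* given, allow bit left clear */
    } else if (0 == strncmp(lc, "limit=", 6)) {
        char *end = NULL;
        if (!isdigit((unsigned char) lc[6])) {
            return SK_ERR_SILENT; /* "limit=" or "limit=four" */
        }
        long n = strtol(lc + 6, &end, 10);
        if ('\0' != *end) {
            return SK_ERR_SILENT; /* trailing junk, e.g. "limit=4x" */
        }
        *limit = n;
    } else {
        return SK_ERR_BAD_PARAM; /* unknown modifier */
    }
    return SK_SUCCESS;
}
```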
test_resolve_options.c: prte_rmaps_base_resolve_app_options():
App with no per-app attributes: app_options is identical to the job-level options.
App with PRTE_APP_MAPBY set: opts->map reflects the app value, not the job value.
App with the PRTE_APP_NOLOCAL directive bit set: opts->map carries PRTE_MAPPING_NO_USE_LOCAL.
App with PRTE_APP_PES_PER_PROC, PRTE_APP_HWT_CPUS, or PRTE_APP_CPUSET set: each value overrides the corresponding job-level setting.
Fallback chain: when PRTE_APP_PPR is absent, PRTE_JOB_PPR is used.
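The override-then-fallback behaviour these cases check can be pictured as a copy of the job-level options with per-app values overlaid. The C sketch below assumes drastically reduced structures (sk_opts_t, sk_app_attrs_t, both hypothetical) and a stand-in bit for PRTE_MAPPING_NO_USE_LOCAL; it is not the signature of prte_rmaps_base_resolve_app_options().

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MAPPING_NO_USE_LOCAL 0x1000u /* stand-in for PRTE_MAPPING_NO_USE_LOCAL */

/* Hypothetical, heavily reduced resolved options. */
typedef struct {
    uint16_t map;     /* mapping policy plus directive bits */
    int pes_per_proc;
    int ppr;          /* stands in for the PPR specification */
} sk_opts_t;

/* Hypothetical per-app attributes; has_* marks which ones the app set. */
typedef struct {
    bool has_map;
    uint16_t map;
    bool has_pes;
    int pes_per_proc;
    bool has_ppr;
    int ppr;
    bool nolocal;
} sk_app_attrs_t;

static sk_opts_t sk_resolve_app_options(const sk_opts_t *job, const sk_app_attrs_t *app)
{
    sk_opts_t out = *job; /* no per-app attrs: identical to the job options */
    if (app->has_map) {
        out.map = app->map; /* app policy replaces the job policy */
    }
    if (app->nolocal) {
        out.map |= SK_MAPPING_NO_USE_LOCAL;
    }
    if (app->has_pes) {
        out.pes_per_proc = app->pes_per_proc;
    }
    if (app->has_ppr) {
        out.ppr = app->ppr; /* otherwise the job-level PPR is kept */
    }
    return out;
}
```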
test_dispatch.c: the per-app detection and dispatch loop in prte_rmaps_base_map_job():
Job with no per-app directives: the single-dispatch path is taken (verified by a mock component call count of 1).
Job with at least one app carrying PRTE_APP_MAPBY: the per-app path is taken and the mock component is called once per app.
OVERSUBSCRIBE in a per-app --map-by string: mapping aborts with PRTE_JOB_STATE_MAP_FAILED.
Conflicting INHERIT/NOINHERIT across apps: mapping aborts.
Any app with a display-map directive: PRTE_JOB_DISPLAY_MAP is promoted to job level.
NOLOCAL on app[0] but not on app[1], with both apps sharing the HNP node (see below).
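The detection-and-dispatch behaviour checked by the first two cases can be sketched in C as follows. sk_dispatch, sk_map_opts_t, and the mock are hypothetical; the real loop would select a component per app from that app's resolved policy, whereas this sketch always calls one mock so that call counts can be asserted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced types. app_idx = -1 means "map every app". */
typedef struct { bool has_mapby; } sk_app_t;
typedef struct { int app_idx; } sk_map_opts_t;
typedef int (*sk_map_fn)(const sk_map_opts_t *opts, void *ctx);

/* Returns how many times the component was invoked, or the first
 * negative error code it reported. */
static int sk_dispatch(const sk_app_t *apps, size_t napps, sk_map_fn map, void *ctx)
{
    bool per_app = false;
    for (size_t i = 0; i < napps && !per_app; i++) {
        per_app = apps[i].has_mapby;
    }
    if (!per_app) { /* legacy path: one call covering all apps */
        sk_map_opts_t opts = { .app_idx = -1 };
        int rc = map(&opts, ctx);
        return rc < 0 ? rc : 1;
    }
    for (size_t i = 0; i < napps; i++) { /* per-app path: one call per app */
        sk_map_opts_t opts = { .app_idx = (int) i };
        int rc = map(&opts, ctx);
        if (rc < 0) {
            return rc;
        }
    }
    return (int) napps;
}

/* Mock component: counts calls and remembers the last app_idx it saw. */
typedef struct { int calls; int last_idx; } sk_mock_t;

static int sk_mock_map(const sk_map_opts_t *opts, void *ctx)
{
    sk_mock_t *m = ctx;
    m->calls++;
    m->last_idx = opts->app_idx;
    return 0;
}
```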
test_round_robin.c, test_ppr.c, test_seq.c, test_rank_file.c:
Each file tests its component with the app_idx field in both modes (all apps, and a single app), plus the NULL-entry edge case:
app_idx = -1: the component maps all apps (baseline, existing behaviour).
app_idx = 0 on a two-app job: only app[0] is mapped; app[1] procs remain unplaced.
app_idx = 1 on a two-app job: only app[1] is mapped; app[0] procs remain unplaced.
app_idx set to an index whose entry in the apps array is NULL: the component skips it gracefully.
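A C sketch of how a component might honour app_idx, assuming app_idx = -1 means "all apps" as described above and that the apps array can contain NULL entries. sk_component_map and the reduced sk_job_t and sk_app_ctx_t types are hypothetical, and actual node selection is elided.

```c
#include <stddef.h>

/* Hypothetical reduced types; apps[] may hold NULL entries. */
typedef struct { int num_procs; int placed; } sk_app_ctx_t;
typedef struct { sk_app_ctx_t **apps; int napps; } sk_job_t;

/* Map every app when app_idx == -1, otherwise only apps[app_idx].
 * Returns the number of procs placed. */
static int sk_component_map(sk_job_t *job, int app_idx)
{
    int first = 0;
    int last = job->napps - 1;

    if (app_idx >= 0) {
        if (app_idx >= job->napps) {
            return 0; /* out of range: nothing to map */
        }
        first = last = app_idx;
    }
    int placed = 0;
    for (int i = first; i <= last; i++) {
        sk_app_ctx_t *app = job->apps[i];
        if (NULL == app) {
            continue; /* skip the empty entry without failing */
        }
        app->placed = app->num_procs;
        placed += app->num_procs;
    }
    return placed;
}
```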
8.1.1.15.6. Makefile.am for the test suite
# test/unit/rmaps/Makefile.am
#
# Copyright (c) 2026 Nanook Consulting All rights reserved.
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$
AM_CPPFLAGS = \
-I$(top_srcdir)/src \
-I$(top_srcdir)/include \
-I$(top_srcdir)
check_PROGRAMS = test_rmaps
test_rmaps_SOURCES = \
test_rmaps_main.c \
test_policy_parse.c \
test_resolve_options.c \
test_dispatch.c \
test_round_robin.c \
test_ppr.c \
test_seq.c \
test_rank_file.c
test_rmaps_LDADD = $(top_builddir)/src/libprrte.la
TESTS = test_rmaps
8.1.1.16. make check Integration
PRRTE currently runs no tests under make check. This section specifies everything
required to wire the new unit test suite into the Automake check framework so
that make check, run from the top of the build tree, builds and runs the tests.
8.1.1.16.1. 1. Add test/unit/ to the build tree
Create test/unit/Makefile.am:
# test/unit/Makefile.am
#
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$
SUBDIRS = rmaps
Create test/Makefile.am:
# test/Makefile.am
#
# $COPYRIGHT$
# Additional copyrights may follow
# $HEADER$
SUBDIRS = unit
8.1.1.16.2. 2. Wire test/ into the top-level build
In the top-level Makefile.am, add test to SUBDIRS:
# Makefile.am (excerpt — existing SUBDIRS line)
SUBDIRS = config contrib src include docs test
The test entry must come after src so that libprrte.la is built before
the test programs that link against it.
8.1.1.16.3. 3. Register test/ Makefiles in the configure system
In config/prte_config_files.m4, extend the AC_CONFIG_FILES call:
AC_DEFUN([PRTE_CONFIG_FILES],[
AC_CONFIG_FILES([
src/Makefile
...existing entries...
test/Makefile
test/unit/Makefile
test/unit/rmaps/Makefile
])
])
8.1.1.16.4. 4. Automake check mechanics
Automake’s make check target automatically builds everything listed in
check_PROGRAMS and then runs everything listed in TESTS. Because
TESTS = test_rmaps is set in test/unit/rmaps/Makefile.am, running
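make check from the top of the build tree will:
1. Build test_rmaps (and its dependency libprrte.la, if not already built).
2. Execute ./test_rmaps.
3. Report the result based on the exit status. Under Automake's default parallel test driver, 0 is PASS, 77 is SKIP, 99 is a hard ERROR, and any other nonzero status is FAIL; the output is captured in test_rmaps.log and the summary in test-suite.log.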
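No additional Automake variables or test-driver configuration are required for
a simple binary-exit-code test. If a TAP-based driver is preferred in future,
LOG_DRIVER (pointing at Automake's tap-driver.sh) and, if needed,
AM_TESTS_ENVIRONMENT can be added at that point. The test logic would not
change, but the harness in test_rmaps_main.c would have to print its results in
TAP format, because the TAP driver grades tests from their output rather than
from the exit status alone.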
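A minimal sketch of the exit-status handling in test_rmaps_main.c, in C. It assumes, hypothetically, that each suite exports a function returning its number of failed checks; sk_run_suites and the sk_suite_* names are illustrative only. Returning 1 instead of the raw failure count keeps the status away from 77 and 99, which Automake's default driver reads as SKIP and hard ERROR.

```c
#include <stdio.h>

/* Hypothetical suite signature: returns the number of failed checks. */
typedef int (*sk_suite_fn)(void);

typedef struct {
    const char *name;
    sk_suite_fn run;
} sk_suite_t;

/* Run every suite and collapse the results into an exit status:
 * 0 if all checks passed, 1 if any failed. */
static int sk_run_suites(const sk_suite_t *suites, int n)
{
    int failed = 0;
    for (int i = 0; i < n; i++) {
        int f = suites[i].run();
        printf("%s: %s\n", suites[i].name, f ? "FAIL" : "PASS");
        failed += f;
    }
    return failed ? 1 : 0;
}

/* Example suites for illustration: one passing, one with two failures. */
static int sk_suite_ok(void)  { return 0; }
static int sk_suite_bad(void) { return 2; }
```

The real main() would call prte_init() as described under the test harness requirements, build the suite table, and return the value of sk_run_suites().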
8.1.1.16.5. 5. autogen.pl / configure impact
Adding test/Makefile.am, test/unit/Makefile.am, and
test/unit/rmaps/Makefile.am to the source tree and listing them in
config/prte_config_files.m4 is sufficient. No new configure.ac macros
are needed. Developers must re-run ./autogen.pl && ./configure after
pulling these files for the first time.
8.1.1.16.6. 6. Isolation from normal builds
The test programs are listed under check_PROGRAMS, not bin_PROGRAMS or
noinst_PROGRAMS. Automake only builds check_PROGRAMS when make check
is explicitly invoked; a plain make or make install does not build them.
This keeps the normal build fast and does not install test binaries.
8.1.1.16.7. 7. Developer workflow
# After configure:
make -j$(nproc) # normal build, does not compile tests
make check # build and run all unit tests
make -C test/unit/rmaps check  # run only the rmaps suite
8.1.1.17. Open Questions
None.