5. PRRTE DVM Configuration

The PMIx Reference RunTime Environment (PRRTE) can be instantiated as a Distributed Virtual Machine (DVM) in two ways. First, the prte command can be executed at a shell prompt. This discovers the available resources (either from a hostfile or as allocated by a resource manager) and starts a PRRTE shepherd daemon (prted(1)) on each of the indicated nodes.

The second method is to bootstrap the DVM at cluster startup. Bootstrapping PRRTE allows the DVM to serve as the system-level runtime, providing a full-service PMIx environment to sessions under its purview. Integration with an appropriately enabled scheduler can provide a full workload-managed environment for users.

Establishing the DVM using the bootstrap method requires that a PRRTE configuration file be created and made available on every node of the cluster at node startup. The configuration file provides necessary information for establishing the communication infrastructure between the DVM controller and the compute node daemons. It also provides a means for easily defining DVM behavior for options such as logging, system-level prolog and epilog scripts for each session, and other PRRTE features.

The configuration file can be manually created or can be created using the PRRTE configuration tool. Manual creation can best be done by editing the example configuration file (<source-location>/src/etc/prte.conf). This file contains all the supported configuration options, with all entries commented out. Simply uncomment the options of interest and set them to appropriate values. The file will be installed into the final <install-location>/etc when make install is performed.
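
As an illustration of the file layout described above, the following Python sketch reads a configuration file made of Key=Value lines. It assumes that a leading # marks a commented-out entry, which this section does not state. The function name parse_prte_conf and the sample values are hypothetical, and this is not PRRTE's own parser.

```python
def parse_prte_conf(text):
    """Return a dict of the active (uncommented) Key=Value entries.

    Assumption: lines starting with '#' are commented-out entries.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out options
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a Key=Value line: {raw!r}")
        options[key.strip()] = value.strip()
    return options


# Hypothetical file content: two options left commented out, three set.
SAMPLE = """\
# ClusterName=cluster
ClusterName=alpha
DVMControllerHost=head01
#DVMPort=7817
DVMNodes=linux0,linux[2-10]
"""

conf = parse_prte_conf(SAMPLE)
```

Options that stay commented out are simply absent from the result, so the caller can fall back to the documented defaults.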

5.1. Configuration Options

The following options are supported by the latest version of PRRTE. While we make every effort to maintain compatibility with prior versions, we recommend that you review the options when installing a new version to see what may have changed or been added. We also recommend that you use the PRRTE DVM configurator matching the version you are using to ensure that it is fully compatible.

5.1.1. Bootstrap Options

ClusterName=<string> (default: "cluster") is the name of the cluster upon which the DVM is executing. This is used by PRRTE to form the namespace for the DVM daemons, which is taken as <clustername>-prte-dvm. Using different names for each of your clusters is important if you use a single database to record information from multiple PRRTE-managed clusters.

DVMControllerHost=<hostname> is the host upon which the DVM controller will be executing. The prted that finds itself booting onto this host will declare itself to be the system controller and will initialize itself accordingly.

DVMPort=<number> (default: 7817) is the TCP port upon which every DVM process listens for connections from its peers. The controller uses it to accept connections from its prted daemons, and each prted uses it to accept connections from peer daemons. Because a single well-known port is shared across the DVM, any process can construct any peer’s contact address from that peer’s host without a discovery exchange.

DVMNodes=<regex of DVM nodes> (default: none) provides a regular expression identifying the nodes upon which user applications can run. IP addresses can be provided in place of hostnames if desired. The regular expression can consist of a simple comma-delimited list of hostnames, or a comma-delimited list of hostname ranges (e.g., “linux0,linux[2-10]”), or a PMIx “native” regular expression.
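
To make the two simpler DVMNodes forms concrete, here is a minimal Python sketch that expands a comma-delimited list of hostnames and hostname ranges. It does not handle PMIx "native" regular expressions, and preserving leading zeros in a range is an assumption rather than documented behavior. The helper name expand_nodes is hypothetical.

```python
import re

# One bracketed numeric range per item, e.g. "linux[2-10]".
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_nodes(spec):
    """Expand 'linux0,linux[2-10]' into an explicit list of host names."""
    hosts = []
    # Split on commas that are not inside a bracket pair.
    for item in re.split(r",(?![^\[]*\])", spec):
        item = item.strip()
        if not item:
            continue
        match = _RANGE.match(item)
        if match is None:
            hosts.append(item)  # plain name, IP address, or unsupported form
            continue
        lo, hi = int(match["lo"]), int(match["hi"])
        # Assumption: a zero-padded lower bound means fixed-width numbering.
        width = len(match["lo"]) if match["lo"].startswith("0") else 0
        for number in range(lo, hi + 1):
            hosts.append(
                f"{match['prefix']}{str(number).zfill(width)}{match['suffix']}"
            )
    return hosts


nodes = expand_nodes("linux0,linux[2-10]")
```

Items the sketch does not recognize are passed through unchanged, so a real tool would need to validate them separately.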

DVMNetworks=<comma-delimited list> (default: all) restricts the networks the runtime uses for inter-node (daemon-to-daemon) communication. Entries may be interface names or CIDR subnets (e.g., “eth0,10.0.0.0/8”). When omitted, the runtime selects among all available interfaces. This duplicates the prte_if_include MCA parameter and is provided here so the transport can be managed from the single configuration file.

The CIDR entries additionally disambiguate the address a daemon uses to reach its tree parent (or the controller). A daemon synthesizes those contact addresses from the configured host names before any topology has been distributed; when a host resolves to several addresses, the CIDR selects the one on the DVM interconnect. If such a host is multi-homed and no CIDR is given to choose among its addresses, the daemon fails to start with a clear diagnostic rather than guessing an interface. A single-homed host needs no DVMNetworks entry.
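
The selection rule in the preceding paragraph can be sketched in Python with the standard ipaddress module. This is an illustration under assumptions, not PRRTE's code: the function name select_address is hypothetical, interface-name entries are simply skipped, and treating more than one matching address as an error is a choice made here because the text does not cover that case.

```python
import ipaddress


def select_address(addresses, dvm_networks):
    """Pick the address a daemon would use to reach a host.

    addresses:    the addresses the host name resolves to
    dvm_networks: DVMNetworks entries (interface names and/or CIDR subnets)
    """
    addrs = [ipaddress.ip_address(a) for a in addresses]
    if len(addrs) == 1:
        return str(addrs[0])  # single-homed: no DVMNetworks entry needed

    subnets = []
    for entry in dvm_networks:
        try:
            subnets.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # not a CIDR, e.g. an interface name such as "eth0"

    matches = [a for a in addrs if any(a in net for net in subnets)]
    if len(matches) != 1:
        # Mirrors "fails to start with a clear diagnostic rather than guessing".
        raise RuntimeError(
            f"cannot choose among {addresses}: {len(matches)} addresses "
            f"match the DVMNetworks CIDR entries"
        )
    return str(matches[0])


chosen = select_address(["192.168.1.5", "10.1.2.3"], ["eth0", "10.0.0.0/8"])
```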

DVMNetmask=<netmask> (default: derived) is the interface netmask associated with the inter-node network. It is used when constructing the contact information the DVM daemons exchange, allowing them to agree on reachability without dynamic discovery. The value follows the selected address family: a dotted netmask or prefix length for IPv4, or a prefix length for IPv6. When omitted, the prefix of the DVMNetworks CIDR that selected the address is used; absent that, reachability is left unrestricted.
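
The fallback order for DVMNetmask (explicit value, then the prefix of the selecting CIDR, then unrestricted) can be sketched as follows. The function name effective_prefix is hypothetical, and returning None to mean "unrestricted" is a convention of this sketch only.

```python
import ipaddress


def effective_prefix(netmask=None, selecting_cidr=None, ip_version=4):
    """Return the prefix length to use, or None if reachability is unrestricted."""
    if netmask is not None:
        text = str(netmask).lstrip("/")
        if ip_version == 6 and not text.isdigit():
            raise ValueError("an IPv6 netmask must be given as a prefix length")
        # Let ipaddress validate the mask by attaching it to the zero address.
        base = "0.0.0.0" if ip_version == 4 else "::"
        return ipaddress.ip_network(f"{base}/{text}").prefixlen
    if selecting_cidr is not None:
        # DVMNetmask omitted: reuse the prefix of the CIDR that chose the address.
        return ipaddress.ip_network(selecting_cidr, strict=False).prefixlen
    return None  # neither given: reachability left unrestricted


dotted = effective_prefix("255.255.255.0")
derived = effective_prefix(selecting_cidr="10.0.0.0/8")
```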

DVMIPVersion=<4|6> (default: 4) selects the IP address family the DVM uses for inter-node communication. The default, 4, uses IPv4. Setting it to 6 configures an IPv6-only DVM: the daemons listen and connect over IPv6, and the IPv4 family is disabled. IPv6 support requires that PRRTE was built with IPv6 enabled; if it was not, a DVM configured for 6 will fail to start with a clear diagnostic. The remaining address-bearing options (DVMControllerHost, DVMNodes, DVMNetworks, DVMNetmask) accept values of the selected family, so IPv6 literal addresses and IPv6 CIDR subnets may be used when DVMIPVersion=6.

DVMRadix=<number> (default: 64) sets the radix of the routing tree that connects the daemons. Rather than every daemon opening a connection directly to the controller, each daemon connects to its parent in a radix tree of the given width, which spreads the connection and message-relay load across the DVM. The value ties directly to the rml_base_radix MCA parameter and must be identical on every node, so it is set once in the configuration file. The default of 64 keeps the tree shallow for typical clusters while capping the number of children any single daemon (including the controller) must serve.
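
This section does not spell out how daemons are assigned to parents. To show why a radix of 64 keeps the tree shallow, the sketch below assumes a simple heap-style numbering in which the controller is rank 0 and the children of rank r are ranks r*radix+1 through r*radix+radix. PRRTE's actual rank-to-parent mapping may differ, and all function names are hypothetical.

```python
def parent_of(rank, radix=64):
    """Parent rank in the assumed heap-style tree; the controller has none."""
    return None if rank == 0 else (rank - 1) // radix


def children_of(rank, radix=64, num_daemons=None):
    """Child ranks of `rank`, limited to ranks below num_daemons if given."""
    first = rank * radix + 1
    last = first + radix - 1
    if num_daemons is not None:
        last = min(last, num_daemons - 1)
    return list(range(first, last + 1))


def ancestors_of(rank, radix=64):
    """Ranks from the immediate parent up to the controller (rank 0)."""
    chain = []
    while rank > 0:
        rank = (rank - 1) // radix
        chain.append(rank)
    return chain


chain = ancestors_of(4161)
```

Under this numbering, ranks 1 through 64 connect directly to the controller and ranks 65 through 4160 are two hops away, so a cluster of about four thousand nodes needs only two levels below the controller.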

DVMConnectMaxTime=<seconds> (default: 30) bounds how long a daemon will keep trying to reach its assigned parent in the routing tree before it heals up to the next ancestor. Because the daemons start independently, a daemon’s parent may not yet be running; after this interval the daemon promotes its connection target to the parent’s parent, and so on up the tree, until it reaches the controller. The controller itself is retried indefinitely (see DVMRetryMaxDelay), so a daemon always eventually joins the DVM even if only the controller is up. Setting this to 0 disables healing and retries every parent forever.
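
The promotion behavior can be sketched as a small simulation that uses a counter instead of a real clock. The fixed one-second attempt step, the horizon that ends the simulation, and the name choose_target are assumptions made for illustration. The text above specifies only the per-parent time bound, the indefinite retry of the controller, and the meaning of 0.

```python
def choose_target(ancestors, is_up, connect_max_time=30, step=1, horizon=600):
    """Simulate which ancestor a daemon ends up connected to.

    ancestors: ranks from the assigned parent up to the controller (last item)
    is_up:     callable (rank, now) -> bool, True once that daemon is reachable
    Returns (rank, time) on success, or (None, time) if the simulated horizon
    passes first.
    """
    now = 0
    last = len(ancestors) - 1
    for index, target in enumerate(ancestors):
        # Healing applies to non-controller targets unless disabled with 0.
        heal = connect_max_time > 0 and index < last
        deadline = now + connect_max_time
        while now <= horizon:
            if is_up(target, now):
                return target, now
            now += step
            if heal and now >= deadline:
                break  # promote the connection target to the next ancestor
        else:
            return None, now  # horizon reached while still retrying
    return None, now


# Parent rank 5 never starts; the controller (rank 0) is already running.
healed = choose_target([5, 0], lambda rank, now: rank == 0)
```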

DVMRetryMaxDelay=<seconds> (default: 5) bounds the delay between a daemon’s attempts to connect to the DVM controller during bootstrap. Because the daemons start independently and the controller may come up arbitrarily late, a bootstrapping daemon retries the connection indefinitely rather than giving up. To avoid hammering an absent controller, the delay between attempts grows from an initial short interval and doubles on each attempt, up to this maximum, after which retries continue at that steady rate until the controller answers. A larger value lowers the polling load of a long-absent controller; a smaller value reconnects faster once it appears.
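
The doubling schedule described above can be written as a short generator. The initial interval is not given in this section, so the 0.25 second starting value below is a hypothetical placeholder, as is the name retry_delays.

```python
import itertools


def retry_delays(max_delay=5.0, initial=0.25):
    """Yield the wait before each successive connection attempt, forever.

    Assumption: `initial` stands in for the unspecified "initial short interval".
    """
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * 2, max_delay)  # double, capped at DVMRetryMaxDelay


first_seven = list(itertools.islice(retry_delays(), 7))
```

With the default maximum of 5 seconds and the assumed starting value, the waits grow 0.25, 0.5, 1, 2, 4 and then hold at 5 seconds until the controller answers.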

Note

Several bootstrap options duplicate values that can also be set as MCA parameters. They are provided here so that all DVM behavior can be managed in one place. Where an option and an MCA parameter set the same value, the configuration file takes precedence over the MCA parameter file. A value given explicitly on the command line still overrides both.
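
The precedence order in this note (command line, then configuration file, then MCA parameter file, then built-in default) can be modelled with a layered lookup. The sketch uses Python's collections.ChainMap purely as an illustration; the key names and values are hypothetical and do not correspond to actual option or parameter names.

```python
from collections import ChainMap


def effective_settings(command_line, config_file, mca_param_file, defaults):
    """Layer the sources so that the first one defining a key wins."""
    # ChainMap searches its maps left to right, so earlier maps take precedence.
    return ChainMap(command_line, config_file, mca_param_file, defaults)


defaults = {"radix": 64, "retry_max_delay": 5}
mca_param_file = {"radix": 32}
config_file = {"radix": 16, "retry_max_delay": 10}

settings = effective_settings({}, config_file, mca_param_file, defaults)
overridden = effective_settings({"radix": 8}, config_file, mca_param_file, defaults)
```

Here settings["radix"] comes from the configuration file even though the MCA parameter file also sets it, while overridden["radix"] comes from the command line.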

5.1.2. Operational Options

DVMTempDir=<path> (default: /tmp) is the temporary directory that the DVM daemons and controller are to use as the base for their session directories. Working files/directories for the DVM will be placed under this location.

SessionTmpDir=<path> (default: DVMTempDir) is the temporary directory that the DVM daemons are to use as the base for session directories for all application sessions. Working files for each session will be placed under this location, separated out into a directory for each session.

5.1.3. Logging Options

ControllerLogJobState=<true|false> (default: false) directs the DVM controller to log each state transition of a DVM-launched job. Each log entry includes the namespace of the job, the state to which it is transitioning, and the date/time stamp when the transition was ordered.

ControllerLogProcState=<true|false> (default: false) directs the DVM controller to log each state transition of a process in a DVM-launched job. Each log entry includes the namespace and rank of the process, the state to which it is transitioning, and the date/time stamp when the transition was ordered.

ControllerLogPath=<path> (default: DVMTempDir) is the path to where the logs are to be written. If a relative path is provided, then the directory will be created under the DVMTempDir location. The path defaults to the specified SessionTmpDir in the absence of any input to this field. The log filename is formatted as prtectrlr-<hostname>-log.

PRTEDLogJobState=<true|false> (default: false) directs each prted in the DVM to log each state transition of a DVM-launched job. Each log entry includes the namespace of the job, the state to which it is transitioning, and the date/time stamp when the transition was ordered.

PRTEDLogProcState=<true|false> (default: false) directs each prted in the DVM to log each state transition of a process in a DVM-launched job. Each log entry includes the namespace and rank of the process, the state to which it is transitioning, and the date/time stamp when the transition was ordered.

PRTEDLogPath=<path> (default: DVMTempDir) is the path to where the logs are to be written. If a relative path is provided, then the directory will be created under the DVMTempDir location. The path defaults to the specified SessionTmpDir in the absence of any input to this field. The log filename is formatted as prted-<hostname>-log.
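
The path rules for the two log-path options can be sketched as below. The function name log_file is hypothetical. Because the descriptions above name both DVMTempDir and SessionTmpDir as the fallback, the sketch follows the stated "(default: DVMTempDir)" and that choice should be read as an assumption.

```python
import posixpath


def log_file(hostname, log_path=None, dvm_temp_dir="/tmp", controller=False):
    """Build the full log file path for a prted or for the controller."""
    if not log_path:
        directory = dvm_temp_dir  # assumed default, see the note above
    elif posixpath.isabs(log_path):
        directory = log_path
    else:
        # Relative paths are created under the DVMTempDir location.
        directory = posixpath.join(dvm_temp_dir, log_path)
    stem = "prtectrlr" if controller else "prted"
    return posixpath.join(directory, f"{stem}-{hostname}-log")


relative = log_file("node07", "logs", dvm_temp_dir="/var/tmp")
```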

5.2. Configurator Tool

The PRRTE configuration tool presents all the supported options in an easy-to-use form. Once you have filled out the desired entries, the “submit” button displays the resulting configuration file in the browser window. A simple copy/paste into your target configuration file yields the final result.