The Dynamically-Updated Request Online Coallocator
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Introduction to DUROC | ||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC provides a simple distributed-job coallocator with dynamic job-reconfiguration capabilities. It is meant to provide the common infrastructure needed by most distributed Globus applications, to expedite further exploration of resource brokering.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
This document is divided into the following sections:
General information
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Coallocator requirements and motivation
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
The Globus environment includes resource managers to provide access to a range of system-dependent schedulers. Each resource manager (RM) provides an interface to submit jobs on a particular set of physical resources. In order to execute jobs which need to be distributed over resources accessed through independent RMs, a coallocator is used to coordinate transactions with each of the RMs and bring up the distributed pieces of the job. The coallocator must provide a convenient interface to obtain resources and execute jobs across multiple management pools.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Reflective management architecture | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The task an intelligent coallocation agent performs has two abstractly distinct parts. First, the agent must process resource specifications to determine how a job might be distributed across the resources of which it is aware--the agent lowers an abstract specification such that portions of the specification are allocated to the individual RMs that control access to those required resources. Second, the agent must process the lowered resource specification as part of a job request to actually attempt resource allocation--the agent issues job requests to each of the pertinent RMs to schedule the job. The process of lowering a resource specification in a job request in essence refines the request based on information available to the lowering agent. By separating the tasks of refinement and allocation in the architecture, we can allow user intervention to adjust the refinement based on information or constraints beyond the heuristics used internally by a particular automated agent. A GUI specification-editor has been suggested as a meaningful mode of user (job requester) intervention.
spec1 : resource specification
DUROC implements the allocation operation across multiple RMs in the Globus test-bed and leaves lowering decisions to higher-level tools.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Atomic requests | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Once a resource specification has been refined, the agent must attempt to allocate resources. In general the resources might be managed by different RMs, and the coallocator must either atomically schedule the user's single abstract job or fail to schedule it. Because the GRAM interface does not provide support for inter-manager atomicity, the user code must be augmented to implement a job-start barrier: as distributed components of the job become active, they must rendezvous with the allocating agent to be sure all components were successfully started prior to performing any non-restartable user operations.
Atomicity of job creation can only be guaranteed after the barrier, so the user should not perform operations which cannot be reversed, e.g. certain persistent effects or input/output operations, until after the barrier.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Coallocated resource specification language | ||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC shares its Resource Specification Language (RSL) with GRAM. DUROC can perform allocations described by a 'lowered' resource specification. The task of the lowering agent is to take a resource request of some form, be it a generalized GRAM request or user input to a GUI, and produce a lowered request so that DUROC can directly acquire the resources for the user.
The allocation semantics for DUROC requests are that each component of the top-level multi-request represents one GRAM request that DUROC should make as part of the distributed job DUROC is allocating. In order to make the request, DUROC must be able to determine what RM to contact. Typically there will be additional terms in the conjunctions of the lowered request, and those terms will be passed on verbatim in each GRAM request. DUROC will extract each component of the lowered multi-request, remove the DUROC-specific components of the subrequest, and then forward that subrequest to the specified GRAM. Therefore any other attributes supported by GRAM are implicitly supported by DUROC. For example:
    +(&(resourceManagerContact=RM1)(count=3)(executable=myprog.sparc))
     (&(resourceManagerContact=RM2)(count=2)(executable=myprog.rs6000))
In this request the executables and node counts are specified for each resource pool. While GRAM may in fact require fields such as these, DUROC treats them as it would any other fields not needed to do its job: it forwards them in the subrequests, and it is up to the RMs to either successfully handle the request or return a failure code to DUROC (which will then return an appropriate code to the user).
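A lowering tool ultimately has to produce a request string of the shape shown above. The following is a minimal sketch of how such a string might be assembled in C; subrequest_t and build_multirequest are hypothetical helpers invented for illustration and are not part of DUROC or GRAM. Only the three attributes from the example are emitted, and no escaping of attribute values is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One component of a lowered multi-request (hypothetical helper type). */
typedef struct {
    const char *rm_contact;  /* value for resourceManagerContact      */
    int         count;       /* node count, forwarded to GRAM as-is   */
    const char *executable;  /* executable, forwarded to GRAM as-is   */
} subrequest_t;

/*
 * Write "+(&(...)) (&(...)) ..." into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int build_multirequest(char *buf, size_t buflen,
                       const subrequest_t *subs, int nsubs)
{
    size_t used;
    int    i, n;

    assert(buf != NULL && subs != NULL);

    n = snprintf(buf, buflen, "+");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (i = 0; i < nsubs; i++) {
        /* Components after the first are separated by one space. */
        n = snprintf(buf + used, buflen - used,
                     "%s(&(resourceManagerContact=%s)(count=%d)(executable=%s))",
                     i == 0 ? "" : " ",
                     subs[i].rm_contact, subs[i].count, subs[i].executable);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }
    return (int) used;
}
```

Called with the two components from the example (RM1 with 3 nodes, RM2 with 2 nodes), the helper produces the same request on a single line. Any further GRAM attributes would be appended to each conjunction in the same way, since DUROC forwards them untouched.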
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC request processing (coallocation) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Requests submitted to the DUROC API are decomposed into the individual GRAM requests and each request is submitted through the GRAM API. A DUROC request proceeds with each GRAM request in the job that succeeds.
Runtime features available to the job processes include a start barrier and inter-process communications to help coordinate the job processes. The start barrier allows the processes to synchronize before performing any non-restartable operations. In the absence of a start barrier, there is no way to guarantee that all job components are successfully created prior to executing user code.
The communications library provides two simple mechanisms to send start-up and bootstrapping information between processes: an inter-subjob mechanism to communicate between ``node 0'' of each subjob, and an intra-subjob mechanism to communicate between all the nodes of a single subjob. A library of common bootstrapping operations is provided, using the public inter-subjob and intra-subjob communication interfaces.
It is important to note that the bootstrapping interfaces are designed to be reliable and portable. They do not necessarily provide high-performance or asynchronous, concurrent messaging. The user should bootstrap their own communications environment and completely switch over to it; failure to do so may result in confusing deadlock situations where the bootstrapping interfaces prevent forward progress in the user communications or vice versa.
For each GRAM subjob in the DUROC job, there are two optional RSL fields which affect the subjob behavior.
The `subjobStartType' field allows the user to configure each subjob to either participate in the start barrier with strict subjob-state monitoring (value `strict-barrier'), participate in the start barrier without strict subjob-state monitoring (value `loose-barrier'), or not participate in the barrier at all (value `no-barrier'). Subjobs that don't perform the barrier run forward independently of the other subjobs. Strict state monitoring means that the job will be automatically killed if the subjob terminates prior to completing the barrier.
The `subjobCommsType' field allows the user to configure each subjob to either join the inter-subjob communications group as a blocking operation (value `blocking-join') or not join the inter-subjob communications group at all (value `independent'). When joining the group as a blocking operation, all participating subjobs will join together, i.e. the communications startup function will act as a group barrier.
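The two RSL fields and their values can be summarized as a pair of enumerations. The sketch below is illustrative only: the enum and function names are hypothetical, the string comparison assumes values are written exactly as quoted above, and no default is assumed for an omitted field because the text does not state one.

```c
#include <assert.h>
#include <string.h>

/* Values of the subjobStartType field (hypothetical enum names). */
typedef enum {
    START_STRICT_BARRIER,  /* "strict-barrier" */
    START_LOOSE_BARRIER,   /* "loose-barrier"  */
    START_NO_BARRIER,      /* "no-barrier"     */
    START_INVALID          /* missing or unrecognized value */
} start_type_t;

/* Values of the subjobCommsType field (hypothetical enum names). */
typedef enum {
    COMMS_BLOCKING_JOIN,   /* "blocking-join" */
    COMMS_INDEPENDENT,     /* "independent"   */
    COMMS_INVALID          /* missing or unrecognized value */
} comms_type_t;

start_type_t parse_start_type(const char *value)
{
    if (value == NULL)                        return START_INVALID;
    if (strcmp(value, "strict-barrier") == 0) return START_STRICT_BARRIER;
    if (strcmp(value, "loose-barrier") == 0)  return START_LOOSE_BARRIER;
    if (strcmp(value, "no-barrier") == 0)     return START_NO_BARRIER;
    return START_INVALID;
}

comms_type_t parse_comms_type(const char *value)
{
    if (value == NULL)                       return COMMS_INVALID;
    if (strcmp(value, "blocking-join") == 0) return COMMS_BLOCKING_JOIN;
    if (strcmp(value, "independent") == 0)   return COMMS_INDEPENDENT;
    return COMMS_INVALID;
}

/* Does a subjob of this type take part in the start barrier?
 * Callers are expected to reject START_INVALID beforehand. */
int joins_start_barrier(start_type_t t)
{
    assert(t != START_INVALID);
    return t == START_STRICT_BARRIER || t == START_LOOSE_BARRIER;
}

/* Is the job killed if this subjob exits before completing the barrier?
 * Per the description, only strict monitoring has that effect. */
int killed_on_early_exit(start_type_t t)
{
    assert(t != START_INVALID);
    return t == START_STRICT_BARRIER;
}
```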
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Generic resource coallocation API | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The resource coallocation API provides functions for submitting a job request to a broker, for editing a submitted request, for cancelling a request, and for requesting job state information. The DUROC API is similar to the GRAM (resource management) API, with the addition of the subjob-add, subjob-delete, and barrier-release operations for managing resources, the runtime-barrier operation which must be performed during the startup of each node, and the job-structure and inter-subjob communication interface operations, which at runtime provide a mechanism for job self-organization. The following sections document the DUROC v0.8 API, including the runtime operations necessary to use DUROC v0.8.
||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC control-library API
globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE) Activate the DUROC control-library API implementation prior to using any of the API functions. int globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_module_deactivate (GLOBUS_DUROC_CONTROL_MODULE) Deactivate the DUROC control-library API implementation when finished using any of the API functions. int globus_module_deactivate (GLOBUS_DUROC_CONTROL_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_init ( ) Initialize a globus_duroc_control_t object for subsequent coallocated-job submission and control. int globus_duroc_control_init (globus_duroc_control_t * controlp)
A single globus_duroc_control_t object can be used to concurrently submit and control multiple DUROC jobs.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_job_request ( ) Request coallocation of interactive resources at the current time. int globus_duroc_control_job_request (globus_duroc_control_t * controlp, const char * description, int job_state_mask, const char * callback_contact, char ** job_contactp, int * subreq_countp, int ** subreq_resultsp) A job submitted through this interface can subsequently be controlled with the other DUROC API functions by providing the submitted job's contact string to those calls. |
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_subjob_add ( ) Augment a coallocation with an additional interactive resource at the current time. int globus_duroc_control_subjob_add (globus_duroc_control_t * controlp, const char * job_contact, const char * subjob_description) A job modified through this interface can subsequently be controlled with the other DUROC API functions by providing the job's contact string to those calls. |
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_subjob_delete ( ) Modify a coallocation by removing an interactive resource at the current time. int globus_duroc_control_subjob_delete (globus_duroc_control_t * controlp, const char * job_contact, const char * subjob_label) A job modified through this interface can subsequently be controlled with the other DUROC API functions by providing the job's contact string to those calls. |
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_barrier_release ( ) Allow a requested coallocation to run forward when all subjobs have entered the barrier. int globus_duroc_control_barrier_release (globus_duroc_control_t * controlp, const char * job_contact, globus_bool_t wait_for_subjobs)
This routine allows subjobs to run forward past the runtime barrier, and currently delimits a point after which subjobs cannot be added or deleted.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_job_cancel ( ) Remove a Pending job request or kill processes associated with an Active request, releasing any associated resources, if such action is supported by the associated resource managers. int globus_duroc_control_job_cancel (globus_duroc_control_t * controlp, const char * job_contact)
This routine ``succeeds'' if the job is known. A successful return code does not guarantee that all job resources were successfully released.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_control_subjob_states ( ) Obtain a snapshot of the status of each subjob in a submitted DUROC job. int globus_duroc_control_subjob_states (globus_duroc_control_t * controlp, const char * job_contact, int * subjob_countp, int ** subjob_statesp, char *** subjob_labelsp)
This routine can effectively be used in a polling loop to monitor the status of a job, for example in the display loop of a GUI agent. |
||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC runtime-library API
globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE) Activate the DUROC runtime-library API implementation prior to using any of the API functions. int globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE) Deactivate the DUROC runtime-library API implementation when finished using any of the API functions. int globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_barrier ( ) Rendezvous with the coallocator to implement job-start atomicity and coordinate the distributed processes. void globus_duroc_runtime_barrier () This routine is called by the job processes at startup to implement job-start atomicity. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_inter_subjob_structure ( ) Get the layout of the DUROC job. The DUROC inter-subjob communication routines can only be called on the subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank as zero (0)!
int globus_duroc_runtime_inter_subjob_structure (int * local_addressp, int * remote_countp, int ** remote_addressesp) This routine is called by the job processes after the inter-subjob initialization operation to find the layout of the job. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_inter_subjob_send ( ) Send a byte-vector to another subjob in the DUROC job. The DUROC inter-subjob communication routines can only be called on the subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank as zero (0)!
int globus_duroc_runtime_inter_subjob_send (int dst_addr, const char * tag, int msg_size, globus_byte_t * msg) This routine is called by the job processes after the inter-subjob initialization operation to transmit messages between subjobs. The data is received by a corresponding call to globus_duroc_runtime_inter_subjob_receive at the destination subjob. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_inter_subjob_receive ( ) Receive a byte-vector sent by another subjob in the DUROC job. int globus_duroc_runtime_inter_subjob_receive (const char * tag, int * msg_sizep, globus_byte_t ** msgp) The DUROC inter-subjob communication routines can only be called on the subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank as zero (0)!
This routine is called by the job processes after the inter-subjob initialization operation to receive messages from other subjobs. The data is transmitted by a corresponding call to globus_duroc_runtime_inter_subjob_send at the originating subjob with a matching message tag, and messages are queued and reordered if the subjob receives messages with a different tag than the one requested by the receiving subjob process. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
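The phrase "queued and reordered" means a receive for one tag may have to skip over earlier messages that carry other tags. The sketch below models that matching rule with a simple linked list. It is an illustration under stated assumptions, not the DUROC implementation: tag_queue_t and its functions are hypothetical names, and message arrival is simulated by an explicit put call rather than by a network.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A message that arrived but has not yet been asked for (hypothetical). */
typedef struct pending_msg {
    char               *tag;
    unsigned char      *data;
    int                 size;
    struct pending_msg *next;
} pending_msg_t;

typedef struct {
    pending_msg_t *head;  /* oldest pending message */
    pending_msg_t *tail;  /* newest pending message */
} tag_queue_t;

void tag_queue_init(tag_queue_t *q)
{
    q->head = q->tail = NULL;
}

/* Record an arriving message (copied). Returns 0, or -1 if out of memory. */
int tag_queue_put(tag_queue_t *q, const char *tag,
                  const unsigned char *data, int size)
{
    pending_msg_t *m;

    assert(q != NULL && tag != NULL && size >= 0);
    m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->tag  = malloc(strlen(tag) + 1);
    m->data = malloc(size > 0 ? (size_t) size : 1);
    if (m->tag == NULL || m->data == NULL) {
        free(m->tag);
        free(m->data);
        free(m);
        return -1;
    }
    strcpy(m->tag, tag);
    if (size > 0)
        memcpy(m->data, data, (size_t) size);
    m->size = size;
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    return 0;
}

/*
 * Remove the oldest pending message whose tag matches, leaving messages
 * with other tags queued. On success returns 0 and hands ownership of
 * the payload to the caller (free *datap). Returns -1 if nothing matches.
 */
int tag_queue_take(tag_queue_t *q, const char *tag,
                   unsigned char **datap, int *sizep)
{
    pending_msg_t *prev = NULL, *m;

    for (m = q->head; m != NULL; prev = m, m = m->next) {
        if (strcmp(m->tag, tag) == 0) {
            if (prev != NULL)
                prev->next = m->next;
            else
                q->head = m->next;
            if (q->tail == m)
                q->tail = prev;
            *datap = m->data;
            *sizep = m->size;
            free(m->tag);
            free(m);
            return 0;
        }
    }
    return -1;
}
```

A real blocking receive would wait for more traffic when tag_queue_take() finds no match; the model simply reports the miss so that the matching rule stays visible.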
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_intra_subjob_rank ( ) Obtain the rank of the local subjob process.
int globus_duroc_runtime_intra_subjob_rank (int * rankp) This routine is called by the job processes after the intra-subjob initialization operation to obtain the rank of the local subjob process. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job. |
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_intra_subjob_size ( ) Obtain the number of processes in the local subjob.
int globus_duroc_runtime_intra_subjob_size (int * sizep) This routine is called by the job processes after the intra-subjob initialization operation to obtain the number of local subjob processes. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_intra_subjob_send ( ) Send a byte-vector to another process in the DUROC subjob.
void globus_duroc_runtime_intra_subjob_send (int dst_rank, const char * tag, int msg_size, globus_byte_t * msg) This routine is called by the job processes after the intra-subjob initialization operation to transmit messages between subjob processes. The data is received by a corresponding call to globus_duroc_runtime_intra_subjob_receive at the destination process. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_runtime_intra_subjob_receive ( ) Receive a byte-vector sent by another process in the DUROC subjob. void globus_duroc_runtime_intra_subjob_receive (const char * tag, int * msg_sizep, globus_byte_t * msg)
This routine is called by the job processes after the intra-subjob initialization operation to receive messages from other subjob processes. The data is transmitted by a corresponding call to globus_duroc_runtime_intra_subjob_send at the originating process with a matching message tag, and messages are queued and reordered if the process receives messages with a different tag than the one requested by the receiving call. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC bootstrap-library |
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_module_activate (GLOBUS_DUROC_BOOTSTRAP_MODULE) Activate the DUROC bootstrap-library implementation prior to using any of the API functions. int globus_module_activate (GLOBUS_DUROC_BOOTSTRAP_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_module_deactivate (GLOBUS_DUROC_BOOTSTRAP_MODULE) Deactivate the DUROC bootstrap-library implementation when finished using any of the API functions. int globus_module_deactivate (GLOBUS_DUROC_BOOTSTRAP_MODULE)
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_bootstrap_subjob_exchange ( ) Perform an exchange of information between subjobs. void globus_duroc_bootstrap_subjob_exchange (const char * local_info, int * subjob_countp, int * local_indexp, char *** subjob_info_arrayp)
This routine is called by the job processes after the bootstrap activation operation to exchange string information between subjobs. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_bootstrap_master_sp_vector ( ) Construct a vector of Nexus startpoints on the master node. void globus_duroc_bootstrap_master_sp_vector (nexus_startpoint_t * local_sp, int * job_sizep, nexus_startpoint_t ** sp_vectorp)
This routine is called by the job processes after the bootstrap activation operation to construct a startpoint vector on the master node. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
globus_duroc_bootstrap_ordered_master_sp_vector ( ) Construct a vector of Nexus startpoints on the master node. void globus_duroc_bootstrap_ordered_master_sp_vector (nexus_startpoint_t * local_sp, int subjob_index, int * job_sizep, nexus_startpoint_t ** sp_vectorp)
This routine is called by the job processes after the bootstrap activation operation to construct a startpoint vector on the master node. It differs from the simpler globus_duroc_bootstrap_master_sp_vector() routine in that it allows some extra control over the selection of a master node for expert users with special considerations. It is not really part of the coallocation API in that it is called by the job, rather than by the process requesting a job.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
DUROC source manifest | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The ResourceManagement/duroc directory in your Globus source tree should contain the following directories and files:
For each build-directory listed above, the following files exist:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Building DUROC | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Globus uses the GNU autoconf system to configure and build on any supported platform. To build DUROC you can run the following commands in your Globus build directory (see the Globus docs for more general information):
    % ./configure --enable-duroc (plus any other desired options)
The optional DUROC configuration flags are:
    --enable-duroc-debug
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Installing DUROC | ||||||||||||||||||||||||||||||||||||||||||||||||||||
After building DUROC as described above, you can install it on your system by running the following command in the directory where Globus was built (see the Globus docs for more general information):
    % make install
The following libraries and header files are installed (globus_duroc_common.h is referenced from the other header files):
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Using the DUROC libraries | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The DUROC control and runtime libraries can be linked and used individually or in any combination within the same program. The control library provides the DUROC request API and the runtime library is used by every process initiated via DUROC. The programs duroc/src/tools/duroc-request.c and duroc/src/tools/duroc-stub-app.c serve as examples of how to use the DUROC libraries.
Assuming you have done that, and your application includes "globus_duroc_runtime.h", your makefile will have the following two flavors of rules (one for compiling and one for linking):
myapp.$(OFILE): myapp.c
	$(CC) $(CFLAGS) $(GLOBUS_DUROC_RUNTIME_CFLAGS) \
	    -I$(includedir) -c myapp.c
myapp: myapp.$(OFILE)
	$(CC) $(CFLAGS) myapp.$(OFILE) -o myapp \
	    -L$(libdir) $(LDFLAGS) $(GLOBUS_DUROC_RUNTIME_LDFLAGS) \
	    $(GLOBUS_DUROC_RUNTIME_LIBS) $(LIBS)
If you are constructing a makefile to build your app as a Globus component, simply replace "$(libdir)" and "$(includedir)" with "$(BUILD_DIR_LIB)" and "$(BUILD_DIR_INC)", respectively. In this case you should also use the standard Globus method of inserting makefile_header into your Makefile during the configuration process.
The complete set of DUROC-related variables defined in the "makefile_header" is as follows:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Using the DUROC tools | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Below is a summary of the tools provided with DUROC. Each tool is a minimalist wrapper around DUROC library functions.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Known bugs and limitations | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The error codes documented in the API section of this file are a subset of the actual codes returned. The globus_duroc_runtime_inter_subjob_* and globus_duroc_runtime_intra_subjob_* interfaces are not yet reentrant. The user must refrain from calling any of the routines concurrently.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Last modified 10/23/98. Comments? webmaster@globus.org
|