Technical Implementation of PolyHub

The domain name “PolyHub.org” was registered on March 15, 2008. This website serves as the central access, distribution, and storage point for the Engineering Virtual Organization (EVO). The technical infrastructure needs of any EVO revolve around the central problem of enabling effective communication among its members and between the VO and the wider community. This communication takes many forms: person-to-person communication must be supported through obvious channels such as e-mail, teleconferencing, and web-based documentation, while research data distribution and computing task execution are two examples of person-to-computer communication in VO operations. PolyHub employs the grid-computing model, as implemented in the Open Science Grid (OSG), to develop and establish the VO technical infrastructure.

We have documentation for current PolyHub members on Getting Started using PolyHub and Using The PolyHub Grid.

Open Science Grid

The OSG grid-computing model addresses wide-area computing challenges by providing standard interfaces for command execution, data storage, and user authentication, along with a support infrastructure for these three central tasks. OSG interfaces are implemented as a software stack that can be layered on top of existing computing infrastructure while placing few requirements on the underlying systems. The grid provides a standard API for building services that enable collaborative computing within a VO, as well as between the VO and the larger OSG community.

The schematic below is a diagram of the major components of the PolyHub OSG site. VO members are authenticated by means of personal grid certificates issued by a certificate authority such as DOEGrids; all interaction with grid resources requires these certificates. Grid users register for the PolyHub VO through the VO Management Registration Service (VOMRS), which also allows PolyHub administrators to manage VO roles. The VOMRS publishes VO user data to the VO Membership Service (VOMS), where it is made available to grid sites that support the PolyHub VO. The Grid User Management System (GUMS) uses the data published by the PolyHub VOMS (and by the VOMS servers operated by other organizations) to authorize use of the PolyHub grid site. All grid components can be integrated with GUMS for fine-grained, role-based resource control.

Schematic_1.JPG
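
The authentication flow above can be made concrete with a minimal sketch of how a member might obtain a short-lived VOMS proxy from a personal grid certificate before contacting any PolyHub grid service. The VO name ("polyhub") and the role shown here are assumptions for illustration only; the actual names are whatever is registered in the PolyHub VOMS.

    #!/usr/bin/env python
    # Sketch: obtain a VOMS proxy carrying a PolyHub role before using grid resources.
    # Assumes the VOMS client tools (voms-proxy-init/info) are installed and that the
    # user's certificate and key live in the usual ~/.globus location.
    import subprocess

    VO_FQAN = "polyhub:/polyhub/Role=analysis"   # hypothetical VO name and role

    def make_voms_proxy(valid="12:00"):
        """Create a proxy certificate with PolyHub VOMS attributes."""
        subprocess.check_call(
            ["voms-proxy-init", "-voms", VO_FQAN, "-valid", valid])
        # Print the proxy attributes so the user can confirm the role was granted.
        subprocess.check_call(["voms-proxy-info", "-all"])

    if __name__ == "__main__":
        make_voms_proxy()

GUMS at each participating site then maps the proxy's VO membership and role to a local account, so the same credential works at every site that supports the PolyHub VO.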

Compute and Storage Resource Access

The OSG Compute Element (CE) is the gateway to the site's computational resources. Using a personal grid certificate, a PolyHub user authenticates to the CE and submits a grid job to Globus. Globus uses GUMS to authorize the user and map the job to a cluster user account at the grid site. The job is then passed to the site's batch-queue system and executed on the compute cluster. The Storage Element (SE) is the gateway to the site's storage resources. A grid user accesses storage through either the GridFTP protocol or the Storage Resource Manager (SRM) interface. This grid interface to storage is independent of the way the storage is ultimately implemented on the cluster; at the PolyHub site, the dCache storage engine is used to distribute storage across the cluster.

The largest technical challenge for the PolyHub EVO is in the area of data distribution and warehousing. In our model, a central grid site will maintain an authoritative data set and a bookkeeping system. Access to portions of the data set can be restricted based on grid credential, VO membership, VO role, or time limitations, as required. The bookkeeping system will maintain a database of data-set locations across the OSG sites that support the PolyHub VO. Data sets can be accessed directly at the central grid site or distributed to alternate grid sites for analysis. Data produced at alternate grid sites can also be imported to the central grid site and added to the bookkeeping system. In this manner, data can be warehoused for use within the VO to promote sharing of effort and prevent wasteful re-calculation of generated data sets.

Using the SRM interface for grid storage, it is possible to distribute storage across multiple physical machines within a single grid site. This allows a given grid site to scale easily to satisfy I/O, bulk-capacity, or redundancy requirements.
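
To make the storage interface concrete, the following sketch shows one way a user might stage a data set out of the PolyHub Storage Element over GridFTP using the standard globus-url-copy client; an SRM client such as srmcp could be used analogously against the site's SRM door. The host name and file paths are placeholders, not actual PolyHub endpoints.

    #!/usr/bin/env python
    # Sketch: stage a data set from the PolyHub Storage Element to local disk over GridFTP.
    # Requires a valid grid/VOMS proxy (see the authentication sketch above) and the
    # Globus client tools on the submitting machine. Host and paths are placeholders.
    import subprocess

    SE_HOST = "se.polyhub.org"                    # hypothetical SE endpoint
    REMOTE = "gsiftp://%s/pnfs/polyhub/data/run042.h5" % SE_HOST   # hypothetical data set
    LOCAL = "file:///scratch/polyhub/run042.h5"

    def stage_in(remote_url=REMOTE, local_url=LOCAL):
        """Copy one file from the SE; globus-url-copy handles GSI authentication."""
        subprocess.check_call(["globus-url-copy", remote_url, local_url])

    if __name__ == "__main__":
        stage_in()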

Computational resources will be distributed among PolyHub OSG sites based mainly on data-access requirements. For analysis of very large data sets, computations can be distributed to the grid sites that host the data (as recorded by the bookkeeping system). Computational jobs with small data-access requirements can be distributed to all OSG grid sites that support the PolyHub VO. In this manner, it is possible to tune the workflow to minimize turnaround time by adapting to the compute and bandwidth resources currently available.

Additional collaborative infrastructure required by PolyHub among VO affiliates at allied institutions also makes use of the grid certificate system and the accompanying VO membership system. The PolyHub mail gateway to the web-based discussion system will make use of this feature, and certificates will also be used to authenticate to PolyHub VO web sites for access to the central bookkeeping database and the collaborative documentation system.
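
The data-driven job placement described above can be sketched roughly as follows: look up which sites hold a replica of the required data set and submit the job to one of them through the Globus gatekeeper. The gatekeeper contact strings and data-set name below are hypothetical, and the hard-coded table merely stands in for a query to the central bookkeeping database.

    #!/usr/bin/env python
    # Sketch: route a grid job to a site that already hosts the required data set.
    # The table below stands in for the central bookkeeping database; the contact
    # strings (host/jobmanager) are hypothetical examples, not real PolyHub endpoints.
    import subprocess

    # data-set name -> gatekeeper contacts of sites holding a replica (hypothetical)
    BOOKKEEPING = {
        "melt-simulation-2008": ["ce.polyhub.org/jobmanager-pbs",
                                 "osg-ce.allied.edu/jobmanager-condor"],
    }

    def submit_near_data(dataset, executable, *args):
        """Submit to the first site that hosts the data set, using globus-job-run."""
        sites = BOOKKEEPING.get(dataset)
        if not sites:
            raise RuntimeError("no site hosts data set %r" % dataset)
        contact = sites[0]
        subprocess.check_call(["globus-job-run", contact, executable] + list(args))

    if __name__ == "__main__":
        # Trivial test job: report which cluster node the job landed on.
        submit_near_data("melt-simulation-2008", "/bin/hostname")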

PolyHub leverages the existing software infrastructure model provided by the Open Science Grid to satisfy the VO technical requirements in a cost-effective manner that does not rely on sophisticated hardware solutions or extensive software development. For the central PolyHub grid site, dedicated machines are required for the compute and storage interface systems, as well as one machine for the data-set web front end. Bulk storage requirements are on the order of 50 TB, distributed among at least ten machines to ensure storage redundancy and acceptable access rates. Compute resources will support a capacity of about 200 simultaneous grid jobs, with an InfiniBand interconnect to allow efficient parallel (MPI) jobs.
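
For the parallel workloads mentioned above, a grid job can request multiple processors and the MPI job type through standard GRAM RSL attributes, as in the sketch below. The gatekeeper contact and executable path are placeholders, and whether jobType=mpi is honored depends on the jobmanager configuration at the target site.

    #!/usr/bin/env python
    # Sketch: submit a parallel (MPI) grid job with globusrun and a GRAM RSL string.
    # The gatekeeper contact and executable path are placeholders; the site jobmanager
    # must be configured to honor jobType=mpi for this to launch an MPI run.
    import subprocess

    CONTACT = "ce.polyhub.org/jobmanager-pbs"          # hypothetical gatekeeper
    RSL = ('&(executable="/home/polyhub/bin/md_sim")'  # hypothetical MPI executable
           '(count=64)'                                # number of processors requested
           '(jobType=mpi)')

    def submit_mpi_job():
        """Run the RSL through globusrun; -o streams job output back via GASS."""
        subprocess.check_call(["globusrun", "-o", "-r", CONTACT, RSL])

    if __name__ == "__main__":
        submit_mpi_job()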

Central PolyHub Grid Site

The schematic below illustrates the network structure of the central PolyHub grid site. All compute and storage machines communicate via 1 Gbit/sec Ethernet connections. External network connectivity consists of multiple redundant connections to the Internet via commodity carriers, a 3 Gbit/sec connection directly to the Internet2 network, and the option to peer with other networks (e.g., the Energy Sciences Network) at up to 10 Gbit/sec. Physical hosting of the machines in an enterprise-level data center, along with electrical power and cooling, is provided by The University of Tennessee Office of Information Technology High Performance Computing Support Program, which also provides system administration personnel for all HPC clusters, OSG nodes, and storage systems. By leveraging the OSG, PolyHub maintains the VO technical infrastructure in an efficient manner and gains the benefit of a widely adopted and supported computing infrastructure that allows collaboration with the wider OSG community.

Schematic_2.JPG

In summary, the PolyHub infrastructure described above:

  1. Stores and catalogs extensive simulation data for future use by all institutions within the EVO, using hyper-array structuring and strong management protocols;
  2. Makes accessible state-of-the-art visualization software, and provides a platform for its remote use by allied institutions;
  3. Automatically formats and compiles codes and input/output source data from one simulation to another across differing computational platforms at the various institutions;
  4. Allows resource sharing among allied institutions, whereby one research group with limited computational power can take advantage of idle cycles on another group’s computer systems;
  5. Allows portability of simulation codes from one institution to another by centralizing and regulating compiling operations, allowing codes developed at individual institutions to run on the computational resources available at other institutions;
  6. Facilitates cooperative data analysis between simulationists and experimentalists, thereby enhancing multiscale modeling efforts.