Monitoring and accounting system for GridNNN project resources
Modern distributed systems, and grid infrastructures in particular, are complex technical systems, so keeping them operational requires automated tools that monitor their state and usage. The aim of the GridNNN project (2009-2011) was to create and support a grid infrastructure for the needs of the National Nanotechnology Network in Russia.
The middleware for the GridNNN grid infrastructure was partially based on well-known packages such as Globus Toolkit 4 and VOMS, but many infrastructure services were developed from scratch. This in turn required specialized tools for system monitoring and for accounting of resource usage. To manage the grid infrastructure effectively, detect errors and plan its development, information is needed not only about the current state of the infrastructure but also about how that state has changed over time.
Monitoring is a fairly general concept. In the GridNNN project its main objectives were:
- continuous monitoring of grid services, both core services (common to the whole infrastructure) and those belonging to individual resource centers;
- obtaining information about the computing resources (number of job slots, architecture of the computer system, installed software, available specialized software packages);
- obtaining data on the rules governing access of virtual organizations to the resources;
- information about the use of computing resources (primarily on the consumption of CPU time);
- monitoring the execution of computational tasks and jobs (start, status changes, exit codes and so on);
- information on computing resource quotas for virtual organizations.
Parallel computational jobs using MPI were typical for the GridNNN project. Such jobs require large amounts of computation, while the volume of data to be stored and transferred is comparatively small. For this reason, computational tasks and jobs (a task consists of jobs, which may be linked to each other) were of special interest for the monitoring system. Task monitoring is directly related to resource accounting (so-called billing, or resource usage statistics), because it is important to know who used the project resources, when and where.
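The "who, when and where" accounting idea can be illustrated with a minimal Python sketch. It assumes a hypothetical `JobRecord` structure and a hypothetical helper `cpu_time_by`; neither reflects the actual GridNNN data schema, and the field names are chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical accounting record for one finished job (not the GridNNN schema)."""
    user: str              # who ran the job
    vo: str                # virtual organization of the user
    resource_center: str   # where the job ran
    started: datetime      # when the job started
    finished: datetime     # when the job ended
    cpu_seconds: float     # consumed CPU time
    exit_code: int


def cpu_time_by(records, key):
    """Sum consumed CPU time grouped by one attribute of JobRecord.

    `key` is an attribute name such as "user", "vo" or "resource_center".
    """
    totals = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += rec.cpu_seconds
    return dict(totals)
```

Calling `cpu_time_by(records, "vo")` or `cpu_time_by(records, "resource_center")` on the same list of records would give the per-organization and per-site totals that such usage reports are typically built from.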
To choose appropriate computing resources for a job, a user must know the main characteristics of the available resources. The most important are the list of supported virtual organizations, the hardware architecture of the computer system, the total number of job slots and the number currently free, the operating system version, and the list of installed problem-oriented software packages.
When selecting resources to run jobs, it is also important to take into account the state of the services at the relevant resource centers, as well as the state of the core grid infrastructure services.
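The selection criteria listed above can be sketched as a simple filter in Python. This is only an illustration under stated assumptions: the `Resource` class, its field names and the `select_resources` function are hypothetical and do not describe how GridNNN actually published or matched resource information.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical description of one computing resource (illustrative fields)."""
    name: str
    supported_vos: set        # virtual organizations allowed to use the resource
    architecture: str         # hardware architecture of the computer system
    total_slots: int          # total number of job slots
    free_slots: int           # job slots free at this moment
    os_version: str           # operating system version
    software: set = field(default_factory=set)  # installed software packages
    services_ok: bool = True  # state of the resource center's grid services


def select_resources(resources, vo, architecture, slots_needed,
                     required_software=frozenset(), core_services_ok=True):
    """Return resources able to run the job, those with most free slots first."""
    # If core infrastructure services are down, no resource is usable.
    if not core_services_ok:
        return []
    required = set(required_software)
    suitable = [
        r for r in resources
        if r.services_ok
        and vo in r.supported_vos
        and r.architecture == architecture
        and r.free_slots >= slots_needed
        and required <= r.software  # every required package is installed
    ]
    return sorted(suitable, key=lambda r: r.free_slots, reverse=True)
```

Ordering by free slots is an arbitrary choice made for this sketch; a real broker could rank candidates by queue length, quota or other policy.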
The LIT JINR team developed a monitoring and accounting system for the GridNNN grid infrastructure that meets the objectives outlined above: it monitors the operation and parameters of grid resources and services, job queues, and user tasks and jobs, and it accounts for the usage of computing resources.
Monitoring and accounting data on the usage of GridNNN computational resources were available through a web interface at http://mon.ngrid.ru. The information was presented as tables and diagrams in reports on resource centers, virtual organizations and individual users. In addition, tasks and jobs could be displayed in real time on a three-dimensional image of the globe using Google Earth.
Examples of the information displayed by the monitoring and accounting systems are given in the figures below: