The Monitoring System for the File Transfer Service (FTS)
One of the major functions of the grid is the distribution, storage and processing of huge data files, for which specialized services are responsible. In their work, data transfer services interact with a whole set of components. Grid infrastructures can consist of hundreds of sites (regional computing centers) that use various file storage systems. Different services are used to interact with these storage systems, and different protocols are used to transfer data. Data transfer services coordinate all of these elements together with the data management services, and they also process information about errors at any stage of a transfer. Every day these services carry out thousands of transfer jobs and send tens of thousands of files, and the total volume of transferred data reaches 10 terabytes a day. The success of the projects depends directly on the reliability and correct operation of the transfer services. Detailed information on file transfer services can be found in A. Uzhinskiy and V. Korenkov's article “Architecture of service of data transmission in grid”.
In the gLite middleware the file transfer service is called FTS (File Transfer Service); its analogue in the Globus middleware is RFT (Reliable File Transfer Service). By the end of 2006, FTS and RFT were still taking shape, and the set of tools for monitoring them was rather limited. It consisted mainly of scripts that displayed information from the services in a command-line interface, or of general-purpose monitoring systems that provided only general information on data transfer. None of these systems gave complete information on the state of the services, the history of their operation and, most importantly, the errors arising in distributed data transfer systems.
Within the framework of the JINR-CERN cooperation, a full-featured monitoring system for the FTS service has been developed. During the design phase, four main groups of users were identified: virtual organization (VO) managers, senior management, FTS service managers and grid-site managers. Each group is interested in different data, collected over different time intervals and presented in different forms. VO managers are interested in the general parameters of the data transfer service and in specific information on grid sites for a given period. Grid-site managers need information on the configuration and current state of data channels and on errors attributable to their site and hosts; for them, the most recent data and the data for the last few days are in highest demand. They are interested both in error categories, to identify problems, and in specific error descriptions, to eliminate them. Senior management needs summary reports over long time intervals. Finally, FTS service managers require up-to-date information on errors, load, and the operation of sites and VOs, as well as on the degree of correlation between different errors. All categories of users need reports that let them find and identify the causes of data transfer errors as quickly and with as little effort as possible. The design of the monitoring system therefore took into account the need for a wide range of reports for different categories of users, which had to be standardized and implemented in the system.
After a detailed analysis of the file transfer service database, a data model for the monitoring system was designed that provides a convenient basis for building various reports. The core of the model consists of information tables and system tables, as well as tables containing processed and raw data. Data in these tables come either directly from the file transfer service database or are set by the user. There are also tables for separate modules of the system: the «warning system» and the «expert system». The general data model of the monitoring system is presented in Fig. 1.
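The article does not publish the schema, so the following is only a minimal sketch of the raw/processed split under stated assumptions: the table names `raw_transfers` and `processed_channel_daily`, their columns, and the state and error labels are hypothetical. It shows how raw per-transfer records copied from the FTS database might be condensed into a processed per-channel daily summary that reports can query instead of scanning raw data. Python's built-in `sqlite3` module is used purely for portability; the database engine of the real system is not assumed.

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; the real FTS monitoring
# tables, names and columns are not given in the article.
SCHEMA = """
CREATE TABLE raw_transfers (
    job_id      TEXT,
    channel     TEXT,     -- source-destination channel, e.g. 'CERN-JINR'
    vo          TEXT,     -- virtual organization
    day         TEXT,     -- ISO date of the transfer attempt
    state       TEXT,     -- 'Done' or 'Failed'
    bytes       INTEGER,
    error_class TEXT      -- NULL for successful transfers
);
CREATE TABLE processed_channel_daily (
    channel  TEXT,
    day      TEXT,
    n_done   INTEGER,
    n_failed INTEGER,
    bytes    INTEGER,
    PRIMARY KEY (channel, day)
);
"""


def refresh_processed(conn: sqlite3.Connection) -> None:
    """Rebuild the per-channel daily summary from the raw transfer table."""
    conn.execute("DELETE FROM processed_channel_daily")
    conn.execute(
        """
        INSERT INTO processed_channel_daily (channel, day, n_done, n_failed, bytes)
        SELECT channel,
               day,
               SUM(state = 'Done'),    -- SQLite evaluates comparisons to 0/1
               SUM(state = 'Failed'),
               SUM(CASE WHEN state = 'Done' THEN bytes ELSE 0 END)
        FROM raw_transfers
        GROUP BY channel, day
        """
    )
    conn.commit()


# Usage with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO raw_transfers VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("j1", "CERN-JINR", "atlas", "2007-05-01", "Done", 1000, None),
        ("j2", "CERN-JINR", "atlas", "2007-05-01", "Failed", 0, "SRM_TIMEOUT"),
        ("j3", "CERN-JINR", "cms", "2007-05-01", "Done", 500, None),
    ],
)
refresh_processed(conn)
print(conn.execute("SELECT * FROM processed_channel_daily").fetchall())
# [('CERN-JINR', '2007-05-01', 2, 1, 1500)]
```

Keeping the summary in its own table means long-interval reports (such as those for senior management) only read a few aggregated rows per channel and day.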
The system interface consists of several modules. Users can start working with the system either directly from the module they are interested in, or from the main page, which presents general reports for assessing the state of the service and identifying possible sources of problems. The system provides a wide range of reports, ratings and statistical calculations, including the correlation coefficient for a pair of errors. Almost all reports of the monitoring system contain cross-references, which makes it very convenient to drill down into the results.
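The article does not say which correlation measure is used or over what time bins. As an assumption, the sketch below computes Pearson's correlation coefficient between the per-interval counts of two error classes; the error class names `SRM_TIMEOUT` and `GRIDFTP_ERROR` and the hourly binning are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Iterable, List, Sequence, Tuple


def error_series(records: Iterable[Tuple[Hashable, str]],
                 error_class: str,
                 bins: Iterable[Hashable]) -> List[int]:
    """Count occurrences of one error class in each time bin.

    `records` is a sequence of (bin, error_class) pairs, e.g. (hour, label).
    Bins with no occurrences get a count of 0.
    """
    counts = Counter(b for b, e in records if e == error_class)
    return [counts.get(b, 0) for b in bins]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Usage with made-up hourly counts for two hypothetical error classes.
srm_counts = [1, 2, 3, 4]
ftp_counts = [1, 3, 2, 4]
records = ([(h, "SRM_TIMEOUT") for h, c in enumerate(srm_counts) for _ in range(c)]
           + [(h, "GRIDFTP_ERROR") for h, c in enumerate(ftp_counts) for _ in range(c)])
hours = range(4)
r = pearson(error_series(records, "SRM_TIMEOUT", hours),
            error_series(records, "GRIDFTP_ERROR", hours))
print(round(r, 2))  # 0.8
```

A coefficient close to 1 suggests the two errors tend to rise and fall together, which hints at a common cause; it does not by itself prove one.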
The system implements a failure warning mechanism that allows the service manager to create their own set of rules (triggers); when a trigger fires, predefined actions are executed (for example, messages are sent via the web interface, email, SMS, and so on). Triggers can be created for transfers on channels, grid sites, hosts and virtual organizations. Information on fired triggers can be retrieved from a dedicated table, which greatly simplifies the use of the notification mechanism. As a result, the notification mechanism considerably simplifies the work of service managers. The FTS monitoring system is described in detail in A. Uzhinskiy and V. Korenkov's article “System of monitoring of service of data transmission (FTS) of EGEE/WLCG project”.
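The trigger mechanism can be pictured with the following minimal sketch. Everything in it is a hypothetical illustration, not the system's actual rule format: the `Trigger` and `Stats` classes, the failure-rate condition, the `min_transfers` guard, and the in-memory `fired_log` list that stands in for the dedicated table of fired triggers. Actions are plain callables, so real email or SMS senders could be plugged in where `print` is used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Action = Callable[[str], None]


@dataclass
class Trigger:
    """HYPOTHETICAL rule: fire when the failure rate of one object exceeds a limit."""
    name: str
    scope: str                 # 'channel', 'site', 'host' or 'vo'
    target: str                # e.g. 'CERN-JINR'
    max_failure_rate: float    # fraction between 0 and 1
    min_transfers: int = 10    # ignore samples too small to be meaningful
    actions: List[Action] = field(default_factory=list)


@dataclass
class Stats:
    done: int
    failed: int

    @property
    def total(self) -> int:
        return self.done + self.failed

    @property
    def failure_rate(self) -> float:
        return self.failed / self.total if self.total else 0.0


def evaluate(triggers: List[Trigger],
             stats: Dict[Tuple[str, str], Stats],
             fired_log: List[Tuple[str, str]]) -> None:
    """Check every trigger; record fired ones and run their actions."""
    for t in triggers:
        s = stats.get((t.scope, t.target))
        if s is None or s.total < t.min_transfers:
            continue
        if s.failure_rate > t.max_failure_rate:
            message = (f"[{t.name}] {t.scope} {t.target}: failure rate "
                       f"{s.failure_rate:.0%} over {s.total} transfers")
            fired_log.append((t.name, message))  # stands in for the trigger table
            for act in t.actions:
                act(message)


# Usage: print() stands in for a web, email or SMS notifier.
log: List[Tuple[str, str]] = []
trig = Trigger("jinr-channel", "channel", "CERN-JINR", 0.2, actions=[print])
evaluate([trig], {("channel", "CERN-JINR"): Stats(done=30, failed=20)}, log)
# prints: [jinr-channel] channel CERN-JINR: failure rate 40% over 50 transfers
```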
The FTS monitoring system is intended to improve the quality of data transfer. It gives the various groups of users the data transfer information they need. The system was used at CERN for a long time to support the operation of the FTS service.