The Requirement of Access Interface Proximity to Application Domain Terminology
An integration system that meets the two preceding requirements can be considered global in the sense that it covers an unlimited and constantly growing number of information systems. Access to data through such a global integration system is needed both by IT specialists (to build seamless GUIs or data processing algorithms) and by a large number of people who specialize in other fields. Of course, such people use specialized GUIs developed by IT specialists, but these interfaces do not cover every task: the ability to work with data directly is also needed. Therefore, the data access interface should be understandable to specialists in the application domains that the data concerns.
More importantly, in a global integration environment IT specialists need an application-domain-aware access interface just as much as other specialists do. When the number of systems to integrate is significant, becoming familiar with the data storage peculiarities of every one of them becomes difficult. A semantic access interface lets any specialist formulate queries in application domain terms, while the integration system performs the mapping to data storage structures automatically. This makes solving data-related tasks more efficient for all categories of users.
What do we mean by an application-domain-aware data access interface? What principles should underlie it? Clearly, the fundamental elements of the interface should be concepts, that is, the terms of application domains. Any user should be able to work with concepts rather than with columns and tables. This lets the user abstract away from system implementation details and data storage formats and focus on the application domain itself. The mapping of queries formulated over concepts to queries formulated over the data storage structure (tables, for instance) should be done by the integration system.
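The division of labour described above can be illustrated with a minimal sketch. It assumes a hand-written lookup table from concepts to storage locations; the source, table, and column names, as well as the helpers `resolve` and `qualified_name`, are hypothetical and are not taken from any real integration system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Physical location of the data behind one concept (hypothetical layout)."""
    source: str
    table: str
    column: str


# Hypothetical mapping maintained by the integration system, not by the user.
CONCEPT_MAP = {
    "Company": ColumnRef("crm_db", "T_ORG", "ORG_NM"),
    "Task": ColumnRef("tracker_db", "TSK", "TITLE"),
    "Person": ColumnRef("hr_db", "EMP", "FULL_NAME"),
}


def resolve(concepts):
    """Translate concept names into the storage locations they map to."""
    unknown = [c for c in concepts if c not in CONCEPT_MAP]
    if unknown:
        raise KeyError(f"no storage mapping for concepts: {unknown}")
    return [CONCEPT_MAP[c] for c in concepts]


def qualified_name(ref):
    """Render a storage location as source.table.column."""
    return f"{ref.source}.{ref.table}.{ref.column}"


if __name__ == "__main__":
    # The user names concepts only; cryptic storage names stay hidden.
    for ref in resolve(["Company", "Task"]):
        print(qualified_name(ref))
```

The user-facing vocabulary here is just the dictionary keys; everything to the right of them can change without the user noticing.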
Otherwise, if a relational schema were used as the access interface, a user would have to become familiar with the nuances of data storage: the names of tables and columns, the relations between tables, and the semantics of all of these. Given that the names of tables and columns, as well as the existence of relations between tables, do not define their semantics precisely, the task becomes considerably complex.
Integration systems present on the market [DB2UD, DVDP] use the relational schema as the access interface. But these systems are oriented toward rather different tasks: first, they are not intended for global integration with unlimited scalability, and second, they are aimed at IT specialists. In our case, in a global integration environment, using the relational schema as the access interface is, as noted above, not rational: we need an approach that lets users access data through application domain concepts.
What principles allow the access interface to be operated by means of application domain concepts? First of all, a data (information) model that formalizes application domains with concepts should underlie the access interface. There are two main candidates: one of the ontology models (RDFS [RDFS], OWL [OWL], etc.), as proposed, for example, in the papers [WVV01, JAP06], or the semantically complete model (SCM) [Ov04, Ov06-2, Ov06-3].
We chose SCM for the following reasons. First, ontology query languages (SPARQL [SPARQL], RDQL [RDQL], etc.) are less clear than SCQL (the query language over SCM) [Ov04, Ov05, Ov06-1], which matters when data are processed by specialists in various application domains, not only IT. For instance, the query "select all tasks being solved by Fusionsoft company with the help of people older than 50" is formulated in SCQL as follows:
(Company~Project~Task~Person~Age), (Company = "Fusionsoft"), (Age>50)
Comparing this query with its analogues in SPARQL or RDQL shows that their query structure is more complicated even for simple queries such as "select all tasks solved by every person", which is written in SCQL simply as "(Person~Task)".
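To show how little structure the two SCQL queries quoted above contain, the following sketch splits such a query into a chain of concepts and a list of conditions. It is not an SCQL implementation: the grammar is inferred only from the two examples in this section, and the function name `parse_query` and the set of comparison operators are assumptions.

```python
import re

# A parenthesised group without nesting, e.g. (Person~Task) or (Age>50).
_GROUP = re.compile(r"\(([^()]*)\)")
# A condition of the form <concept> <operator> <value> (operator set assumed).
_CONDITION = re.compile(r"^(\w+)\s*(>=|<=|=|>|<)\s*(.+)$")


def parse_query(text):
    """Return (concept_chain, conditions) for a query in the form shown above."""
    chain = []
    conditions = []
    for body in _GROUP.findall(text):
        body = body.strip()
        match = _CONDITION.match(body)
        if match:
            concept, operator, raw = match.groups()
            raw = raw.strip()
            # Plain digits become integers; quoted strings lose their quotes.
            value = int(raw) if raw.isdigit() else raw.strip('"')
            conditions.append((concept, operator, value))
        else:
            # No comparison operator: treat the group as concepts joined by "~".
            chain.extend(part.strip() for part in body.split("~"))
    return chain, conditions


if __name__ == "__main__":
    query = '(Company~Project~Task~Person~Age), (Company = "Fusionsoft"), (Age>50)'
    chain, conditions = parse_query(query)
    print(chain)
    print(conditions)
```

The whole query reduces to a list of concept names plus conditions on some of them; no association, table, or property names appear anywhere in it.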
Of course, one may ask why we do not use SCQL over ontologies: there are many open standards in the ontology field, while there are none in the field of SCM. The reason is that SCQL exploits the semantic completeness property, which is unique to SCM and is not inherent in ontologies; therefore SCQL cannot, in principle, be used over ontologies. The property can be formulated as follows: only one association can describe the connection among a given group of concepts. In other words, there can be no alternative associations based on the same concepts or on concept sets one of which includes the other. The property does not limit the expressiveness of the model, as it might seem at first sight [Ov04, Ov05, Ov06-1].
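The property as formulated above can be checked mechanically. The sketch below treats each association as the set of concepts it connects and reports every pair in which one set equals or is contained in the other. Representing an association as a plain set, and the function name `completeness_violations`, are simplifying assumptions made for illustration only.

```python
from itertools import combinations


def completeness_violations(associations):
    """Return pairs of associations whose concept sets are equal or nested.

    Each association is given as an iterable of concept names. An empty
    result means the collection satisfies the property as stated above.
    """
    concept_sets = [frozenset(a) for a in associations]
    return [
        (first, second)
        for first, second in combinations(concept_sets, 2)
        if first <= second or second <= first
    ]


if __name__ == "__main__":
    model = [
        {"Company", "Project"},
        {"Project", "Task"},
        {"Task", "Person"},
        # Nested in no other set, but it contains {"Task", "Person"}.
        {"Task", "Person", "Project"},
    ]
    for first, second in completeness_violations(model):
        print(sorted(first), "conflicts with", sorted(second))
```

Note that associations may still share concepts: sets that merely overlap, such as Company-Project and Project-Task, are not reported.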
The second reason for choosing SCM also follows from the semantic completeness property. Because there are no alternative associations over the same concepts, mastering a single association is sufficient to understand completely how those concepts are interrelated: the association describes the semantics of the interrelation completely, and there is no need to consult other associations for clarification, special cases, and so on. All special cases are necessarily made explicit within SCM as separate concepts.
Moreover, the absence of alternative associations makes proper names for associations unnecessary: to refer to an association, it is sufficient to enumerate the concepts underlying it (this fact cleared the way to SCQL, an SCM query language with unique properties). Since an association has no name, there is no name to remember. In a global integration environment this feature becomes very important, since the number of names to remember would otherwise be boundless.
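The idea of referring to an association only by its concepts can be sketched as a lookup keyed by the concept set itself. The class `AssociationRegistry` and the stored descriptions are hypothetical; the sketch rejects only exact duplicates of a concept set, not nested ones.

```python
class AssociationRegistry:
    """Stores nameless associations, addressed by the concepts they connect."""

    def __init__(self):
        self._by_concepts = {}

    def add(self, concepts, description):
        """Register an association; a second one over the same concepts is rejected."""
        key = frozenset(concepts)
        if key in self._by_concepts:
            raise ValueError(f"an association over {sorted(key)} already exists")
        self._by_concepts[key] = description

    def find(self, *concepts):
        """Look an association up by enumerating its concepts, in any order."""
        return self._by_concepts[frozenset(concepts)]


if __name__ == "__main__":
    registry = AssociationRegistry()
    registry.add(["Person", "Task"], "tasks solved by a person")
    # No association name is needed; the order of concepts does not matter.
    print(registry.find("Task", "Person"))
```

Because the key is the concept set, the only vocabulary a user has to hold in mind is the list of concepts.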
So, for a specialist using the SCM data access interface, it is sufficient to know the concepts of the relevant application domain and which concepts are associated; the associations themselves carry no names. This information is sufficient both for comprehending the information structure and for formulating complex queries over it. This approach differs fundamentally from others: the relational one, where one must remember the names of tables and columns, possibly abbreviated; the ontology-based one, where one must know the names of slots; and so on.
