Abstract (english) | Software engineering is the systematic application of engineering principles to information system development. Part of software engineering is information system modelling and documentation, for which it is customary to use a modelling language such as the Unified Modeling Language (UML). The purpose of an information system model is a correct and exact representation of the modelled information system in the modelling language used. Model quality is essential for every information system development project. Many projects fail due to poor application of software engineering practices, including constant requirements changes, poor information system model quality, and models that are not consistent with the gathered requirements. Such problems usually take more effort to correct than ordinary programming issues. Therefore, preventing poor model quality and inconsistencies is important. This can be done by introducing model compliance testing, i.e., model verification. To ensure model quality and correctness, a set of modelling rules needs to be established for every project. These rules are then tested on the information system model to verify its compliance. This doctoral thesis explores and elaborates the possibilities of having standard information system model analysis procedures for checking model correctness and quality. Since information system models can be created using various modelling languages and notations, model analysis procedures must process an intermediary model, transformed from the original model. Graph-based models are identified as a good solution, since most of the modelling languages and notations used in software engineering can be transformed into a graph. Some other information sources, such as relational databases, can also be transformed into graph-based models, which makes the model analysis procedures presented in this thesis applicable to various information and model sources. 
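As a rough illustration of the transformation into a graph-based intermediary model mentioned above, the following minimal sketch turns a tiny relational schema into an attributed graph. The `Node` and `Graph` classes, the `schema_to_graph` function, and the sample tables are hypothetical and chosen only for illustration; they do not reproduce the graph model defined in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    concept: str                                   # kind of element, e.g. "Table"
    attributes: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)      # element name -> Node
    edges: list = field(default_factory=list)      # (source, role, target)


def schema_to_graph(tables, foreign_keys):
    """Map tables to nodes and foreign keys to role-labelled edges (assumed mapping)."""
    graph = Graph()
    for name, columns in tables.items():
        graph.nodes[name] = Node("Table", {"columns": list(columns)})
    for source, column, target in foreign_keys:
        graph.edges.append((source, f"references via {column}", target))
    return graph


# Hypothetical two-table schema
tables = {"Book": ["id", "title", "author_id"], "Author": ["id", "name"]}
foreign_keys = [("Book", "author_id", "Author")]
graph = schema_to_graph(tables, foreign_keys)
```

Any source that can be flattened into named elements and relations between them could be mapped in a similar way, which is what would make one set of analysis procedures reusable across sources.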
Using graph-based intermediary models makes graph pattern recognition the basis of every model analysis procedure. For graph pattern recognition, this thesis defines a newly proposed hierarchical fuzzy neural network cascade, which is used in model analysis procedures. The proposed neural network cascade can be used to perform graph queries. Therefore, a declarative graph query language (GQL) is specified over the proposed neural network cascade. Besides selection capabilities, i.e., graph pattern recognition, graph queries written in the GQL can comprise insertions, updates, and deletions on the intermediary graph model. The GQL is a generic language and can be applied to graphs that do not represent information system models. The GQL is novel in that it is visual and, unlike many existing graph query languages, does not use regular expressions. To enable execution of a model analysis procedure as a series of steps, this thesis also specifies the streamlined modeling analysis language (SMAL). A model analysis procedure is executed as a Petri net, whose composition is enabled by the SMAL. SMAL elements allow defining metric and validity rules for checking model correctness and quality. The SMAL, the GQL, and the fuzzy neural network cascade are implemented as an Eclipse framework plugin. This thesis contains an overview of the practical implementation, including testing on a real-life information system model. For testing, a medium-sized UML model of a library was created. The library model comprises use cases, classes, components, and interactions. The purpose of testing on such a model is to check the performance of the practical implementation and its applicability to various UML elements. This thesis is organized in eight chapters. The first chapter contains a general introduction similar to this extended abstract. 
The second chapter gives a general overview of software engineering practices, focused on the different modelling languages and notations that can be used to design, describe, and document an information system. A specific focus of this thesis is the Meta Object Facility (MOF), a meta-language developed by the Object Management Group (OMG), and MOF-derived modelling languages such as UML. SMAL and GQL are also MOF-derived modelling languages. The four MOF meta-levels are explained in this chapter. SMAL and GQL are placed in the language-specifying meta-level (M2). Graph queries and model analysis procedures are placed in the model-specifying meta-level (M1). The final execution elements for graph queries and model analysis procedures, i.e., neural network cascades and Petri nets, are placed in the object meta-level (M0) and applied to other models in the model meta-level (M1). Inherently, graph queries and model analysis procedures can be applied to themselves. For this purpose, an undirected conceptual attributed role-based graph model is developed, and an example of a model transformation is given. The difference between a model and its visual notation is also considered. A model represents data made of modelling language elements, while model diagrams are a visual representation of the model. Each modelling language element has its own visual notation; combined together, these notations form a diagram that reflects the model. Model quality and correctness are considered in the third chapter. This chapter includes various quality models that can be used for model quality evaluation. Quality models are frameworks that define the quality elements which need to be assessed in an information system model. The results of the model assessment are combined into one overall model quality evaluation. To quantify the result for each quality element, the assessed model needs to be measured and verified. 
A set of model metric definitions can be defined to support the evaluation of some model quality elements. A solution in the form of a procedure and a block schema is given to explain how graph pattern recognition and model analysis procedures can be combined to perform elementary model measurements. A special model measurement processing component is added to the solution to assist in collecting elementary model measurements and to derive complex model measurements. Additionally, a set of model hypotheses can be defined to support the evaluation of some model quality elements. Model hypotheses are logical atoms that use model measurements to verify model correctness. An additional solution in the form of a procedure and a block schema is given to explain how graph pattern recognition and model analysis procedures can be combined to perform model correctness verification using the defined model hypotheses. A special model correctness verification component is added to the solution to assist in collecting model hypothesis results and combining them in a verification rule. The results of these verification rules are combined in a logical program that calculates overall model correctness. Model measurements and verification rule results are matched against the model quality elements in use, which makes it possible to evaluate model correctness in the context of the selected quality model. In order to recognize graph patterns, several different graph matching solutions are evaluated in the fourth chapter. Most classical graph matching algorithms are not suitable in this case, from the point of view of learning and usage performance. A new solution is proposed that combines fuzzy logic, a hierarchical multi-stage approach, and a recursive encoding network to obtain a hierarchical fuzzy neural network cascade. This is a set of neural networks divided into four stages, where each stage is responsible for recognizing graph elements on a different abstraction level. 
Local graph element features, such as concepts, attributes, and roles, are recognized in the first stage. Later stages aggregate inference values following the structure of the learned graph, finishing with a single final result in the last stage. Most of the stages are fuzzy oriented, and an unsupervised learning algorithm is used for their construction. Only the last stage uses standard neural networks, such as a perceptron or a multilayer perceptron, which require supervised learning algorithms. Using unsupervised learning algorithms greatly improves the construction speed of a neural network cascade and allows fast compilation of graph queries written in the GQL into an executable set of neural network cascades. In the same chapter, the complete architecture of the proposed neural network cascade is detailed, and its learning procedure is given. The neural network cascade architecture is also extended to support some specific GQL features, such as path searching capabilities and comparator usage. The proposed neural network cascade was tested on several different standard testing sets and scenarios. This enabled a comparison between the proposed neural network cascade and similar graph matching methods and algorithms. Testing the noise resilience of the neural network cascade was also very important, as tested models can contain noise in the form of extra or missing elements, attributes, differing values, and similar. This turned out to be important when recognizing names and string values. String value and text recognition algorithms are evaluated in the same chapter. The proposed GQL is specified in the fifth chapter. Various existing graph query languages are evaluated as part of the GQL introduction. Most of these languages are regular expression oriented. However, some graph query languages are visually oriented and use graph structure combined with regular expressions. 
To make it more user friendly, the proposed GQL does not use regular expressions. The structure and notation of the GQL are adjusted to the proposed neural network cascade, so that graph queries written in the GQL can be compiled directly into neural network cascades. A graph query is written in the Eclipse visual editor, compiled directly into neural network cascades, and then executed against the tested graphs by the proposed graph query execution algorithm. This algorithm is executed in the following phases: selection, post-selection, subquery execution, action execution, hypotheses evaluation, and result composition. In the selection phase, the graph query execution algorithm uses the compiled set of neural network cascades to obtain a set of elementary graphs, which represent the graph query results. These elementary graphs are named graphlets. The resulting graphlets are additionally filtered in the post-selection phase, based on the conditions in the executed graph query. An important phase of the graph query execution algorithm is action execution. In this phase, all insertions, updates, and deletions are performed on the set of resulting graphlets, so each graphlet can be modified. This capability allows graph transformation through graph query execution. At the end of the graph query execution algorithm, graph query hypotheses are evaluated against the final set of graphlets. In the case of model analysis procedures, graph query hypotheses represent model hypotheses, whose evaluation directly affects the evaluation of model quality and correctness. The GQL elements are the following: • Selectors - Elements responsible for the selection of nodes and edges of an input graph. 
A graph query can consist of selector elements alone; • Conditions - Elements that can be applied to selectors to impose additional conditions on the input graph nodes and edges; • Aggregations - Elements that can be applied to selectors to aggregate graphlets; • Subqueries - Elements that allow query nesting; • Actions - Elements that allow insertions, updates, and deletions of nodes and edges in graphlets; • Hypotheses - Elements that allow defining hypotheses on the resulting set of graphlets. The remainder of the chapter gives the GQL specification, along with a number of query examples for each GQL element. To enable execution of a model analysis procedure as a series of steps, the SMAL is introduced and specified in the sixth chapter. Modified Petri nets are used to execute model analysis procedures written in the SMAL. In the SMAL introduction, the mapping between SMAL elements and a Petri net is elaborated. The SMAL is specified to enable this mapping. The Petri net modification includes position and transition specializations. Net positions are specialized into input, output, and standard positions. Net transitions are specialized into generators, processors, and consumers. Input and output positions can contain various tokens, while standard positions can contain only intermediary graph model tokens. A generator transition can be used to connect an input position to a standard position in the modified Petri net. Generator transitions transform external models, such as a UML model, contained in an input position token into a graph model token. Graph model tokens are then processed by processor transitions. A consumer transition can be used to connect a standard position to an output position in the modified Petri net. Consumer transitions transform graph models contained in a standard position token into an external model token placed in an output position. 
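The position and transition specializations described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SMAL implementation: the `Position` and `Transition` classes, the `ALLOWED` table, the `run` loop, and the toy token contents are all hypothetical, and the rule that processors connect two standard positions is an assumption inferred from the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Position:
    name: str
    kind: str                                   # "input", "standard", or "output"
    tokens: list = field(default_factory=list)


@dataclass
class Transition:
    name: str
    kind: str                                   # "generator", "processor", or "consumer"
    source: Position
    target: Position
    action: Callable                            # transforms the consumed token

    def enabled(self):
        return bool(self.source.tokens)

    def fire(self):
        token = self.source.tokens.pop(0)
        self.target.tokens.append(self.action(token))


# Position kinds each transition kind may connect.
# The processor entry is an assumption; the other two follow the text.
ALLOWED = {
    "generator": ("input", "standard"),
    "processor": ("standard", "standard"),
    "consumer": ("standard", "output"),
}


def run(transitions):
    """Check the wiring, then fire transitions until none is enabled."""
    for t in transitions:
        if (t.source.kind, t.target.kind) != ALLOWED[t.kind]:
            raise ValueError(f"{t.name}: a {t.kind} must connect {ALLOWED[t.kind]}")
    while any(t.enabled() for t in transitions):
        for t in transitions:
            if t.enabled():
                t.fire()


# Toy net: external model -> graph token -> processed graph -> external report
inp = Position("uml", "input", [{"classes": ["Book", "Member"]}])
g1 = Position("g1", "standard")
g2 = Position("g2", "standard")
out = Position("report", "output")

transitions = [
    Transition("to_graph", "generator", inp, g1,
               lambda m: {"nodes": list(m["classes"]), "edges": []}),
    Transition("link", "processor", g1, g2,
               lambda g: {**g, "edges": [(g["nodes"][0], g["nodes"][1])]}),
    Transition("to_report", "consumer", g2, out,
               lambda g: f"{len(g['nodes'])} nodes, {len(g['edges'])} edges"),
]
run(transitions)
```

In this sketch the loop ends when no transition is enabled, leaving the result token in the output position; how the actual net treats tokens in output positions is not specified here.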
Generator and consumer transitions can also perform additional actions, such as activity and token content logging. The SMAL is specified to support this structure by having generator, processor, and consumer elements. Generator SMAL elements are used to transform external models into intermediary graph models. Processor SMAL elements have the following responsibilities: • Graph query processors - Allow graph queries to be used on an input graph model token. The result of the graph query execution is the output graph model token; • Flow control - Elements that control the token flow. They can be used for testing, splitting, merging, and similar actions on graph model tokens; • Subnet - Allows nesting of model analysis procedures; • Metric processors - Model metric processing components that allow creating metric definitions and linking them to the underlying graph queries; • Verification processors - Model verification processing components that allow creating verification rules and their combinations, and linking them to hypotheses in the underlying graph queries. Consumer SMAL elements can be used to transform an intermediary graph model that represents a processing result into an external model, such as a UML model. A Petri net that represents a model analysis procedure stops executing when no tokens are left in the net. After the Petri net has stopped executing, the model measurements and verification rule results from the metric and verification processors are displayed to the user, giving insight into the quality element evaluation results. The remainder of the chapter gives the SMAL specification, along with a number of examples of how to use each SMAL element. Testing of the practical implementation is presented in the seventh chapter. Two different UML models were created to test various model analysis procedures. A medium-sized model of a library was created and is given in Appendix A. 
The library model contains use cases, classes, components, collaborations, and interactions. A set of universal Component Based Design (CBD) model analysis procedures was developed and applied to the library model. The intention of the test was to check the quality and correctness of the library model from the CBD point of view. The second tested model analysis procedure is applied to UML class models. The purpose of this test is to check whether a class hierarchy has a private attribute that is not accessible through a public non-static method, i.e., there is no such method in the same class or in any specialization class in the class hierarchy. This model analysis procedure is applied to several correct and incorrect class hierarchies to test whether it detects an introduced problem. This thesis contains the following scientific contributions: 1. A language for specifying information model analysis procedures, which enables quicker creation of procedures and model measurements and eliminates the need for specialized model measurement solutions. Existing solutions for model measurement and correctness verification are specialized, tool-embedded procedures whose execution can be modified only by changing their execution parameters or their source code, the latter of which is not an option for an average user. SMAL and GQL are open languages that allow the creation of procedures for model transformation and analysis. The practical implementation is given as a plugin for the Eclipse framework, which contains UML modelling plugins as well. Model analysis procedures directly use the UML2 model plugin in the Eclipse framework to access UML models. Model analysis procedures are themselves models that can be used to analyze, check, and measure another model. The SMAL is specified so that new generator, processor, and consumer elements can be added. 
This allows model analysis procedures to use other information and model sources, which makes model analysis procedures written in the SMAL and GQL universal. Since the SMAL and GQL specifications are independent of the practical implementation, the languages specified in this thesis are not strictly bound to the Eclipse framework. 2. Combining model transformation and analysis features in information model analysis procedures, enabling model transformation to higher abstraction levels. The GQL enables the creation of procedures for model analysis, transformation, and pattern recognition. Several model analysis procedures in the seventh chapter are examples of model transformation. Graph queries in these procedures select elements of the input model and then update them by adding new attributes, relations, and nodes, to ease the final model analysis. This approach is used to test private attribute accessibility in the second test of the practical implementation. Model analysis procedures that evaluate UML interactions replace a set of model nodes with more abstract ones, in order to determine the communication between two components of the analyzed information system model. This is considered a transformation that raises the model abstraction level. Lifelines and messages in a UML model interaction are replaced by a single "communication" edge, which makes the final model analysis much easier. 3. An original method of information model verification that transforms an original model into an intermediary model, which is then tested and verified. Testing the intermediary model amounts to transitive verification of the original model. The SMAL contains special metric and verification processors that implement the metric and verification processing components, used to collect, combine, and display the results of model analysis procedure execution. 
A metric processor is used to collect model measurements from graph query selectors, while a verification processor can be used to combine graph query hypothesis evaluation results into verification rules that allow model testing. Model analysis procedures comprise a number of graph query processors, whose role is to transform an input model. The transformation of the input model is important, since information system models can be quite large, and such a transformation helps reduce the model size. Measurements and verification performed on intermediary model elements that did not suffer information loss in the transformation process can be considered transitively applicable to the original model as well. |
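The interplay of metric and verification processors described above can be sketched as follows. This is a minimal illustration under stated assumptions: the graphlet dictionaries, the function names, and the choice of a plain conjunction as the verification rule are all hypothetical, loosely modelled on the private attribute accessibility test, and do not reproduce the thesis implementation.

```python
# Hypothetical graphlets returned by a query that selects private attributes;
# "public_accessors" counts the public non-static methods reaching the attribute.
graphlets = [
    {"class": "Book", "attribute": "isbn", "public_accessors": 1},
    {"class": "Member", "attribute": "pin", "public_accessors": 0},
]


def metric_private_attributes(graphlets):
    """Elementary measurement: number of graphlets matched by the selector."""
    return len(graphlets)


def hypothesis_not_empty(graphlets):
    """Hypothesis: the query matched at least one private attribute."""
    return len(graphlets) > 0


def hypothesis_all_accessible(graphlets):
    """Hypothesis: every private attribute has at least one public accessor."""
    return all(g["public_accessors"] > 0 for g in graphlets)


def verification_rule(graphlets, hypotheses):
    """Combine hypothesis results by conjunction (an assumed combination)."""
    results = {h.__name__: h(graphlets) for h in hypotheses}
    return all(results.values()), results


measurement = metric_private_attributes(graphlets)
passed, details = verification_rule(
    graphlets, [hypothesis_not_empty, hypothesis_all_accessible]
)
```

Here the rule fails because the hypothetical `pin` attribute has no public accessor, which is the kind of introduced problem the second test procedure is meant to detect.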