Abstract (English)
Business intelligence encompasses a set of technologies and processes whose objective is to provide accurate, high-quality information for operational, tactical and strategic decision making. Data extraction, transformation and loading are carried out within the central data repository: the data warehouse system. Two of the most prominent approaches to building a data warehouse are Ralph Kimball's bottom-up and Bill Inmon's top-down approach. In both cases, the development of ETL (Extract, Transform, Load) procedures presents one of the most challenging phases of the data warehouse building process, in terms of both the required engagement of computer science engineers and the financial cost. The modelling of banking information systems is challenging due to regulatory, competitive and operational requirements, and due to the need for thorough optimization of the large data volumes that must be processed within the data warehouse. The data model must not be conceived as a detached entity at the conceptual level, but designed to support different presentations of the same data for various groups of business users, while simultaneously utilizing unified interfaces and a central data repository module, with the objective of supporting day-to-day operational business tasks. The main aspects of business intelligence and data warehousing are presented in this work through an overview of the research domain, with an emphasis on challenges and possible applications in the banking industry, i.e. in an environment where actual data models often do not conform to recommended practices and approaches. This is, to a large extent, caused by communication problems between business users and computer science engineers and by a lack of knowledge of the actual business requirements, and it consequently yields reporting modules that become an end in themselves.
Depending on the amount of context attached to the values obtained by observation, human knowledge can be described by a hierarchical structure that ranges from a meagre portrayal, in the form of fundamental data, to progressively richer notions: information, knowledge and wisdom. The Resource Description Framework (RDF) model is a family of World Wide Web Consortium (W3C) specifications used for the conceptual description of resources and their properties, as well as for information sharing between program agents over World Wide Web (WWW) services. RDF Schema (RDFS) imposes vocabulary constraints, provides a mechanism for the description of related resources and enables the definition of classes, class hierarchies and instances (individuals), as well as domains and ranges of predicates, using principles similar to those of the object-oriented programming paradigm. OWL (Web Ontology Language) supports even greater semantic expressiveness, using the existing RDF/RDFS vocabulary constructs enriched with additional properties of classes and predicates. The utilization of RDF/RDFS and OWL constructs enables the formalization of an ontology – a formal, explicit specification of a conceptualization which describes a certain phenomenon or domain through the identification and explicit definition of the corresponding concepts. OWL encompasses a set of logical axioms, developed by taking into account the purpose and scope of the vocabulary used for domain description, i.e. for the construction of a framework for information exchange. These ontology properties, combined with reasoning capabilities based on user-defined rule bases, provide a solid foundation for knowledge representation and discovery. The conceptualization of an abstract model into an explicitly defined, platform-independent format, i.e. 
ontology, is an iterative process which requires the definition of the business domain of interest and of the corresponding concepts, classes, taxonomy and class hierarchy, incorporating the associated properties and relationships, as well as introducing instances and axioms. Within the ontology production process, it is recommended to follow the principles stated in ontology development methodologies. This dissertation encompasses the analysis of five ontology development methodologies and the application of the most appropriate practices to a complex business case within the banking industry, with the goal of examining the justifiability of incorporating semantic technologies into the banking data warehouse system. These methodologies define steps, techniques and methods, some generic and some concrete, which are applicable in the development process. Close collaboration between domain experts, i.e. business users, and computer science engineers, whose task is to formalize the ontology in a well-defined language, is mandatory in the ontology development process, because an understanding of the business domain and an understanding of ontology language formalisms are both fundamental preconditions. The development process begins with the identification of data sources and the definition of the scope of the business domain and of the user groups (domain experts) who are intended to utilize the ontology. Through the production of a motivating scenario and competency questions, the expected model requirements are stated; this is followed by an informal definition of concepts and their relationships, by forming natural language descriptions, and by selecting the primary development approach – bottom-up, top-down or middle-out, the last of which was applied in the research presented in this dissertation. 
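The taxonomy-building step described above can be illustrated with RDF-style triples. The following minimal Python sketch uses entirely hypothetical class and instance names (Loan, MortgageLoan, contract_42, and so on, none taken from the dissertation's actual model), and a production system would express the same hierarchy with RDFS/OWL constructs in a triple store rather than plain tuples:

```python
# A banking mini-ontology as a set of RDF-style (subject, predicate, object)
# triples. All class and instance names here are hypothetical.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    ("Loan", SUBCLASS, "FinancialProduct"),
    ("MortgageLoan", SUBCLASS, "Loan"),
    ("Deposit", SUBCLASS, "FinancialProduct"),
    ("contract_42", TYPE, "MortgageLoan"),
}

def superclasses(cls, triples):
    """Transitive closure over rdfs:subClassOf."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {o for s, p, o in triples
                    if p == SUBCLASS and s in frontier and o not in found}
        found |= frontier
    return found

# An RDFS reasoner would infer the implicit types of each instance:
inferred = {("contract_42", TYPE, c)
            for s, p, direct in triples if s == "contract_42" and p == TYPE
            for c in superclasses(direct, triples) | {direct}}
```

The inference step mirrors what an RDFS entailment regime does automatically: an instance of MortgageLoan is also an instance of Loan and of FinancialProduct.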
One must note that, from the data warehouse architects' point of view, the motivating scenario is often produced by third parties (regulatory authorities, supervisory and management boards, non-IT organizational sectors) who are often unfamiliar with the functioning principles of financial institutions' internal reporting systems. In most cases, the motivating scenario provides only an overall specification of business requirements, so the compilation of competency questions is required in order to obtain a comprehensive, concrete and usable specification. Competency questions are composed by internal domain experts, who are for the most part aware of the functioning principles of banking reporting systems and who are joined by the data warehouse architects in various project phases. Ontology development is a cyclic process, characterized by constant interplay between domain experts and computer science engineers. After compiling the glossary and the table of binary relationships between concepts, the formal ontology apparatus was built by applying the middle-out approach in the formal conceptualization phase, as in the preceding phase of informal conceptualization. In most cases, the originally developed ontology model will require continuous improvements through the phases of maintenance and enhancement. Banks' primary inflows are deposits with short to medium-term maturities. These funds are disbursed on the market through long-term loans with tenors of up to thirty years, inducing a maturity mismatch between banks' assets and liabilities, as well as the occurrence of liquidity risk, defined as the inability to settle due liabilities within a certain period of time. The importance of liquidity management transcends individual entities – payment disruptions spread beyond a single financial institution and jeopardize the financial system as a whole. 
Upon the request of the Croatian National Bank, in order to measure and manage liquidity risk and retain the liquidity coverage ratio at acceptable levels, the monitoring of cumulative mismatches across different time periods, as well as the reconciliation of assets and liabilities, must be implemented in every banking institution in the Republic of Croatia. To that end, one must build a corresponding analytical maturity ladder, at the level of the payment schedule of a single exposure, which provides information on cumulative net surpluses and deficits at various intervals. For each contract/exposure in the balance sheet, the maturity ladder contains a number of rows representing calculated and projected cash flow components, depending on the contracted tenor. The justification for incorporating the ontology into the banking data warehouse, as well as the possibilities of applying specific methodological techniques and methods, have been explored in the scope of this research, in three main phases, through the development of the above-stated maturity ladder reporting module. Primarily, it was necessary to identify the relevant accounting positions by parametrizing the chart of accounts and by introducing amount type entities into the existing dimensional model. For each portfolio product acquired by the financial institution's counterparties, a number of associated contracts occur in the data warehouse system. Contract and product attributes also provide a foundation for the definition of assets and liabilities in the scope of the maturity ladder extraction. Still, the requirement to introduce additional entities, i.e. instrument types and aggregated booking report item entries, arose during the informal and formal conceptualization phases, in order to improve the unification and integration of data. 
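The core of the maturity ladder computation – allocating each projected cash flow to a time bucket and accumulating net surpluses and deficits – can be sketched as follows. The cash-flow rows and bucket spans below are purely illustrative assumptions; the actual regulatory bucket definitions differ:

```python
from datetime import date

# Hypothetical cash-flow rows: (contract_id, cash_flow_date, amount);
# positive amounts are inflows (asset repayments), negative are outflows
# (maturing deposits). The bucket spans are illustrative, not regulatory.
cash_flows = [
    ("loan_1",    date(2024, 1, 15),  100.0),
    ("loan_1",    date(2024, 4, 15),  100.0),
    ("deposit_7", date(2024, 1, 31), -150.0),
    ("deposit_7", date(2024, 7, 31),  -80.0),
]
as_of = date(2024, 1, 1)
buckets = [("<=1M", 31), ("1-3M", 92), ("3-12M", 365), (">1Y", 10**6)]

def maturity_ladder(cash_flows, as_of, buckets):
    """Net and cumulative net position per maturity bucket."""
    net = {label: 0.0 for label, _ in buckets}
    for _, cf_date, amount in cash_flows:
        days = (cf_date - as_of).days
        # Allocate the cash flow to the first bucket whose bound covers it.
        label = next(l for l, upper in buckets if days <= upper)
        net[label] += amount
    cumulative, running = {}, 0.0
    for label, _ in buckets:
        running += net[label]
        cumulative[label] = running   # cumulative net surplus/deficit
    return net, cumulative

net, cumulative = maturity_ladder(cash_flows, as_of, buckets)
```

On this toy data the shortest bucket shows a net deficit (deposits maturing before loan repayments arrive), which is exactly the mismatch the ladder is built to expose.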
The maturity ladder comprises a flat analytical entity, including counterparty, contract and account identifiers, allocated by projected maturities. In order to utilize the analytics through reporting tools, one must define maturity buckets. In other words, reporting items consisting of various maturity spans must be calculated dynamically, in accordance with regulatory and internal requirement specifications. The entities representing the chart of accounts, amount types, banking product and financial instrument catalogs, analytical and aggregated reporting items, and business hierarchies and rules were incorporated through the ontology module. They comprise the most volatile part of the existing data warehouse system, and the greatest enhancements in terms of flexibility, agility and transparency were gained by defining them through semantic technologies, that is, through the definition of the corresponding ontology concepts and relationships. In order to ensure the proper functioning of the data warehouse system and, consequently, the timely delivery of diverse analytical financial reports, optimization must be applied at the individual and collective levels of ETL procedure processing, bearing in mind that the development of ETL procedures constitutes the most expensive part of a data warehouse project, in terms of both time and resources. Partition pruning enables the simultaneous handling of a single logical entity – the table – and multiple physical entities – the partitions – in order to address problems related to concurrent transaction processing. The more complex a database query is, the less probable its optimal performance; it is therefore worth considering the decomposition of complicated queries into smaller modules. The selection of appropriate join techniques should be done empirically – one should experiment with hash joins, sort-merge joins and nested loop joins for each query module and select the fastest solution.
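The difference between the join strategies mentioned above can be shown in miniature. The two functions below implement nested loop and hash join strategies over the same hypothetical tables and produce the same result rows; the empirical tuning the text recommends amounts to timing such alternatives on real data volumes:

```python
# Two join strategies over the same hypothetical tables; a query optimizer
# (or an engineer tuning a query module) chooses between such plans.
contracts = [(1, "loan"), (2, "deposit"), (3, "loan")]   # (contract_id, product)
balances  = [(1, 500.0), (3, 250.0), (3, 125.0)]         # (contract_id, amount)

def nested_loop_join(left, right):
    """O(n*m): rescan the inner table for every outer row."""
    return [(lid, product, amount)
            for lid, product in left
            for rid, amount in right if lid == rid]

def hash_join(left, right):
    """O(n+m): build a hash table on the join key, then probe it."""
    by_key = {}
    for lid, product in left:
        by_key.setdefault(lid, []).append(product)
    return [(rid, product, amount)
            for rid, amount in right
            for product in by_key.get(rid, [])]
```

The hash join trades memory for the hash table against the quadratic scan cost, which is why it usually wins on large, unsorted inputs, while sort-merge joins pay off when the inputs are already ordered on the join key.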
A data warehouse system includes a certain number of data marts that differ in importance and that are used as a foundation for the construction of the corresponding reports. Every report must be completed by its deadline. From this perspective, the optimal schedule at the collective level of the entire set of ETL procedures is the one in which the minimum number of reports breach user-defined deadlines. A finite number of distinct ETL procedures can be executed in parallel within a particular period of time. If the maximum number of procedures that can be executed concurrently is defined by the SESSIONS variable at the database management system level, then this problem is equivalent to a common problem in the manufacturing industry – the job-shop scheduling problem (JSSP). A classical JSSP can be described as a set of N jobs which have to be processed on a set of M machines. Unlike JSSP optimization, which attempts to minimize the total execution time, in the context of ETL procedure optimization one should define individual time constraints and priorities with associated weighting factors, since the total execution time does not affect the quality of the solution. Using this approach, groups of users and the corresponding ETL procedures can be precisely specified and the fitness function fine-tuned, assuming some level of commitment and cooperation between data warehouse architects and business users. The ontology module developed in the scope of this work was built and integrated into the existing data warehouse system using the same principles that were followed in the development of every other existing component of the data warehouse. Therefore, it is possible to apply individual and collective optimization techniques in a similar manner when loading data into the ontology or retrieving data from it. 
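The deadline-and-priority fitness idea can be sketched as follows, using hypothetical procedure durations, deadlines and priority weights, and assuming a serial (single-session) schedule for simplicity; on this toy instance an exhaustive search stands in for the genetic algorithm that would explore candidate execution orders at realistic scale:

```python
from itertools import permutations

# Hypothetical ETL procedures: name -> (duration, deadline, priority weight),
# all times in minutes. Serial execution is assumed for simplicity; a real
# schedule runs up to SESSIONS procedures concurrently.
procedures = {
    "fx_rates":   (10,  30, 5.0),
    "gl_load":    (40,  60, 3.0),
    "risk_marts": (20, 120, 1.0),
}

def fitness(order, procedures):
    """Weighted sum of deadline breaches for a serial execution order.

    Lower is better; a breach contributes weight * minutes of lateness,
    so breaches of high-priority reports dominate the penalty.
    """
    elapsed, penalty = 0, 0.0
    for name in order:
        duration, deadline, weight = procedures[name]
        elapsed += duration
        penalty += weight * max(0, elapsed - deadline)
    return penalty

# Exhaustive search stands in for the genetic algorithm on this toy instance.
best_penalty, best_order = min(
    (fitness(order, procedures), order) for order in permutations(procedures)
)
```

Note that the total execution time is identical for every order; only the weighted breaches distinguish the candidates, which is the point made above about makespan not affecting solution quality.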
Business entities were initially defined within the standard dimensional model, then subsequently designed and integrated into the ontology, focusing on unification to the largest extent possible by mapping accounts onto amount types, as well as by building hierarchies of banking products, financial instruments and reporting items of booking entries at various aggregation levels. The ontology was fully integrated into the existing data warehouse system, and it was possible to combine data stored in the semantic graph with data stored in relational tables within a single database query. The execution performance of the complete maturity ladder calculation was tested, as well as the performance of the most critical operations in the computation process. The algorithm was first executed using exclusively the standard dimensional model; this was followed by the implementation, within the ontology, of the business rules used for data retrieval and aggregation operations. The third step of performance testing incorporated the storage of the slowly changing dimensions of contracts and counterparties into the ontology as well. Taking into account the increased flexibility of the ontological approach in defining critical entities, hierarchies and the corresponding relationships in the form of triples that constitute the semantic graph, as well as the absence of performance degradation when storing business rules within the ontology module, the integration of the ontology into the existing data warehouse system proved justified and useful. One should note that the definition of custom business rules is feasible by utilizing an ontology rules engine. Reasoning over these rule bases produces new triples within the ontology and offers a potentially powerful tool for data quality procedures. 
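The production of new triples by a rules engine can be illustrated with a simple completeness check of the kind a data quality procedure might use. The vocabulary, predicates and contract identifiers below are hypothetical, and a real rules engine would express the rule declaratively rather than in Python:

```python
# A user-defined completeness rule applied over hypothetical triples.
triples = {
    ("contract_42", "hasCounterparty", "cp_9"),
    ("contract_42", "hasMaturityDate", "2024-06-30"),
    ("contract_77", "hasCounterparty", "cp_9"),
    # contract_77 lacks hasMaturityDate -> a data quality violation.
}

def apply_completeness_rule(triples, required_predicate):
    """Rule: contract(?c) AND NOT (?c required_predicate ?v)
             => (?c, "violates", required_predicate)."""
    contracts = {s for s, p, o in triples if p == "hasCounterparty"}
    complete = {s for s, p, o in triples if p == required_predicate}
    return {(c, "violates", required_predicate) for c in contracts - complete}

# Reasoning materializes the violations as new triples in the graph.
new_triples = apply_completeness_rule(triples, "hasMaturityDate")
```

The materialized violation triples can then be queried like any other part of the semantic graph, which is what makes rule-based reasoning attractive for data quality reporting.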
The research presented in this dissertation will be extended by exploring possible applications of an ontology-based data quality assurance module incorporated into the existing data warehouse.