Title: Estimacija emocionalnih stanja zasnovana na dubinskoj analizi akustičkih značajki govornoga signala
Title (English): Emotional state estimation based on data mining of acoustic speech features
Author: Branimir Dropuljić
Supervisor: Davor Petrinović
Co-supervisor: Krešimir Ćosić
Committee member: Davor Petrinović
Institution awarding the academic / professional degree: University of Zagreb, Faculty of Electrical Engineering and Computing (Department of Electric Machines, Drives and Automation), Zagreb
Date and country of defense: 2014, Croatia
Scientific / artistic area, field and branch: Technical sciences; Electrical engineering; Electromechanical engineering
Universal Decimal Classification (UDC): 621.3 - Electrical engineering
Abstract: Estimation of emotional states from speech can play an important role in many fields. This doctoral dissertation presents a system for estimating emotional states based on acoustic features of the speech signal, which can be applied in psychotherapy and in the selection and training of candidates for stressful and high-responsibility operations. Because of this potential, special emphasis was placed on estimating speech under stress and on eliciting responses from participants with startle stimuli. The neurobiological basis of emotion generation was investigated, together with the influence of emotions on the biological mechanisms of speech production and, consequently, on individual acoustic parameters and voice features. Voice perturbation measures were proposed, i.e., features describing how limbic structures disturb the coordination of the antagonistic process of vocal fold vibration; these features discriminated significantly between stress levels in the voice. Their robustness to the voluntary components of speech, specifically the dynamics of the fundamental frequency during an utterance, was also established, whereas conventional perturbation measures (jitter) proved less successful in this respect. The influence of intense impulse-shaped acoustic stimuli, i.e., startle stimuli, on changes in the voice fundamental frequency was analyzed. So-called fear-potentiated startle reactions are widely used in the diagnosis of post-traumatic stress disorder, that is, in fear conditioning and extinction paradigms. The conventional measure currently used to predict startle reactions is electromyography of the orbicularis oculi muscle, i.e., eyeblink analysis. This dissertation compares the fundamental frequency response with the orbicularis oculi response and establishes that the two are consistent and have similar properties. Furthermore, an improvement of the conventional architecture for estimating the dimensional emotions valence and arousal is proposed, which incorporates a priori knowledge of the relationship between these dimensions. Analyses confirmed that this architecture improves estimation accuracy.
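The conventional jitter measure that the abstract contrasts with the proposed perturbation features can be illustrated with a short sketch. It assumes the glottal periods have already been extracted and uses the common "local jitter" definition (mean absolute difference between consecutive periods divided by the mean period); the thesis may use a different jitter variant, and the function name `local_jitter` is illustrative. The example also shows why such a measure is sensitive to voluntary F0 dynamics: a perfectly smooth pitch glide, with no cycle-to-cycle perturbation, still yields nonzero jitter.

```python
from typing import Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Local jitter: mean absolute difference between consecutive glottal
    periods, divided by the mean period (dimensionless)."""
    if len(periods) < 2:
        raise ValueError("at least two periods are required")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_abs_diff / mean_period


# A steady 200 Hz voice (5 ms periods) has zero jitter.
steady = [0.005] * 50
# A smooth glide from 100 Hz to 125 Hz (10 ms down to 8 ms periods) has no
# cycle-to-cycle perturbation, yet its local jitter is nonzero (about 0.0056).
glide = [0.010 - 0.00005 * i for i in range(41)]
print(local_jitter(steady), local_jitter(glide))
```

A measure intended to capture involuntary perturbation would therefore need to separate such slow, voluntary F0 movement from the cycle-to-cycle component.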
Abstract (English): This doctoral thesis is the result of research on the project “Adaptive Control of Scenarios in VR Therapy of PTSD”, which aims to develop a collaborative and intelligent agent that, as a decision-making support tool, could be applied in a number of areas such as prediction, selection, diagnosis and treatment of mental disorders, especially those caused by stress. The thesis explores the problem of estimating emotional states, stress and acoustic startle responses from acoustic speech features. Emphasis is placed on evaluating the features with statistical analysis methods in the context of these problems. New voice perturbation features are proposed and evaluated that describe the impact of limbic structures on the neural regions responsible for coordinating the antagonistic process of vocal fold vibration. A comparative analysis of changes in the speech fundamental frequency (F0) and the electromyographic (EMG) response of the orbicularis oculi muscle was performed. The thesis also proposes an improvement of the conventional system architecture for estimating the emotional dimensions valence and arousal, using a priori knowledge about the relation between these two dimensions. The introductory chapter defines the domains, motivation and objectives of the research, noting the inherent interdisciplinarity of the field, and states the scientific contributions and the structure of the dissertation. The second chapter describes the neurobiological processes through which emotions affect speech production mechanisms and explores the influence of emotions on the respiration, phonation and articulation mechanisms of speech. Special attention is given to the internal muscles of the larynx, i.e., the phonation mechanisms, which, due to their sensitive structure, are the most vulnerable to the impact of emotions. The third chapter describes the acoustic speech features commonly used to estimate emotional states and stress. It also proposes a decomposition of the speech fundamental frequency whose components selectively reflect specific neurobiological processes of emotion, as well as speech perturbation features that describe the temporal and amplitude aspects of the disturbance in vocal fold oscillation caused by the influence of the limbic system on the cerebellum and brainstem.
The proposed features are validated on artificially generated speech perturbations and on speech under stress. In most cases, they showed statistically significant differences with respect to the level of speech perturbation and the level of stress. They also proved satisfactorily robust to the voluntary component of pronunciation, in particular the dynamics of the fundamental frequency, which is their main advantage over conventional speech perturbation measures (e.g., jitter measures). The fourth chapter validates F0 features in the context of the acoustic startle response. Features such as peak value, peak time and duration are evaluated as functions of the startle stimulus parameters (intensity, duration, rise time and spectral characteristics) and of the presence and intensity of the startle response. The F0 response features are compared with EMG features of the orbicularis oculi muscle response (eyeblink), which is considered the reference measure for startle reaction analysis. The F0 and EMG responses behaved similarly when the intensity of the startle stimulus was varied, and in both cases the response peak value showed the highest statistically significant difference. At higher stimulus intensities, the peak values of both F0 and EMG responses showed a significant increasing trend with increasing stimulus intensity. The fifth chapter describes the methodology of emotional state estimation based on acoustic speech features, which is conventionally performed in four sequential steps: speech signal processing and extraction of acoustic measures; calculation of features from the acoustic measures; feature space reduction; and estimation of emotional states using machine learning methods.
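The F0 startle-response features named above (peak value, peak time, duration) can be sketched as follows. The definitions are assumptions for illustration, not taken from the thesis: the baseline is the mean F0 over a 200 ms pre-stimulus window, the response is F0 minus that baseline, peak time is measured from stimulus onset, and duration is the span between the first and last post-onset samples at or above half the peak. The function name `startle_f0_features` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def startle_f0_features(times: Sequence[float], f0: Sequence[float],
                        onset: float, baseline_window: float = 0.2,
                        threshold_ratio: float = 0.5
                        ) -> Tuple[float, float, float]:
    """Return (peak value in Hz, peak time after onset in s, duration in s)
    of the F0 response to a startle stimulus presented at `onset`.
    Hypothetical feature definitions, for illustration only."""
    # Baseline: mean F0 over the pre-stimulus window [onset - window, onset).
    base = [f for t, f in zip(times, f0) if onset - baseline_window <= t < onset]
    if not base:
        raise ValueError("no F0 samples in the baseline window")
    baseline = sum(base) / len(base)
    # Response: post-onset F0 relative to the baseline.
    post = [(t, f - baseline) for t, f in zip(times, f0) if t >= onset]
    if not post:
        raise ValueError("no F0 samples after stimulus onset")
    peak_time, peak_value = max(post, key=lambda p: p[1])
    if peak_value <= 0:
        # No F0 rise above baseline: report zero duration.
        return peak_value, peak_time - onset, 0.0
    # Duration: first to last post-onset sample at or above the threshold.
    level = threshold_ratio * peak_value
    above = [t for t, r in post if r >= level]
    return peak_value, peak_time - onset, above[-1] - above[0]
```

Unvoiced frames, where F0 is undefined, would need to be removed or interpolated before calling such a function.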
This thesis proposes an upgrade of the conventional architecture for estimating the emotion dimensions valence and arousal, based on the a priori relationship between the two dimensions. An a priori model is applied to the conventional estimation process in order to shift estimation results in the valence-arousal space toward more probable values, according to the level of estimation uncertainty. Two approaches to modeling the a priori knowledge were taken: (a) a single integral model over the valence-arousal space, and (b) an integration of multiple models representing different discrete emotions in the valence-arousal space, specifically happiness, sadness, fear, anger and the neutral state. The emotional state estimation system is built and validated using utterances from the Croatian emotional speech corpus, which was collected and annotated in collaboration with the University of Zagreb, Faculty of Humanities and Social Sciences. The sixth chapter validates machine learning methods, specifically support vector machines and random forests, for estimating emotional states, stress and startle responses, and compares the improvements proposed in the thesis with conventional approaches from the literature. The results justify introducing the new speech perturbation features for classifying speech under stress, applying F0 features to startle response analysis, and using the proposed enhanced method for estimating emotional states. The last chapter concludes the doctoral thesis, discusses specific applications of the proposed methods and suggests directions for future research.
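One way to "shift estimates toward more probable values according to uncertainty" is to take a Bayesian posterior mean; the sketch below uses that technique and is not necessarily the model used in the thesis. It assumes the raw (valence, arousal) estimate equals the true point plus isotropic Gaussian noise of variance `sigma2` (standing in for estimation uncertainty), and models the prior as a mixture of isotropic Gaussians, one per discrete emotion, in the spirit of approach (b). The names `shift_toward_prior` and `PRIOR` are hypothetical, and the component weights, centres and variances are placeholders, not values derived from the corpus.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# (weight, centre, variance) of one isotropic Gaussian prior component
Component = Tuple[float, Point, float]

# Hypothetical prior over a valence-arousal space scaled to [-1, 1].
# Centres, weights and variances are illustrative placeholders only.
PRIOR: List[Component] = [
    (0.2, (0.0, 0.0), 0.05),    # neutral
    (0.2, (0.7, 0.5), 0.05),    # happiness
    (0.2, (-0.6, -0.5), 0.05),  # sadness
    (0.2, (-0.6, 0.7), 0.05),   # fear
    (0.2, (-0.5, 0.6), 0.05),   # anger
]


def shift_toward_prior(est: Point, sigma2: float,
                       prior: Sequence[Component] = PRIOR) -> Point:
    """Posterior mean of the true (valence, arousal) point, assuming
    est = true point + isotropic Gaussian noise with variance sigma2
    and a Gaussian-mixture prior."""
    x, y = est
    log_resp: List[float] = []
    means: List[Point] = []
    for w, (mx, my), tau2 in prior:
        s2 = sigma2 + tau2  # variance of the estimate under this component
        d2 = (x - mx) ** 2 + (y - my) ** 2
        # Log of w * N(est; centre, s2 * I) for a 2-D isotropic Gaussian
        log_resp.append(math.log(w) - d2 / (2 * s2) - math.log(2 * math.pi * s2))
        # Per-component posterior mean: (tau2 * est + sigma2 * centre) / s2
        k = tau2 / s2
        means.append((k * x + (1 - k) * mx, k * y + (1 - k) * my))
    # Normalize responsibilities in the log domain to avoid underflow
    top = max(log_resp)
    resp = [math.exp(lr - top) for lr in log_resp]
    total = sum(resp)
    return (sum(r * p[0] for r, p in zip(resp, means)) / total,
            sum(r * p[1] for r, p in zip(resp, means)) / total)
```

With a very small `sigma2` the estimate is returned almost unchanged; as `sigma2` grows, each component's posterior mean moves toward its centre, so the output is pulled toward the high-density regions of the prior. An integral model, as in approach (a), would correspond to a single prior density in place of the mixture.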
Keywords
emotional state estimation
data mining
emotional states
speech signal
acoustic speech features
speech perturbation
acoustic startle
emotional speech corpus
Language: Croatian
URN:NBN urn:nbn:hr:168:872424
Study programme: Name: Electrical Engineering and Computing; Type of study: university; Level of study: postgraduate doctoral; Academic / professional title: Doctor of Science in Electrical Engineering and Computing (dr.sc.)
Resource type: Text
Extent: 168 pp.; 30 cm
File creation method: Born digital
Access rights: Closed access
Date and time of deposit: 2019-04-16 16:14:44