Sažetak | Govorna interakcija u stvarnom vremenu s robotom TIOSS koristeći umjetnu inteligenciju
Već se dvije godine radi na temeljitoj obnovi prvog hrvatskog robota TIOSS. TIOSS je aluminijski dvometraš kojeg su šezdesetih godina 20. stoljeća izradili članovi Kibernetičke grupe tadašnjeg Elektrotehničkog fakulteta u Zagrebu. Iako je prošlo više od šest desetljeća otkako je prvi puta predstavljen javnosti i tehnologija je izrazito napredovala, njegova pojava i dalje pobuđuje interes javnosti. Prošle se godine provodila intenzivna renovacija i modernizacija sustava i obnova motora koji će omogućiti ponovno kretanje glave i nogu. Nakon što su ponovno osposobljene neke od prvotnih funkcija, počelo se raspravljati i o dodatnoj modernizaciji. Tako je proizašla ideja za temu ovoga diplomskog rada koji se bavi implementacijom glasovne komunikacije s TIOSS-om. U budućnosti je ideja TIOSS-a izlagati na raznim manifestacijama te je bilo bitno učiniti ga što zanimljivijim posjetiteljima. Glasovna komunikacija u velikoj mjeri doprinosi doživljaju čovjek-robot interakcije. Planirano je koristiti jednokomponentno računalo Raspberry Pi za izvođenje programa. Osim toga, planirano je ugraditi mikrofon i zvučnike kako bi komunikacija bila moguća. Nakon opsežnog pregleda literature o trenutnim dostignućima modela prepoznavanja i sinteze govora, implementiran je sustav koji omogućuje prepoznavanje i sintezu govora. Za implementaciju korišten je programski jezik Python. Također, sustav komunicira s velikim jezičnim modelom GPT-3.5 koji omogućuje generiranje smislenih odgovora. Osim širokog znanja kojeg ima GPT-3.5, dodatno je naučen o povijesti TIOSS-a, njegovim tvorcima, funkcionalnostima, ali i svima koji su sudjelovali na njegovoj obnovi. Problematika zadatka leži u tome što potrebni modeli zahtijevaju velike računalne i vremenske resurse za svoje treniranje i izvođenje, a Raspberry Pi ima poprilično ograničene resurse. U konačnici, pronađena su optimalna rješenja i omogućena je tečna komunikacija.
|
Sažetak (engleski) | Real-time speech interaction with the TIOSS robot using artificial intelligence
For the past two years, work has been underway on a thorough restoration of TIOSS, the first Croatian robot. TIOSS is a two-meter-tall aluminum robot built in the 1960s by members of the Cybernetic Group of what was then the Faculty of Electrical Engineering in Zagreb. Although more than six decades have passed since it was first presented to the public and technology has advanced considerably, its appearance still attracts public interest. Last year the system underwent intensive renovation and modernization, and the motors that will once again move the head and legs were restored. Once some of the original functions had been restored, discussion turned to further modernization. That led to the topic of this diploma thesis, which deals with implementing voice communication with TIOSS. The plan is to exhibit TIOSS at various events in the future, so it was important to make it as engaging as possible for visitors. Voice communication contributes greatly to the experience of human-robot interaction. A Raspberry Pi single-board computer is planned to run the program, with a microphone and speakers installed to make communication possible. After an extensive review of the literature on the current state of speech recognition and speech synthesis models, a system providing both was implemented in Python. The system also communicates with the large language model GPT-3.5, which generates meaningful responses. In addition to the broad knowledge GPT-3.5 already has, it was additionally taught about the history of TIOSS, its creators, its functionality, and everyone who took part in its restoration. The main challenge is that the required models demand substantial computing resources and time for training and execution, while the Raspberry Pi's resources are quite limited. Ultimately, optimal solutions were found and fluent communication was achieved.
|