Denisa MILLO - ANALIZA EMOCIONALE PËR KORPUSIN E TEKSTIT MEDIATIKO-EKONOMIK NË GJUHËN SHQIPE. ZGJEDHJA E MODELIT OPTIMAL DUKE PËRDORUR TRANSFORMUESIT.

Dissertation title: ANALIZA EMOCIONALE PËR KORPUSIN E TEKSTIT MEDIATIKO-EKONOMIK NË GJUHËN SHQIPE. ZGJEDHJA E MODELIT OPTIMAL DUKE PËRDORUR TRANSFORMUESIT. (Emotion Analysis for the Albanian-Language Media-Economic Text Corpus: Selecting the Optimal Model Using Transformers)

Author: Denisa MILLO
Institution: University of Tirana, Faculty of Economy, Department of Statistics and Applied Informatics
Field of study: Information Systems in Economics
Publication date: 27.04.2026
The dissertation is published in Albanian.

© Copyright: Denisa MILLO

Published by the University of Tirana
Based on the legal acts, regulations, and policies of UT

Abstrakti:

Ky studim adreson mungesën e korpuseve gjuhësore në gjuhën shqipe, veçanërisht në fushën ekonomike, ku analiza tekstuale kërkon saktësi të lartë dhe modele të afta për të kapur nuancat semantike. Punimi i përgjigjet pyetjeve kërkimore lidhur me ndërtimin e korpusit, gjetjen e modeleve transformuese të përshtatshme për gjuhën shqipe, krahasimin e tyre për gjetjen e modelit me performancë më të mirë dhe adresimin e kufizimeve metodologjike dhe teknike.

Analiza e literaturës konfirmon se modelet XLM-R dhe DistilBERT janë ndër modelet më të përshtatshme për gjuhët me burime të pakëta dhe se, të kombinuara me Regresionin Logjistik, ato mund të përdoren për klasifikimin shumëklasësh të emocioneve.

Analiza krahasuese dhe rezultatet tregojnë se DistilBERT-i, falë kompaktësisë dhe efikasitetit llogaritës, arrin performancë më të qëndrueshme në korpusin e përdorur në gjuhën shqipe, me F1-score 0.76 kundrejt 0.72 të XLM-R-së. Analiza e matricave të konfuzionit konfirmoi aftësinë e DistilBERT-it për të diferencuar më qartë emocionet semantikisht të përafërta.

Megjithatë, studimi vuri në dukje kufizime të rëndësishme, përfshirë mungesën e korpuseve të mëdha në shqip, morfologjinë komplekse të gjuhës, mungesën e përshtatjes së imët (fine-tuning) të plotë dhe kufizimet infrastrukturore në trajtimin e modeleve shumëgjuhëshe. Rekomandimet përfshijnë zgjerimin e korpuseve ekonomike në gjuhën shqipe, ndërtimin e tokenizuesve të dedikuar për shqipen dhe përdorimin e GPU-ve ose TPU-ve për trajnime të avancuara.

Studimi kontribuoi në themelimin e një baze shkencore empirike për krijimin dhe analizën emocionale të korpusit të tekstit mediatiko-ekonomik. Metodologjia e krijuar mund të përdoret në tekste të tjera dhe hedh themelet për identifikimin e problemeve të mashtrimit apo propagandës së realizuar me mjete mediatike.

Nënfusha e Studimit:

Sistemet e Informacionit në Ekonomi

Kodi i nënfushës: 061 sipas ISCED-06.

Fjalët kyçe:

Përpunimi i gjuhës natyrore (NLP), analizë emocionale, korpusi ekonomik në gjuhën shqipe, transformuesit, XLM-R, DistilBERT, Regresioni logjistik, gjuhët me burime të kufizuara, metrika F1-score.

Abstract:

This study addresses the lack of Albanian-language corpora, especially in the economic domain, where textual analysis requires high accuracy and models capable of capturing semantic nuances. The study answers research questions on corpus construction, on identifying transformer models suitable for the Albanian language, on comparing them to determine the best-performing model, and on addressing methodological and technical limitations.

The literature review confirms that XLM-R and DistilBERT are among the most suitable models for low-resource languages, and that, when combined with Logistic Regression, they can be used for multiclass emotion classification.
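
As a reading aid, the combination of a transformer with Logistic Regression can be written as multinomial logistic regression over a sentence embedding. The notation is an assumption made for illustration, since the abstract gives no formulas: $h(x) \in \mathbb{R}^d$ is the embedding the transformer produces for a text $x$, $K$ is the number of emotion classes, and $w_k$, $b_k$ are the classifier's weights and bias for class $k$.

```latex
P(y = k \mid x) = \frac{\exp\left(w_k^\top h(x) + b_k\right)}{\sum_{j=1}^{K} \exp\left(w_j^\top h(x) + b_j\right)},
\qquad
\hat{y} = \arg\max_{k \in \{1, \dots, K\}} P(y = k \mid x)
```

Under this reading, only $w_k$ and $b_k$ would be trained while the transformer serves as a fixed feature extractor. That would be consistent with the limitation on full fine-tuning mentioned in the abstract, but the abstract does not state the exact setup.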

The comparative analysis and results show that DistilBERT, thanks to its compactness and computational efficiency, achieves more stable performance on the Albanian-language corpus used, with an F1-score of 0.76 compared to 0.72 for XLM-R. The analysis of the confusion matrices confirmed DistilBERT’s ability to differentiate more clearly between semantically similar emotions.
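
For readers unfamiliar with the two evaluation tools named above, the following minimal sketch shows how a confusion matrix and an F1-score can be computed for a multiclass problem. It is an illustration under stated assumptions, not the dissertation's code: the emotion labels and predictions are made-up toy data, and macro averaging is assumed because the abstract does not say which F1 averaging was used.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores (macro averaging, an assumption)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[row][k] for row in range(n)) - tp  # predicted k, but wrong
        fn = sum(matrix[k]) - tp                           # true k, but missed
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / n


# Hypothetical labels and predictions, for illustration only.
labels = ["joy", "fear", "anger"]
y_true = ["joy", "joy", "fear", "fear", "anger", "anger"]
y_pred = ["joy", "fear", "fear", "fear", "anger", "joy"]

matrix = confusion_matrix(y_true, y_pred, labels)
score = macro_f1(matrix)
```

Off-diagonal cells of `matrix` show which pairs of emotions a model confuses, which is the kind of evidence the abstract refers to when comparing how well the two models separate semantically similar emotions.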

However, the study highlighted important limitations, including the absence of large Albanian corpora, the complex morphology of the language, the lack of full fine-tuning, and infrastructural constraints when working with multilingual models. Recommendations include expanding Albanian economic corpora, building dedicated tokenizers for Albanian, and using GPUs or TPUs for advanced training.

The study contributed to establishing an empirical scientific foundation for the creation and emotion analysis of a media-economic text corpus. The proposed methodology can be applied to other texts and lays the groundwork for identifying issues of deception or propaganda produced through media tools.

Sub-Field:

Information Systems in Economics

Code: 061 as per ISCED-06.

Keywords:

Natural Language Processing (NLP), emotion analysis, Albanian economic corpus, transformers, XLM-R, DistilBERT, Logistic Regression, low-resource languages, F1-score metric.