Research Papers

http://drr.vau.ac.lk/handle/123456789/253
Mon, 06 Jul 2026 21:21:06 GMT

Common technique for detecting and correcting both non-word and real-word errors in Tamil sentences
http://drr.vau.ac.lk/handle/123456789/1784
Sakuntharaj, R.; Mahesan, S.
Erroneous words can be classified into two categories, namely non-word errors and real-word errors. These errors can occur in sentences when typing a document due to fast typing, switching of fingers on keys, input tools and method, or not knowing the right pronunciation, correct spelling or the meaning of the word. A common approach to correcting non-word and real-word errors in Tamil language is proposed in this paper. Erroneous words are detected by considering the appropriateness of the words in the context of the sentence. A bigram probabilistic model is constructed as it is simple and found to be good enough to determine the appropriateness of the valid word in the context of the sentence (than a trigram model). In case of lacking appropriateness, the word is marked as an erroneous word (non-word or real-word error) and word-level trigram technique is used to generate suggestions. In case of finding more than three suggestions, word-level n-gram (unigram, bigram & trigram) language probabilistic model is constructed to determine suggestions appropriate to the context. Test results show that the proposed erroneous word detection and correction system performs well. In our testing with 9170 sentences having 142 non-word errors & 119 real-word errors, bigram probabilistic model detects all of them successfully. The bigram probabilistic model detects non-word as well as real-word errors. For the 261 erroneous words, error correction module gives 583 suggestions, and 569 of 583 suggestions are found to be appropriate to the context.
The suggestions produced by the system were checked by a scholar of the Tamil language and found to be 97.6% correct, with an F1-score of 0.99. This shows that the approach, which proved to be good for detecting and correcting real-word errors, can be used for non-word errors as well.
Wed, 01 Jan 2020 00:00:00 GMT

Do images really do the talking? Analysing the significance of images in Tamil troll meme classification
http://drr.vau.ac.lk/handle/123456789/1783
Hegde, S. U.; Hande, A.; Priyadharshini, R.; Thavareesan, S.; Sakuntharaj, R.; Thangasamy, S.; Bharathi, B.; Chakravarthi, B.
A meme is a piece of media created to share an opinion or emotion across the internet. Due to their popularity, memes have become the new form of communication on social media. However, owing to their nature, they are increasingly used in harmful ways such as trolling and cyberbullying. Various data modelling methods create different possibilities in feature extraction and turn them into beneficial information. The variety of modalities included in data plays a significant part in predicting the results. We try to explore the significance of visual features of images in classifying memes. Memes are a blend of both image and text, where the text is embedded into the picture. We consider a meme to be trolling if the meme in any way tries to troll a particular individual, group, or organisation. We try to classify memes as trolling or non-trolling based on their images and text. We evaluate whether the visual features have any major significance for identifying whether a meme is trolling or not. Our work illustrates different textual analysis methods and contrasting multimodal approaches, ranging from simple merging to cross attention, that utilise both worlds: visual and textual features.
The fine-tuned cross-lingual language model, XLM, performed the best in textual analysis, and the multimodal transformer performed the best in multimodal analysis.
Wed, 01 Jan 2025 00:00:00 GMT

Extraction of Sentiments in Tamil Sentences Using Deep Learning
http://drr.vau.ac.lk/handle/123456789/1782
Loganathan, H.; Sakuntharaj, R.
Sentiment analysis is the process of extracting information from a given text that expresses various feelings, such as happiness, perturbation, pride, and worry, about various functions, human beings, systems, and facts. Sentiment analysis, or opinion mining, uses data mining and natural language processing techniques to discover, retrieve and filter the information and opinions from the World Wide Web’s vast textual information. The sentiment analysers for European languages and some Indic languages are fully developed. However, Tamil, which is an under-resourced language with rich morphology, has not experienced these advancements. A few experiments have been conducted to determine the sentiments for Tamil text. An approach to sentiment analysis for the Tamil language is proposed in this paper. The proposed approach uses Long Short-Term Memory, Convolutional Neural Network, and simple Deep Neural Network techniques. Test results show that the Long Short-Term Memory-based deep learning model performs better than the Convolutional Neural Network and the simple Deep Neural Network for sentiment analysis of the Tamil language, with 94.10% accuracy.
Sat, 01 Jan 2022 00:00:00 GMT

A Sequential DNN for Sentiment Analysis of Dravidian Code-Mixed Language Comments on YouTube
http://drr.vau.ac.lk/handle/123456789/1781
Aaron Samuel, A.; Sambath Kumar, L.; Navaneethakrishnan, S.; Sakuntharaj, R.
Sentiment analysis is a method for determining whether a block of text is positive, neutral, or negative. As code-mixed material in many native languages is becoming increasingly widespread, there is also an increasing need for intensive research in order to produce satisfactory results. This paper aims to classify the sentiments of comments/posts in code-mixed Tamil, Malayalam, and Kannada text into pre-defined classes using a Sequential Deep Learning model. The sequential model achieved an F1-score of 0.20 for Tamil-English, 0.48 for Malayalam-English, and 0.47 for Kannada-English data sets. The results were submitted to the competition ‘Shared Task on Sentiment Analysis and Homophobia detection of YouTube comments in Code-Mixed Dravidian Languages’ organized by DravidianLangTech.
Sat, 01 Jan 2022 00:00:00 GMT
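
The detection step described in the abstract by Sakuntharaj and Mahesan (marking a word as erroneous when it is not appropriate in the context of the sentence, judged by a bigram probabilistic model) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy English corpus (used only for readability), the zero-probability threshold, and the rule of skipping the word that follows a flagged word are all assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def flag_suspect_words(sentence, unigrams, bigrams, threshold=0.0):
    """Return words whose probability given the previous word is <= threshold.

    Assumption: the word right after a flagged word is not checked, because
    its left context is itself suspect and would cause a false alarm.
    """
    tokens = ["<s>"] + sentence.split()
    suspects = []
    previous_flagged = False
    for prev, word in zip(tokens, tokens[1:]):
        if previous_flagged:
            previous_flagged = False
            continue
        if bigram_probability(prev, word, unigrams, bigrams) <= threshold:
            suspects.append(word)
            previous_flagged = True
    return suspects


# Toy training corpus (hypothetical data, not from the paper).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
unigrams, bigrams = train_bigrams(corpus)

# "in" never occurs in the corpus (a non-word error relative to it).
print(flag_suspect_words("the cat sat in the mat", unigrams, bigrams))
# "rug" is a valid corpus word used in the wrong context (a real-word error).
print(flag_suspect_words("the cat rug on the mat", unigrams, bigrams))
```

Both kinds of error are caught by the same context check, which is the property the abstract reports for the bigram model. A practical system would need a large corpus and some form of smoothing so that valid but unseen word pairs are not flagged.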