Memes Under the Lens: Multimodal Offensive Content Classification Using Text and Images
Author(s): Jasraj Singh Sehmbey1, Kanishk Srivastava2, Tejasv Kaushik3, Dinesh Kumar Vishwakarma4
Affiliation: 1,2,3Department of Information Technology, Delhi Technological University, Delhi, India; 4Professor, Department of Information Technology, Delhi Technological University, Delhi, India
Page No: 27-36
Volume Issue & Publishing Year: Volume 2, Issue 5, May 2025
Journal: International Journal of Advanced Multidisciplinary Application (IJAMA)
ISSN NO: 3048-9350
DOI: https://doi.org/10.5281/zenodo.17523653
Abstract:
Memes, with their fusion of images and text, have become a cornerstone of digital communication, encapsulating humor, cultural critique, and social commentary. However, their potential to disseminate offensive or harmful content presents a formidable challenge for automated content moderation systems, which often struggle to decipher the complex interplay between visual and textual elements. This study proposes a multimodal deep learning framework to identify offensive memes, utilizing a robust dataset of annotated memes designed to test the synergy of text and image modalities. The approach employs the Inception-ResNet-V2 model, an advanced convolutional neural network, to extract intricate visual features from meme images, complemented by a transformer-based model that captures nuanced textual semantics. These modalities are integrated through a late-fusion strategy, enabling the model to interpret combined meanings that elude unimodal systems. Experimental evaluation yields an overall accuracy of 55% and a macro-averaged F1-score of 0.51. The framework demonstrates notable strength in detecting non-offensive content, with a recall of 0.83, indicating reliability in identifying benign memes. However, its lower recall of 0.26 for offensive content highlights the difficulty of capturing subtle or context-dependent harmful intent. These findings illuminate the intricacies of multimodal classification and underscore the need for advanced techniques to address semantic ambiguities. By enhancing the detection of offensive content, this research contributes to the development of more effective content moderation tools, fostering safer and more inclusive online environments. It also lays a foundation for future explorations into real-time applications and cross-cultural adaptations, addressing the evolving landscape of digital communication.
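To ground the architecture described in the abstract, the following is a minimal PyTorch sketch of the late-fusion design, assuming a timm Inception-ResNet-V2 backbone for the visual branch and a bert-base-uncased encoder for the textual branch; the fusion head's hidden width (256) and dropout rate (0.3) are illustrative assumptions, not values reported in the paper.

```python
# Minimal late-fusion sketch (not the authors' exact code): image features from
# Inception-ResNet-V2 are concatenated with BERT's pooled [CLS] text features,
# then passed to a small classifier head.
import torch
import torch.nn as nn
import timm                                   # provides 'inception_resnet_v2'
from transformers import BertModel

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Visual branch: Inception-ResNet-V2 with the head removed;
        # num_classes=0 makes timm return globally pooled features (1536-d).
        self.vision = timm.create_model(
            "inception_resnet_v2", pretrained=True, num_classes=0)
        # Textual branch: BERT encoder; we use its pooled [CLS] output (768-d).
        self.text = BertModel.from_pretrained("bert-base-uncased")
        fused_dim = self.vision.num_features + self.text.config.hidden_size
        # Late fusion: concatenate modality features, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256),        # hidden width is an assumption
            nn.ReLU(),
            nn.Dropout(0.3),                  # dropout rate is an assumption
            nn.Linear(256, num_classes),
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        img_feat = self.vision(pixel_values)                      # (B, 1536)
        txt_feat = self.text(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output  # (B, 768)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=-1))
```

Because fusion happens only at the feature level, each encoder can be pretrained and fine-tuned independently; the trade-off is that cross-modal interactions, such as irony between a caption and its image, are modeled solely by the small classifier head, which is consistent with the low offensive-class recall reported above. The reported figures (55% accuracy, 0.51 macro-F1, per-class recalls of 0.83 and 0.26) correspond to standard scikit-learn metrics; a brief sketch with hypothetical labels:

```python
# Hypothetical labels and predictions (0 = non-offensive, 1 = offensive);
# the real evaluation would use the paper's annotated test split.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)                  # paper reports 0.55
macro_f1 = f1_score(y_true, y_pred, average="macro")       # paper reports 0.51
class_recall = recall_score(y_true, y_pred, average=None)  # paper: [0.83, 0.26]
print(accuracy, macro_f1, class_recall)
```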
Keywords: Multimodal Classification, Offensive Content Detection, Memes, Deep Learning, Convolutional Neural Networks, Transformer Models, Late Fusion, Content Moderation, Social Media, Hate Speech
References:
- 1. Kiela, D., et al. (2020). The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes. arXiv preprint arXiv:2005.04790.
- 2. Hossain, E., et al. (2022). MemoSen: A Multimodal Dataset for Sentiment Analysis of Memes. LREC 2022, pp. 20–25.
- 3. Suryawanshi, S., et al. (2020). Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text. Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, pp. 32–41.
- 4. Afridi, T. H., et al. (2021). A Multimodal Memes Classification: A Survey and Open Research Issues. Innovations in Smart Cities Applications Volume 4, pp. 1451–1466.
- 5. Kannan, R., & Rajalakshmi, R. (2022). Multimodal Code-Mixed Tamil Troll Meme Classification Using Feature Fusion. ACL Anthology, pp. 1–8.
- 6. Ouaari, S., et al. (2022). Multimodal Feature Extraction for Memes Sentiment Classification. IEEE 2nd Conference on Information Technology and Data Science (CITDS), pp. 285–290.
- 7. Suryawanshi, S., et al. (2023). Multimodal Offensive Meme Classification with Natural Language Inference. ACL Anthology.
- 8. Wu, F., et al. (2024). Multimodal Hateful Meme Classification Based on Transfer Learning and a Cross-Mask Mechanism. Electronics, 13, 2780.
- 9. Thakur, A. K., et al. (2022). Multimodal and Explainable Internet Meme Classification. arXiv:2212.05612.
- 10. Chen, Y., & Pan, F. (2022). Multimodal Detection of Hateful Memes by Applying a Vision-Language Pre-Training Model. PLOS ONE.
- 11. Suryawanshi, S., et al. (2020). Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text. Academia.edu.
- 12. Alzu'bi, A., et al. (2023). Multimodal Deep Learning with Discriminant Descriptors for Offensive Memes Detection. Journal of Data and Information Quality.
- 13. Deng, X., et al. (2023). Meme-Integrated Deep Learning: A Multimodal Classification Fusion Framework to Fuse Meme Culture into Deep Learning. Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023).
- 14. Sabat, B. O., et al. (2019). Hate Speech in Pixels: Detection of Offensive Memes Towards Automatic Moderation. arXiv preprint arXiv:1910.02334.
- 15. Zhou, Z., et al. (2022). DD-TIG at SemEval-2022 Task 5: Investigating the Relationships Between Multimodal and Unimodal Information in Misogynous Memes Detection and Classification. SemEval-2022.
- 16. Potrimba, M. (2023). Multimodal Computation or Interpretation? Automatic vs. Critical Understanding of Text-Image Relations in Racist Memes in English. ScienceDirect.
- 17. Yuan, Z., et al. (2021). Transformer-Based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis. Proceedings of the 29th ACM International Conference on Multimedia, pp. 4400–4407.
- 18. El-Niss, A., et al. (2024). Multimodal Fusion for Disaster Event Classification on Social Media: A Deep Federated Learning Approach. ResearchGate.
- 19. Sandulescu, V. (2020). Detecting Hateful Memes Using a Multimodal Deep Ensemble. arXiv preprint arXiv:2012.13235.
- 20. Beskow, D. M., et al. (2020). The Evolution of Political Memes: Detecting and Characterizing Internet Memes with Multi-Modal Deep Learning. Information Processing & Management, 57(2), 102170.
- 21. Gillespie, T. (2020). Content Moderation, AI, and the Question of Scale. Big Data & Society.
- 22. Islam, M. R., et al. (2020). Deep Learning for Misinformation Detection on Online Social Networks: A Survey and New Perspectives. Social Network Analysis and Mining.
- 23. Majumder, N., et al. (2018). Multimodal Sentiment Analysis Using Hierarchical Fusion with Context Modeling. Knowledge-Based Systems, 161, 124–133.
- 24. Shang, L., et al. (2021). AOMD: An Analogy-Aware Approach to Offensive Meme Detection on Social Media. Information Processing & Management, 58(5), 102664.
- 25. Sharma, C., et al. (2020). SemEval-2020 Task 8: Memotion Analysis – The Visuo-Lingual Metaphor! Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 759–773.
- 26. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
- 27. Devlin, J., et al. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
- 28. Baltrušaitis, T., et al. (2018). Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.
- 29. Choi, J.-H., & Lee, J.-S. (2019). EmbraceNet: A Robust Deep Learning Architecture for Multimodal Classification. Information Fusion, 51, 259–270.
- 30. Yue, L., et al. (2019). A Survey of Sentiment Analysis in Social Media. Knowledge and Information Systems, 60(2), 617–663.
- 31. Alzu'bi, A., et al. (2021). Masked Face Recognition Using Deep Learning: A Review. Electronics, 10(21), 2666.
- 32. Gandhi, A., et al. (2023). Multimodal Sentiment Analysis: A Systematic Review of History, Datasets, Multimodal Fusion Methods, Applications, Challenges and Future Directions. Information Fusion, 91, 424–444.
- 33. Jadhav, R., & Honmane, P. (2023). Late Fusion Technique for Meme Classification Using EX-OR Method. ResearchGate.
- 34. Kumari, G., et al. (2023). EmoffMeme: A Large-Scale Multimodal Dataset for Hindi. ResearchGate.
- 35. Hossain, E., et al. (2023). A Novel Multimodal Dataset for Bengali, BHM (Bengali Hateful Memes). ResearchGate.
- 36. Kirk, H. R., et al. (2023). Memes in the Wild: Analyzing Real-World Meme Challenges. ResearchGate.
- 37. French, J. H. (2023). Semantic Content Analysis of Memes in Social Media Communications. ResearchGate.
- 38. Prasad, N., & Saha, S. (2023). Multimodal Hate Speech Classifier Using CLIP Embeddings. ResearchGate.
- 39. Blandfort, P., et al. (2023). Analyzing Psychosocial Factors in Gang-Related Tweets Using Multimodal Deep Learning. ResearchGate.
- 40. Simidjievski, N., et al. (2021). Variational Autoencoders for Multimodal Data Fusion in Breast Cancer Analysis. Briefings in Bioinformatics.
- 41. Ronen, G., et al. (2021). Stacked VAE for Colorectal Cancer Survival Subtyping. Briefings in Bioinformatics.
- 42. Albaradei, S., et al. (2021). Convolutional VAE for Pan-Cancer Metastasis Prediction. Briefings in Bioinformatics.
- 43. Liang, X., et al. (2021). AF: An Association-Based Fusion Method for Multi-Modal Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 9236–9254.
- 44. Meena, G., et al. (2024). Identifying Emotions from Facial Expressions Using a Deep Convolutional Neural Network-Based Approach. Multimedia Tools and Applications, 83, 15711–15732.
- 45. Allenspach, S., et al. (2024). Neural Multi-Task Learning in Drug Design. Nature Machine Intelligence, 6, 124–137.
- 46. Zhan, J., et al. (2024). Yolopx: Anchor-Free Multi-Task Learning Network for Panoptic Driving Perception. Pattern Recognition, 148, 110152.
- 47. Zhao, X., et al. (2023). Multimodal Sentiment Analysis Model Based on BERT-VGG16. Journal of Minzu University of China (Natural Science Edition).
- 48. Peng, N., et al. (2023). Research on Chinese Internet Meme Image Discrimination Method Based on Decision-Level Fusion Strategy. Journal of Minzu University of China (Natural Science Edition).