The paper "Real-Time Kafka-Based Topic Modeling and Identification of Tweets", written by Policy Cloud partners George Manias, Argyro Mavrogiorgou, Athanasios Kiourtis, Dimitris Kakomitas and Dimosthenis Kyriazis, has been published. It examines the topic modeling and identification of tweets, a specific activity in the Policy Cloud project.
The tremendous growth, popularity, and usage of social media in modern societies have led to the production of an enormous real-time volume of social texts and posts, including Tweets produced by users. These collections of social data are potentially useful, but the extent of meaningful data in them is still of high research and business interest. One of the main tasks in several application domains, such as policy making, is identifying and categorizing these texts into natural groups based on the topics to which they refer, in order to better understand and correlate them. This has recently been realized through Topic Modeling and Identification tasks, which identify and extract subjective information and topics from raw texts with the ultimate objective of enhancing their categorization. This paper introduces an end-to-end pipeline that primarily focuses on the phases of collection and text preprocessing, as well as the utilization of Natural Language Processing and Topic Modeling models, which are considered to be of major importance for the successful Topic Modeling and Identification of Tweets and their final interpretation.
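To make the text preprocessing phase more concrete, the following is a minimal sketch of how a raw Tweet might be cleaned before it is passed to a topic model. It is not taken from the paper: the function name `preprocess_tweet`, the tiny stopword list, and the specific cleaning steps are assumptions chosen for illustration, and the Kafka-based collection and the topic modeling steps are left out.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; real pipelines use larger ones).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "rt"}

URL_RE = re.compile(r"https?://\S+")   # links carry little topical signal
MENTION_RE = re.compile(r"@\w+")       # user handles such as @user

# Map every ASCII punctuation character (including '#') to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess_tweet(text):
    """Return a list of lowercase tokens from a raw Tweet (hypothetical helper)."""
    text = URL_RE.sub(" ", text)        # drop URLs first, before punctuation removal
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = text.lower()
    text = text.translate(PUNCT_TABLE)  # '#policy' becomes ' policy'
    # Keep tokens that are neither stopwords nor pure numbers.
    return [tok for tok in text.split() if tok not in STOPWORDS and not tok.isdigit()]


tokens = preprocess_tweet("RT @user: The new #policy is live! https://t.co/abc")
print(tokens)
```

The resulting token lists could then be fed to whichever topic modeling approach is chosen; hashtags are kept as plain words here on the assumption that they often name the topic of a Tweet.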