
Extracting the patterns of truthfulness from political information systems in Serbia

Information Systems Frontiers

Abstract

In modern information societies, information systems track and log parts of the ongoing political discourse. Because of the sheer volume of accumulated data, automated tools are needed to help citizens interpret political statements and promises and evaluate their truthfulness. We propose an approach that uses established machine learning and data mining techniques to analyze the annotated political statements and promises available through the Serbian Truth-o-meter (Istinomer) system, in order to extract and interpret hidden patterns of truthfulness and deceit. We perform standard text processing and topic extraction, and associate topical truthfulness profiles with the promise makers for pattern discovery and prediction. Prevailing trends in Serbian political discourse emerge as strong association rules with truthfulness as the target variable. The standard content-based prediction models we evaluated exhibit a bias towards negative outcomes, owing to the low overall truthfulness rate in the data. Our results demonstrate that data mining can be used within political information systems to generate insights into the workings of governments.
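To make "association rules with truthfulness as the target variable" concrete, here is a minimal sketch of class-association rule mining in Python. It is not the paper's implementation. Only rules whose consequent is the truthfulness label are generated, and each rule is scored by support, confidence and lift. The records, the `speaker=`/`topic=` items and the `"true"`/`"false"` labels are hypothetical toy data, not Istinomer data, and the brute-force enumeration of small itemsets stands in for an Apriori-style miner. A majority-class baseline is included to show why a low truthfulness rate pushes predictors towards the negative outcome.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (NOT Istinomer data): each statement is a set of
# "attribute=value" items plus a truthfulness label.
records = [
    ({"speaker=A", "topic=economy"}, "false"),
    ({"speaker=A", "topic=economy"}, "false"),
    ({"speaker=A", "topic=health"}, "true"),
    ({"speaker=B", "topic=economy"}, "false"),
    ({"speaker=B", "topic=health"}, "true"),
    ({"speaker=B", "topic=health"}, "false"),
]


def class_association_rules(records, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate rules X -> label where the consequent is always the label.

    Brute-force enumeration of antecedents up to max_len items; a real
    miner would prune candidates Apriori-style instead.
    """
    n = len(records)
    label_counts = Counter(label for _, label in records)
    antecedent_counts = Counter()  # how often each itemset X occurs
    rule_counts = Counter()        # how often X occurs together with a label
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                antecedent_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n                          # P(X and label)
        confidence = count / antecedent_counts[combo]  # P(label | X)
        lift = confidence / (label_counts[label] / n)  # vs. P(label)
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence, lift))
    # Strongest rules first: highest confidence, then highest support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


if __name__ == "__main__":
    for antecedent, label, sup, conf, lift in class_association_rules(records):
        print(" & ".join(antecedent), "->", label,
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
    print("majority baseline:", majority_baseline_accuracy([l for _, l in records]))
```

Fixing the consequent to the label restricts the search to rules that are directly interpretable as "under these conditions, statements tend to be true/false". Lift above 1 marks antecedents that shift the truthfulness rate away from its overall base rate. When most labelled statements are false, always predicting "false" already scores well on accuracy, which is why class-aware metrics matter when evaluating such models.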


Figs. 1–21 (images not included)

Author information

Correspondence to Nenad Tomašev.

About this article

Cite this article

Tomašev, N. Extracting the patterns of truthfulness from political information systems in Serbia. Inf Syst Front 19, 109–127 (2017). https://doi.org/10.1007/s10796-015-9596-8

  • DOI: https://doi.org/10.1007/s10796-015-9596-8