Information Systems Frontiers, Volume 19, Issue 1, pp 109–127

Extracting the patterns of truthfulness from political information systems in Serbia

Abstract

In modern information societies, information systems track and log parts of the ongoing political discourse. Because of the sheer volume of accumulated data, automated tools are needed to help citizens interpret political statements and promises and evaluate their truthfulness. We propose an approach that applies established machine learning and data mining techniques to annotated political statements and promises available through the Serbian Truth-o-meter (Istinomer) system, in order to extract and interpret hidden patterns of truthfulness and deceit. We perform standard text processing and topic extraction, and associate topical truthfulness profiles with the promise makers for pattern discovery and prediction. Prevailing trends in Serbian political discourse emerge as strong association rules in which truthfulness is the target variable. The standard content-based prediction models we evaluated are biased towards negative outcomes, owing to the overall low truthfulness rate in the data. Our results show that data mining can be used within political information systems to generate insights into the workings of governments.
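To make the idea of association rules with truthfulness as the target variable concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each statement has already been reduced to a set of attribute=value items (for example a topic and a promise maker) plus a truthfulness label, and it enumerates only rules whose consequent is that label, scoring them with the standard support, confidence, and lift measures. The records, item names, thresholds, and the function `target_rules` are all hypothetical and are not taken from Istinomer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (NOT Istinomer data): each statement is a set of
# attribute=value items plus a truthfulness label used as the rule consequent.
RECORDS = [
    ({"topic=economy", "maker=gov"}, "false"),
    ({"topic=economy", "maker=gov"}, "false"),
    ({"topic=economy", "maker=opp"}, "true"),
    ({"topic=health", "maker=gov"}, "false"),
    ({"topic=health", "maker=opp"}, "false"),
    ({"topic=infra", "maker=gov"}, "true"),
]


def target_rules(records, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate rules X -> label, where X is an itemset of size <= max_len.

    Returns tuples (antecedent, label, support, confidence, lift), sorted by
    descending confidence and then descending support.
    """
    n = len(records)
    label_counts = Counter(label for _, label in records)
    antecedent_counts = Counter()  # how many records contain itemset X
    joint_counts = Counter()       # how many records contain X and have the label
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                antecedent_counts[combo] += 1
                joint_counts[(combo, label)] += 1

    rules = []
    for (combo, label), joint in joint_counts.items():
        support = joint / n                            # P(X and label)
        confidence = joint / antecedent_counts[combo]  # P(label | X)
        lift = confidence / (label_counts[label] / n)  # P(label | X) / P(label)
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence, lift))
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


if __name__ == "__main__":
    for combo, label, sup, conf, lift in target_rules(RECORDS):
        print(f"{' & '.join(combo)} -> {label}: "
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data every surviving rule predicts the majority label "false", which mirrors the skew the abstract describes. Lift helps separate informative rules from ones that merely restate the base rate: `topic=economy -> false` has confidence 0.67 but lift 1.0, because 4 of the 6 toy statements are false anyway.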

Keywords

Data mining, Text mining, Information systems, Politics, Truthfulness, Association rules

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia
