Cognitive Computation, Volume 9, Issue 2, pp 259–274

SLT-Based ELM for Big Social Data Analysis

  • Luca Oneto
  • Federica Bisio
  • Erik Cambria
  • Davide Anguita


Recently, social networks and other forms of media communication have attracted the interest of both the scientific and the business worlds, driving the development of opinion and sentiment analysis. Coping with the huge amount of information available on the Web is a crucial task, and it calls for efficient models able to tackle it. To this end, this work proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how the most recent advances in statistical learning theory (SLT) can support the development of an efficient extreme learning machine (ELM) and the assessment of the resulting model’s performance when applied to big social data analysis. ELM, developed to overcome some issues in back-propagation networks, is a powerful learning tool. However, it must cope with a large number of available samples, and its generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits the Spark distributed in-memory technology and show how to take advantage of SLT results in order to select the ELM hyperparameters that provide the best generalization performance.
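As a rough illustration of the two ingredients named above, the following sketch trains a single-hidden-layer ELM (random, untrained hidden weights plus a regularized least-squares output layer) and picks its hyperparameters on a small grid. It is not the authors' Spark implementation, and it replaces the SLT-based selection criterion with a plain hold-out error estimate. All function names, the toy data, and the assumption that inputs lie in [-1, 1] are hypothetical choices made for this example.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hidden(x, weights, biases):
    """Random sigmoid feature map; assumes bounded inputs so exp() cannot overflow."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
            for ws, b in zip(weights, biases)]


def train_elm(xs, ys, n_hidden, lam, seed=0):
    """Draw the hidden layer at random, then fit only the output weights."""
    rng = random.Random(seed)
    d = len(xs[0])
    weights = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden(x, weights, biases) for x in xs]
    # Regularized normal equations: (H^T H + lam * I) beta = H^T y
    a = [[sum(r[i] * r[j] for r in h) + (lam if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    b = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(n_hidden)]
    return weights, biases, solve(a, b)


def predict(model, x):
    """Return the predicted polarity label, +1 or -1."""
    weights, biases, beta = model
    score = sum(bj * hj for bj, hj in zip(beta, hidden(x, weights, biases)))
    return 1 if score >= 0 else -1


def error_rate(model, xs, ys):
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(ys)


def select_model(train, valid, grid):
    """Return (error, n_hidden, lam, model) with the lowest hold-out error on the grid."""
    best = None
    for n_hidden, lam in grid:
        model = train_elm(train[0], train[1], n_hidden, lam)
        err = error_rate(model, valid[0], valid[1])
        if best is None or err < best[0]:
            best = (err, n_hidden, lam, model)
    return best


def make_toy_data(n, seed):
    """Hypothetical two-feature data in [-1, 1]^2, labelled by the sign of x0 + x1."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ys = [1 if x[0] + x[1] >= 0 else -1 for x in xs]
    return xs, ys
```

Calling `select_model(make_toy_data(200, 1), make_toy_data(100, 2), [(5, 1e-2), (20, 1e-2)])` returns the hold-out error, the chosen number of hidden neurons, the chosen regularization value, and the fitted model. Only the output weights are learned, which is why training reduces to one linear solve.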


  • Sentiment analysis
  • Big data
  • Extreme learning machines
  • Model selection
  • Spark distributed in-memory computing


Compliance with Ethical Standards

Conflict of Interest

The authors have received no grants. All the authors declare they have no conflict of interest.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. DIBRIS, University of Genova, Genova, Italy
  2. aizoOn S.r.l., Torino, Italy
  3. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
