Newspaper Selection Analysis Technique

  • Gourab Das
  • S. K. Setua
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 710)


Print agencies are fighting for their existence in the current data-driven, digital era. Every day they come up with new approaches to attract the current generation of readers, and they are now turning to data scientists for new ideas based on analysis of their future business. Following this approach, this paper predicts the reading habits of ordinary people. To analyze the dataset well, we divide the work into two parts: data preprocessing and machine learning. Training a machine learning model on raw data alone rarely produces a good solution, so efficient preprocessing techniques need to be applied to obtain better results. It is also important to note that not every machine learning model is useful for a given problem. To get better accuracy on this classification problem, we trained ensemble classifiers, namely gradient boosting and extreme gradient boosting, on the dataset. After training both classifiers on the training dataset, we measured their accuracy on an unseen test dataset. The main aim of this paper is to show that these machine learning models generalize well to the test dataset and do not overfit the training dataset.
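The train-then-test workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the newspaper dataset is not available here, so a synthetic dataset from scikit-learn's `make_classification` stands in for it, the median imputation step is only a placeholder for the unspecified preprocessing, and all hyperparameters are library defaults. Only scikit-learn's `GradientBoostingClassifier` is shown; assuming the `xgboost` package is installed, its `XGBClassifier` could be substituted in the same pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumption: synthetic data stands in for the newspaper dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)

# Hold out an unseen test split, stratified on the class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: median imputation is a placeholder preprocessing step;
# the abstract does not say which preprocessing techniques were used.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    GradientBoostingClassifier(random_state=0),
)
model.fit(X_train, y_train)

# Compare accuracy on the training split and on the unseen test split.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
gap = train_acc - test_acc

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"train-test gap: {gap:.3f}")
```

A small gap between the two accuracies suggests the model generalizes to unseen data; a training accuracy far above the test accuracy would point to overfitting.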


Keywords: Data analysis, Newspaper, Preprocessing, Machine learning, Gradient boosting, Extreme gradient boosting



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. University of Calcutta, Kolkata, India
