RETRACTED ARTICLE: Recognition of pivotal instances from uneven set boundary during classification

Abstract

A database may contain pivotal records: small chunks of records or instances that carry important domain-specific information. These chunks can assist decision making by helping to assign labels to pivotal records and unlabeled data instances, and they can improve the accuracy of the classification model. Our work proposes a heuristic, Rough Set Boundary detection, that efficiently approximates the boundary set of a large database and thereby substantially reduces the search space for finding critical records. Its advantage is that it derives a rough set from the original data set, which confines the search to the boundary. It computes a pivotal score for each instance in the boundary to isolate the critical records. The method also applies feature selection to obtain a reduced set of attributes and so lower the computation time. The proposed work retrieves the pivotal records from the boundary set and improves classification accuracy by increasing the number of true positives and true negatives. Experiments are carried out on real-world medical data sets with numeric values, and various classification algorithms are run to validate the results. The results show that identifying pivotal records from the rough boundary set improves classification accuracy while using less computation time.
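The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the classical rough set boundary region in the sense of Pawlak (1982), not the authors' method. It assumes discrete attribute values; numeric data such as the medical data sets mentioned above would first need discretization or a neighborhood relation. The function name `boundary_region` and the toy data are hypothetical.

```python
from collections import defaultdict


def boundary_region(records, labels, target):
    """Return sorted indices in the Pawlak boundary region of class `target`.

    records: list of tuples of discrete attribute values
    labels:  list of class labels, same length as records
    """
    # Group record indices into indiscernibility classes: records with
    # identical attribute values cannot be told apart.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[tuple(rec)].append(idx)

    lower, upper = set(), set()
    for members in blocks.values():
        hits = [i for i in members if labels[i] == target]
        if hits:
            upper.update(members)        # block overlaps the target class
            if len(hits) == len(members):
                lower.update(members)    # block lies entirely inside it
    # Boundary = upper approximation minus lower approximation.
    return sorted(upper - lower)


# Toy data invented for illustration: two binary attributes.
records = [(1, 0), (1, 0), (0, 1), (0, 1), (1, 1)]
labels = ["sick", "healthy", "sick", "sick", "healthy"]
print(boundary_region(records, labels, "sick"))  # prints [0, 1]
```

Records 0 and 1 share the same attribute values but carry different labels, so they are the only ones the attributes cannot classify with certainty. A later stage, such as the pivotal score described above, would then only need to examine these boundary instances.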

Change history

  • 05 April 2019

    The Editor-in-Chief has retracted this article [1], because it shows substantial overlap with a previously published article [2]. Author A. Suresh does not agree with the retraction. Author R. Varatharajan has not responded to correspondence about this retraction.

References

  1. Angiulli F, Basta S, Lodi S, Sartori C (2013) Distributed strategies for mining outliers in large data sets. IEEE Trans Knowl Data Eng 25(7):1520–1532
  2. Anitha A, Kannan E (2014) Isolating critical data points from boundary region with feature selection. IEEE International Conference in Computational Intelligence and Computing Research (ICCIC), 1–4
  3. Anitha A, Kannan E (2014) A Constructive Distance-Based Boundary detection approach with numeric variables. Journal of Theoretical & Applied Information Technology 67(3)
  4. Balamurugan SAA, Rajaram R (2009) Effective and efficient feature selection for large-scale data using Bayes' theorem. Int J Autom Comput 6(1):62–71
  5. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):15
  6. Cherkassky V, Mulier F (1998) Learning from data: concepts, theory and methods. Wiley, New York
  7. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96(34):226–231
  8. Fan J, Zhou S, Siddique MA (2017) Fuzzy color distribution chart-based shot boundary detection. Multimed Tools Appl 76:10169–10190. https://doi.org/10.1007/s11042-016-3604-y
  9. Ghoting A, Parthasarathy S, Otey ME (2008) Fast mining of distance-based outliers in high-dimensional datasets. Data Min Knowl Disc 16(3):349–364
  10. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
  11. Hu Q, Yu D, Xie Z (2007) Selecting samples and features for SVM based on neighborhood model. In: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. Springer, Berlin, Heidelberg, 508–517
  12. Hu Q, Yu D, Liu J, Wu C (2008) Neighborhood rough set based heterogeneous feature subset selection. Inf Sci 178(18):3577–3594
  13. Huang CL, Wang CJ (2006) A GA-based feature selection and parameters optimization for support vector machines. Expert Syst Appl 31(2):231–240
  14. Jiang MF, Tseng SS, Su CM (2001) Two-phase clustering process for outliers detection. Pattern Recogn Lett 22(6):691–700
  15. Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253
  16. Knorr EM, Ng RT (1998) Algorithms for mining distance based outliers in large datasets. Proceedings of the International Conference on Very Large Data Bases (VLDB), San Francisco, 392–403
  17. Mitchell T (1997) Machine learning. WCB/McGraw-Hill, Boston
  18. Novakovic J, Strbac P, Bulatovic D (2011) Toward optimal feature selection using ranking methods and classification algorithms. Yugoslav Journal of Operations Research 21(1). ISSN: 0354-0243, EISSN: 2334-6043
  19. Parthalain N, Shen Q, Jensen R (2010) A distance measure approach to exploring the rough set boundary region for attribute reduction. IEEE Trans Knowl Data Eng 22(3):305–317
  20. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11(5):341–356
  21. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
  22. Poulisse GJ, Patsis Y, Moens MF (2014) Unsupervised scene detection and commentator building using multi-modal chains. Multimed Tools Appl 70:159–175. https://doi.org/10.1007/s11042-012-1086-0
  23. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
  24. Sathiaraj D, Triantaphyllou E (2013) On identifying critical nuggets of information during classification tasks. IEEE Trans Knowl Data Eng 25(6):1354–1367
  25. Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14
  26. Thivagar ML, Richard C, Paul NR (2012) Mathematical innovations of a modern topology in medical events. Int J Inf Sci 2(4):33–36
  27. Ye M, Li X, Orlowska ME (2009) Projected outlier detection in high-dimensional mixed-attributes data set. Expert Syst Appl 36(3):7104–7113
  28. Ye N, Li X, Chen Q, Emran SM, Xu M (2001) Probabilistic techniques for intrusion detection based on computer audit data. IEEE Trans Syst Man Cybern Part A Syst Hum 31(4):266–274
  29. Yu D, Sheikholeslami G, Zhang A (2002) Findout: finding outliers in very large datasets. Knowl Inf Syst 4(4):387–412

Author information

Corresponding author

Correspondence to A. Suresh.

About this article

Cite this article

Suresh, A., Varatharajan, R. RETRACTED ARTICLE: Recognition of pivotal instances from uneven set boundary during classification. Multimed Tools Appl 77, 27075–27088 (2018). https://doi.org/10.1007/s11042-018-5905-9

Keywords

  • Classification
  • Feature selection
  • Boundary region
  • Rough set
  • Pivotal records