Recognition of pivotal instances from uneven set boundary during classification


Abstract

A database may contain pivotal records: small chunks of records or instances that carry important, domain-specific information. These instances can assist decision making by assigning labels to pivotal records and unlabeled data instances, and they can improve the accuracy of the classification model. Our work proposes heuristic Rough Set Boundary detection, which efficiently approximates the boundary set of a large database and thereby substantially reduces the search space for finding critical records. Rough Set Boundary detection derives a rough set from the original data set, which confines the search to the boundary region. A pivotal score is then computed for each instance in the boundary to isolate the critical records. The method also applies feature selection to obtain a reduced set of attributes and so lower the computational time. The proposed work retrieves the pivotal records from the boundary set and improves classification accuracy by increasing the number of true positives and true negatives. Experiments are carried out on real-world medical data sets with numeric values, and various classification algorithms are run to validate the results. The results show that identifying pivotal records from the rough boundary set improves classification accuracy while using less computational time.
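
The abstract does not spell out the heuristic or the pivotal score, so the sketch below only illustrates the classical Pawlak notion of a boundary region that the method builds on: the records whose equivalence class overlaps a target class without lying wholly inside it. The function name `boundary_region`, the toy records, and the assumption that numeric attributes have already been discretized are all illustrative and not taken from the paper.

```python
from collections import defaultdict


def boundary_region(records, labels, target):
    """Return indices in the Pawlak boundary region of the concept `target`.

    records: list of tuples of (already discretized) attribute values
    labels:  list of decision labels, same length as records
    """
    # Records with identical attribute values are indiscernible, so they
    # fall into the same equivalence class.
    classes = defaultdict(list)
    for i, rec in enumerate(records):
        classes[tuple(rec)].append(i)

    lower, upper = set(), set()
    for members in classes.values():
        hits = [i for i in members if labels[i] == target]
        if hits:
            # The class overlaps the concept: part of the upper approximation.
            upper.update(members)
            if len(hits) == len(members):
                # The class lies wholly inside the concept: lower approximation.
                lower.update(members)

    # Boundary = upper approximation minus lower approximation.
    return sorted(upper - lower)


# Hypothetical toy data: two discretized attributes and a binary decision.
records = [("high", "yes"), ("high", "yes"), ("low", "no"),
           ("low", "yes"), ("low", "yes"), ("high", "no")]
labels = ["sick", "sick", "healthy", "sick", "healthy", "healthy"]

print(boundary_region(records, labels, "sick"))  # [3, 4]
```

Records 3 and 4 share the same attribute values but carry different labels, so they cannot be classified with certainty. A search for pivotal records restricted to such a boundary only has to examine this subset instead of the whole database.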

Keywords

Classification, Feature selection, Boundary region, Rough set, Pivotal records

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Nehru Institute of Engineering and Technology, Coimbatore, India
  2. Department of Electronics and Communication Engineering, Sri Lakshmi Ammaal Engineering College, Chennai, India
