Abstract
Flow pattern measurements are frequently contaminated by outliers, which arise for various reasons. Detecting these outliers in flow pattern experiments is important because it leads to a better and more accurate understanding of the flow pattern. In this study, eight data mining methods are used to identify outliers in flow pattern experiments: box plot, histogram, linear regression, k-nearest neighbors, local outlier factor, k-medoids clustering, multilayer perceptron, and self-organizing map. The main aim of the study is to use these data mining methods to detect outliers in the data collected during flow pattern experiments. The methods are applied to a case study, and their performance is evaluated and compared. The data under investigation were obtained from flow pattern experiments around a spur dike located in a 90° bend, measured with a Vectrino acoustic Doppler velocimeter (ADV). The velocity measurement range of this device is between ±0.01 and ±4 m/s, its measurement accuracy is 1 mm/s, and the sampling frequency was set to 50 Hz. Comparing the results of the different outlier detection methods showed that the box plot and the local outlier factor methods performed best.
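The two methods reported to perform best, the box plot and the local outlier factor (LOF), can be illustrated with a minimal pure-Python sketch of how each would flag a spike in a velocity record. This is not the authors' implementation. The helper names `box_plot_outliers` and `lof_scores` are hypothetical, and the 1.5 × IQR whisker length, Euclidean distance, k = 3, the fixed-size neighbourhood simplification and the sample velocities are all illustrative assumptions.

```python
import math
import statistics


def box_plot_outliers(values, whisker=1.5):
    """Indices of values outside the box-plot fences [Q1 - w*IQR, Q3 + w*IQR].

    Assumption: whisker length w = 1.5 (Tukey's usual choice).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def lof_scores(points, k=3):
    """Local outlier factor of each point, after Breunig et al. (2000).

    Assumptions: Euclidean distance, len(points) > k, and exactly k
    neighbours per point (distance ties are broken by index instead of
    enlarging the neighbourhood, a simplification of the original method).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself
    neighbours = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        neighbours.append(others[:k])

    # k-distance: distance from a point to its k-th nearest neighbour
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # local reachability density: inverse mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j))
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against exact duplicates

    # LOF: mean ratio of the neighbours' densities to the point's own density;
    # values near 1 are typical, values well above 1 mark locally sparse points
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical velocity samples (m/s) with one injected spike;
    # made-up numbers for illustration, not data from the study.
    series = [0.21, 0.22, 0.20, 0.23, 0.21, 0.22, 0.95, 0.20, 0.22, 0.21]
    print(box_plot_outliers(series))  # → [6]
    scores = lof_scores([(v,) for v in series], k=3)
    print(max(range(len(series)), key=scores.__getitem__))  # → 6
```

The box-plot rule judges every sample against global quartiles, whereas LOF compares each sample's density with that of its k neighbours. Passing points as 3-tuples of (u, v, w) components would let the same LOF sketch work in three dimensions, which is one reason a density-based score may catch samples that look unremarkable in any single component.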
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig2_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig3_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig11_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig13_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig14_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40996-018-0131-2/MediaObjects/40996_2018_131_Fig15_HTML.png)
Cite this article
Vaghefi, M., Mahmoodi, K. & Akbari, M. Detection of Outlier in 3D Flow Velocity Collection in an Open-Channel Bend Using Various Data Mining Techniques. Iran J Sci Technol Trans Civ Eng 43, 197–214 (2019). https://doi.org/10.1007/s40996-018-0131-2