Network-based Models and Algorithms in Data Mining and Knowledge Discovery

Boginski, Vladimir; Pardalos, Panos M.; Vazacopoulos, Alkis

doi:10.1007/0-387-23830-1_5

5 Concluding Remarks

In this chapter, we have addressed several issues regarding the use of network-based mathematical programming techniques for solving various problems arising in the broad area of data mining. We have pointed out that these approaches have often proved effective in applications such as biomedicine, finance, and telecommunications. In particular, if a real-world massive dataset can be appropriately represented as a network structure, its analysis using standard graph-theoretical techniques often yields important practical results.
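To make the idea of representing a dataset as a network more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one possible construction, in which two items are joined by an edge when the correlation of their observations reaches a chosen threshold, and then applies two standard graph-theoretical tools: vertex degrees and a greedy clique heuristic. The function names, the threshold of 0.9, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def build_graph(series, threshold):
    """Join two items by an edge when their correlation is >= threshold.

    The thresholding rule is an assumption of this sketch.
    """
    adjacency = {name: set() for name in series}
    for u, v in combinations(series, 2):
        if pearson(series[u], series[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def greedy_clique(adjacency):
    """Greedy heuristic: keep adding the candidate with most candidate neighbors.

    Returns a clique, but not necessarily a maximum one.
    """
    clique = []
    candidates = set(adjacency)
    while candidates:
        # Sorting first makes tie-breaking deterministic.
        v = max(sorted(candidates), key=lambda u: len(adjacency[u] & candidates))
        clique.append(v)
        # Only vertices adjacent to every chosen vertex remain candidates.
        candidates &= adjacency[v]
    return clique


# Hypothetical toy data: "a", "b", "c" move together, "d" does not.
series = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 6.0, 8.2, 10.0],
    "c": [1.5, 2.4, 3.6, 4.4, 5.5],
    "d": [5.0, 1.0, 4.0, 2.0, 3.0],
}
graph = build_graph(series, threshold=0.9)
degrees = {v: len(neighbors) for v, neighbors in graph.items()}
clique = greedy_clique(graph)
```

On this toy input the three co-moving series form a triangle and the fourth is isolated. The resulting structure depends heavily on how edges are defined and on the threshold, and for massive datasets the quadratic pairwise loop and the simple heuristic would need to be replaced by more scalable methods.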

However, one should clearly understand that the success or failure of a given methodology depends essentially on the structure of the dataset under consideration, and that there is no “universal recipe” for extracting useful information from any type of data. Thus, despite the availability of a great variety of data mining techniques and software packages, choosing an appropriate method for analyzing a particular dataset is a non-trivial task.

Moreover, as technological progress continues, new types of datasets may emerge in different practical fields, which will motivate further research on data mining algorithms. Developing and modifying mathematical programming approaches in data mining will therefore remain an exciting and challenging research area for years to come.

Author information

Authors and Affiliations

Center for Applied Optimization, ISE Department, University of Florida, Gainesville, FL, 32611
Vladimir Boginski & Panos M. Pardalos
Dash Optimization, Inc., Englewood Cliffs, NJ, 07632
Alkis Vazacopoulos

Editor information

Editors and Affiliations

University of Minnesota, Minneapolis, MN
Ding-Zhu Du
University of Florida, Gainesville, FL
Panos M. Pardalos

About this chapter

Cite this chapter

Boginski, V., Pardalos, P.M., Vazacopoulos, A. (2004). Network-based Models and Algorithms in Data Mining and Knowledge Discovery. In: Du, DZ., Pardalos, P.M. (eds) Handbook of Combinatorial Optimization. Springer, Boston, MA. https://doi.org/10.1007/0-387-23830-1_5

DOI: https://doi.org/10.1007/0-387-23830-1_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-23829-6
Online ISBN: 978-0-387-23830-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
