Data Mining and Signal Detection

  • Mark Chang
Part of the Statistics for Biology and Health book series (SBH)


Data mining, at the confluence of multiple intertwined disciplines such as statistics, machine learning, pattern recognition, database systems, information retrieval, the World Wide Web, visualization, and many application domains, has made great progress in the past decade. Statistics plays a vital role in successful machine learning and data mining, where we constantly deal with streamed data that are usually vast in volume, dynamically changing, possibly infinite, and multidimensional. The contents and methods discussed in this chapter often appear under the headings of bioinformatics, data mining, signal detection, or pharmacovigilance.


Keywords: Random Forest; Cellular Automaton; Reinforcement Learning; Adverse Event Reporting System; Proportional Reporting Ratio


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Mark Chang, Biometrics, AMAG Pharmaceuticals, Inc., Lexington, USA
