Cluster Computing, 12:373

Experience with BXGrid: a data repository and computing grid for biometrics research

  • Hoang Bui
  • Michael Kelly
  • Christopher Lyon
  • Mark Pasquier
  • Deborah Thomas
  • Patrick Flynn
  • Douglas Thain


Research in the field of biometrics depends on the effective management and analysis of many terabytes of digital data. The quality of an experimental result often depends heavily on the sheer amount of data marshalled to support it. However, the current state of the art requires researchers to have a heroic level of expertise in systems software to perform large-scale experiments. To address this, we have designed and implemented BXGrid, a data repository and workflow abstraction for biometrics research. The system is composed of a relational database, an active storage cluster, and a campus computing grid. End users interact with the system through a high-level abstraction of four stages: Select, Transform, AllPairs, and Analyze. A high degree of availability and reliability is achieved through transparent failover, three-phase operations, and independent auditing. BXGrid is currently in daily production use by an active biometrics research group at the University of Notre Dame. We discuss our experience in constructing and using the system and offer lessons learned in conducting collaborative research in e-Science.
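The four-stage abstraction named in the abstract can be pictured with a small sketch. This is an illustration only, not BXGrid's actual interface: the function names, the in-memory `RECORDS` list, the `remove_offset` transform, the `l1_distance` comparison, and the threshold are all hypothetical stand-ins, and the real system runs these stages against a database, a storage cluster, and a computing grid rather than in one process.

```python
from itertools import product

# Hypothetical toy records; field names and values are invented for illustration.
RECORDS = [
    {"id": "a", "subject": 1, "modality": "iris", "pixels": [3, 1, 2]},
    {"id": "b", "subject": 1, "modality": "iris", "pixels": [4, 2, 3]},
    {"id": "c", "subject": 2, "modality": "iris", "pixels": [9, 8, 7]},
    {"id": "d", "subject": 2, "modality": "iris", "pixels": [5, 9, 7]},
    {"id": "e", "subject": 3, "modality": "face", "pixels": [1, 1, 1]},
]

def select(records, predicate):
    """Select: keep the records whose metadata satisfies the predicate."""
    return [r for r in records if predicate(r)]

def transform(records, fn):
    """Transform: map each selected record to a comparable feature."""
    return {r["id"]: fn(r["pixels"]) for r in records}

def all_pairs(features, compare):
    """AllPairs: apply compare to every ordered pair of features."""
    ids = sorted(features)
    return {(i, j): compare(features[i], features[j])
            for i, j in product(ids, ids)}

def analyze(scores, subject_of, threshold):
    """Analyze: count match errors over each unordered pair of distinct records."""
    report = {"false_match": 0, "false_non_match": 0}
    for (i, j), distance in scores.items():
        if i >= j:  # skip self-comparisons and mirrored pairs
            continue
        same = subject_of[i] == subject_of[j]
        matched = distance <= threshold
        if matched and not same:
            report["false_match"] += 1
        elif same and not matched:
            report["false_non_match"] += 1
    return report

def remove_offset(pixels):
    """Hypothetical transform: subtract the minimum value from every pixel."""
    low = min(pixels)
    return [p - low for p in pixels]

def l1_distance(x, y):
    """Hypothetical comparison: sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(x, y))

selected = select(RECORDS, lambda r: r["modality"] == "iris")
features = transform(selected, remove_offset)
scores = all_pairs(features, l1_distance)
subject_of = {r["id"]: r["subject"] for r in selected}
report = analyze(scores, subject_of, threshold=1)
print(report)
```

Each stage consumes only the output of the previous one, which is what lets a system of this kind schedule the expensive AllPairs step on many machines without the user writing any distributed code.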


e-science, Grid computing, Biometrics, Abstractions



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Hoang Bui (1)
  • Michael Kelly (1)
  • Christopher Lyon (1)
  • Mark Pasquier (1)
  • Deborah Thomas (1)
  • Patrick Flynn (1)
  • Douglas Thain (1)

  1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
