Discovering the Structures of Open Source Programs from Their Developer Mailing Lists

  • Dinh Anh Nguyen
  • Koichiro Doi
  • Akihiro Yamamoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5808)


This paper presents a method that discovers the structure of given open source programs from their developer mailing lists. Our goal is to help successive developers understand the structures and components of open source programs even when sufficient documentation about them is not provided. Our method consists of two phases: (1) producing a mapping between the source files and the emails, and (2) constructing a concept lattice from the produced mapping and then reducing it with a novel algorithm, called PRUNIA (PRUNing Algorithm Based on Introduced Attributes), to obtain a more compact structure. We performed experiments with several open source projects that originate from or are popular in Japan, such as Namazu and Ruby. The experimental results reveal that the extracted structures reflect important parts of the hidden structures of the programs very well.
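The two phases summarized above can be made concrete with a toy sketch. It rests on several assumptions that are not from the paper. First, an email is linked to a source file whenever the file's name appears in the email text (a naive substring heuristic). Second, files are treated as the objects and emails as the attributes of a formal context. Third, the full concept lattice is enumerated by brute force. Each concept is then labeled with its introduced attributes, the standard formal concept analysis notion that PRUNIA's name refers to. The file names, email texts, and the helpers `build_context`, `all_concepts`, and `introduced_attributes` are hypothetical. PRUNIA's actual pruning rule is not described in this abstract, so it is not reproduced here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (extent: set of files, intent: set of emails)
Concept = Tuple[FrozenSet[str], FrozenSet[str]]

# Hypothetical toy data, not taken from Namazu or Ruby.
FILES = ["scan.c", "parse.y", "eval.c", "gc.c"]
EMAILS = {
    "e1": "Bug in parse.y when scan.c returns EOF",
    "e2": "parse.y: new syntax for blocks",
    "e3": "gc.c crashes during eval.c loop",
    "e4": "Patch for gc.c marking",
}


def build_context(files: List[str], emails: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Phase 1 (assumed heuristic): relate each file to every email whose text names it."""
    return {f: frozenset(eid for eid, body in emails.items() if f in body) for f in files}


def all_concepts(context: Dict[str, FrozenSet[str]], attributes: Set[str]) -> List[Concept]:
    """Phase 2: enumerate all formal concepts by closing object intents under intersection."""
    # The intersection over an empty set of objects is the full attribute set.
    intents: Set[FrozenSet[str]] = {frozenset(attributes)}
    for obj_intent in context.values():
        # The right-hand side is built before the in-place union, so iteration is safe.
        intents |= {obj_intent & i for i in intents}
    concepts = []
    for intent in intents:
        # The extent holds every object that has all attributes of the intent.
        extent = frozenset(g for g, gi in context.items() if intent <= gi)
        concepts.append((extent, intent))
    # Order from the top concept (largest extent) downward for readable output.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0])))


def introduced_attributes(context: Dict[str, FrozenSet[str]], attributes: Set[str],
                          intent: FrozenSet[str]) -> Set[str]:
    """Attributes m in `intent` whose attribute concept (m', m'') has exactly this intent."""
    result = set()
    for m in intent:
        closure = frozenset(attributes)
        for gi in context.values():
            if m in gi:
                closure &= gi  # m'' is the intersection of intents of objects having m
        if closure == intent:
            result.add(m)
    return result


if __name__ == "__main__":
    ctx = build_context(FILES, EMAILS)
    attrs = set(EMAILS)
    for extent, intent in all_concepts(ctx, attrs):
        new = introduced_attributes(ctx, attrs, intent)
        print(sorted(extent), sorted(intent), "introduces", sorted(new))
```

On this toy data the lattice has six concepts. The concept whose extent is {parse.y, scan.c} and whose intent is {e1} introduces e1. The top and bottom concepts introduce no attributes. Enumerating intents by pairwise intersection is exponential in the worst case, so a realistic tool would use a dedicated algorithm such as Ganter's NextClosure. This brute-force version only makes the definitions concrete.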


Keywords: mailing lists · open source programs · extraction of structures · concept lattice




Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dinh Anh Nguyen (1)
  • Koichiro Doi (1)
  • Akihiro Yamamoto (1)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
