Mining E-Documents to Uncover Structures

  • Azita Bahrami
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 132)


An e-document, D, coded in HTML consists of a head and a body. The body holds the content of the e-document, and the head holds, among other things, metadata.
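The head/body split described above can be illustrated with a minimal sketch. It assumes that metadata appears as `<meta name="..." content="...">` elements in the head; the names `EDocumentParser` and `split_e_document` and the sample page are hypothetical and are not taken from the chapter. It uses only Python's standard `html.parser` module and is not the author's mining method.

```python
from html.parser import HTMLParser


class EDocumentParser(HTMLParser):
    """Collects <meta> name/content pairs from the head and text from the body.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.section = None   # "head", "body", or None
        self.metadata = {}    # meta name -> content (assumed metadata form)
        self.body_text = []   # text chunks found inside <body>

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "meta" and self.section == "head":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        # Leaving the head or body ends the current section.
        if tag == self.section:
            self.section = None

    def handle_data(self, data):
        # Keep only non-blank text that occurs inside the body.
        if self.section == "body" and data.strip():
            self.body_text.append(data.strip())


def split_e_document(html_text):
    """Return (metadata dict, body text) for an HTML e-document."""
    parser = EDocumentParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata, " ".join(parser.body_text)


# Made-up sample e-document, used only to exercise the sketch.
sample = """<html>
<head><title>Sorting</title>
<meta name="keywords" content="sorting, quicksort">
</head>
<body><h1>Quicksort</h1><p>Quicksort partitions a list.</p></body>
</html>"""

metadata, body = split_e_document(sample)
print(metadata)
print(body)
```

The title text is deliberately ignored here because it sits in the head but is not a `<meta>` element; a fuller treatment might record it as metadata too.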


Logical Structure; Order Number; Precedence Number; Explicit Term; Multivalue Dependency



I would like to extend my special thanks to Mr. Clinton Carey for coding the identification of explicit and implicit terms.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Azita Bahrami (1)
  1. Department of Information Technology, Armstrong Atlantic State University, Savannah, USA
