Mining E-Documents to Uncover Structures

  • Azita Bahrami
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 132)


An e-document, D, coded in HTML consists of a body and a head. The body holds the contents of the e-document, and the head holds, among other things, metadata.
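The head/body split described above can be illustrated with a minimal sketch. It assumes the metadata of interest lives in `<meta name="..." content="...">` tags inside the head; the class name `EDocumentParser`, the function `split_e_document`, and the sample page are hypothetical and are not taken from the chapter.

```python
from html.parser import HTMLParser


class EDocumentParser(HTMLParser):
    """Split an HTML e-document into head metadata and body text (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.section = None   # "head", "body", or None while outside both
        self.metadata = {}    # name -> content, from <meta> tags found in the head
        self.body_text = []   # non-empty text fragments found inside the body

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "meta" and self.section == "head":
            attr = dict(attrs)
            if "name" in attr and "content" in attr:
                self.metadata[attr["name"]] = attr["content"]

    def handle_endtag(self, tag):
        # Leaving the head or the body resets the current section.
        if tag == self.section:
            self.section = None

    def handle_data(self, data):
        if self.section == "body" and data.strip():
            self.body_text.append(data.strip())


def split_e_document(html):
    """Return (metadata dict, body text) for one HTML e-document."""
    parser = EDocumentParser()
    parser.feed(html)
    parser.close()
    return parser.metadata, " ".join(parser.body_text)


# Hypothetical sample page, made up for this sketch.
SAMPLE = """<html>
<head>
<title>Lesson 1</title>
<meta name="keywords" content="logical structure, metadata">
<meta name="author" content="A. Author">
</head>
<body><h1>Introduction</h1><p>Contents of the e-document.</p></body>
</html>"""

metadata, body = split_e_document(SAMPLE)
print(metadata)
print(body)
```

The sketch uses only the standard-library `html.parser` module, so it tolerates loosely formed HTML but does no structure mining of its own; it only separates the two parts of the document that the abstract names.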


Logical Structure; Order Number; Precedence Number; Explicit Term; Multivalue Dependency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



I would like to extend my special thanks to Mr. Clinton Carey for coding the identification of explicit and implicit terms.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Azita Bahrami, Department of Information Technology, Armstrong Atlantic State University, Savannah, USA
