Topic-Based Website Feature Analysis for Enterprise Search from the Web

  • Baoli Dong
  • Huimei Liu
  • Zhaoyong Hou
  • Xizhe Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4255)


Efficient and accurate enterprise search over specialized resources available on the web is a challenging and important problem. Domain-specific enterprise websites resemble one another in topic structure and textual content. Taking into account the semantic information carried by website content terms, a novel feature vector model for representing website topics is proposed on the basis of the vector space model. The elements of the feature vector integrate semantic information about topic content with structural information, through different semantic terms and weighting schemes respectively. Comparative recognition results demonstrate that this approach to website topic feature analysis has strong potential for domain-specific enterprise web search.
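As a rough illustration of the general idea, the following minimal sketch builds one sparse term vector per website in the spirit of the vector space model and compares websites by cosine similarity. It is not the authors' method: the abstract does not specify the semantic terms or the weighting scheme, so the depth-based structural weight `depth_weight`, the whitespace tokenization, and all function and variable names here are assumptions made purely for illustration.

```python
import math
from collections import Counter


def website_vector(pages, depth_weight=lambda d: 1.0 / (1 + d)):
    """Build one sparse term vector for a whole website.

    pages: list of (depth, text) pairs, where depth 0 is the home page.
    depth_weight: hypothetical structural weighting (an assumption, not
    taken from the paper); terms on pages nearer the root count more.
    """
    vec = Counter()
    for depth, text in pages:
        w = depth_weight(depth)
        # Naive whitespace tokenization, chosen only to keep the sketch short.
        for term in text.lower().split():
            vec[term] += w
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy websites (invented data, for illustration only).
site_a = website_vector([(0, "machine tools lathe"), (1, "lathe parts catalog")])
site_b = website_vector([(0, "lathe milling machine tools")])
site_c = website_vector([(0, "flower delivery shop")])

print(cosine(site_a, site_b))  # shared vocabulary, so similarity is positive
print(cosine(site_a, site_c))  # no shared terms
```

In this sketch, two websites that share vocabulary on prominent pages score higher than websites with disjoint vocabulary. A classifier such as a centroid-based or SVM model could then operate on these vectors, though which classifier the paper actually uses is not stated in the abstract.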





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Baoli Dong (1, 2)
  • Huimei Liu (3)
  • Zhaoyong Hou (3)
  • Xizhe Liu (2)
  1. Department of Mechanical Engineering, Taiyuan University of Science and Technology, Taiyuan, China
  2. Institute of Manufacturing Engineering, Zhejiang University, Hangzhou, China
  3. School of Science, Taiyuan University of Technology, Taiyuan, China
