A Survey of Emerging Trend Detection in Textual Data Mining

  • April Kontostathis
  • Leon M. Galitsky
  • William M. Pottenger
  • Soma Roy
  • Daniel J. Phelps

Abstract

In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing; others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe its components, including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also give a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint on the importance of trend detection tools and an overview of how such tools are used.

Keywords

World Wide; News Story; Frequency Number; Trend Detection; Theme Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media New York 2004
