
1 Introduction

The explosive growth of content on the Internet gives rise to the problems of extracting and automatically summarizing useful information from the incoming data stream. One such problem is the summarization of news articles about an event. A news story is a set of news reports from various sources dedicated to describing an event. Such problems are often investigated and solved by news aggregators, for example, Google.News [17] or Yandex.News. This is because working with such problems requires a huge and diverse collection of news articles.

The typical “lifetime” of a news story (the period of active discussion of the event) is usually a day or two, but not all events are so short-lived. Some news stories have a “history” in the form of a set of previous events that occurred at different moments and are more or less related to each other. Existing multi-document summarization approaches do not take into account the fact that the context, actors, geography and other properties of an event can vary over time.

The fact that journalists return to the same events, for example when new data appears, indicates that such events are important to society. The need for a brief summary of an event raises the problem of forming a “timeline summary”. A timeline summary is a type of multi-document summary containing the essential details of the subject under discussion. Constructing such annotations is a complex task, performed manually by journalists or analysts. This implies that automating the process is an urgent problem.

In this paper we consider challenges and solutions for the automatic generation of temporal summaries. We treat this problem as query-oriented multi-document summarization over a representative collection of news documents. The query in this case is the text of a news message. This corresponds to the scenario in which a user would like to receive a timeline summary after reading a news document. The result should be a time-ordered list of descriptions of the key sub-events related to the main event. Since our solution belongs to the extractive summarization family, the result consists of parts of existing sentences.

A system was developed to automate the timeline summarization process. Experiments were conducted over a collection of 2 million Russian news articles from the first half of 2015. Three new factors were investigated to improve the quality of the constructed timeline summaries: query extension using pseudo-relevance feedback, accounting for the temporal characteristics of news stories, and accounting for the inverted-pyramid structure of news articles.

This is a follow-up to the study of the timeline summarization problem reported in our previous paper [25]. In this study, we expanded the collection of reference annotations three-fold. The evaluation process was improved by dividing the collection into training and test parts. An optimization module was added for fitting the configurations. As a result, substantial progress was achieved: taking into account the structure of the inverted pyramid yielded a significant increase in metric values, which was not achieved in the previous article.

2 Related Work

2.1 Automatic Text Summarization Problem

Currently, quite a number of methods for automatic text summarization exist [3]. Some methods use large linguistic ontologies [12, 15], which may be automatically supplemented during the analysis. Other methods are based on the statistical properties of texts [16] or on machine learning [13].

The following problems occur during the generation of annotations [3, 7, 11]:

  • Ensuring the completeness of the presentation of information, including the most up-to-date information.

  • Reducing redundancy in the information provided.

  • Ensuring the coherence and understandability of the information provided.

To ensure the completeness of the resulting annotation, it is often necessary to find links between sentences or documents [20].

To measure redundancy in the generated annotations, various measures of similarity between sentences are used. One of the most common approaches is clustering, i.e., grouping sentences by content [6]. Another approach to reducing redundancy is to compare a candidate sentence with the sentences already included in the summary and to evaluate how much novel information it adds. An example of such an approach is Maximal Marginal Relevance (MMR) [2].
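As a minimal illustration of the MMR idea, the sketch below greedily selects sentences that are similar to the query but dissimilar to what was already selected. Sentences and the query are assumed to be pre-vectorized (e.g. as tf-idf vectors); the `cosine` helper, the parameter `lam`, and the toy vectors in the usage note are illustrative, not taken from [2].

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query, sentences, k, lam=0.7):
    """Greedily pick k sentence indices, balancing query relevance
    against redundancy with already-selected sentences."""
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cosine(query, sentences[i])
            red = max((cosine(sentences[i], sentences[j]) for j in selected),
                      default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For instance, with `lam=0.3` the redundancy penalty dominates, so after picking a sentence close to the query the selector prefers a dissimilar one even if it is less relevant.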

The problem of ensuring the coherence of information in the summary arises both in annotation generation methods [18, 19] and in evaluation methods, because assessing the coherence and linguistic quality of an annotation requires manual evaluation.

2.2 Timeline Summary

The problem of timeline summary construction differs in several ways from the standard summarization problem. For example, the temporal nature of events must be taken into account [9]. Also, to ensure the completeness of the information provided, documents from all sub-events of the topic under consideration must be found.

When constructing a timeline summary, processing is mainly carried out over huge collections in which most of the information is not relevant to the user’s request. This problem can be addressed with clustering methods [10, 14], but these have their own issues. First, clustering must be performed many times over huge collections of documents, which affects the response time of the system. Second, for documents that describe related but far-apart-in-time events, standard similarity measures can yield significantly lower similarity values. Finally, the most characteristic objects must be identified [1, 9], for example, by taking into account the structural features of the document stream [5, 8].

3 Statement of the Problem

3.1 General Description

The problem of constructing a timeline summary is query-oriented. In the most general case, the user’s query is a news document, so in the following this problem is treated as the automatic creation of a summary for a query given in the form of a text document. The output of the system is an annotation of n sentences. Coherence between the sentences is not required in this paper. Figure 1 provides an example of a possible summary about a conflict at a cemetery, taken from the Interfax website.

Fig. 1. Timeline summary fragment about a conflict at a cemetery.

The aim of the work is to study the influence of various factors on the quality of the annotation.

3.2 Mathematical Statement of the Problem

The problem described above can be formalized in the following way. Let \( Q = \left\{ {q_{1} , q_{2} , \ldots , q_{m} } \right\} \) be a set of queries and \( D_{g} = \left\{ {D_{g}^{{q_{1} }} , D_{g}^{{q_{2} }} , \ldots , D_{g}^{{q_{m} }} } \right\} \) the associated set of reference annotations. The system generates a set of summaries \( D_{A} = \left\{ {D_{A}^{{q_{1} }} , D_{A}^{{q_{2} }} , \ldots , D_{A}^{{q_{m} }} } \right\} \) in response to the queries \( Q \) using algorithm \( A \). The problem is then reduced to maximizing the following functional:

$$ \frac{{\mathop \sum \nolimits_{i = 1}^{i = \left| Q \right|} M\left( {D_{A}^{{q_{i} }} , D_{g}^{{q_{i} }} } \right)}}{\left| Q \right|} \to max $$
(1)

where \( M \) is the proximity function between annotations. Optimization is carried out over all parameters of the algorithm.

4 Approach

4.1 Collection Processing

As mentioned earlier, the input collection contains 2 million news articles. It is not feasible to work with such an amount of information directly, so it was decided to interact with the collection through a search engine. The search engine makes it possible to:

  • Retrieve a list of documents for a text query.

  • For a given document from the collection, obtain its basic information: text, index, meta-information.

4.2 Studied Features

In this paper the following factors were investigated:

  • Query extension strategy.

  • Accounting for the temporal nature of news stories.

  • Accounting for the structure of a news article in the form of an inverted pyramid.

4.3 Query Extension Strategy

The information that can be obtained from the query document alone is generally insufficient for effectively building this type of annotation, because most news articles are not a general description of an event but a discussion of some particular incident or fact.

To overcome this problem, query extension techniques are needed. The developed algorithm uses the idea of pseudo-relevance feedback, which is widely used in information retrieval [21]. For a query document, the algorithm performs the following steps:

  1. The most significant \( K \) terms are chosen on the basis of tf-idf weights, forming the first-level query.

  2. Documents are then retrieved using the first-level query.

  3. The retrieved cluster of documents is analyzed to find the most important terms, forming the second-level query:

    a. For each document, the most significant \( T \) terms are considered.

    b. For each term, it is counted how often it appears among the top \( T \) terms across the cluster.

    c. The term list is sorted by this frequency, and the best \( M \) terms are selected.

  4. Steps 2–3 are repeated (a double query extension, forming a third-level query).

  5. The output of the algorithm is a vector of \( N \) terms representing, to some extent, the semantics of the input document.
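The expansion loop above can be sketched in a few lines. The search engine is abstracted as a callable, documents are represented as term-to-tf-idf dictionaries, and `top_terms` stands in for the tf-idf ranking; all of these interfaces are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

def top_terms(doc_tf_idf, n):
    """Return the n terms of a document with the highest tf-idf weight."""
    return [t for t, _ in sorted(doc_tf_idf.items(),
                                 key=lambda kv: kv[1], reverse=True)[:n]]

def expand_query(query_doc, search, K=10, T=10, M=10, levels=2):
    """Pseudo-relevance-feedback expansion: levels=2 performs the
    double query extension (steps 2-3 repeated)."""
    query = top_terms(query_doc, K)          # step 1: first-level query
    for _ in range(levels):
        cluster = search(query)              # step 2: retrieve by current query
        counts = Counter()
        for doc in cluster:                  # step 3: vote for each doc's top T
            counts.update(top_terms(doc, T))
        query = [t for t, _ in counts.most_common(M)]  # best M by frequency
    return query                             # final expanded query
```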

Note that \( K, T, M, N \) are parameters of the algorithm and must be configured. As an example of the query extension module at work, consider the algorithm steps on a news article about the terrorist attack in Paris (Table 1).

Table 1. Query extension algorithm stages example.

The table shows that the higher-level queries contain more terms that are significant for this event.

4.4 Temporal Nature of News Stories

Since any event unfolds in time, both the content of publications and their number depend on time. As an example, Fig. 2 shows the number of publications over time for the “Earthquake in Nepal” event.

Fig. 2. Number of publications per day for an event.

To take this factor into account, the following procedure is applied to the set \( D \) of retrieved documents:

  1. The entire timeline of the event is divided into days with labels \( T = \left\{ {t_{1} , t_{2} , \ldots , t_{n} } \right\} \).

  2. Each document \( D_{i}^{t} \) receives a label from \( T \) based on its publication date.

  3. Documents published on days with fewer publications than the threshold \( NDoc_{tr} \) (2) are discarded.

    $$ NDoc_{tr} = 0.2*MEAN_{top\,3} \left( D \right) $$
    (2)

  4. The output is a sorted list of collections \( C = \left\{ {C_{{t_{1} }} , C_{{t_{2} }} , \ldots , C_{{t_{n} }} } \right\} \), where each collection \( C_{{t_{i} }} \) contains only the documents with the label \( t_{i} \).
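The day-level filtering above can be sketched as follows. The input format of `(day_label, document)` pairs is an assumption for illustration; the threshold follows (2), i.e. 0.2 times the mean size of the three busiest days.

```python
from collections import defaultdict

def temporal_split(docs_with_dates):
    """docs_with_dates: iterable of (day_label, document) pairs.
    Returns a day-sorted list of (day, documents) collections, with
    quiet days (below the NDoc_tr threshold) discarded."""
    by_day = defaultdict(list)
    for day, doc in docs_with_dates:         # step 2: label documents by day
        by_day[day].append(doc)
    top3 = sorted((len(v) for v in by_day.values()), reverse=True)[:3]
    ndoc_tr = 0.2 * sum(top3) / len(top3)    # threshold (2)
    return [(day, by_day[day])               # step 4: sorted collections
            for day in sorted(by_day)
            if len(by_day[day]) >= ndoc_tr]  # step 3: drop quiet days
```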

4.5 Inverted Pyramid

High-quality news articles are often written following the “inverted pyramid” structure (Fig. 3). The upper and lower parts of the pyramid are of the greatest interest:

Fig. 3. Inverted pyramid on the example of an article (https://themoscowtimes.com/articles/moscow-museum-takes-you-inside-north-korea-60240).

  • The upper part contains the most concentrated information about the event under discussion.

  • The lower part may contain references to important related events in the past.

This structure is taken into account in two ways:

  1. An inter-document feature based on a graph approach.

  2. An intra-document feature, which increases the weights of sentences located in the upper and lower parts of the inverted pyramid.

Inter-Document Feature.

This feature is taken into account in the following way:

  1. For a set of documents \( D \), a similarity matrix between the upper and lower parts of the documents is constructed. If the similarity exceeds a specified threshold, a link is considered to exist between documents \( D_{i} \) and \( D_{j} \).

  2. The importance of the documents is calculated with the LexRank algorithm over the constructed graph [4].

  3. For documents whose weight is greater than a certain threshold, the query extension procedure described earlier is performed.

As a result, the output is a ranked list of documents \( D \) and a set \( Q_{D} \) of new queries which, together with accounting for the temporal nature of the news story, will aid the sentence ranking algorithm. Document weights are also taken into account in the ranking functions.
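The graph-based scoring above can be sketched as follows. The paper uses LexRank [4]; this self-contained stand-in scores documents with a plain degree-normalized power iteration over the thresholded link graph, which captures the same centrality idea. The similarity matrix, threshold, and damping value are illustrative assumptions.

```python
def doc_importance(sim, threshold=0.3, d=0.85, iters=50):
    """sim: n x n similarity matrix between document upper/lower parts.
    Returns one centrality weight per document."""
    n = len(sim)
    # Step 1: link documents whose part similarity exceeds the threshold.
    adj = [[1.0 if i != j and sim[i][j] >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Step 2: LexRank-style power iteration over the link graph.
    w = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = sum(w[j] * adj[j][i] / max(sum(adj[j]), 1.0)
                       for j in range(n))
            new.append((1 - d) / n + d * rank)
        w = new
    return w
```

A document linked to many others (e.g. a hub article that later reports refer back to) receives a higher weight than its peripheral neighbors.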

Intra-Document Feature.

To take this feature into account, the following procedure is applied: during sentence ranking, the weight of a sentence is multiplied by a coefficient that lowers the weights of sentences in the middle of the document.

In addition, after the inter-document procedure described above, all constructed extended queries \( Q_{D} \) are mapped to the labels \( t_{\text{i}} \) from \( T \) (Fig. 4).

Fig. 4. Query mapping.

4.6 Similarity of Sentences

At various stages of the algorithm, a measure of closeness between sentences must be calculated. For this purpose, the cosine similarity measure (3) is used in all cases.

$$ Sim_{cos} \left( {S_{i} , S_{j} } \right) = \frac{{\left( {S_{i} , S_{j} } \right)}}{{\left| {S_{i} } \right|*\left| {S_{j} } \right|}} $$
(3)

The choice of sentence representation plays an important role in calculating similarity. In this article we use the standard tf-idf representation, but for calculating the similarity between sentences when searching for links between documents, a word2vec [24] representation is used. To achieve this, the sentence vector is computed as a weighted mean of the word2vec word vectors, with tf-idf values used as weights.

The word2vec model was trained on the entire collection of 2 million news articles. During preprocessing, stop-word removal and lemmatization were applied. The window width was set to 5 and the vector length to 100.
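The weighted-mean sentence representation can be sketched as below. The function signature, the idf dictionary, and the toy 2-d vectors in the usage note are assumptions for illustration; the paper trains 100-d word2vec vectors on the full collection.

```python
def sentence_vector(tokens, word_vecs, idf, dim):
    """tf-idf-weighted mean of per-word embeddings for one sentence."""
    acc, total = [0.0] * dim, 0.0
    for tok in set(tokens):
        if tok not in word_vecs:
            continue                          # skip out-of-vocabulary words
        w = tokens.count(tok) / len(tokens) * idf.get(tok, 1.0)  # tf * idf
        acc = [a + w * v for a, v in zip(acc, word_vecs[tok])]
        total += w
    return [a / total for a in acc] if total else acc
```

For example, with unit idf and the vectors {'a': [1, 0], 'b': [0, 1]}, the sentence ['a', 'b'] maps to [0.5, 0.5].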

4.7 Sentence Ranking Module

This module ranks sentences using a modified version of the \( MMR \) algorithm, \( MMRT \) (4), which takes into account all the factors described in Sect. 4.2:

$$ MMRT_{{s_{i}^{t} }} = INC_{{s_{i}^{t} }} - DEC_{{s_{i}^{t} }} $$
(4)

where \( INC_{{s_{i}^{t} }} \) is the positive term of the formula, which depends on the similarity of the sentence to the query, the weight of the document from which the sentence is taken, and the position of the sentence in the document:

$$ INC_{{s_{i}^{t} }} = \left( {1 + \alpha *I_{i} } \right)* \gamma * \lambda *Sim\left( {Q^{t} , S_{i}^{t} } \right) $$
(5)
$$ \gamma = 1 - 0.5*{ \sin }\left( {\frac{i* \pi }{{\left| {D_{s} } \right|}}} \right) $$
(6)

The parameters \( \alpha \) and \( \lambda \) are configurable parameters of the algorithm; \( I_{i} \) is the weight of the document \( D_{s} \) that contains the sentence with index \( i \); \( S_{i}^{t} \) is the evaluated sentence with index \( i \) and label \( t \); \( Q^{t} \) is the query for this time label; and \( \gamma \) is a multiplier that reduces the weight of sentences from the middle of the document.

\( DEC_{{s_{i}^{t} }} \) is a penalty term that depends on the similarity to the already extracted sentences:

$$ DEC_{{s_{i}^{t} }} = \left( {1 - \lambda } \right)*\mathop {\hbox{max} }\nolimits_{{S_{j} \in S}} Sim(S_{j} , S_{i}^{t} ) $$
(7)

where \( S_{j} \) is one of the extracted sentences and \( S \) is the set of all already extracted sentences.

Sentences are processed in chronological order, with a restriction on the maximum number of sentences per day.
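Formulas (4)–(7) combine into a single score per candidate sentence, as in the sketch below. Parameter names follow Sect. 4.7; the similarity function is pluggable (cosine in the paper), and the default values of \( \alpha \) and \( \lambda \) here are illustrative assumptions rather than the fitted values.

```python
from math import sin, pi

def mmrt_score(i, sent_vec, query_vec, doc_weight, n_doc_sents,
               chosen, sim, alpha=0.5, lam=0.7):
    """MMRT (4): INC term (5) with position multiplier gamma (6),
    minus the redundancy penalty DEC (7).
    i: sentence index in its document; chosen: already-extracted sentences."""
    gamma = 1 - 0.5 * sin(i * pi / n_doc_sents)   # lowest for mid-document
    inc = (1 + alpha * doc_weight) * gamma * lam * sim(query_vec, sent_vec)
    dec = (1 - lam) * max((sim(s, sent_vec) for s in chosen), default=0.0)
    return inc - dec
```

Note how \( \gamma \) equals 1 at the first sentence and drops to 0.5 in the middle of the document, implementing the inverted-pyramid preference for leading and trailing sentences.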

4.8 System Diagram

The features described in Subsect. 4.2 are realized at various stages of the system. The general scheme of the algorithm is shown in Fig. 5.

Fig. 5. Working scheme.

5 Evaluation

5.1 Metrics for Evaluation

The system was evaluated using several metrics: ROUGE-1, ROUGE-2, and sentence recall \( R^{sent} \):

$$ ROUGE - N = \frac{{\left| {N_{A} \cap N_{g} } \right|}}{{\left| {N_{g} } \right|}} $$
(8)

where \( N_{A} \) is the set of n-grams for the constructed annotations, \( N_{g} \) is the set of n-grams for the reference (gold) annotations.

$$ R^{sent} = \frac{{\left| {S_{A} \equiv S_{g} } \right|}}{{\left| {S_{g} } \right|}}, $$
(9)

where \( S_{A} \) is the set of sentences from the constructed annotations and \( S_{g} \) is the set of sentences from the reference annotations. The operator ≡ selects the subset of sentences of \( S_{A} \) that have a semantic equivalent in \( S_{g} \); \( \left| {S_{A} \equiv S_{g} } \right| \) denotes the size of this subset.
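The ROUGE-N recall (8) can be computed as a set-based n-gram overlap, as sketched below; plain whitespace tokenization is an assumption for illustration. Sentence recall (9) is not sketched because judging semantic equivalence of sentences requires manual assessment.

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def rouge_n(generated, reference, n):
    """|N_A intersect N_g| / |N_g| over the two annotations' n-gram sets."""
    na = ngrams(generated.split(), n)
    ng = ngrams(reference.split(), n)
    return len(na & ng) / len(ng) if ng else 0.0
```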

5.2 Data Preparation

Since a test set of annotations is required for the evaluation procedure, timeline summaries were manually prepared in the course of the research. The collection was formed as follows:

  1. At the first stage, high-profile events of early 2015 that were actively covered in the press were selected with the help of Wikipedia.

  2. Then, for most of the events, the corresponding story was searched for on the Interfax website, and a timeline summary was created on the basis of the documents belonging to that story.

  3. If no corresponding story existed on Interfax, materials on the topic were studied and a timeline summary was created on the basis of the documents read.

As a result, 45 annotations on 15 news stories were created (Table 2).

Table 2. News stories on which the reference annotations are made.

5.3 Optimization of Algorithm Parameters

Since the system contains a large number of parameters (23 in total), some of which are presented in Table 3, it was necessary to optimize the choice of their values.

Table 3. Some system parameters.

To achieve this, the entire collection of reference annotations was divided into training and test parts in the ratio 2:1. The functional (1) was then optimized in Python using the open-source hyperopt package [22], which applies Sequential Model-Based Optimization (SMBO) [23] for parameter selection. The parameters were fitted on the training part, after which the final evaluation of the configurations took place on the test part.
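The fitting protocol can be sketched as follows. The actual system uses hyperopt's SMBO/TPE; to keep the sketch self-contained, plain random search stands in for it here, and the objective and parameter ranges are illustrative assumptions. The real objective is the mean proximity \( M(D_A^{q_i}, D_g^{q_i}) \) over the training queries, as in (1).

```python
import random

def random_search(objective, space, n_trials=100, seed=0):
    """space: dict name -> (low, high). Maximizes objective(params)
    over uniformly sampled parameter settings (stand-in for SMBO/TPE)."""
    rng = random.Random(seed)
    best_params, best_val = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        val = objective(params)          # e.g. mean M(D_A, D_g) on train part
        if val > best_val:
            best_params, best_val = params, val
    return best_params, best_val
```

After fitting on the training queries, the best parameter setting is evaluated once on the held-out test queries.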

6 Results

To evaluate the contribution of the considered features, the following 6 configurations were fitted and evaluated:

  1. baseline – a simple summarization approach that does not take the considered factors into account and uses MMR as the ranking algorithm.

  2. querry-ex – baseline plus the query extension strategy (Sect. 4.3), but without double query extension.

  3. double-ex – querry-ex plus double query extension (Sect. 4.3).

  4. temporal – double-ex plus accounting for the temporal nature of news stories (Sect. 4.4).

  5. importance – temporal plus accounting for the inverted-pyramid structure of news articles, using the tf-idf representation (Sect. 4.5).

  6. w2v-imp – importance, but using word2vec to compute sentence similarity when accounting for the article structure (Sect. 4.6).

The results of evaluating the configurations are given in Table 4, which shows that each of the considered features makes a positive contribution to the quality of the generated timeline summaries. As an example of a final annotation, a fragment of the summary for the plane crash in Taiwan is given in Table 5.

Table 4. Evaluation results.
Table 5. The generated timeline summary fragment about the plane crash in Taiwan.

7 Conclusions and Future Work

In this article we presented an approach to building a timeline summary. The conducted research shows that the problem of constructing a timeline summary differs from the standard MDS problem. The effectiveness of the following features was shown:

  • Query extension strategy.

  • Accounting for the temporal nature of news stories.

  • Accounting for the structure of a news article in the form of an inverted pyramid.

Extending the query, as expected, has a positive effect on the representation of the event discussed in the document. An interesting fact is that re-extending the query (double query extension) has a much greater effect: the documents retrieved with the first-level query are not sufficient for a good representation of the event.

The fact that accounting for the temporal nature of news stories improves the quality of the annotation is a natural consequence of the temporal character of news stories and events.

Taking the structure of the inverted pyramid into account also gives an improvement. The increase in metric values for the w2v-imp configuration means that the correctness of the recognized links between documents plays a significant role. This fact raises challenges for future research.

Using the structural features of news articles makes it possible to obtain information whose use can significantly improve the quality of the generated annotations.