Automatic Extraction of Headlines from Punjabi Newspapers

Gupta, Vishal

doi:10.1007/978-3-319-04126-1_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8321)

Abstract

In any language, newspaper headlines are important: by reading them, a reader can get the gist of the news without reading the full articles. Many websites extract headlines from online newspapers and display them for their users. Another important application of headline extraction is text summarization, where headline sentences are given more weight than other sentences when selecting content for the final summary. This paper concentrates on automatic headline extraction from Punjabi newspapers. Punjabi is the official language of the state of Punjab, but it is an under-resourced language: very few computational-linguistic resources are available for it, although a lot of research on developing NLP applications for Punjabi is under way. This is the first time that automatic headline extraction from Punjabi newspapers has been developed, using four features of headlines: 1) punctuation mark feature, 2) font feature, 3) number of words feature, and 4) title keywords feature. The weights of these four features are calculated by applying mathematical regression as a machine learning approach. To extract headlines, the final score of each sentence is obtained from the feature-weight equation w₁f₁ + w₂f₂ + w₃f₃ + w₄f₄, where f₁, f₂, f₃ and f₄ are the feature scores of the four features and w₁, w₂, w₃ and w₄ are the learned weights of these features. The accuracy of the Punjabi headline extraction system is 98.39%, tested over fifty Punjabi single/multi news documents. The part of the system that uses the punctuation mark feature has been integrated with the Punjabi Text Summarization system, which is available online.
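
The feature-weight equation in the abstract can be illustrated with a minimal Python sketch. It assumes the four feature scores of each sentence have already been computed and only shows the weighted sum and a threshold-based selection. The weight values, the threshold, the sample sentences, and the function names sentence_score and extract_headlines are all hypothetical placeholders; the abstract does not give the learned weights, the feature definitions, or the selection rule.

```python
from typing import List, Sequence, Tuple

# Assumed feature order: (punctuation mark, font, number of words, title keywords).
Features = Tuple[float, float, float, float]


def sentence_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Return w1*f1 + w2*f2 + w3*f3 + w4*f4 for one sentence."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def extract_headlines(
    sentences: Sequence[Tuple[str, Features]],
    weights: Sequence[float],
    threshold: float,
) -> List[str]:
    """Return the sentences whose score reaches the threshold, in document order.

    Selecting by a fixed threshold is an assumption made for this sketch.
    """
    return [
        text
        for text, features in sentences
        if sentence_score(features, weights) >= threshold
    ]


# Hypothetical weights; in the paper they are learned by regression.
WEIGHTS = (0.5, 1.0, 0.25, 0.25)
THRESHOLD = 1.0  # hypothetical cut-off

# Placeholder sentences with made-up feature scores.
SENTENCES = [
    ("Headline-like line", (1.0, 1.0, 1.0, 1.0)),
    ("An ordinary body sentence.", (0.0, 0.0, 0.0, 1.0)),
]

if __name__ == "__main__":
    for text, features in SENTENCES:
        print(text, sentence_score(features, WEIGHTS))
    print(extract_headlines(SENTENCES, WEIGHTS, THRESHOLD))
```

With these placeholder values, only the first sentence is selected, because its four feature scores all contribute to the sum while the second sentence scores only on the title keywords feature.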

Author information

Authors and Affiliations

Computer Science & Engineering, University Institute of Engineering & Technology, Panjab University Chandigarh, India
Vishal Gupta

Editor information

Editors and Affiliations

Computer Science and Engineering, Heritage Institute of Technology, Chowbaga Road, Anandapur, 700107, Kolkata, India
Prosenjit Gupta
Department of Computer Engineering and Informatics, University of Patras, 26500, Patras, Greece
Christos Zaroliagis

Cite this paper

Gupta, V. (2014). Automatic Extraction of Headlines from Punjabi Newspapers. In: Gupta, P., Zaroliagis, C. (eds) Applied Algorithms. ICAA 2014. Lecture Notes in Computer Science, vol 8321. Springer, Cham. https://doi.org/10.1007/978-3-319-04126-1_20

DOI: https://doi.org/10.1007/978-3-319-04126-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04125-4
Online ISBN: 978-3-319-04126-1
eBook Packages: Computer Science, Computer Science (R0)
