
Automatic Extraction of Headlines from Punjabi Newspapers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8321)

Abstract

In any language, newspaper headlines are important: reading them gives an idea of the news without reading the full articles. Many websites also extract headlines from online newspapers and display them for their users. Another important application of headline extraction is text summarization, where headline sentences are given more weight than other sentences when selecting content for the final summary. This paper concentrates on automatic headline extraction from Punjabi newspapers. Punjabi is the official language of the state of Punjab, but it is an under-resourced language: very few computational-linguistic resources are available for it, although considerable research is under way on developing NLP applications for Punjabi. This is the first time that automatic headline extraction from Punjabi newspapers has been developed. The system uses four headline features: 1) punctuation mark feature, 2) font feature, 3) number of words feature, and 4) title keywords feature. The weights of these four features are learned by applying mathematical regression as a machine learning approach. To extract headlines, the final score of each sentence is obtained from the feature-weight equation $w_1 f_1 + w_2 f_2 + w_3 f_3 + w_4 f_4$, where $f_1$, $f_2$, $f_3$ and $f_4$ are the feature scores of the four features and $w_1$, $w_2$, $w_3$ and $w_4$ are their learned weights. The accuracy of the Punjabi headline extraction system is 98.39%, tested over fifty Punjabi single/multi news documents. The part of the headline extraction system that uses the punctuation mark feature has been integrated with the Punjabi Text Summarization system, which is available online.
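The scoring and weight-learning steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which form of regression is used, so ordinary least squares without an intercept is assumed here. The function names `sentence_score` and `fit_weights`, the toy feature vectors, and the 0/1 labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def sentence_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum w1*f1 + w2*f2 + w3*f3 + w4*f4 from the abstract."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def fit_weights(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares weights (assumed regression form, no intercept).

    Solves the normal equations (X^T X) w = X^T y with Gauss-Jordan
    elimination and partial pivoting.
    """
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; features are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical training data: one row per sentence, columns are
# [punctuation, font, number_of_words, title_keywords] feature scores.
TOY_FEATURES = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.5],
    [0.0, 1.0, 0.5, 1.0],
    [0.0, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
]
# Hypothetical labels: 1.0 = headline, 0.0 = ordinary sentence.
TOY_LABELS = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

learned_weights = fit_weights(TOY_FEATURES, TOY_LABELS)
for row, label in zip(TOY_FEATURES, TOY_LABELS):
    print(label, round(sentence_score(row, learned_weights), 3))
```

In a sketch like this, sentences would then be ranked or thresholded by score to decide which ones are headlines. The abstract does not state the decision rule, so that step is left out.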




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Gupta, V. (2014). Automatic Extraction of Headlines from Punjabi Newspapers. In: Gupta, P., Zaroliagis, C. (eds) Applied Algorithms. ICAA 2014. Lecture Notes in Computer Science, vol 8321. Springer, Cham. https://doi.org/10.1007/978-3-319-04126-1_20


  • DOI: https://doi.org/10.1007/978-3-319-04126-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04125-4

  • Online ISBN: 978-3-319-04126-1

  • eBook Packages: Computer Science, Computer Science (R0)
