A Straightforward Author Profiling Approach in MapReduce

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8864)

Abstract

Most natural language processing tasks deal with large amounts of data, which take a long time to process. A larger dataset and a good set of features generally improve results, but larger volumes of text and high-dimensional feature sets slow processing down. Thus, natural language processing and distributed computing are a good match. In the PAN 2013 competition, the test runtimes for author profiling ranged from several minutes to several days. Most author profiling systems available now are inaccurate, slow, or both. Our system, written entirely in MapReduce, employs nearly 3 million features and still finishes the task in a fraction of the time taken by state-of-the-art systems, with better accuracy. Our system demonstrates that distributed systems are well suited to tasks that involve a huge amount of data and/or a large number of features.

Keywords

  • Natural Language Processing
  • Statistical Machine Translation
  • Runtime Performance
  • Hadoop Distributed File System
  • Early Bird

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-3-319-12027-0_8
  • Chapter length: 13 pages

Author information

Corresponding author

Correspondence to Suraj Maharjan.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Maharjan, S., Shrestha, P., Solorio, T., Hasan, R. (2014). A Straightforward Author Profiling Approach in MapReduce. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence - IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_8

  • DOI: https://doi.org/10.1007/978-3-319-12027-0_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12026-3

  • Online ISBN: 978-3-319-12027-0

  • eBook Packages: Computer Science, Computer Science (R0)