Bellegarda, J. R. (1997). A latent semantic analysis framework for large-span language modeling. In Proceedings of the 5th European Conference on Speech Communication and Technology (pp. 1451–1454), Vol. 3. Rhodes, Greece.
Bengio, Y., Ducharme, R., & Vincent, P. (2001). A neural probabilistic language model. Advances in Neural Information Processing Systems, 13, 933–938.
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.
Berger, A. L., Pietra, S. A. D., & Pietra, V. J. D. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22:1, 39–72.
Bridle, J. S. (1989). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogelman-Soulie & J. Herault (Eds.), Neuro-computing: Algorithms, architectures and applications (pp. 227–236).
Byrne, W., Gunawardana, A., & Khudanpur, S. (1998). Information geometry and EM variants. Technical Report CLSP Research Note 17, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD.
Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting and 10th Conference of the European Chapter of ACL (pp. 116–123). Toulouse, France.
Chelba, C. (1997). A structured language model. In ACL-EACL, Student Section (pp. 498–500). Madrid, Spain.
Chelba, C., & Jelinek, F. (2000). Structured language modeling. Computer Speech and Language, 14:4, 283–332.
Chelba, C., & Xu, P. (2001). Richer syntactic dependencies for structured language modeling. In Proceedings of the Automatic Speech Recognition and Understanding Workshop. Madonna di Campiglio, Trento, Italy.
Chen, S. F., & Goodman, J. (1999). An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13, 359–394.
Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (pp. 184–191). Santa Cruz, CA.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:6, 391–407.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.
Emami, A. (2003). Improving a connectionist based syntactical language model. In Proceedings of the 8th European Conference on Speech Communication and Technology (pp. 413–416), Vol. 1. Geneva, Switzerland.
Emami, A., & Jelinek, F. (2004). Exact training of a neural syntactic language model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Montreal, Quebec.
Emami, A., Xu, P., & Jelinek, F. (2003). Using a connectionist model in a syntactical based language model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 372–375), Vol. 1. Hong Kong.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive structure: A critical analysis. Cognition, 28, 3–71.
Goodman, J. (2001). A bit of progress in language modeling. Technical Report MSR-TR-2001-72, Microsoft Research, Redmond, WA.
Gropp, W., Lusk, E., & Skjellum, A. (1999). Using MPI: Portable parallel programming with the message-passing interface. Cambridge, MA: MIT Press.
Henderson, J. (2000). A neural network parser that handles sparse data. In Proceedings of the 6th International Workshop on Parsing Technologies (pp. 123–134). Trento, Italy.
Henderson, J. (2003). Inducing history representations for broad coverage statistical parsing. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).
Hinton, G. E. (1986). Learning distributed representations of concepts. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and neurobiology (pp. 46–61). Oxford, UK: Oxford University Press.
Ho, E., & Chan, L. (1999). How to design a connectionist holistic parser. Neural Computation, 11:8, 1995–2016.
Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge, MA and London: MIT Press.
Jelinek, F., & Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice (pp. 381–397). Amsterdam, The Netherlands: North Holland Publishing Co.
Kim, W., Khudanpur, S., & Wu, J. (2001). Smoothing issues in the structured language model. In Proceedings of the 7th European Conference on Speech Communication and Technology (pp. 717–720). Aalborg, Denmark.
Kneser, R., & Ney, H. (1995). Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 181–184), Vol. 1.
Lawrence, S., Giles, C. L., & Fong, S. (1996). Can recurrent neural networks learn natural language grammars? In Proceedings of the IEEE International Conference on Neural Networks (pp. 1853–1858). Piscataway, NJ: IEEE Press.
Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979). Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5:3, 308–323.
LeCun, Y. (1985). A learning scheme for asymmetric threshold networks. In Proceedings of Cognitiva 85 (pp. 599–604). Paris, France.
Miikkulainen, R., & Dyer, M. G. (1991). Natural language processing with modular neural networks and distributed lexicon. Cognitive Science, 15, 343–399.
Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language, 8, 1–38.
Paul, D. B., & Baker, J. M. (1992). The design for the Wall Street Journal-based CSR corpus. In Proceedings of the DARPA SLS Workshop.
Ratnaparkhi, A. (1997). A linear observed time statistical parser based on maximum entropy models. In Second Conference on Empirical Methods in Natural Language Processing (pp. 1–10). Providence, RI.
Roark, B. (2001). Robust probabilistic predictive syntactic processing: Motivations, models and applications. Ph.D. thesis, Brown University, Providence, RI.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, Vol. 1. Cambridge, MA: MIT Press.
Schwenk, H., & Gauvain, J.-L. (2002). Connectionist language modeling for large vocabulary continuous speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 765–768), Vol. 2. Orlando, FL.
Van Uytsel, D. H., Van Compernolle, D., & Wambacq, P. (2001). Maximum-likelihood training of the PLCG-based language model. In Proceedings of the Automatic Speech Recognition and Understanding Workshop. Madonna di Campiglio, Trento, Italy.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.
Xu, P., Chelba, C., & Jelinek, F. (2002). A study on richer syntactic dependencies for structured language modeling. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, PA.
Xu, P., Emami, A., & Jelinek, F. (2003). Training connectionist models for the structured language model. In M. Collins & M. Steedman (Eds.), Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (pp. 160–167). Sapporo, Japan: Association for Computational Linguistics.
Xu, W., & Rudnicky, A. (2000). Can artificial neural networks learn language models? In Proceedings of the 6th International Conference on Spoken Language Processing. Beijing, China.