Abstract
This chapter presents the main data analysis of the book. First, the hypotheses are presented. Second, the framework for treating text as data is discussed, and the textual data are described. Third, the data analysis itself is performed. The analysis begins by exploring the extent to which the problems raised during the on-call service differ from the other problematic situations judges deal with in their work. We then test whether the problems faced when on call stem mainly from practical rather than theoretical doubts, i.e., whether they are identifiable demands from the outer environment whose solution is not specifically contained in the legal knowledge judges acquire either in the law degree or while preparing for the entrance examination.
Notes
- 1.
Unless the information we want to extract is directly related to some form of language inflection.
- 2.
- 3.
Of course, different stemming algorithms yield different results depending on the set of rules upon which they operate (e.g., Porter 1980). See also http://snowball.tartarus.org/texts/introduction.html for an excellent introduction.
- 4.
I would like to give special thanks here to Toni Martí and Montse Nofre from CLiC (Centre de Llenguatge i Computació, University of Barcelona) for granting me free access to the analyzer during the completion of this work. Other open source tools that include lemmatization among other features are available, such as FreeLing, at http://nlp.lsi.upc.edu/freeling/.
- 5.
Numbers and punctuation have also been removed from the documents, as is common practice. A small number of other terms were also removed from the corpus for other reasons. First, before the interviews were carried out, the General Council of the Judiciary and the research group under which this project was conducted (Institute of Law and Technology) signed an agreement committing to full respect for respondents’ identity. Each interviewed judge was informed of this agreement before the interview was recorded and, when requested, was handed a copy. Accordingly, all information that might reveal personal names or toponyms that could lead to the identification of a judge has been removed from the data. For the same reason, full records of the interviews are not made openly available in an online appendix to this book. Finally, the term guardia [on-call] was also removed from the corpus so as not to bias the automatic classification of judges’ responses when applying cluster analysis or any other classification method.
- 6.
The analysis is carried out using the hclust function from the base R stats package (R Core Team 2013).
- 7.
Quinn and Keough (2002) point out that using the Bray–Curtis dissimilarity measure in scaling methods such as classical multidimensional scaling may pose a number of problems because “principal coordinates represent only part of the variation in the original dissimilarities”. The problem is minimized, though, when such techniques are used as a data reduction strategy and therefore only the first few components are retained.
- 8.
The classical multidimensional scaling (MDS) model is fit using the cmdscale function (with k = 2) of the stats R package (R Core Team 2013).
- 9.
However, due to potential collinearity problems, the \(R^2\) coefficient of Model 3 should be interpreted with care. Additionally, ANOVA tests on all models (not reported here) yield significant differences in scale positions.
- 10.
- 11.
The explanation of the graphical model draws on Steyvers and Griffiths (2007), but some of the notation has been adapted.
- 12.
Estimation is made through maximum likelihood estimation (MLE), where the log-likelihood of the data is maximized with respect to the hyperparameters α and β. Different methods have been proposed for the computational problem of approximating the posterior distribution of the topic decomposition of a corpus, including mean-field variational inference and Gibbs sampling. See Blei et al. (2003), Buntine and Jakulin (2006), and Steyvers and Griffiths (2007) for discussions.
- 13.
The perplexity of the models is computed with the topicmodels R package (Grün and Hornik 2011).
- 14.
We computed the Gini-Simpson index from the Simpson index, \(\lambda =\sum_{i=1}^{m}p_i^2\), where \(p_i\) is the proportion of topic \(i\) among the \(m\) topics represented in each document. The Gini-Simpson index is \(1-\lambda\).
- 15.
The effect is small, though. An OLS regression model of the Gini-Simpson diversity index of the documents on the log of their term size yields a (significant) coefficient of 0.05. This means that a 10 % increase in the length of a document produces, on average, an increase of 0.002 in that document’s diversity index.
- 16.
The visualization of correlated terms in a document-term matrix has been carried out using the R tm package (Feinerer 2008). The search for correlations is carried out in the vector space by computing the cosine between vectors, interpreted as the normalized correlation coefficient (Manning and Schütze 1999, p. 300), with values between 0 and 1. That is, we use the cosine of the angle between the vectors representing the documents, which is a similarity measure rather than a distance measure (Solka 2008). Due to the sparsity of the original matrix, we use relatively low correlation thresholds (\(r \sim 0.25\)) in order to increase the density of the correlation networks. I would like to warmly thank Dr. Ingo Feinerer, the creator of the tm R package, at the Vienna University of Technology, for providing a quick and detailed explanation of the code behind the tm library.
- 17.
Terms with no edges, such as asunto and año in Fig. 5.12, have a lower coefficient of correlation with every other term in the plot.
- 18.
The term mostly used by judges and legal experts for these situations is internamiento urgente, which is a very common issue raised during the operation of the on-call service (Percellar-Giménez 2000).
- 19.
- 20.
Again, it is convenient that in Latent Dirichlet Allocation documents tend to be strongly associated with one single topic.
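As an illustration of the diversity measure described in note 14, the Gini-Simpson index can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code used in the chapter (which relies on R): the function name gini_simpson and the example topic proportions are hypothetical, and the inputs are normalised defensively so that they need not sum exactly to 1.

```python
from typing import Sequence


def gini_simpson(proportions: Sequence[float]) -> float:
    """Return 1 - sum(p_i^2) for one document's topic proportions."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Normalise so that the proportions sum to 1 (assumption: inputs may
    # be raw weights rather than exact proportions).
    normalised = [p / total for p in proportions]
    # Simpson index (lambda in note 14): sum of squared proportions.
    simpson = sum(p * p for p in normalised)
    return 1.0 - simpson


# Hypothetical topic proportions for two documents over m = 4 topics.
concentrated = [0.97, 0.01, 0.01, 0.01]
even = [0.25, 0.25, 0.25, 0.25]

print(gini_simpson(concentrated))
print(gini_simpson(even))
```

When a document is spread evenly over m topics the index equals 1 - 1/m (0.75 for m = 4), and it approaches 0 as the document concentrates on a single topic.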
References
Ayuso, M., M. Bécue-Bertaut, R. Álvarez, O. Valencia, M. Álvarez, M. Hernández, and M. Santolino. 2002. Jueces jóvenes en España, 2002. Análisis estadístico de las encuestas a los jueces en su primer destino (promociones 48/49 y 50). Internal report (project SEC-2001-2581-C02-01/02), Consejo General del Poder Judicial [General Council of the Judiciary].
Ayuso, M., M. Bécue-Bertaut, R. Álvarez, O. Valencia, M. Álvarez, and M. Santolino. 2003. Análisis estadístico de las encuestas a los jueces en su primer destino. Internal report SEC-2001-2581-C02-01/02, Consejo General del Poder Judicial [General Council of the Judiciary], Escuela Judicial (Barcelona).
Baeza-Yates, R., and B. Ribeiro-Neto. 1999. Modern information retrieval. New York: ACM.
Basu, S., and I. Davidson. 2009. Constrained partitional clustering of text data: An overview. In Text mining. Classification, clustering, and applications, ed. A. N. Srivastava and M. Sahami, 155–184. Boca Raton: Chapman and Hall.
Bécue-Bertaut, M., M. Rajman, L. Lebart, and E. Gaussier. 2005. Extraction of the useful words from a decisional corpus. Contribution of correspondence analysis. StudFuzz 185:159–179.
Benjamins, V. R., J. Contreras, P. Casanovas, M. Ayuso, M. Bécue-Bertaut, M. Lemus, and C. Urios. 2004. Ontologies of professional legal knowledge as the basis for intelligent IT support for judges. Artificial Intelligence and Law 12 (4): 359–378.
Benoit, K., and M. Laver. 2007. Benchmarks for text analysis: A response to Budge and Pennings. Electoral Studies 26:130–135.
Benoit, K., M. Laver, and S. Mikhailov. 2009. Treating words as data with error: Uncertainty in text statements of policy positions. American Journal of Political Science 53 (2): 495–513.
Benzécri, J.-P. 1982. L’analyse des données. Paris: Dunod.
Blasius, J., and M. Greenacre. 2006. Correspondence analysis and related methods in practice. In Multiple correspondence analysis and related methods, ed. M. Greenacre and J. Blasius, 3–40. Boca Raton: Chapman and Hall.
Blei, D. 2012. Probabilistic topic models. Communications of the ACM 55:77–84.
Blei, D., and J. D. Lafferty. 2009. Topic models. In Text mining. Classification, clustering, and applications, ed. A. N. Srivastava and M. Sahami, 71–94. Boca Raton: CRC Press.
Blei, D., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
Borg, I., and P. J. F. Groenen. 2005. Modern multidimensional scaling. Theory and applications. 2nd ed. New York: Springer.
Bray, J. R., and J. T. Curtis. 1957. An ordination of upland forest communities of southern Wisconsin. Ecological Monographs 27:325–349.
Budge, I., and P. Pennings. 2007a. Do they work? Validating computerised word frequency estimates against policy series. Electoral Studies 26:121–129.
Budge, I., and P. Pennings. 2007b. Missing the message and shooting the messenger: Benoit and Laver’s ‘response’. Electoral Studies 26:136–141.
Budge, I., H.-D. Klingemann, A. Volkens, and J. Bara, eds. 2001. Mapping policy preferences. Estimates for parties, electors, and governments 1945–1998. Oxford: Oxford University Press.
Buntine, W. 2009. Estimating likelihoods for topic models. NICTA Technical Report NICTA-SML-09-001, NICTA, Canberra (AU).
Buntine, W., and A. Jakulin. 2004. Applying discrete PCA in data analysis. In Proceedings of the 20th conference on Uncertainty in Artificial Intelligence (UAI 2004), ed. C. Meek and J. Halpern, Banff, Canada.
Buntine, W., and A. Jakulin. 2006. Discrete component analysis. In Subspace, latent structure and feature selection techniques, ed. C. J. Saunders, S. R. Gunn, M. Grobelnik, and J. Shawe-Taylor. Berlin: Springer-Verlag.
Carmona, J., S. Cervell, L. Márquez, M. A. Martí, L. Padró, R. Placer, H. Rodríguez, M. Taulé, and J. Turmo. 1998. An environment for morphosyntactic processing of unrestricted Spanish text. In Proceedings of the first conference on language resources and evaluation, 915–922, Granada, Spain.
Casellas, N. 2011. Legal ontology engineering: Methodologies, trends, and the ontology of professional judicial knowledge. Dordrecht: Springer.
Ciborra, C. 1999. Notes on improvisation and time in organizations. Accounting, Management and Information Technologies 9:77–94.
Cyert, R. M., E. A. Feigenbaum, and J. G. March. 1959. Models in a behavioral theory of the firm. Behavioral Science 4 (2): 81–95.
Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41 (6): 391–407.
Escofier, B., and J. Pagès. 1992. Análisis factoriales simples y múltiples: objetivos, métodos e interpretación. Euskal Herriko Unibertsitateko Argitarapen-Zerbitzua, Bilbao.
Falcó-Gimeno, A., and J.-J. Vallbé. 2013. Coalition agreements and party preferences: A principal components analysis approach. In 3rd annual general conference of the European Political Science Association (EPSA), Barcelona.
Feinerer, I. 2008. tm: Text Mining Package, R package version 0.3-3 edition.
Feinerer, I., K. Hornik, and D. Meyer. 2008. Text mining infrastructure in R. Journal of Statistical Software 25 (5): 1–54.
Fortuna, B., M. Grobelnik, and D. Mladenič. 2005. Visualization of text document corpus. Informatica 29:497–502.
Fortuna, B., C. Galleguillos, and N. Cristianini. 2009. Detection of bias in media outlets with statistical learning methods. In Text mining. Classification, clustering, and applications, ed. A. N. Srivastava and M. Sahami, 27–50. Boca Raton: CRC Press.
Greenacre, M. 2005. Weighted metric multidimensional scaling. In New developments in classification and data analysis, ed. M. Vichi, P. Monari, S. Mignani, and A. Montanari, 141–150. Berlin: Springer.
Greenacre, M. 2006. From simple to multiple correspondence analysis. In Multiple correspondence analysis and related methods, ed. M. Greenacre and J. Blasius, 41–76. Boca Raton: Chapman and Hall.
Greenacre, M., and O. Nenadić. 2007. ca: Simple, multiple and joint correspondence analysis. http://www.carme-n.org/. R package version 0.21.
Grimmer, J., and B. M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21 (3): 298–313.
Grün, B., and K. Hornik. 2011. topicmodels: An R package for fitting topic models. Journal of Statistical Software 40 (13): 1–30.
Guérin-Pace, F. 1998. Textual statistics: An exploratory tool for the social sciences. Population: An English selection 10 (1): 73–95.
Hofmann, T. 1999. Probabilistic latent semantic indexing. Proceedings of the twenty-second annual international SIGIR conference on research and development in information retrieval.
Huang, A. 2008. Similarity measures for text document clustering. In New Zealand Computer Science Research Student Conference (NZCSRSC), ed. J. Holland, A. Nicholas, and D. Brignoli, Christchurch (NZ).
Jakulin, A. 2007. Analysis of legal and political data. In Trends in legal knowledge—the semantic web and the regulation of electronic social systems, ed. P. Casanovas, P. Noriega, D. Bourcier, and F. Galindo, 213–226. Granada: European Press Academic Publishing.
Jakulin, A., and W. Buntine. 2004. Applying discrete PCA in data analysis. In Proceedings of the twentieth conference on uncertainty in artificial intelligence, ed. C. Meek and J. Halpern. Arlington: AUAI Press.
Jaworska, N., and A. Chupetlovska-Anastasova. 2009. A review of multidimensional scaling (MDS) and its utility in various psychological domains. Tutorials in Quantitative Methods for Psychology 5 (1): 1–10.
Jones, B. D., and F. R. Baumgartner. 2012. From there to here: Punctuated equilibrium to the general punctuation thesis to a theory of government information processing. Policy Studies Journal 40:1–19.
Jurafsky, D., and J. H. Martin. 2008. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. 2nd ed. Englewood Cliffs: Pearson Prentice Hall.
Klein, G. A., and R. Calderwood. 1991. Decision models: Some lessons from the field. IEEE Transactions on Systems, Man, and Cybernetics 21 (5): 1018–1026.
Klingemann, H.-D., A. Volkens, J. L. Bara, I. Budge, and M. McDonald. 2006. Mapping policy preferences II: Estimates for parties, electors, and governments in Eastern Europe, European Union, and OECD 1990–2003. Oxford: Oxford University Press.
Klüver, H. 2009. Measuring interest group influence using quantitative text analysis. European Union Politics 10:535–549.
Korenius, T., J. Laurikkala, K. Järvelin, and M. Juhola. 2004. Stemming and lemmatization in the clustering of Finnish text documents. In Proceedings of the thirteenth ACM conference on information and knowledge management, ed. D. A. Evans, L. Gravano, O. Herzog, C.-X. Zhai, and M. Ronthaler, 625–633. New York: ACM Press.
Krippendorff, K. 2004. Content analysis: An introduction to its methodology. 2nd ed. Thousand Oaks: Sage.
Lanzara, G. F. 1999. Between transient constructs and persistent structures: Designing systems in action. Journal of Strategic Information Systems 8:331–349.
Lapalut, S. 1995. Text clustering to support knowledge acquisition from documents. Rapport de recherche 2639, Institut National de Recherche en Informatique et en Automatique, Sophia Antipolis (France).
Lasswell, H., N. Leites, R. Fadner, J. M. Goldsen, A. Grey, I. L. Janis, A. Kaplan, D. Kaplan, A. Mintz, I. De Sola Pool, and S. Yakobson. 1949. Language of politics: Studies of quantitative semantics. Cambridge: MIT Press.
Lauderdale, B. E., and T. S. Clark. 2012. Scaling politically meaningful dimensions using texts and votes. In 7th annual conference on empirical legal studies.
Lebart, L., and A. Salem. 1988. Analyse statistique des données textuelles. Paris: Dunod.
Manning, C. D., and H. Schütze. 1999. Foundations of statistical natural language processing. Cambridge: MIT Press.
Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to information retrieval. Cambridge: Cambridge University Press.
March, J. G. 1981. Footnotes to organizational change. Administrative Science Quarterly 26:563–577.
March, J. G., and H. A. Simon. 1958. Organizations. New York: Wiley.
Marques de Sá, J. P. 2007. Applied statistics using SPSS, STATISTICA, MATLAB, and R. 2nd ed. Berlin: Springer.
Mimno, D. 2012a. Computational historiography: Data mining in a century of classics journals. Journal on Computing and Cultural Heritage 5 (1): Article 3.
Mimno, D. 2012b. Reconstructing Pompeian households. Computing Research Repository abs/1202.3747.
Moens, M.-F. 2001. Innovative techniques for legal text retrieval. Artificial Intelligence and Law 9:29–57.
Mooi, E., and M. Sarstedt. 2011. Cluster analysis. In A concise guide to market research, ed. M. Sarstedt and E. Mooi, 237–284. Berlin: Springer-Verlag.
Morin, A. 2006. Intensive use of factorial correspondence analysis for text mining: Application with statistical education publications. Proceedings of the 7th international conference on teaching statistics, ICOTS, Salvador, Bahia, Brazil.
Murtagh, F. 2005. Correspondence analysis and data coding with Java and R. Boca Raton: Chapman and Hall.
Neuendorf, K. A. 2002. The content analysis guidebook. Thousand Oaks: Sage.
Oksanen, J., F. Guillaume-Blanchet, R. Kindt, P. Legendre, P. R. Minchin, R. B. O’Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, and H. Wagner. 2013. vegan: Community ecology package. R package version 2.0-8.
Paukkeri, M. S. 2012. Language- and domain-independent text mining. PhD dissertation, Department of Information and Computer Science, Aalto University, Finland.
Percellar-Giménez, E. 2000. El procedimiento de internamiento por razón de trastorno psíquico. Noticias Jurídicas.
Porter, M. F. 1980. An algorithm for suffix stripping. Program 14 (3): 130–137.
Proksch, S.-O., and J. B. Slapin. 2009. WORDFISH: Scaling software for estimating political positions from Text (version 1.3). http://www.wordfish.org. Accessed 10 August 2014.
Quinn, G. P., and M. J. Keough. 2002. Experimental design and data analysis for biologists. Cambridge: Cambridge University Press.
R Core Team. 2013. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Reinert, M. 1985. Classification descendante hierarchique: un algorithme pour le traitement des tableaux logiques de grandes dimensions. Paris: INRIA.
Reinert, M. 1987. Un logiciel d’analyse de données textuelles ALCESTE. Paris: INRIA.
Reinert, M. 2000. La tresse du sens et la méthode alceste: Application aux “rêveries du promeneur solitaire”. In JADT 2000. 5es Journées Internationales d’Analyse Statistique des Données Textuelles.
Royer, I., and A. Langley. 2008. Linking rationality, politics, and routines in organizational decision making. In The Oxford handbook of organizational decision making, ed. G. P. Hodgkinson and W. H. Starbuck, 250–270. New York: Oxford University Press.
Salton, G., A. Wong, and C. S. Yang. 1975. A vector space model for automatic indexing. Communications of the Association for Computing Machinery 18 (11): 613–620.
Sandhya, N., and A. Govardhan. 2012. Analysis of similarity measures with WordNet based text document clustering. In Proceedings of the InConINDIA 2012, ed. S. C. Satapathy, P. S. Avadhani, and A. Abraham. Berlin: AISC.
Schonhardt-Bailey, C. 2005. Measuring ideas more effectively: An analysis of Bush and Kerry’s national security speeches. PS: Political Science and Politics 38 (4): 701–711.
Schonhardt-Bailey, C. 2008. The congressional debate on partial-birth abortion: Constitutional gravitas and moral passion. British Journal of Political Science 38:383–410.
Semetko, H. A., and P. M. Valkenburg. 2000. Framing European politics: A content analysis of press and television news. Journal of Communication Spring:93–109.
Simon, H. A. 1997. Administrative behavior: A study of decision-making processes in administrative organizations. 4th ed. New York: The Free Press.
Solka, J. L. 2008. Text data mining: Theory and methods. Statistics Surveys 2:94–112.
Srivastava, A. N., and M. Sahami, eds. 2009. Text mining. Classification, clustering, and applications. Boca Raton: CRC Press.
Steyvers, M., and T. Griffiths. 2007. Probabilistic topic models. In Latent semantic analysis: A road to meaning, ed. T. Landauer, D. McNamara, S. Dennis, and W. Kintsch. Mahwah: Lawrence Erlbaum.
Vallbé, J.-J. 2012. Autonomía y organización en el sistema judicial español. In Sistema político español, ed. Josep M. Reniu, Huygens Editorial.
Vallbé, J.-J., and M. A. Martí. 2009. Stemming y lematización: técnicas de procesamiento del lenguaje para el análisis del discurso del conocimiento profesional del juez. In Web Semántica y ontologías jurídicas. Aplicaciones para el derecho en la nueva generación de la red, ed. P. Casanovas, J.-J. Vallbé, M. Fernández-Barrera, and V. R. Benjamins. Comares, Forthcoming.
Vallbé, J.-J., M. A. Martí, B. Fortuna, A. Jakulin, D. Mladenič, and P. Casanovas. 2007. Stemming and lemmatisation: Improving knowledge management through language processing techniques. In Trends in legal knowledge—the semantic web and the regulation of electronic social systems, ed. P. Casanovas, P. Noriega, D. Bourcier, and F. Galindo, 227–242. Granada: European Press Academic Publishing.
Ward, J. H. 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58:236–244.
Witherspoon, D. J., S. Wooding, A. Rogers, E. Marchani, W. Watkins, M. Batzer, and L. Jorde. 2007. Genetic similarities within and between human populations. Genetics 176 (1): 351–359.
Young, L., and S. Soroka. 2012. Affective news: The automated coding of sentiment in political texts. Political Communication 29:205–231.
Copyright information
© 2015 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Vallbé, JJ. (2015). Representing Organizational Uncertainty. In: Frameworks for Modeling Cognition and Decisions in Institutional Environments. Law, Governance and Technology Series, vol 21. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9427-5_5
DOI: https://doi.org/10.1007/978-94-017-9427-5_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9426-8
Online ISBN: 978-94-017-9427-5
eBook Packages: Humanities, Social Sciences and Law, Law and Criminology (R0)