VIEW 2006: Pixelization Paradigm, pp. 48–54
Pixelisation-Based Statistical Visualisation for Categorical Datasets with Spreadsheet Software
Abstract
A heat-map type of chart for depicting a large number of cases and up to twenty-five categorical variables with spreadsheet software is presented. It is implemented in Microsoft® Excel using standard formulas, sorting and simple VBA code. The motivating example depicts the accuracy of automated assignment of MeSH® descriptor headings to abstracts of medical articles. Within each abstract, the predicted support for each heading is ranked. Then, for each heading, the human specialist's decision (assigned or not assigned, depicted by a black or white cell respectively) is shown next to its high or low predicted support, which is depicted on a nine-point, two-colour scale. Thus, each case (abstract) is depicted by one row of a table and each variable (heading) by two adjacent columns. A rank-based classification accuracy measure is calculated for each case, and rows are sorted in order of increasing accuracy from top to bottom. Based on an analogous measure, variables are sorted in order of increasing prediction accuracy from left to right. Another biomedical dataset is presented with a similar chart. Different methods for predicting binary outcomes can be visualised, and the procedure is easily extended to polytomous variables.
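The abstract leaves the ranking-to-colour mapping and the exact rank-based accuracy measure unspecified. The Python sketch below is not the authors' Excel/VBA implementation. It only illustrates the pipeline under stated assumptions. Ranks run from 1 (highest support) to n, with ties broken by column order. Ranks are binned linearly onto the nine levels. The rank-based accuracy measure is taken to be the area under the ROC curve (AUC), computed pairwise from ranks, for both cases and headings. The function names (`ranks_within_case`, `nine_point_level`, `auc`, `build_chart`) are hypothetical. Every case and every heading is assumed to have at least one assigned and one non-assigned instance.

```python
from typing import List, Sequence, Tuple


def ranks_within_case(support: Sequence[float]) -> List[int]:
    """Rank headings within one case: rank 1 = highest predicted support.

    Ties are broken by column order (an assumption of this sketch).
    """
    order = sorted(range(len(support)), key=lambda j: -support[j])
    ranks = [0] * len(support)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def nine_point_level(rank: int, n: int) -> int:
    """Assumed mapping: bin rank 1..n linearly onto levels 9 (highest support) .. 1 (lowest)."""
    if n == 1:
        return 9
    return 1 + (n - rank) * 8 // (n - 1)


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    if not pos or not neg:
        raise ValueError("need at least one assigned and one non-assigned item")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def build_chart(
    assigned: Sequence[Sequence[int]], support: Sequence[Sequence[float]]
) -> Tuple[List[List[int]], List[float], List[float]]:
    """Return (table, sorted case accuracies, sorted heading accuracies).

    Each output row holds, per heading, two adjacent cells:
    the human assignment (1 = black, 0 = white) and the nine-point support level.
    """
    n_cases, n_vars = len(assigned), len(assigned[0])
    ranks = [ranks_within_case(row) for row in support]

    # Case accuracy (assumed AUC): do assigned headings outrank non-assigned ones?
    # Scores are negated ranks, so a better (smaller) rank means a higher score.
    case_acc = []
    for a_row, r_row in zip(assigned, ranks):
        pos = [-r for a, r in zip(a_row, r_row) if a]
        neg = [-r for a, r in zip(a_row, r_row) if not a]
        case_acc.append(auc(pos, neg))

    # Heading accuracy: across cases, is the heading ranked higher where it was assigned?
    var_acc = []
    for j in range(n_vars):
        pos = [-ranks[i][j] for i in range(n_cases) if assigned[i][j]]
        neg = [-ranks[i][j] for i in range(n_cases) if not assigned[i][j]]
        var_acc.append(auc(pos, neg))

    # Increasing accuracy downwards (rows) and rightwards (columns); sorted() is stable.
    row_order = sorted(range(n_cases), key=lambda i: case_acc[i])
    col_order = sorted(range(n_vars), key=lambda j: var_acc[j])

    table = []
    for i in row_order:
        row: List[int] = []
        for j in col_order:
            row.append(1 if assigned[i][j] else 0)
            row.append(nine_point_level(ranks[i][j], n_vars))
        table.append(row)
    return table, [case_acc[i] for i in row_order], [var_acc[j] for j in col_order]


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 cases x 3 headings.
    assigned = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
    support = [[0.9, 0.5, 0.1], [0.8, 0.2, 0.6], [0.3, 0.4, 0.7]]
    table, case_acc, var_acc = build_chart(assigned, support)
    for row in table:
        print(row)
```

Mapping levels 1 to 9 onto two colours (for example, with conditional formatting in the spreadsheet) is left out, since the abstract does not give the colour scheme. Because the sort is stable, cases or headings with equal accuracy keep their input order.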
Keywords
Automated Assignment, Biomedical Informatics, Medical Article, Spreadsheet Software, Adjacent Column