Abstract
This work aims to improve the accuracy of data mining. A problem with today’s systems for Artificial Intelligence (AI) is their reliance on frameworks that lack oversight: AI is trained on data that is not guaranteed to be representative, and is constructed through software APIs maintained by a large number of developers, each with their own interests. Society, in contrast, expects these systems to work; otherwise the result is zombie-like behavior in systems for industrial control, drug discovery, oversight of public governance, heating, etc. This paper aims to tackle this issue. The idea is to relate the freedoms in software and algorithms, thereby identifying the blind spots in AI. The results reveal how knowledge discovery is directly linked to hidden attributes (in software and algorithms). Thus, this work provides users with a recipe for improving trust in their AI predictions.
Acknowledgments
The authors would like to thank MD K.I. Ekseth at UIO, Dr. O.V. Solberg at SINTEF, Dr. S.A. Aase at GE Healthcare, and Dr. B.H. Helleberg at NTNU/St. Olavs, for their support.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ekseth, O.K., Morset, E., Witzø, V., Refsnes, S., Hvasshovd, SO. (2022). Exploring the Freedoms in Data Mining: Why the Trustworthiness and Integrity of the Findings are the Casualties, and How to Resolve These?. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2021, Volume 1. FTC 2021. Lecture Notes in Networks and Systems, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-030-89906-6_41
DOI: https://doi.org/10.1007/978-3-030-89906-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89905-9
Online ISBN: 978-3-030-89906-6
eBook Packages: Intelligent Technologies and Robotics (R0)