Data processing, information retrieval and classification of atmospheric measurements

  • Original Article
Journal of Data, Information and Management

Abstract

This work takes raw atmospheric science data as input and produces clean, standardized, redundancy-free data as output. It involves two major tasks: data processing and information retrieval. Data processing removes inaccuracies and resolves inconsistencies among data. Information retrieval identifies the information needed, extracts the data, and consolidates similar entries. The study faces four challenges: inaccuracy of the raw data, inconsistency of records, the need for multiple criteria to validate a record, and the large number of alternatives for certain records. Existing research addresses some of these challenges individually, but no solution that targets all four at once was found in the literature. The contribution of this work is a comprehensive approach that handles all four problems and produces a reliable outcome. In particular, fuzzy matching and fuzzy rule-based inference systems are used to remove inconsistencies among data entries, retrieve information from certain sections of data files, and consolidate information from different sources. A rule-based system represents the factors associated with the type of measurement, as well as their interrelationships, and then decides the category of the measurement. Retrieval of instrument information is aided by a structure-based approach using natural language processing (NLP). The NLP output is matched to an entry in the instrument dictionary by a fuzzy rule-based inference system. A multi-criteria decision-making algorithm aggregates information from different sources and selects the instrument classification by factoring in the significance of each source. A software package based on the algorithms presented in this paper has been developed and deployed for real-world applications.
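
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea: fuzzy-match text from several sources against an instrument dictionary, then aggregate the matches with source weights. Everything specific here is an assumption made for illustration: the dictionary entries, the source names, the weights, and the 0.6 threshold are hypothetical, Python's `difflib.SequenceMatcher` stands in for the paper's fuzzy matching, and a plain weighted sum stands in for its multi-criteria decision-making algorithm. The fuzzy rule-based inference system itself is not reproduced.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical instrument dictionary: canonical name -> known aliases.
INSTRUMENT_DICTIONARY = {
    "Chemical Ionization Mass Spectrometer": ["CIMS", "chem ionization mass spec"],
    "Cavity Ring-Down Spectrometer": ["CRDS", "cavity ringdown spectrometer"],
    "Condensation Particle Counter": ["CPC", "condensation particle counter"],
}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(text: str) -> tuple[str, float]:
    """Return the canonical instrument whose name or alias is closest to text."""
    best_name, best_score = "", 0.0
    for name, aliases in INSTRUMENT_DICTIONARY.items():
        score = max(similarity(text, candidate) for candidate in [name, *aliases])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def classify(
    sources: dict[str, str],
    weights: dict[str, float],
    threshold: float = 0.6,
) -> str | None:
    """Weighted-sum aggregation of per-source fuzzy matches.

    Each source votes for its best dictionary match; the vote is scaled by
    the source weight and the match score. Matches below the threshold are
    discarded. Returns None when no source yields a usable match.
    """
    totals: dict[str, float] = {}
    for source, text in sources.items():
        name, score = best_match(text)
        if score >= threshold:
            totals[name] = totals.get(name, 0.0) + weights.get(source, 0.0) * score
    if not totals:
        return None
    return max(totals, key=lambda name: totals[name])


# Hypothetical sources: the header comment contains typos, and the variable
# name points at a different instrument than the other two sources.
sources = {
    "file_name": "crds",
    "header_comment": "Cavity Ringdwn Spectrometr",
    "variable_name": "cpc",
}
weights = {"file_name": 0.3, "header_comment": 0.5, "variable_name": 0.2}
result = classify(sources, weights)
print(result)  # Cavity Ring-Down Spectrometer
```

In this sketch the two sources that agree outweigh the one that disagrees, which is the role the abstract assigns to "the significance of each source"; a real deployment would need weights and thresholds calibrated against validated records.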

Author information

Corresponding author

Correspondence to Dali Wang.

About this article

Cite this article

Wang, D., Bai, Y. Data processing, information retrieval and classification of atmospheric measurements. J. of Data, Inf. and Manag. 6, 41–49 (2024). https://doi.org/10.1007/s42488-024-00112-5

  • DOI: https://doi.org/10.1007/s42488-024-00112-5
