6.1 Non-technical Requirements
Data Protection and Privacy
Particularly in the EU, there are numerous data protection and privacy issues to consider when undertaking big data analytics. Regulatory requirements dictate that personal data must be processed for specified and lawful purposes and that the processing must be adequate, relevant, and not excessive. The impact of these principles for financial services organizations is significant, with individuals being able to ask financial services organizations to remove or refrain from processing their personal data in certain circumstances.
This requirement could lead to increased costs for financial services organizations, as they deal with individuals’ requests. This removal of data may also lead to the dataset being skewed, as certain groups of people will be more active and aware of their rights than others.
Confidentiality and Regulatory Requirements
Any information supplied by a third party that is subject to big data analytics is likely to be confidential information. Therefore, financial services organizations will need to ensure that they comply with their obligations and that any use of such data does not give rise to a breach of their confidentiality or regulatory obligations.
Liability Issues
Just because big data contains an enormous amount of information, it does not mean that it reflects a representative sample of the population. There is therefore a risk of misinterpreting the information produced, and liability may arise where reliance is placed on that information. Financial services organizations have to take this into account when using big data in analytical models, and they should ensure that any output on which others rely carries relevant disclaimers.
6.2 Technical Requirements
Data Extraction and Sentiment Classification
Though the definition of sentiment is vague, in general, a sentiment on an object is a positive or negative view, attitude, emotion, or appraisal on or from a document author or actor.
Sentiment is often expressed in a domain-specific way, and using non-domain-specific vocabulary may lead to misclassifications. The goal is to extract facts and sentiments concerning the financial use cases: financial instruments, situations, conditions, indicators, and experts’ assessments regarding these instruments, as well as investors’ sentiment, etc. Sentiment can be classified at several levels (words, phrases, sentences, paragraphs, documents, and even multiple documents), and the results can then be aggregated.
Data extraction needs to cope with noise, misinformation, irony, bias, or uncertainty. In addition, with sentiment it is important to determine not only the sentiment of a piece of information, but also how words affect the semantic orientation and how sentiment changes.
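The multi-level classification described above can be illustrated with a minimal sketch. It assumes a small, hypothetical finance-specific lexicon (the words and weights in `FINANCE_LEXICON` are invented for illustration), scores each sentence, flips the orientation of a word that directly follows a negator, and averages the sentence scores into a document score. It does not address irony, bias, or misinformation.

```python
import re

# Hypothetical domain-specific lexicon; words and weights are illustrative only.
FINANCE_LEXICON = {
    "bullish": 1.0, "upgrade": 1.0, "outperform": 1.0, "beat": 0.5,
    "bearish": -1.0, "downgrade": -1.0, "default": -1.0, "miss": -0.5,
}
# Words assumed to reverse the orientation of the word that follows them.
NEGATORS = {"not", "no", "never"}


def sentence_score(sentence):
    """Sum lexicon weights, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = FINANCE_LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score


def document_score(text):
    """Aggregate sentence-level scores into a document-level average."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(sentence_score(s) for s in sentences) / len(sentences)
```

A general-purpose lexicon would probably not treat "default" as negative, which is the kind of misclassification a domain-specific vocabulary is meant to avoid.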
Data Quality
The more timely, accurate, and relevant the data (along with good analytics), the better the assessment of the current financial state. This requires better processes for identifying and maintaining the data sources of interest, and for verifying, cleaning, transforming, integrating, and deduplicating data. Due to the large amount of available data, these processes need to be automated and scalable. Language detection methods also need to be refined to improve precision and reliability.
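As an illustration of the cleaning and deduplication steps, the following sketch normalizes text and drops exact duplicates after normalization. The record layout (a dictionary with a `text` field) and the function names are assumptions made for the example. Near-duplicate detection and source verification are out of scope here.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """Normalize Unicode form, collapse whitespace, and lowercase the text."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(records):
    """Keep the first record seen for each distinct normalized text."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(normalize(record["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

Storing fixed-size digests rather than full texts keeps the memory used by the `seen` set predictable as the number of documents grows.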
Data Acquisition
For banks and financial services providers, the volume of data they generate, consume, store, and access will increase exponentially year over year. The applications depend on acquiring and accessing massive amounts of historical heterogeneous information and live feeds of unstructured, semi-structured, and structured information. A significant amount of data comes from internal structured data, though there is a growing trend towards external unstructured data (from news, blogs, articles, social networks, and websites). Although a wide variety of data sources may be available, the ones actually required depend on the design of the specific application.
Data Integration/Sharing
Data integration is the task of overcoming the heterogeneity of disparate data sources in terms of hardware, software, syntax, and/or semantics by providing access tools that enable interoperability.
The data is usually scattered among different heterogeneous sources with differing conceptual representations (different structures and data semantics), but it is presented to the end user as a single, homogeneous data source.
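A minimal sketch of this idea maps two differently structured sources onto one unified record type. Both source layouts (field names, price units, and date formats) and the `Quote` type are hypothetical and chosen only to show structural and semantic differences being resolved by per-source adapters.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quote:
    """Unified record presented to the end user."""
    instrument: str
    price_eur: float
    as_of: date


def from_internal(row):
    # Assumed internal layout: {"isin": str, "px": price in euros, "date": "YYYY-MM-DD"}
    return Quote(row["isin"], float(row["px"]), date.fromisoformat(row["date"]))


def from_vendor(row):
    # Assumed vendor layout: {"id": str, "price_cents": int, "day": "DD/MM/YYYY"}
    day, month, year = (int(part) for part in row["day"].split("/"))
    return Quote(row["id"], row["price_cents"] / 100, date(year, month, day))


def integrate(internal_rows, vendor_rows):
    """Convert both sources to Quote and return one list sorted by instrument and date."""
    quotes = [from_internal(r) for r in internal_rows]
    quotes += [from_vendor(r) for r in vendor_rows]
    return sorted(quotes, key=lambda q: (q.instrument, q.as_of))
```

Keeping one adapter per source means a new source can be added without changing the unified schema or the consumers that read it.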
The motivation for integration may be based on strategic or operational considerations. Regarding strategic considerations and analysis, it may not be required to constantly integrate the data but to integrate data snapshots at a certain point in time. For operational analysis, a real-time integration of the most up-to-date information may be required.
Typically, data integration is not a one-off conversion but an ongoing task, and it therefore poses the additional constraint that the chosen solution needs to be robust in terms of adaptability, extensibility, and scalability. Approaches leveraging standards such as eXtensible Business Reporting Language (XBRL) and Linked Data show promise (O’Riáin et al. 2012).
The rapid generation of continuous streams of information has challenged the storage, computation, and communication capabilities of computing systems, as such streams impose high resource requirements on data stream processing systems.
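One common way to bound the resources a stream consumes is to keep only a fixed-size window of recent values. The sketch below is an illustrative assumption, not a technique prescribed by the text: the hypothetical `RollingMean` class maintains a running mean over the last `size` observations in constant memory.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `size` values, using memory bounded by `size`."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # Subtract the value about to be evicted before the deque drops it.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)
```

Each update takes constant time, so the cost per event does not grow with the length of the stream.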
Decision Support Systems (DSS)
Model-driven DSS emphasises access to and manipulation of statistical, financial, optimization, and/or simulation models. Models use data and parameters to aid decision-makers in analysing a situation, for instance, assessing and evaluating decision alternatives and examining the effect of changes. This requires integrating information from the knowledge base into financial event detection models, visualization models, and decision models, and executing these models in a scalable way.
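As a small illustration of evaluating decision alternatives and examining the effect of a parameter change, the sketch below ranks alternatives by net present value (NPV) under a given discount rate. NPV is used here only as a familiar example of a financial model, and the function names and cash flows are hypothetical.

```python
def npv(rate, cash_flows):
    """Net present value of cash flows, where cash_flows[t] occurs at period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def rank_alternatives(alternatives, rate):
    """Return alternative names ordered from highest to lowest NPV."""
    return sorted(alternatives, key=lambda name: npv(rate, alternatives[name]), reverse=True)


# Hypothetical alternatives: an initial outlay followed by two periods of returns.
alternatives = {
    "A": [-100, 60, 60],   # earlier, smaller returns
    "B": [-100, 0, 130],   # later, larger return
}

ranking_low_rate = rank_alternatives(alternatives, 0.0)
ranking_high_rate = rank_alternatives(alternatives, 0.2)
```

With these illustrative figures the preferred alternative changes as the discount rate rises, which is the kind of what-if comparison a model-driven DSS is meant to support.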
For some application scenarios, the response of the system should support real-time or near-real-time insights. The velocity of the response is subject to the end user requirements.
In DSS, visualization is an extremely useful tool for providing overviews and insights into overwhelming amounts of data to support the decision-making process.
Data Privacy and Security
Top priorities for the financial sector today include on-going regulatory compliance [e.g. Sarbanes-Oxley (SOX) Act, U.S. Government (2002); EU data protection directive, Parliament (1995); cyber security directive, Parliament (2013)] and risk mitigation, continued adaptation to the expectations of consumers for anywhere/anytime service, reducing operational costs, and increasing efficiencies through use of cloud-based services.
Banking and financial institutions need to secure the storage, transit, and use of corporate and personal data across business applications, including online banking and electronic communications of sensitive information and documents.
The increasingly global nature and high interconnectivity of the industry make it necessary to comprehensively address international data security and privacy regulations, from the front end to the back end, and along the full supply chain, including third parties. Data is not always stored in-house but is often held by third parties. Using commercial “cloud” services as data storage locations poses potential privacy and security problems, since the terms of service for these products are often poorly understood.