1 Introduction

Searching the Internet has become a routine activity for most people. Over the past decades, improvements in assistive technology have enabled visually impaired (VI) web users to access information on the Internet. However, efficiently obtaining the target information using online search engines remains a challenge for VI web users [1, 2]. In this study, we attempt to address the challenges identified in the literature, especially those faced by VI web users when exploring search results [3].

VI web users access web browsers using a screen reader, i.e., software that renders the screen content as computer speech. NonVisual Desktop Access (NVDA), Job Access With Speech (JAWS), and VoiceOver are three of the most commonly used screen readers. Screen readers process webpages sequentially, which results in many difficulties, including information overload and a lack of context for users [4]. When web users submit their queries, they need to focus on the information presented to them to decide whether it is significant to their needs [5]. Therefore, user interfaces (UIs) that minimize distractions are beneficial for web users. Search engine interfaces face several distinct challenges when organizing search results and integrating navigation [6]. These difficulties are particularly critical for VI web users [7].

The presentation of web search results should help users obtain the desired information as efficiently as possible [8,9,10]. Therefore, the search interface should have a minimalist design and avoid unnecessary elements that can distract users [11]. Several studies have attempted to improve the information-seeking experience of VI web users by enhancing specific search engine features. Two studies [1, 2] attempted to improve the query suggestion feature and evaluated it with VI web users. However, the participants found this feature distracting because it required them to navigate back to their initial query, which is difficult when using a screen reader. This problem can be addressed with a suitable information-gathering process, which could be simplified by examining user behavior and focusing on the recommended features that are most useful to the participants rather than on the remaining features that only increase the task analysis complexity. Other studies have addressed search engine accessibility with different approaches, such as a search engine interface that clusters the results of a VI user’s query [12], a search engine ontology for VI users [13], an audible search engine that improves accessibility for VI users [14], and usability improvements to search engine tools that simplify interaction for VI users [15].

Herein, we introduce a novel approach that clusters the web search results and presents an overview of the clustered results. This approach shortens the query path to the desired search result. A cognitive benefit of this approach is that it does not overload the short-term memory of the user. To evaluate this approach, we developed a functional prototype called the interactive search engine (InteractSE) interface. It presents an overview of the search results in a multilevel tree structure, in which each node represents a concept with a list of discovered results. We asked 16 VI web users (legally blind) to evaluate the prototype to measure the effectiveness of our approach. The evaluation examined the manner in which the participants interacted with the various interface components (tree and list). The evaluation results led to a set of design suggestions to support the information-seeking process.

2 Motivation and research questions

Searching for information on the web is an extremely demanding process [16]. This process is even more challenging for VI web users because of the complexities associated with accessing web interfaces through screen readers [17]. The process comprises four main stages: query formulation, action, search result exploration, and refinement [18]. Search result exploration is the most intense and demanding stage [6]. This stage is especially demanding for VI web users, who spend approximately twice the time spent by users with 20/20 vision to find the appropriate information [19]. The serial nature of the screen reader output reduces the speed at which VI web users access information during the exploration stage. Some studies [1, 20] have suggested providing VI web users with an overview of the search results, and Berget and MacFarlane [21] stated that user interface design is a clear issue for VI users navigating search results. However, to the best of our knowledge, no study has yet proposed a mechanism to provide such an overview or changed the user interface design to enable easy navigation through the search results. Therefore, we propose a mechanism for presenting an overview of the search results to the user by generating a general summary of the relevant search results [1, 22]. This study addresses the following research questions.

RQ-1 Can search result overviews enhance the VI users’ information-seeking experience and efficiency?

RQ-2 What are the possible enhancements that could be applied to the search engine interface to meet the current needs and future expectations of the VI web users?

The remainder of this study is organized as follows. In the literature review presented in Sect. 3, we describe the information-seeking process, screen readers, and accessible information retrieval. In addition, we summarize the types of search tasks, including complex search tasks for VI users, and the formal concept analysis (FCA) theory. The research methodology and procedure are described in Sect. 4, including our process of producing search result overviews using FCA via the preliminary design interface. The results are presented in Sect. 5 and discussed in Sect. 6; our conclusions and future work are presented in Sect. 7.

3 Literature review

Currently, the information-seeking process is well supported by state-of-the-art search engines [6]. The search engines provide implicit and explicit features to help the web users during each stage of the information-seeking process [23]. However, VI web users who use screen readers encounter various challenges when accessing the essential features of search engines [1]. First, we introduce the information-seeking process and its stages. Then, we discuss the usage of screen readers as well as interactive information retrieval and present different types of search tasks. Finally, we present the literature related to FCA.

3.1 Information-seeking process

Information seeking is the process of attempting to acquire information from human and technological contexts using a specific database or information network [24]. Williamson and Asla [25] indicated that not all information seekers are alike and that they may vary according to various factors, including their gender, physical ability, and age. All these factors directly affect the needs of information seekers. Understanding the human information-seeking process is important for designing successful UIs. The success of a search engine is determined primarily by the usability of its interface [6], and advances in interface design have made the browsing approach more effective [26].

Marchionini [27] described the information-seeking process as a cycle involving problem identification, plan or query formulation, and search result evaluation. Furthermore, this process may be repeated if the desired goals are not met. Berget and MacFarlane’s [21] literature review showed that interface design affects the navigation and evaluation of search results, which are part of the search process.

In the standard model, the process of searching for information is static and repetitive; in a dynamic model, however, the needs of the users keep changing and their interactions with the search tools vary during the search activity. The information seeker progressively gains more knowledge about the topic being studied using the suggested terms and other results; with time, their questions become refined as their previous questions are answered. Bates [28] noted that the process of learning, the goals of the search, and the questions posed constantly change. Consequently, an information seeker achieves the desired search results by combining a series of results [8]. The information-seeking process develops over time with the change in the knowledge of information seekers and their attitudes toward various tasks [8, 29].

3.2 Screen readers and accessible information retrieval

According to a screen reader user survey conducted by WebAIM [30], the top three most popular screen readers are: (1) NVDA, a free and open-source Windows-based program that is most compatible with the Firefox web browser; (2) JAWS, also a Windows-based program but a commercial off-the-shelf software that is most compatible with Internet Explorer; and (3) VoiceOver, a feature built into Mac OS that is most compatible with Safari. Even though these assistive tools are widely available, screen reader users are dissatisfied with their experiences, mostly because of inaccessible content or design flaws in the websites [31, 32]. Oppenheim and Selby [33] showed that web page design has to be screen-reader friendly and that the key features of VI user interface design include a simple, consistent design and clear navigation through the web page.

Websites have developed from static pages into dynamic, interactive applications that enrich the user experience; however, the experience of VI web users has not seen a similar enhancement. Craven [34] reported that VI web users were disappointed when navigating search engine result pages, which tended to increase their mean task completion time. To improve search engine UIs, Andronico, Buzzi, Leporini, and Castillo [35] simplified the Google search engine interface so that participants could interact with the search results more easily. The participants observed that the modified interface was easier to use and required less time to complete the search process. Several studies [36,37,38] have explored the addition of voice control to screen readers and showed that voice command interfaces reduced the required number of user actions.

3.3 Complex search tasks for VI web users

In Marchionini’s exploratory study [27], search tasks were classified as closed- or open-ended. The open-ended tasks provided searchers with considerably more room to maneuver than the closed-ended tasks. The closed-ended tasks primarily focused on searching for facts, whereas the open-ended tasks offered considerable freedom and flexibility to the searchers. In another study, Marchionini [39] classified search activities as lookup, learn, or investigate; the learning and investigating activities were considered exploratory search tasks. Both studies indicated that the task type affected the time required for completing the search, the recall time, the number of results, and the search precision. In addition, the employed search tactics differed depending on the task type. Marchionini concluded that the classification of search tasks was determined by the goal of the search, the complexity, the topic structure, and the type of search expression.

Many web searches are direct searches that aim to find a specific type of information or to look up a particular resource on the web. Another type of search task is exploratory search [39, 40], which is performed to learn about a certain topic or to discover new information. From a different angle, a complex search task can be an exploratory search involving learning or investigating activities across multiple sources [39], or a multi-step search task whose steps can be performed in any order [41]. The term “complex” can thus be used interchangeably with “exploratory” to describe these search tasks [42]. Al-Thani [22] presented two exploratory tasks of similar complexity.

Examining the behaviors of VI users engaged in search activities is important for developing effective interfaces that support their specific needs. One study examined how VI web users search for information on the Internet [1]. It indicated that the queries of VI web users were longer, more complex, and more expressive than those of users with 20/20 vision. This behavior was correlated with the fact that the search process is slower for VI web users owing to the serial nature of screen readers. Therefore, VI web users recorded considerably longer search times than users with 20/20 vision. This can be attributed to many factors, including the lack of visual cues that usually appear in dialog boxes, the lack of spelling suggestions, and the generally low level of interaction provided by screen readers.

The needs of VI web users are not always considered by search engine designers. Therefore, a study investigated the possibility of improving search engine functionalities to support the web search experiences of VI web users, with the Google search engine used as a case study. Yang [43] proposed a specialized search engine (SSEB) that provided VI web users with a more interactive interface and a search assistant functionality. The SSEB system improved the satisfaction, mean search time, and search performance of the VI web users.

Sahib [2] demonstrated that their proposed search interface could effectively support the VI web users during complex search tasks and that their interface features were usable and accessible via screen readers. They added new features, such as bookmarking and integration with external note-taking applications, to the search interface for tracking the search results. Other features included providing keyboard shortcuts for specific actions and associating a context menu with the individual search results that allowed interactions with these results.

3.4 FCA theory

The FCA theory is a mathematical framework for data analysis [44] that can be used for conceptual data classification and text summarization [45]. We used the FCA method to cluster the search results. The retrieved search results were represented as a binary relation, which was then analyzed to extract sets of patterns, referred to as the optimal concepts, that contain the most critical information in the relation. In recent FCA research, different methods have been proposed for filtering in top-n recommendation tasks [48], decomposing web service interfaces to improve their usability [49], and clustering scenarios [50]. The main parts of the FCA theory used in this study are explained below.

3.4.1 Binary, rectangular relations, and maximal gain

The binary relation R is a subset of the Cartesian product of two sets, the elements (E) and the features (F), i.e., E × F. The binary relation R may be covered, in different ways, by several rectangles REi. A rectangle REi is a Cartesian product Ai × Bi ⊆ R, where Ai and Bi are non-empty sets such that Ai ⊆ E and Bi ⊆ F; Ai is called the domain of REi, whereas Bi is its codomain. Each rectangle REi = Ai × Bi has a gain g(REi), determined by Eq. (1):

$$g\left(REi\right)=\left(\Vert Ai\Vert \times \Vert Bi\Vert \right)-\left(\Vert Ai\Vert +\Vert Bi\Vert \right)$$
(1)

where ‖ ‖ denotes the cardinality of a set. A rectangle REj is “optimal” if it has the maximal gain g(REj) compared with all other rectangles included in R. If we consider the objects to be sentences and the attributes to be their corresponding indexing words, the optimal rectangle REj represents the main cluster of sentences in the document [46]. By sorting the rectangles in decreasing order of gain, using a heap data structure, we may build a tree of rectangles. By dynamically labeling each rectangle with a significant word, as described in Sect. 4.2.3, we obtain a tree of words that can be used for browsing through a document or a set of documents.
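The gain criterion can be sketched in a few lines of Python. The snippet below is an illustrative brute-force search over candidate rectangles of a tiny binary relation, not the heap-based procedure described above; the function names and the encoding of the relation as a set of (element, feature) pairs are our own assumptions.

```python
from itertools import combinations

def gain(domain, codomain):
    """Gain of a rectangle RE_i = A_i x B_i, per Eq. (1):
    (|A_i| * |B_i|) - (|A_i| + |B_i|)."""
    return len(domain) * len(codomain) - (len(domain) + len(codomain))

def optimal_rectangle(relation):
    """Return the rectangle of maximal gain in a small binary relation,
    given as a set of (element, feature) pairs. Brute force, for
    illustration only -- exponential in the number of elements."""
    elements = {e for e, _ in relation}
    best, best_gain = None, float("-inf")
    for r in range(1, len(elements) + 1):
        for domain in combinations(sorted(elements), r):
            # Codomain: features shared by every element of the domain.
            codomain = set.intersection(
                *({f for e, f in relation if e == d} for d in domain)
            )
            if codomain and gain(domain, codomain) > best_gain:
                best = (set(domain), codomain)
                best_gain = gain(domain, codomain)
    return best, best_gain
```

For the relation {(1, a), (1, b), (2, a), (2, b), (3, a)}, the optimal rectangle is {1, 2} × {a, b} with gain (2 × 2) − (2 + 2) = 0, matching Eq. (1).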

3.4.2 FCA foundations

FCA defines the relations between objects and attributes used to analyze the data [44]. FCA defines a formal concept based on a formal context (FC). An FC is a triplet F = (O, A, R), where O represents the objects (tuples in the database instance, DBI), A represents the features or attributes (columns in the DBI), and R represents the relation between O and A [47]. As an example, Table 1 shows an FC containing a set of objects O = {Object1, Object2, Object3} and a set of attributes A = {Attribute1, Attribute2, Attribute3}, with (Object1, Attribute1) ∈ R and (Object2, Attribute3) ∉ R. Table 1 indicates that the objects {Object1, Object2} share the attributes {Attribute1, Attribute2} and that this set of attributes determines exactly the same set of objects {Object1, Object2}; therefore, ({Object1, Object2}, {Attribute1, Attribute2}) is a formal concept (Table 2).
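As a concrete illustration of this definition, the following sketch enumerates all formal concepts of a small context given as a dictionary mapping objects to their attribute sets. The context shape and helper name are hypothetical, and the enumeration is naive (exponential in the number of attributes); it is meant for pedagogy, not as an efficient concept-mining algorithm.

```python
from itertools import chain, combinations

def formal_concepts(context):
    """Enumerate all formal concepts of a formal context given as
    {object: set_of_attributes}. A pair (extent, intent) is a formal
    concept when each set exactly determines the other."""
    attrs = set().union(*context.values())
    concepts = set()
    for subset in chain.from_iterable(
            combinations(sorted(attrs), r) for r in range(len(attrs) + 1)):
        b = set(subset)
        # Extent: objects possessing every attribute of b.
        extent = {o for o, a in context.items() if b <= a}
        # Intent: attributes shared by every object of the extent.
        intent = set.intersection(*(context[o] for o in extent)) if extent else attrs
        if intent == b:  # b is closed, so (extent, b) is a concept
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

With a context mirroring the pattern of Table 1 (Object1 and Object2 sharing two attributes), the pair ({Object1, Object2}, {Attribute1, Attribute2}) appears among the returned concepts.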

Table 1 FC example
Table 2 Demographics of the participants and additional information

4 Methodology

To the best of our knowledge, no study has yet examined the interactions of the VI web user based on clustering and generating an overview of the search results. Therefore, this study is an exploratory study. We aim to understand the information-seeking behavior of VI web users when they are presented with an overview of the search results in a tree-like structure. Our exploratory study involves both quantitative and qualitative measures. This makes a mixed-method approach appropriate for this project because of the need to observe the participants and conduct interviews before and after conducting the experiments. While pre-questionnaires were used to gather participants’ demographic and search experience data, post-questionnaires were used to determine their satisfaction levels.

The study results were obtained via the integration of the quantitative and qualitative data during the analysis phase. The quantitative data were collected via closed-ended questions and rating scales based on an evaluation of the interface by the users. Further, the usage logs of the users were recorded to track the manner in which they interacted with the system. These records provided an idea of the user behavior [66]. The qualitative data were acquired via an open-ended question in the interviews and the observations of the participants’ behavior during the experiments.

Three mixed-method design strategies were presented in a previously conducted study [67]; our user study follows a convergent design strategy, in which the qualitative and quantitative data are collected in parallel. The qualitative data were obtained from observations of the participants during the tasks and from the semi-structured interviews conducted with the participants after the tasks were performed. The observations and interviews enabled us to understand the behavior of VI web users during the search process. These data also provided an in-depth understanding of the activities performed during the tasks, complementing the quantitative data, which included the duration of each information-seeking stage, the query length, and the number of explored search results.

A Post-Study System Usability Questionnaire (PSSUQ) [68, 69] was used to assess overall user satisfaction with the system usability (lower scores indicate higher satisfaction). The PSSUQ was used with minor modifications to ensure its compatibility with our study; the modified PSSUQ table is included in Appendix A.

4.1 Study design

The observational study included 16 VI participants who performed complex search tasks on a default search engine and on the proposed interface so that we could compare their information-seeking behavior. Some participants were physically present in our laboratory, whereas the others participated remotely; in the latter case, we connected to the participants’ machines using Skype and provided an NVDA screen reader remote add-on module to allow them to access the interface in our laboratory. In this section, we describe the user study, including the participant sample, search tasks, data collection strategy, study setup, and data analysis methods.

4.1.1 Participants

We recruited 16 legally blind web users who differed in age, profession, and web search experience. The demographic information of the participants is presented in Table 2. We recruited the VI participants using dedicated email lists (10 participants) and a snowballing approach [70] (6 participants). The NVDA screen reader was used in the experiment because all the participants were experienced with it.

4.1.2 Tasks

To conduct this study, we designed three tasks, from which each participant selected two: one to be performed using the Google search interface and the other using InteractSE. All three tasks were similar in complexity.

As shown in Table 3, we created a task in which the user wished to seek information for studying abroad. This required the VI web user to make decisions based on different search results by analyzing and comparing the results and details. The same approach was applied to the business and travel tasks. We allowed the user to select their preferred tasks, provided that they did not involve a previously performed search [71].

Table 3 The proposed web search tasks

Since the tasks were unrelated, the information obtained for one task would not influence the experimental results obtained for another task. The choice of location depended on the geographical location of the participant; for example, if the participant was located in Qatar, the search location would be in the UK, and vice versa.

4.1.3 Experimental procedure

The participants were invited to perform the web search activities in a university laboratory designed to mimic a natural work environment. For participants in remote locations, we set up sessions through NVDA remote access, sent them the URL/key to connect, and granted them access to our server at the university so that we could monitor and record their search processes. We verified the remote participants’ access to our server and confirmed that the site, application, and screen sharing service through the NVDA screen reader did not affect the user experience. The participants individually performed each search task at different times, enabling us to collect separate observations and results for the two search tasks.

The consent forms and the research methodology were approved by the Qatar Biomedical Research Institute Institutional Review Board (QBRI‐IRB), approval number 2018-005. The participants were initially requested by email to sign a consent form indicating that video and logs would be captured while they remained anonymous. Further, they completed a pre-questionnaire to provide their demographic information and their proficiency levels with computers, screen readers, and web search engines. We provided equivalent training to all participants on using the InteractSE interface. In the experiment, each participant performed two complex search tasks: one using the Google search engine and the other using InteractSE. We randomly counterbalanced the order of the tasks on the two search interfaces to minimize any effect of the interface/task order on the collected data. An external video camera was used instead of screen recording software to record the screen activities because screen recording software would adversely affect the response time of the screen reader [57]. After completing the two complex search tasks, the participants completed the post-study usability questionnaire (PSSUQ). In addition, we conducted semi-structured interviews with the participants to follow up on our observations during their search sessions.

4.1.4 Data analysis

Our primary source for data analysis was the video recordings of the participants’ interactions with the Google search engine and InteractSE. The videos were split into the four information-seeking stages for analysis and were annotated using NVivo, a qualitative data analysis and video annotation tool. We used the post-study questionnaire and participant interviews to summarize the participants’ responses. Subsequently, we analyzed the participants’ feedback to an open-ended question using RapidMiner Studio, a data science platform that provides an integrated environment for data preparation, text mining, and other analytical tools. The captured logs of the participants’ interactions with InteractSE, also split into the four information-seeking stages, were used to support our understanding and interpretation of the video data.

The data obtained from the semi-structured post-study interviews and questionnaires gave us a better understanding of the participants’ searching behavior. We used the open and axial coding phases of grounded theory [72] to identify themes in the participants’ responses. In addition, we calculated inter-rater reliability by giving the final generated codes to another researcher, who independently assigned codes to a random set of interviews; Cohen’s kappa revealed a high level of reliability, with a value of 0.83.

We conducted statistical tests at a significance level of p < 0.05 (95% confidence) to verify the significance of the differences observed between the two search conditions (Google and InteractSE). Further, we performed a one-tailed paired t test using the data analysis statistical package in Microsoft Excel. The normal distribution assumption of the paired t test is satisfied because the data are approximately bell-shaped, as illustrated in Fig. 1. In addition, we considered the effect size (ES) [73], a standardized measure [74] used to objectively evaluate the intensity of an effect. The ES and statistical tests are integral to the analysis of experimental results [75]. In this study, the ES metrics and their interpretation were based on the guidelines for Cohen’s d [76] and the expanded guidelines provided by Sawilowsky [77]. Both studies contain descriptors for ES magnitudes ranging from 0.01 to 2.0.

Fig. 1 The dependent variable is normally distributed

4.2 Preliminary design interface

In this section, we present InteractSE for search result clustering to help the VI web users find search results that are most relevant to their information needs. We present the interface flowchart, algorithm, and UI design.

4.2.1 UI flowchart and design

First, we must understand the response of the search engine to a submitted search query via static data analysis. This enables us to understand the structure of the expected response (the returned search results) for future requests. Static analysis of the search results scraped from the search engine results page (SERP) allows us to identify the essential details to be obtained for the FCA clustering process [51]: the title, the snippet, the URL, and a serial number assigned to each search result as a unique identifier. Each result is treated as one document object containing four attributes: (1) the UID, the unique identifier of the document; (2) the title, the document headline; (3) the snippet, which provides a description; and (4) the URL, the link to the content of the material. The algorithm merges the title and the snippet into a single attribute, the description field, to be analyzed by the FCA process. The description field is later used to determine the optimal concepts of the search results for the submitted query, which requires a text preprocessing stage before the analysis stage. The preprocessing phase is described in Sect. 4.2.2. Our proposed interface flow is presented in Fig. 2.
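The four-attribute document object described above can be sketched as follows. The raw-result shape (a list of dicts with title/snippet/url keys) and the helper names are assumptions for illustration; InteractSE’s actual scraping code is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class ResultDoc:
    """One scraped search result: UID, title, snippet, and URL."""
    uid: int
    title: str
    snippet: str
    url: str

    @property
    def description(self):
        # Title and snippet merged into the single field analyzed by FCA.
        return f"{self.title} {self.snippet}"

def parse_serp(raw_results):
    """Turn raw scraped results (assumed here to be a list of dicts)
    into document objects, assigning serial UIDs starting at 1."""
    return [ResultDoc(uid=i, title=r["title"], snippet=r["snippet"], url=r["url"])
            for i, r in enumerate(raw_results, start=1)]
```

The `description` property is what the preprocessing stage of Sect. 4.2.2 would consume.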

Fig. 2 The InteractSE flowchart for search engine results obtained from clustering based on the FCA algorithm

Carpineto’s survey [52] presented the most important web search clustering engines along with their classifications and features; it noted that further advances are required to provide better overviews of the clustered results and to enable user interaction. The study by De Maio [53] showed that FCA supports a data organization and navigation model for the generated hierarchical representation, which enables user interaction with the extracted knowledge. Negm [54] showed that many information retrieval systems use FCA for browsing search results, including Conceptual REorganization of Documents (CREDO), FCA-Google (FooCA), and the Portal Retrieval Engine based on Formal Concept Analysis (PREFCA). PREFCA achieved a higher performance score than techniques based on Term Frequency–Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA).

InteractSE generates an overview by building a hierarchical tree structure of the returned search results. InteractSE was previously evaluated for accessibility with a screen reader by experts [55]. In contrast, this paper presents a detailed evaluation by blind users, which forms the basis for the improved design of the InteractSE interface. This will enable VI web users to find the information that they need with less time and effort. The primary goal of the interface design is to provide the user with an overview of the search results that can enhance the user’s engagement. The interface comprises three components: (1) a search query entry area; (2) a tree view, in which the search results are presented in a multi-level tree following a hierarchical order; and (3) a search result list containing descriptions of the webpages that match the selected tree node. From an interface and navigation design perspective [56], categorized overviews have a positive impact on the web search process because they organize, classify, and link information for end-users. Al-Thani [57] studied VI users and suggested two design recommendations for collaborative information seeking: (1) provide an overview of the search results; and (2) cluster the search results to make the navigation process faster with a screen reader. Additionally, advances in interface design make the browsing approach more efficient [26]. Therefore, our design presents the search results in a tree view with a related list view. Table 4 summarizes the main interface components; Fig. 3 presents the preliminary design interface; and Fig. 4 illustrates one search result opened in a new tab page.

Table 4 The component–action–response features of InteractSE
Fig. 3 The preliminary design interface of InteractSE

Fig. 4 Webpage of one search result that is opened in a new tab page

4.2.2 Text pre-processing

In the initial stage, the document titles require data cleansing to identify and delete the irrelevant parts of the data. Clustering the relevant keywords as search terms is crucial, as it supports data quality and users’ decisions. This stage comprises the following steps: (1) elimination of digits and stop words (e.g., a, an, at, and, that, and which); (2) elimination of other words containing fewer than three characters (e.g., hi, go, and id); (3) stemming using Porter’s Snowball algorithm [58, 59] to eliminate suffixes and affixes, converting words to their English roots; and (4) splitting the text into keywords (tokenization) based on spaces, punctuation marks, and special characters (e.g., ?, !, $, and #). The elimination of digits, stop words, and short words reduces the number of keywords for FCA processing, and the stemming step further reduces the number of terms and the data complexity [60].
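A minimal sketch of these steps follows, assuming a small illustrative stop-word list (extended with “for” for the running query example) and a crude suffix stripper standing in for the actual Snowball stemmer, which we do not reimplement here.

```python
import re

# Assumed, abbreviated stop-word list for illustration only.
STOP_WORDS = {"a", "an", "at", "and", "that", "which", "for", "the"}

def crude_stem(word):
    """Stand-in for Porter's Snowball stemmer: strips a few common
    English suffixes, keeping at least a three-letter root."""
    for suffix in ("ities", "ations", "ings", "ies", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize on spaces, punctuation, and digits; drop stop words
    and words shorter than three characters; then stem."""
    tokens = re.split(r"[\s\W\d_]+", text.lower())
    return [crude_stem(t) for t in tokens
            if t and len(t) >= 3 and t not in STOP_WORDS]
```

On the running query, `preprocess("Qatar tuition fees for universities 2021")` drops the digits and the stop word and stems the remaining tokens.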

4.2.3 Discovering optimal concepts with dynamic labeling via FCA

According to the FCA methodology, the search results must be converted into a binary relation before the optimal concepts can be discovered. In the context of this study, the formal context (FC) is this binary relation: the extracted terms are the attributes, and the unique IDs of the documents are the objects. We illustrate the algorithm with the user query example "Qatar tuition fees for universities," which returns seven search results. For this example, the FC components, described below, are presented in Table 5:

  • Objects: the document objects, represented by their UIDs;

  • Attributes: the terms extracted from each object's description (title and snippet);

  • Relation: the binary relation between the documents and attributes; a cell is set to one if the term occurs in the document's description and to zero otherwise.

Table 5 FC for the user query “Qatar tuition fees for universities”
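Such a formal context can be assembled as a 0/1 matrix. The documents and vocabulary below are illustrative toy values, not the actual rows of Table 5:

```python
def build_formal_context(documents, vocabulary):
    """Binary relation: context[i][j] = 1 iff term j occurs in
    document i's description (title and snippet)."""
    context = []
    for doc in documents:
        terms = set(doc["terms"])
        context.append([1 if t in terms else 0 for t in vocabulary])
    return context

# Toy documents standing in for results of the example query
# "Qatar tuition fees for universities" (hypothetical values)
docs = [
    {"id": "D1", "terms": ["qatar", "fee", "universit"]},
    {"id": "D2", "terms": ["qatar", "top", "school"]},
]
vocab = ["qatar", "fee", "universit", "top", "school"]
print(build_formal_context(docs, vocab))
# → [[1, 1, 1, 0, 0], [1, 0, 0, 1, 1]]
```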

We use the FC to identify the optimal concepts covering the entire relation and represent these concepts [61] in a hierarchical structure [62]. Each concept groups a set of documents (objects, defined by the UIDs arranged in rows) as its extent, which shares a common set of keywords (attributes, arranged in columns) as its intent. Our proposed interface builds on a query-based summarization method that scores and ranks document relevance for a given user query, obtaining the optimal concepts covering the entire relation at each level up to a level threshold. Our configuration of five navigation levels complies with Miller's renowned 7 ± 2 rule for human short-term memory storage [63]. Further research indicated that the optimal depth for navigation hierarchies containing 64 navigation items is between one and six levels [64].
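The extent and intent of a concept are computed with the standard FCA derivation operators, sketched below; a pair (A, B) is a formal concept exactly when the extent of B is A and the intent of A is B. This is illustrative code over toy values, not the paper's implementation:

```python
def extent(context, vocab, terms):
    # Objects (row indices) whose descriptions contain all the given terms
    cols = [vocab.index(t) for t in terms]
    return [i for i, row in enumerate(context) if all(row[c] for c in cols)]

def intent(context, vocab, objects):
    # Terms shared by all the given objects
    return [t for j, t in enumerate(vocab)
            if all(context[i][j] for i in objects)]

# Toy 3-document context over a 3-term vocabulary (hypothetical values)
ctx = [[1, 1, 0],
       [1, 0, 1],
       [1, 1, 1]]
voc = ["qatar", "fee", "top"]

A = extent(ctx, voc, ["fee"])   # documents containing "fee"
B = intent(ctx, voc, A)         # terms shared by those documents
print(A, B)                     # → [0, 2] ['qatar', 'fee'], a formal concept
```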

During static data analysis, we observed that scraping the search engine returned 10 results per request by default, which limited the results displayed at the lower tree levels. Therefore, we increased the number of returned results to 20 to obtain an acceptable number of results at the lower tree levels within a reasonable response time. The maximum navigation level was set to five, limiting the discovered concepts to a manageable number of search results at each level of the exploration stage [51] and leaving room for future enhancements, such as increasing the returned search results from 20 to 60. According to Table 3, the discovered concept "Qatar" is the highest concept, covering the entire extent at the first level, and this single keyword is assigned as the name of the discovered concept. "Fees" and "universities" are the first concepts on the second level; the keyword with the higher weight (more instances with a value of 1) in the related intent is assigned as the name of each newly discovered concept, according to the term frequency (TF) defined in Eq. (2):

$$TF_{t} = \frac{\text{number of times term } t \text{ appears in all documents}}{\text{total number of terms in all documents}}$$
(2)

The TF weight is a statistical measure that evaluates the importance of a term t with respect to a collection of documents. The concept weight depends on the total number of occurrences of the term in all the returned documents; the binary relation of the FC returns one for the existence of the term and zero otherwise. The discovered concept "top" is selected as the second concept of the second level, with child nodes "schools" and "rank" forming the final tree level. The tree structure for this result is presented in Fig. 5.

Fig. 5

Tree structure of the discovered concepts for the search “Qatar tuition fees for universities”
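Because the formal context is binary, Eq. (2) can be computed directly from it: each cell of the relation contributes at most one occurrence per document. A sketch over toy values:

```python
def term_frequency(context, vocab, term):
    """TF_t per Eq. (2): occurrences of t across all documents divided
    by the total number of terms across all documents. Counts come from
    the 0/1 formal context, so each document contributes at most one
    occurrence per term."""
    col = vocab.index(term)
    occurrences = sum(row[col] for row in context)
    total_terms = sum(sum(row) for row in context)
    return occurrences / total_terms

# Toy 3-document context over a 3-term vocabulary (hypothetical values)
ctx = [[1, 1, 0],
       [1, 0, 1],
       [1, 1, 1]]
voc = ["qatar", "fee", "top"]
print(term_frequency(ctx, voc, "qatar"))  # 3 occurrences / 7 terms total
```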

At each level, the algorithm filters out the intent associated with the discovered concept and passes all the remaining extents, marked with zeros, to the next level. The algorithm then continues to discover concepts at successively lower levels until the complete relation has been revealed.
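A heavily simplified, greedy sketch of this level-wise loop is shown below: it only picks each level's label term by column weight and drops it before continuing, whereas the actual algorithm discovers full formal concepts with their extents and intents. Values are toy data:

```python
def discover_levels(context, vocab, max_levels=5):
    """Greedy level-wise sketch: at each level, label a concept with the
    term covering the most documents (highest column sum), then drop
    that term and continue until the relation is exhausted or the level
    threshold (five, per the 7 +/- 2 rule) is reached."""
    remaining = list(range(len(vocab)))
    labels = []
    for _ in range(max_levels):
        weights = {j: sum(row[j] for row in context) for j in remaining}
        if not weights or max(weights.values()) == 0:
            break  # complete relation revealed
        best = max(remaining, key=weights.get)
        labels.append(vocab[best])
        remaining.remove(best)
    return labels

# Toy 3-document context over a 3-term vocabulary (hypothetical values)
ctx = [[1, 1, 0],
       [1, 0, 1],
       [1, 1, 1]]
voc = ["qatar", "fee", "top"]
print(discover_levels(ctx, voc))  # → ['qatar', 'fee', 'top']
```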

4.2.4 Search results presentation and interactive exploration

Search results exploration is the most challenging phase of the information-seeking process [55]. In addition, listening to a screen reader during the exploration phase increases the user's cognitive load, and cognitive load is a fundamental aspect of search system design [65]. In this section, we compare the Google search engine interface and the InteractSE interface when explored using a screen reader (Figs. 6 and 7). This comparison shows how InteractSE could reduce the cognitive effort required to perform a search task and hence support the information-seeking process.

Fig. 6

Google search engine interface

Fig. 7

InteractSE interface

The Google search engine presentation in Fig. 6 shows that organic search results may appear after other SERP elements, such as the featured snippet, which provides a direct answer to the user's question; the knowledge graph, which presents information in an infobox beside the search results; and paid ads. Regular organic search results consist of a title, URL, and snippet, and may contain additional site links below their snippets. While navigating Google search results sequentially with the keyboard, VI users need to distinguish between organic search results and the other SERP elements. This process imposes a heavy cognitive load, which can have a negative effect on task completion.

The InteractSE presentation in Fig. 7 shows that clustering the discovered concepts in the search results tree enables VI users to navigate to a different search results list based on the selected concept node. This interactive design enables users to continue the exploration phase without their attention being distracted by other SERP elements. Users can change the search results list by choosing a different concept within the same exploration phase; in contrast, with the Google search engine, the user has to reformulate the query and repeat the whole process to change the search results list. Additionally, users can make better decisions with summaries and overviews [40] and shortlists (Schnabel et al. 2016).

5 Results

In this section, we present an overview of the participant search experiments using the Google search engine and InteractSE. We made a comparative analysis of the participant search behavior using the two interfaces based on the four-phase framework of the information-seeking process model, which divides the process into the query formulation, action, search result exploration, and refinement phases [18]. However, in this study, we present our observations according to the research questions RQ-1 and RQ-2 in Sect. 2 for the search result exploration phase.

5.1 Experiment overview

We used the time required to complete the tasks during the experiments as the basis for our comparison of the two interfaces. As shown in Table 6, for all 16 participants, the average time spent on InteractSE did not differ significantly from that spent on the Google search engine (t(15) = 0.116, p = 0.455, d = 0.042): no statistical significance was observed, and the estimated ES was very small. We compared the time each participant spent exploring the search results and observed that, compared to the other participants, participant 16 (P16) spent more time exploring the search results using InteractSE yet was within the minimum exploration time range (20–30 s) when using the Google search engine. The excessive time spent using InteractSE arose because this user repeatedly explored the navigation tree, which consumed a considerable amount of time. P16 explored the search results in 127 s, which exceeds the outlier upper bound of 121 s for InteractSE, i.e., the third quartile plus 1.5 times the interquartile range, Q3 + 1.5(IQR) [78], as shown in the boxplot in Fig. 8. Therefore, based on the interquartile ranges, we decided to exclude P16 from the sample population.

Table 6 Mean time spent exploring the search results [standard deviation] (minimum–maximum)
Fig. 8

A boxplot indicating the quartiles and outliers for the time spent in exploring the search results
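The outlier rule applied to P16 can be sketched as follows. The sample times are hypothetical, and quartile conventions vary slightly between implementations (this uses Python's default "exclusive" method):

```python
import statistics

def iqr_upper_bound(values):
    # Tukey's rule: values above Q3 + 1.5 * IQR are treated as outliers
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

# Hypothetical exploration times in seconds, for illustration only
times = [20, 30, 35, 40, 45, 50, 60, 127]
bound = iqr_upper_bound(times)
outliers = [t for t in times if t > bound]
print(bound, outliers)  # → 96.875 [127]
```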

After excluding P16, the ES for the remaining 15 participants indicated a medium positive effect (d = 0.47): the mean time spent exploring the search results using InteractSE was significantly lower than that spent using the Google search engine. In addition, the p value decreased for the 15 participants (t(14) = 1.76, p = 0.04, d = 0.47), confirming the significance of this result.
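Statistics of this form can be reproduced with a paired-samples t test and Cohen's d on the per-participant differences. The timings below are hypothetical, and the d convention for paired data (mean difference over the SD of the differences) is an assumption; the paper does not state which convention it used:

```python
import math
import statistics

def paired_t_and_d(a, b):
    # Per-participant differences between the two interfaces
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    d = mean_d / sd_d                       # Cohen's d on the differences
    return t, d

# Hypothetical exploration times (Google vs. InteractSE), illustration only
google = [50, 48, 52, 46]
interactse = [40, 44, 42, 38]
t, d = paired_t_and_d(google, interactse)
print(round(t, 3), round(d, 3))  # → 5.657 2.828
```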

Finding 1 The InteractSE interface helped participants to complete a complex search task more efficiently when compared with another task having the same complexity level using the Google search engine interface.

5.2 Search result exploration

RQ-1 Can search results overviews enhance the VI users’ information seeking experience and efficiency?

We considered the exploration phase of the search results from two perspectives: the average time spent exploring the search results and the number of search results viewed per participant. The average exploration time is discussed in Sect. 5.1. As shown in Table 7, participants viewed fewer search results using InteractSE; the difference was significant (t(14) = 4.795, p = 0.00014, d = 1.49), with a considerably large ES. This indicates that the InteractSE presentation structure helped to reduce the number of viewed search results. The Google search interface presents each search result to be read individually by the screen reader, whereas InteractSE presents clusters of search results; this reduced the number of results viewed in the exploration phase by almost 50%.

Table 7 Mean number of the viewed search results (standard deviation) (minimum–maximum)

Finding 2 The presentation of search results overview by InteractSE via a clustering mechanism in multi-level tree nodes allowed the participants to reach the target website quickly. The participants explored fewer search results compared with the number of search results explored using the Google search engine interface.

We define direct visited links as the search result links that a user clicked directly on the SERP to reach the target webpages, and external visited links as webpages reached via another webpage rather than directly through the SERP. As shown in Table 8, the mean number of direct visited links for InteractSE was not significantly lower than that for Google (t(14) = 1.38, p = 0.09, d = 0.58); however, the ES was slightly greater than medium, suggesting a meaningful reduction in the number of direct visited links when using InteractSE.

Table 8 Mean number of direct visited links (standard deviation) (minimum–maximum)

The mean number of external visited links for InteractSE was zero, i.e., none of the participants visited a webpage via an external link, in contrast to the result obtained using the Google search interface (t(14) = 1.74, p = 0.05, d = 0.68), as shown in Table 9. Here, the ES value also showed a medium positive effect, indicating that the mean number of external visited links was significantly lower (at zero) when using InteractSE than when using the Google interface.

Table 9 Mean number of external visited links (standard deviation) (minimum–maximum)

Finding 3 The InteractSE interface design reduced the average number of both direct and external visited links required to obtain the target information.

During the experiments, three participants visited external links from websites that they had reached through the Google SERP because the information they were searching for was not found on the webpage they were exploring. However, that page contained a link to the target webpage with the required information; therefore, the participants needed to follow the external link.

5.3 Post-study user evaluation

In this section, we present the observations of the post-study user evaluation of InteractSE. Here, quantitative data were acquired via the satisfaction questionnaire, whereas qualitative data were obtained from our observations, participant-reported problems, comments, recommendations, and answers to the open-ended question after the completion of the study.

5.3.1 Quantitative results of the adapted PSSUQ

The first part of the post-study evaluation was based on user satisfaction. The PSSUQ user satisfaction score was calculated from the following four subclasses: (1) system usefulness (SYSUSE), the average of the responses to the first six questions; (2) information quality (INFOQUAL), the average of the responses to questions 7–10; (3) interface quality (INTQUAL), the average of the responses to questions 11–14; and (4) the OVERALL score, the average of the responses to all 15 questions. The scores ranged from 1 to 7, with lower scores indicating higher satisfaction, as illustrated in Table 10.
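The scoring rules can be sketched as follows; the question indices follow the adapted questionnaire described above, and the sample responses are hypothetical:

```python
def pssuq_scores(responses):
    """Sub-scale averages for the adapted PSSUQ; `responses` holds the
    ratings for questions 1-15 (1 = most satisfied, 7 = least)."""
    def avg(question_numbers):
        vals = [responses[q - 1] for q in question_numbers]
        return sum(vals) / len(vals)
    return {
        "SYSUSE": avg(range(1, 7)),     # questions 1-6
        "INFOQUAL": avg(range(7, 11)),  # questions 7-10
        "INTQUAL": avg(range(11, 15)),  # questions 11-14
        "OVERALL": avg(range(1, 16)),   # all 15 questions
    }

# Hypothetical ratings for one participant, for illustration only
sample = [1] * 6 + [2] * 4 + [3] * 4 + [2]
print(pssuq_scores(sample))
```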

Table 10 Summary of user satisfaction

Table 10 indicates that the overall participant score for InteractSE was 1.56 and that interface quality was the subclass with the lowest satisfaction. As shown in the detailed PSSUQ results in Fig. 9, four questions related to the interface quality; Table 11 presents the satisfaction levels for these subclass questions. These results indicate that InteractSE requires further user experience and accessibility investigation to fulfill user expectations: the surveyed users thought that some of the functional and accessibility features of our system could be improved. These features are explained in detail in the next sections.

Fig. 9

Results of the adapted post-study system usability questionnaire and other measurement items added to the subclasses

Table 11 User satisfaction with respect to the interface quality

Next, we describe the results obtained from the post-study interviews with the participants to illustrate the additional functionalities and accessibility features that they requested.

5.3.2 Post-study interview analysis

We used the open and axial coding phases of grounded theory [72] to analyze the participants’ answers to the post-study interview open-ended question: “How can we improve this interface?”.

The participants reported their experiences in their own words, and we categorized their feedback as follows. If the participants were satisfied with a set of features, we identified these as the "enabling features." If the participants requested a second set of features to improve the user experience, we identified these as the "recommended features." The main categories and coding processes are shown in Fig. 10. The coding process illustrates that InteractSE has an expandable navigation tree with dynamic keyword labeling that can cluster the information obtained from the search results. The user experience design is another subcategory of the enabling features, comprising the easy and straightforward interface codes. Finally, the target result subcategory indicates that there are no advertisements and that a user can find the target results in less time.

Fig. 10

Coding process for the design features

The feature list is presented in Fig. 10. For example, open code (2.1.A) refers to (2) recommended features, (1) customize-manage-results, and (A) custom-tree-list-keywords. The participants showed interest in recommended features such as customizing the tree levels and listing the keywords shown in the tree (2.1.A), as well as providing a "next results" button to show the next 20 search results from the search engine (2.1.D). Moreover, the participants indicated that splitting the search result list into two expandable levels for the title and the snippet (2.2.A) would be helpful. Finally, the participants were interested in a feature that would provide location feedback for the interface components and the navigation tree level, assisting the user at any step of the process with the question "Where am I now?" (2.2.E). All these open codes were arranged into the customize-manage-results, interaction-style, and facilitation-supporting subcategories, as shown in Fig. 10. The recommended features are explained in the discussion section.

Participants praised the InteractSE interface features that were not available in the Google search engine interface. These features allow the users to view the navigation tree in which the search results are clustered. Comments from different participants included the following: "This is very helpful to categorize search results," "easier to narrow search results," "clustering search results with expandable tree-view," and "meaningful keywords for tree nodes." Participants enjoyed the ability of InteractSE to provide an easy and simple interface to browse search results, stating that InteractSE is "pretty simple and easy to use," and "no need to navigate to different parts of the SERP page." Participants using the InteractSE interface provided statements such as "Find target information quickly" and "no ads between search results list." However, some participants reported that it would be an advantage for the InteractSE interface to include additional features in an enhanced version for future experiments. One such participant said, "It would be great to have bookmark management tool to enable quick access for the reached search result and read it later." Another stated, "It would be helpful to have hotkey to access search, tree and results-list areas (for example by using: Alt + S, Alt + T and Alt + R)." One more participant stated, "Having available online training for using the interface would be practical for the end-users as guidance with a small introduction."

5.4 Summary of results

Our findings are validated across quantitative and qualitative data, and the collected data allowed us to quantify the answers to the open-ended post-study interview question. The research showed a quantitative difference between the InteractSE interface and the Google search engine interface in the mean exploration time in seconds (38.33 vs. 47.17), the mean number of viewed search results (3.33 vs. 6.73), and the mean number of direct visited links (1.06 vs. 1.27). The participants also gave InteractSE a high usability score on the PSSUQ questionnaire (1.56). The open-ended interview protocol (qualitative method) likewise revealed indicators of ease of use and an additional advantageous interface design feature (clustered, navigable tree results) that led to a lower average number of viewed search results. Almost 64% of all participants' comments focused on the enabling features of the InteractSE interface (easy, simple, rapid exploration of the search results, and results clustering through the navigation tree). In comparison, 36% of the comments focused on the recommended features (extending summarization to the list results, providing a management tool for the search results, and providing interface support) to be made available in a future updated interface. Finally, we conclude that the participants performed better with the InteractSE interface and were satisfied with the current interface version and its enabling features.

6 Discussion

RQ-2 What are the possible enhancements that could be applied to the search engine interface to meet the current needs and future expectations of the VI web users?

The following design suggestions were obtained via the data analysis and results. These suggestions are related to the findings listed in Sect. 5 and are grouped into two main improvement categories: search result exploration and interface design.

6.1 Search results exploration

Finding 1 showed that the participants benefited from the clustering and dynamic labeling of the clusters, which provided a summary and overview of the search results, because less time was required to view the tree results. InteractSE splits the exploration phase into two stages: the tree exploration stage for the clustered search results and the list exploration stage for the individual search results of the selected tree node. Each search result node contains the title and snippet of the result, reducing the time the participant must spend listening to the complete details. Figure 11 shows that two participants (P14 and P15) spent 400% more time exploring the list results than the tree node results; further, three participants (P1, P6, and P13) spent almost as much time (approximately 83%) exploring the list results as they did exploring the tree results. Finding 2 showed that the InteractSE interface presented an overview at the tree exploration stage that allowed the participants to explore fewer search results. The same summarization approach could be applied to each search result in the list, giving the user another detailed overview of a result at the list exploration stage before clicking the link and visiting the website. Finding 3 showed that the InteractSE interface design reduced the number of visited (direct/external) links among the search results. The same design could be applied to the list UI component, which might have a similar effect on the search process. Together, these findings point to a promising improvement of the list exploration stage.

Fig. 11

Comparative exploration time for the tree versus list items

Design Suggestion 1 To support the information exploration stage, the user could be provided with a multi-level hierarchy of the search results.

For example, as per our scenario, the first level could provide the title of the search results, whereas the second level could provide the snippet. Additional levels would allow the user to reveal information in an incremental manner as a preview of the main content of the target website.

6.2 Interface design

Our interface design suggestions are based on the recommended features revealed using the analysis of the open codes and subcategories.

6.2.1 Customize and manage the results

We observed that the participants exhibited different search preferences. Some participants preferred a small number of tree levels for clustering the search results, whereas others preferred more than five tree levels, which is the current depth limit of InteractSE. The same applies to the number of keywords in the tree nodes. These observations refer to open code (2.1.A), and the participants' preferences can be generalized as follows.

Design Suggestion 2 Allow the user to define the level of overviews extracted from the returned list of results. This will provide the user with the freedom to reshape the structure of the results.

In our scenario, the maximum hierarchy tree level was five, based on the human short-term memory 7 ± 2 rule [63]. Other parameters, such as the maximum number of keywords to be presented in the tree node, could be used to customize the results.

With respect to the creation and management of the saved searches, the open codes (2.1.B) and (2.1.C) denoted the user interest in using InteractSE and searching for additional features to manage and share their search results.

Design Suggestion 3 Allow the user to manage and store the search results and benefit from the obtained information later.

In InteractSE, a bookmark feature would allow the user to manage the search results, using a favorite option to return to the results at any time and a copy option to share a link to these results with other users.

While reviewing the search results, P4 and P11 commented that they did not find a "more search results" option on the interface. They suggested (2.1.D) implementing an option that would allow them to proceed to the next search results or the next page, because InteractSE retrieves only 20 results to build the search result tree. We can generalize this suggestion to allow more results and customize the returned search results by providing the user with an option to set the maximum returned query size. However, users would neither want to wait for the interface to find too many matches nor want to see that many matches on a single page of search results.

Design Suggestion 4 In the exploration phase of search results, allow the user to set the number of search results that could be returned to build a customized tree.

We can apply design suggestion 4 to InteractSE by parameterizing the interface requests to the search engine. Such parameters can include the number of returned results per request.

6.2.2 Interaction style

Open code (2.2.A) indicates that the participants are willing to extend the overview, which is the summary of the returned search results, from the tree level to a preview level. A preview would provide a summary for each individual search result at the list level that can be expanded.

Design Suggestion 5 Allow users to customize their search results according to the information they need and the required tree depth for the results to be presented.

The application of design suggestion 5 to the search interface will allow the users to control the amount of information that they would listen to via the screen reader rather than listening to the first part and skipping other parts that may contain important details. The user would be able to expand a particular search result to obtain further details before navigating to the actual website. In our case, i.e., InteractSE, the root node for each list item would be the title, which could have two child nodes. The first child node would be for the snippet, and the second child node would be a summary of the main content of the webpage, which would be built using an incremental approach that should be extended as requested by the user. This means that an initial summary would be generated for the first section of the main content and that a summary could be generated for the next section based on user interactions and requests.

The participants also requested the addition of hotkeys to facilitate navigation to the interface components (e.g., search query, results tree, and results list) (2.2.B).

Design Suggestion 6 Implement hotkeys for different stages of the information-seeking process for specific actions.

We can enhance the InteractSE user experience by assigning hotkeys to navigate between different interface components or tab pages and create a new bookmark for the website on the current tab page.

Currently, with InteractSE, users need to use a keyboard to input search queries, which may limit the user interaction with the interface. Participants (2.2.D) recommended the implementation of other methods for entering search queries.

Design Suggestion 7 Provide alternative methods of interacting with the interface.

For InteractSE, we can provide users with different options to enter the search query, i.e., using a keyboard or via speech recognition.

InteractSE has three different cursor locations (i.e., search query, results tree, and results list). Therefore, users need to be aware of their status in the search process to know their current location on the interface (2.2.E).

Design Suggestion 8 Implement feedback for the cursor location on the interface.

The implementation of the cursor location feedback will provide an answer to the user question “Where am I?”.

6.2.3 Facilitating and supporting features

The participants recommended the use of interface system support, including an introduction to the interface components and other training materials that would cover all the information-seeking phases involved in the process. This suggestion was presented in open code (2.3.A).

Design Suggestion 9 Provide an accessible context-sensitive help system that provides detailed support at any specific point on the interface.

InteractSE could use speech recognition ("Speech Recognition Anywhere," 2019) [79] to provide a voice-based virtual assistant (VA) that supports users at any phase of the search process ("VERSE: Voice. Exploration. Retrieval. Search," 2019) [80]. A recent study has shown that voice-based VAs can improve navigation systems [81].

The InteractSE UI was implemented in English. The majority of the Arab participants recommended that the interface should support the Arabic language (2.3.B).

Design Suggestion 10 Add multilingual UI support.

Multilingual support should adhere to multilingual design principles [82]. The InteractSE users could switch to the preferred language.

7 Conclusion and future work

This study proposed and examined InteractSE, a search interface to support VI web users with complex search tasks, and presented an evaluation of the interface based on 16 VI web users and our observations. Many previous studies have suggested that an overview of search results would assist VI web users during web searches [1, 22]. Therefore, we designed an interface to summarize the results and enable VI web users to perceive, understand, navigate, and interact with the search results easily. The system used unsupervised machine learning to cluster the search results based on FCA theory.

We evaluated InteractSE based on quantitative data, i.e., the time spent to complete a search task and the number of viewed and visited search results, and qualitative data, i.e., the answers of participants on a post-study usability questionnaire and comments concerning the interface. The evaluation results were significant and promising, indicating that the proposed search interface features related to the layout design and clustering summary are key improvements to provide a more effective and enjoyable search experience for VI web users.

In the future, we intend to increase the number of scraped search results to 100 items to investigate the improvement from the clustering algorithm at the lower tree levels. We will then extend the layout design and summary technique from an overview level, which clusters the search results, to a preview level for each individual item in the search result list. The overview provides the searcher with the concepts shared by the collection of documents related to the query, as a summary of the search results. The preview focuses on a single document and provides a document summary that is also related to the initial query. Marchionini [39] gave TreeMaps as an example of a good technique for the exploration stage at both the overview and preview levels. A summary of each search result would allow the users to expand the summary within the scope of the initial query, starting from the webpage title and description and ending with its main content. The summarization approach would be incremental, i.e., a summary would be generated for the main content of the webpage at the user's request prior to navigating to the search result URL. In this scenario, the user could obtain the required information without visiting the actual webpage.