Mammography to tomosynthesis: examining the differences between two-dimensional and segmented-three-dimensional visual search

Adamo, Stephen H.; Ericson, Justin M.; Nah, Joseph C.; Brem, Rachel; Mitroff, Stephen R.

doi:10.1186/s41235-018-0103-x

Original article
Open access
Published: 14 June 2018

Volume 3, article number 17, (2018)

Cognitive Research: Principles and Implications

Abstract

Background

Radiological techniques for breast cancer detection are undergoing a massive technological shift—moving from mammography, a process that takes a two-dimensional (2D) image of breast tissue, to tomosynthesis, a technique that creates a segmented-three-dimensional (3D) image. There are distinct benefits of tomosynthesis over mammography with radiologists having fewer false positives and more accurate detections; yet there is a significant and meaningful disadvantage with tomosynthesis in that it takes longer to evaluate each patient. This added time can dramatically impact workflow and have negative attentional and cognitive impacts on interpretation of medical images. To better understand the nature of segmented-3D visual search and the implications for radiology, the current study looked to establish a new testing platform that could reliably examine differences between 2D and segmented-3D search.

Results

In Experiment 1, both professionals (radiology residents and certified radiologists) and non-professionals (undergraduate students) were found to have fewer false positives and were more accurate in segmented-3D displays, but at the cost of taking significantly longer in search. Experiment 2 tested a second group of non-professional participants, using a background that more closely resembled a mammogram, and replicated the results of Experiment 1: search in segmented-3D displays was more accurate and produced fewer false alarms, but took more time.

Conclusion

The results of Experiments 1 and 2 matched the performance patterns found in previous radiology studies and in the clinic, suggesting this novel experimental paradigm potentially provides a flexible and cost-effective tool that can be utilized with non-professional populations to inform relevant visual search performance. From an academic perspective, this paradigm holds promise for examining the nature of segmented-3D visual search.

Significance

This study is the first step in establishing a new paradigm to examine segmented-three-dimensional (3D) visual search that can be used with professional and non-professional searchers, which has theoretical and real-world implications. Theoretically, while there is a long history of studying the nature of visual search in cognitive psychology, visual search in a segmented-3D environment has been relatively unexplored. This new paradigm can add to existing classic theories with the potential to generate novel ones. Practically, it is possible to learn about breast cancer detection by studying how non-professionals search for targets in a two-dimensional (2D) environment compared to a segmented-3D environment. The findings from this study replicated the results typically found in radiology when comparing breast cancer detection in mammography (a 2D radiograph of breast tissue) to tomosynthesis (a segmented-3D tomogram of breast tissue)—search in 2D is less accurate and quicker. Importantly, the results were found with both professional and non-professional populations, suggesting this may be a general search attribute that can be observed in different populations. Since professionals are difficult to recruit as participants, it is potentially quite powerful that this paradigm can be run with non-professional participants knowing they demonstrate similar patterns in search performance found within radiology. Likewise, it is promising that we replicated the pattern of results found with tomosynthesis radiographic images with the use of simple stimuli given that there is more experimental control with simple stimuli and they can be shown to non-professionals. The primary significance of this project is that it takes the first steps in establishing a new cognitive psychology paradigm that can inform the growing field of tomosynthesis for breast cancer detection.

Background

For decades, mammography has been the primary tool of choice for radiologists who are tasked with detecting breast cancer (Bleyer & Welch, 2012). Mammography is the process of creating a single 2D image that represents an entire breast (a mammogram) and then examining that image for signs of cancer. Despite extensive training and often years (or decades) of experience, radiologists are not perfect and will miss present abnormalities in a mammogram (Rosenberg et al., 1998).

Radiologists can miss a cancer for a variety of reasons, and the focus of the current study is on one particular cause: the limits of human cognitive processing. This is a well-known contributor to radiological misses, with several robust examples. For example, radiological misses are more likely to occur when targets are rarely present (Evans, Birdwell, & Wolfe, 2013) and after another abnormality has already been found in the same read (e.g. Berbaum et al., 1991). These two specific negative impacts on radiological success (low target prevalence and “satisfaction of search,” respectively) have been studied by radiologists for decades, with several insights gained. Interestingly, these two specific cases have also been examined via basic psychology studies that have used simplified displays and non-professional searchers (e.g. undergraduate participants in a psychology study). Such psychology studies have been able to provide radiologically relevant conclusions about the limitations of cognitive processing related to target prevalence (e.g. Van Wert, Horowitz, & Wolfe, 2009; Wolfe, Horowitz, & Kenner, 2005; however, see Fleck & Mitroff, 2007) and target number (e.g. Adamo, Cain, & Mitroff, 2013, 2017; Fleck, Samei, & Mitroff, 2010).

To overcome the limits of human cognitive processing, radiology has often looked to technological aids. The goal is to use advances in technology to counter inevitable cognitive failures. For example, computer-aided detection (CAD), a recognition software that highlights potential cancers in a radiograph, can be used by radiologists to potentially aid them in cancer detection (e.g. Lehman et al., 2015). However, while CAD can improve search performance (e.g. Brem, Hoffmeister, Zisman, DeSimio, & Rogers, 2005; Zheng et al., 2001), it is not infallible and has been shown to lead to no improvement in accuracy (e.g. Lehman et al., 2015) and even more misses under conditions when multiple abnormalities are present (e.g. Berbaum, Caldwell, Schartz, Thompson, & Franken Jr., 2007).

The radiological field of breast imaging is currently undergoing a new technological shift to improve breast cancer detection by changing how radiologists view medical images. Specifically, imaging is moving away from mammography, a 2D imaging technique, to tomosynthesis, a segmented-3D imaging technique. Unlike mammography, where the volume of the breast is compressed into one 2D image, tomosynthesis is the process of dividing the volume of the breast into many segmented images (i.e. slices) to create a segmented-3D display. With tomosynthesis, radiologists have the ability to search in depth by moving from slice-to-slice allowing them to better distinguish signs of cancer from normal breast tissue. Tomosynthesis has been a success to date; with tomosynthesis, radiologists tend to make fewer false positives/false alarms (e.g. incorrectly indicating a benign mass as malignant; Durand et al., 2015; Friedewald et al., 2014; Skaane et al., 2013) and detect cancer more often (Ciatto et al., 2013). However, this improvement comes at a cost as it takes radiologists significantly longer, even up to double the amount of time, to evaluate a patient with a combined tomosynthesis and mammography read compared to mammography alone (Bernardi et al., 2012; Dang, Freer, Humphrey, Halpern, & Rafferty, 2014; Michell et al., 2012; Zuley et al., 2010). The increase in evaluation time is not a subtle point as this has put enormous stress on the workload of radiologists. In general, overwork can lead to several negative outcomes, including more missed cancers (e.g. Krupinski, Berbaum, Caldwell, Schartz, & Kramer, 2012) and legal concerns (e.g., Berlin, 2000).

Beyond the clinical effects of tomosynthesis, there is little work on the nature of searching in a segmented-3D environment. While there is a growing literature on visual search in 3D environments created through stereoscopic techniques (i.e. presenting different views to the right and left eye to induce a 3D effect; e.g. Finlayson, Remington, Retell, & Grove, 2013; McIntire, Havig, & Geiselman, 2014), only recently has there been research on searching through environments with successive slices that allow the viewer to move in and out of the depth plane (e.g. Drew et al., 2013; Wen et al., 2016). For example, one recent study (Wen et al., 2016) found that different search styles within a segmented-3D environment change what is most salient to an observer. Specifically, when observers “drilled” (i.e. stared at a region within a segmented-3D environment and rapidly scrolled from slice to slice through the depth plane) they were drawn toward 3D dynamic motion saliency, and when observers “scanned” (i.e. searched over a large area of a given slice before moving to the next slice) they were drawn toward 2D saliency. The ability to utilize different search styles and to search in depth raises questions as to what cognitive processes underlie segmented-3D search and how they compare to those of 2D search.

The current study used a simplified cognitive psychology paradigm to examine 2D and segmented-3D search in both professionals (radiology residents and certified radiologists) and non-professionals (undergraduate students). Beyond the theoretical reasons for comparing search performance in a 2D and segmented-3D environment, the main motivation for conducting this experiment was to explore whether the results would yield findings similar to those of radiologists when using mammography and tomosynthesis in practice. If this novel lab-based task replicated the pattern of results found within radiology (i.e. segmented-3D search/tomosynthesis revealing decreased false alarms, increased hits, and increased search times), the control and flexibility of this program could be used to better understand what underlies the differences in search performance. Another key motivation for this experiment was to compare performance between professionals and non-professionals. If non-professionals performed similarly to professionals, future research could explore segmented-3D search with non-professionals and gain insight as to how professionals would perform with tomosynthesis. Since professionals are difficult to access as participants in research studies (due to their time constraints), and there is less experimental control when using medical images, testing non-professionals in a laboratory-based, segmented-3D search would be a faster, easier, and more flexible alternative. This experimental path mirrors prior efforts from our research team that created a simplified paradigm that could be used with non-professionals (Fleck et al., 2010) to potentially inform radiological questions (e.g. Adamo, Cain, & Mitroff, 2015; Cain, Dunsmoor, LaBar, & Mitroff, 2011; Cain & Mitroff, 2013).

To preview the results, professionals and non-professionals replicated the pattern of results found within radiological practice—there were fewer false alarms, better accuracy, and longer response times in segmented-3D search compared to 2D search. Despite many differences between the two participant populations (e.g. search experience, age), there were no significant differences in search performance.

Experiment 1

Methods

Participants

Professionals

A total of 30 participants composed of radiology residents and certified radiologists were recruited from the Radiological Society of North America conference in Chicago, Illinois between 27 November and 2 December 2016. This sample size was determined by how many professionals could be recruited at the conference. The participants had no restriction on their specialty (e.g. breast, thoracic, general) and were entered in a drawing for a chance to win a GoPro Hero 4 for their participation. Three participants were not included in the final analysis: two were removed due to experimental error during data collection and one for quitting the experiment early, leaving a total of 27 participants (14 radiology residents and 13 certified radiologists). The 27 participants’ age range was 24–65 years (radiology residents: age range = 24–39 years, M = 30.69, SD = 3.88; certified radiologists: age range = 33–65 years, M = 46.92, SD = 10.66) and there were 11 women and 16 men. The average number of cases evaluated per week was in the range of 50–600 (radiology residents: range = 50–300, M = 150.91, SD = 93.96; certified radiologists: range = 100–600, M = 340.00, SD = 207.67). There were overall no differences in search performance between radiology residents and certified radiologists (see Appendix 1), so they were treated as a single population group (professionals) when compared to undergraduate students.

Non-professionals

A total of 31 undergraduate students (non-professionals) were recruited from The George Washington University to approximately match the number of recruited professionals. They had no radiology experience and received course credit in exchange for participation. There were 20 women and 11 men with an age range of 18–23 years (mean age = 18.8 years; SD = 1.0).

General procedures

Participants sat approximately 45 cm (with no head restraint) from the center of a 13-in. MacBook Pro laptop computer. Stimulus displays were presented using Matlab software (The MathWorks, Natick, MA, USA) and Psychophysics Toolbox 3.0.12 (Brainard, 1997; Pelli, 1997) at a resolution of 800 × 600 pixels. The search displays were constructed in 3D space, filling in a cube array that was 600 × 600 × 600 voxels.^{Footnote 1} This cube was then trimmed into a 600-voxel diameter sphere. To generate a cloudy background akin to a mammogram, 250 ellipsoids (i.e. clouds) in the range of 50–350 voxels were created and randomly placed in the sphere. Afterwards, a 3-voxel Gaussian filter was applied twice to smooth the image.

Target-present displays contained one T-shaped target and 99 L-shaped distractors, and target-absent displays contained 100 L-shaped distractors. Each item (target T or distractor L) could be rotated 0°, 90°, 180°, or 270° along the y-axis (see Fig. 1) and was 15 × 7 × 17 voxels (0.89° × 0.64° × 1.3°) for the X, Y, and Z coordinates, respectively. The T-shaped targets were constructed with two perfectly aligned perpendicular cross bars, and the two cross bars were offset by 3 voxels (0.27°) to form non-perfect, L-shaped distractors. Both the targets and distractors had a 3-voxel gap between the cross bars. Colors of the search items were randomly selected within a gray-scale range of 47–63% white. The search items were randomly placed within a 15 × 15 × 15 location matrix for the 3D displays. The matrix was transcribed into the sphere and any cells that overlapped the perimeter of the sphere were removed, so no target or distractor could appear outside the display area. The search items were then jittered by 0–16 pixels along the x- and y-axes and by 0–32 pixels in the z-axis for the 3D displays.
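The placement procedure described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: it assumes that each of the 15 × 15 × 15 cells spans 40 voxels (600 / 15), that a cell "overlaps the perimeter" when any of its eight corners falls outside the sphere, and that jitter is a non-negative integer offset from the cell origin. The names `cell_inside_sphere`, `valid_cells`, and `place_items` are hypothetical.

```python
import itertools
import random

CUBE = 600            # cube edge length in voxels (from the text)
GRID = 15             # location matrix is 15 x 15 x 15 (from the text)
CELL = CUBE // GRID   # 40 voxels per cell edge (derived; an assumption)
RADIUS = CUBE / 2     # radius of the 600-voxel-diameter sphere


def cell_inside_sphere(ix, iy, iz):
    """True if all eight corners of a grid cell lie within the sphere.

    ASSUMPTION: this is how "overlapping the perimeter" is tested.
    """
    center = CUBE / 2
    for dx, dy, dz in itertools.product((0, 1), repeat=3):
        x = (ix + dx) * CELL - center
        y = (iy + dy) * CELL - center
        z = (iz + dz) * CELL - center
        if x * x + y * y + z * z > RADIUS * RADIUS:
            return False
    return True


def valid_cells():
    """All grid cells that survive trimming to the sphere."""
    return [cell for cell in itertools.product(range(GRID), repeat=3)
            if cell_inside_sphere(*cell)]


def place_items(n_items=100, rng=None):
    """Pick distinct cells for the search items and jitter each position."""
    rng = rng or random.Random()
    items = []
    for ix, iy, iz in rng.sample(valid_cells(), n_items):
        items.append({
            "cell": (ix, iy, iz),
            # Jitter ranges come from the text; the sign convention is assumed.
            "x": ix * CELL + rng.randint(0, 16),
            "y": iy * CELL + rng.randint(0, 16),
            "z": iz * CELL + rng.randint(0, 32),
            "rotation": rng.choice((0, 90, 180, 270)),
        })
    return items
```

Sampling cells without replacement guarantees that no two items share a cell, which is one plausible reading of "randomly placed within a location matrix".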

Spheres were either compressed into a single “flat” plane for the 2D-search displays (see Fig. 1a) or were divided and compressed into 30 different slices for the 3D-search displays (see Fig. 1b and c). When compressing the sphere array for the 2D-search displays, the pixel color (for both items and the background) at a given x- and y-coordinate was averaged between the middle three slices to create a single, 2D search plane. For the 3D-search displays, each slice was similarly computed by averaging the pixel colors across every 20 voxels along the z-coordinate. This process created 30 slices per display, effectively making it a “segmented-3D” display wherein search items could be contained within one slice or across two slices.^{Footnote 2} This process caused a high probability that the search items would overlap in the 2D displays and a low probability of overlap on each slice in the segmented-3D displays (see Fig. 1).
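The segmentation step for the 3D-search displays can be illustrated with a short sketch. It assumes the volume is stored as nested lists indexed `volume[z][y][x]` and that each slice is the plain arithmetic mean over 20 consecutive z-voxels; the function names are hypothetical and the original Matlab implementation may differ.

```python
def average_over_depth(volume, z_start, z_stop):
    """Average voxel intensities over z in [z_start, z_stop) at every (y, x)."""
    depth = z_stop - z_start
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(z_start, z_stop)) / depth
         for x in range(width)]
        for y in range(height)
    ]


def segment_volume(volume, voxels_per_slice=20):
    """Split a volume indexed as volume[z][y][x] into averaged depth slices.

    With 600 voxels in depth and 20 voxels per slice this yields 30 slices,
    matching the count given in the text.
    """
    n_z = len(volume)
    return [
        average_over_depth(volume, z, min(z + voxels_per_slice, n_z))
        for z in range(0, n_z, voxels_per_slice)
    ]
```

Because an item is 17 voxels deep and can be jittered in depth, it can straddle a 20-voxel boundary, which is consistent with the statement that items could appear within one slice or across two.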

A tick bar appeared on the right side of the displays (see Fig. 1), with each tick representing one slice in the depth plane. A marker moved as participants traversed from slice to slice and indicated which slice the participant was currently on when searching through the segmented-3D displays. The slice number was also presented at the top left of the displays (see Fig. 1). For the 2D-search displays, there was a single tick mark and the number in the top left corner displayed the number “1.”

There were four practice trials and 24 experimental trials. The practice trials were separated into two blocks (2D, segmented-3D) and the experimental trials were separated into four blocks (two 2D blocks and two segmented-3D blocks) with an equal number of trials per block. Block order was randomized for the practice. Block order was also randomized for the first two experimental blocks and then again for the last two experimental blocks. The trials in the last two experimental blocks were repeats of the trials from the first two experimental blocks, but in the opposite display type (e.g. the x and y coordinates for targets and distractors in the segmented-3D trials from the first half of the experiment were repeated for the 2D trials in the second half of the experiment). There was an equal number of target-present and target-absent trials with a randomized and equal distribution of trials per block.
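The experimental block structure described above can be sketched as a schedule generator. This is an illustration under assumptions: 12 unique layouts (6 target-present, 6 target-absent) are split into two sets of 6, each set is shown once per half in opposite display types, and trial order within a block is shuffled. Practice trials are omitted, and `build_schedule` and its field names are hypothetical.

```python
import random


def build_schedule(rng=None):
    """Return four experimental blocks of six trials each."""
    rng = rng or random.Random()

    # 12 unique layouts, half target-present and half target-absent.
    layouts = [{"layout_id": i, "target_present": i % 2 == 0} for i in range(12)]
    present = [lay for lay in layouts if lay["target_present"]]
    absent = [lay for lay in layouts if not lay["target_present"]]
    rng.shuffle(present)
    rng.shuffle(absent)

    # Each set holds 3 target-present and 3 target-absent layouts.
    set_a = present[:3] + absent[:3]
    set_b = present[3:] + absent[3:]

    # The second half repeats each set in the opposite display type.
    first_half = [("2D", set_a), ("segmented-3D", set_b)]
    second_half = [("segmented-3D", set_a), ("2D", set_b)]
    rng.shuffle(first_half)    # block order randomized within the first half
    rng.shuffle(second_half)   # and again within the second half

    blocks = []
    for display, layout_set in first_half + second_half:
        trials = [dict(layout, display=display) for layout in layout_set]
        rng.shuffle(trials)    # ASSUMPTION: trial order is shuffled per block
        blocks.append({"display": display, "trials": trials})
    return blocks
```

Under this reading, every layout is seen exactly twice, once as a 2D display and once as a segmented-3D display, and each block contains three target-present and three target-absent trials.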

For the segmented-3D displays, participants used a mouse wheel to scroll from slice to slice and for the 2D displays the mouse wheel was not used. Participants were instructed to indicate the target location via a mouse click and press the spacebar if they believed no target was present. The trial ended once either a mouse click was made or spacebar was pressed. If participants reached a 60-s time limit without making a response, the trial ended and was considered a “timeout.” The tick bar turned yellow after 50 s and then red after 55 s to inform participants that the trial was about to end. The next trial loaded during the inter-trial interval and started immediately once loaded. Similarly, after each block, the next block automatically began. The experiment took approximately 30 min in total.
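The timing rule for the tick bar can be written as a small state function. This is a sketch only: the thresholds come from the text, but the handling of the exact boundary values and the label for the bar's initial color (here `"default"`) are assumptions, and `tick_bar_state` is a hypothetical name.

```python
def tick_bar_state(elapsed_s):
    """Map elapsed trial time in seconds to the tick bar's warning state."""
    if elapsed_s >= 60:
        return "timeout"   # trial ends and is scored as a timeout
    if elapsed_s >= 55:
        return "red"       # final warning
    if elapsed_s >= 50:
        return "yellow"    # first warning
    return "default"       # ASSUMPTION: the initial color is not stated
```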

Planned analyses

The primary goal of Experiment 1 was to explore whether the results would yield findings similar to those of radiologists when using tomosynthesis and mammography. When comparing tomosynthesis to mammography, there are fewer false alarms (e.g. Skaane et al., 2013), improved cancer detection (Ciatto et al., 2013), and a significantly longer time spent evaluating a patient’s case (e.g. Bernardi et al., 2012). As such, the three respective key measures of interest for Experiment 1 were false alarm rate, hit rate, and target-absent response time. Target-absent response time was assessed because it represents how long a participant will spend searching before deciding to quit (Chun & Wolfe, 1996). There are additional measures of interest that are standard dependent variables in visual search experiments (e.g. timeout rate, target-present response time) but they were not primarily relevant for the current study and can be found in Appendix 2.

Note that trials where participants timed out were excluded for the hit rate analysis and trials where participants timed out or false alarmed were excluded from the target-absent response time analysis. Also, while the goal of repeating trials from experimental blocks 1 and 2 in blocks 3 and 4 in the other display type was to explore differences between the repeated and initial trials, there were no meaningful differences based on repetition. Specifically, there was no significant difference in search performance between the 2D trials within the first half of the experiment (where the x and y coordinates of a specific 2D trial were seen for the first time) and the 2D trials in the second half of the experiment (where the x and y coordinates previously seen within the segmented-3D display of the first half of the experiment were repeated in the 2D displays). Likewise, there were no significant differences in performance for the segmented-3D trials from the first half of the experiment and those from the second half. As such, the repetition aspect of the study design will not be discussed further.
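The three key measures and their exclusion rules can be sketched as follows. The trial format is hypothetical (a dict with `target_present`, `response` as `"click"`, `"spacebar"`, or `"timeout"`, `rt` in seconds, and `on_target` for clicks). Two points are assumptions rather than statements from the text: a false alarm is taken to be any click on a target-absent trial, and the false alarm rate is computed over all target-absent trials, since the text specifies exclusions only for hit rate and target-absent response time.

```python
from statistics import mean


def summarize(trials):
    """Return hit rate, false alarm rate, and mean target-absent response time."""
    # Hit rate: timeouts on target-present trials are excluded (from the text).
    present = [t for t in trials
               if t["target_present"] and t["response"] != "timeout"]
    hits = [t for t in present
            if t["response"] == "click" and t.get("on_target", False)]

    # False alarm rate: ASSUMPTION, computed over all target-absent trials.
    absent = [t for t in trials if not t["target_present"]]
    false_alarms = [t for t in absent if t["response"] == "click"]

    # Target-absent RT: excluding timeouts and false alarms (from the text)
    # leaves only correct rejections, i.e. spacebar responses.
    correct_rejections = [t for t in absent if t["response"] == "spacebar"]

    return {
        "hit_rate": len(hits) / len(present) if present else None,
        "false_alarm_rate": len(false_alarms) / len(absent) if absent else None,
        "target_absent_rt": (mean(t["rt"] for t in correct_rejections)
                             if correct_rejections else None),
    }
```

Running `summarize` separately on the 2D and segmented-3D trials of each participant would give the per-condition values that the comparison between display types relies on.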