Background

The perinatal period, spanning from pregnancy to 12 months postpartum, can be a joyous time for expectant and new mothers; however, physical, emotional, and lifestyle changes in this period can heighten their risk for developing depression. The Internet can be an accessible source of mental health information for pregnant and postpartum people experiencing depression. Online health information is most often accessed by women and is a popular medium of information consumption for people in the pregnancy and postpartum periods [1,2,3,4]. Further to this, previous research suggests that perinatal people who endorse symptoms of depression are more likely to access online mental health resources than those with little to no symptoms [5]. Prior website evaluation research within the area of perinatal mental health has shown that websites vary in quality, are difficult to read, and often contain incomplete information [6, 7]. The quality of online information about perinatal depression has yet to be evaluated, which was the focus of this study.

Depression in the perinatal period is common, with prevalence estimates ranging from 11 to 20% across studies [8, 9]. Factors contributing to the development of perinatal depression may include insufficient sleep [10], individual and family mental health history [11], perceived lack of social support [10, 11], and a history of adverse childhood experiences, including abuse [12]. Of concern, perinatal depression can have detrimental effects on both maternal and infant health if left undiagnosed and untreated [11, 13,14,15,16,17]. Despite the prevalence and adverse impacts of perinatal depression, help-seeking rates are low due to a multitude of barriers [18]. For example, pregnant and postpartum people struggling with depression may face criticism or feel ashamed if their perinatal experiences deviate from societal expectations, which often depict this period as full of bliss and satisfaction [19]. Consequently, people in the perinatal period may avoid seeking help for their mental health [19]. Another help-seeking barrier is low mental health literacy, an individual’s knowledge about mental disorders and treatment options, which can hinder mothers’ ability to recognize depression and make informed treatment choices [19, 20].

There are important implications for the use of the Internet to increase understanding and recognition of symptoms of perinatal depression and related treatment options. The Internet can help users gather health-related information outside of medical appointments, assisting the public to make informed treatment decisions [21, 22]. Within a perinatal context, pregnant and postpartum people may use the Internet to independently locate health information, to get a second opinion, and to exert greater control over the decisions affecting their health and that of their child [4]. Nevertheless, it is unclear how well Internet users can discern the quality of online health information [23]. Further, women in the perinatal period often deem online health information to be reliable, but do not always verify their findings with their healthcare provider [24].

Several factors contribute to higher quality health information websites, including readability, information quality, usability, and visual design. The National Institutes of Health recommends that health information have a reading level of grades 6–8 [25]; however, online health materials often exceed this recommendation [26, 27]. This is consistent with previous mental health website evaluations, which found that websites differed greatly in quality [6, 28]. At present, the quality of perinatal depression websites has yet to be assessed, with past evaluations focused solely on depression in the postpartum period [6]. A previous evaluation of perinatal anxiety websites found that most websites had high reading levels, low to moderate information quality, and low actionability ratings [7]. The aim of this study was to evaluate the quality of perinatal depression information websites, with a specific focus on readability, information quality, usability, and visual design.

Methods

Procedure

We identified websites for this evaluation by searching three sets of terms in the search engine Google. We selected Google as over 70% of online searches are conducted using this search engine [29]. Prior to each search, we activated Google’s incognito mode and cleared the browsing data, including search history, cookies, and cached files, to reduce the impact of past searches on our search results. To investigate which search terms would return the broadest set of relevant results, we searched several terms related to perinatal depression in Google in February 2020. Based on these results, we selected several lay terms for depression in pregnancy and postpartum, as well as the medical term perinatal depression. Consequently, we captured a wide range of websites that pregnant and postpartum people with depression may visit. Our final search terms included: perinatal depression, depression AND pregnant, and sad after pregnancy. To meet inclusion criteria, websites needed to include at least 500 words on perinatal depression (depression occurring in pregnancy and/or within 12 months postpartum), be written in English, and be retrieved from the first three pages of the search, as most people do not look beyond this [30]. Websites were excluded if they were duplicates, blogs, forums, commercial websites, practitioner materials, book or article excerpts, or were inaccessible to the public (e.g., required a login). As detailed in Fig. 1, 37 websites met inclusion criteria and were evaluated in the current study.

Fig. 1 Website selection process
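
The inclusion and exclusion criteria described in the Procedure can be expressed as a simple filter. The Python sketch below is illustrative only: screening in this study was carried out manually, and the `Candidate` record, its field names, and the category labels are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Exclusion categories named in the text; the string labels are hypothetical.
EXCLUDED_TYPES = {"blog", "forum", "commercial", "practitioner", "excerpt"}


@dataclass
class Candidate:
    url: str
    word_count: int          # words on perinatal depression
    language: str            # e.g. "en"
    result_page: int         # 1-based page of the search results
    site_type: str           # e.g. "health_org", "blog"
    requires_login: bool = False


def screen(candidates):
    """Return candidates that meet the inclusion criteria, without duplicates."""
    seen = set()
    included = []
    for c in candidates:
        eligible = (
            c.word_count >= 500              # at least 500 words
            and c.language == "en"           # written in English
            and c.result_page <= 3           # first three pages of results
            and c.site_type not in EXCLUDED_TYPES
            and not c.requires_login         # publicly accessible
        )
        if eligible and c.url not in seen:   # drop duplicates across searches
            seen.add(c.url)
            included.append(c)
    return included
```

Applying `screen` to the pooled results of all three searches would yield the de-duplicated set of eligible websites under these assumptions.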

After selection, non-rater SK Pierce took screenshots of each website in March 2020. SK Pierce then de-identified screenshots and randomized the order in which each of our raters (MPH and SK Petty) assessed each website. This procedure was adopted to blind raters to websites in our sample, with the aim of achieving more objective website ratings. Raters reviewed de-identified screenshots and completed independent DISCERN, PEMAT (Patient Education Materials Assessment Tool), and VisAWI (Visual Aesthetics of Website Inventory) ratings for each website. Raters only consulted one another when rating the first three websites to ensure that they had a shared understanding of all scale items.

Measures

Reading level

We calculated website readability using the Simple Measure of Gobbledegook (SMOG) evaluation [31]. This measure indicates the years of education a reader needs to understand the material, known as reading level. When interpreting scores, higher SMOG scores correspond to higher reading levels. We calculated SMOG scores by inputting 30 sentences from each website into an online readability tool [32]. This measure has been used to evaluate the reading level of online materials for a range of mental health conditions, including but not limited to anxiety, depression, bipolar disorder, and schizophrenia [33, 34].
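
For readers unfamiliar with the measure, the published SMOG formula is grade = 1.0430 × √(polysyllabic words × 30 / sentences) + 3.1291, where a polysyllabic word has three or more syllables. The Python sketch below applies that formula. It is not the online tool used in this study [32]; the syllable counter is a rough vowel-group heuristic and an assumption of this example, so its output may differ from that of dedicated readability tools.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def smog_grade(text):
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = re.findall(r"[A-Za-z']+", text)
    # Polysyllabic words are those with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
```

Passing a 30-sentence sample from a webpage to `smog_grade` gives an approximate reading level in years of education.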

Information quality

We assessed website information quality using the DISCERN, a standardized 16-item instrument rated on a scale of 1 (no, does not meet website quality criteria) to 5 (yes, meets website quality criteria) [35]. Raters answer questions about elements related to health information quality, such as whether the information is relevant. The DISCERN has high levels of reliability and validity and has been used to assess online information quality for several mental and physical health topics, including but not limited to perinatal anxiety, posttraumatic stress disorder, and chronic kidney disease [7, 35,36,37].

Usability

We evaluated website usability using the Patient Education Materials Assessment Tool (PEMAT). This measure assesses the understandability and actionability of health information and has high levels of internal consistency, reliability, and construct validity [38]. Understandability is a 19-item evaluation, while actionability is a 7-item assessment. Our two raters independently rated each item with either 0 (Disagree) or 1 (Agree). Materials are considered understandable or actionable if they reach a threshold of 70% or more on each measure [38]. The PEMAT has been used to assess the understandability and actionability of websites focused on various mental and physical health topics, including but not limited to perinatal anxiety, cervical cancer screening, and hypertension [7, 39, 40].
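
The scoring described above amounts to the percentage of items rated 1 (Agree), compared against the 70% threshold. The Python sketch below illustrates this for a single rater's item ratings. How the two raters' ratings were combined is not specified here, so the example does not attempt it; the function names are hypothetical.

```python
def pemat_score(item_ratings):
    """Percentage of items rated 1 (Agree) out of all rated items."""
    if not item_ratings or any(r not in (0, 1) for r in item_ratings):
        raise ValueError("ratings must be a non-empty sequence of 0s and 1s")
    return 100.0 * sum(item_ratings) / len(item_ratings)


def meets_threshold(score, threshold=70.0):
    """True when the score reaches the 70% cut-off described in the text."""
    return score >= threshold


# Hypothetical ratings: 14 of 19 understandability items and
# 2 of 7 actionability items rated Agree.
understandability = pemat_score([1] * 14 + [0] * 5)
actionability = pemat_score([1, 1, 0, 0, 0, 0, 0])
```

In this made-up example the material would count as understandable (14 of 19 items) but not actionable (2 of 7 items).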

Visual design

We assessed website visual design using the Visual Aesthetics of Website Inventory (VisAWI), a standardized 18-item instrument designed for online materials [41]. Raters are prompted to answer questions related to various design elements using a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree). We calculated a general factor of visual aesthetics for each website by averaging ratings for all items [41]. The VisAWI has been used to evaluate visual aesthetics of websites on topics such as anxiety and nutrition [33, 42].

Analysis

Following previous research, we averaged all 16 items of the DISCERN to produce an overall information quality rating for each website [7, 43]. We also calculated mean ratings and 95% confidence intervals for each DISCERN and VisAWI item. To determine whether mean DISCERN scores were significantly associated with SMOG scores, PEMAT ratings, and mean VisAWI ratings, we calculated Pearson correlation coefficients. As well, we computed correlations between these variables and search engine order from all three searches to evaluate how quality differed across searches. We also calculated interrater agreement for DISCERN, PEMAT, and VisAWI ratings using an intraclass correlation coefficient. To make comparisons across measures, we assigned websites a rating (good, adequate, or poor) for each domain evaluated. We then calculated aggregate ratings for all websites and used these to assign websites an overall rank. Table 1 outlines the criteria for good, adequate, and poor classifications for website domain and aggregate ratings. Raters remained blinded to website identities until we completed data analysis.
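
As an illustration of two of the calculations described above, the Python sketch below averages item ratings into an overall rating and computes a Pearson correlation coefficient from first principles. It is a minimal sketch rather than the analysis code used in this study; the function names are hypothetical and any input data would need to be supplied by the reader.

```python
import math
from statistics import mean


def overall_rating(item_ratings):
    """Mean of a website's item ratings (e.g. the 16 DISCERN items)."""
    return mean(item_ratings)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r(search_order, reading_levels)` would return a positive value if reading level tended to rise with position in the search results, where both argument names are placeholders for per-website lists.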

Table 1 Criteria used to rank websites

Results

Highest rated websites

Table 2 shows website rankings, domain ratings, and aggregate ratings. Only five websites in our sample achieved aggregate ratings falling within the good range, as per the criteria outlined in Table 1. The websites with the highest aggregate ratings were the American Family Physician and the National Health Service, which both had aggregate ratings of 13. Both websites were rated highly in terms of information quality, with the latter having the highest DISCERN rating across websites in our sample. Both websites also met the 70% threshold for understandability and actionability. The American Family Physician had a reading level of 7, thus falling within the recommended reading levels of 6–8; however, the National Health Service exceeded the recommended range, with a reading level of 10. These two websites also varied in terms of their visual design ratings. The American Family Physician had one of the lowest VisAWI ratings in our sample at 3.5, while the National Health Service had a VisAWI rating of 5.4, which falls within the adequate range.

Table 2 Perinatal depression website characteristics and dimension comparison – February, 2020

Furthermore, three other websites had aggregate ratings falling within the good range. These websites were Beyond Blue, March of Dimes, and a second webpage from the National Health Service, which all had aggregate ratings of 12. With regard to reading level, March of Dimes and the National Health Service fell within the recommended range of grades 6–8, with scores of 8 and 8.5 respectively. Beyond Blue was slightly above the recommended range, with a reading level of 9. These websites all met the 70% threshold for understandability but did not meet criteria for actionability. Further, the DISCERN ratings for these websites only fell into the adequate range based on our criteria outlined in Table 1. Beyond Blue had the highest VisAWI rating across websites in our sample and fell within the good range with a rating of 6.6. On the other hand, March of Dimes and the National Health Service only had VisAWI ratings falling within the adequate range (5.8 and 5.3 respectively).

Reading level

Websites varied greatly in their reading levels, with ratings ranging from 6.8 to 13.5. Only 10 websites in our sample had reading levels that fell within the recommended range. To determine whether reading level increased with search engine order, we calculated two-tailed Pearson correlation coefficients for all searches. A significant positive association was found between search engine order (search: perinatal depression) and reading level, r(14) = .56, p = .04. This indicates that as website order increased in this search, so did reading level. These variables were not significantly associated in other searches.

Information quality

We calculated mean ratings for each DISCERN item to assess website performance across items (Table 3). Website information quality ratings ranged from 1.8 to 4.3 out of 5 (M = 3.2, SD = .7), with mean item ratings ranging from 2.6 to 3.9 out of 5. Interrater agreement for the DISCERN was excellent, r(37) = .84, p = .01. Mean overall website information quality (item 16) was 3.1, indicating that websites only partially met DISCERN criteria. The lowest rated items included items 4 (sources of information used; M = 2.8, SD = 1.5) and 5 (when sources were produced; M = 2.7, SD = 1.1). The highest rated items were 3 (relevance of content; M = 3.9, SD = .8) and 14 (more than one treatment option provided; M = 3.9, SD = 1.0). With the exception of item 14, all items related to treatment (items 9–13) were rated low to moderate. These items include how treatment works (item 9, M = 2.7, SD = 1.4), the benefits and risks of treatment (item 10, M = 2.6, SD = 1.1; item 11, M = 3.0, SD = 1.4), what happens if no treatment is used (item 12, M = 3.0, SD = 1.4), and how treatments affect quality of life (item 13, M = 3.1, SD = 1.1). We calculated two-tailed Pearson correlation coefficients to determine whether search engine order and mean DISCERN ratings were associated across searches. There were no significant relationships between these variables.

Table 3 Mean scores of DISCERN items across all websites

Usability

Understandability ratings ranged from 42 to 89% (M = 66.7, SD = 11.8), with moderate interrater agreement, r(37) = .61, p = .01. Only 14 websites in our sample met the understandability threshold, with most lacking information summaries and visual aids. To determine whether websites with high information quality also had high understandability, we calculated a two-tailed Pearson correlation coefficient. The relationship between these variables was not significant, r(37) = .26, p = .13. Moreover, actionability ratings varied greatly (14–86%, M = 40.0, SD = 16.2), with high interrater agreement, r(37) = .76, p = .01. Overall, only two websites were actionable, with websites often missing tangible tools. To determine whether information quality and actionability were associated, we computed a two-tailed Pearson correlation coefficient. There was a significant relationship between these variables, r(37) = .34, p = .04, indicating that websites with higher information quality also had greater actionability. To assess whether search engine order and actionability were associated across searches, we calculated Pearson correlation coefficients. There was a significant negative relationship between these variables (search: perinatal depression), r(14) = −.55, p = .04, indicating that websites with higher actionability ratings appeared earlier in this search. These variables were not significantly associated in the other two searches.

Visual design

We calculated mean VisAWI item ratings to assess website performance across items (Table 4). Website visual design ratings ranged from 3.2 to 6.6 out of 7 (M = 5.0, SD = .8) with mean VisAWI item ratings ranging from 4.1 to 5.7 out of 7. The highest rated items were related to website layout and use of colour, including items 4 (site appears patchy; M = 5.7, SD = 1.3), 12 (colours do not match; M = 5.6, SD = 1.3), and 13 (choice of colours is botched; M = 5.6, SD = 1.3). Websites received lower ratings on items related to design creativity, including items 6 (design is uninteresting; M = 4.1, SD = 1.6), 7 (layout is inventive; M = 4.4, SD = 1.4), and 8 (design appears uninspired; M = 4.2, SD = 1.2). To determine whether there was a relationship between search engine order and mean VisAWI ratings across searches, we calculated two-tailed Pearson correlation coefficients. There was a significant negative relationship between these variables (search: perinatal depression), r(14) = −.64, p = .01, indicating that websites earlier in this search had superior visual designs. These variables were not significantly associated in the other searches.

Table 4 Mean scores of VisAWI items across all websites

Initially, VisAWI interrater agreement was low due to the broad range of possible responses and the subjectivity of this measure. To improve interrater agreement, raters reassessed by consensus any of their ratings that were two or more points apart for items 1, 6, 7, 8, 12, 13, 14, and 18. We limited reassessment to these items to preserve as many of our independent ratings as possible; however, interrater agreement remained low. In response, raters reassessed by consensus any total VisAWI ratings differing by seven or more, which represented the upper third of the data. This resulted in excellent interrater agreement, r(37) = .97, p = .01.
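
The two reassessment rules described above can be sketched as simple checks on a pair of raters' VisAWI ratings. The Python example below is illustrative only: the function names are hypothetical, each rater's ratings are assumed to be stored as a mapping from item number (1 to 18) to a rating from 1 to 7, and the "total VisAWI rating" is assumed to be the sum of the 18 item ratings.

```python
# Items that were eligible for item-level reassessment, as listed in the text.
REASSESS_ITEMS = (1, 6, 7, 8, 12, 13, 14, 18)


def items_to_reassess(rater_a, rater_b, min_gap=2):
    """Eligible items where the two raters differ by min_gap points or more."""
    return [i for i in REASSESS_ITEMS
            if abs(rater_a[i] - rater_b[i]) >= min_gap]


def total_needs_reassessment(rater_a, rater_b, min_gap=7):
    """True when the raters' total ratings differ by min_gap or more.

    Assumption: the total is the sum of all item ratings.
    """
    return abs(sum(rater_a.values()) - sum(rater_b.values())) >= min_gap
```

Under these assumptions, a website would be returned to the raters for consensus discussion whenever either check flags a discrepancy.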

Discussion

The purpose of this study was to evaluate the quality of perinatal depression information websites, with the literature currently limited to evaluations of perinatal anxiety and postpartum depression websites [6, 7]. Websites in our sample were predominantly of low to moderate quality, with only five websites achieving good aggregate ratings. With regard to readability, only 10 of 37 websites fell within the recommended reading level (grades 6–8).

The websites with the highest overall ratings were the American Family Physician and the National Health Service, with the latter also having the highest information quality rating across websites in our sample. Although the National Health Service had a high reading level of 10, the American Family Physician fell within the suggested range with a reading level of 7. Both websites met criteria for understandability and actionability (70%). Their visual design ratings varied, with the American Family Physician receiving one of the lowest visual design ratings across websites in our sample, while the National Health Service had a visual design rating falling within the adequate range. Other highly rated websites included Beyond Blue, March of Dimes, and a second webpage from the National Health Service. March of Dimes and the National Health Service fell within the recommended readability range, with reading levels of 8 and 8.5 respectively; however, Beyond Blue was slightly above the recommended range with a reading level of 9. Although all three of these websites met criteria for understandability, they did not meet criteria for actionability. Further, their information quality ratings only fell within the adequate range. Beyond Blue had the highest visual design rating across websites in our sample, while March of Dimes and the National Health Service had visual design ratings falling within the adequate range.

It is essential that websites present trustworthy content to ensure that pregnant and postpartum people with depression can make informed treatment choices. Information quality scored lowest in areas related to the sourcing of information and information about treatment options. This aligns with findings of previous mental health website evaluations, suggesting that website authors must incorporate evidence-based sources and convey details of these sources to users [7, 28, 43]. Furthermore, most websites lacked detailed descriptions of treatment benefits and focused primarily on risks; however, information was generally limited to pharmacological interventions, which is consistent with a review of adult depression websites [28]. Websites successfully conveyed relevant information, such as depression symptoms, and provided comprehensive lists of treatment options.

Only two websites in our sample met criteria for both understandability and actionability, which is consistent with a previous evaluation of perinatal anxiety websites [7]. To improve understandability, websites can include summaries of key information and visual aids. The lack of visual materials is problematic, as videos may be an effective means of destigmatizing mental illness [44, 45]. As well, women may prefer more visual aids when learning about postpartum depression [46]. Websites in our sample also had poor actionability features, including a lack of tangible tools, such as symptom checklists. There was a significant negative correlation between search engine order (search: perinatal depression) and actionability ratings, indicating that websites earlier in these results were more actionable. As well, mean information quality and actionability ratings were positively correlated.

Overall, website visual design ratings varied widely, with most websites in our sample falling within the adequate range. Websites possessed strong structural elements, such as well-designed layouts, in addition to engaging colour choices; however, they often lacked creative design features, such as inspiring design elements. It must be noted that only one of the five top-rated websites in our sample had a visual design rating falling within the good range. Given that perceived aesthetics may influence users’ first impressions of a website and perceptions of trustworthiness, user engagement with online mental health materials may be increased through improved visual design [47, 48]. Within our sample, a significant negative correlation was found between search engine order (search: perinatal depression) and mean visual design ratings, suggesting that websites with superior designs appeared earlier in this search.

Limitations

This study is additive and complementary to the growing number of mental health website evaluations; however, it is not without limitations. Despite our broad range of search terms, these terms are not reflective of all of the terms that may be used by pregnant and postpartum people experiencing depression, or those close to them who are looking for information and support. Further to this, we recognize that our search terms may not have captured all of the specialized perinatal mental health websites available online. Future research using different search strategies and evaluating additional perinatal mental health websites would be a valuable addition to the extant research in this area. Our search results may also have been impacted by region and may not include all available websites. As well, only websites that were written in English were assessed, which limits the generalizability of our results. Our search data was also limited to the date on which our searches were completed. Other search engines may have produced differing results; however, we followed the precedent of previous website evaluations and only used Google, the most widely used search engine [7, 28, 29, 33, 43].

Additionally, there are several limitations to the methods that we used to rate websites in our sample. Although blinding raters to websites reduced subjective bias, there were drawbacks to this method. Specifically, it was not effective when rating websites such as Wikipedia that have highly recognizable appearances. Further, raters did not rate all VisAWI items independently due to initially low interrater agreement, attributed to the subjectivity of the VisAWI and its broad rating scale. It is important to contextualize our challenges with interrater reliability for the VisAWI within the extant body of research. The literature reveals a range in coefficients across studies measuring interrater agreement for shorter versions of the VisAWI (0.11–0.88) [49, 50]. This highlights the need to consider how this measure is used and whether it could be refined to ensure that ratings are consistent across reviewers.

Conclusion

This study adds to the growing body of literature on mental health website evaluations, and more specifically, evaluations focused on perinatal mental health websites. Overall, websites in our sample varied greatly in quality. Websites often exceeded the recommended reading level, suggesting that website creators must produce more easily understandable content. Furthermore, there was a paucity of treatment-related information, which would hinder users’ ability to make informed treatment choices. Poor understandability and actionability ratings suggest that website usability must be improved, namely by adding information summaries, visual aids, and tangible tools to help users seek support. At present, perinatal depression websites are not meeting the needs of the public in terms of reading level, information quality, usability, and visual design. Our findings may guide healthcare providers, people who are pregnant or postpartum and experiencing depressive symptoms, and their supporters, to high quality online resources focused on perinatal depression. Several high-quality resources that can be recommended to perinatal people experiencing depression include the American Family Physician, Beyond Blue, March of Dimes, and the National Health Service.

We recommend that future perinatal mental health website evaluations integrate a variety of medical terms into their searches (e.g., postpartum depression). We expect this would return a different sample of websites, which, in conjunction with the findings presented in this study, may extend researchers’ understanding of the quality of websites focused on mental health in the perinatal period. Future research could also include the use of several different search engines, which may result in a larger sample of websites that could be assessed. Further to this, non-English websites or websites from different regions about perinatal depression could be assessed to determine how the quality of perinatal mental health websites differs across languages as well as geographically.