INTRODUCTION

The magnitude of the novel coronavirus (COVID-19) pandemic has led to considerable economic hardship, stress, anxiety, and concern about the future. Social media can provide a place for taking the pulse of mental health in communities. Evaluating the changing use of language on social media can complement traditional survey-based approaches and provide new insights into the well-being of a country or region during a public health crisis. Social media could also enable early symptom discovery for diseases whose pathology is not completely known and is still evolving.1 We therefore created a dashboard (https://bit.ly/penncovidmap) to monitor and analyze changes in language expressed on Twitter over the course of the COVID-19 pandemic within the USA, with a specific focus on mental health and symptom mentions.

METHODS

We are collecting two data sets of publicly accessible streaming data for the dashboard, each containing approximately 5 million tweets/day: (a) a random 1% sample of daily US tweets, from which we identify English-language tweets posted from within the USA on the previous day, to infer overall mental health; and (b) tweets containing COVID-19-related keywords, obtained using a public keyword streaming API, to compute COVID-19 symptom mentions per state.

After geolocating all the tweets by mapping posts to states using a combination of location coordinate information and user location descriptions, we extract the relative frequency of single words and phrases (consisting of two or three consecutive words). Based on the word and phrase frequencies, mental health estimates are computed on the random 1% sample by applying four pre-trained data-driven machine learning models: overall sentiment (net positive language),2 stress,3 anxiety,4 and loneliness expressions.5 We calculated estimates for these four measures from the national declaration of emergency on March 13 to May 6, 2020, and compared them with the estimates from the same period in 2019, controlling for day-of-the-week and seasonality effects. We quantified the effect size using Cohen’s d.
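The word and phrase frequency extraction described above can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and normalizes counts separately for each phrase length; the function name `ngram_relative_frequencies` and the example text are hypothetical, and the dashboard's actual tokenizer and normalization may differ.

```python
from collections import Counter


def ngram_relative_frequencies(tokens, max_n=3):
    """Return the relative frequency of every 1- to max_n-word phrase.

    Assumption: frequencies are normalized within each phrase length,
    so all single words sum to 1, all two-word phrases sum to 1, etc.
    """
    freqs = {}
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive tokens over the text.
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        counts = Counter(grams)
        total = sum(counts.values())
        for gram, count in counts.items():
            freqs[gram] = count / total
    return freqs


# Hypothetical example text, lowercased and split on whitespace.
tokens = "so tired of staying home so tired".lower().split()
freqs = ngram_relative_frequencies(tokens)
```

Frequencies of this kind would then serve as the input features to the pre-trained models.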

Using the second Twitter sample containing COVID-19 keywords, we calculate the frequency of Twitter posts relating to different COVID-19 symptoms across states. The study was considered exempt under the University of Pennsylvania Institutional Review Board guidelines.
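As an illustration of the symptom-mention computation, the following sketch counts the share of tweets per state that mention each symptom. The keyword lists, the function name `symptom_mention_rates`, and the example tweets are all hypothetical; the phrase lists actually used for the dashboard are not given here, and simple substring matching is an assumption.

```python
from collections import defaultdict

# Hypothetical keyword lists for illustration only.
SYMPTOM_KEYWORDS = {
    "smell/taste": ["loss of smell", "loss of taste"],
    "body ache": ["body ache"],
    "fever": ["fever"],
}


def symptom_mention_rates(tweets):
    """Map each state to the fraction of its tweets mentioning each symptom.

    `tweets` is an iterable of (state, text) pairs that have already
    been geolocated to a state.
    """
    totals = defaultdict(int)
    mentions = defaultdict(lambda: defaultdict(int))
    for state, text in tweets:
        totals[state] += 1
        lowered = text.lower()
        for symptom, phrases in SYMPTOM_KEYWORDS.items():
            # Count a tweet at most once per symptom.
            if any(phrase in lowered for phrase in phrases):
                mentions[state][symptom] += 1
    return {
        state: {s: mentions[state][s] / totals[state] for s in SYMPTOM_KEYWORDS}
        for state in totals
    }


# Hypothetical example tweets.
example_tweets = [
    ("PA", "Fever and body aches all week"),
    ("PA", "Stay home everyone"),
    ("NY", "Total loss of smell since Monday"),
]
rates = symptom_mention_rates(example_tweets)
```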

RESULTS

Comparing the mental health estimates across all states for the period after the declaration of emergency, from March 13 to May 6, sentiment (Fig. 1a) was lower in 2020 than in 2019 (Cohen’s d = −0.97; CI = [−1.41, −0.53], p < 0.001), stress (Fig. 1b) was higher (d = 1.5; CI = [1.03, 1.97], p < 0.001), anxiety (Fig. 1c) was consistently higher (d = 4.4; CI = [3.66, 5.2], p < 0.001), and loneliness (Fig. 1d) also showed a marked increase (d = 1.58; CI = [1.11, 2.06], p < 0.001).
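For readers unfamiliar with the effect size reported above, the following is a minimal sketch of Cohen’s d for two independent samples using the pooled standard deviation. Which variant of d was used in this study is not stated, so the pooled form is an assumption, and the example values are hypothetical rather than study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Hypothetical daily estimates for two years (not study data).
estimates_2020 = [2.0, 4.0, 6.0]
estimates_2019 = [1.0, 3.0, 5.0]
d = cohens_d(estimates_2020, estimates_2019)
```

A positive d indicates that the first sample has the higher mean, which matches the sign convention in the results above (2020 relative to 2019).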

Figure 1

(a) Sentiment, (b) stress, (c) anxiety, and (d) loneliness expressions derived from data-driven machine learning models on Twitter language from the start of January through May 6 in 2019 (green) and 2020 (orange). The measures are normalized by centering and scaling based on the January values of the respective year and then averaged over all states in the USA, weighted by the number of tweets in each state.
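The normalization described in the caption can be sketched as follows. This assumes that "centering and scaling" means subtracting the January mean and dividing by the January standard deviation for each state; the function names and example numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize_by_january(daily_values, january_values):
    """Center and scale daily values using January's mean and standard deviation."""
    mu = mean(january_values)
    sigma = stdev(january_values)
    return [(value - mu) / sigma for value in daily_values]


def weighted_national_mean(state_values, state_tweet_counts):
    """Average per-state values, weighting each state by its tweet count."""
    total = sum(state_tweet_counts[state] for state in state_values)
    weighted_sum = sum(
        state_values[state] * state_tweet_counts[state] for state in state_values
    )
    return weighted_sum / total


# Hypothetical values for one state and one day (not study data).
normalized = normalize_by_january([4.0], january_values=[1.0, 2.0, 3.0])

# Hypothetical normalized values and tweet counts for two states.
national = weighted_national_mean(
    {"PA": 2.0, "NY": -1.0}, {"PA": 100, "NY": 300}
)
```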

Symptom mentions in the COVID-19-related tweets capture emerging symptoms such as a change in smell/taste, body aches, and skin lesions (Fig. 2).

Figure 2

Trends in symptom mentions in COVID-19-related tweets. *Smell/taste, body ache, headache, and chills were added to the symptom list by the Centers for Disease Control and Prevention (CDC) on April 17. Skin lesions are increasingly being discussed in COVID-19-related tweets.

DISCUSSION

Language used in tweets can provide insight into changes in the mental health of communities during public health emergencies, when widespread polling may not be available. Stress, anxiety, and loneliness are increasingly divergent from 2019 levels. Early recognition of hotspots of declining mental health can lead to community-level interventions, for example by providing increased access to telepsychiatry services, supporting local community partners, and locally employing more paraprofessionals, such as community health workers.

Trending symptom mentions may lead to early recognition of new symptoms, such as the recently noted skin findings associated with COVID-19.6 Several symptoms were reported in COVID-19-related tweets before they were added to the symptom list by the Centers for Disease Control and Prevention, and skin lesions have been discussed since March. Syndromic surveillance could also enable early recognition of disease re-emergence or spread and more informed distribution of tests and equipment.1

Limitations of this study include that Twitter users are not representative of all segments of the population and that the language-based estimates are based on a random 1% stream of tweets. Further, the lack of polling data means that our estimates have not been validated for the assessment period. In future work, we intend to validate these models against gold-standard polling data. In conclusion, real-time monitoring of location-specific social media posts can provide insight into emerging issues of public concern. Early recognition of local trends can lead to an informed distribution of resources, targeted public health interventions, and better preparedness in this and future public health emergencies.