
Using Written Records – and Tweets – as a Roadmap for Plant Disease Spread

Late blight lesion on a potato leaf. Photo courtesy of Jean Ristaino.

North Carolina State University researchers used text analytics on both historic and modern writing to reveal more information about the effects and spread of the plant pathogen – now known as Phytophthora infestans – that caused the 1840s Irish potato famine and that continues to vex breeders of potatoes and tomatoes.

The study examined keyword terms like “potato rot” and “potato disease” after digitizing historic farm reports, news accounts and U.S. Patent Office agricultural records from 1843 to 1845 to show how the pathogen first spread across the northeast United States before causing the devastating famine in Ireland in 1845. The study also used text analysis to track social media feeds for the modern-day spread of late blight.
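As a rough illustration of the keyword step described above, the sketch below searches a handful of text records for the two terms quoted in this article and lists, for each year, the places with at least one match. It is a minimal sketch under stated assumptions, not the researchers' actual pipeline: the sample records, the placeholder place names and the function name `places_by_year` are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample records: (year, place, text). Invented for illustration;
# these are NOT data from the study, and the place names are placeholders.
RECORDS = [
    (1843, "State A", "The potato rot has appeared in several fields this season."),
    (1844, "State B", "Farmers report the potato disease spreading among late crops."),
    (1844, "State A", "Wheat yields were good; no complaints were recorded."),
    (1845, "State C", "Potato rot destroyed much of the crop."),
]

# The two keyword terms quoted in the article.
KEYWORDS = ["potato rot", "potato disease"]

# One case-insensitive pattern that matches any keyword as a whole phrase.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def places_by_year(records):
    """Map each year to the sorted places with at least one keyword hit."""
    hits = defaultdict(set)
    for year, place, text in records:
        if PATTERN.search(text):
            hits[year].add(place)
    return {year: sorted(places) for year, places in sorted(hits.items())}


if __name__ == "__main__":
    for year, places in places_by_year(RECORDS).items():
        print(year, places)
```

Grouping matches by year is what would let a map be drawn for each season in turn; a real workflow would also need to resolve each place name to map coordinates, which this sketch leaves out.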

Textual analysis holds promise as a useful tool to help researchers track and visualize both historic and current plant diseases, the researchers say.

“We went back to original descriptions of the potato disease outbreaks in the United States because they occurred between 1843 and 1845, before outbreaks occurred in Europe,” says Jean Ristaino, William Neal Reynolds Distinguished Professor of Plant Pathology at North Carolina State University and corresponding author of a paper in Scientific Reports that describes the study. “We searched those descriptions by keywords, and by doing that we were able to recreate the original outbreak maps using location coordinates mentioned in the documents.

“We were also trying to learn what people were thinking about the disease at the time and where it came from.”

The analysis documents late blight disease on potatoes in five states – New York, Delaware, Massachusetts, New Jersey and Pennsylvania – before it spread to the rest of the northeastern U.S. and into Canada between 1843 and 1845. The pathogen later wreaked havoc on Europe – especially Ireland.

The researchers also examined tweets from 2012 to 2022 to learn more about the modern spread of P. infestans. They searched the tweets for both common and scientific names of the pathogen and were able to geolocate the sources.
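To make the tweet-mining idea concrete, the sketch below matches posts against a common name ("late blight") and the scientific name in full or abbreviated form, then counts matching posts per one-degree latitude/longitude cell as a crude stand-in for an intensity map. Everything here is an assumption made for illustration: the sample posts and coordinates are invented, and the regular expression, cell size and function name `cell_counts` are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical posts: (text, latitude, longitude). Invented for illustration;
# these are NOT tweets or locations from the study.
POSTS = [
    ("New paper on Phytophthora infestans genomics!", 35.78, -78.64),
    ("Late blight spotted on my tomatoes this week", 42.44, -76.50),
    ("P. infestans lineage survey results are out", 35.90, -78.90),
    ("Great harvest of potatoes today", 53.35, -6.26),
]

# Matches the common name, the full scientific name, or the abbreviated form.
NAME_PATTERN = re.compile(
    r"\blate\s+blight\b|\bP(?:hytophthora|\.)\s*infestans\b",
    re.IGNORECASE,
)


def cell_counts(posts, cell_deg=1.0):
    """Count matching posts per grid cell of cell_deg degrees on a side."""
    counts = Counter()
    for text, lat, lon in posts:
        if NAME_PATTERN.search(text):
            # Floor division of the coordinates gives the cell's index.
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    return counts


if __name__ == "__main__":
    for cell, n in cell_counts(POSTS).most_common():
        print(cell, n)
```

Counting posts per cell shows where the disease is being discussed, not where it occurs, so any map built this way reflects who is posting as much as where late blight is present.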

“The social media mining was interesting because we found that most people talking about this disease are scientists in developed countries promoting their own work on Twitter (now X),” Ristaino said. “It was also interesting to note that states where the disease appeared all those many years ago still have the disease now.”

The study also used Google Ngram searches, which revealed a surprising finding: a spike in mentions of late blight disease in documents from the 1950s. Drilling down into the relevant academic literature cited in the documents, Ristaino saw evidence of a large late blight outbreak in tomatoes in the United States after World War II.

“That could have been the emergence of a new North American strain of the pathogen, known as U.S. 1, that became really widespread after that,” Ristaino said.

Ristaino added that she and her team plan to continue this type of work and expand the analytic tools to other plant diseases and pests.

Co-authors Ariel Saffer, Laura Tateosian and Yi-Peng Yang are part of NC State’s Center for Geospatial Analytics. Amanda C. Saville, a research specialist in Ristaino’s lab, also co-authored the paper. Funding was provided by the Triangle Center for Evolutionary Medicine Seed Grant; the U.S. Dept. of Agriculture’s NIFA under grant number 2015-2370; and by the National Science Foundation PIPP Phase 1 grant number 2022-1191.


Note to editors: The abstract of the paper follows.

“Reconstructing Historic and Modern Potato Late Blight Outbreaks Using Text Analytics”

Authors: Ariel Saffer, Laura Tateosian, Amanda C. Saville, Yi-Peng Yang and Jean B. Ristaino, NC State University

Published: Feb. 15, 2024 in Scientific Reports


Abstract: In 1843, a hitherto unknown plant pathogen entered the U.S. and spread to potato fields in the northeast. By 1845, the pathogen had reached Ireland leading to devastating famine. Questions arose immediately about the source of the outbreaks and how the disease should be managed. The pathogen, now known as Phytophthora infestans, still continues to threaten food security globally. A wealth of untapped knowledge exists in both archival and modern documents, but is not readily available because the details are hidden in descriptive text. We 1) used text analytics of unstructured historical reports (1843-1845) to map U.S. late blight outbreaks; 2) characterized theories on the source of the pathogen and remedies for control; and 3) created modern late blight intensity maps using Twitter feeds. The disease spread from 5 to 17 states and provinces in the U.S. and Canada between 1843-45. Crop losses, Andean sources of the pathogen, possible causes and potential treatments were discussed. Modern disease discussion on Twitter included near-global coverage and local disease observations. Topic modeling revealed general disease information, published research, and outbreak locations. The tools described will help researchers explore and map unstructured text to track and visualize pandemics.

This post was originally published in NC State News.