The automated evaluation of health-related websites

Published on March 1, 2021 – ACHC researchers recently published a study in which they applied machine learning to online texts about early childhood vaccinations. Their research showed that computers can be trained to automatically determine whether online information is reliable or not.

The Internet hosts an abundance of information, but a large proportion of it is unreliable or even incorrect. Many people find it difficult to evaluate the reliability of online information, or lack the motivation to do so. ACHC researchers Corine Meppelink, Hanneke Hendriks, Julia van Weert and Eline Smit therefore wanted to investigate whether online health information can be classified automatically. To this end, they collected online texts about early childhood vaccination, which were manually coded as reliable or unreliable. Subsequently, together with colleagues Damian Trilling and Anqi Shao, they used machine learning to train several models for this automated classification task.
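To give a rough idea of what training a model on manually coded texts can look like, here is a minimal sketch in Python. It is not the method used in the study: the article does not say which algorithms were trained, so a simple Naive Bayes text classifier is assumed here purely for illustration, and the example texts, labels and names such as `NaiveBayesTextClassifier` are made up.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would do more.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior of the class.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up, hand-labelled toy examples (NOT data from the study).
train_texts = [
    "vaccines are tested in clinical trials and monitored for safety",
    "the national immunisation programme follows scientific evidence",
    "vaccines are poison and doctors hide the truth",
    "secret toxins in vaccines the truth they hide",
]
train_labels = ["reliable", "reliable", "unreliable", "unreliable"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = model.predict("clinical trials monitor vaccine safety")
```

The model learns which words are typical of each manually assigned label and then assigns a new text to the label whose word pattern it matches best. Real applications would need far more coded texts and more careful text preprocessing than this toy example.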

The results of the training process showed that computers can be ‘trained’ to adequately distinguish reliable online information from information that is less reliable. The models performed particularly well on reliable information, which they identified very well, not only in the data on early childhood vaccinations but also in a separate dataset of online texts about HPV vaccinations. Because the models were only tested on this second dataset and never trained on it, this result makes the conclusions of the study considerably stronger.
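How well a model identifies each type of information is usually expressed per class. The sketch below shows one common way to do this, using precision and recall; the article does not state which measures the study reported, so this choice is an assumption, and the labels and predictions are invented for illustration only.

```python
def per_class_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that occurs."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Precision: share of texts predicted as `label` that truly are.
        precision = tp / (tp + fp) if tp + fp else 0.0
        # Recall: share of texts that truly are `label` that were found.
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


# Invented manual codes and model predictions for a held-out test set
# (NOT results from the study).
true_labels = ["reliable", "reliable", "reliable", "unreliable", "unreliable"]
predicted = ["reliable", "reliable", "reliable", "reliable", "unreliable"]

scores = per_class_scores(true_labels, predicted)
```

In this invented example every reliable text is found (recall of 1.0), while only half of the unreliable texts are. Applying the same calculation to texts on a topic the model never saw during training is what makes a test on a second dataset informative.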

The study is one of the first steps towards the automated classification of online health information. In future research, the models can be trained and tested on many more texts covering different topics to eventually develop an easily accessible tool for people to select reliable online information.

Link to the article:

Meppelink, C. S., Hendriks, H., Trilling, D. C., van Weert, J. C. M., Shao, A., & Smit, E. S. (2020). Reliable or not? An automated classification of webpages about early childhood vaccination using supervised machine learning. Patient Education and Counseling.