Leveraging NLP To Analyse Effects of Covid-19 On Baseline Demographics and Barriers to Compliance

By Arpita Dutta, Student Author

More than eight months have passed since the COVID-19 pandemic began. With cases skyrocketing, much remains uncertain. What we do have is information: data and a huge amount of research under way globally (Kaggle currently hosts about 220K scholarly articles, approximately 17 GB of data). Healthcare is a very dynamic field in which many parameters keep changing, and the sheer volume of data online makes it difficult to keep track of them. How is one supposed to analyse so much data with a plethora of information out there? As a healthcare student, I realised that this is not just a COVID-oriented problem: wherever there is data, there will always be a goal to analyse it. But how easy is it to analyse this data and draw a conclusion?


Since the outbreak of COVID-19, people have been trying to find answers to many problems, from forecasting to new discoveries. But instead of focusing on those later stages, let’s talk about the first step: analysing the data. There are various perspectives and methods through which the data can be studied, so I researched how such data can be analysed.

Here we can’t ignore the role of analytics. As technologies come of age, it is important to keep a finger on the pulse of technology and take a step towards Artificial Intelligence. Sounds complicated? Well, it isn’t as tough as it sounds. While trying to learn how AI can help, I came across NLP, or Natural Language Processing. NLP helps computers communicate with humans in their own language and handles other language-related tasks. It makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important. In layman’s terms, NLP is the technology that helps computers understand natural human language, the ultimate objective being to read, decipher, understand and make sense of human language in a manner that is valuable.

How is NLP useful for healthcare? Healthcare is a very dynamic field, and as a result it generates huge chunks of unstructured data, much of which is unusable or unreadable in its raw form. With NLP, however, this data can be structured and used in various ways in healthcare settings. The spread of cloud computing and smart devices has also made the healthcare environment more amenable to applying NLP.

Some really basic uses of NLP are spell check and autocorrect. Another is e-mail filtering: some of the e-mails we receive go to our inbox and some to spam, and certain keywords are used to decide which folder a mail should go to.
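To make the spam example concrete, here is a minimal sketch of keyword-based filtering. The keyword list, the threshold and the function names are assumptions made for illustration; real mail filters are far more sophisticated than this.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
SPAM_KEYWORDS = {"lottery", "winner", "prize", "free", "urgent"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def spam_score(message):
    """Count how many distinct spam keywords occur in the message."""
    return len(SPAM_KEYWORDS & set(tokenize(message)))


def classify(message, threshold=2):
    """Label a message 'spam' when it contains at least `threshold` keywords."""
    return "spam" if spam_score(message) >= threshold else "inbox"
```

Requiring at least two keywords is a simple guard against flagging an ordinary mail that happens to contain a single word such as "free".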

Coming to healthcare uses of NLP, it can help in decision making, or compare patients’ data to analyse which patient is doing better and which is facing problems.

·         In healthcare, medical coding and HCC (Hierarchical Condition Category) are important touchpoints. A medical coder’s job is to find the list of all conditions present in a patient’s chart or clinical notes, which is a tedious, manual job. One error can lead to a financial loss for the provider. This is a key area where NLP can help: first, it can convert speech to text and help detect and correct errors. In addition, NLP can help identify and highlight the patient’s diagnosis using specific keywords or medical terminology.

·         If we talk about medical codes like ICD-10, the main problem is that they are not written in simple “English”. For example, if you try to find the term “Heart Attack”, you will not find it; instead you will find “Myocardial Infarction”. This is where a terminology like SNOMED helps, as it is an English-like language: if you search for a disease by its name, you will be able to find it.

·         Another example is chatbots, which you might have come across on various sites. A chatbot is very dynamic: you ask a question and it clarifies your doubt. “Symptom checkers” are increasingly being used during these times to self-assess symptoms and check whether you might be positive. Under the hood, the chatbot uses NLP, which works on keywords: it uses those specific keywords to understand the symptoms and the diagnosis.

·         Data mining is one of the most important uses of NLP. It helps in mining huge amounts of data. In clinical trials, for instance, it can help find patients and data from a wide variety of sources across various sites. This helps reduce administrative costs and save money.

·         Other uses of NLP include improvements in clinical documentation, as it can free physicians from manual work and let them focus more on healthcare delivery. It also helps bridge the gap between patient and healthcare provider, which ultimately results in improved healthcare delivery and increased patient awareness. This can further help in treating patients who have critical or complex needs. NLP can also help greatly in detecting errors, in understanding the problem areas where improvements can be made, and in preventing problems from occurring in the first place.

·         NLP can be useful in developing a robust clinical decision support system, which in turn can help physicians make accurate decisions. NLP helps here by keeping medical errors in check through continuous monitoring of symptoms and diagnoses.
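The points above about highlighting diagnoses and about everyday versus clinical terms can be sketched in a few lines. This is only an illustration under stated assumptions: the three-entry dictionary and the function name are hypothetical, and a real system would draw on a full terminology such as SNOMED rather than a hand-written mapping.

```python
import re

# Hypothetical mini-dictionary mapping everyday phrases to clinical terms.
# A real system would use a complete terminology instead of these three entries.
LAY_TO_CLINICAL = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
    "stroke": "cerebrovascular accident",
}


def find_conditions(note):
    """Return (phrase found, clinical term) pairs mentioned in a free-text note."""
    text = note.lower()
    found = []
    for lay, clinical in LAY_TO_CLINICAL.items():
        # Accept either the everyday phrase or the clinical term, as whole words.
        for phrase in (lay, clinical):
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.append((phrase, clinical))
    return found
```

For a note such as "Patient had a heart attack in 2018 and has hypertension.", the function reports both conditions under their clinical names, which is the kind of list a medical coder would otherwise compile by hand.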

Future applications of NLP could include an ambient virtual scribe to reduce physicians’ time and workload, and population surveillance, which can help in mapping diseases and forecasting them.

With these applications, NLP has made our work easier and has great scope in the future as well. It gives us a 360-degree, holistic view for solving problems.

Coming back to the COVID scenario, one of the questions we are all trying to answer relates to non-pharmaceutical interventions during the pandemic. Clinical trials and the other facets that chiefly concern pharmaceutical interventions are being handled by scientists, but there are measures the government must take for the people, and precautionary measures people must follow, to contain the spread of the virus until vaccines become available.

We all know that the COVID-19 pandemic has hit nations globally and in every sector, so analysing scientific information alone is not going to be enough. A good amount of research has to be done on the non-pharmaceutical side to tackle this problem. You may wonder why. It is because there are certain measures that the government can implement for the public at large to follow. But how do we know what has to be done, and what will be beneficial or harmful?

We have a range of problems to deal with when considering non-pharmaceutical interventions, such as the rapid assessment of the likely efficacy of school closures, travel bans and bans on mass gatherings. NLP helped me narrow down the research in the database based on specific keywords. Now what? Nothing: my work just got simpler, and I let the computer do the rest. NLP made the analysis easier by giving me the data relevant to non-pharmaceutical interventions. Keywords like UV light, climate, rainfall, water, wind velocity, precipitation, water vapor pressure and air pollutant help us understand the ever-changing nature of the virus and focus on specific research papers, journals and news articles, which in turn suggest solutions or models followed worldwide that could be used in India. So instead of going through each and every paper, I analysed only the ones related to non-pharmaceutical interventions and eliminated the ones that were of no use to me.
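The keyword-based narrowing described above might look roughly like the sketch below. The keywords are a subset of those named in the article, but the matching logic, the record layout (`title`, `abstract`) and the sample records are assumptions made for illustration, not the actual pipeline used in the project.

```python
# A subset of the keywords named in the article; the matching logic is an assumption.
KEYWORDS = [
    "school closure", "travel ban", "mass gathering",
    "uv light", "climate", "rainfall", "wind velocity",
    "precipitation", "water vapor pressure", "air pollutant",
]


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords that appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw in lowered]


def filter_papers(papers, min_hits=1):
    """Keep papers whose title or abstract mentions at least `min_hits` keywords."""
    selected = []
    for paper in papers:
        hits = matched_keywords(paper["title"] + " " + paper["abstract"])
        if len(hits) >= min_hits:
            selected.append({"title": paper["title"], "hits": hits})
    # Papers matching more keywords are listed first.
    return sorted(selected, key=lambda p: len(p["hits"]), reverse=True)


# Invented placeholder records, not real papers.
sample = [
    {"title": "Paper A", "abstract": "Effect of school closures and travel bans on spread."},
    {"title": "Paper B", "abstract": "Protein structure of a viral enzyme."},
    {"title": "Paper C", "abstract": "Humidity, rainfall and climate in transmission."},
]

relevant = filter_papers(sample)
```

Substring matching is used so that "school closure" also catches "school closures". A broad keyword such as "water" would match far too much with this approach, which is why it is left out of the sketch.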

Result? Well, we will discuss what I analysed and found in the next segment.

Climate, temperature, latitude and humidity heavily affected the seasonality of the virus. Community spread occurred in countries that share similar geographical patterns. A number of studies, including laboratory studies, epidemiological studies and mathematical modelling, point to the role of ambient temperature and humidity in the survival and transmission of seasonal respiratory viruses. Beyond this, baseline demographics like age, gender, height, weight and education level, lifestyle factors like physical activity, smoking status, alcohol drinking status and diet, along with medical post and chronic medical diseases, can be used to assess the impact of COVID-19, and also to understand why people are not following public health advice and what economic relief measures the government can take to mitigate the risks.

With all the data gathered using NLP, we come to the results and to how COVID-19 has impacted India. The Ministry of Health and Family Welfare in India has come up with protocols and measures to control the spread and restrict community transmission as much as possible. Still, there are wide barriers to compliance.

It might be strange to learn that other causes account for more deaths per year in India than COVID-19 does. The availability heuristic and saliency bias, in which outcomes or events that are salient and capture our attention (possibly due to their recent occurrence or to constant media attention) are perceived to be riskier than they actually are, have led to a state of panic across the country. There is also still a lack of protocol, and the world hopes for a cure that can help bring the present situation under some control.

Studies are still going on. Because of this, people are not following public health advice, which has led to a fall in compliance rates.

Further problems arise because the fatality rate of a virus (the proportion of infected people who die) is difficult to calculate in the middle of an outbreak, as records of new cases and deaths are constantly being updated. The current death toll is 775,632 worldwide, and in India it is nearing 50,000. Even so, the fatality rate appears significantly lower than that of SARS, which killed around 10% of the people it infected.

This could mean that deaths occur every year as the virus circulates, until a vaccine is developed. If the virus can be spread by people who are infected but don’t have symptoms, it will be more difficult to control, making it more likely that the virus will become endemic. Putting patients on drips and ventilators can help them restore their bodily fluids while the immune system fights the virus, but how long this can be sustained remains questionable. With the number of cases on the rise and the economy crumbling, lockdown and quarantine are not feasible long-term options.

India has been forced into a reset by COVID-19. While it has set us back by at least five years, it gives India an opportunity to revitalise and structure our economic system for the future. The political economy of reform has changed. India’s government had deployed about 348 billion Indian rupees to combat the effects of the coronavirus (COVID-19) lockdown as of May 6, 2020. The largest share of the package went towards payments to farmers under the PM-KISAN scheme and to women account holders of the Pradhan Mantri Jan Dhan Yojana. The relief package falls under the Pradhan Mantri Garib Kalyan Yojana scheme, with a commitment of about 1.7 trillion rupees in relief.

Overall, NLP helped with text selection using specific keywords, and with answering questions by selecting passages closely related to the questions asked. In this way, NLP supported information retrieval by selecting potentially relevant passages containing answers. This data is then used for analysis.
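Passage selection for a question can be approximated very simply by counting shared words. The sketch below is an assumption about how such a step could work, not the method used in the project; the stop-word list and function names are hypothetical, and real systems typically use weighted scoring rather than raw overlap.

```python
import re

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "is", "are", "do", "does",
              "what", "how", "and", "to", "for"}


def content_words(text):
    """Lowercase the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def rank_passages(question, passages, top_k=2):
    """Return up to `top_k` passages sharing the most content words with the question."""
    q_words = content_words(question)
    scored = [(len(q_words & content_words(p)), p) for p in passages]
    # Passages with no word in common with the question are discarded.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

Given the question "How does humidity affect transmission?", a passage mentioning both humidity and transmission ranks above one that mentions only humidity, and passages with no overlap are dropped.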



Arpita Dutta

Arpita Dutta has completed a Bachelor of Business Administration in Healthcare Management and is currently pursuing a Master of Business Administration at Symbiosis International University. The article is based on a project undertaken under Mr. Harish Rijhwani (Senior Health IT professional; Mentor; Author: Healthcare Decoded – Begin Your Health IT Journey).
