A Visual Story of the COVID-19 Outbreak and How Misinformation Spreads

Zoe Liu

Presented at JSM on Aug 5 2020

Global COVID-19 Confirmed Cases

[Refresh the page if the graph is not properly rendered]

Method

Tools

  • Python
  • Plotly

Data Source

  • Google Trends
  • Twitter

What are Google Trends data? [1]

  • Random sample of actual search requests made to Google
  • Normalized search data to make comparisons between terms easier
  • Search interest scaled on a range of 0 to 100 based on a topic's proportion to all searches
  • Search terms with low search volume appear as "0"
  • Each data point is relative to the selected time range and geography
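
The scaling described above can be sketched in a few lines. This is only an illustration of the idea (share of all searches, rescaled so the peak is 100); Google's exact procedure is not given in this talk, and the function name `scale_interest` and the counts below are made up for the example.

```python
def scale_interest(term_counts, total_counts):
    """Scale a term's share of all searches to a 0-100 index.

    Hypothetical helper: each period's share is term / total, and the
    period with the largest share is mapped to 100.
    """
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        # No measurable interest anywhere in the window
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Made-up weekly counts: searches for the term, and all searches
weekly_term = [10, 20, 80, 40]
weekly_total = [1000, 1000, 2000, 1000]
index = scale_interest(weekly_term, weekly_total)
print(index)
```

Weeks three and four get the same index even though the raw counts differ, because the index tracks the term's proportion of all searches rather than its volume.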

Notes on Google Trends data

  • In this project, most Google Trends data were retrieved for the time frame of January to April 2020. The 2019 data were retrieved for January 2019 to April 2020, so that 2019 and 2020 values share one scale and can be compared.
  • Because it is a random sample of the actual complete data, the key word is “Trend”. When interpreting the data, the focus is on identifying trends or patterns rather than proving a relationship.

Twitter data

  • I considered tweepy and twint for scraping Twitter data.
  • Tweepy has the advantage of providing access to all of the Twitter API methods. However, it requires a developer account and returns tweets only from the past seven days.
  • Twint does not require a developer account because it scrapes only public tweets. Most importantly, it returns data from well beyond the past seven days, which is why I went with twint.

Misinformation 1

"COVID-19 was caused by Chinese eating bats"

On January 7th, Sing Tao Daily, the second largest Chinese language newspaper in Hong Kong, quoted microbiologist Yuen Kwok-Yung on the estimated origin of the virus.

"香港大學微生物學系講座教授袁國勇相信,武漢不明原因肺炎由冠狀病毒引起,他估計今次新病毒很大機會與沙士一樣,源頭來自蝙蝠,傳給某種野生動物,再跳到人類。"

"University of Hong Kong microbiologist Yuen Kwok-Yung believes the unexplained Wuhan pneumonia is caused by a coronavirus, and he estimates that the new virus, like SARS, very likely originated in bats, passed to some wild animal, and then jumped to humans."

In late January, videos of bat soup went viral on the Chinese social media platform Weibo and on TikTok around the world. Among them was a video of a young woman eating a bat with chopsticks.

On January 22nd, Chen Qiushi, a lawyer and activist well-known online, shared one of the bat soup videos on Twitter and asked: "...Will the Chinese stop eating wild animals after all this?" At the time, the video had been viewed 4.8 million times.

It should be noted that Chen took part in the 2019 Hong Kong protests, risking his life to report on police brutality. He is not the stereotypical conspiracy theorist that we associate with misinformation.

On January 23rd, the Daily Mail ran a story on the bat soup video and linked it to the coronavirus.

On the same day, RT published a similar story and claimed it was "actually perfectly possible" that the virus was caused by eating infected bat soup.

On January 24th, far-right YouTuber Paul Joseph Watson promoted the link between bat soup and coronavirus.

On March 2nd, Fox News host Jesse Watters said:

"Let me tell you why it happened in China... They have these markets where they were eating raw bats and snakes. They are very hungry people. The Chinese communist government cannot feed the people. And they are desperate, this food is uncooked, it is unsafe. And that is why scientists believe that’s where it originated from."

On March 15th, Kodama Boy wrote A Song About Coronavirus.

He is neither the first nor the only one to write songs (parody or not) about the coronavirus. According to Quartz [2], by March 3rd there were more than 65 songs with "coronavirus" in the title on Spotify.

So did the media stories and influencers contribute to the spread of the misinformation?

Search Interest of "bat soup"

The timing of the media and social media stories aligns well with the soaring interest in late January and the comeback in March.

The stories above are not the only ones that promoted the bat soup theory. Many others shared the bat soup videos on social media without fact-checking.

Misinformation 2

"Coronavirus came from the Wuhan lab"

ROUND ONE

On January 24th, Bill Gertz wrote, "The deadly animal virus epidemic... may have originated in a Wuhan laboratory".

On January 25th, GNews pushed a story claiming that China would admit the coronavirus came from its P4 lab.

On the same day, Steve Bannon pushed the Wuhan lab theory from GNews on his radio broadcast War Room: Pandemic.

The New York Times reported in December 2019 that Steve Bannon had forged a lucrative financial relationship with Chinese businessman Guo Wengui, who owns GNews.

On January 29th, Tyler Durden's blog on Zero Hedge attached a name to the Wuhan lab theory.

He argued that Dr. Peng Zhou was somehow responsible for the source of this pandemic because "his primary field of study is researching how and why bats can be infected with some of the most nightmarish viruses in the world including Ebola, SARS and Coronavirus, and not get sick".

Twitter banned Zero Hedge on January 31st over this blog post.

Search Interest of "wuhan lab": Jan through Mar

Search interest in keywords such as “wuhan p4 lab” and “wuhan bio lab” peaked in late January.

Note that the word “P4” was mentioned in the GNews story.

ROUND TWO

On April 14th, columnist Josh Rogin published an opinion piece in The Washington Post titled “State Department cables warned of safety issues at Wuhan lab studying bat coronavirus”.

On April 15th, Fox News picked up Rogin’s story and claimed that “Sources believe Coronavirus outbreak originated in Wuhan lab as part of China’s efforts to compete with US”.

Search Interest of "wuhan lab": Jan through Apr

With the Google Trends search extended from the end of March to the end of April, a second peak appeared in mid-April, on April 15th and 16th, just a day or two after the leak of the State Department cables.

It’s very likely that Rogin’s story created the second wave of the misinformation that the virus came from the Wuhan lab.

Given that most of the news stories were from the US, I zoomed in on the US data by state for the six days around the first peak, from January 24th to 29th.

Average Search interest in Jan

On January 25th and 26th, we observe sustained higher interest in California compared to the other states.

So were the Californians leading the search interest in “wuhan lab” in January?

As Google Trends returns a random sample of the entire search data, it is possible that a pattern based on one slice of data is due to chance. I therefore sampled the search trends for January 25th and 26th repeatedly and calculated the average across the repeated samples.
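
The averaging step can be sketched as follows. This is a minimal illustration that assumes each repeated sample has already been retrieved as a mapping from state to search interest; the helper name `average_by_state` and all values are made up and are not the data behind the charts.

```python
from collections import defaultdict
from statistics import mean


def average_by_state(samples):
    """Average search interest per state over repeated samples.

    samples: list of dicts mapping state name -> interest (0-100).
    """
    values = defaultdict(list)
    for sample in samples:
        for state, interest in sample.items():
            values[state].append(interest)
    return {state: mean(v) for state, v in values.items()}


# Made-up values for illustration only
samples = [
    {"California": 100, "Texas": 60, "New York": 70},
    {"California": 90, "Texas": 75, "New York": 60},
    {"California": 95, "Texas": 45, "New York": 80},
]
averages = average_by_state(samples)
top_state = max(averages, key=averages.get)
print(top_state, averages[top_state])
```

Averaging over several draws reduces the influence of any single sample, so a state that ranks first in the averages is less likely to be there by chance.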

Average Search Interest on Jan 25th (Repeated Sampling)

Average Search Interest on Jan 26th (Repeated Sampling)

As we observed before, California was the state with the highest search interest in “Wuhan Lab” and related key words on January 25th.

The same occurred on January 26th.

We only know that search interest was higher in California on January 25th and 26th than in other states, but not the reason why...

Misinformation 3

"5G Network was associated with the spread of COVID-19"

Search Interest of 5G Network in 2019

People googled "5g network" only sporadically in 2019. Search interest mostly stayed below 15 on the 0 to 100 scale.

5G theories at that time were fringe conspiracy theories. However, the COVID-19 outbreak gave these theories a new angle.

On January 19th, a tweet speculated on a link between 5G and coronavirus:

"Wuhan has 5,000+ #5G base stations now and 50,000 by 2021 — is it a disease or 5G?" [3]

On January 22nd, a Belgian newspaper published an interview with Kris Van Kerckhoven, a local doctor, under the headline “5G is life-threatening and no one knows it.”

Search Interest of 5G Network in 2020

The 5G-coronavirus theory started to circulate online in late January, but it did not gain much traction in January and February.

In March, there was growing interest in the effect of 5G, but the real surge came in the first week of April.

So what exactly happened in April that led to the sudden interest in 5G and coronavirus?

Isobel Cockerel and Ashely Jung at Coda Story pointed out that a few celebrities pushed 5G coronavirus conspiracies to millions of fans. Many of those posts appeared in early April this year, most of them on Twitter.

Among those celebrities, Wiz Khalifa had the most followers on Twitter. On April 3rd, he tweeted “Corona? 5g? Or Both?” Other celebrities mentioned include John Cusack, Teddy Riley, and Woody Harrelson. But are those celebrities the only ones pushing the 5G coronavirus theory?

Scraping Public Twitter Posts

  • April 1st to 8th
  • Tweets containing "5g" and at least one of "coronavirus", "covid", or "virus"
  • Compile a list of the individual users mentioned or quoted the most
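
The filtering and counting steps can be sketched as below. This is a simplified illustration, not the project's actual code: it assumes the tweets are already available as plain strings, counts only @mentions (not quotes), and uses made-up tweets and user names.

```python
import re
from collections import Counter

VIRUS_TERMS = ("coronavirus", "covid", "virus")
MENTION = re.compile(r"@(\w+)")


def is_relevant(text):
    """True if the tweet contains "5g" and at least one virus term."""
    lowered = text.lower()
    return "5g" in lowered and any(term in lowered for term in VIRUS_TERMS)


def count_mentions(tweets):
    """Count how often each user is mentioned in relevant tweets."""
    counts = Counter()
    for text in tweets:
        if is_relevant(text):
            counts.update(name.lower() for name in MENTION.findall(text))
    return counts


# Made-up tweets for illustration only
tweets = [
    "@user_a says 5G spreads the virus",
    "No, @user_a, 5G does not cause COVID. See @user_b",
    "New 5G phone review by @user_c",
    "@user_b on coronavirus testing",
]
top_mentions = count_mentions(tweets).most_common(2)
print(top_mentions)
```

The matching here is plain case-insensitive substring search, so "5g" would also match inside a longer token; a stricter version could match on word boundaries.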

The most frequently mentioned individual user is Donald Trump. Many of the tweets that mentioned Trump also mentioned Boris Johnson and David Icke (a British conspiracy theorist).

Rhiannon Williams, a technology correspondent, is the second most quoted individual. She published an article “Why 5G isn’t causing the coronavirus pandemic” on April 1st.

Wiz Khalifa did not make the top of the list.

What is surprising is that while the discussion around 5G spread in the United States and the UK, on the other side of the world Nigerian politicians such as Femi Fani-Kayode and Senator Dino Melaye emerged as major contributors to the spread of 5G misinformation.

JJ. Omojuwa, a Nigerian writer, argued against the link between 5G and coronavirus on Twitter.

Twitter users often refer to two of the three (Omojuwa, Fani-Kayode, and Melaye) in the same tweet, so a triangle of connections can be observed in the network.

The network is based on accumulated data from the 1st to the 8th.
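
One plausible way to build such a network is to link two users whenever they are mentioned in the same tweet. The sketch below assumes that construction, which the talk does not spell out, and uses made-up tweets and user names.

```python
import re
from collections import Counter
from itertools import combinations

MENTION = re.compile(r"@(\w+)")


def co_mention_edges(tweets):
    """Count how often each pair of users is mentioned in the same tweet."""
    edges = Counter()
    for text in tweets:
        # Deduplicate and sort so each pair has one canonical order
        users = sorted({u.lower() for u in MENTION.findall(text)})
        edges.update(combinations(users, 2))
    return edges


# Made-up tweets for illustration only
tweets = [
    "@writer_x vs @politician_y on 5G",
    "@politician_y and @politician_z both wrong about 5G",
    "@writer_x replies to @politician_z",
    "@writer_x and @politician_y again",
]
edges = co_mention_edges(tweets)
for pair, weight in edges.most_common():
    print(pair, weight)
```

Three users who are repeatedly mentioned in pairs produce three weighted edges, which is the kind of triangle described above.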

What if we split the data by day?

Ranking the Most Mentioned Twitter Users

When the data are split by day, the most mentioned individuals across time are Donald Trump and David Icke.

Rhiannon Williams obtained the most quotes/mentions only on April 3rd.

Starting from April 4th, we see more mentions of the Nigerian users.
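
Splitting by day amounts to grouping the mentions by date before ranking. A minimal sketch, assuming the mentions are available as (date, user) pairs; the helper name `top_mention_by_day` and the records are made up.

```python
from collections import Counter, defaultdict


def top_mention_by_day(records):
    """Return the most mentioned user and count for each day.

    records: iterable of (date_string, mentioned_user) pairs.
    """
    per_day = defaultdict(Counter)
    for day, user in records:
        per_day[day][user] += 1
    return {day: counts.most_common(1)[0] for day, counts in sorted(per_day.items())}


# Made-up records for illustration only
records = [
    ("2020-04-01", "user_a"), ("2020-04-01", "user_a"), ("2020-04-01", "user_b"),
    ("2020-04-02", "user_b"), ("2020-04-02", "user_b"), ("2020-04-02", "user_a"),
]
daily_top = top_mention_by_day(records)
print(daily_top)
```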

Conclusion

1. Two major sources of misinformation are traditional media and social media.

2. Celebrities, politicians, and influencers sometimes have the power to make a previously fringe piece of misinformation go viral in a very short time.

3. Anyone can spread misinformation, sometimes even by mistake. Fact-check before posting on social media.

References

1. FAQ about Google Trends data. Retrieved from https://support.google.com/trends/answer/4365533?hl=en&ref_topic=6248052

2. Kopf, D. (March 3, 2020). There are already more than 65 songs with "coronavirus" in the title on Spotify. Quartz. Retrieved from https://qz.com/1811814/theres-already-more-than-65-songs-about-coronavirus-on-spotify

3. Satariano, A. and Alba, D. (April 10, 2020). Burning Cell Towers, Out of Baseless Fear They Spread the Virus. The New York Times. Retrieved from https://www.nytimes.com/2020/04/10/technology/coronavirus-5g-uk.html

Thank You!

github.com/liu-zoe/covid_misinformation