Text Analytics on News from the Rohingya Refugee Crisis

This repository contains work done for a hackathon in the fall of 2020, cosponsored by Save the Children and the UVA School of Data Science. It is a small project with three parts: mining text from news sources, building a pipeline that turns that text into usable data, and performing introductory topic modelling on the result. This README explains the file structure of the repository.

Notebooks

In the project, we have three notebooks: NYT, Pipeline, and Analysis. Their contents are as follows:

NYT

In the NYT.ipynb notebook, we use the New York Times API to search for articles and build a corpus of relevant documents.
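
As a rough illustration of this step, the sketch below builds a request for the New York Times Article Search endpoint and pulls a few fields out of the JSON response. It is not the code from NYT.ipynb: the function names, the chosen fields, and the query string are assumptions made for this example, and you need your own API key to call `search_nyt`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Article Search endpoint of the New York Times API.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Return the request URL for one page of search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


def extract_articles(payload):
    """Keep only the URL, headline, and date of each returned article."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "url": doc.get("web_url"),
            "headline": (doc.get("headline") or {}).get("main"),
            "date": doc.get("pub_date"),
        }
        for doc in docs
    ]


def search_nyt(query, api_key, page=0):
    """Fetch one page of results (hypothetical helper; needs network access)."""
    with urlopen(build_search_url(query, api_key, page)) as response:
        return extract_articles(json.load(response))
```

A call such as `search_nyt("Rohingya refugee", api_key="YOUR_KEY")` would return a list of small dictionaries that can then be written to a CSV file.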

Pipeline

In the Pipeline.ipynb notebook, we query a free online news API to build a second corpus, then extract and clean the text of the relevant news stories.
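
The exact cleaning operations live in the notebook itself. The sketch below only shows what a typical cleaning pass might look like, assuming the goal is a list of lowercase word tokens; the stopword list, the minimum token length, and the `clean_text` name are all choices made for this example.

```python
import html
import re

# Small illustrative stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "are",
    "was", "were", "that", "this", "with", "as", "by", "at", "from", "it",
}


def clean_text(raw):
    """Turn raw article text into a list of lowercase word tokens."""
    text = html.unescape(raw)                   # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)       # keep ASCII letters only
    # Remove stopwords and very short tokens.
    return [t for t in text.split() if t not in STOPWORDS and len(t) > 2]
```

For instance, `clean_text("<p>The Rohingya fled to Bangladesh in 2017.</p>")` yields `["rohingya", "fled", "bangladesh"]`.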

Analysis

In the Analysis.ipynb notebook, we use the cleaned text data to perform Latent Dirichlet Allocation (LDA) topic modelling on each corpus. This notebook also generates a handful of .html files, which are interactive visual representations of the topic models.
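
To give a feel for what LDA does with the cleaned tokens, here is a minimal collapsed Gibbs sampler written in plain Python. It is a teaching sketch only: the notebook presumably relies on a library implementation, and the function name, the hyperparameters `alpha` and `beta`, and the iteration count are assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][word] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document and top words per topic.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, count in topic_word[k].most_common(5) if count > 0]
        for k in range(num_topics)
    ]
    return theta, top_words
```

Calling `lda_gibbs(tokenized_docs, num_topics=3)` returns one topic distribution per document and up to five high-count words per topic. On a real corpus a library implementation will be much faster than this loop.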

License

This project is licensed under the terms of the MIT license.

Files

.gitignore
Analysis.ipynb
NYT.ipynb
Pipeline.ipynb
README.md
ldavis_prepared_newsapi_10.html
ldavis_prepared_newsapi_3.html
ldavis_prepared_nyt_10.html
ldavis_prepared_nyt_3.html
nyt.csv
nyt_urls.csv

savethechildrenhackathon/Annie-Haizhu-Andre