Submission of final project proposal for Group U #1

Kimberlyshan · 2022-03-02T22:03:06Z

@QMSS-G5063-2022/teaching_team

SHA: e4784df

JonathanReeve · 2022-03-11T05:37:50Z

My suggestions for this would be:

Narrow your datasets. I'd recommend choosing either tweets or Reddit posts, since each medium has its own linguistic style and user base. Comparing the two would overcomplicate things.
Think about how you'll handle language. If you're only looking at English-language posts, you're excluding a lot of important opinion written in Russian or Ukrainian, not to mention French, German, Polish, Romanian, and so on. That's going to color a lot of the sentiment you're analyzing.
Sentiment analysis and making word clouds are not the same. Do some thinking about what the word cloud is doing, if you decide to go with that visualization. It usually throws out stopwords and shows you a quilt of the remaining words, where size is correlated with frequency. But can you do better? If you work out what sorts of things you're interested in measuring, I think you can do better than this out-of-the-box solution.
If you really want to do sentiment analysis, maybe look into some sentiment analysis packages for R, or the nltk.sentiment package in Python. Multilingual sentiment analysis might be a bit more difficult, but you could probably accomplish this to a certain extent with a lexical approach, if you find the right word lists for your target languages.
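To make the lexical approach concrete, here is a minimal sketch in plain Python. The word lists and the `lexical_sentiment` helper are invented for illustration; a real project would swap in published sentiment lexicons for each target language, and would probably need better tokenization than a simple regex.

```python
import re

# Toy word lists, invented for illustration only. A real project would
# load proper sentiment lexicons for each target language.
LEXICONS = {
    "en": {
        "positive": {"good", "peace", "hope"},
        "negative": {"bad", "terrible", "fear"},
    },
    "de": {
        "positive": {"gut", "frieden", "hoffnung"},
        "negative": {"schlecht", "schrecklich", "angst"},
    },
}


def lexical_sentiment(text, lang):
    """Return (positive hits - negative hits) / token count; 0.0 for empty text."""
    # Lowercase and split on runs of word characters (Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    lexicon = LEXICONS[lang]
    positive_hits = sum(token in lexicon["positive"] for token in tokens)
    negative_hits = sum(token in lexicon["negative"] for token in tokens)
    return (positive_hits - negative_hits) / len(tokens)
```

A score above zero means more positive than negative words were matched. Note that this ignores negation ("not good") and context entirely, which is the main weakness of a purely lexical method.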
Some questions I might have for your data set would include:
- How is a Reddit user's sentiment about the war correlated with the other subreddits in which they post? If a user subscribes to lots of right-wing subreddits, for instance, does that make them more or less likely to have certain opinions?
- Similarly, what else does a Twitter user post about, besides the war? How do their opinions about the war correlate with their opinions about other things entirely?
- What kinds of expressions are correlated with certain sentiments? For example, if you see the expression "special military operation" (the official Russian phrase for the war), what kinds of sentiments are conveyed?
- Does other metadata correlate with the sentiment expressed? For instance, are anti-Ukrainian tweets happening between 9am and 5pm, Moscow time? (This would really suggest that they're tweets paid for by the Russian government.)
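
For the time-of-day question, a rough sketch of the check might look like this. It assumes your timestamps are ISO 8601 strings in UTC (the function names here are hypothetical), and it uses a fixed UTC+3 offset, since Moscow has not observed daylight saving time since 2014.

```python
from datetime import datetime, timedelta, timezone

# Moscow has used a fixed UTC+3 offset with no daylight saving since late 2014,
# so a fixed offset is enough for 2022 data.
MOSCOW = timezone(timedelta(hours=3), "MSK")


def in_moscow_office_hours(utc_timestamp):
    """True if an ISO 8601 UTC timestamp falls between 09:00 and 17:00 Moscow time."""
    # Replace a trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    moment = datetime.fromisoformat(utc_timestamp.replace("Z", "+00:00"))
    return 9 <= moment.astimezone(MOSCOW).hour < 17


def office_hours_share(utc_timestamps):
    """Fraction of timestamps that fall inside Moscow office hours (0.0 if empty)."""
    flags = [in_moscow_office_hours(ts) for ts in utc_timestamps]
    return sum(flags) / len(flags) if flags else 0.0
```

You would then compare this share across sentiment groups rather than read it on its own, since the baseline posting rhythm of ordinary users in that time zone matters too.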

Let me know if you have any questions regarding NLP-related tasks.
