What does Christmas look like on Twitter? What does it look like in Spanish, English, Afrikaans and Danish? Are there differences? These were some of the questions that my group at uni wanted to answer. We wanted to scrape Twitter and find the words that often appear alongside Christmas in tweets. We scraped Twitter at the beginning of December.
We all had our difficulties with Python, and several of us settled on a scraper from Scraperwiki.com. Scraping in Danish turned out to be a challenge in ways I hadn’t thought about until then.
I’ve found that the language itself puts certain constraints on the search. It proved difficult to get clean Danish results, even though I used Twitter’s own parameter for searching in a specific language. Firstly, the letters ‘jul’ often appear in names, like Julianna or Julie. Secondly, the letters ‘jul’ are used in different words in different languages. And even though I searched for tweets in Danish, I got results back in Norwegian and in something Google Translate detected as Malay, although I’m not sure about that.
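The name problem can be reduced after scraping with a simple token filter. The sketch below is not the scraper we used; it assumes the tweets are already available as plain strings, and the name list and the `mentions_jul` helper are made up for illustration. It does nothing about the second problem, since Norwegian tweets about ‘jul’ would still pass.

```python
import re

# Hypothetical list of first names that start with "jul"; extend as needed.
NAME_TOKENS = {"julie", "julia", "julianna", "julian", "julius", "juliane"}

# In Python 3, \w is Unicode-aware for str patterns, so æ, ø and å count as letters.
TOKEN_RE = re.compile(r"\w+")


def mentions_jul(text):
    """Return True if any word starts with 'jul' and is not a known name."""
    for token in TOKEN_RE.findall(text.lower()):
        if token.startswith("jul") and token not in NAME_TOKENS:
            return True
    return False


tweets = [
    "Glædelig jul til alle",
    "Julie kommer i morgen",
    "Har købt julegaver i dag",
]

# Keep only the tweets that mention jul as a word or as the start of a compound.
danish_christmas = [t for t in tweets if mentions_jul(t)]
```

A prefix match like this keeps compounds such as julegaver, but it would also let through English words like July, so the exclusion list would need tuning against real data.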
Exporting tweets in Danish also proved to be something of a challenge. When I exported tweets in JSON format, the letters æ, ø and å were converted into code. This also happened when I downloaded a zip file from Tweet Archivist and opened the CSV file in Excel. The only way to avoid this, I found, was to download the data as CSV files and open them in TextWrangler or TextEdit.
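I don’t know exactly what the export tools did under the hood, but the symptoms match two common encoding behaviours. A minimal sketch, assuming the export is done with Python’s standard `json` and `csv` modules (the sample tweet and user name are invented): `json.dumps` escapes non-ASCII letters by default unless `ensure_ascii=False` is passed, and writing CSV with the `utf-8-sig` codec adds a byte order mark, which Excel generally uses as a hint that the file is UTF-8.

```python
import csv
import io
import json

# Invented sample data containing the Danish letters æ, ø and å.
tweets = [{"user": "example_user", "text": "Glædelig jul på Fyn og Ærø"}]

# Default behaviour: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(tweets)

# ensure_ascii=False keeps æ, ø and å as literal characters.
readable = json.dumps(tweets, ensure_ascii=False)

# Both strings decode to the same data, so the escaped form loses nothing;
# it is only harder to read by eye.

# Write CSV into an in-memory buffer using UTF-8 with a byte order mark.
buffer = io.BytesIO()
wrapper = io.TextIOWrapper(buffer, encoding="utf-8-sig", newline="")
writer = csv.DictWriter(wrapper, fieldnames=["user", "text"])
writer.writeheader()
writer.writerows(tweets)
wrapper.flush()
csv_bytes = buffer.getvalue()
```

To write a real file instead of a buffer, the same codec can be passed as `open(path, "w", encoding="utf-8-sig", newline="")`.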
Then I sorted the data and left out words like me, you, us, and http.
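The counting step can be sketched roughly like this. It is an illustration rather than the code we ran: the `top_words` helper, the sample tweets and the stop-word set are assumptions, and the stop-word set only contains the four words mentioned above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"me", "you", "us", "http"}


def top_words(tweets, n=10):
    """Count words across all tweets, skipping stop words, most frequent first."""
    counts = Counter()
    for text in tweets:
        # Lowercase first so "Christmas" and "christmas" are counted together.
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)


sample = [
    "Christmas presents for you http://example.com",
    "Family Christmas with presents",
    "Happy Christmas to us",
]

result = top_words(sample, n=2)
```

Splitting on word characters breaks links into pieces, which is why http turns up as a word in the first place; the remaining pieces of a link, such as the domain, would need to be filtered separately.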
So what words were often associated with Christmas? Well, in general, across the four languages the words were happy ones, like presents, family and happy. In the screenshot below you can see some of the most frequently used words in Danish. Compared to the other three languages, it seems like Danes have a much more frivolous relationship with Christmas. How a word like fisse got to fourth place beats me. Sarietha, who worked with Afrikaans, found out that not many people tweet in Afrikaans, and the ones who do seem to be conservative.