In late 2019, an unknown person threatened to bomb the casinos in Liechtenstein. After those threats I became very interested in who would go this far. After talking with friends, I had the idea that the easiest way to gauge the feelings of the most vocal people in Liechtenstein would be to analyse the “Leserbriefe” (letters to the editor).
So I started to approach this project. I had already written a basic crawler to analyse my own websites, which I rewrote to fetch the letters. If you want to follow along, you can get the code on my GitHub page. There you can also:
Download the dataset as CSV
Look at the data and search through it
The code is really simple. We start with a few configuration settings.
This part of the code sets the bounds for the crawler. The website I am getting the letters from has a simple file structure: whenever a new letter (or article) is created, a counter is incremented. So to fetch a new letter, we just request the page at the base URL followed by that counter value:
```python
# Base URL; at the moment it only works for this website,
# but the code could be adapted to other sites
base_url = "https://www.volksblatt.li/Leserbriefe"

# Bounds for letters to analyze
amount_of_letters = 654932
lower_bound = 647504
```
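Assuming that file structure, building a letter's URL is plain string concatenation. A minimal sketch using the values above (`letter_url` is a helper name I made up for illustration):

```python
base_url = "https://www.volksblatt.li/Leserbriefe"

def letter_url(letter_id):
    # Each letter lives at base_url + "/" + its numeric counter value
    return base_url + "/" + str(letter_id)

print(letter_url(647504))  # https://www.volksblatt.li/Leserbriefe/647504
```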
After the basic config, we open a txt file in which all letters that have already been processed are saved, and read out the id of the newest one.
```python
# Looking up the newest letter that was already processed
with open("text.txt", "r") as articles:
    list_articles = articles.read().split("\n")

newest_letter_id = amount_of_letters
for article in reversed(list_articles):
    if " --- " in article:
        # The id is the first field of the record
        newest_letter_id = int(article.split(" --- ")[0])
        break
logging.info("Newest Letter id: " + str(newest_letter_id))
```
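Each line of text.txt joins the fields with " --- ", so a record can be split back into its parts. A quick illustration with a made-up record (the values are hypothetical):

```python
# A line as written by the crawler: id --- title --- creator --- text
record = "647505 --- Ein Titel --- Max Muster --- Der Brieftext"
letter_id, title, creator, body = record.split(" --- ")
print(letter_id, creator)
```

Note that this simple format breaks if a letter's text ever contains the separator itself; a CSV writer would be more robust.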
Now we are ready to start checking for new letters.
```python
# -------- Creator --------
# Actually getting the letters from the site
# and writing them to the output file text.txt
with open("text.txt", "a") as text:
    # Going from the upper bound down to the lower bound
    for i in reversed(range(newest_letter_id)):
        if i < lower_bound:
            logging.debug("Exited with: " + str(i))
            exit()
        letter = Leserbrief(i)
        if letter.get_text() == "Error":
            logging.debug("Letter not available at: " + letter.id
                          + " | did not get any text")
        else:
            text.write(letter.id + " --- " + letter.get_title() + " --- "
                       + letter.get_creator() + " --- "
                       + letter.get_text() + "\n")
            print("Done with: " + letter.id)
```
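To see what range the loop actually covers: `reversed(range(n))` yields n-1 down to 0, and the guard stops the walk once `i` drops below `lower_bound`, so the ids are checked from newest to oldest. A small self-contained illustration with tiny stand-in bounds:

```python
newest_letter_id = 10  # stand-in for the real upper bound
lower_bound = 7        # stand-in for the real lower bound

visited = []
for i in reversed(range(newest_letter_id)):
    if i < lower_bound:
        break          # the crawler calls exit() here instead
    visited.append(i)

print(visited)  # ids from newest to oldest: [9, 8, 7]
```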
```python
from urllib.request import urlopen
from html.parser import HTMLParser
from urllib import parse
from bs4 import BeautifulSoup
import logging


class Leserbrief:
    url = "https://www.volksblatt.li/Leserbriefe/"

    def __init__(self, id):
        self.id = str(id)
        self.url = Leserbrief.url + self.id
        # Element ids of the interesting parts of the page
        self.title = "body_hTitel"
        self.creator = "body_divInfo"
        self.text = "body_divText"
```
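The class above only stores the element ids of the title, creator, and text; the accessor methods themselves are not shown here. As a hedged sketch of how `get_text` might work with BeautifulSoup (`extract_field` is a helper name I made up, and the element id comes from the class above):

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def extract_field(html, element_id):
    # Return the text of the element with the given id, or "Error"
    # if it is missing -- matching the "Error" check in the crawler loop.
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        return "Error"
    return element.get_text(strip=True)

def get_text(url):
    # Sketch of what Leserbrief.get_text might do: fetch the page
    # and pull out the div that holds the letter body.
    try:
        html = urlopen(url).read()
    except Exception:
        return "Error"
    return extract_field(html, "body_divText")
```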