CRC 1342: Global Dynamics of Social Policy

Place	Mary-Somerville-Str. 7 28359 Bremen
Time	10 am - 4 pm
Contact Person	Prof. Dr. Sebastian Haunss
Organisation	Subproject A04 (2022-25): CRC 1342, University of Bremen
Lecture Series	Internal Events

The two-day workshop teaches the basics of text classification workflows (document and sentence classification, e.g. for identifying relevant documents or for sentiment analysis) and of sequence tagging for information extraction (e.g. named entities, or protest event data such as protest form, issues, or number of participants). The workflow presented in the workshop is based on neural transformer networks (BERT and successor models) and includes the following steps:

1) handling of typical data formats for model training and prediction (CSV, XMI CAS, CoNLL),
2) application of pre-trained models,
3) training or fine-tuning of models on your own data for new tasks,
4) hyper-parameter optimisation and evaluation of models.
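
As an illustration of steps 1 and 4, the following minimal sketch reads two-column CoNLL text (one token and one BIO tag per line, blank lines between sentences), extracts labelled spans, and computes span-level precision, recall and F1. It is not taken from the workshop material: the sample sentences, the label names (ACTOR, FORM, ISSUE, NUM) and all function names are assumptions made up for this example, and it uses only the Python standard library.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]            # (word, BIO tag)
Span = Tuple[int, int, int, str]   # (sentence index, start, end exclusive, label)

# Hypothetical sample data, invented for illustration only.
GOLD = """Students B-ACTOR
marched B-FORM
against O
tuition B-ISSUE
fees I-ISSUE
. O

About O
200 B-NUM
people O
attended O
. O
"""

# Same text, but the (made-up) prediction misses the second ISSUE token.
PRED = GOLD.replace("fees I-ISSUE", "fees O")


def read_conll(text: str) -> List[List[Token]]:
    """Parse CoNLL text: first column is the token, last column the tag."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line ends the sentence
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:                           # last sentence without trailing blank line
        sentences.append(current)
    return sentences


def extract_spans(sentences: List[List[Token]]) -> Set[Span]:
    """Collect labelled spans from BIO tags (a stray I- tag opens a new span)."""
    spans: Set[Span] = set()
    for s_idx, sent in enumerate(sentences):
        start, label = None, None
        # The sentinel "O" token closes a span that runs to the sentence end.
        for i, (_, tag) in enumerate(sent + [("", "O")]):
            continues = tag.startswith("I-") and label == tag[2:]
            if start is not None and not continues:
                spans.add((s_idx, start, i, label))
                start, label = None, None
            if tag.startswith("B-") or (tag.startswith("I-") and start is None):
                start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold_spans = extract_spans(read_conll(GOLD))
pred_spans = extract_spans(read_conll(PRED))
precision, recall, f1 = span_f1(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact-match scoring is a deliberately strict choice here: the shortened ISSUE span counts as both a false positive and a false negative. Established evaluation tools may apply different matching rules.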

The workshop combines short lectures with plenty of time for exercises in prepared Jupyter notebooks (Python) that run in the browser on Google Colab. For machine learning, the Flair NLP framework and the Hugging Face Transformers library, both based on PyTorch, will be used.
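
To give an impression of what applying a pre-trained model with Flair can look like, here is a minimal sketch. It is not the workshop's own notebook code: the function name `tag_entities` is hypothetical, and the sketch assumes that Flair is installed and that the publicly available English model "ner" (with label type "ner") can be downloaded on first use.

```python
from typing import List, Tuple


def tag_entities(text: str, model_name: str = "ner") -> List[Tuple[str, str, float]]:
    """Return (span text, label, confidence) for each entity found in text.

    Assumption: model_name refers to a Flair sequence tagger whose label
    type is "ner", such as the default English NER model.
    """
    # Imported inside the function so that defining it does not require Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)   # downloads the model on first use
    sentence = Sentence(text)                  # tokenises the input text
    tagger.predict(sentence)                   # adds predicted labels in place

    results = []
    for span in sentence.get_spans("ner"):
        label = span.get_label("ner")
        results.append((span.text, label.value, label.score))
    return results
```

Calling `tag_entities("Several hundred people demonstrated in Bremen on Saturday.")` would return a list of tuples for the entities the model detects. For more than a handful of texts it would be more efficient to load the tagger once and reuse it, rather than loading it on every call as this sketch does.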

Prerequisites
Basic knowledge of Python programming and of working with Pandas data frames; optional: familiarity with the Flair NLP framework

Preparation
Register a Google account (the exercises will be done in Google Colab)

Data set
Protest events in local news texts (will be provided)

Events

Text Classification and Information Extraction with Neural Networks for Computational Social Science
