Text2Story

Extracting journalistic narratives from text and representing them in a narrative modeling language.


Picture: Morning News by Francis Luis Mora

About

Nowadays, journalistic content is distributed in multiple formats, mostly through the web and dedicated internet-based applications running on smartphones and tablets. Readers (or, more accurately, users or information consumers) rely heavily on images, videos, slideshows, charts and infographics, yet textual content is still the main representation for information. Any journalistic subject (e.g. Trump and Russia) is described in one or more texts produced by journalists and possibly commented on by readers. Many of those subjects are followed for days, weeks or months. To grasp a possibly vast and somewhat complex set of interconnected news articles, readers would greatly benefit from tools that summarize those articles by showing the main actors, their interplay and their trajectories in time and space, their motivations, the main events, the causal relations between events, and the outcomes. In other words, tools that extract narrative elements and re-represent them in formats that convey the essential story but are consumed more efficiently by users.

This vibrant research line poses many challenging problems in information extraction and the automatic production of media content. In this project we want to extract narratives/stories from news articles, or collections of related news articles (unstructured data) about the same (or a related) subject, represent those narratives in intermediate data structures (structured data), and make these available to subsequent media production processes (semi-automatic generation of slide shows, infographics and other visualizations, video sequences, games, etc.). In summary, the aim of the Text2Story project is to develop a conceptual framework and an operational pipeline for the extraction of narratives from textual sources. The project focuses on the automatic processing of journalistic text in written Portuguese.
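To make the idea of an intermediate, structured representation more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class names (`Actor`, `Event`, `Narrative`), their fields, and the sample data are all hypothetical and invented for this example. They are not the project's actual narrative modeling language or pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Actor:
    """A participant in the story (hypothetical structure)."""
    name: str


@dataclass
class Event:
    """Something that happened, anchored in time and space."""
    description: str
    when: date
    where: str
    actors: list[Actor] = field(default_factory=list)


@dataclass
class Narrative:
    """Structured summary of a set of related news articles."""
    subject: str
    events: list[Event] = field(default_factory=list)

    def timeline(self) -> list[Event]:
        """Return the events in chronological order."""
        return sorted(self.events, key=lambda e: e.when)

    def main_actors(self) -> list[str]:
        """Return actor names, most frequently involved first."""
        counts: dict[str, int] = {}
        for event in self.events:
            for actor in event.actors:
                counts[actor.name] = counts.get(actor.name, 0) + 1
        # Sort by descending count, then alphabetically for stable output.
        return sorted(counts, key=lambda n: (-counts[n], n))


# Invented sample data, for illustration only.
union = Actor("Union")
ministry = Actor("Ministry")
story = Narrative(
    subject="Example strike coverage (invented)",
    events=[
        Event("Negotiations begin", date(2021, 3, 5), "Porto", [union, ministry]),
        Event("Strike announced", date(2021, 3, 1), "Lisbon", [union]),
    ],
)

for event in story.timeline():
    print(event.when.isoformat(), event.where, event.description)
print(story.main_actors())
```

A structure along these lines could then be handed to a downstream generator (timeline, slide show, infographic) without that component needing to read the original articles.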

Team

Alípio Jorge

Principal Investigator

U. Porto and INESC TEC

His aim is to make the computer capture the essentials of a narrative, represent it, and show it as a timeline, a slide show, a video or a game. The plan is to apply Machine Learning and NLP.

Alexandre Ribeiro

MSc Researcher

U. Porto and INESC TEC

António Leal

PhD Researcher

U. Porto and CLUP

António is mainly interested in the formal representation of meaning in Portuguese news, namely how tense, aspect and nominal reference can be encoded.

Brenda Santana

PhD Student

UFRGS and INESC TEC

Brenda is studying NLP and is also our webmaster. In the project she is focusing on extracting the relation between time and events.

Daniel Oliveira

PhD Student

U. Porto and INESC TEC

Evelin Amorim

PhD Researcher

INESC TEC

My main interests are in NLP-related research, specifically information extraction from text, automatic essay scoring, and the subtasks related to these two fields.

Fátima Oliveira

PhD Researcher

U. Porto and CLUP

Fátima Silva

PhD Researcher

U. Porto and CLUP

Hugo Sousa

PhD Student

U. Porto and INESC TEC

Looking for a way to make models temporally aware.

Inês Cantante

MSc Researcher

U. Porto and CLUP

Joana Valente

MSc Student

U. Porto

João Nogueira

MSc Student

U. Porto

His aim is to boost NLP models for Portuguese using transfer learning and data augmentation.

Mariana Costa

MSc Student

U. Porto

João Paulo Cordeiro

PhD Researcher

UBI and INESC TEC

Pavel Brazdil

PhD Researcher

U. Porto and INESC TEC

Pedro Botelho

MSc Student

U. Porto and SAPO24

Looking at the inner workings of news stories and how people interact with them, his aim is to find innovative ways to present journalism.

Pedro Mota

BSc Student

U. Porto and INESC TEC

Purificação Silvano

PhD Researcher

U. Porto and CLUP

Her aim is to study temporal relations in news articles and contribute to the representation of the timeline comprised in the narratives.

Ricardo Campos

Co-Investigator

IPT and INESC TEC

His aim is to get new insights from news articles. In particular, he is interested in applying NLP as a means to understand how events relate to the temporal dimension.

Sérgio Nunes

PhD Researcher

U. Porto and INESC TEC

Sofia Oliveira

MSc Student

U. Porto and INESC TEC

Partners

Events & Special Issues

Talks:

Outcomes

Datasets

Publications

MSc Dissertations

BSc Dissertations

  • Pedro Mota (2021). Pipeline for narrative extraction. FCUP (supervisor: Alípio Jorge)

Acknowledgements

This project is financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185).