The discourse annotation of argument-predicator connectives covers three main annotation tasks:
Task 1: Identification of explicit Arabic discourse connectives.
Task 2: Disambiguating discourse connectives by annotating the discourse relations they convey.
Task 3: Annotating the two arguments, the abstract objects related by a particular connective.
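As an illustration only, the three tasks can be read as filling in three parts of one record per connective occurrence. The sketch below is not the LADTB file format or the READ tool's data model; the class names, field names, the English placeholder tokens and the relation label "CAUSE" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Token span: start is inclusive, end is exclusive (hypothetical convention)."""
    start: int
    end: int


@dataclass(frozen=True)
class ConnectiveAnnotation:
    """One annotated connective occurrence (hypothetical record layout)."""
    connective: Span            # Task 1: where the potential connective is
    is_discourse: bool          # Task 1: does it function as a discourse connective?
    relation: Optional[str]     # Task 2: relation label, None if not a discourse use
    arg1: Optional[Span]        # Task 3: first argument
    arg2: Optional[Span]        # Task 3: second argument


def span_text(tokens: List[str], span: Span) -> str:
    """Return the tokens covered by a span, joined by single spaces."""
    return " ".join(tokens[span.start:span.end])


# Placeholder English tokens stand in for segmented treebank tokens.
tokens = ["he", "left", "because", "it", "rained"]
example = ConnectiveAnnotation(
    connective=Span(2, 3),
    is_discourse=True,
    relation="CAUSE",  # placeholder label, not taken from the LADTB hierarchy
    arg1=Span(0, 2),
    arg2=Span(3, 5),
)
```

Keeping the connective, the relation and the two arguments in one record makes it straightforward to compare two annotators' decisions task by task.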
In this first discourse annotation effort for Arabic, the LADTB v.1, we concentrate on explicit discourse relations that are signalled by one of the discourse connectives in our inventory for Arabic. The annotation of other cohesive devices, such as implicit relations, attribution, entity relations and anaphora, is left to new studies or future versions of the LADTB.
We created the LADTB, the first Arabic discourse corpus, following the annotation principles of the PDTB. Our annotation scheme for explicit discourse relations presents these principles and the Arabic-specific adaptations we made. We used the READ tool to annotate all potential discourse connectives in the Arabic Treebank Part 1, using our collection of Arabic discourse connectives.
The human annotation was carried out by two well-trained native speakers of Arabic with a good linguistic background, on 537 news files from the Penn Arabic Treebank Part 1, comprising 126,394 tokens after the treebank clitic segmentation. The LADTB gold standard includes 6,328 annotations of 80 explicit connective types and 55 distinct discourse relations (17 single relations). The hierarchical structure of the Arabic discourse relations is presented in Fig 1.
Fig 1: Arabic discourse relations
We reported inter-annotator agreement studies for the three annotation tasks in our publications. The annotations were then filtered through semi-automatic and manual post-processing of all disagreements to arrive at a gold standard. The common disagreement cases for all annotation tasks were reported in (ppt) for future development.
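The text here does not say which agreement measure was used. As a hedged illustration, Cohen's kappa is one common choice for two annotators assigning categorical labels, and it can be sketched as follows. The function name and the labels "D" (discourse use) and "N" (non-discourse use) are assumptions for the example, not LADTB data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four connective candidates.
annotator_1 = ["D", "D", "N", "N"]
annotator_2 = ["D", "N", "N", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Items on which the two label sequences differ are the natural candidates for the manual post-processing step described above.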
The complete distributions of discourse connectives and relations in the LADTB gold standard are presented on the Tools and Downloads page.