PRET: Prerequisite-Enriched Terminology


PRET is a tool to support the creation of Gold Standard Concept Maps.


The tool is designed as an easy-to-use interface for domain experts who want to identify, tag and study prerequisite relations between pairs of concepts in textual educational resources such as textbooks.

PRET provides a user interface for manual annotation of relations: the expert reads the textual resource and annotates relations where they occur in the text. Each annotator's output is a Concept Map of the textbook, that is, a prerequisite-enriched terminology of the given educational resource. When the expert uploads the text, it is pre-processed and linguistically annotated according to the CoNLL-U format.
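As an illustration of what a CoNLL-U annotated text looks like, here is a minimal Python sketch that reads one sentence. The ten tab-separated columns are those defined by the CoNLL-U format; the sample sentence, its annotation values and the function name parse_conllu_sentence are invented for this example and are not taken from PRET.

```python
# The ten columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu_sentence(block: str) -> list[dict[str, str]]:
    """Parse one CoNLL-U sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        tokens.append(dict(zip(CONLLU_FIELDS, values)))
    return tokens


# Hypothetical sample sentence (invented for illustration).
sample = "\n".join([
    "# text = Fractions have denominators.",
    "\t".join(["1", "Fractions", "fraction", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "have", "have", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "denominators", "denominator", "NOUN", "_", "Number=Plur", "2", "obj", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
])

tokens = parse_conllu_sentence(sample)
lemmas = [token["lemma"] for token in tokens]
```

Lemmas and part-of-speech tags of this kind are what make it possible to locate single or multi-word terms in the text regardless of their inflected form.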

Combining the Concept Maps of all annotators generates a Gold Standard Concept Map. The Gold Standard Concept Map can be used (i) as an autonomous resource, e.g., by exploring its content, or (ii) to evaluate the output of automatic prerequisite learning methods (some methods to extract prerequisite relations are integrated and available in PRET).
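The text does not specify how the individual maps are combined, so the following Python sketch is only one possible approach. It assumes that a concept map can be represented as a set of directed (prerequisite, target) pairs and that a relation enters the gold standard when at least min_votes annotators marked it. The function name, the representation and the sample concepts are hypothetical.

```python
from collections import Counter

# Assumed representation: a concept map is a set of
# (prerequisite concept, target concept) pairs.
Edge = tuple[str, str]


def combine_concept_maps(maps: list[set[Edge]], min_votes: int = 1) -> set[Edge]:
    """Keep every relation annotated by at least `min_votes` annotators."""
    votes = Counter(edge for concept_map in maps for edge in concept_map)
    return {edge for edge, count in votes.items() if count >= min_votes}


# Invented annotations from two annotators.
annotator_a = {("integer", "fraction"), ("fraction", "denominator")}
annotator_b = {("integer", "fraction"), ("radius", "circle area")}

# min_votes=1 yields the union; min_votes=2 keeps only shared relations.
union_gold = combine_concept_maps([annotator_a, annotator_b], min_votes=1)
agreed_gold = combine_concept_maps([annotator_a, annotator_b], min_votes=2)
```

A low threshold favours coverage, while a high threshold keeps only the relations on which annotators agree; which of the two suits a project depends on its goals.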

Features of the tool:

    • in-text annotation
    • language and domain independent
    • supports the user during the annotation phase
    • provides different views of the concept maps
    • provides agreement metrics between annotators
    • provides annotation check and formal validation
    • provides some  automatic methods to extract prerequisite relations
    • allows matching Gold Standard Concept Maps against the output of automatic methods for prerequisite extraction
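The feature list does not say which agreement metrics PRET computes. As an example of what such a metric measures, the Python sketch below implements Cohen's kappa, a common chance-corrected agreement measure for two annotators. It assumes both annotators judged the same list of candidate concept pairs with 1 (prerequisite relation present) or 0 (absent); the function name and the labels are invented for illustration.

```python
def cohen_kappa(labels_a: list[int], labels_b: list[int]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two annotators labelled independently.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators always used the same single label
    return (observed - expected) / (1 - expected)


# Invented judgements on six candidate concept pairs.
annotator_a = [1, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on four of the six pairs (observed agreement 0.67) while chance agreement is 0.5, so kappa is about 0.33.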

The tool is available on GitHub: https://github.com/Teldh/PRET

The docs folder on GitHub also contains the PRET Quickstart Guide and the Annotation Manual for annotators.

The Annotation Manual contains the instructions and recommendations for performing manual annotation of prerequisite relations in educational materials, according to the PRErequisite Annotation Protocol (PREAP).

PREAP is an annotation protocol specifically designed for annotating prerequisite relations in textual educational resources. PRET was designed according to the principles of PREAP, so the tool makes it easy to carry out annotation projects that build resources following those principles.

For documentation about the PREAP annotation protocol, please contact TELDH.

 

Basic principles and definitions in PREAP and PRET

PREAP is an annotation protocol aimed at building resources manually annotated with prerequisite relations. The protocol is designed to preserve as much as possible the tight link between text and content, supporting manual identification of prerequisite relations while reading educational materials, for example a textbook. Keeping the annotation tightly bound to the text makes it independent of any external resource and, at the same time, lets it reflect the teaching approach of the author of the resource being annotated. As a consequence, the prerequisite annotation approach defined by PREAP allows the creation of resources that can be used to investigate how concepts are organised within the content of a learning material and whether prerequisites appear within recurrent linguistic patterns.

In what follows we provide some definitions of the basic principles and concepts used in the PRET tool and the PREAP annotation protocol. Please refer to the documentation for more comprehensive definitions.

    • Prerequisite Relations: they are pedagogical relations that hold between educational concepts described in educational materials. These relations express what should be understood first in order to avoid knowledge gaps when learning a new concept.
    • Concepts: they represent the building blocks of learning in a subject domain. With regard to granularity, concepts can be very general (e.g., algebra, geometry, mathematics) or very specific (e.g., radius, integer multiplication, fraction denominator). Either way, they are represented in texts as lexical entities consisting of a single or multi-word term.
    • Corpus: a corpus is a document, the textual resource, such as a textbook or any educational material in written form. It can be enhanced with labels to become an annotated text, i.e. a text where certain information related to its content is made explicit through annotation.
    • Annotation: in general terms, annotation is the process of adding comments, notes, explanations, or other types of external marks that can be attached to a document or to part of it. In PREAP, annotation refers to the manual process, carried out by humans, of adding labels to a textual corpus in order to indicate the presence of prerequisite relations between two concepts mentioned in the text.
    • Annotation Protocol: guidelines and specifications that indicate how to obtain corpora enriched with explicit information regarding a certain phenomenon. They are designed to be reproducible on any unannotated text at any time.
    • Annotation Project: the set of tasks aimed at building an annotated text that includes explicit annotations about the phenomenon being studied (here, prerequisite relations in educational texts). The project includes every element involved in the annotation, such as the documentation concerning annotation specifications, the corpus to be annotated, the people taking part in the annotation process and the results obtained.
    • Project Manager: the person or team leading the annotation project. The manager is in charge of taking decisions concerning the goals and settings of the annotation project.
    • Annotation Guidelines: instructions and recommendations that indicate how to perform annotation.
    • Annotators: the people who annotate the textual corpus according to the guidelines and project principles.
    • Gold Standard Dataset: the output of the annotation project. A Gold Standard Dataset (or Gold-PR dataset, when the Gold Standard is annotated with prerequisite relations (PR), or Gold Standard Concept Map, as intended here) is a ground-truth annotated dataset based on a single trusted annotation or obtained by combining multiple manual annotations into one. Gold-PR datasets are generally used to:
      • obtain informative analyses of the annotated phenomenon;
      • train and test the performance of machine learning systems;
      • compare the gold labels against those produced by automatic systems for PR extraction, in order to test their accuracy.
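The last use can be sketched in Python as follows. The sketch assumes that both the gold standard and the system output are sets of directed (prerequisite, target) pairs and scores the output with precision, recall and F1. These are standard evaluation measures, but whether PRET reports exactly these is an assumption, and the function name and sample relations are hypothetical.

```python
Edge = tuple[str, str]  # assumed form: (prerequisite concept, target concept)


def evaluate_against_gold(gold: set[Edge], predicted: set[Edge]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted relations against the gold set."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented gold standard and system output.
gold = {("integer", "fraction"), ("fraction", "denominator"), ("radius", "circle area")}
predicted = {
    ("integer", "fraction"),
    ("fraction", "denominator"),
    ("circle area", "radius"),  # reversed direction: counted as an error
    ("algebra", "geometry"),
}

precision, recall, f1 = evaluate_against_gold(gold, predicted)
```

Because relations are directed, a pair predicted in the wrong direction does not match its gold counterpart. In this example two of the four predictions are correct (precision 0.5) and two of the three gold relations are found (recall about 0.67).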