Gemini

Output of Gemini's visualization module. In this example, we want to evaluate how accurately Unitex extracts place names from a text. Yellow squares mark the annotations generated automatically by Unitex, and blue squares mark the "correct answers", annotated manually by my colleague. The result also shows, in two different text colors, whether the attributes of the paired annotations are the same or not.

This was a research internship project that I completed in the first year of my master's degree, supervised by Tita Kyriacopoulou, Claude Martineau and Philippe Gambette. During this 3-month internship, I continued the development of an evaluation module for Unitex/GramLab.

Unitex/GramLab is a toolkit for natural language processing. To evaluate it, we usually compare its output with manual annotations, using precision, recall and F-score. Gemini was developed to facilitate this comparison. It is written in Java, takes two annotated text files as input (in BRAT or XML format), and calculates various metrics according to the chosen matching mode and measure mode. For more information about the metrics, see the description on the GitHub page.
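
As a rough illustration of the three metrics mentioned above, the sketch below computes precision, recall and F-score from plain counts. The class and method names are hypothetical and are not taken from Gemini's source code; what counts as a "match" in Gemini depends on the matching mode and measure mode, which this sketch does not model.

```java
/** Hypothetical helper, not part of Gemini: standard precision / recall / F-score. */
final class MatchMetrics {
    private MatchMetrics() {}

    /** Share of automatic (system) annotations that were matched to a manual one. */
    static double precision(int matched, int systemTotal) {
        return systemTotal == 0 ? 0.0 : (double) matched / systemTotal;
    }

    /** Share of manual (reference) annotations that were matched to an automatic one. */
    static double recall(int matched, int referenceTotal) {
        return referenceTotal == 0 ? 0.0 : (double) matched / referenceTotal;
    }

    /** Balanced F-score: harmonic mean of precision and recall. */
    static double fScore(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }
}
```

For example, if Unitex produced 10 annotations, the manual file contains 16, and 8 of them were matched, then `precision(8, 10)` is 0.8 and `recall(8, 16)` is 0.5. Returning 0 when a denominator is empty is a choice made for this sketch only.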

  • Gemini allows you to explore the comparison result visually in a web page, where the matched annotations are shown in two different colors.

    Gemini also allows you to export a spreadsheet file containing all pairs of annotations.

  • Gemini procedure

    The diagram below shows how Gemini works. It takes two BRAT/XML files as input and extracts an annotation table from each file. A bipartite graph is built from these annotations, and matching algorithms are applied to it to align them. Finally, the matched relations are counted and the metrics are calculated from this graph.
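
The procedure above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions, not Gemini's actual implementation: annotations are reduced to half-open character spans `[start, end)`, an edge is drawn whenever two spans overlap, and the alignment step uses a standard augmenting-path maximum bipartite matching. The text does not say which matching algorithms Gemini uses, and all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: align system and reference spans through a bipartite graph. */
final class SpanAlignment {
    private SpanAlignment() {}

    /** True when two half-open spans {start, end} share at least one character. */
    static boolean overlaps(int[] a, int[] b) {
        return a[0] < b[1] && b[0] < a[1];
    }

    /** Bipartite graph: edges.get(i) lists the reference indices overlapping system span i. */
    static List<List<Integer>> buildGraph(int[][] system, int[][] reference) {
        List<List<Integer>> edges = new ArrayList<>();
        for (int[] s : system) {
            List<Integer> neighbours = new ArrayList<>();
            for (int j = 0; j < reference.length; j++) {
                if (overlaps(s, reference[j])) neighbours.add(j);
            }
            edges.add(neighbours);
        }
        return edges;
    }

    /** One-to-one maximum matching; result[j] is the system index paired with reference j, or -1. */
    static int[] match(List<List<Integer>> edges, int referenceCount) {
        int[] matchedSystem = new int[referenceCount];
        Arrays.fill(matchedSystem, -1);
        for (int i = 0; i < edges.size(); i++) {
            tryAugment(i, edges, matchedSystem, new boolean[referenceCount]);
        }
        return matchedSystem;
    }

    /** Tries to pair system span i, re-pairing earlier spans along an augmenting path if needed. */
    private static boolean tryAugment(int i, List<List<Integer>> edges,
                                      int[] matchedSystem, boolean[] visited) {
        for (int j : edges.get(i)) {
            if (visited[j]) continue;
            visited[j] = true;
            if (matchedSystem[j] == -1
                    || tryAugment(matchedSystem[j], edges, matchedSystem, visited)) {
                matchedSystem[j] = i;
                return true;
            }
        }
        return false;
    }

    /** Number of reference annotations that received a partner. */
    static int countMatched(int[] matchedSystem) {
        int n = 0;
        for (int m : matchedSystem) if (m != -1) n++;
        return n;
    }
}
```

With system spans `{0,5}`, `{10,15}`, `{20,25}` and reference spans `{3,12}`, `{11,14}`, two pairs are aligned, giving a precision of 2/3 and a recall of 2/2 under this overlap-based notion of a match. A real comparison would also take annotation types and attributes into account.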