Gemini GitHub Link

This is a research internship project that I completed during the first year of my master's degree, supervised by Tita Kyriacopoulou, Claude Martineau and Philippe Gambette. During this 3-month internship, I continued the development of an evaluation module for Unitex/GramLab.
Unitex/GramLab is a toolkit for natural language processing. To evaluate its output, we usually compare the result with manual annotation, using precision, recall and F-score. Gemini was developed to facilitate this comparison. It is written in Java. It takes two annotated text files as input (in BRAT or XML format) and calculates various metrics according to the chosen matching mode and measure mode. For more information about the metrics, see the description on the GitHub page.
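As a reminder of what these three metrics measure, here is a minimal Java sketch. It is not taken from Gemini's source code: the class name `MetricsSketch`, the method names and the example counts are all hypothetical, and it only shows the standard definitions, with the balanced F-score (F1) assumed.

```java
// Hypothetical helper, not part of Gemini: standard precision / recall / F1.
final class MetricsSketch {

    /** Precision: share of the tool's annotations that match a manual annotation. */
    public static double precision(int matched, int systemTotal) {
        return systemTotal == 0 ? 0.0 : (double) matched / systemTotal;
    }

    /** Recall: share of the manual annotations that the tool found. */
    public static double recall(int matched, int referenceTotal) {
        return referenceTotal == 0 ? 0.0 : (double) matched / referenceTotal;
    }

    /** Balanced F-score (F1): harmonic mean of precision and recall. */
    public static double fScore(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up counts: 8 matched annotations, 10 produced by the tool, 16 in the manual file.
        double p = precision(8, 10);
        double r = recall(8, 16);
        System.out.println("P=" + p + " R=" + r + " F1=" + fScore(p, r));
    }
}
```

The zero checks simply avoid a division by zero when one of the files contains no annotation. How Gemini itself handles that case is not described here.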
Gemini allows you to explore the comparison result visually in a web page where the matched annotations are shown in two different colors:
Gemini also allows you to export a spreadsheet file containing all pairs of annotations:
Gemini procedure
The diagram below shows how Gemini works. It takes two BRAT/XML files as input and extracts an annotation table from each file. A bipartite graph is then built from these annotations, and matching algorithms are applied to it to align them. Finally, the matched relations are counted and the metrics are calculated from this graph.
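The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Gemini's actual implementation: the class `AlignmentSketch`, the `Annotation` fields, the edge criterion (same label and overlapping character offsets) and the greedy one-to-one matching are all hypothetical stand-ins for the matching modes that Gemini really offers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the pipeline: annotation tables -> bipartite graph -> matching -> count.
final class AlignmentSketch {

    /** One row of an annotation table: character offsets plus a label. */
    static final class Annotation {
        final int start;
        final int end;
        final String label;

        Annotation(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    /** Assumed edge criterion: same label and overlapping spans. */
    public static boolean compatible(Annotation a, Annotation b) {
        return a.label.equals(b.label) && a.start < b.end && b.start < a.end;
    }

    /** Bipartite graph as adjacency lists: for each reference index, the compatible hypothesis indices. */
    public static List<List<Integer>> buildGraph(List<Annotation> ref, List<Annotation> hyp) {
        List<List<Integer>> graph = new ArrayList<>();
        for (Annotation r : ref) {
            List<Integer> neighbours = new ArrayList<>();
            for (int j = 0; j < hyp.size(); j++) {
                if (compatible(r, hyp.get(j))) {
                    neighbours.add(j);
                }
            }
            graph.add(neighbours);
        }
        return graph;
    }

    /** Greedy one-to-one matching: each reference annotation takes its first free neighbour. */
    public static int countMatches(List<List<Integer>> graph, int hypSize) {
        boolean[] used = new boolean[hypSize];
        int matched = 0;
        for (List<Integer> neighbours : graph) {
            for (int j : neighbours) {
                if (!used[j]) {
                    used[j] = true;
                    matched++;
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up annotation tables for two files.
        List<Annotation> ref = Arrays.asList(
                new Annotation(0, 5, "PER"), new Annotation(10, 15, "LOC"), new Annotation(20, 25, "ORG"));
        List<Annotation> hyp = Arrays.asList(
                new Annotation(0, 5, "PER"), new Annotation(12, 18, "LOC"), new Annotation(30, 35, "ORG"));

        int matched = countMatches(buildGraph(ref, hyp), hyp.size());
        System.out.println("matched=" + matched
                + " precision=" + (double) matched / hyp.size()
                + " recall=" + (double) matched / ref.size());
    }
}
```

A greedy pass is the simplest choice but does not always find the largest possible matching. A maximum bipartite matching algorithm could replace `countMatches` without changing the rest of the sketch.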