ABOUT
MultiPIT is the largest Twitter-based paraphrase corpus to date. It contains four parts: MultiPITcrowd, MultiPITexpert, MultiPITAuto, and MultiPITNMR. MultiPITcrowd is a collection of crowdsourced annotations made under a loose definition of paraphrase. MultiPITexpert is a collection of expert annotations made under a strict definition of paraphrase. MultiPITAuto is a collection of automatically identified paraphrase pairs from recent Twitter data. MultiPITNMR is the first multi-reference test set for paraphrase generation.
TALK VIDEO
PAPER
Improving Large-scale Paraphrase Acquisition and Generation, EMNLP 2022
Authors
from Georgia Institute of Technology
DATA (available now)
100K+ crowdsourcing annotations
5K+ expert annotations
500K+ automatic annotations
200 × 8 expert annotations
CODE (coming soon...)
Acknowledgement: This material is based in part on research sponsored by IARPA via the BETTER program (contract 19051600004).
Rank | Model | Date | Precision | Recall | Accuracy | F1 |
---|---|---|---|---|---|---|
Rank | Model | Date | Precision | Recall | Accuracy | F1 |
---|---|---|---|---|---|---|
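
The Precision, Recall, Accuracy, and F1 columns in the two tables above are standard binary classification metrics. As a minimal sketch, assuming each sentence pair carries a gold label and a predicted label where 1 means "paraphrase" and 0 means "not a paraphrase", they could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the MultiPIT code release.

```python
def binary_classification_metrics(gold, predicted):
    """Compute precision, recall, accuracy, and F1 for binary labels.

    Assumption: labels are integers, 1 = paraphrase, 0 = non-paraphrase.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy example with made-up labels for eight sentence pairs.
gold_labels = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 0, 1, 0, 0, 0]
metrics = binary_classification_metrics(gold_labels, predicted_labels)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy example there are 2 true positives, 1 false positive, 2 false negatives, and 3 true negatives, so precision is 2/3, recall is 1/2, accuracy is 5/8, and F1 is 4/7.
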
Rank | Model | Date | BERT-iBLEU | Self-BLEU | BERT-Score | BLEU |
---|---|---|---|---|---|---|
Rank | Metric | Referenceless | Fluency Correlation | Semantic Similarity Correlation | Diversity Correlation | Overall Correlation |
---|---|---|---|---|---|---|