Challenges in Evaluating Natural Language Generation Systems
The GT NLP Seminar is an interactive talk series held bi-weekly, where students, faculty, and staff interested in Natural Language Processing at Georgia Tech meet, have lunch, and hear talks on recent NLP research across a wide range of topics.

Abstract: Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which is also usually uninformative, especially when conducted without careful task design). In this talk, I focus on two specific applications: (1) unsupervised sentence-level style transfer and (2) long-form question answering. I will go over our recent work on building models for these tasks and then describe the ensuing struggles to properly compare them to baselines. In both cases, we identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators. I'll conclude by briefly discussing our work on machine-in-the-loop text generation systems, in which both humans and machines participate in the generation process and reliable human evaluation becomes much more feasible.

Speaker: Mohit Iyyer, assistant professor at UMass Amherst

Seminar schedule: https://sites.google.com/view/nlpseminar/home

More info on the Machine Learning Center at Georgia Tech: http://ml.gatech.edu/