Hi! I am Pengcheng Yin, a graduate student at the Language Technologies Institute of Carnegie Mellon University.
Update: I am looking for full-time positions at industrial research labs. Feel free to connect!
Research Papers
- Learning Structural Edits via Incremental Tree Transformations.
  Ziyu Yao, Frank Xu, Pengcheng Yin, Huan Sun, Graham Neubig. International Conference on Learning Representations (ICLR), 2021.
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data.
  Pengcheng Yin, Graham Neubig, Scott Wen-tau Yih and Sebastian Riedel. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation.
  Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2020.
- Merging Weak and Active Supervision for Semantic Parsing.
  Ansong Ni, Pengcheng Yin, Graham Neubig. The 34th AAAI Conference on Artificial Intelligence (AAAI), 2020.
- PERQ: Predicting, Explaining, and Rectifying Failed Questions in KB-QA Systems.
  Zhiyong Wu, Ben Kao, Tien-Hsuan Wu, Pengcheng Yin, Qun Liu. The 13th International Conference on Web Search and Data Mining (WSDM), 2020.
- DIRE: A Neural Approach to Decompiled Identifier Renaming.
  Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, Bogdan Vasilescu. The 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019.
- Reranking for Neural Semantic Parsing.
  Pengcheng Yin, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
- Improving Open Information Extraction via Iterative Rank-Aware Learning.
  Zhengbao Jiang, Pengcheng Yin, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
- Learning to Represent Edits.
  Pengcheng Yin, Graham Neubig, Miltos Allamanis, Marc Brockschmidt, Alex Gaunt. International Conference on Learning Representations (ICLR), 2019. ArXiv e-prints 1810.13337.
- TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation.
  Pengcheng Yin, Graham Neubig. Conference on Empirical Methods in Natural Language Processing (EMNLP), Demo Track, 2018.
  [Paper] | [Online Demo] | [Code]
- A Tree-based Decoder for Neural Machine Translation.
  Xinyi Wang, Hieu Pham, Pengcheng Yin, Graham Neubig. Conference on Empirical Methods in Natural Language Processing (EMNLP), Short Paper, 2018.
- Retrieval-Based Neural Code Generation.
  Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig. Conference on Empirical Methods in Natural Language Processing (EMNLP), Short Paper, 2018.
- Towards Practical Open Knowledge Base Canonicalization.
  Tien-Hsuan Wu, Zhiyong Wu, Ben Kao, Pengcheng Yin. ACM International Conference on Information and Knowledge Management (CIKM), 2018.
- StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing.
  Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
  [Paper] | [Slides] | [Code]
- Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow.
  Pengcheng Yin*, Bowen Deng*, Edgar Chen, Bogdan Vasilescu, Graham Neubig. International Conference on Mining Software Repositories (MSR), 2018.
  [The CoNaLa Code Generation Challenge] | [Paper]
- Learning to Mine Parallel Natural Language/Source Code Corpora from Stack Overflow.
  Pengcheng Yin*, Bowen Deng*, Edgar Chen, Bogdan Vasilescu, Graham Neubig. International Conference on Software Engineering (ICSE), Poster Track, 2018.
  [Paper]
- A Syntactic Neural Model for General-Purpose Code Generation.
  Pengcheng Yin, Graham Neubig. Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
  [Paper] | [Code]
- DyNet: The Dynamic Neural Network Toolkit.
  Graham Neubig et al., including Pengcheng Yin. ArXiv e-prints 1701.03980, 2017.
- Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML.
  Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Edward Hovy. ArXiv e-prints 1705.07136, 2017.
- Neural Enquirer: Learning to Query Tables in Natural Language.
  Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao. International Joint Conference on Artificial Intelligence (IJCAI), 2016.
  Also appeared at the 4th International Conference on Learning Representations (ICLR), Workshop Track, 2016.
  [Paper] | [ArXiv Full Version] | [ICLR 2016 Workshop Poster]
- New Word Detection and Tagging on Chinese Twitter Stream.
  Miya Liang, Pengcheng Yin, Siu-Ming Yiu. Transactions on Large-Scale Data- and Knowledge-Centered Systems, Vol. XXXII, 2017.
- Answering Questions with Complex Semantic Constraints on Open Knowledge Bases.
  Pengcheng Yin, Nan Duan, Ben Kao, Junwei Bao, Ming Zhou. International Conference on Information and Knowledge Management (CIKM), 2015.
  [Paper] | [Project Page]
- New Word Detection and Tagging on Chinese Twitter Stream.
  Miya Liang*, Pengcheng Yin*, Siu-Ming Yiu. International Conference on Big Data Analytics and Knowledge Discovery (DaWaK), 2015.
Industry Experience
- Research Intern, Microsoft Semantic Machines
- Part-time Research Collaborator, Facebook AI Research
- Research Intern, Facebook AI Research London
- Research Intern, Microsoft Research Cambridge, UK
- Research Intern, Microsoft Research
- Research Intern, Noah's Ark Lab, Huawei
- Research Intern, Microsoft Research Asia
Professional Services
- Program Committee Member: ICLR 2019 Workshop on Deep Generative Models for Highly Structured Data, EMNLP 2020 Workshop on Interactive and Executable Semantic Parsing.
- Reviewer: ACL (outstanding reviewer @ ACL 2020), EMNLP, NAACL, NeurIPS, ICML (top 33% reviewer @ ICML '20), ICLR, etc.
- External Reviewer: CIKM '15, ICDM '15, KDD '16, KDD '17, KDD '18
Teaching and Coding
- TranX: a general-purpose syntax-driven neural semantic parser
  - Strong results on six semantic parsing benchmarks
- pytorch_basic_nmt: a basic implementation of attentional neural seq2seq models (a minimal attention sketch follows below)
  - Used for instructional purposes in Stanford CS224N Natural Language Processing with Deep Learning and CMU 11-731 Machine Translation and Sequence-to-Sequence Models.
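For readers unfamiliar with the term, the sketch below illustrates what an "attentional" seq2seq decoder step looks like, using Luong-style dot-product attention in PyTorch. The class name, tensor shapes, and masking convention are assumptions made for illustration only; this is not the actual pytorch_basic_nmt implementation.

```python
# Minimal sketch of Luong-style dot-product attention for one decoder step.
# Names and shapes are illustrative assumptions, not the pytorch_basic_nmt code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DotProductAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # projects [context; decoder state] into the "attentional vector"
        self.combine = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, dec_state, enc_states, enc_pad_mask):
        # dec_state:    (batch, hidden)          current decoder hidden state
        # enc_states:   (batch, src_len, hidden) encoder outputs
        # enc_pad_mask: (batch, src_len), bool   True at padded source positions
        scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
        scores = scores.masked_fill(enc_pad_mask, float('-inf'))
        alpha = F.softmax(scores, dim=-1)                                  # attention weights
        context = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)     # (batch, hidden)
        att_vec = torch.tanh(self.combine(torch.cat([context, dec_state], dim=-1)))
        return att_vec, alpha
```

In models of this style, the attentional vector is typically fed to the softmax over the target vocabulary, and with input feeding it is also passed to the next decoder step.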
Talks
- Towards Building Generalized Neural Semantic Parsers, Pengcheng Yin, Talk @ Microsoft Semantic Machines, Microsoft Research Redmond.
- Neural Network Models for Generating Source Code from Natural Language, Graham Neubig and Pengcheng Yin, Software Research Seminar @ Institute for Software Research, CMU.
- Towards Open-domain Generation of Programs from Natural Language, Pengcheng Yin, Machine Learning Lunch @ Machine Learning Department, CMU.
- Towards Open-domain Generation of Programs from Natural Language, Pengcheng Yin, Talk @ Microsoft Research Asia.
- Towards Open-domain Generation of Programs from Natural Language, and More, Pengcheng Yin, Talk @ DeeplyCurious.ai.
Awards
- IBM Ph.D. Fellowship, class of 2019
- Postgraduate Scholarship, The University of Hong Kong, 2016-2018
- Meritorious Winner of the Mathematical Contest in Modeling, 2013