Yu ZHANG 张彧

Contact

Email: yuz9 [AT] illinois [DOT] edu
Office: Room 2113, Siebel Center for Computer Science, 201 N. Goodwin Ave, Urbana, IL 61801

About Me

I will join the Department of Computer Science and Engineering at Texas A&M University as a tenure-track Assistant Professor in January 2025.

I am looking for self-motivated Ph.D. students and interns! Please fill out this form if you are interested in working with me. After completing the form, you are also welcome to reach out via email. I read all submitted forms and emails, but I apologize that I cannot respond to each one.

From August 2024 to December 2024, I will be visiting the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Prof. Sheng Wang.

I am a final-year Ph.D. candidate at the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. My research has been supported by the Dissertation Completion Fellowship (awarded by the UIUC Graduate College) and the Yunni & Maxine Pao Memorial Fellowship (awarded by the UIUC College of Engineering).

Prior to UIUC, I received my B.Sc. degree in Computer Science from Peking University, advised by Prof. Yan Zhang.

In the summers of 2020, 2021, and 2022, I interned at Microsoft Research Redmond, each time in a different group, working with researchers including Dr. Iris Shen, Dr. Hao Cheng, Dr. Xiaodong Liu, and Dr. Yuxiao Dong.

In the summer of 2016, I visited the School of Computer Science at Carnegie Mellon University, working with Prof. Kathleen M. Carley.

For further information, please see my CV.

Research Interests

Data Mining and NLP for Science (Biomedicine, Chemistry, and Science of Science):
[arXiv 2023, EMNLP 2023, KDD 2023b, WWW 2023, Bioinformatics 2019]

Large Language Models + Graphs:
[KDD 2023a, ICLR 2023]

Structure/Knowledge-Enhanced Text Mining:
[WWW 2022, WSDM 2022, WWW 2021, WSDM 2021, SIGIR 2020]

What’s New [What’s Not New…]

2024-01 to 2024-06 Invited to be a PC member of KDD 2024 and CIKM 2024.

2024-03-03 Attended WSDM 2024 in Mérida, Mexico (in person) to present our tutorial and give a keynote at the Machine Learning on Graphs (MLoG) Workshop [slides].

2023-07 to 2023-12 Invited to be a PC member of WSDM 2024, ICLR 2024, WWW 2024, SDM 2024, and ICML 2024.

2023-12-09 Our paper on Seed-Guided Entity Typing was accepted by AAAI 2024! The acceptance rate is 23.7% (2342/9862).

2023-12-06 Attended EMNLP 2023 in Singapore (in person) to present our paper.

2023-10-28 Our tutorial proposal on Joint Text and Graph Modeling was accepted by the WSDM 2024 tutorial track!

2023-10-07 Two papers on Scientific Language Model Pre-training and Weakly Supervised Text Classification were accepted by EMNLP 2023 (1 main conference + 1 findings)!

2023-08 Attended KDD 2023 in Long Beach, CA (in person) to present our research paper and tutorial.

2023-08-09 Honored to be chosen for the Best Reviewer Award by KDD 2023!

Selected Publications [Full List]

(“*” indicates equal contribution. Unless otherwise specified, each paper was accepted as a research-track long/regular paper.)

Preprint

“Why Should I Review This Paper?” Unifying Semantic, Topic, and Citation Factors for Paper-Reviewer Matching
Yu Zhang*, Yanzhen Shen*, Xiusi Chen, Bowen Jin, and Jiawei Han.
arXiv:2310.14483

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, and Jiawei Han.
arXiv:2404.07103

Ontology Enrichment for Effective Fine-grained Entity Typing
Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, and Jiawei Han.
arXiv:2310.07795

Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder
Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, and Jiawei Han.
arXiv:2310.06684

2024

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, and Jiawei Han.
AAAI 2024. Vancouver, Canada.

2023

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [PDF] [arXiv] [project page] [code] [model] [dataset] [PMC-Patients leaderboard]
Yu Zhang*, Hao Cheng*, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, and Jianfeng Gao.
EMNLP 2023 Findings. Singapore, Singapore.

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers [PDF] [arXiv] [code]
Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study [PDF] [arXiv] [code] [dataset]
Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, and Jiawei Han.
WWW 2023. Austin, TX, USA.

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Martin Michalski*, Yucheng Jiang*, Yu Meng*, and Jiawei Han.
WSDM 2023. Singapore, Singapore.

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training [PDF] [arXiv] [code]
Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, and Jiawei Han.
EMNLP 2023. Singapore, Singapore.

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Qi Zhu, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

Patton: Language Model Pretraining on Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, and Jiawei Han.
ACL 2023. Toronto, Canada.

Chain-of-Skills: A Configurable Model for Open-Domain Question Answering [PDF] [arXiv] [code]
Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, and Jianfeng Gao.
ACL 2023. Toronto, Canada.

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning [PDF] [arXiv] [code]
Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, and Jiawei Han.
ICML 2023. Honolulu, HI, USA.

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Yu Meng, and Jiawei Han.
ICLR 2023. Kigali, Rwanda.

Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation [PDF] [arXiv]
Xiusi Chen, Yu Zhang, Jinliang Deng, Jyun-Yu Jiang, and Wei Wang.
SDM 2023. Minneapolis, MN, USA. (Regular Paper, Best Poster Award Honorable Mention)

2022

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds [PDF] [arXiv] [code]
Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, and Jiawei Han.
NAACL 2022. Seattle, WA, USA.

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, and Jiawei Han.
WWW 2022. Lyon, France.

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [PDF] [arXiv] [code]
Yu Zhang*, Shweta Garg*, Yu Meng, Xiusi Chen, and Jiawei Han.
WSDM 2022. Tempe, AZ, USA.

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark [PDF] [arXiv] [code]
Carl Yang*, Yuxin Xiao*, Yu Zhang*, Yizhou Sun, and Jiawei Han.
TKDE. Volume 34, Issue 10. IEEE.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
NeurIPS 2022. New Orleans, LA, USA.

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, and Jiawei Han.
WWW 2022. Lyon, France.

2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, and Jiawei Han.
WWW 2021. Ljubljana, Slovenia.

Hierarchical Metadata-Aware Document Categorization under Weak Supervision [PDF] [arXiv] [code]
Yu Zhang, Xiusi Chen, Yu Meng, and Jiawei Han.
WSDM 2021. Jerusalem, Israel.

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, and Jiawei Han.
EMNLP 2021. Punta Cana, Dominican Republic.

2020

Minimally Supervised Categorization of Text with Metadata [PDF] [arXiv] [code]
Yu Zhang*, Yu Meng*, Jiaxin Huang, Frank F. Xu, Xuan Wang, and Jiawei Han.
SIGIR 2020. Xi’an, China.

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, and Jiawei Han.
KDD 2020. San Diego, CA, USA.

Discriminative Topic Mining via Category-Name Guided Text Embedding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, and Jiawei Han.
WWW 2020. Taipei, Taiwan.

2019

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories [PDF] [arXiv] [code]
Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han.
ICDM 2019. Beijing, China.

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning [PDF] [arXiv] [bioRxiv] [code]
Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han.
Bioinformatics. Volume 35, Issue 10. Oxford University Press.

Integrating Local Context and Global Cohesiveness for Open Information Extraction [PDF] [arXiv] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, and Jiawei Han.
WSDM 2019. Melbourne, VIC, Australia.

2018

Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning [PDF] [arXiv] [code]
Meng Qu, Xiang Ren, Yu Zhang, and Jiawei Han.
WWW 2018. Lyon, France.

Open Information Extraction with Global Structure Constraints [PDF] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Frank F. Xu, and Jiawei Han.
WWW 2018. Lyon, France. (Poster, Best Poster Award Honorable Mention)

2017

RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation [PDF] [arXiv] [code]
Yu Zhang, Wei Wei, Binxuan Huang, Kathleen M. Carley, and Yan Zhang.
CIKM 2017. Singapore, Singapore. (Short Paper)

Top-K Influential Nodes in Social Networks: A Game Perspective [PDF] [code]
Yu Zhang and Yan Zhang.
SIGIR 2017. Shinjuku, Tokyo, Japan. (Short Paper)

Conference Tutorials (In Proceedings)

Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery [PDF] [tutorial page]
Bowen Jin, Yu Zhang, Sha Li, and Jiawei Han.
WSDM 2024. Mérida, Mexico. (Tutorial)

Pretrained Language Representations for Text Understanding: A Weakly-Supervised Perspective [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, Yunyi Zhang, and Jiawei Han.
KDD 2023. Long Beach, CA, USA. (Tutorial)

Tutorials at The Web Conference 2023 [PDF] [tutorial page]
Valeria Fionda, Olaf Hartig, et al. (including Yu Zhang)
WWW 2023. Austin, TX, USA. (Tutorial)

Mining Structures from Massive Texts by Exploring the Power of Pre-trained Language Models [PDF] [tutorial page]
Yu Zhang, Yunyi Zhang, and Jiawei Han.
EDBT 2023. Ioannina, Greece. (Tutorial)

Adapting Pretrained Representations for Text Mining [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2022. Washington, DC, USA. (Tutorial)

On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2021. Singapore, Singapore. (Tutorial)

Honors and Awards

KDD Best Reviewer (30 in 1469), 2023
Dissertation Completion Fellowship, Graduate College, UIUC (the only recipient from CS), 2023
WWW Best Reviewer, 2023
SDM Best Poster Award Honorable Mention, 2023
Data Mining Research Excellence Gold Award, Data Mining Group, UIUC, 2023, 2024
CIKM Best Reviewer, 2022
Yunni & Maxine Pao Memorial Fellowship, the Grainger College of Engineering, UIUC, 2022
WWW Student Scholarship, 2021
WSDM Student Travel Grant, 2021, 2022, 2023, 2024
WWW Best Poster Award Honorable Mention, 2018
Outstanding Undergraduate Thesis Award, School of EECS, Peking University (10 in 320), 2017
Outstanding Graduates, Peking University, 2017
SIGIR Student Travel Grant, 2017, 2020
China National Scholarship (top 1% in Peking University), 2014
First Prize, National Olympiad in Informatics in Provinces, 2011, 2012

Professional Services

Conference Program Committee
KDD 2022-2024 (Best Reviewer 2023); WWW 2022-2024 (Best Reviewer 2023); WSDM 2023-2024;
CIKM 2021-2024 (Best Reviewer 2022); SDM 2024; ECML/PKDD 2022;
NeurIPS 2021-2022; ICML 2022-2024; ICLR 2021-2024; AAAI 2022;
ACL 2021, 2023; EMNLP 2020, 2022-2023; NAACL 2021; COLING 2022

Journal Reviewer
IEEE Transactions on Knowledge and Data Engineering (TKDE);
ACM Transactions on Information Systems (TOIS);
ACM Transactions on Knowledge Discovery from Data (TKDD);
ACM Transactions on the Web (TWEB) (Distinguished Reviewer);
IEEE Transactions on Audio, Speech and Language Processing (TASLP);
IEEE Transactions on Big Data (TBD);
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Student Volunteer
SIGIR 2020; KDD 2022-2023

Conference Session Chair
WWW 2023

Miscellany

I was born and raised in Shanghai, China. I graduated from the High School Affiliated to Fudan University.

I played bridge in high school and as an undergraduate.