Yu ZHANG 张彧

Contact

Email: yuz9yuz [AT] gmail [DOT] com
Office: CSE 502, Paul G. Allen Center, 185 E Stevens Way NE, Seattle, WA 98195-2350

About Me

I am an incoming Assistant Professor at the Department of Computer Science and Engineering at Texas A&M University. I received my Ph.D. and M.Sc. degrees in Computer Science from the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. My research has been supported by the Dissertation Completion Fellowship (awarded by the UIUC Graduate College) and the Yunni & Maxine Pao Memorial Fellowship (awarded by the UIUC College of Engineering). I am currently visiting the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Prof. Sheng Wang.

Prior to UIUC, I received my B.Sc. degree in Computer Science from Peking University, advised by Prof. Yan Zhang.

I interned at Microsoft Research Redmond in the summers of 2020, 2021, and 2022, working in different groups with researchers including Dr. Iris Shen, Dr. Hao Cheng, Dr. Xiaodong Liu, and Dr. Yuxiao Dong.

In the summer of 2016, I visited the School of Computer Science at Carnegie Mellon University, working with Prof. Kathleen M. Carley.

For further information, please see my CV.

I am looking for self-motivated Ph.D. students and interns! Please fill out this form if you are interested in working with me. After completing the form, you are also welcome to reach out via email. I will read all submitted forms and emails, but I apologize that I cannot respond to each of them.

Research Interests

Data Mining and NLP for Science (Biomedicine, Chemistry, and Science of Science):
[EMNLP 2024, EMNLP 2023, KDD 2023a, WWW 2023, Bioinformatics 2019]

Large Language Models + Graphs:
[ACL 2024, ACL 2023, KDD 2023b, ICLR 2023]

Structure/Knowledge-Enhanced Text Mining:
[WWW 2022, WSDM 2022, WWW 2021, WSDM 2021, SIGIR 2020]

What’s New [What’s Not New…]

2024-07 to 2024-12 Invited to be a PC member of KDD 2025, ICLR 2025, and WWW 2025.

2024-10-23 Our paper on Scientific Document Retrieval was accepted by WSDM 2025! The acceptance rate is 17.3% (106/614).

2024-10-14 Successfully defended my Ph.D. thesis: Structure-Enhanced Text Mining for Science! I would like to express my deepest gratitude to my advisor, Prof. Jiawei Han, and my thesis committee members, Prof. Tarek Abdelzaher, Prof. Hanghang Tong, Prof. Wei Wang, and Dr. Iris Shen!

2024-09-20 Our survey on Scientific Large Language Models was accepted by EMNLP 2024 main conference!

2024-08 Started my visit at the University of Washington! Super excited to work with Prof. Sheng Wang!

2024-01 to 2024-06 Invited to be a PC member of KDD 2024, CIKM 2024, NeurIPS 2024, and WSDM 2025.

2024-05-17 Our paper on Ontology-Enhanced Fine-grained Entity Typing was accepted by KDD 2024!

2024-05-16 Our paper on Graph Chain-of-Thought Prompting was accepted by ACL 2024 Findings!

2024-05-15 Gave a guest lecture at Northwestern University [slides].

2024-03 Attended WSDM 2024 in Mérida, Mexico (in person) to present our tutorial and give a keynote at the Machine Learning on Graphs (MLoG) Workshop [slides].

Selected Publications [Full List]

(“*” indicates equal contribution. Unless otherwise specified, each paper was accepted as a research track long/regular paper.)

2025

Improving Scientific Document Retrieval with Concept Coverage-based Query Set Generation
SeongKu Kang, Bowen Jin, Wonbin Kweon, Yu Zhang, Dongha Lee, Jiawei Han, and Hwanjo Yu.
WSDM 2025. Hannover, Germany.

2024

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [PDF] [arXiv] [project page]
Yu Zhang*, Xiusi Chen*, Bowen Jin*, Sheng Wang, Shuiwang Ji, Wei Wang, and Jiawei Han.
EMNLP 2024. Miami, FL, USA. (450+ stars on GitHub!)

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, and Jiawei Han.
AAAI 2024. Vancouver, Canada.

Ontology Enrichment for Effective Fine-grained Entity Typing [PDF] [arXiv]
Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, and Jiawei Han.
KDD 2024. Barcelona, Spain.

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [PDF] [arXiv] [code]
Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, and Jiawei Han.
ACL 2024 Findings. Bangkok, Thailand.

2023

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [PDF] [arXiv] [project page] [code] [model] [dataset] [PMC-Patients leaderboard]
Yu Zhang*, Hao Cheng*, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, and Jianfeng Gao.
EMNLP 2023 Findings. Singapore, Singapore.

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers [PDF] [arXiv] [code]
Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study [PDF] [arXiv] [code] [dataset]
Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, and Jiawei Han.
WWW 2023. Austin, TX, USA.

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts [PDF] [arXiv] [code]
Yu Zhang*, Yunyi Zhang*, Martin Michalski*, Yucheng Jiang*, Yu Meng*, and Jiawei Han.
WSDM 2023. Singapore, Singapore.

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training [PDF] [arXiv] [code]
Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, and Jiawei Han.
EMNLP 2023. Singapore, Singapore.

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Qi Zhu, and Jiawei Han.
KDD 2023. Long Beach, CA, USA.

Patton: Language Model Pretraining on Text-Rich Networks [PDF] [arXiv] [code]
Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, and Jiawei Han.
ACL 2023. Toronto, Canada.

Chain-of-Skills: A Configurable Model for Open-Domain Question Answering [PDF] [arXiv] [code]
Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, and Jianfeng Gao.
ACL 2023. Toronto, Canada.

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning [PDF] [arXiv] [code]
Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, and Jiawei Han.
ICML 2023. Honolulu, HI, USA.

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks [PDF] [arXiv] [code]
Bowen Jin, Yu Zhang, Yu Meng, and Jiawei Han.
ICLR 2023. Kigali, Rwanda.

Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation [PDF] [arXiv] [code]
Xiusi Chen, Yu Zhang, Jinliang Deng, Jyun-Yu Jiang, and Wei Wang.
SDM 2023. Minneapolis, MN, USA. (Regular Paper, Best Poster Award Honorable Mention)

2022

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds [PDF] [arXiv] [code]
Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, and Jiawei Han.
NAACL 2022. Seattle, WA, USA.

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, and Jiawei Han.
WWW 2022. Lyon, France.

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [PDF] [arXiv] [code]
Yu Zhang*, Shweta Garg*, Yu Meng, Xiusi Chen, and Jiawei Han.
WSDM 2022. Tempe, AZ, USA.

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark [PDF] [arXiv] [code]
Carl Yang*, Yuxin Xiao*, Yu Zhang*, Yizhou Sun, and Jiawei Han.
TKDE. Volume 34, Issue 10. IEEE.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
NeurIPS 2022. New Orleans, LA, USA.

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, and Jiawei Han.
WWW 2022. Lyon, France.

2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy [PDF] [arXiv] [code]
Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, and Jiawei Han.
WWW 2021. Ljubljana, Slovenia.

Hierarchical Metadata-Aware Document Categorization under Weak Supervision [PDF] [arXiv] [code]
Yu Zhang, Xiusi Chen, Yu Meng, and Jiawei Han.
WSDM 2021. Jerusalem, Israel.

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, and Jiawei Han.
EMNLP 2021. Punta Cana, Dominican Republic.

2020

Minimally Supervised Categorization of Text with Metadata [PDF] [arXiv] [code]
Yu Zhang*, Yu Meng*, Jiaxin Huang, Frank F. Xu, Xuan Wang, and Jiawei Han.
SIGIR 2020. Xi’an, China.

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding [PDF] [arXiv] [code]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, and Jiawei Han.
KDD 2020. San Diego, CA, USA.

Discriminative Topic Mining via Category-Name Guided Text Embedding [PDF] [arXiv] [code]
Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, and Jiawei Han.
WWW 2020. Taipei, Taiwan.

2019

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories [PDF] [arXiv] [code]
Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han.
ICDM 2019. Beijing, China.

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning [PDF] [arXiv] [bioRxiv] [code]
Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han.
Bioinformatics. Volume 35, Issue 10. Oxford University Press.

Integrating Local Context and Global Cohesiveness for Open Information Extraction [PDF] [arXiv] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, and Jiawei Han.
WSDM 2019. Melbourne, VIC, Australia.

2018

Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning [PDF] [arXiv] [code]
Meng Qu, Xiang Ren, Yu Zhang, and Jiawei Han.
WWW 2018. Lyon, France.

Open Information Extraction with Global Structure Constraints [PDF] [code]
Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Frank F. Xu, and Jiawei Han.
WWW 2018. Lyon, France. (Poster, Best Poster Award Honorable Mention)

2017

RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation [PDF] [arXiv] [code]
Yu Zhang, Wei Wei, Binxuan Huang, Kathleen M. Carley, and Yan Zhang.
CIKM 2017. Singapore, Singapore. (Short Paper)

Top-K Influential Nodes in Social Networks: A Game Perspective [PDF] [code]
Yu Zhang and Yan Zhang.
SIGIR 2017. Shinjuku, Tokyo, Japan. (Short Paper)

Conference Tutorials (In Proceedings)

Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery [PDF] [tutorial page]
Bowen Jin, Yu Zhang, Sha Li, and Jiawei Han.
WSDM 2024. Mérida, Mexico. (Tutorial)

Pretrained Language Representations for Text Understanding: A Weakly-Supervised Perspective [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, Yunyi Zhang, and Jiawei Han.
KDD 2023. Long Beach, CA, USA. (Tutorial)

Tutorials at The Web Conference 2023 [PDF] [tutorial page]
Valeria Fionda, Olaf Hartig, et al. (including Yu Zhang)
WWW 2023. Austin, TX, USA. (Tutorial)

Mining Structures from Massive Texts by Exploring the Power of Pre-trained Language Models [PDF] [tutorial page]
Yu Zhang, Yunyi Zhang, and Jiawei Han.
EDBT 2023. Ioannina, Greece. (Tutorial)

Adapting Pretrained Representations for Text Mining [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2022. Washington, DC, USA. (Tutorial)

On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining [PDF] [tutorial page]
Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han.
KDD 2021. Singapore, Singapore. (Tutorial)

Honors and Awards

KDD Best Reviewer (30 in 1469), 2023
Dissertation Completion Fellowship, Graduate College, UIUC (the only recipient from CS), 2023
WWW Best Reviewer, 2023
SDM Best Poster Award Honorable Mention, 2023
Data Mining Research Excellence Gold Award, Data Mining Group, UIUC, 2023, 2024
CIKM Best Reviewer, 2022
Yunni & Maxine Pao Memorial Fellowship, the Grainger College of Engineering, UIUC, 2022
WWW Student Scholarship, 2021
WSDM Student Travel Grant, 2021, 2022, 2023, 2024
WWW Best Poster Award Honorable Mention, 2018
Outstanding Undergraduate Thesis Award, School of EECS, Peking University (10 in 320), 2017
Outstanding Graduates, Peking University, 2017
SIGIR Student Travel Grant, 2017, 2020
China National Scholarship (top 1% in Peking University), 2014
First Prize, National Olympiad in Informatics in Provinces, 2011, 2012

Invited Talks

Graph-Enhanced Scientific Text Mining [slides]
May 2024, guest lecture at Northwestern University.
March 2024, keynote at the Machine Learning on Graphs (MLoG) Workshop at WSDM 2024.

Professional Services

Conference Program Committee
KDD 2022-2025 (Best Reviewer 2023); WWW 2022-2025 (Best Reviewer 2023); WSDM 2023-2025;
CIKM 2021-2024 (Best Reviewer 2022); SDM 2024; ECML/PKDD 2022;
NeurIPS 2021-2022, 2024; ICML 2022-2024; ICLR 2021-2025; AAAI 2022;
ACL Rolling Review; ACL 2021, 2023; EMNLP 2020, 2022-2023; NAACL 2021; COLING 2022

Journal Reviewer
IEEE Transactions on Knowledge and Data Engineering (TKDE);
ACM Transactions on Information Systems (TOIS);
ACM Transactions on Knowledge Discovery from Data (TKDD);
ACM Transactions on the Web (TWEB) (Distinguished Reviewer);
IEEE Transactions on Audio, Speech and Language Processing (TASLP);
IEEE Transactions on Big Data (TBD);
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Student Volunteer
SIGIR 2020; KDD 2022-2023

Conference Session Chair
WWW 2023

Miscellany

I was born and raised in Shanghai, China. I graduated from the High School Affiliated to Fudan University.

I played bridge in high school and as an undergraduate.