In today's data-driven landscape, the rapid growth and increasing complexity of datasets demand advanced algorithms for efficient processing and analysis. Efficient and robust algorithms are pivotal for managing vast amounts of information and extracting insights from it, and among these, submodular optimization stands out for its scalability and strong theoretical foundations.
A submodular function is defined on a discrete domain, such as a set of data points, data features, or users within social networks. Many machine learning and graph mining tasks can be formulated as maximizing submodular objective function(s) under constraints, a problem commonly referred to as Constrained Submodular Maximization (CSM). Examples include (i) finding exemplars in k-medoid clustering, (ii) selecting an optimal subset of features in sparse linear regression, and (iii) identifying key social influencers in viral marketing campaigns. Beyond these, submodular objectives also arise in diverse tasks such as data summarization, active learning, ad allocation, and optimizing large language models (LLMs). Therefore, I am interested in
designing general provable efficient algorithms to solve a class of real-world computational problems with submodular structures
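To make the defining property of submodularity concrete, here is a minimal, self-contained sketch (an illustrative toy example of my own, not code from any specific paper) of a coverage-style objective and its diminishing-returns behavior:

```python
# Submodularity illustrated with a coverage objective: adding an element to a
# smaller set yields at least as much marginal gain as adding it to a superset.

def coverage(selected, neighborhoods):
    """f(S): number of distinct items covered by the candidates chosen in S."""
    covered = set()
    for s in selected:
        covered |= neighborhoods[s]
    return len(covered)

# Toy ground set: each candidate covers a few items (hypothetical data).
neighborhoods = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

A = {"a"}            # A is a subset of B
B = {"a", "b"}
e = "c"

gain_A = coverage(A | {e}, neighborhoods) - coverage(A, neighborhoods)
gain_B = coverage(B | {e}, neighborhoods) - coverage(B, neighborhoods)
assert gain_A >= gain_B   # diminishing returns: f(A∪{e})-f(A) >= f(B∪{e})-f(B)
```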
CSM problems are NP-hard in general, yet they admit approximation algorithms with theoretical guarantees (a classic example is sketched after the list below). Thus, the main objectives of my research are:
Theoretical Robustness: developing approximation algorithms with strong theoretical guarantees.
Computational Efficiency: improving the running time of approximation algorithms while retaining good approximation guarantees.
Fairness: designing algorithms that incorporate fairness and diversity considerations, and studying the effects of fair solutions in computational problems.
Social Good: exploring extensive applications of submodular optimization in machine learning and data science, and providing rigorous solutions.
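As a concrete illustration of the first two objectives, the classic greedy algorithm of Nemhauser, Wolsey, and Fisher achieves a (1 - 1/e) approximation for maximizing a monotone submodular function under a cardinality constraint. The sketch below is my own illustrative code (function and variable names are hypothetical), not an implementation from any particular paper:

```python
def greedy_max(ground_set, f, k):
    """Greedy (1 - 1/e)-approximation for max f(S) s.t. |S| <= k,
    assuming f is monotone submodular (Nemhauser-Wolsey-Fisher)."""
    S = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain f(S ∪ {e}) - f(S).
        best, best_gain = None, 0.0
        for e in ground_set - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:          # no remaining element improves the objective
            break
        S.add(best)
    return S

# Usage with the coverage objective above:
# S = greedy_max(set(neighborhoods), lambda S: coverage(S, neighborhoods), k=2)
```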
Topological structures naturally arise in a wide range of real-world scenarios, e.g., social networks, protein-interaction networks in biological metabolism, etc. Effective graph analytics can benefit many applications. My research aims to develop fast and effective algorithms to
unveil hidden structural information (clustering) in large-scale networks via community detection, and
explore information diffusion in social networks with submodular optimization.
These algorithms are easy to implement, noise-resilient, and theoretically sound, and they can be applied to a variety of real-world networks.
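For the information-diffusion direction, the expected influence spread under the independent cascade model is a monotone submodular function of the seed set, so greedy seed selection inherits a (1 - 1/e) guarantee up to sampling error. Below is a minimal Monte Carlo sketch (the graph representation, propagation probability, and trial count are illustrative assumptions, not code from my papers):

```python
import random

def estimate_spread(graph, seeds, p=0.1, trials=200):
    """Monte Carlo estimate of expected influence spread under the
    independent cascade model: each live edge activates with probability p.
    graph: dict mapping node -> list of neighbors; seeds: iterable of nodes."""
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            node = frontier.pop()
            for nbr in graph.get(node, []):
                if nbr not in active and random.random() < p:
                    active.add(nbr)
                    frontier.append(nbr)
        total += len(active)
    return total / trials

# seeds -> estimate_spread(graph, seeds) is a monotone submodular set function
# (in expectation), so it can be plugged into the greedy routine sketched above.
```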
Fang Hu, Yanhui Zhu, Jia Liu, and Yalin Jia.
Physics Letters A (2019).
My applied data science research brings modern machine learning techniques to bear on cutting-edge problems in environmental sciences, geosciences, and medical sciences.
Example: With a real-world dataset, such as electronic health records (EHR), we begin by identifying the problems to address and pre-processing the data under the guidance of doctors and practioners. From out end, we identify and test a range of machine learning and statistical models, and the best and most suitable models are selected (usually with modifications and optimizations) whereas they are not necessarily the most complicated and fancy ones. We analyze the results and draw conclusions from our models, independent of the domain experts' experiences. Our modelings often uncover latent information previously unknown to the experts, providing valuable insights that can inform and enhance their ongoing research.
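The sketch below illustrates the model-screening step with scikit-learn on a synthetic stand-in dataset; the candidate models, features, and hyperparameters are placeholders rather than the ones used in our actual studies:

```python
# Minimal model-screening sketch on synthetic data (a stand-in for private EHR
# data); models and settings are illustrative assumptions, not our final choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Simple, interpretable baselines are screened alongside more flexible models;
# the "best" model balances predictive accuracy with suitability for the domain.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```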
Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest.
Jianchao Cai, Kai Xu, Yanhui Zhu, Fang Hu, and Liuhuan Li.
Applied Energy (2020).
An efficient Long Short-Term Memory model based on Laplacian Eigenmap in artificial neural networks.
Fang Hu, Yanhui Zhu, Jia Liu, and Liuhuan Li.
Applied Soft Computing (2020).