In today's data-driven landscape, the rapid growth and increasing complexity of datasets demand advanced algorithms for efficient processing and analysis. Efficient and robust algorithms are pivotal for managing vast amounts of information and extracting insights from it, and among these, submodular optimization stands out for its scalability and strong theoretical foundations.
A submodular function is defined on a discrete domain, such as a set of data points, data features, or users within social networks. Many machine learning and graph mining tasks can be formulated as maximizing submodular objective function(s) under constraints, a problem commonly referred to as Constrained Submodular Maximization (CSM). Examples include (i) finding exemplars in k-medoid clustering, (ii) selecting an optimal subset of features in sparse linear regression, and (iii) identifying key social influencers in viral marketing campaigns. Beyond these, submodular objectives also arise in diverse tasks such as data summarization, active learning, ad allocation, and optimizing large language models (LLMs). Therefore, I am interested in
designing general provable efficient algorithms to solve a class of real-world computational problems with submodular structures
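To make the defining property of submodularity concrete, here is a minimal, self-contained sketch (an illustrative toy example of my own, not code from any specific paper) of a coverage-style objective and its diminishing-returns behavior:

```python
# Submodularity illustrated with a coverage objective: adding an element to a
# smaller set yields at least as much marginal gain as adding it to a superset.

def coverage(selected, neighborhoods):
    """f(S): number of distinct items covered by the candidates chosen in S."""
    covered = set()
    for s in selected:
        covered |= neighborhoods[s]
    return len(covered)

# Toy ground set: each candidate covers a few items (hypothetical data).
neighborhoods = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

A = {"a"}            # A is a subset of B
B = {"a", "b"}
e = "c"

gain_A = coverage(A | {e}, neighborhoods) - coverage(A, neighborhoods)
gain_B = coverage(B | {e}, neighborhoods) - coverage(B, neighborhoods)
assert gain_A >= gain_B   # diminishing returns: f(A∪{e})-f(A) >= f(B∪{e})-f(B)
```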
CSM problems are NP-hard in general, yet they admit approximation algorithms with theoretical guarantees (a classic example is sketched after the list below). Thus, the main objectives of my research are:
Theoretical Robustness: developing approximation algorithms with strong theoretical guarantees.
Computational Efficiency: improving the running time of approximation algorithms while retaining good approximation guarantees.
Fairness: designing algorithms that incorporate fairness and diversity considerations, and studying the effects of fair solutions in computational problems.
Social Good: exploring extensive applications of submodular optimization in machine learning and data science, and providing rigorous solutions.
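As a concrete illustration of the first two objectives, the classic greedy algorithm of Nemhauser, Wolsey, and Fisher achieves a (1 - 1/e) approximation for maximizing a monotone submodular function under a cardinality constraint. The sketch below is my own illustrative code (function and variable names are hypothetical), not an implementation from any particular paper:

```python
def greedy_max(ground_set, f, k):
    """Greedy (1 - 1/e)-approximation for max f(S) s.t. |S| <= k,
    assuming f is monotone submodular (Nemhauser-Wolsey-Fisher)."""
    S = set()
    for _ in range(k):
        # Pick the element with the largest marginal gain f(S ∪ {e}) - f(S).
        best, best_gain = None, 0.0
        for e in ground_set - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:          # no remaining element improves the objective
            break
        S.add(best)
    return S

# Usage with the coverage objective above:
# S = greedy_max(set(neighborhoods), lambda S: coverage(S, neighborhoods), k=2)
```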
Topological structures naturally arise in a wide range of real-world scenarios, e.g., social networks, protein-interaction networks in biological metabolism, etc. Effective graph analytics can benefit many applications. My research aims to develop fast and effective algorithms to
unveil hidden structural information (clustering) in large-scale networks via community detection, and
explore information diffusion in social networks with submodular optimization.
These algorithms are easy to implement, noise-resilient, and theoretically sound, and they can be applied to a variety of real-world networks.
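For the information-diffusion direction, the expected influence spread under the independent cascade model is a monotone submodular function of the seed set, so greedy seed selection inherits a (1 - 1/e) guarantee up to sampling error. Below is a minimal Monte Carlo sketch (the graph representation, propagation probability, and trial count are illustrative assumptions, not code from my papers):

```python
import random

def estimate_spread(graph, seeds, p=0.1, trials=200):
    """Monte Carlo estimate of expected influence spread under the
    independent cascade model: each live edge activates with probability p.
    graph: dict mapping node -> list of neighbors; seeds: iterable of nodes."""
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            node = frontier.pop()
            for nbr in graph.get(node, []):
                if nbr not in active and random.random() < p:
                    active.add(nbr)
                    frontier.append(nbr)
        total += len(active)
    return total / trials

# seeds -> estimate_spread(graph, seeds) is a monotone submodular set function
# (in expectation), so it can be plugged into the greedy routine sketched above.
```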
Fang Hu, Yanhui Zhu, Jia Liu, and Yalin Jia.
Physics Letters A (2019).
My applied data science research brings modern machine learning techniques to bear on cutting-edge problems in environmental sciences, geosciences, and medical sciences.
Example: With a real-world dataset, such as electronic health records (EHR), we begin by identifying the problems to address and pre-processing the data under the guidance of doctors and practioners. From out end, we identify and test a range of machine learning and statistical models, and the best and most suitable models are selected (usually with modifications and optimizations) whereas they are not necessarily the most complicated and fancy ones. We analyze the results and draw conclusions from our models, independent of the domain experts' experiences. Our modelings often uncover latent information previously unknown to the experts, providing valuable insights that can inform and enhance their ongoing research.
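The sketch below illustrates the model-screening step with scikit-learn on a synthetic stand-in dataset; the candidate models, features, and hyperparameters are placeholders rather than the ones used in our actual studies:

```python
# Minimal model-screening sketch on synthetic data (a stand-in for private EHR
# data); models and settings are illustrative assumptions, not our final choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Simple, interpretable baselines are screened alongside more flexible models;
# the "best" model balances predictive accuracy with suitability for the domain.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```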
Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest.
Jianchao Cai, Kai Xu, Yanhui Zhu, Fang Hu, and Liuhuan Li.
Applied Energy (2020).
An efficient Long Short-Term Memory model based on Laplacian Eigenmap in artificial neural networks.
Fang Hu, Yanhui Zhu, Jia Liu, and Liuhuan Li.
Applied Soft Computing (2020).