Xiangpeng Hao

(he/him; pronunciation: Shyang-pung How)

📢📢📢 I'm on the tenure-track faculty job market this year, seeking positions in computer science and related areas.


I'm a final-year PhD student at the University of Wisconsin-Madison, studying computer science with a focus on database and storage systems.

I'm advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau.

I build efficient, cost-effective storage systems for the cloud. My research treats practical impact as a core goal: not only proposing ideas that advance academic knowledge, but also delivering them through careful engineering that pays attention to every detail. I have built caching structures for multi-tier memory systems (SIGMOD '24); Bf-Tree (VLDB '24), a range index that leverages variable-length buffer pools for efficient caching; and LiquidCache (VLDB '26), a pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes.
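To make the pushdown idea concrete, here is a minimal Rust sketch (illustrative types only, not the actual LiquidCache API): the cache server evaluates the filter locally and ships only the matching values to the compute node.

```rust
// Illustrative sketch of pushdown caching; CachedColumn is a made-up type,
// not part of LiquidCache.
struct CachedColumn {
    values: Vec<i64>,
}

impl CachedColumn {
    /// Traditional cache: ship the whole cached column; the compute node filters later.
    fn scan_all(&self) -> Vec<i64> {
        self.values.clone()
    }

    /// Pushdown cache: evaluate the predicate on the cache server and ship only survivors.
    fn scan_with_filter(&self, pred: impl Fn(i64) -> bool) -> Vec<i64> {
        self.values.iter().copied().filter(|&v| pred(v)).collect()
    }
}

fn main() {
    let cached = CachedColumn { values: (0..1_000_000).collect() };
    let all = cached.scan_all();
    let filtered = cached.scan_with_filter(|v| v % 1000 == 0);
    println!(
        "without pushdown: {} values cross the network; with pushdown: {}",
        all.len(),
        filtered.len()
    );
}
```

The point of pushdown is that the selective work happens next to the cached data, so network transfer scales with the query's selectivity rather than with the size of the cached column.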

Xiangpeng and Ruby (his dog)
Xiangpeng and Ruby (professional photo)
Two passions that guide my research:
  1. To build practical systems for the public good.
  2. To pursue and propagate human knowledge.
Publications
  1. LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics.
    Xiangpeng Hao, Andrew Lamb, Yibo Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau. (VLDB 2026) [code, paper]
  2. Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range Index.
    Xiangpeng Hao, Badrish Chandramouli. (VLDB 2024) [more]
  3. Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution.
    Jing Liu, Xiangpeng Hao, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tej Chajed. (HotStorage 2024)
  4. Towards Buffer Management with Tiered Main Memory.
    Xiangpeng Hao, Xinjing Zhou, Xiangyao Yu, Michael Stonebraker. (SIGMOD 2024)
  5. Blink-hash: An Adaptive Hybrid Index for In-Memory Time-Series Databases.
    Hokeun Cha, Xiangpeng Hao, Tianzheng Wang, Huanchen Zhang, Aditya Akella, Xiangyao Yu. (VLDB 2023)
  6. Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs.
    Jiaxin Lin, Tao Ji, Xiangpeng Hao, Hokeun Cha, Yanfang Le, Xiangyao Yu, Aditya Akella. Proceedings of the ACM on Measurement and Analysis of Computing Systems.
  7. PiBench Online: Interactive Benchmarking of Persistent Memory Indexes (Demo).
    Xiangpeng Hao, Lucas Lersch, Tianzheng Wang, Ismail Oukid. (VLDB 2020)
  8. DASH: Dynamic and Scalable Hashing on Persistent Memory.
    Baotong Lu, Xiangpeng Hao, Tianzheng Wang, Eric Lo. (VLDB 2020)
  9. Evaluating Persistent Memory based Range Indexes.
    Lucas Lersch, Xiangpeng Hao, Ismail Oukid, Tianzheng Wang, Thomas Willhalm. (VLDB 2020)
  10. Evaluating Colour Constancy on the New MIST Dataset of Multi-Illuminant Scenes.
    Xiangpeng Hao, Brian Funt, Hanxiao Jiang. 27th Color and Imaging Conference.
  11. A Multi-illuminant Synthetic Image Test Set.
    Xiangpeng Hao, Brian Funt. Color Research and Application.
I build systems
I contribute to open-source:
  1. I co-created and maintain LiquidCache, a distributed pushdown cache for DataFusion.
  2. I built Parquet Viewer, an online tool for exploring Parquet schemas and data.
  3. I actively contribute to Apache DataFusion, Arrow, and Parquet (a small usage sketch follows this list).
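As a small illustration of the kind of Parquet exploration these projects support, here is a hedged DataFusion snippet (the file name, table name, and query are made up for the example):

```rust
// Explore a Parquet file with DataFusion SQL (hypothetical file and columns).
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Register the Parquet file as a table named "hits".
    ctx.register_parquet("hits", "hits.parquet", ParquetReadOptions::default())
        .await?;
    // Ask a quick question about the data.
    let df = ctx
        .sql("SELECT url, COUNT(*) AS n FROM hits GROUP BY url ORDER BY n DESC LIMIT 5")
        .await?;
    df.show().await?;
    Ok(())
}
```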
Building principles
  1. People-centric. I build systems for people to {use | build upon | contribute to}, not just for academic records.
  2. Reliability. I code in Rust, fuzz-test all core components, and run systematic concurrency tests on all multi-threaded code (see the sketch after this list).
  3. Performance, from keyboard to screen.
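For example, a systematic concurrency test of the kind mentioned above might look like the following with the loom model checker (a generic sketch, not code from my systems; loom exhaustively explores the thread interleavings of the closure):

```rust
// Minimal loom-based concurrency test (generic sketch, not from my systems).
// loom::model explores every interleaving of the spawned threads, so a data
// race or broken invariant shows up deterministically instead of flakily.
use loom::sync::atomic::{AtomicUsize, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn concurrent_increments_are_never_lost() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));

        let c = counter.clone();
        let t = thread::spawn(move || {
            c.fetch_add(1, Ordering::SeqCst);
        });
        counter.fetch_add(1, Ordering::SeqCst);
        t.join().unwrap();

        // Both increments must be visible in every interleaving.
        assert_eq!(counter.load(Ordering::SeqCst), 2);
    });
}
```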
It’s the disease of thinking that a really great idea is 90% of the work. And if you just tell all these other people “here’s this great idea,” then of course they can go off and make it happen. And the problem with that is that there’s just a tremendous amount of craftsmanship in between a great idea and a great product. And as you evolve that great idea, it changes and grows. It never comes out like it starts because you learn a lot more as you get into the subtleties of it. -- Steve Jobs
Students:
Talks:
    What is LiquidCache?
  1. 2025-08: Bengaluru Systems (LiquidCache)
  2. 2025-08: DC Systems (LiquidCache)
  3. 2025-07: University of Washington systems lab (LiquidCache)
  4. 2025-05: UW-Madison (LiquidCache)
  5. 2025-01: CMU Cylab (LiquidCache)
  6. 2025-01: CMU DB group (LiquidCache)
  7. 2024-12: Chicago DataFusion meetup (LiquidCache)
  8. 2024-12: InfluxData (LiquidCache)
  9. 2024-06: Cornell (Bf-Tree)
Advisors and mentors:
Tianzheng Wang: my undergraduate advisor. He is a rock-star database researcher.
Xiangyao Yu: my PhD advisor for the first two and a half years.
Yixin Luo: my intern mentor @Google. We did great work on database auto-tuning.
Badrish Chandramouli: my intern mentor @MSR. He is a great researcher and mentor.
Andrew Lamb: my intern mentor @InfluxData. His passion and professionalism in DataFusion development have reshaped my research to connect more closely to real-world applications.