Xiangpeng Hao

(he/him; pronunciation: Shyang-pung How)

📢📢📢 I'm on the tenure-track faculty job market this year, seeking positions in computer science and related areas.


I'm a final-year PhD student at the University of Wisconsin-Madison, studying computer science with a focus on database and storage systems.

I'm advised by Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau.

I build efficient, cost-effective storage systems for the cloud. My research treats practical impact as a core goal: not only proposing ideas that advance academic knowledge, but also delivering them through careful engineering that pays attention to every detail. I have built caching structures for multi-tier memory systems (SIGMOD '24); Bf-Tree (VLDB '24), a range index that leverages variable-length buffer pools for efficient caching; and LiquidCache (VLDB '26), a pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes.
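To make the pushdown idea concrete, here is a minimal Rust sketch (illustrative types only, not the actual LiquidCache API): the cache server evaluates the filter locally and ships only the matching values to the compute node.

```rust
// Illustrative sketch of pushdown caching; CachedColumn is a made-up type,
// not part of LiquidCache.
struct CachedColumn {
    values: Vec<i64>,
}

impl CachedColumn {
    /// Traditional cache: ship the whole cached column; the compute node filters later.
    fn scan_all(&self) -> Vec<i64> {
        self.values.clone()
    }

    /// Pushdown cache: evaluate the predicate on the cache server and ship only survivors.
    fn scan_with_filter(&self, pred: impl Fn(i64) -> bool) -> Vec<i64> {
        self.values.iter().copied().filter(|&v| pred(v)).collect()
    }
}

fn main() {
    let cached = CachedColumn { values: (0..1_000_000).collect() };
    let all = cached.scan_all();
    let filtered = cached.scan_with_filter(|v| v % 1000 == 0);
    println!(
        "without pushdown: {} values cross the network; with pushdown: {}",
        all.len(),
        filtered.len()
    );
}
```

The point of pushdown is that the selective work happens next to the cached data, so network transfer scales with the query's selectivity rather than with the size of the cached column.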

Xiangpeng and Ruby (his dog)
Xiangpeng and Ruby (professional photo)
Two passions that guide my research:
  1. To build practical systems for the public good.
  2. To pursue and propagate human knowledge.
Publications
  1. LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics.
    Xiangpeng Hao, Andrew Lamb, Yibo Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau. (VLDB 2026) [code, paper]
  2. Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range Index.
    Xiangpeng Hao, Badrish Chandramouli. (VLDB 2024) [more]
  3. Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution.
    Jing Liu, Xiangpeng Hao, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tej Chajed. (HotStorage 2024)
  4. Towards Buffer Management with Tiered Main Memory.
    Xiangpeng Hao, Xinjing Zhou, Xiangyao Yu, Michael Stonebraker. (SIGMOD 2024)
  5. Blink-hash: An Adaptive Hybrid Index for In-Memory Time-Series Databases.
    Hokeun Cha, Xiangpeng Hao, Tianzheng Wang, Huanchen Zhang, Aditya Akella, Xiangyao Yu. (VLDB 2023)
  6. Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs.
    Jiaxin Lin, Tao Ji, Xiangpeng Hao, Hokeun Cha, Yanfang Le, Xiangyao Yu, Aditya Akella. Proceedings of the ACM on Measurement and Analysis of Computing Systems.
  7. PiBench Online: Interactive Benchmarking of Persistent Memory Indexes (Demo).
    Xiangpeng Hao, Lucas Lersch, Tianzheng Wang, Ismail Oukid. (VLDB 2020)
  8. DASH: Dynamic and Scalable Hashing on Persistent Memory.
    Baotong Lu, Xiangpeng Hao, Tianzheng Wang, Eric Lo. (VLDB 2020)
  9. Evaluating Persistent Memory based Range Indexes.
    Lucas Lersch, Xiangpeng Hao, Ismail Oukid, Tianzheng Wang, Thomas Willhalm. (VLDB 2020)
  10. Evaluating Colour Constancy on the New MIST Dataset of Multi-Illuminant Scenes.
    Xiangpeng Hao, Brian Funt, Hanxiao Jiang. 27th Color and Imaging Conference.
  11. A Multi-illuminant Synthetic Image Test Set.
    Xiangpeng Hao, Brian Funt. Color Research and Application.
I build systems
I contribute to open-source:
  1. I co-created and maintain LiquidCache, a distributed pushdown cache for DataFusion.
  2. I built Parquet Viewer, an online tool for exploring Parquet schemas and data.
  3. I actively contribute to Apache DataFusion, Arrow, and Parquet (a small usage sketch follows this list).
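As a small illustration of the kind of Parquet exploration these projects support, here is a hedged DataFusion snippet (the file name, table name, and query are made up for the example):

```rust
// Explore a Parquet file with DataFusion SQL (hypothetical file and columns).
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Register the Parquet file as a table named "hits".
    ctx.register_parquet("hits", "hits.parquet", ParquetReadOptions::default())
        .await?;
    // Ask a quick question about the data.
    let df = ctx
        .sql("SELECT url, COUNT(*) AS n FROM hits GROUP BY url ORDER BY n DESC LIMIT 5")
        .await?;
    df.show().await?;
    Ok(())
}
```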
Building principles
  1. People-centric. I build systems for people to {use | build upon | contribute to}, not just for academic records.
  2. Reliability. I code in Rust, fuzz-test all core components, and run systematic concurrency tests on all multi-threaded code (see the sketch after this list).
  3. Performance, from keyboard to screen.
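For example, a systematic concurrency test of the kind mentioned above might look like the following with the loom model checker (a generic sketch, not code from my systems; loom exhaustively explores the thread interleavings of the closure):

```rust
// Minimal loom-based concurrency test (generic sketch, not from my systems).
// loom::model explores every interleaving of the spawned threads, so a data
// race or broken invariant shows up deterministically instead of flakily.
use loom::sync::atomic::{AtomicUsize, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn concurrent_increments_are_never_lost() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));

        let c = counter.clone();
        let t = thread::spawn(move || {
            c.fetch_add(1, Ordering::SeqCst);
        });
        counter.fetch_add(1, Ordering::SeqCst);
        t.join().unwrap();

        // Both increments must be visible in every interleaving.
        assert_eq!(counter.load(Ordering::SeqCst), 2);
    });
}
```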
It’s the disease of thinking that a really great idea is 90% of the work. And if you just tell all these other people “here’s this great idea,” then of course they can go off and make it happen. And the problem with that is that there’s just a tremendous amount of craftsmanship in between a great idea and a great product. And as you evolve that great idea, it changes and grows. It never comes out like it starts because you learn a lot more as you get into the subtleties of it. -- Steve Jobs
Students:
Talks:
    What is LiquidCache?
  1. 2025-08: Bengaluru Systems (LiquidCache)
  2. 2025-08: DC Systems (LiquidCache)
  3. 2025-07: University of Washington systems lab (LiquidCache)
  4. 2025-05: UW-Madison (LiquidCache)
  5. 2025-01: CMU Cylab (LiquidCache)
  6. 2025-01: CMU DB group (LiquidCache)
  7. 2024-12: Chicago DataFusion meetup (LiquidCache)
  8. 2024-12: InfluxData (LiquidCache)
  9. 2024-06: Cornell (Bf-Tree)
Advisors and mentors:
Tianzheng Wang: my undergraduate advisor. He is a rock-star database researcher.
Xiangyao Yu: my PhD advisor for the first two and a half years.
Yixin Luo: my intern mentor @Google. We did great work on database auto-tuning.
Badrish Chandramouli: my intern mentor @MSR. He is a great researcher and mentor.
Andrew Lamb: my intern mentor @InfluxData. His passion and professionalism in DataFusion development have reshaped my research to connect more closely to real-world applications.