TL;DR
- You donate (charitable giving) $50k to support my PhD through graduation (ETA: June 2026).
- Together, we build LiquidCache, a disaggregated caching system for disk-less cloud analytics.
- I acknowledge your support in the project repo, blog posts, research papers, academic/industry talks, and my thesis report/defense.
What I did when InfluxData funded me (2024.09-2025.05)
Coding
- Kickstarted the ambitious LiquidCache project. With 20k lines of Rust (and increasing) built on top of DataFusion, it runs all TPC-H and ClickBench queries. We see up to 10x latency and cost reduction for scan-intensive workloads compared to existing caching systems. We expect it to be the infrastructure of next-generation cloud analytics. We acknowledge InfluxData's support in the repo README.
- For shorter-term impact, I built the Parquet Viewer, an online tool (also a VSCode/Cursor extension) to explore Parquet data as well as its schema and file layout. It compiles DataFusion/Arrow/Parquet to WASM for efficient edge computing and uses an LLM to translate natural language to SQL. We acknowledge InfluxData in both the open source repo and the tool itself.
- Contributed directly to Apache DataFusion, Arrow, and Parquet. List of merged PRs (2024.09-2025.05):
- apache/datafusion #12405
- apache/datafusion #12575
- apache/datafusion #14116
- apache/datafusion #14139
- apache/datafusion #15102
- apache/datafusion #15449
- apache/datafusion #15460
- apache/datafusion #15595
- apache/datafusion #15827
- apache/arrow-rs #6623
- apache/arrow-rs #6930
- apache/arrow-rs #6931
- apache/arrow-rs #6945
- apache/arrow-rs #6961
- apache/arrow-rs #7229
Writing
- I wrote five blog posts related to the DataFusion ecosystem, all of which acknowledge InfluxData's support.
- Parquet pruning in DataFusion (cross-posted to the DataFusion blog)
- Efficient Filter Pushdown in Parquet (cross-posted to the DataFusion blog)
- I wrote a VLDB paper about LiquidCache that, once published (ETA: 2025.08), will acknowledge InfluxData's support.
Presentations
All of the presentations acknowledge InfluxData's support.
- I presented LiquidCache at the DataFusion Chicago Meetup.
- I presented LiquidCache to the CMU database group and advertised to Andy Pavlo how InfluxData funded me.
- I presented LiquidCache to InfluxData's engineering team.
- I presented LiquidCache to Vyas Sekar's networking group at CMU.
- I presented LiquidCache to the CS department at UW-Madison as part of my PhD preliminary
exam; posters are shown below.
What's next (now - 2026.06)?
Coding
- (new) Work directly with your engineering team to help integrate or try out LiquidCache in your infrastructure.
- Build LiquidEvict (as part of LiquidCache), a cache management system designed from the ground up for Arrow-native memory management. It aims to use memory and disk efficiently. The full proposal is here.
- Continue contributing to Apache DataFusion, Arrow, and Parquet, making them more flexible, performant, and easier to use.
Writing
- (new) I'm happy to write posts directly for your blog.
- Write a research paper about LiquidCache and LiquidEvict that acknowledges your support.
- Write multiple blog posts about LiquidCache and LiquidEvict.
Presentations
- (new) I expect to present LiquidCache much more this year as we have more to discuss.
- Present LiquidCache at my PhD defense.
Let's connect
Email: xiangpeng.hao@wisc.edu