Current projects

linuxfun

FAST: Fair Assignment for Storage Tenants
Build a block-level cloud storage system with more predictable performance. See our HotCloud '12 paper for details.

TR6 - Performance Interference in Ceph and FAST
git-public.flux.utah.edu:/flux/git/users/xinglin/writting/TR6.git
rbd driver: git-public.flux.utah.edu:/flux/git/users/xinglin/projects/fast/linux-3.2.16.git
Ceph: git-public.flux.utah.edu:/flux/git/users/xinglin/projects/fast/ceph-0.56.3.git

Improving the performance of deduplication storage systems
Deduplication storage systems require massive, parallel computation for hashing, index lookups, block existence checks, compression, and decompression. We propose using GPUs to accelerate these computations, which reduces their overhead on read and write operations. In deduplication storage systems, files are stored on disk in non-sequential order, but disks perform well only for sequential access. As a result, disks in deduplication storage systems suffer significant performance degradation and increased load. For a set of Linux images stored in Venti, we observe a significant drop (82.04%) in read performance: from 34.43 MB/s to only 6.19 MB/s. We are investigating the causes of this drop and trying to optimize read performance.
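
To make the operations named above concrete, here is a minimal, CPU-only sketch of a content-addressed block store. It is an illustration under stated assumptions, not the project's code and not Venti's implementation: the class name DedupStore, the fixed 4096-byte block size, the in-memory dictionary index, and the use of SHA-1 with zlib are all choices made for this example.

```python
import hashlib
import zlib


class DedupStore:
    """Toy in-memory content-addressed block store (hypothetical, for illustration)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed-size blocks
        self.index = {}               # fingerprint -> compressed block

    def write(self, data):
        """Store data block by block and return its recipe (list of fingerprints)."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha1(block).digest()          # hashing
            if fp not in self.index:                   # index lookup / existence check
                self.index[fp] = zlib.compress(block)  # compression, new blocks only
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rebuild the original data by fetching and decompressing each block."""
        return b"".join(zlib.decompress(self.index[fp]) for fp in recipe)


if __name__ == "__main__":
    store = DedupStore(block_size=4)
    recipe = store.write(b"aaaabbbbaaaa")
    print(len(recipe), "blocks referenced,", len(store.index), "blocks stored")
```

Each block's hash, lookup, and compression is independent of the others, which is why this kind of work is a candidate for parallel acceleration. The read path also shows why access becomes non-sequential: a file is rebuilt from blocks placed wherever they were first written.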

Past projects

June 2011 ~ Jan. 2012
High-performance Disk Imaging With Deduplicated Storage
In clouds and network testbeds, a disk image deployment system is needed to quickly distribute and install virtual machine images or operating system images on host devices. Previous work has shown that deduplication can save a significant amount of disk space for these images. However, read and write performance in deduplication storage systems is poor relative to traditional filesystem storage. In this work, we demonstrate that a deduplication storage system can serve as the backend of a high-performance image deployment system with only a negligible drop in performance, by carefully pipelining to produce a balanced system.
[short paper] [poster]

Jan. 2011 ~ June 2011
Refining the Utility Metric for Utility-Based Cache Partitioning
Miss rate is widely used to determine cache partitioning for multi-core systems. However, it is well recognized in the community that partitioning by misses per kilo-instruction (MPKI) can be sub-optimal. This project quantifies how sub-optimal MPKI-based cache partitioning is and proposes a simple scheme for CPI prediction.
[paper] [source code]
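
For context, miss-based partitioning of the kind described above can be sketched as a greedy allocation of cache ways by marginal miss reduction. This is a simplified illustration, not the scheme proposed in the paper: the function name greedy_partition and the miss-curve numbers are made up for the example, and the greedy choice is only guaranteed to minimize total misses when each miss curve is convex.

```python
def greedy_partition(miss_curves, total_ways):
    """Assign cache ways one at a time to the core that saves the most misses.

    miss_curves[i][w] is the (hypothetical) miss count of core i when it owns
    w ways, so each curve needs at least total_ways + 1 entries.
    """
    alloc = [0] * len(miss_curves)
    for _ in range(total_ways):
        # Marginal utility: misses avoided by giving each core one more way.
        gains = [curve[a] - curve[a + 1] for curve, a in zip(miss_curves, alloc)]
        winner = max(range(len(gains)), key=gains.__getitem__)
        alloc[winner] += 1
    return alloc


if __name__ == "__main__":
    # Made-up miss curves for two cores sharing a 4-way cache.
    curves = [
        [100, 60, 40, 35, 33],
        [50, 30, 20, 18, 17],
    ]
    print(greedy_partition(curves, total_ways=4))
```

The utility here is measured purely in misses. A miss on one core may cost more cycles than a miss on another, which is the gap a CPI-oriented metric tries to close.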

Dec. 2010
Linux physical memory deduplication
The main goal is to deduplicate identical pages in physical memory. We implemented a kernel module that calculates a hash for every physical page on both x86 and x86_64 Linux, and a second kernel module that exports the content of a single specified physical page. After we found that Linux already implements this functionality in mm/ksm.c, we stopped the project.
[source code]
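
The page-hashing idea can be illustrated in user space with a short sketch. This is not the kernel module and does not describe how mm/ksm.c works: it operates on a plain bytes buffer standing in for physical memory, and the function name find_duplicate_pages, the 4096-byte page size, and the choice of SHA-1 are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size


def find_duplicate_pages(memory, page_size=PAGE_SIZE):
    """Group page frame numbers whose contents hash to the same value.

    memory is a bytes-like buffer standing in for physical memory; any
    trailing partial page is ignored.
    """
    buckets = defaultdict(list)
    for pfn in range(len(memory) // page_size):
        page = memory[pfn * page_size:(pfn + 1) * page_size]
        buckets[hashlib.sha1(page).digest()].append(pfn)
    # Only pages that share a hash with at least one other page are candidates.
    return [pfns for pfns in buckets.values() if len(pfns) > 1]


if __name__ == "__main__":
    memory = b"A" * 8 + b"B" * 8 + b"A" * 8
    print(find_duplicate_pages(memory, page_size=8))
```

A real implementation would also compare candidate pages byte for byte before merging them, since equal hashes do not strictly prove equal contents.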

Resources:
Storage-related I/O traces:
Traces from UCSC, SNIA traces
Open source deduplication storage systems:
Venti, ZFS, opendedup
Workloads:
DVDStore, Microsoft Exchange Server (Loadgen), TPC-H IO profiles, VDI profiles