Name : Ninghui Li
Ninghui Li is an Associate Professor of Computer
Science at Purdue University, which he joined in 2003. He received a Bachelor's degree from the
University of Science and Technology of China in 1993 and a Ph.D. in Computer
Science from New York University in 2000. Before joining Purdue, he was a
Research Associate in the Computer Science Department at Stanford University
for three years. Prof. Li's research interests are in computer and information
security and privacy. He has published over 80 refereed papers in journals and
conference proceedings. He has served on the
Program Committees of more than 50 international conferences and workshops,
including serving as the Program Chair of the 2008 ACM Symposium on Access
Control Models and Technologies, and the 2009 IFIP Conference on Trust
Management. He is on the editorial
board of the VLDB Journal. His research is supported by projects funded by
the US National Science Foundation, the US Army Research Office, and
companies including IBM and Google. In 2005, he received an NSF CAREER award.
Publications : (2 maximum)
· Tiancheng Li and Ninghui Li. “On the Tradeoff between Privacy and Utility in Data Publishing”. To appear in ACM KDD-09, June 2009.
· Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. “t-Closeness: Privacy beyond k-Anonymity and l-Diversity”. In ICDE, June 2007.
Title of Project : A Framework for Privacy Preserving Microdata Publishing
We aim to develop a framework for privacy-preserving microdata publishing that
considers the interactions among four critical aspects: data properties, privacy
threats, publishing methods, and utility measures. For data properties, we
consider data with high dimensionality as well as data without a clear
separation between quasi-identifier and sensitive attributes. For privacy,
we address three threats: (1) presence threats, in which an
adversary learns that an individual's record is in the published data; (2)
identity disclosure threats, in which an individual is linked to a particular
record in the released data; and (3) attribute disclosure threats, in which new
information about some attribute of an individual is revealed. For publishing
methods, we consider both existing methods, such as generalization and
bucketization, and a method we introduce: slicing. For utility, we consider
utility measures for a number of data mining tasks on anonymized data.
For privacy, we focus on developing a privacy notion that formalizes the same
intuition as differential privacy but is practically achievable for microdata.
While differential privacy has been studied intensively in recent years, all
existing results are about publishing statistical information rather than
publishing microdata. Differential privacy aims at capturing the following
intuition of privacy: “Any disclosure will be, within a small multiplicative
factor, just as likely whether or not the individual participates in the
database.” To formalize this, we need to define the two cases: (1) the
individual participates, and (2) the individual does not participate. In the
existing literature, this is modeled by two datasets D and D’ such that
D \ D’ = {t}, that is, D contains the individual's tuple t and D’ does not.
This modeling, however, results in a requirement that is too strong for
microdata publishing. We need a more suitable formulation.
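For reference, the standard formulation described in the preceding paragraph is usually written as follows. The symbols A (a randomized publishing algorithm), ε (the privacy parameter), and S (a set of possible outputs) are notation assumed here for illustration; they do not appear in the text above.

```latex
\[
e^{-\varepsilon} \cdot \Pr[\mathcal{A}(D') \in S]
\;\le\; \Pr[\mathcal{A}(D) \in S]
\;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(D') \in S]
\]
\[
\text{for all output sets } S \text{ and all datasets } D, D' \text{ with } D \setminus D' = \{t\}.
\]
```

The factor e^ε plays the role of the "small multiplicative factor" in the quoted intuition: the smaller ε is, the less any output can depend on whether t is present.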
For publishing methods, we introduce a new technique called slicing. Slicing
partitions the dataset both vertically and horizontally. Vertical partitioning
is done by grouping attributes into columns based on the correlations among the
attributes. Each column contains a subset of attributes that are highly
correlated. Horizontal partitioning is done by grouping tuples into buckets.
Finally, within each bucket, the values in each column are randomly permuted (or
sorted) to break the linking between different columns. Slicing breaks the
associations across columns but preserves the associations within each column.
This reduces the dimensionality of the data and preserves utility better.
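The three steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name slice_table and its parameters are hypothetical, the attribute grouping (column_groups) is supplied by the caller rather than derived from correlations, and buckets are simply consecutive fixed-size chunks of tuples rather than groups chosen to meet a privacy requirement.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]
SlicedRow = Tuple[Tuple[object, ...], ...]


def slice_table(
    rows: Sequence[Row],
    column_groups: Sequence[Sequence[str]],
    bucket_size: int,
    seed: Optional[int] = None,
) -> List[SlicedRow]:
    """Hypothetical sketch of slicing.

    column_groups: attribute groups ("columns"), assumed to be chosen
        beforehand, e.g. by grouping highly correlated attributes.
    bucket_size: number of tuples per bucket (assumption: buckets are
        consecutive chunks of the input).
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    rng = random.Random(seed)
    sliced: List[SlicedRow] = []

    # Horizontal partitioning: split the tuples into buckets.
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]

        # Vertical partitioning: project each tuple onto every column,
        # then permute each column independently within the bucket.
        permuted_columns = []
        for group in column_groups:
            values = [tuple(row[attr] for attr in group) for row in bucket]
            rng.shuffle(values)
            permuted_columns.append(values)

        # Reassemble rows: values inside a column stay together, but the
        # link between different columns is broken by the permutation.
        for i in range(len(bucket)):
            sliced.append(tuple(column[i] for column in permuted_columns))

    return sliced
```

Calling slice_table(rows, [["age", "zip"], ["disease"]], bucket_size=2) on a list of dictionaries with those three keys would keep each (age, zip) pair intact while shuffling which disease value appears next to it inside every bucket of two tuples.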