
Data Privacy

Privacy Preserving Data Mining (PPDM)

Statistical Disclosure Control (SDC)

http://www.ppdm.cat


Microaggregation: Microaggregation is a perturbative data protection method. Given the original data file, it builds small clusters from the records (each cluster should have at least k and fewer than 2k elements) and then replaces each original record by the centroid of its cluster. k is a parameter of the method: the larger k is, the larger the information loss and the lower the disclosure risk.
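As a rough illustration of the definition above, the following Python sketch microaggregates a single numerical variable. It assumes the simplest possible grouping (sort the values, cut them into consecutive groups of k, and merge the remainder into the last group). This is not an optimal partition, and the function name `microaggregate_sorted` is hypothetical.

```python
def microaggregate_sorted(values, k):
    """Hypothetical helper: simple (non-optimal) univariate microaggregation.

    Sorts the values, cuts them into consecutive groups of k, merges any
    remainder into the last group (so each group has between k and 2k-1
    elements), and replaces every value by its group mean (the centroid).
    The result is returned in the original record order.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda i: values[i])  # record indices by value
    n_groups = n // k
    protected = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k  # last group takes the rest
        idx = order[start:end]
        centroid = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            protected[i] = centroid
    return protected


protected = microaggregate_sorted([10, 2, 8, 4, 6, 12, 1], k=3)
```

With k = 3, the values 1, 2 and 4 form one group (replaced by 7/3) and 6, 8, 10 and 12 form the other (replaced by 9.0), so every protected value is shared by at least three records.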
Univariate vs. Multivariate: When the original data file consists of several variables, microaggregation can be applied in different ways. Univariate microaggregation applies it to each variable independently. In contrast, multivariate microaggregation builds the clusters considering all the variables (or a subset of them) at the same time. Optimal univariate microaggregation can be computed in polynomial time, whereas optimal multivariate microaggregation is NP-hard. For this reason, heuristic methods have been developed for the multivariate case.
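For the univariate case, an optimal partition can be found by dynamic programming over the sorted values, because the groups of an optimal solution can be taken to be consecutive in sorted order. The sketch below assumes that optimality means minimising the within-group sum of squared errors (SSE) with group sizes between k and 2k-1. The function name is hypothetical, and the code is not claimed to be the exact algorithm used in the publications listed on this page.

```python
def optimal_univariate_microaggregation(values, k):
    """Hypothetical sketch: optimal univariate microaggregation (minimum SSE).

    Dynamic programming over the sorted values: best[j] is the minimal SSE
    for the first j sorted values, where the last group has between k and
    2k-1 consecutive values. Returns (protected values, optimal SSE).
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    x = [values[i] for i in order]
    # Prefix sums of values and squared values give the SSE of x[a:b] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a, b):
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    best = [inf] * (n + 1)
    prev = [-1] * (n + 1)  # start index of the last group in the optimum
    best[0] = 0.0
    for j in range(k, n + 1):
        for size in range(k, min(2 * k - 1, j) + 1):
            a = j - size
            if best[a] < inf and best[a] + sse(a, j) < best[j]:
                best[j] = best[a] + sse(a, j)
                prev[j] = a
    # Walk back through prev[] and replace each group by its mean.
    protected = [0.0] * n
    j = n
    while j > 0:
        a = prev[j]
        centroid = (s[j] - s[a]) / (j - a)
        for p in range(a, j):
            protected[order[p]] = centroid
        j = a
    return protected, best[n]


protected, cost = optimal_univariate_microaggregation([12, 1, 11, 2, 10, 3], k=3)
```

Here the optimal groups are {1, 2, 3} and {10, 11, 12}, with a total SSE of 4.0. The double loop makes the sketch O(n·k) after sorting. No comparable exact shortcut is available for the multivariate case.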
Microaggregation and k-anonymity: Microaggregation ensures k-anonymity only when multivariate microaggregation is applied to all the variables of the data file at the same time. In practice this is often not the case: the set of variables is usually partitioned, and microaggregation is applied independently to each element of the partition, because this yields a lower information loss (higher data utility) than microaggregating the whole set. In that setting k-anonymity is not guaranteed, and a trade-off between information loss and disclosure risk has to be found.
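The following sketch illustrates why microaggregating each variable independently does not guarantee k-anonymity. It assumes both variables are quasi-identifiers and uses a simple sort-based grouping for each variable. The helper names and the four toy records are hypothetical. After protection, each variable on its own is 2-anonymous, but every combination of values is still unique.

```python
from collections import Counter


def univariate_microaggregation(column, k):
    """Hypothetical helper: sort-based fixed-size grouping (not optimal)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    out = [0.0] * len(column)
    n_groups = len(column) // k
    for g in range(n_groups):
        end = len(column) if g == n_groups - 1 else (g + 1) * k
        idx = order[g * k:end]
        centroid = sum(column[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = centroid
    return out


def is_k_anonymous(rows, k):
    """True if every combination of values occurs in at least k rows."""
    return min(Counter(map(tuple, rows)).values()) >= k


records = [(1, 1), (2, 10), (10, 2), (11, 11)]
k = 2
# Partitioned setting: each variable is microaggregated on its own.
columns = [univariate_microaggregation(list(col), k) for col in zip(*records)]
protected = list(zip(*columns))
# protected == [(1.5, 1.5), (1.5, 10.5), (10.5, 1.5), (10.5, 10.5)]
```

Each column takes only the values 1.5 and 10.5, each twice, yet the four protected records are all distinct, so `is_k_anonymous(protected, 2)` is `False`. Grouping whole records in a single multivariate step would instead make every protected record equal to a centroid shared by at least k records.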
Our publications:
  • Domingo-Ferrer, J., Torra, V. (2001) A quantitative comparison of disclosure control methods for microdata, Confidentiality, disclosure, and data access: Theory and practical applications for statistical agencies. Doyle, P.; Lane, J.I.; Theeuwes, J.J.M.; Zayatz, L.V. eds., Elsevier, pp. 111-133. PDF @ URV
    • This paper compares the performance of several data protection methods with respect to the trade-off between disclosure risk and information loss (data utility). It includes several variants of microaggregation (all of them for numerical data), some of which were shown to perform well.
  • Torra, V. (2004) Microaggregation for categorical variables: a median based approach, Privacy in Statistical Databases, 2004. (Lecture Notes in Computer Science 3050 162-174) PDF @ Springer Link
    • In this paper we extended microaggregation, previously defined only for numerical data, so that it can be applied to categorical data (both ordinal and nominal).
  • Domingo-Ferrer, J., Torra, V. (2005) Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation, Data Mining and Knowledge Discovery, 11:2 195-212. PDF @ Springer Link
    • In this paper we combine numerical and categorical microaggregation. The microaggregation described in this paper follows MDAV (Maximum Distance to Average Vector), a heuristic approach to multivariate microaggregation. We also discuss the advantages of using microaggregation to achieve k-anonymity.
  • Nin, J., Herranz, J., Torra, V. (2008) How to Group Attributes in Multivariate Microaggregation. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS) 16:1 121-138 PDF @ World Scientific
    • Microaggregation is not usually applied to all attributes (variables) at once. Instead, the attributes are partitioned and each element of the partition is microaggregated independently. In this paper we discuss how to group the attributes so as to obtain the best performance with respect to the trade-off between disclosure risk and information loss (data utility).
  • Torra, V., Miyamoto, S. (2004) Evaluating fuzzy clustering algorithms for microdata protection, Privacy in Statistical Databases, 2004. (Lecture Notes in Computer Science 3050 175-186) PDF @ Springer Link
    • In this paper we present a heuristic approach to microaggregation that blurs the microclusters so that the disclosure risk is decreased. The approach is based on fuzzy clustering.
  • Nin, J., Torra, V. (2009) Analysis of the Univariate Microaggregation Disclosure Risk. New Generation Computing, Springer. PDF @ Springer
    • In this paper we attack a file protected with univariate microaggregation, showing that intruders implementing an ad hoc attack (dedicated software for attacking the database) can reidentify many more records than with a standard, generic approach. In addition, we show that in this case there is no uncertainty about whether a record has been reidentified or not, whereas some approaches to reidentification only give a probability of reidentification. This type of analysis is needed in order to apply data privacy with transparency.
  • Nin, J., Herranz, J., Torra, V. (2008) On the Disclosure Risk of Multivariate Microaggregation. Data and Knowledge Engineering (DKE), Elsevier, 67:3 399-412. Paper @ ScienceDirect
    • In this paper we attack a file protected using multivariate microaggregation. As in the previous paper, we show that an ad hoc attack can reidentify more records than a generic one. Again, this type of analysis is needed to apply masking methods with transparency (i.e., informing the user how the data have been protected).
  • Torra, V. (2008) Constrained Microaggregation: Adding Constraints for Data Editing, Transactions on Data Privacy 1:2 (2008) 86-104 Paper @ TDP (open access)
    • The paper discusses microaggregation when the data file to be protected satisfies some constraints between the variables and the protected file is expected to satisfy these constraints as well.
  • Nin, J., Torra, V. (2006) Extending microaggregation procedures for time series protection, Lecture Notes in Artificial Intelligence, 4259 899-908. (5th Int. Conf. on Rough Sets and Current Trends in Computing, RSCTC 2006). http://dx.doi.org/10.1007/11908029_93
    • This paper extends microaggregation so that it can be applied to time series. We also studied risk measures for protection methods for time series and evaluated the performance of our microaggregation approach on this type of data. See the next paper.
  • Nin, J., Torra, V. (2009) Towards The Evaluation of Time Series Protection Methods. Information Sciences, 179:11 1663-1677. http://dx.doi.org/10.1016/j.ins.2009.01.024
    • This paper presents a complete framework for the analysis of data protection methods for time series, covering both disclosure risk and information loss. We use this framework to analyse microaggregation for time series.
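MDAV, used in the 2005 paper above, can be sketched roughly as follows for purely numerical records with Euclidean distance. This is a simplified illustration under those assumptions, not the heterogeneous (numerical plus categorical) version described in that paper, and all helper names are hypothetical. In each round, the sketch takes the record r farthest from the centroid of the remaining records and groups r with its k-1 nearest neighbours. It then does the same for the record s farthest from r.

```python
import math


def _centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def _farthest(indices, data, ref):
    """Index (among `indices`) of the record farthest from point `ref`."""
    return max(indices, key=lambda i: math.dist(data[i], ref))


def _group_around(center, indices, data, k):
    """`center` plus its k-1 nearest neighbours among `indices`."""
    others = sorted((i for i in indices if i != center),
                    key=lambda i: math.dist(data[i], data[center]))
    return [center] + others[:k - 1]


def mdav(data, k):
    """Hypothetical MDAV-style sketch for numerical records.

    Returns (protected, groups): each record replaced by its group centroid,
    and the groups as lists of record indices.
    """
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= len(data)")
    remaining = list(range(len(data)))
    groups = []

    def take_group(center):
        nonlocal remaining
        group = _group_around(center, remaining, data, k)
        members = set(group)
        remaining = [i for i in remaining if i not in members]
        groups.append(group)

    while len(remaining) >= 3 * k:
        c = _centroid([data[i] for i in remaining])
        r = _farthest(remaining, data, c)        # farthest from the centroid
        take_group(r)
        s = _farthest(remaining, data, data[r])  # farthest from r, among the rest
        take_group(s)
    if len(remaining) >= 2 * k:
        c = _centroid([data[i] for i in remaining])
        take_group(_farthest(remaining, data, c))
    if remaining:                                # between k and 2k-1 records left
        groups.append(remaining)

    protected = [None] * len(data)
    for group in groups:
        c = _centroid([data[i] for i in group])
        for i in group:
            protected[i] = c
    return protected, groups
```

Every protected record equals the centroid of a group of at least k records. When a single run covers all the variables, the output is therefore k-anonymous, which matches the condition given in the "Microaggregation and k-anonymity" paragraph.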


Cite this site as:
V. Torra, Data privacy, Springer, 2017. Associated website: http://www.ppdm.cat/dp/

Vicenç Torra, Last modified: 15:34, December 11, 2014.