AN ADVANCED BOTTOM UP GENERALIZATION APPROACH FOR BIG DATA ON CLOUD
Abstract
The scale of data in many cloud applications is increasing tremendously in line with the Big Data trend, making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. In big data applications, data privacy is one of the most pressing concerns, because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets, owing to their insufficient scalability. In this paper, we propose a scalable Advanced Bottom-Up Generalization approach for data anonymization based on MapReduce on cloud. To make full use of the parallel capability of MapReduce on cloud, the specializations required in an anonymization process are performed in two steps. First, the original data sets are split into a group of smaller data sets, which are anonymized in parallel, producing intermediate results. Then, the intermediate results are merged into one and further anonymized to achieve consistent k-anonymous data sets. A group of MapReduce jobs is deliberately designed and coordinated to perform the specializations on the data sets collaboratively.
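
The split, anonymize, merge, and re-anonymize flow described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation. The three-level age hierarchy, the function names (`generalize_age`, `min_level`, `two_phase_anonymize`), and the rule of merging intermediate results by taking the most general level are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical generalization hierarchy for a single quasi-identifier (age):
# level 0 = exact value, level 1 = 10-year band, level 2 = fully suppressed.
MAX_LEVEL = 2


def generalize_age(age, level):
    """Return the age generalized to the given hierarchy level."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def is_k_anonymous(values, k):
    """True if every distinct generalized value occurs at least k times."""
    return all(count >= k for count in Counter(values).values())


def min_level(ages, k, start=0):
    """Bottom-up search: raise the level from `start` until k-anonymity holds.

    Falls back to MAX_LEVEL if no level satisfies k (e.g. fewer than k records).
    """
    for level in range(start, MAX_LEVEL + 1):
        if is_k_anonymous([generalize_age(a, level) for a in ages], k):
            return level
    return MAX_LEVEL


def two_phase_anonymize(ages, k, num_parts):
    # Step 1: split the data and anonymize each part independently.
    # In a MapReduce setting these would run in parallel; here they run in a loop.
    parts = [ages[i::num_parts] for i in range(num_parts)]
    levels = [min_level(part, k) for part in parts if part]

    # Step 2: merge the intermediate results (assumption: keep the most general
    # level), then generalize further on the full data set if still needed.
    merged_level = max(levels, default=0)
    final_level = min_level(ages, k, start=merged_level)
    return final_level, [generalize_age(a, final_level) for a in ages]


if __name__ == "__main__":
    level, released = two_phase_anonymize([21, 23, 25, 27, 31, 33, 35, 37], k=2, num_parts=2)
    print(level, released)
```

Because each part is smaller than the full data set, the merged level in this sketch can be more general than the full data set strictly requires; it trades some data utility for the ability to process the parts independently.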