Posted by Admin: System Admin
Federated machine learning, which enables resource-constrained node devices (e.g., mobile phones and IoT devices) to learn a shared model while keeping the training data local, can provide privacy, security and economic benefits, provided that an effective communication protocol is designed. However, the communication protocol among different nodes could be exploited by attackers to launch data poisoning attacks, which have been shown to be a serious threat to most machine learning models. In this paper, we explore the vulnerability of federated machine learning. More specifically, we focus on attacking a federated multi-task learning framework, i.e., a federated learning framework that adopts a general multi-task learning formulation to handle statistical challenges. We formulate the problem of computing optimal poisoning attacks on federated multi-task learning as a bilevel program that is adaptive to an arbitrary choice of target nodes and source attacking nodes. We then propose a novel systems-aware optimization method, ATTack on Federated Learning (AT2FL), which efficiently derives the implicit gradients for poisoned data and further computes optimal attack strategies in federated machine learning. Our work is one of the earliest studies of data poisoning attacks on federated learning. Finally, experimental results on real-world datasets show that the federated multi-task learning model is very sensitive to poisoning attacks, whether the attackers directly poison the target nodes or indirectly poison related nodes by exploiting the communication protocol.

Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights within this project. These insights subsequently drive decision making within applications and businesses, ideally improving key growth metrics.
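As a hedged sketch, the bilevel program mentioned above can be written in the following generic form. The notation is our own assumption rather than the paper's: T is the set of target nodes, S the set of source attacking nodes, D_k and D̂_k the clean and injected data of node k (with D̂_k empty for nodes outside S), W = [w_1, ..., w_m] the per-node models, Ω the task-relationship matrix, ℓ a per-sample training loss, 𝓛 the attacker's loss on a target node's clean data and 𝓡 a multi-task regularizer.

```latex
\begin{aligned}
\max_{\{\hat{\mathcal{D}}_s\}_{s \in \mathcal{S}}} \quad
  & \sum_{t \in \mathcal{T}} \mathcal{L}\big(\mathcal{D}_t;\ \mathbf{w}_t^{*}\big) \\
\text{s.t.} \quad
  & (\mathbf{W}^{*}, \boldsymbol{\Omega}^{*}) \in \operatorname*{arg\,min}_{\mathbf{W},\ \boldsymbol{\Omega}}
    \sum_{k=1}^{m} \sum_{(\mathbf{x}, y) \in \mathcal{D}_k \cup \hat{\mathcal{D}}_k}
    \ell\big(\mathbf{w}_k^{\top}\mathbf{x},\, y\big)
    + \mathcal{R}(\mathbf{W}, \boldsymbol{\Omega})
\end{aligned}
```

The outer level is the attacker's objective on the target nodes, and the inner level is ordinary federated multi-task training on the partly poisoned data. Under this reading, a direct attack has S ∩ T nonempty, while an indirect attack poisons only nodes outside T and relies on the coupling that 𝓡(W, Ω) creates between nodes.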
Machine learning algorithms build a model from this project's data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are applied to a wide variety of datasets and applications where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Data poisoning attacks, which target machine learning algorithms, have become an urgent research topic in adversarial machine learning [4], [10]. An early attempt investigated poisoning attacks on support vector machines (SVMs) [5], using a gradient ascent strategy in which the gradient is obtained from properties of the SVM's optimal solution. Poisoning attacks have since been investigated on many other machine learning models, including autoregressive models [1], matrix-factorization-based collaborative filtering [13] and neural networks for graph data [22]. Beyond single-task learning models, [21] is perhaps the work most relevant to ours in the context of data poisoning attacks: it provides the first study of a much more challenging problem, namely the vulnerability of multi-task learning. However, the motivations of [21] and of our work differ significantly:
• The data samples in [21] are pooled together, unlike in federated machine learning, where models are built from datasets that are distributed across multiple nodes/devices while preventing data leakage.
• The algorithm proposed in [21] builds on the optimization methods of current multi-task learning approaches, which are not suited to handle the systems challenges in federated learning, such as high communication cost. Handling these challenges in the setting of data poisoning attacks is a key component of this work.

The main purpose of federated machine learning is to update classifiers quickly on modern massive datasets, and the training data it handles have the following properties [12]: 1) Non-IID: the data on each node/device may be drawn from a different distribution; 2) Unbalanced: the number of training samples on different nodes/devices may vary by orders of magnitude.
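The gradient-ascent idea attributed to [5] above can be illustrated with a deliberately small sketch. Instead of an SVM, it swaps in one-dimensional ridge regression, whose optimum has a closed form, so the attacker's gradient can be computed exactly by differentiating through the learner's optimal solution. All function names, the data, the box constraint `bound` and the fixed poisoned label `yp` are illustrative assumptions. This is not the attack of [5] or of this paper.

```python
def ridge_weight(xs, ys, lam):
    """Inner problem: argmin_w sum((w*x - y)^2) + lam*w^2, solved in closed form."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def val_loss(w, xv, yv):
    """Outer objective: squared error of the trained model on the attacker's validation set."""
    return sum((w * x - y) ** 2 for x, y in zip(xv, yv))


def poison_gradient(xp, yp, xs, ys, xv, yv, lam):
    """d val_loss / d xp, obtained by differentiating through the closed-form optimum."""
    num = sum(x * y for x, y in zip(xs, ys)) + xp * yp  # sum of x*y incl. poisoned point
    den = sum(x * x for x in xs) + xp * xp + lam        # sum of x^2 + lam incl. poisoned point
    w = num / den
    dw_dxp = (yp * den - num * 2 * xp) / den ** 2       # quotient rule on w = num / den
    dloss_dw = sum(2 * (w * x - y) * x for x, y in zip(xv, yv))
    return dloss_dw * dw_dxp                             # chain rule


def poison_attack(xs, ys, xv, yv, yp, lam=1.0, bound=3.0, lr=0.1, steps=200, xp=0.5):
    """Projected gradient ascent on the feature xp of one injected point (xp, yp)."""
    for _ in range(steps):
        g = poison_gradient(xp, yp, xs, ys, xv, yv, lam)
        xp = max(-bound, min(bound, xp + lr * g))  # ascent step, then clip to [-bound, bound]
    return xp


# Usage: clean data follow y = 2x; the attacker injects one point with flipped label yp = -3.
xs, ys = [1.0, 2.0, 3.0, -1.0, -2.0], [2.0, 4.0, 6.0, -2.0, -4.0]
xv, yv = [1.5, -0.5, 2.5], [3.0, -1.0, 5.0]
xp = poison_attack(xs, ys, xv, yv, yp=-3.0)
w_clean = ridge_weight(xs, ys, 1.0)
w_poisoned = ridge_weight(xs + [xp], ys + [-3.0], 1.0)
print(xp, w_clean, w_poisoned)
```

On this toy data, projected ascent drives the injected feature to the edge of the box (xp = 3), which pulls the learned slope from 1.9 down to 1.0. Clipping after each step is one simple way to keep the poisoned point inside a feasible region.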
Based on the distribution characteristics of the data, federated learning [18] can be categorized into: 1) horizontal (sample-based) federated learning, where datasets share the same feature space but differ in samples. A representative work is a multi-task-style federated learning system [15], which allows multiple nodes to complete separate tasks while preserving security and sharing knowledge; 2) vertical (feature-based) federated learning, where two datasets share the same sample ID space but differ in feature space. Several privacy-preserving machine learning methods have been presented for vertically partitioned data, e.g., secure linear regression [8] and gradient descent methods [9]; 3) federated transfer learning, where two datasets differ not only in samples but also in feature space.

Disadvantages
1) The system has no data poisoning attack model for federated machine learning.
2) There is no Data Integrity Check technique against data poisoning attacks.
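The difference between horizontal and vertical partitioning can be made concrete with a small sketch over a toy table. The table values, the party names and the helpers `horizontal_split` and `vertical_split` are hypothetical; they only show which rows and columns each party would hold, not any privacy-preserving protocol.

```python
# Toy table: sample ID -> {feature: value}. All values are made up for illustration.
TABLE = {
    "u1": {"age": 34, "income": 52, "purchases": 7},
    "u2": {"age": 28, "income": 41, "purchases": 2},
    "u3": {"age": 45, "income": 77, "purchases": 11},
    "u4": {"age": 51, "income": 63, "purchases": 4},
}


def horizontal_split(table, id_groups):
    """Horizontal (sample-based): each party holds full rows for its own subset of sample IDs."""
    return [{sid: dict(table[sid]) for sid in group} for group in id_groups]


def vertical_split(table, feature_groups):
    """Vertical (feature-based): each party holds every sample ID but only its own columns."""
    return [
        {sid: {f: row[f] for f in group} for sid, row in table.items()}
        for group in feature_groups
    ]


# Two regional parties with the same schema but different customers (horizontal).
region_a, region_b = horizontal_split(TABLE, [["u1", "u2"], ["u3", "u4"]])
# Two parties with the same customers but different attributes (vertical).
bank, retailer = vertical_split(TABLE, [["age", "income"], ["purchases"]])
print(sorted(region_a), sorted(bank["u1"]), sorted(retailer["u1"]))
```

In federated transfer learning, neither the sample IDs nor the feature sets of the parties would line up, so neither split above describes it.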
• The system proposes a bilevel optimization framework to compute optimal poisoning attacks on federated machine learning. To the best of our knowledge, this is one of the earliest attempts to explore the vulnerability of federated machine learning from the perspective of data poisoning.
• The proposed system derives an effective optimization method, ATTack on Federated Learning (AT2FL), to solve the optimal attack problem while addressing the systems challenges associated with federated machine learning.
• The proposed system demonstrates the empirical performance of the optimal attack strategy and of the proposed AT2FL algorithm on several real-world datasets. The experimental results indicate that the communication protocol among multiple nodes opens a door for attackers to attack federated machine learning.

Advantages
• The system is more effective because it is tailored to federated machine learning.
• The proposed system handles both direct and indirect attacks.
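To illustrate why the communication protocol opens a door for indirect attacks, here is a minimal sketch with two nodes, each learning a scalar weight. The nodes are coupled by a penalty mu * (w1 - w2)^2 that stands in for the multi-task regularizer. The attacker poisons only node 2 (the source) in order to degrade node 1 (the target). The inner problem is solved by alternating local updates in which the nodes exchange only their weights, and the outer gradient comes from implicitly differentiating the inner optimality conditions. This is a hypothetical toy, not the AT2FL algorithm; the coupling term, the data, `mu`, `bound` and all function names are assumptions.

```python
def local_stats(xs, ys):
    """Per-node sufficient statistics of a 1-D squared loss: (sum x^2, sum x*y)."""
    return sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys))


def federated_solve(a1, b1, a2, b2, mu, rounds=50):
    """Inner problem: minimise sum_k (a_k w_k^2 - 2 b_k w_k) + mu (w1 - w2)^2.

    Each round, node 1 and then node 2 solve their own subproblem exactly, given the
    other node's latest weight (block coordinate descent); only weights are exchanged.
    """
    w1 = w2 = 0.0
    for _ in range(rounds):
        w1 = (b1 + mu * w2) / (a1 + mu)
        w2 = (b2 + mu * w1) / (a2 + mu)
    return w1, w2


def inner_solution(xp, yp, node1, node2, mu):
    """Train the two-node federation after appending the poisoned point (xp, yp) to node 2."""
    a1, b1 = local_stats(*node1)
    a2, b2 = local_stats(*node2)
    a2, b2 = a2 + xp * xp, b2 + xp * yp
    w1, w2 = federated_solve(a1, b1, a2, b2, mu)
    return w1, w2, a1, a2


def target_loss(w1, xv, yv):
    """Attacker's objective: squared error of node 1's model on node 1's held-out data."""
    return sum((w1 * x - y) ** 2 for x, y in zip(xv, yv))


def implicit_grad(xp, yp, node1, node2, val1, mu):
    """d target_loss / d xp via implicit differentiation of the optimality conditions.

    At the optimum M w = b with M = [[a1 + mu, -mu], [-mu, a2 + mu]] and b = [b1, b2].
    Differentiating w.r.t. xp gives M dw = [0, yp - 2 * xp * w2], so dw1 = mu * r / det(M).
    """
    w1, w2, a1, a2 = inner_solution(xp, yp, node1, node2, mu)
    det = (a1 + mu) * (a2 + mu) - mu * mu
    r = yp - 2 * xp * w2
    dw1_dxp = mu * r / det
    xv, yv = val1
    dloss_dw1 = sum(2 * (w1 * x - y) * x for x, y in zip(xv, yv))
    return dloss_dw1 * dw1_dxp


def indirect_attack(node1, node2, val1, yp, mu=5.0, bound=3.0, lr=0.5, steps=100, xp=0.5):
    """Projected gradient ascent on the poisoned feature xp injected into node 2 only."""
    for _ in range(steps):
        g = implicit_grad(xp, yp, node1, node2, val1, mu)
        xp = max(-bound, min(bound, xp + lr * g))
    return xp


# Usage: both nodes are consistent with w = 2; node 1 is the target, node 2 the source.
node1 = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
node2 = ([1.0, -1.0, 2.0], [2.0, -2.0, 4.0])
val1 = ([1.5, 2.5], [3.0, 5.0])
xp = indirect_attack(node1, node2, val1, yp=-4.0)
w1_clean = inner_solution(0.0, 0.0, node1, node2, 5.0)[0]
w1_poisoned = inner_solution(xp, -4.0, node1, node2, 5.0)[0]
print(xp, target_loss(w1_clean, *val1), target_loss(w1_poisoned, *val1))
```

Node 1's own data are never modified, yet its held-out loss grows from essentially zero to about 1.5, because the coupling term carries the damage over from node 2.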