A Hybrid Approach to Real-Time Fault Detection and Recovery in Federated Cloud Systems Using Federated Byzantine Fault-Tolerant Cloud Recovery (FBFT-CR)


Shamsudeen E

Abstract

The increasing reliance on federated cloud environments for large-scale distributed applications raises new challenges for fault tolerance and system availability. Traditional fault-tolerance mechanisms often struggle to preserve system integrity under diverse failures, including hardware malfunctions, network faults, and Byzantine (arbitrary or malicious) faults. To address these challenges, we propose Federated Byzantine Fault-Tolerant Cloud Recovery (FBFT-CR), a framework that combines real-time fault detection, hybrid recovery, and Byzantine fault tolerance. FBFT-CR integrates dynamic, machine-learning-based fault prediction; hybrid recovery techniques such as checkpointing and replication; and a Byzantine Fault Tolerance (BFT) protocol to keep a federated cloud environment reliable. The proposed approach provides high availability, minimizes downtime, and maintains correctness even when some nodes are faulty or malicious. Experimental results demonstrate that FBFT-CR mitigates system failures efficiently while preserving performance and scalability in a federated cloud infrastructure.
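
The abstract relies on a BFT protocol without stating how many faulty nodes it can survive. As general background, and not as the FBFT-CR implementation (which the abstract does not specify), the following Python sketch shows the standard PBFT-style arithmetic: n replicas tolerate at most f = ⌊(n − 1)/3⌋ Byzantine faults, any two agreement quorums of ⌈(n + f + 1)/2⌉ replicas overlap in at least f + 1 replicas (so in at least one correct one), and a client can trust a result once f + 1 replicas report the same value. The function names are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n >= 3f + 1 (the classic PBFT replica bound)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest quorum q = ceil((n + f + 1) / 2) such that any two quorums
    intersect in at least f + 1 replicas; equals 2f + 1 when n = 3f + 1."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def accept_reply(replies: Sequence[Hashable], n: int) -> Optional[Hashable]:
    """Return a result reported by at least f + 1 replicas, else None.

    With at most f Byzantine replicas, f + 1 matching replies guarantee
    that at least one correct replica produced the result.
    """
    if not replies:
        return None
    f = max_byzantine_faults(n)
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None
```

For example, with n = 4 replicas the system tolerates f = 1 fault and uses quorums of 3, and `accept_reply(["ok", "ok", "corrupt", "ok"], 4)` returns `"ok"` because three replicas agree. Under these assumptions a federation must run at least four replicas per service to mask a single malicious node.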


Section: Articles
