With the increasing advancements in technology and development, the internet has become widely used over the last years and has become an everyday part of our lives and of those around us. With the increasing trends of social media and online social networking, billions of people use these services at the same time around the globe. With such a huge number of people, these social networks bring in a lot of traffic and a flood of data that comes from many resources.
Whether it be social media websites, finance, telecommunication, media or any of the many online businesses, all the big data clusters need to be processed into simple programming models or in simple words, all the big data needs to be distributed and stored. Hadoop is an open source framework, which allows all the big data to be distributed and stored. Hadoop is an efficient, reliable and easy to use and is an open source i.e it can be freely used in simple terms.
One of Hadoop architecture’s main components is the HDFS architecture. The Hadoop distributed file system can hold a very large amount of data and to store this huge amount of data, the storage is done across multiple machines. It is also highly suitable for storage and processing of data. Another component of the Hadoop Architecture is the MapReduce Overview. It is a framework which is used to process a large amount of data in a distributed manner across multiple machines.
MapReduce can distribute an input of data into pairs, shuffle the pairs and also reduce them. YARN, which is another very important component of the Hadoop Architecture is responsible for data processing resources such as the CPU, ram delivered, memory, etc. needed to successfully run any application. Important elements of YARN (Yet another Resource Negotiator) are the resource manager and the node manager, working as the master and the slave.
The resource manager works as the master and knows where all the slaves are located and how many sources they have. The resource manager helps to assign resources. The node manager being the slave, when it starts to work it makes an announcement to the resource manager or sends a signal. These were the important and the main components that the Hadoop architecture comprises of and are unique in their own way with their own set of characteristics.
Some of the characteristics and features of Hadoop include it being fault tolerant, which means it can easily look out for any failure and heal itself. It is also very cost effective and not expensive. Hadoop can store anything and has a huge storing capacity as well.
By focusing on the characteristics and the functions that can be performed by Hadoop, it can be concluded that Hadoop indeed is very useful to store large amounts of data and can greatly help lessen the burden that any party has. With its huge storing capacity and its availability for everyone, Hadoop already is being used by the many famous social networking sites. Famous search engines like yahoo and amazon also use Hadoop.
Facebook, which is the largest social network uses Hadoop for log processing and also for storing data. It is used in almost every domain. Some of the famous renowned social websites that use Hadoop include Amazon, Google, IBM, New York Times, Twitter, Linked In etc. The list goes on. There are also other several projects with Hadoop including ZooKeeper, Pig, Hive etc. These can further help, store and filter the data. The final statement can be made about Hadoop that it has definitely provided companies that have huge amounts of data with effective solutions. Since it is an open source, it is widely used by companies.