Beginner Level Projects on Apache Spark
Apache Spark:
Spark, also an open-source framework for performing general data analytics on distributed computing cluster, was originally designed at the University of California, and later donated to the Apache Software Foundation. Spark’s real-time data processing capability provides it a substantial lead over Hadoop’s MapReduce.
Spark is a multi-stage RAM-capable compute framework with libraries for machine learning, interactive queries and graph analytics. It can run on a Hadoop cluster with YARN but also Mesos or in standalone mode. Apples and oranges, really. An interesting point to note here is that Spark is devoid of its own distributed filesystem. So, for distributed storage, it has to either use HDFS or other alternatives, such as MapR File System, Cassandra, OpenStack Swift, Amazon S3, Kudu, etc.
Now that we have caught a glimpse of Hadoop and Spark, it’s time to talk about different types of data processing they perform.
For beginner’s level use cases in Spark , refer the below links:
Comments
Post a Comment