This presentation will include an introduction to Apache Spark, a general-purpose engine for batch, interactive, and streaming applications over large-scale data. Spark is built for mixed workloads, so we'll review how it operates and how to think about building Spark apps — particularly how those aspects fit well with Mesos. We will also show how to build and run Apache Spark on Apache Mesos. A demo, based on Mesosphere's free-tier service atop AWS (https://elastic.mesosphere.io/), will cover build, configuration, packaging, and deployment, and then run sample apps using the Spark Scala REPL. With those Spark jobs running, we will explore the Mesos and Spark consoles together, drilling down into details about system performance and troubleshooting.
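As a rough sketch of what the demo involves (not the exact steps from the session — the master host name and library path below are placeholders), launching the Spark Scala REPL against a Mesos cluster amounts to pointing it at the Mesos master:

```
# Tell Spark where the Mesos native library lives (path varies by install).
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so

# Start the Spark Scala REPL with Mesos as the cluster manager;
# <mesos-master-host> is a placeholder for your master's address.
./bin/spark-shell --master mesos://<mesos-master-host>:5050
```

From there, sample apps can be typed directly into the REPL and are scheduled across the cluster by Mesos.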
This session is intended primarily for a developer audience, though not much prior expertise is required; some familiarity with running Linux shell commands is needed. Attendees will learn how to run Spark atop Mesos, along with a basic introduction to building Spark apps in general.
Many people using Spark in the field have heard more about YARN, but may not realize that Spark on Mesos is much simpler to get up and running. There are also performance benefits, and a closer match for the mixed-workload design that Mesos is intended for. We will show how the two blend together well atop Linux for a full-stack solution.