Welcome back to Spark Tutorials at Learning Journal. It's time to take a pause and plan for the next set of Spark videos.
I started my discussion on Spark with an introduction. Then I talked about a couple of options to
help you set up a learning environment. Then I jumped into discussing the Apache Spark architecture.
A few videos earlier, I started my discussion on data frames. By now, you have learned three main
things about data frames.
Let me quickly recap.
- How to load data from CSV and how to infer or specify a schema.
- How to implement your business logic using transformations, UDFs, and SQL.
- How a Spark data frame goes through logical, physical, and execution planning.
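The three recap points above can be sketched in one small Scala program. This is only a minimal illustration, not code from the earlier videos: the file path `data/people.csv`, the columns `name` and `age`, and the `initial` UDF are all hypothetical names chosen for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RecapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RecapSketch")
      .master("local[*]") // local mode, suitable for a learning environment
      .getOrCreate()

    // 1. Load: read a CSV file with an explicitly specified schema.
    //    Assumption: the file has a header row and the columns name, age.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))

    val people = spark.read
      .option("header", "true")
      .schema(schema) // alternatively: .option("inferSchema", "true")
      .csv("data/people.csv") // hypothetical path

    // 2. Transform: a UDF applied through the data frame API, then a SQL query.
    val initial = udf { (name: String) =>
      if (name == null || name.isEmpty) "" else name.take(1).toUpperCase
    }
    val withInitial = people.withColumn("initial", initial(col("name")))

    withInitial.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name, initial, age FROM people WHERE age >= 18")

    // 3. Inspect: print the logical and physical plans Spark built for the query.
    adults.explain(true)

    spark.stop()
  }
}
```

Nothing is actually read or computed until an action runs, so `explain(true)` here only prints the plans. Adding something like `adults.show()` before `spark.stop()` would trigger the execution.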
In fact, that is all that you need.
- Load data.
- Transform data.
- Tune and optimize.
That's it. And so far, we have been able to touch on all three of those things. Now all that we need is to elaborate on
these three things and expand our knowledge and skills in each of these areas. Here is my plan for
the next set of Spark videos.
We will start with the data load and save process. In my next two videos, I will formalize the
data load and save process. I will create a set of examples and cover Spark's core set of data sources
like CSV, JSON, Parquet, ORC, and JDBC. The list of third-party Spark data sources is long. We
can't cover them all. However, I will also cover some third-party and community data sources to give you
an idea of using community connectors.
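As a rough preview of what the load and save process looks like, here is a minimal sketch that writes a small data frame in two of the built-in formats and reads it back. The output directory and the sample rows are assumptions made up for this illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadSaveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSaveSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, only for this illustration.
    val df = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")

    // Hypothetical output location; change it to suit your environment.
    val base = "/tmp/spark-load-save-sketch"

    // Save: the same writer API is used, only the format name changes.
    df.write.mode(SaveMode.Overwrite).format("parquet").save(s"$base/parquet")
    df.write.mode(SaveMode.Overwrite).format("json").save(s"$base/json")

    // Load: the reader mirrors the writer.
    val fromParquet = spark.read.format("parquet").load(s"$base/parquet")
    val fromJson = spark.read.format("json").load(s"$base/json")

    // Parquet stores the schema with the data; JSON has it inferred on read.
    fromParquet.printSchema()
    fromJson.printSchema()

    spark.stop()
  }
}
```

Other built-in sources such as `orc` and `jdbc` are selected through the same `format(...)` call. JDBC additionally needs connection options such as the URL and the table name.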
Then we will move on to the transformations. Since SQL is the easiest method to achieve complex
transformations, I will formally cover Spark SQL in a couple of videos.
Then I will publish another video to talk about datasets and how they differ from data frames.
The next thing that you need to know is a slightly complex and confusing part. I have seen beginners
struggling to model transformations using APIs. In a set of two or three videos, I will try to cover the syntactical
part of the data frame and dataset APIs.
And finally, we will close the batch processing part of Apache Spark with a couple of videos
on Spark RDD.
Great! So, get ready for the next set of 10 Spark videos.
I will keep Spark streaming in a separate playlist.
Do you think I missed something critical for the Spark batch processing scope? Write a comment below,
and I will collate your comments and plan them for the next set of Spark videos.
Thank you for watching Learning Journal. Keep learning and keep growing.