PySpark requires Java version 7 or later and Python version 2.6 or later. The official Spark documentation does mention about supporting Windows. So I had to first figure out if Spark and PySpark would work well on Windows. Often times, many open source projects do not have good Windows support. In case you need a refresher, a quick introduction might be handy. You do not have to be an expert, but you need to know how to start a Command Prompt and run commands such as those that help you move around your computer’s file system. I am also assuming that you are comfortable working with the Command Prompt on Windows. So the screenshots are specific to Windows 10. In this post, I describe how I got started with PySpark on Windows. Spark supports a Python programming API called PySpark that is actively maintained and was enough to convince me to start learning PySpark for working with big data. While I had heard of Apache Hadoop, to use Hadoop for working with big data, I had to write code in Java which I was not really looking forward to as I love to write code in Python. ![]() I decided to teach myself how to work with big data and came across Apache Spark.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |