I wrote this launcher script a while back expressly for that purpose. I wanted to be able to interact with the pyspark shell from within the bpython(1) code-completion interpreter and WING IDE, or any IDE for that matter, because they offer code completion as well as a complete development experience. Learning Spark core by just typing 'pyspark' isn't good enough. So I wrote this. It was written in a Cloudera CDH5 environment, but with a little tweaking you can get it to work in whatever your environment is (even manually installed ones).
How to use:
NOTE: You can place all of the following in your .profile (or equivalent).

(1) linux$ export MASTER='yarn-client | local[NN] | spark://host:port'
(2) linux$ export SPARK_HOME=/usr/lib/spark         # Yours will vary.
(3) linux$ export JAVA_HOME=/usr/java/latest        # Yours will vary.
(4) linux$ export NAMENODE='vps00'                  # Yours will vary.
(5) linux$ export PYSTARTUP=${PYTHONSTARTUP}        # See the in-line comments for why this alias to PYTHONSTARTUP is needed.
(6) linux$ export HADOOP_CONF_DIR=/etc/hadoop/conf  # Yours will vary. This one may not be necessary to set. Try and see.
(7) linux$ export HADOOP_HOME=/usr/lib/hadoop       # Yours will vary. This one may not be necessary to set. Try and see.
(8) linux$ bpython -i /path/to/script/below         # The moment of truth. Note that this is 'bpython' (not plain 'python', which would not give you the code completion you want).

>>> sc
<pyspark.context.SparkContext object at 0x2798110>
>>>
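With the context live, a quick smoke test confirms the shell is wired up correctly. The snippet below is illustrative only and uses nothing beyond the core RDD API; the 'yarn-client' output assumes that is what you set MASTER to:

>>> sc.master                                        # which master did we attach to?
u'yarn-client'
>>> sc.parallelize(range(100)).filter(lambda x: x % 2 == 0).count()
50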
Now for use with an IDE, you simply determine how to specify the equivalent of a PYTHONSTARTUP script for that IDE, and set that to '/path/to/script/below'. For example, as I described in the in-line comments below, for WING IDE you simply set the key/value pair 'PYTHONSTARTUP=/path/to/script/below' inside the project's properties section.
See in-line comments for more information.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
#
# ===========================================================================
# Author: Noel Milton Vega (PRISMALYTICS, LLC.)
# ===========================================================================
# Start-up script for 'python(1)', 'bpython(1)', and Python IDE interpreters
# when you want a 'client-mode' SPARK Shell (i.e. interactive SPARK shell)
# environment either LOCALLY, on a SPARK Standalone Cluster, or on a SPARK
# YARN cluster. The code-sense/intelligence of bpython(1) and IDEs in
# particular will aid in learning the SPARK core API.
#
# This script basically (1) first sets up an environment to launch a SPARK
# Shell, then (2) launches the SPARK Shell using the 'shell.py' python script
# provided in the distribution's SPARK_HOME; and finally (3) imports our
# favorite Python modules (for convenience; e.g. numpy, scipy, etc.).
#
# IMPORTANT:
# DON'T RUN THIS SCRIPT DIRECTLY. It is meant to be read in by interpreters
# (similar, in that respect, to a PYTHONSTARTUP script).
#
# Thus, there are two ways to use this file:
# # We can't refer to PYTHONSTARTUP inside this file b/c that causes a recursion
# # loop when calling this from within IDEs. So in step (0) we alias PYTHONSTARTUP
# # to PYSTARTUP at the O/S level, and use that alias here (no conflict with that).
# (0): user$ export PYSTARTUP=${PYTHONSTARTUP} # We can't use PYTHONSTARTUP in this file
# (1): user$ export MASTER='yarn-client | local[NN] | spark://host:port'
#      user$ bpython|python -i /path/to/this/file
#
# (2): From within your favorite IDE, specify it as your python startup
#      script. For example, from within a WINGIDE project, set the following
#      variables within a WING Project: 'Project -> Project Properties':
#      'PYTHONSTARTUP=/path/to/this/very/file'
#      'MASTER=yarn-client | local[NN] | spark://host:port'
# ===========================================================================
import sys, os, glob, subprocess, random
namenode = os.getenv('NAMENODE')
SPARK_HOME = os.getenv('SPARK_HOME')
# =================================================================================
# This function emulates the action of "source" or '.' that exists in bash(1),
# and can be used to set PYTHON environment variables (in Python's os.environ dict).
# =================================================================================
def source(script, update=True):
    proc = subprocess.Popen(". %s; env -0" % script, stdout=subprocess.PIPE, shell=True)
    output = proc.communicate()[0]
    # Parse the NUL-delimited 'env -0' output into {NAME: value} pairs.
    env = dict(line.split('=', 1) for line in output.split('\x00') if line)
    if update:
        os.environ.update(env)
    return env
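The script is truncated above, but the header's three-step outline suggests how it proceeds. Below is a minimal sketch of those remaining steps, not the author's original code: it assumes the conventional spark-env.sh location under ${SPARK_HOME}/conf and the stock layout where shell.py lives under ${SPARK_HOME}/python/pyspark (both hold in typical CDH5 installs, but verify for yours):

# --- Hedged sketch of the remaining steps from the header's outline ---
# (1) Set up the environment: pull in whatever spark-env.sh exports.
#     The conf path below is an assumption; your distribution may differ.
source(os.path.join(SPARK_HOME, 'conf', 'spark-env.sh'))

# Make pyspark and its bundled py4j importable from this interpreter.
# The py4j zip's version varies by release, so glob for it.
sys.path.insert(0, os.path.join(SPARK_HOME, 'python'))
for py4j_zip in glob.glob(os.path.join(SPARK_HOME, 'python', 'lib', 'py4j-*-src.zip')):
    sys.path.insert(0, py4j_zip)

# (2) Launch the SPARK Shell: reading in shell.py creates the SparkContext
#     'sc' in this interpreter's namespace (Spark releases of that era pick
#     the master up from ${MASTER}).
execfile(os.path.join(SPARK_HOME, 'python', 'pyspark', 'shell.py'))

# (3) Import our favorite Python modules for convenience.
import numpy, scipy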
manpreet · Best Answer · 2 years ago
The original question: I can run PySpark from the command line and everything works fine:

Welcome to
      [Spark ASCII-art banner elided]
Using Python version 2.7.6 (default, May 27 2014 14:50:58)

However, when I try to do this from a Python IDE, it doesn't work. How do I import pyspark like other Python libraries such as numpy, scikit-learn, etc.? Working in the terminal is fine; I just want to work in the IDE.