Several Python resources

Steve Simon

2019-05-02

I have not had the time to learn Python yet, but it is on my short-term list of research goals. I attended a very nice talk about Python and data science and tried to get a list of interesting resources in Python from that talk. Here is my incomplete and imperfect list.

Sphinx is a tool to autodocument your Python code, building reference pages from the docstrings in your source files. It is similar in spirit to R Markdown. A tutorial on Sphinx is available in HTML format on the Code and Chaos blog.
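As a rough illustration of what Sphinx works from, here is a small function with a docstring written in the reStructuredText field style that the Sphinx autodoc extension can render. The function name and what it computes are made up for this example and did not come from the talk.

```python
def standardize(values):
    """Rescale a list of numbers to have mean 0 and standard deviation 1.

    :param values: the numbers to rescale (at least two, not all equal)
    :type values: list of float
    :returns: the rescaled numbers, in the original order
    :rtype: list of float
    """
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


# The docstring is ordinary Python data, which is what a documentation
# tool reads when it builds the reference pages.
print(standardize.__doc__)
```

With autodoc enabled, a directive such as `.. autofunction:: mymodule.standardize` in a `.rst` file would pull this docstring into the generated pages. Here `mymodule` is only a placeholder name.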

Zeppelin is a notebook for Python code that can also integrate code in Scala, Spark, SQL and many others, including R through its R interpreter. It is similar in spirit to RStudio or Jupyter. Zeppelin is an open-source system, documented on the Apache website in HTML format.

PySpark is the Python API for Spark, a system for distributed computing. It is also an open-source system, documented on the Apache website in HTML format.
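To give a feel for the split, process, and merge pattern behind distributed computing, here is a minimal word count in plain Python. This is not PySpark code and uses none of its API. The function names are invented for this sketch, and threads stand in for the worker machines of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Map step: count the words in one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_partitions=2):
    """Split the lines into partitions, count each one, then merge."""
    # Deal the lines out round-robin; a cluster would place each
    # partition on a different machine.
    partitions = [lines[i::n_partitions] for i in range(n_partitions)]
    # Threads stand in for the worker machines here.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Reduce step: add the per-partition counts together.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["spark is fast", "python is fun", "spark and python"]
totals = distributed_word_count(lines)
print(totals.most_common(3))
```

A system like Spark takes care of the partitioning, scheduling, and merging, so the user mostly writes the per-partition logic.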

PyTorch is a Python system for deep learning. Deep learning is a subset of machine learning built on neural networks with many layers. It is open source and available on a dedicated website in HTML format.
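For readers new to the idea, a neural network is at heart a stack of weighted sums with a simple nonlinear function in between. The sketch below shows one pass through a tiny two-layer network in plain Python. It does not use PyTorch, and the weights are made-up numbers rather than anything learned from data.

```python
def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sums, then an optional activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


def forward(inputs):
    """Pass the inputs through a hidden layer and an output layer."""
    # Made-up weights; a deep learning library would learn these from data.
    hidden = dense(inputs, [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
    return dense(hidden, [[2.0, 1.0]], [0.5])


prediction = forward([3.0, 1.0])
print(prediction)  # [5.5]
```

A deep learning library adds the parts left out here: adjusting the weights automatically to fit data, and doing the arithmetic efficiently on large arrays.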

Runawayhorse001 is the GitHub site for George Feng, who gave the talk listing all these resources. He is developing some interesting tutorial resources, mostly for Python. He works at DST Systems, a company headquartered in Kansas City that provides consulting in the financial and health care markets.

I asked a question at the end of this talk about running PySpark on a cluster of Raspberry Pis and got the suggestion to look at the Kubernetes website. Kubernetes is an open-source system for deploying and managing containers across a cluster of machines, which makes for a portable cloud solution.

Data Science KC, the group that sponsored this talk, has a Meetup page and a Slack channel. To use the Slack channel, you need to register as a free user at the Slack website.