[notebook] MLlib: machine learning on PySpark

Today I gave a tutorial on MLlib in PySpark. I am posting the notebook here for anyone who might be interested =)

MLlib is a package of Spark (also available in PySpark), so no extra installation is needed once you have Spark up and running. MLlib contains several (sub-)packages that can be useful for machine learning on big data.

In this lab we will look at a few things from Statistics, Regression, Classification, and Clustering. The documentation often comes with examples too, so I encourage you to take a look: MLlib on PySpark
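As a quick sanity check before starting, the four topics above map onto sub-packages of `pyspark.mllib`. The snippet below is only a minimal sketch, not part of the original notebook: the helper name `available_subpackages` and the topic-to-module dictionary are my own illustration, and it simply tries to import each module and reports whether that worked in your environment.

```python
import importlib

# Topics covered in this lab, mapped to the pyspark.mllib modules that
# provide them. The dictionary itself is illustrative, not a Spark API.
MLLIB_SUBPACKAGES = {
    "Statistics": "pyspark.mllib.stat",
    "Regression": "pyspark.mllib.regression",
    "Classification": "pyspark.mllib.classification",
    "Clustering": "pyspark.mllib.clustering",
}


def available_subpackages():
    """Return {topic: True/False} depending on whether each module imports."""
    found = {}
    for topic, module_name in MLLIB_SUBPACKAGES.items():
        try:
            importlib.import_module(module_name)
            found[topic] = True
        except ImportError:
            # PySpark (or one of its dependencies, e.g. NumPy) is missing.
            found[topic] = False
    return found


if __name__ == "__main__":
    for topic, ok in available_subpackages().items():
        print(f"{topic}: {'available' if ok else 'not importable'}")
```

If everything is reported as available, no further installation should be needed for the rest of the lab.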

Dataset

In this lab, we will use data about the 2016 US Presidential elections. The data is available on Kaggle: here


“We are bad judges of friendship”

Our latest findings are getting some good press.

“Only half of your friends would consider you their friend, a new study suggests. People have a very poor perception of friendship ties, and this limits their ability to influence their ‘friends,’ according to the research.”

Source: We are bad judges of friendship, new study shows: Researchers find inability to determine who our real friends are limits our powers of persuasion

For the scientific article: Are You Your Friends’ Friend? Poor Perception of Friendship Ties Limits the Ability to Promote Behavioral Change [PLOS ONE].

Workshop on paper writing

Today I participated in an Author Workshop by Jaap van Harten (Elsevier) titled “How to write great papers.”

The main takeaway of the workshop was that many authors make so many basic, obvious mistakes when writing their papers that, if you manage to avoid them, you are already a step ahead of many others.

In the following, I assume you have the content of your paper ready: you have already done your experiments, collected your data, and made sense of the results. But it does not hurt to read on even if you are not there yet: you will get there sooner or later 🙂
