Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the parameter alpha to study the density of points:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
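To see why a low alpha value exposes dense areas, it helps to look at how overlapping semi-transparent markers add up. The following is a minimal sketch that assumes plain "over" compositing, where each marker lets a fraction (1 - alpha) of what lies beneath it show through. The helper name stacked_opacity is hypothetical and is not part of Matplotlib or pandas.

```python
def stacked_opacity(alpha, n_points):
    """Combined opacity of n_points markers drawn on top of each other.

    Assumes plain "over" compositing: every marker lets a fraction
    (1 - alpha) of whatever is beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** n_points


# With alpha=0.1 an isolated point stays faint, while stacked points darken.
for n in (1, 5, 10, 50):
    print(f"{n:>2} overlapping points -> opacity {stacked_opacity(0.1, n):.2f}")
```

Under this assumption a single point is only 10% opaque, while ten points stacked on the same spot reach roughly 65%, which is why crowded districts stand out in the plot.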
The plot can convey more information through different colors, sizes, shapes, etc. Here we use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations, letting the size of each circle represent the district’s population (option s) and the color represent the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
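One detail about the s option is worth a quick check: Matplotlib treats s as a marker size in points squared, so it scales the area of a circle rather than its radius. The sketch below uses made-up population figures and a hypothetical helper, radius_ratio, to show how much wider one circle is drawn than another under that assumption.

```python
import math


def radius_ratio(pop_a, pop_b, scale=100):
    """How many times wider the marker for pop_a is than the one for pop_b.

    Mirrors s=housing["population"]/100: s is proportional to marker area,
    so the radius grows with the square root of s.
    """
    s_a = pop_a / scale
    s_b = pop_b / scale
    return math.sqrt(s_a / s_b)


# Made-up example: a district with four times the population
# gets a circle that is only twice as wide.
print(radius_ratio(4000, 1000))
```

If you would rather have the radius itself track population, one option is to square the scaled value before passing it as s.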
Scatter Matrix
from pandas.plotting import scatter_matrix

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
A histogram is a useful way to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For a single attribute, you can use the following statement:
housing["median_income"].hist()
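To make the bins parameter concrete, here is a small pure-Python sketch of what a histogram counts: the value range is cut into equal-width intervals and each value is assigned to one of them. The function histogram_counts and the income figures are made up for illustration; this is a simplification and not how pandas or Matplotlib implement hist().

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    width = (hi - lo) / bins
    if width == 0:
        # Simplification: if all values are equal, put them in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value belongs to the last bin, so clamp the index.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


incomes = [1.5, 2.0, 2.2, 3.1, 3.3, 3.4, 4.8, 5.0, 7.5, 9.5]  # made-up values
print(histogram_counts(incomes, bins=4))  # -> [6, 2, 0, 2]
```

A larger number of bins, such as the 50 used above, gives narrower intervals and a finer view of the distribution, at the cost of noisier counts per bin.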
Correlation Plot
We can calculate the correlation coefficient between each pair of attributes using the corr() method and inspect the values in order with sort_values():
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
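By default, corr() computes the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). As a rough illustration of what that number measures, the sketch below computes it by hand for a few made-up columns and sorts the results the way sort_values(ascending=False) would. The pearson helper and all of the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up columns standing in for the housing attributes.
value = [110.0, 190.0, 320.0, 390.0, 500.0]
columns = {
    "median_income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "housing_median_age": [50.0, 42.0, 29.0, 20.0, 11.0],
}

# Correlation of each column with the target, highest first.
correlations = {name: pearson(col, value) for name, col in columns.items()}
for name, r in sorted(correlations.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {r:+.3f}")
```

Keep in mind that this coefficient only captures linear relationships, so a value near zero does not by itself rule out a nonlinear dependence.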
We can also use the scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.
from pandas.plotting import scatter_matrix

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))