Scatter Plot

Definition

A chart that displays the values of two variables as points on a two-dimensional coordinate system.

Anatomy

A scatter plot is a two-dimensional chart in which each point represents one observation (a data point). The point's horizontal position is given by the observation's value for the x-variable, and its vertical position by its value for the y-variable.
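
To make the mapping from observations to positions concrete, here is a minimal sketch that draws a scatter plot as plain text. The function name `text_scatter`, the grid size, and the sample data are all hypothetical choices for illustration. A real plotting library would handle axes, ticks, and labels as well.

```python
def text_scatter(points, width=20, height=10):
    """Render (x, y) observations as '*' marks on a character grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # The x-value decides the column, the y-value decides the row.
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the text output, so the y-axis is flipped.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(row) for row in grid)


# Hypothetical observations: (hours studied, exam score).
observations = [(1, 52), (2, 55), (3, 61), (4, 70), (5, 74), (6, 83)]
plot = text_scatter(observations)
print(plot)
```

Each tuple becomes exactly one mark: the smallest x-value lands in the leftmost column and the largest y-value in the top row.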

Interpreting a scatter plot

Points can cluster, spread out, or follow a discernible pattern, revealing relationships between the two variables. Scatter plots are also the standard way to illustrate terms such as “clusters” and “linear relationship”.

When examining a scatter plot, you may look for:

  • Linear relationships, in which points cluster along a line
  • Nonlinear relationships, in which points may cluster along a curve or other nonlinear shape
  • Clusters of points
  • Outliers or unusual data points
  • Density of data point distribution
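
The difference between linear and nonlinear relationships can be illustrated numerically. The sketch below computes the Pearson correlation coefficient by hand for two small, made-up datasets. The helper name `pearson_r` and the data are assumptions for illustration, and the coefficient is only one possible summary of what a scatter plot shows.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]     # points lie exactly on a line
curved = [(x - 3) ** 2 for x in xs]  # points lie on a symmetric parabola

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

The parabola is a perfectly regular pattern, yet its linear correlation is zero. This is one reason to inspect the plot itself rather than rely on a single coefficient.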

When and how to use a scatter plot

Strengths

  • Effectively displays the relationship between two continuous variables
  • Reveals correlation patterns
  • Shows individual data points
  • Highlights potential trends and clusters
  • Identifies potential outliers

Caveats and limitations

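  • Scatter plots are designed for continuous x- and y-variables; a categorical variable has to be encoded another way, such as color or shape.
  • Overlapping points (overplotting) can hide how many observations share a region, especially in large datasets.
  • A correlation visible in a scatter plot does not imply causation.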

Recommendations

  • Label axes clearly
  • Consider transparency for overlapping points
  • Consider density-based visualizations (e.g., a hexbin plot) when there are too many points to distinguish individually
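
The idea behind density-based alternatives can be sketched by counting points per cell instead of drawing each point. The example below uses square bins as a simplified stand-in for the hexagonal cells of a hexbin plot. The function name `bin_counts`, the bin size, and the data are hypothetical.

```python
from collections import Counter


def bin_counts(points, bin_size):
    """Count how many (x, y) points fall into each square bin."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns every point to the bin containing it.
        key = (int(x // bin_size), int(y // bin_size))
        counts[key] += 1
    return counts


# Hypothetical data with two dense regions.
points = [
    (0.1, 0.2), (0.4, 0.3), (0.2, 0.9),
    (1.5, 1.5), (1.6, 1.4), (1.7, 1.9), (1.9, 1.1),
]
counts = bin_counts(points, bin_size=1.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

A density plot would then color each cell by its count, so heavily overlapped regions stay readable however many points they contain.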

Variations and related visualizations

  • A scatter plot can encode additional variables through the size of the points (bubble plot), their color, or their shape (suited to a categorical variable).
  • The 3D scatter plot extends the scatter plot to three dimensions, allowing three continuous variables to be displayed as x-, y- and z-coordinates.
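
For the bubble plot variation, a third variable has to be converted into a marker size. The sketch below assumes one common convention, which the text above does not specify: marker area, rather than radius, is made proportional to the value. The function name `bubble_radii`, the `max_radius` parameter, and the data are hypothetical.

```python
import math


def bubble_radii(values, max_radius=10.0):
    """Scale non-negative values to radii so marker area is proportional to value."""
    largest = max(values)
    if largest <= 0:
        return [0.0 for _ in values]
    # Area grows with radius squared, so take the square root of the ratio.
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical third variable, e.g. a population count per observation.
populations = [1, 4, 16]
radii = bubble_radii(populations)
print(radii)
```

Under this convention, a value four times larger gets a radius only twice as large, so the drawn area stays in proportion to the data.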

Links

Wikidata entity: Q1045782

Wikipedia page: Scatter plot