Scatter Plot
Definition
A chart that displays values for two variables as points on a coordinate system
- Superclass of the bubble plot
Anatomy
A scatter plot is a two-dimensional graphical representation in which each point stands for one observation (a data point). The point's horizontal position is the observation's value of the x-variable, and its vertical position is its value of the y-variable.
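To make the mapping from values to positions concrete, here is a minimal sketch that converts data values to pixel positions by linear scaling. The function name `to_screen`, the top-left pixel origin and the example values are illustrative assumptions. Real charting libraries also add margins, padding and tick placement.

```python
def to_screen(xs, ys, width, height):
    """Map data values linearly to pixel positions (hypothetical helper).

    Assumes pixel (0, 0) is the top-left corner, as in most screen
    coordinate systems, so larger y-values end up higher on screen.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range (all values identical) to avoid dividing by zero.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    points = []
    for x, y in zip(xs, ys):
        px = (x - x_min) / x_span * width
        # Flip the vertical axis: the largest y-value maps to pixel row 0.
        py = height - (y - y_min) / y_span * height
        points.append((px, py))
    return points


# Three hypothetical observations: x = height in cm, y = weight in kg.
print(to_screen([150, 170, 190], [50, 65, 80], width=400, height=300))
# → [(0.0, 300.0), (200.0, 150.0), (400.0, 0.0)]
```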
Interpreting a scatter plot
Points can cluster, spread out, or follow a discernible pattern, revealing relationships between the two variables. Scatter plots are so central to this kind of exploration that terms like “clusters” and “linear relationship” are most often illustrated with one.
When examining a scatter plot, you may look for:
- Linear relationships, in which points cluster along a line
- Nonlinear relationships, in which points may cluster along a curve or other nonlinear shape
- Clusters of points
- Outliers or unusual data points
- The density of points in different regions of the plot
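Of the patterns above, a linear relationship is the one most often summarized by a single number, Pearson's correlation coefficient r. The sketch below computes it from its standard definition with a hypothetical helper `pearson_r`. It also shows why looking at the plot matters: points lying exactly on a parabola, a clear nonlinear relationship, give r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    # Covariance and variances without the 1/(n-1) factor, which cancels out.
    cov = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = math.fsum((x - mean_x) ** 2 for x in xs)
    var_y = math.fsum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # points on a rising line → 1.0
print(pearson_r(xs, [x * x for x in xs]))      # points on a parabola → 0.0
```

An r near 0 therefore means "no linear relationship", not "no relationship".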
When and how to use a scatter plot
Strengths
- Effectively displays the relationship between two continuous variables
- Reveals correlation patterns
- Shows individual data points
- Highlights potential trends and clusters
- Identifies potential outliers
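Trends and outliers are usually judged by eye. A simple numeric counterpart is to fit a least-squares line and flag points that lie far from it. In the sketch below, the helper names, the threshold `k = 2` and the use of the root-mean-square residual as the spread estimate are all assumptions, not a standard rule.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def flag_outliers(xs, ys, k=2.0):
    """Indices of points whose residual exceeds k times the RMS residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # RMS residual as the spread estimate (an assumption; robust measures
    # such as the median absolute deviation are common alternatives).
    rms = math.sqrt(math.fsum(r * r for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r) > k * rms]


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] += 20  # hypothetical data-entry error
print(fit_line(xs, ys))       # slope pulled slightly above 2 by the outlier
print(flag_outliers(xs, ys))  # → [5]
```

Note that the outlier itself tilts the fitted line. That is one reason a scatter plot of the raw points remains useful next to any summary fit.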
Caveats and limitations
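- Only continuous (numerical) variables can be mapped to the x and y coordinates; categorical variables need another encoding, such as color or shape.
- Overlapping points (overplotting) can hide how many observations fall in a region, especially with large datasets or rounded values.
- A visible correlation between the two variables does not imply causation.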
Recommendations
- Label axes clearly
- Consider semi-transparent markers so that overlapping points remain distinguishable
- Consider density-based visualizations (e.g., a hexbin plot) when there are too many points to distinguish individually
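Both recommendations address overplotting, and a short calculation shows where transparency stops helping. Assuming standard alpha compositing, n fully overlapping markers of opacity alpha reach a combined opacity of 1 - (1 - alpha)^n. With alpha = 0.1, ten stacked points already reach about 65% opacity and fifty about 99.5%, so dense regions become indistinguishable. Counting points per bin avoids this. The hypothetical `grid_counts` helper below uses square cells as a simpler stand-in for the hexagonal cells of a hexbin plot.

```python
from collections import Counter


def stacked_opacity(alpha, n):
    """Combined opacity of n identical, fully overlapping markers."""
    # Each layer lets (1 - alpha) of whatever lies beneath show through.
    return 1 - (1 - alpha) ** n


def grid_counts(xs, ys, cell_size):
    """Count points per square cell (simplified stand-in for hexagonal bins)."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # Floor division gives the cell index and also handles negative values.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


for n in (1, 10, 50):
    print(n, round(stacked_opacity(0.1, n), 3))  # 0.1, 0.651, 0.995

print(grid_counts([0.2, 0.4, 0.9, 1.5, 3.1], [0.1, 0.3, 0.8, 0.2, 2.9], cell_size=1.0))
```

Each cell's count would then be drawn as a color intensity instead of as individual markers.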
Variations and related visualizations
- A scatter plot can encode additional variables through point size (bubble plot), color, or shape (for a categorical variable).
- The 3D scatter plot extends the scatter plot to three dimensions, displaying three continuous variables as x-, y- and z-coordinates.
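For the bubble-plot encoding, a common convention is to make each marker's area, rather than its radius, proportional to the third variable. Scaling the radius linearly would make a value twice as large look four times as big. For instance, matplotlib's `scatter` takes its `s` argument as an area in points squared. The `bubble_radii` helper below is a hypothetical sketch of the conversion, assuming non-negative values.

```python
import math


def bubble_radii(values, max_radius):
    """Radii that make each bubble's area proportional to its value."""
    if any(v < 0 for v in values):
        raise ValueError("bubble sizes need non-negative values")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    # Area ∝ value implies radius ∝ sqrt(value); the largest value gets max_radius.
    return [max_radius * math.sqrt(v / largest) for v in values]


print(bubble_radii([25, 100], max_radius=20))  # → [10.0, 20.0]
```

A value four times larger gets a radius only twice as large, so its area, and its visual weight, is four times larger.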
Links
Wikidata entity: Q1045782
Wikipedia page: Scatter plot