Project overview
This project explores demographic income patterns through cleaning, aggregation, visualization, and interpretation.
The case study focuses on the decisions behind making messy data understandable without overstating what the analysis can prove.
The product problem
Income data can be difficult to communicate because relationships between education, occupation, geography, and age are easy to flatten into misleading summaries.
The challenge was to build an analysis workflow that preserves nuance while still producing clear charts and takeaways.
Solution direction
The workflow separates cleaning, feature grouping, exploratory analysis, and visualization so each chart has a clear analytical purpose.
Visualizations are selected for comparison and pattern recognition rather than decorative dashboard density.
What I owned
- Cleaned and prepared census-style data for analysis.
- Grouped demographic dimensions for income comparison.
- Created visualizations for education, occupation, age, and geography patterns.
- Documented limitations and interpretation notes for responsible communication.
Feature system
Cleaning pipeline
Data preparation is separated from visualization so assumptions are easier to inspect.
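A minimal sketch of what a standalone cleaning step could look like, so that each assumption sits in one inspectable place. The column names, the "?" missing-value marker, and the function name `clean_income_data` are assumptions for illustration; the case study does not publish its schema or source.

```python
import pandas as pd

# Hypothetical text columns; the real dataset's schema is not shown in the case study.
TEXT_COLUMNS = ("education", "occupation")


def clean_income_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of `raw`; the input frame is left untouched."""
    df = raw.copy()
    for col in TEXT_COLUMNS:
        # Strip stray whitespace around category labels.
        stripped = df[col].str.strip()
        # Assumption: "?" marks a missing category, as in some census-style exports.
        df[col] = stripped.mask(stripped == "?")
    # Coerce income to numeric; entries that cannot be parsed become NaN.
    df["income"] = pd.to_numeric(df["income"], errors="coerce")
    # Rows without a usable income cannot contribute to income summaries.
    df = df.dropna(subset=["income"])
    return df.reset_index(drop=True)
```

Keeping this in its own function means a chart can be traced back to a short list of explicit cleaning rules rather than to edits scattered through plotting code.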
Grouped analysis
Income patterns are explored across meaningful demographic dimensions.
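As one possible illustration of grouping a continuous dimension before comparison, the sketch below bins age into bands with `pandas.cut` and takes the median income per band. The band edges, labels, and the `age` column name are hypothetical; the case study does not list the groupings it actually used.

```python
import pandas as pd

# Hypothetical band edges and labels, chosen only for illustration.
AGE_BINS = [17, 29, 44, 64, 120]
AGE_LABELS = ["18-29", "30-44", "45-64", "65+"]


def median_income_by_age_band(df: pd.DataFrame) -> pd.Series:
    """Median income per age band, using right-closed bins such as (17, 29]."""
    banded = df.assign(
        age_band=pd.cut(df["age"], bins=AGE_BINS, labels=AGE_LABELS)
    )
    # observed=True keeps only the bands that actually occur in the data.
    return banded.groupby("age_band", observed=True)["income"].median()
```

Where the band edges fall can change the apparent pattern, which is one reason category grouping belongs among the documented caveats.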
Readable visuals
Charts are chosen to make comparisons visible without unnecessary complexity.
Interpretation notes
Findings include caveats so charts are not presented as stronger evidence than they are.
How the system fits together
    # Median income for each education/occupation pair, highest first.
    income_by_group = (
        df.groupby(["education", "occupation"])
        .agg(median_income=("income", "median"))
        .sort_values("median_income", ascending=False)
    )
Engineering decisions
Avoiding misleading averages
decision.01 Use grouped medians and compare distributions where possible.
Because a median is less sensitive than a mean to a small number of very high earners, the analysis better reflects skewed income patterns.
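To make the difference concrete, here is a small sketch comparing the mean, median, and quartiles per group. The numbers are invented for illustration and are not data from the project; the `occupation` labels are likewise hypothetical.

```python
import pandas as pd

# Made-up incomes: one very high earner in "Sales" skews that group.
sample = pd.DataFrame({
    "occupation": ["Sales"] * 5 + ["Tech"] * 5,
    "income": [30000, 32000, 35000, 38000, 400000,
               50000, 52000, 55000, 58000, 60000],
})

summary = sample.groupby("occupation").agg(
    mean_income=("income", "mean"),
    median_income=("income", "median"),
    # Quartiles give a rough view of the spread, not just the center.
    p25=("income", lambda s: s.quantile(0.25)),
    p75=("income", lambda s: s.quantile(0.75)),
)
print(summary)
```

In this toy data the "Sales" mean (107,000) is higher than the "Tech" mean (55,000) even though the typical "Sales" income is lower (median 35,000 versus 55,000), which is the kind of reversal a mean-only chart would hide.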
Communicating limitations
decision.02 Document caveats around correlation, sampling, and category grouping.
The final story stays more responsible and analytically honest.
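One way the sampling caveat could be carried alongside the numbers themselves is to report each group's size next to its median and flag thin groups. This is a sketch, not the project's implementation: the threshold of 30, the function name, and the `small_sample` column are all assumptions.

```python
import pandas as pd

# Hypothetical cutoff; a suitable value depends on the dataset and the audience.
MIN_GROUP_SIZE = 30


def grouped_medians_with_caveats(
    df: pd.DataFrame, by: list[str], min_n: int = MIN_GROUP_SIZE
) -> pd.DataFrame:
    """Median income per group, with the group size and a small-sample flag."""
    out = df.groupby(by).agg(
        median_income=("income", "median"),
        # Number of non-missing incomes the median is based on.
        n=("income", "count"),
    )
    out["small_sample"] = out["n"] < min_n
    return out.sort_values("median_income", ascending=False)
```

A flagged row can then be annotated or footnoted in a chart instead of being presented with the same weight as a well-populated group.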
Where the project stands
The project is in progress and is presented as a public case study; the source code is private.
It demonstrates data cleaning, exploratory analysis, visualization judgment, and responsible interpretation.
