Transparency is central to our mission. This page explains exactly how we collect, process, and present salary data on AmericaByNumbers.com.
We take raw data from official U.S. government sources — the Bureau of Labor Statistics (BLS) for salary data and the U.S. Census Bureau for city demographics — process it into clear, comparable profiles, and present it in user-friendly formats. We do not conduct our own surveys or modify the underlying data.
Our data pipeline follows a structured, repeatable process:
1 Source Data Download
We download the OEWS flat data file (oe.data.0.Current) directly from the BLS public data repository. This file contains the complete set of current occupational employment and wage estimates published by the BLS.
2 Occupation Discovery
We parse the BLS occupation reference file (oe.occupation) to identify all detailed occupation codes (SOC codes) with available data. As of 2024, this yields 831 distinct occupations with state-level wage data.
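The discovery step could be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual code: it assumes the reference file is tab-delimited with a header row containing `occupation_code` and `occupation_name` columns, and it uses the SOC convention that detailed occupation codes end in a non-zero digit. The function name `detailed_occupations` is hypothetical.

```python
import csv
import io


def detailed_occupations(text: str) -> dict:
    """Map formatted SOC codes (e.g. '15-1252') to occupation names.

    Assumes a tab-delimited reference file with 'occupation_code' and
    'occupation_name' header columns (an assumption, not confirmed here).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    result = {}
    for row in reader:
        code = (row.get("occupation_code") or "").strip()
        name = (row.get("occupation_name") or "").strip()
        # Heuristic: keep six-digit codes whose last digit is non-zero,
        # which is how the SOC system marks detailed occupations.
        if len(code) == 6 and code.isdigit() and code[-1] != "0":
            result[f"{code[:2]}-{code[2:]}"] = name
    return result
```

A real pipeline would also need to confirm that each discovered code actually has state-level wage series, which this heuristic alone does not check.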
3 Data Extraction
For each occupation and each of the 50 U.S. states, we extract seven key data points from the BLS flat file:
| Metric | BLS Code | Description |
|---|---|---|
| Employment | 01 | Estimated number of workers in this occupation in the state |
| Mean Annual Wage | 04 | Average (mean) annual wage across all workers |
| 10th Percentile | 11 | 10% of workers earn less than this; entry-level benchmark |
| 25th Percentile | 12 | 25% of workers earn less than this |
| Median (50th) | 13 | Middle wage: half earn more, half earn less |
| 75th Percentile | 14 | 25% of workers earn more than this |
| 90th Percentile | 15 | 10% of workers earn more than this; typical of experienced professionals |
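To make the extraction step concrete, here is a minimal sketch of filtering flat-file rows down to the seven data types in the table above. It assumes the flat file is tab-delimited with `series_id`, `year`, `period`, and `value` as the first four columns, and that a 25-character series ID is laid out as prefix (2), seasonal code (1), area type (1), area code (7), industry code (6), occupation code (6), data type (2). The field names and helper functions are hypothetical.

```python
from typing import NamedTuple, Optional

# Data type codes from the table above, mapped to hypothetical field names.
DATATYPES = {
    "01": "employment",
    "04": "mean_annual",
    "11": "pct10",
    "12": "pct25",
    "13": "median",
    "14": "pct75",
    "15": "pct90",
}


class SeriesKey(NamedTuple):
    state_fips: str
    soc: str
    field: str


def parse_series_id(series_id: str) -> Optional[SeriesKey]:
    """Split a 25-character OE series ID; return None for series we skip."""
    sid = series_id.strip()
    if len(sid) != 25 or not sid.startswith("OEU"):
        return None
    area_type, area, industry = sid[3], sid[4:11], sid[11:17]
    occupation, datatype = sid[17:23], sid[23:25]
    # Assumption: "S" marks state-level areas, "000000" means all
    # industries, and the first two digits of the area code are the FIPS.
    if area_type != "S" or industry != "000000" or datatype not in DATATYPES:
        return None
    soc = f"{occupation[:2]}-{occupation[2:]}"
    return SeriesKey(area[:2], soc, DATATYPES[datatype])


def parse_value(raw: str) -> Optional[float]:
    """Return the numeric value, or None for non-numeric entries."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def extract(lines):
    """Yield (SeriesKey, value) for each usable row of the flat file."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue
        key = parse_series_id(parts[0])
        if key is not None:
            yield key, parse_value(parts[3])
```

Rows whose value is not numeric come back as `None` rather than zero, so a missing estimate is never mistaken for a real figure.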
4 Database Storage
Extracted data is stored in a structured SQLite database with indexed lookups by occupation code (SOC) and state (FIPS). The indexes keep retrieval fast during page generation, and every page reads from the same stored values.
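A storage layer along these lines might look like the sketch below. The `wages` table, its column names, and the helper functions are hypothetical; the only details taken from the description above are SQLite and the indexed lookups by SOC code and state FIPS.

```python
import sqlite3

# Hypothetical schema: one row per (occupation, state) pair.
# The primary key doubles as the index for SOC-first lookups;
# a second index supports listing all occupations in one state.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wages (
    soc         TEXT NOT NULL,
    state_fips  TEXT NOT NULL,
    employment  INTEGER,
    mean_annual INTEGER,
    pct10       INTEGER,
    pct25       INTEGER,
    median      INTEGER,
    pct75       INTEGER,
    pct90       INTEGER,
    PRIMARY KEY (soc, state_fips)
);
CREATE INDEX IF NOT EXISTS idx_wages_state ON wages (state_fips);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a row, replacing any existing row for the same key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO wages VALUES "
            "(:soc, :state_fips, :employment, :mean_annual, "
            ":pct10, :pct25, :median, :pct75, :pct90)",
            row,
        )


def lookup(conn: sqlite3.Connection, soc: str, state_fips: str):
    """Return (median, mean_annual) for one occupation in one state."""
    cur = conn.execute(
        "SELECT median, mean_annual FROM wages "
        "WHERE soc = ? AND state_fips = ?",
        (soc, state_fips),
    )
    return cur.fetchone()
```

Using `INSERT OR REPLACE` keyed on the occupation and state pair means a rebuild after a new data release overwrites old rows instead of duplicating them.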
5 Page Generation
Pages are generated programmatically using Python and Jinja2 templates. Each page pulls its data directly from the database. Before publishing, an automated link audit verifies that every internal link points to a valid page.
6 Quality Verification
After every build, we run a comprehensive link audit that checks every internal link across all 34,900+ pages. Any broken link is flagged as a build error. We also spot-check random pages against the original BLS data to verify accuracy.
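A link audit of this kind can be sketched with the standard library alone. The version below is a simplified illustration, not the site's actual tool: it assumes generated pages are available as a mapping from root-relative path to HTML, and it only audits root-relative links (those starting with `/`), skipping external URLs and in-page anchors. The names `LinkCollector` and `audit_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def audit_links(pages: dict) -> list:
    """Return (source_path, href) for each internal link with no target.

    `pages` maps a root-relative path such as "/salary/ca/" to its HTML.
    """
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            parts = urlsplit(href)
            # Skip external links, mailto: links, and bare "#anchor" links.
            if parts.scheme or parts.netloc or not parts.path.startswith("/"):
                continue
            if parts.path not in pages:
                broken.append((path, href))
    return broken
```

A build script could then treat a non-empty result as an error, for example by printing the list and exiting with a non-zero status.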
We perform a small number of straightforward calculations to make the data more useful:
The BLS publishes OEWS estimates annually, typically in the spring, covering the prior year. Our current data reflects May 2024 estimates. We update our database each time the BLS releases new OEWS data.
Like any data source, BLS OEWS data has certain limitations that users should be aware of:
Our data sources are entirely public. Anyone can verify our numbers by accessing the same BLS data files we use:
If you find a discrepancy between our data and the BLS source, please let us know and we will investigate and correct it.
In addition to salary data, AmericaByNumbers.com provides demographic, economic, and housing profiles for over 28,000 U.S. cities and towns. This data follows a similarly rigorous pipeline:
1 Census API Data Download
We query the U.S. Census Bureau's American Community Survey (ACS) 5-Year Estimates API for all incorporated places and census-designated places (CDPs) across all 50 states and the District of Columbia. The ACS 5-Year dataset provides the most reliable estimates for small geographies.
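As an illustration of what such a query looks like, the sketch below builds a request URL for all places in one state. The `acs_places_url` helper is hypothetical, and the survey year is left as a parameter because this page does not state which ACS vintage is used. The sketch only constructs the URL; it does not send the request.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path for ACS 5-Year Estimates; {year} is the survey vintage.
BASE = "https://api.census.gov/data/{year}/acs/acs5"


def acs_places_url(year: int, state_fips: str, variables: list,
                   api_key: Optional[str] = None) -> str:
    """Build a query URL for every place (city, town, CDP) in one state."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "place:*",
        "in": f"state:{state_fips}",
    }
    if api_key:
        params["key"] = api_key
    # Keep ':', '*' and ',' unescaped so the URL stays readable.
    return BASE.format(year=year) + "?" + urlencode(params, safe=":*,")
```

Looping this over each state and the District of Columbia, with the variable IDs for each table, would cover the full set of places.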
2 Multi-Table Data Extraction
For each place, we extract data from multiple Census tables:
| Category | Census Table | Key Variables |
|---|---|---|
| Demographics | B01003, B02001, B01002 | Population, race/ethnicity, median age |
| Income | B19013, B19301, B17001 | Median household income, per capita income, poverty |
| Education | B15003 | HS diploma, bachelor's, graduate degree attainment |
| Housing | B25077, B25064, B25003 | Median home value, median rent, ownership rate |
| Employment | B23025 | Civilian labor force, unemployment rate |
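As one example of turning table variables into a profile figure, the sketch below derives an unemployment rate from table B23025. It assumes the API response is a JSON array whose first row is the header, that `B23025_003E` is the civilian labor force and `B23025_005E` is the unemployed count, and that negative values are missing-data sentinels. The helper names are hypothetical.

```python
import json


def rows_to_dicts(payload: str) -> list:
    """Turn a header-first JSON array of rows into a list of dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


def to_count(raw):
    """Convert a raw value to int; None for missing or sentinel values."""
    if raw is None:
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    # Assumption: negative numbers signal "no estimate available".
    return value if value >= 0 else None


def unemployment_rate(record: dict):
    """Unemployed as a percentage of the civilian labor force."""
    labor_force = to_count(record.get("B23025_003E"))
    unemployed = to_count(record.get("B23025_005E"))
    if not labor_force or unemployed is None:
        return None  # avoid dividing by zero or by a missing value
    return round(100 * unemployed / labor_force, 1)
```

Returning `None` instead of `0.0` for places without a usable labor force count keeps a missing estimate distinct from a genuine zero rate.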
3 Data Cleaning & Normalization
Census place names are cleaned (removing designations like "city", "town", "CDP"), duplicate city names are disambiguated using FIPS codes, and all percentages are calculated from raw population counts. Places that are missing some key fields are still included, with each missing value labeled "Data not available".
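The cleaning rules above could be sketched as follows. This is an illustration under assumptions: it expects names in the form "Springfield city, Illinois", strips only a short hypothetical list of designations, and appends the FIPS code to the page slug when two places share a name and state. All function names and the slug format are hypothetical.

```python
import re
from collections import Counter

# Trailing designations to strip (illustrative list, lowercase as published).
_SUFFIX = re.compile(r"\s+(city|town|village|borough|CDP)$")


def clean_place_name(census_name: str) -> tuple:
    """Split 'Springfield city, Illinois' into ('Springfield', 'Illinois')."""
    place, sep, state = census_name.rpartition(", ")
    if not sep:  # no state part present
        place, state = census_name, ""
    return _SUFFIX.sub("", place.strip()), state.strip()


def assign_slugs(places: list) -> dict:
    """Map FIPS code to slug; duplicate names get the FIPS code appended.

    `places` is a list of (fips, name, state) tuples.
    """
    counts = Counter((n.lower(), s.lower()) for _, n, s in places)
    slugs = {}
    for fips, name, state in places:
        base = re.sub(r"[^a-z0-9]+", "-", f"{name} {state}".lower()).strip("-")
        is_dup = counts[(name.lower(), state.lower())] > 1
        slugs[fips] = f"{base}-{fips}" if is_dup else base
    return slugs


def percent(part, total):
    """Percentage from raw counts; None when either input is unusable."""
    if part is None or not total:
        return None
    return round(100 * part / total, 1)


def display(value) -> str:
    """Render a percentage, or the fallback label for missing data."""
    return "Data not available" if value is None else f"{value}%"
```

Because the suffix pattern matches only a lowercase designation at the end of the name, a place such as "Carson City city" keeps the "City" that is part of its proper name.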
4 Database Storage & Page Generation
Data is stored in a SQLite database (census.db) with 6 normalized tables. City profile pages are generated using Jinja2 templates with CSS-only visualizations (no JavaScript dependencies for charts). A link audit verifies all internal links after every build.