Apache Spark Community Health Report
February 14 -- 21, 2026 (7-day window)
Newcomer Welcoming
| Reviewer | Newcomer PRs Reviewed | Total Reviews |
|---|---|---|
| anishshri-db | 44 | 44 |
| cloud-fan | 27 | 81 |
| dongjoon-hyun | 26 | 68 |
| heyihong | 22 | 22 |
| viirya | 12 | 21 |
| zhengruifeng | 9 | 22 |
| pan3793 | 8 | 10 |
| gengliangwang | 7 | 9 |
| HyukjinKwon | 7 | 40 |
| HeartSaVioR | 7 | 7 |
anishshri-db reviewed only newcomer or infrequent-contributor PRs (44 of 44), all in Structured Streaming. cloud-fan (27 of 81) and dongjoon-hyun (26 of 68) review broadly across experience levels.
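The share of each reviewer's effort that went to newcomers can be read off the table by dividing the two columns. The sketch below does that arithmetic with the figures above; the `newcomer_share` helper is a hypothetical name introduced for illustration, not part of any tooling behind this report.

```python
# (newcomer PRs reviewed, total reviews), copied from the table above
reviews = {
    "anishshri-db": (44, 44),
    "cloud-fan": (27, 81),
    "dongjoon-hyun": (26, 68),
    "heyihong": (22, 22),
    "viirya": (12, 21),
    "zhengruifeng": (9, 22),
    "pan3793": (8, 10),
    "gengliangwang": (7, 9),
    "HyukjinKwon": (7, 40),
    "HeartSaVioR": (7, 7),
}


def newcomer_share(newcomer, total):
    """Fraction of a reviewer's reviews that went to newcomer PRs."""
    return newcomer / total if total else 0.0


shares = {name: newcomer_share(n, t) for name, (n, t) in reviews.items()}

# Print reviewers from most to least newcomer-focused
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {share:6.1%}")
```

Sorting by share rather than by raw count separates reviewers who focus on newcomers from those who review newcomers as part of a much larger overall volume.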
Interaction Breadth
| Contributor | Unique People | Role |
|---|---|---|
| dongjoon-hyun | 19 | Broadest; interacts across all subsystems |
| cloud-fan | 18 | SQL and cross-cutting review |
| HyukjinKwon | 13 | PySpark and infra bridge |
| pan3793 | 12 | Build system and cross-cutting |
| zhengruifeng | 10 | PySpark review and planning |
| Yicong-Huang | 9 | Python serialization cluster |
| gaogaotiantian | 9 | PySpark testing and review |
| szehon-ho | 7 | SQL and data sources |
| HeartSaVioR | 6 | Streaming-focused cluster |
| anishshri-db | 5 | Streaming-only cluster |
dongjoon-hyun (19) and cloud-fan (18) function as connective tissue across subsystems. anishshri-db (5) and HeartSaVioR (6) form a tight streaming cluster.
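A "Unique People" count like the one above could be derived from review interactions roughly as follows. This is a minimal sketch that assumes breadth means the number of distinct counterparties a person reviewed or was reviewed by; the report does not state its exact definition, and the function name and sample pairs are hypothetical.

```python
from collections import defaultdict


def interaction_breadth(pairs):
    """Count distinct counterparties per person from (reviewer, author) pairs."""
    peers = defaultdict(set)
    for reviewer, author in pairs:
        if reviewer == author:
            continue  # self-interactions do not add breadth
        peers[reviewer].add(author)
        peers[author].add(reviewer)
    return {person: len(others) for person, others in peers.items()}


# Hypothetical pairs for illustration only, not taken from the report's data
sample = [
    ("rev_a", "dev_x"),
    ("rev_a", "dev_y"),
    ("rev_a", "dev_x"),  # repeated pair counts once
    ("rev_b", "dev_x"),
]
breadth = interaction_breadth(sample)
print(breadth)
```

Using sets means repeated reviews between the same two people do not inflate the count, which is what distinguishes breadth from raw review volume.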
Helping vs Self-Promoting (Net Reviewer Ratio)
| Contributor | PRs Authored | PRs Reviewed | Net Reviewer |
|---|---|---|---|
| cloud-fan | 2 | 81 | +76 (overwhelmingly helping) |
| anishshri-db | 0 | 44 | +44 (pure helper) |
| dongjoon-hyun | 15 | 68 | +52 (mostly helping) |
| HyukjinKwon | 1 | 40 | +39 (mostly helping) |
| gengliangwang | 0 | 9 | +9 (pure helper) |
| mikhailnik-db | 0 | 6 | +6 (pure helper) |
| HeartSaVioR | 3 | 7 | -18 (primarily authoring) |
| holdenk | 1 | 3 | -20 (primarily authoring) |
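The project has a strong cadre of pure helpers: anishshri-db, gengliangwang, and mikhailnik-db authored no PRs this week. cloud-fan's 81:2 review-to-author ratio is extraordinary. Note that the Net Reviewer column is not PRs Reviewed minus PRs Authored; for the four contributors who also appear under Top Net Reviewers, it matches reviews given minus reviews received.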
Top Net Reviewers
| Rank | Contributor | Given | Received | Net |
|---|---|---|---|---|
| 1 | cloud-fan | 84 | 8 | +76 |
| 2 | dongjoon-hyun | 69 | 17 | +52 |
| 3 | anishshri-db | 44 | 0 | +44 |
| 4 | HyukjinKwon | 40 | 1 | +39 |
| 5 | zhengruifeng | 26 | 10 | +16 |
The top 4 net reviewers carry the project's quality burden. This is a concentration risk if any of them reduce their activity.
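The Net column and a rough concentration figure can be reproduced from the Given and Received columns. The sketch below assumes net is simply reviews given minus reviews received, and it measures the top-4 share only among the five contributors listed, not across the whole project.

```python
# (reviews given, reviews received), copied from the Top Net Reviewers table
top = {
    "cloud-fan": (84, 8),
    "dongjoon-hyun": (69, 17),
    "anishshri-db": (44, 0),
    "HyukjinKwon": (40, 1),
    "zhengruifeng": (26, 10),
}

# Net reviewer score: reviews given minus reviews received
net = {name: given - received for name, (given, received) in top.items()}
ranked = sorted(net, key=net.get, reverse=True)

# Share of listed reviews given by the four highest net reviewers
total_given = sum(given for given, _ in top.values())
top4_share = sum(top[name][0] for name in ranked[:4]) / total_given

for rank, name in enumerate(ranked, start=1):
    print(f"{rank}. {name:15s} {net[name]:+d}")
print(f"Top-4 share of listed reviews given: {top4_share:.1%}")
```

A share computed over all reviewers in the window would be lower than this figure, since the table covers only the five largest net reviewers.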
Consistency (Jira Cross-Reference)
Consistently active: dongjoon-hyun (1,087 Jira + 15 PRs + 68 reviews), cloud-fan (622 Jira + 81 reviews), zhengruifeng (619 Jira + 22 reviews).
GitHub-only (implementation-focused): uros-db, AlSchlo, dichlorodiphen.
High Jira, lower GitHub this week: HyukjinKwon (859 Jira issues assigned, but only 1 PR + 40 reviews).
Summary
| Dimension | Strength | Concern |
|---|---|---|
| Newcomer welcoming | Broad coverage from top reviewers | anishshri-db's reviews are concentrated on few people (5) |
| Interaction breadth | dongjoon-hyun (19) and cloud-fan (18) connect community | Streaming subsystem is a tight cluster |
| Helping vs self-promoting | Exceptional pure helper ratio | Few contributors both author and review at scale |
| Net reviewer ratio | Top 4 carry quality burden | Concentration risk |
| Consistency | Strong Jira cross-reference signal | 7-day window too short for full measurement |