Apache Airflow Engineering Digest
Jan 23 – Feb 22, 2026 · GitHub PRs + PR Complexity Scores + GitHub Issues + dev@ mailing list
The headline
Jarek Potiuk runs this project. Not in the sense that he has the most complex code, but in the sense that he simultaneously operates as the top code shipper (63 PRs merged), the top reviewer (412 reviews), the top mailing list participant (60 messages), a top issue closer (25 issues closed, 18 other people's), the release manager for Airflow 2.11.1, and the primary enforcer against AI-generated spam PRs. No other contributor comes close on any single dimension, let alone all six. The complexity data adds nuance: only 1 of his 16 scored PRs was high-complexity. His impact comes from breadth and volume across every surface of the project.
The more surprising finding is what's happening below him. Ash Berlin-Taylor, who a dashboard would show as completely inactive (0 PRs, 0 lines, 0 issues, 0 mailing list messages), reviewed 148 PRs with 214 review comments. He approved only 7 times out of 148 reviews. His reviews target the project's hardest architectural PRs: 16 of the PRs he reviewed scored high-complexity (>= 0.5), including the top-scoring PR in the entire project (#58992, serde migration, 0.772). He is Airflow's architectural conscience, and he almost never says yes.
Meanwhile, Shahar Epstein, who a PR dashboard would show as a minor contributor (2 merged PRs, 238 lines), is the project's most prolific issue triager. She closed 39 issues this month, 37 filed by other people, commented 70 times across 43 issues, gave 240 PR reviews, cast 5 binding votes, and was promoted to PMC member during this period. She is the second-most-important person in the project, visible only when you cross-reference all four sources.
The complexity data reveals Airflow's real architecture of effort: 1,595 PRs were active this month, but only 548 had review comments. Of 4,692 classified review comments, 11% were probing (reviewer uncertainty), 80% directing, and 8% polishing. Probing ratio correlates 0.352 with review rounds, weaker than Kafka's 0.449, suggesting Airflow's review culture is more directive ("change this") and less interrogative ("should we?").
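The probing-vs-rounds correlation quoted above is a standard Pearson coefficient. A minimal sketch, with toy data standing in for the real per-PR figures (the actual dataset is not reproduced here):

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-PR values, NOT the real Airflow data: probing ratio vs. review rounds.
probing_ratio = [0.11, 0.43, 0.21, 0.63, 0.22]
review_rounds = [5, 22, 44, 7, 23]
r = pearson(probing_ratio, review_rounds)
```

A value near 0.35, as reported for Airflow, means probing comments only loosely predict longer review cycles; Kafka's 0.449 suggests a tighter coupling there.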
Ash Berlin-Taylor: The architectural conscience
GitHub: 0 PRs merged · 148 reviews · 214 review comments · 0 issue comments · 0 lines
Mailing list: 0 messages
Zero code. Zero issues. Zero mailing list presence. His entire contribution is 214 review comments, and they are uniformly the highest-quality feedback in the dataset. He approved 7 times out of 148 reviews, a 5% approval rate.
The complexity data makes his role precise: 16 of the PRs he reviewed scored >= 0.5 complexity. He is not randomly distributed across the project; he concentrates on the hardest work. His top reviewee is amoghrajesh (33 reviews), who has the highest average complexity (0.400) among prolific contributors. ashb is essentially amoghrajesh's dedicated architectural reviewer.
On PR #58992 (move serde to task SDK, the highest-complexity PR at 0.772, probing ratio 0.43): he questioned API impact and deserialization behavior across the SDK boundary. On PR #57744 (move Config Parser to shared library, 0.684): he gave 6 detailed reviews, covering re-export patterns and dead-comment removal, and questioned why tests needed changing in a pure refactor. On PR #53821 (write-to-ES, 207 days to merge): "This can't? shouldn't? work in Airflow 3-- tasks logs are uploaded in the worker, and workers no long have access to the Database." He provided 12 reviews and 22 comments on this single PR, guiding it through fundamental Airflow 3 logging architecture concerns.
On PR #59764 (dag clear only_new): "I don't think we should ever change the version of an existing Dag Run. I'm fairly sure that will break a number of core assumptions."
On PR #57778 (task timeouts): "But let me be crystal clear: This is wrong. UP_FOR_RETRY is not a running state. It is a finished state, the task process is no longer running when this state is reached."
His most distinctive behavior: he almost never approves. Of 148 reviews: 133 COMMENTED, 7 CHANGES_REQUESTED, 7 APPROVED, 1 DISMISSED. He operates as a pure architectural advisor who leaves formal approval to other maintainers. A dashboard would show ashb as inactive. In practice, he is preventing design mistakes in the core scheduler, task runner, and logging architecture.
Jarek Potiuk: The everything contributor
GitHub PRs: 63 merged · 412 reviews · 179 review comments · 242 issue comments · 29,000+/16,000- lines
GitHub Issues: 3 opened · 25 closed (18 other people's) · 69 comments across 56 issues
Mailing list: 60 messages · 8 threads started · 32 threads participated · 12 votes (5 binding)
Complexity: 16 PRs scored, avg 0.223 · 1 high-complexity · 12 low-complexity
The complexity data reframes Potiuk's contribution. His 63 merged PRs include release synchronizations (+11,290/-7,021 in a single sync PR), dependency cleanup, SPDX licensing, CI changes, and docs updates. This is the mechanical work that keeps the project running. Only 1 of 16 scored PRs crossed the high-complexity threshold (#58825, pre-K hook imports check, 0.512).
His 412 reviews are broadly distributed: amoghrajesh (46), jscheffl (39), xBis7 (12), Arunodoy18 (8). He approves 60% of the time (245 APPROVED vs 14 CHANGES_REQUESTED), the highest approval rate among top reviewers. He is the final gatekeeper who unblocks frequently.
On the mailing list, he drove the entire Airflow 2.11.1 release lifecycle, started the most contentious discussion (AI-slop prevention, 14 messages, 7 participants), and raised security concerns about task provenance in AIP-67. His review comments enforce quality standards on GitHub: on PR #59407, "Please. If you use agents, do not commit this reasoning as part of your PR." On PR #62108: "We reported you to Github for Inauthentic / Spam activity."
He is simultaneously the project's builder, reviewer, release manager, governance participant, spam enforcer, and community organizer. The complexity data clarifies that his building is maintenance-grade: dependency updates, release syncs, CI fixes. That is exactly what the project needs from him, and nobody else is doing it at this scale.
The most complex work
The complexity scoring classifies every review comment as probing (reviewer uncertain), directing (reviewer knows the fix), or polishing (nits). The top PRs by composite score:
| PR | Title | Author | Score | Probing | Rounds | Comments | Pattern |
|---|---|---|---|---|---|---|---|
| #58992 | Move serde to task SDK | amoghrajesh | 0.772 | 43% | 22 | 23 | Core refactor, SDK boundary |
| #57354 | Upgrade email notifications to SmtpNotifier | amoghrajesh | 0.733 | 33% | 30 | 18 | Breaking change concern |
| #55139 | Move DagBag to task SDK | amoghrajesh | 0.716 | 21% | 44 | 24 | Foundational abstraction move |
| #61310 | Thread-safe auth manager init | stegololz | 0.700 | 63% | 7 | 8 | Unresolved, never merged |
| #60410 | YAML connection form metadata | amoghrajesh | 0.689 | 22% | 23 | 32 | JSONSchema pushback |
| #57744 | Move Config Parser to shared lib | amoghrajesh | 0.684 | 21% | 26 | 19 | Core refactor |
amoghrajesh dominates the complexity leaderboard. 4 of the top 6 highest-complexity PRs are his. These are the core Airflow 3 refactors: moving serde, DagBag, and the Config Parser to the task SDK. ashb reviewed all of them. The amoghrajesh-to-ashb pipeline is where Airflow's hardest architectural work happens.
PR #61310 (thread-safe auth manager initialization, probing ratio 63%) is the only top PR that never merged. Multiple reviewers (jscheffl, pierrejeambrun) questioned unrelated changes included in the PR, and the design questions about thread-safety remained unresolved. The highest probing ratio in the top 10, and the only one that didn't land.
PR #60410 shows ashb's review quality. amoghrajesh initially reimplemented JSONSchema validation; ashb pushed back, and jscheffl questioned unnecessary container classes. amoghrajesh responded by integrating JSONSchema directly. The review made the implementation fundamentally better.
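The probing/directing/polishing taxonomy described above can be sketched as a simple classifier feeding a composite score. The real Canopy classifier and its score weights are not published; the keyword lists, the `max_rounds` normalizer, and the 50/50 weighting below are purely illustrative assumptions:

```python
# Hypothetical keyword heuristics -- illustrative only, not Canopy's model.
PROBING = ("should we", "why", "what if", "not sure", "wonder")
POLISHING = ("nit", "typo", "rename", "spacing")

def classify(comment: str) -> str:
    """Bucket a review comment as probing, directing, or polishing."""
    text = comment.lower()
    if any(k in text for k in POLISHING):
        return "polishing"
    if any(k in text for k in PROBING):
        return "probing"
    return "directing"  # default: reviewer states a concrete fix

def composite_score(comments, rounds, max_rounds=50):
    """Blend probing ratio with normalized review rounds into a 0-1 score."""
    labels = [classify(c) for c in comments]
    probing_ratio = labels.count("probing") / len(labels) if labels else 0.0
    rounds_norm = min(rounds / max_rounds, 1.0)
    return 0.5 * probing_ratio + 0.5 * rounds_norm
```

Under this sketch, a PR like #58992 (probing ratio 0.43, 22 rounds) scores high because both ingredients are elevated, which matches the pattern in the table.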
Contributor complexity profiles
The complexity data distinguishes who does hard work from who does a lot of work:
| Contributor | PRs scored | Avg complexity | High (>=0.5) | Low (<0.3) | Pattern |
|---|---|---|---|---|---|
| xBis7 | 4 | 0.618 | 3 | 0 | Consistently hard: OTel, scheduler |
| johnhoran | 3 | 0.530 | 2 | 0 | K8s Pod Operator specialist |
| Lee-W | 5 | 0.500 | 3 | 0 | Partitioned assets, timetables |
| guan404ming | 6 | 0.463 | 3 | 1 | AssetGraph, dag run filtering |
| dabla | 8 | 0.441 | 4 | 2 | GenericTransfer, deferrable ops |
| amoghrajesh | 34 | 0.400 | 12 | 15 | Core refactoring + routine providers |
| potiuk | 16 | 0.223 | 1 | 12 | High volume, maintenance-focused |
| jscheffl | 13 | 0.192 | 1 | 11 | Maintenance + Edge UI plugin |
| Arunodoy18 | 16 | 0.191 | 0 | 14 | Routine contributions |
xBis7 has the highest average complexity (0.618) with zero low-complexity PRs. All work focused on OTel metrics and scheduler query optimization. PR #56150 (OTel env variables) took 84 review rounds and 108 comments over 143 days. jason810496 gave 6 reviews covering code reuse, error types, and module placement.
amoghrajesh is the most interesting profile: 34 scored PRs, 12 high-complexity and 15 low-complexity. The split reveals two distinct modes. Core Airflow 3 refactoring (serde migration, config parser moves) generates deep architectural review. Routine provider YAML metadata work passes quickly. He is simultaneously the project's deepest architectural contributor and a volume producer of mechanical changes.
potiuk and jscheffl, the project's two most active maintainers by review count, both focused their authored PRs this month on maintenance and tooling (avg 0.223 and 0.192). Their highest-impact work came through reviews, release verification, and governance rather than through the PRs they wrote.
The four reviewers
| Reviewer | Reviews | Comments | Approval rate | Top reviewee | Pattern |
|---|---|---|---|---|---|
| Potiuk | 412 | 179 | 60% | amoghrajesh (46) | High-volume gatekeeper, unblocks frequently |
| jscheffl | 317 | 210 | 49% | potiuk (29) | Balanced maintainer, mutual review with Potiuk |
| jason810496 | 241 | 204 | 34% | Owen-CH-Leung (21) | Review-heavy mentor, files 26 issues from review |
| ashb | 148 | 214 | 5% | amoghrajesh (33) | Architectural conscience, almost never approves |
Potiuk approves 60% of reviews, the highest rate, indicating a gatekeeper who actively unblocks. ashb approves 5%, the lowest, indicating an advisor who challenges but doesn't gate. jscheffl sits in between (49%), reviewing potiuk's work as often as potiuk reviews his, creating a mutual accountability loop.
jason810496 (Zhe-You Liu) is the second-most-prolific reviewer (241) with only 6 PRs opened. His reviews are 158 COMMENTED vs 82 APPROVED, and he also filed 26 issues this month (more than anyone else), discovering bugs through his review work and filing them systematically. He operates as a mentoring/quality layer.
Shahar Epstein (shahar1) rounds out the top 5 with 240 reviews. Her 21 CHANGES_REQUESTED (the highest count of any reviewer) shows she formally blocks PRs more than anyone else; only ashb approves less readily. Combined with her 39 issues closed and 5 binding votes, she operates at every layer: issue triage, PR review, and governance.
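The approval rates in the table fall directly out of GitHub's review states. A minimal sketch, using ashb's split from the digest (the list construction is illustrative; real data would come from the GitHub API):

```python
from collections import Counter

def approval_rate(review_states):
    """Share of a reviewer's reviews that ended in APPROVED."""
    counts = Counter(review_states)
    total = sum(counts.values())
    return counts["APPROVED"] / total if total else 0.0

# ashb's split from the digest: 133 COMMENTED, 7 CHANGES_REQUESTED,
# 7 APPROVED, 1 DISMISSED -> roughly 5% approval.
ashb = (["COMMENTED"] * 133 + ["CHANGES_REQUESTED"] * 7
        + ["APPROVED"] * 7 + ["DISMISSED"])
rate = approval_rate(ashb)
```

The same computation on potiuk's 245 APPROVED out of 412 gives the 60% figure quoted above.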
The AI-slop war
Airflow is actively fighting AI-generated spam, visible across both GitHub and the mailing list.
On the mailing list: Potiuk's "[DISCUSS] Stop assigning unknown contributors to issues (AI-slop prevention)" thread (14 messages, 7 participants) proposed removing the ability for unknown contributors to self-assign issues. The discussion produced a compromise: "soft assignments" that decay over time. Aritra Basu initially opposed, then came around. Damian Shaw proposed automatic unassignment.
On GitHub: Potiuk's review comments show active enforcement. PR #62108 was reported to GitHub for inauthentic spam. PR #61017 was flagged as "a completely hallucinated PR." PR #60901 was identified as "clearly AI generated - including the response."
The bot layer itself: GitHub Copilot generates 104 review comments (more than most human maintainers) but has zero formal review authority and never approves anything. The project uses AI as a supplementary reviewer while actively removing AI-generated human contributions. The irony is quantifiable: the AI reviewer produces more comments than 490 of 496 human contributors.
Release engineering
| Release | RCs | Result | Key detail |
|---|---|---|---|
| Airflow 3.1.7 | 2 | Shipped | RC1 had issues; accelerated RC2 with 6 binding +1 |
| Airflow 2.11.1 | 1 (rc2) | Shipped | Potiuk drove multi-week effort. 6 binding + 5 non-binding |
| Helm Chart 1.19.0 | 3 | Shipped | Jens found deployment issues in rc1. rc2 couldn't reach quorum. rc3 passed |
| Providers (Jan 27) | n/a | Shipped | Vincent Beck's first release as RM. 7 binding +1 |
| Providers (Feb 10) | n/a | Shipped | Shahar Epstein as RM. 5 binding +1 |
| AirflowCTL 0.1.2 | 1 | Stalled | Only 1 binding +1 (Jens). Bugra voted 0 (abstain) |
| AIP-94 | n/a | Passed | Decouple remote commands to airflowctl. 4 binding +1 |
The Helm Chart release is the story. Three RC cycles over 18 days. Jens Scheffler's deployment testing (Docker installs, kind clusters, Edge Executor integration, cryptographic signature verification, SVN checksums) caught issues that other voters missed. The rc2 vote failed not because anyone objected; it simply never attracted enough binding voters. Governance bottleneck: the release was ready, but the quorum couldn't be reached.
Top binding voters: Jens Scheffler (6), Bugra Ozturk (6), Shahar Epstein (5), Jarek Potiuk (5), Amogh Desai (4), Rahul Vats (4).
The issue layer
662 issues had activity this month. 216 opened, 248 closed. The label distribution tells the story:
| Label | Issues | Signal |
|---|---|---|
| kind:bug | 344 | Bugs dominate 2:1 over features |
| area:core | 271 | Core scheduler/executor is where the hard problems are |
| kind:feature | 182 | Feature requests filed but lag behind bug fixes |
| good first issue | 150 | Large onramp, but many sit unclaimed |
| needs-triage | 138 | 21% still need initial classification |
Who triages vs who files:
| Contributor | Opened | Closed | Comments | Signal |
|---|---|---|---|---|
| jason810496 | 26 | 12 | 29 | Top opener, files issues from review work |
| shahar1 | 5 | 39 | 70 | Top closer, triages others' issues |
| potiuk | 3 | 25 | 69 | Issue-level maintainer |
| vincbeck | 13 | 16 | 15 | Full-cycle on auth issues |
| amoghrajesh | 7 | 16 | 39 | Balanced: files and triages |
Shahar barely files issues (5) but closes 8x as many (39), almost all filed by other people. jason810496 files 26 issues while giving 241 PR reviews, discovering bugs through review and filing them systematically.
The longest-running and most-debated PRs
| PR | Author | Days/Rounds | Score | Status | Story |
|---|---|---|---|---|---|
| #45931 Secrets backends order | moiseenkov | 396+ days, 34 reviews | n/a | Still open | Nobody can agree on the approach. Crowiant provided 16 of 34 reviews |
| #53821 Write-to-ES fix | Owen-CH-Leung | 207 days, 69 rounds | 0.636 | Merged | ashb (12 reviews, 22 comments) + jason810496 (21 reviews) shepherded it through Airflow 3 logging concerns. Community users pressed for merge due to upgrade breakage |
| #56150 OTel env variables | xBis7 | 143 days, 84 rounds | 0.664 | Merged | 108 total comments. jason810496 drove review on module placement and error types |
| #53216 Version change indicators | choo121600 | 224+ days, 25 reviews | n/a | Still open | 77 comments, still iterating |
| #57610 Informatica provider | cetingokhan | 106 days, 23 reviews | 0.573 | Merged | First provider under AIP-95. jscheffl: "then you could be the first provider here following the new contribution model" |
The complexity scores separate these long-running PRs into two categories. PR #56150 (OTel, 0.664) and PR #53821 (write-to-ES, 0.636) were genuinely complex: high probing ratios, architectural debate, multiple reviewers disagreeing. PR #57610 (Informatica, 0.573) was process-complex: the first provider under a new governance model, where the review was about onboarding procedures as much as code quality.
Provider ecosystem
185 of 1,595 PRs (12%) touch provider code. The landscape:
- Google Cloud: 36 PRs. BigQuery lineage, Cloud Batch hooks, GKE operators, GCS sensors
- Amazon: 25 PRs. ECS/EKS system tests, SES email, Glue catalog
- FAB: 24 PRs. Yarn-to-pnpm migration (+6,181/-9,378 by jscheffl), auth manager
- Kubernetes: 16 PRs. Pod operator, secrets backend. johnhoran's work (avg complexity 0.530) is the hardest provider work in the dataset
- Edge: 15 PRs. Edge executor, multi-team (AIP-67)
- OpenLineage: 7 PRs. kacpermuda authored all 6 merged. He is the lineage domain owner
- New: Informatica. AIP-95, first provider under steward/contributor model. 106 days, 23 reviews
AIP governance
| AIP | Subject | Status | Key detail |
|---|---|---|---|
| AIP-94 | Decouple remote commands to airflowctl | Passed | 4 binding +1 (Jens, Potiuk, Dheeraj, Shahar) |
| AIP-95 | New provider: Informatica | Merged | First under new governance model. 15 messages, 9 participants |
| AIP-67 | Execution API access control for multi-team | Active debate | Potiuk: "no protection against the tasks making claims that they belong to." Security question unresolved |
| AIP-100 | Eliminate scheduler starvation on concurrency limits | Early | asquator proposed, Jens responded. Low engagement |
The AIP-67 debate is the most architecturally significant. Vincent Beck proposed spawn-based isolation. Potiuk rejected it: "does not change much and is not good for performance." He called for original thinking from Ash, Kaxil, and Amogh. The multi-team feature will ship as experimental in Airflow 3.2, with this security question still unresolved.
The complexity vocabulary
Airflow's review comments reveal what reviewers worry about most:
| Concern | Mentions | Signal |
|---|---|---|
| Breaking change | 33 | Airflow 3 migration creates constant backward-compatibility anxiety |
| Backward compatibility | 18 | Same concern, different phrasing (51 total mentions of compatibility) |
| Race condition | 4 | Thread safety in auth manager init, scheduler |
"Breaking change" appears 33 times across 548 scored PRs, nearly 3x Kafka's 12 mentions. Airflow is in the middle of a major version migration (2.x to 3.x), and almost every core PR triggers compatibility questions. This is the project's dominant complexity concern.
Cross-source view
| Contributor | PR reviews | PRs merged (complexity) | Issues closed | ML msgs | ML binding | Role |
|---|---|---|---|---|---|---|
| Potiuk | 412 | 63 (avg 0.223, 1 high) | 25 | 60 | 5 | Breadth across every surface |
| ashb | 148 | 0 | 0 | 0 | 0 | Architectural review, GitHub only |
| jscheffl | 317 | 32 (avg 0.192, 1 high) | 11 | 20 | 6 | Verification + governance |
| shahar1 | 240 | 2 | 39 | 20 | 5 | Triage across all sources |
| jason810496 | 241 | 2 | 26 opened, 12 closed | 6 | 0 | Review + issue discovery |
| amoghrajesh | 144 | 42 (avg 0.400, 12 high) | 16 | 18 | 4 | Hardest code + governance |
| pierrejeambrun | 202 | 20 | 13 | 0 | 0 | UI gatekeeper, GitHub only |
| vincbeck | 188 | 15 | 16 | 4 | 0 | Dependency manager + auth |
| Nataneljpwd | 81 | 3 | 0 | 0 | 0 | Mentor: Spark/K8s/Databricks |
| Copilot | 0 | 0 | 0 | 0 | 0 | 104 review comments, no authority |
The table reveals that amoghrajesh is the only person who ships high-complexity code AND participates in governance. Potiuk and jscheffl ship volume but their authored PRs this month were maintenance-focused. ashb reviews the hardest work but ships nothing. Shahar triages everything but codes little. The project's hardest architectural work flows through one person (amoghrajesh) reviewed by one other person (ashb), a two-person pipeline for the most consequential changes.
What a dashboard would show vs. what actually happened
| Dashboard says | What actually happened |
|---|---|
| potiuk: 63 PRs, 29,000 lines | Also 412 reviews, 25 issues closed, 60 mailing list messages, drove the 2.11.1 release, and actively reports AI-spam accounts. But only 1 of 16 scored PRs was high-complexity. His impact comes from covering every surface of the project at scale. |
| ashb: 0 PRs, 0 lines | 148 reviews, 214 review comments, 5% approval rate. 16 of his reviewed PRs scored high-complexity. He reviewed the serde migration, Config Parser move, write-to-ES feature, and every other hard architectural PR. Never says yes. |
| jason810496: 6 PRs, 241 reviews | Also filed 26 issues (the most of anyone). Discovers bugs through review and files them systematically. 158 COMMENTED vs 82 APPROVED: mentors more than he gates. |
| Shahar Epstein: 2 PRs, 238 lines | 39 issues closed (37 others'), 240 PR reviews, 20 mailing list messages, 5 binding votes. The project's #1 issue triager, promoted to PMC during this period. Only visible when you cross-reference all four sources. |
| amoghrajesh: 42 PRs merged | 12 of 34 scored PRs were high-complexity (the most of anyone). Owns the serde migration, SmtpNotifier upgrade, Config Parser move, and YAML metadata. ashb reviewed him 33 times. The project's deepest architectural contributor. |
| jscheffl: 32 PRs, 317 reviews | 6 binding votes (tied highest), 20 mailing list messages. Caught Helm Chart rc1 deployment issues that other voters missed. His 13 scored PRs this month were primarily maintenance and tooling; his highest-impact work was in review and release verification. |
| xBis7: 4 PRs | Highest average complexity (0.618) of any contributor. Zero low-complexity PRs. The OTel env variables PR took 84 rounds and 143 days. Quietly doing the hardest work in the project. |
| Copilot (AI): 0 PRs | 104 review comments, more than 490 of 496 human contributors. Never approves anything. The project uses AI to supplement review while removing AI-generated human contributions. |
| Helm Chart 1.19.0: shipped | Three RCs. rc2 failed not because anyone objected; it couldn't attract enough binding voters. Governance bottleneck. |
| 1,595 PRs active | Only 548 had review comments. 11% probing, 80% directing, 8% polishing. Breaking change (33 mentions) and backward compatibility (18) dominate: the Airflow 3 migration drives complexity. |
Generated by Canopy from four sources: GitHub PRs, PR complexity classification scores, GitHub Issues, and the dev@ mailing list. Cross-referenced into a single narrative. Jan 23 – Feb 22, 2026.