Reviewer 1

This paper generously offers not only the story of creating the tidyverse but also rare insights into your broader software and community philosophies. It is easy to read, and the discussion on the design, development, and marketing of statistical software is useful for anyone wishing to create tools that others will use. The paper is comprehensive in presenting all aspects of the tidyverse, beyond simply the software itself.

My main suggestion for improving this paper would be to rework the tidy evaluation section. I acknowledge that writing concisely about such a poorly understood topic is difficult (and it is appropriately the longest topic of section 4). For readers unfamiliar with tidy evaluation, this section is limited by not providing more context or framing for why tidy evaluation is helpful for data analysis (despite the challenges it introduces when programming with it). For example, without tidy evaluation the equivalent code is df |> mutate(z = df$x + y), or df$z <- df$x + y (tedious code that does not leverage the data context). Perhaps also draw parallels with similar base R functions. The distinction between ‘data-variable’ and ‘environment-variable’ is also unclear, and could be helped by introducing their corresponding pronouns, e.g. df |> mutate(z = .data$x + .env$y). The programming difficulties with tidy evaluation (and the evolution of tooling) could then follow, with the benefit of more background context.
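To illustrate the kind of framing suggested here, the following is a minimal sketch, assuming dplyr is attached and R 4.1 or later for the native pipe. The data frames df and df2 and the variable y are made up for illustration and are not taken from the paper.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3))
y <- 10  # an environment-variable: it lives outside the data frame

# Data masking: x is looked up in df, y is found in the calling environment
implicit <- df |> mutate(z = x + y)

# The pronouns make each lookup explicit
explicit <- df |> mutate(z = .data$x + .env$y)

# Without data masking, the data frame has to be named for every column
base <- df
base$z <- df$x + y

# The distinction matters when a column and an outside variable share a name
df2 <- data.frame(x = c(1, 2, 3), y = c(100, 200, 300))
from_data <- df2 |> mutate(z = x + y)       # y is the column of df2
from_env  <- df2 |> mutate(z = x + .env$y)  # y is the outside variable
```

In the first three cases the result is the same; only the last pair differs, which is where the pronouns remove the ambiguity.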

There are many minor corrections needed in the paper, which I briefly list below:

  • grammatical problems and typos (e.g. “the the”, embracing explanation {} not {{}}, etc.)
  • the first explanatory sentence of the grouped_mean function has the wrong variable names - it should be group_var = g1, and/or summary_var = a.
  • the citations are sparse, with notable omissions including citing R itself and other discussed packages.
  • the tidymodels section feels like a less-personal stub that could be moved elsewhere or integrated into the earlier Posit section.

I appreciate the personal perspective you’ve shared throughout the paper, and I hope these suggestions help to improve the clarity of its ideas.

Reviewer 2

The paper describes the author’s personal history of the tidyverse. Overall, it is a nice summary; however, much of the information presented was already publicly available, so there is little new information or insight for those who have followed the author’s work. Would the author be willing to share some additional information or insight that may not be commonly known? The questions below may be helpful.

Further questions that might be of interest to the reader

  • Page 11 - The paper notes the high cost of introducing new data structures. However, better-designed structures might yield computational or memory efficiency benefits. Could the author expand on how they weigh human-centric design against computational performance?
  • Page 13 - The distinction between data-variable and environment-variable remains a subtle source of confusion. Readers may benefit from a discussion of past approaches like aes_string() in ggplot2 that used string inputs to resolve this. Why was this strategy eventually deprecated in favor of tidy evaluation?
  • Page 13 - The tidy evaluation section could be enhanced by acknowledging alternative approaches in base R, such as bquote() and eval(). For example:
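library(dplyr)

# bquote() splices the captured argument expressions into the pipeline
# via .(), and eval() then runs the completed call.
grouped_mean <- function(df, var_name, summary_var) {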
  bquote(df %>%
           group_by(.(substitute(var_name))) %>%
           summarise(mean = mean(.(substitute(summary_var))))
         ) %>%
    eval()
}

grouped_mean(cars, dist, speed)

This method uses only base tools, without the need for tidy evaluation, and can be conceptualized as macro substitution. Some readers may wonder why the tidyverse chose not to promote this simpler approach. A brief comment or comparison would be welcome.

  • Page 17 - The tidymodels framework is introduced, but its reception in the community is not discussed. Some readers may be curious whether it achieved the same adoption and clarity as earlier tidyverse tools. Were there challenges in unifying APIs for different statistical models? Has the community embraced it, or found it complex compared to, say, lm() or glm() workflows?
  • What was the author’s influence on the design of the dplyr package? Çetinkaya-Rundel et al. (2022), “An educator’s perspective of the tidyverse”, suggests that dplyr has the benefit of transferability with SQL syntax, but was the author motivated by SQL?
  • How did he become a contractor for Winston Chang? How did his job at RStudio start? Where did he meet JJ Allaire?
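For the comparison requested in connection with the bquote() example, a tidy-evaluation counterpart could be sketched as follows. This is only an illustration, assuming dplyr is attached. The argument names group_var and summary_var follow Reviewer 1's comment, and the body is a guess at the general shape rather than the paper's actual definition.

```r
library(dplyr)

# Hypothetical embracing version of the bquote() example:
# {{ }} forwards the caller's column expressions into the data mask.
grouped_mean <- function(df, group_var, summary_var) {
  df |>
    group_by({{ group_var }}) |>
    summarise(mean = mean({{ summary_var }}))
}

result <- grouped_mean(cars, dist, speed)
```

Both versions are called the same way; the difference is that the embracing version never builds and evaluates a call by hand.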

Typos or grammar issues

  • Page 2: “this led to an general interest” -> “this led to a general interest”
  • Page 3: “my Dad” -> “my dad” (to be consistent with page 2)
  • Page 3: “This lead me to” -> “This led me to”
  • Page 3: “through generally provided” -> doesn’t grammatically make sense
  • Page 4: “NSF” -> National Science Foundation.
  • Page 4 - “for the the tidyverse” -> “for the tidyverse”?
  • Page 5: “it requires that understand functions” -> “it requires that users understand functions”
  • Page 6 - “In the this section” -> “In this section”
  • Page 6: “an pragmatic approach” (in the footnote) -> “a pragmatic approach”
  • Page 14 - “I fully I embraced” -> “I fully embraced”
  • Page 17: “usethis::use_release_isse()” -> “usethis::use_release_issue()”
  • Page 19: “we the community found” -> “the community found”
  • Page 20: “IDE”, “LLMs” -> acronyms should be defined first

“new users bring fresh perspectives and questions that challenge assumptions, while experienced practitioners share deep insights that push the ecosystem forward.” -> very nice writing.

Possible fixes

  • Figure 1: I believe the grey points represent the update releases to CRAN. The caption needs to state this.
  • Page 17, tidymodels team doesn’t mention Julia Silge, but isn’t she part of the tidymodels team?
  • Page 18: a mention of the R Forwards task force (https://forwards.github.io/) would, I think, fit well next to R-Ladies, etc.
  • Page 21 - The author mentions re-licensing to MIT. It may be worth briefly acknowledging the 2020 GitHub discussions about ggplot2 re-licensing, where some community members raised concerns about enabling commercial reuse without attribution. A sentence to acknowledge the diversity of views would provide transparency and balance.