Thoughts on LLM AI to learn R programming

In my Quant UX book with Kerry Rodden, we outline the skills needed to succeed and thrive as a Quant UX researcher. The book's editor recently asked me what I thought about using generative AI such as ChatGPT to learn R programming (I have a different book about learning R for analytics, and I have used R as my primary language since 1999). I've adapted this post from my reply.

The comments here apply primarily to beginning coders using ChatGPT. (Experienced coders will have somewhat overlapping concerns; I'll try to say more about those at another time.)

Update: another interesting take appeared on Slate the same day as this post. A computer science professor found that ChatGPT performed quite poorly (although, to be honest, better than I expected) on an intro CS exam: slate.com/technology/2023/08/chatgpt-vs-alg.. This aligns with my recommendation to "code it yourself".


I see a few advantages to using LLMs to learn R. First, most new R users want to "get something done." They are not learning the R language for its own sake ... and LLMs are fast at getting something done. Second, the R language itself is compact, yet most tasks require accessing the vast set of R packages (over 19,000 of them). There is no good front end to those packages, and it is quite difficult to determine where to begin for even simple tasks. LLMs can help with that. Third, the space of public R code is huge and focuses on the tasks most people want to accomplish, so the odds are good that an LLM will have seen similar examples and can plausibly riff on them.

However, there are several problems that I see in using LLMs to learn programming in general, and R in particular. First, as with everything LLM-based, the quality of the answer is dubious at best and extremely risky at worst. Let's dig into that. In the Quant UX book we present R programming problems that are approximately at the level of first-semester programming. That is also the level of a whiteboard interview for many Quant UX jobs (full disclosure: from 2017 to 2022, I led the process to write the standard quant UX interview questions used at Google).

Here's the first problem in the Quant UX book, which I would expect a strong candidate to complete quickly and with no major issues during an interview:

Using the R language, write a function that will take a typical license plate string and convert it to the NATO phonetic alphabet (https://en.wikipedia.org/wiki/NATO_phonetic_alphabet). For example, the license plate input "QUANT 1" would give the result "Quebec Uniform Alpha November Tango One".

When I give that to ChatGPT 3.5, I get the following response:

ChatGPT
Sure, I can help you with that! Here's a function written in R that takes a license plate string and converts it to the NATO phonetic alphabet:

license_plate_to_nato <- function(license_plate) {
  nato_alphabet <- c("Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot",
                     "Golf", "Hotel", "India", "Juliett", "Kilo", "Lima",
                     "Mike", "November", "Oscar", "Papa", "Quebec", "Romeo",
                     "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
                     "X-ray", "Yankee", "Zulu")

  license_plate <- gsub("[^A-Za-z0-9]", " ", license_plate)  # Replace non-alphanumeric characters with spaces
  words <- unlist(strsplit(license_plate, " "))

  nato_words <- sapply(words, function(word) {
    if (word %in% nato_alphabet) {
      return(word)
    } else if (grepl("^[0-9]+$", word)) {
      digits <- strsplit(word, "")[[1]]
      nato_digits <- sapply(digits, function(digit) {
        nato_alphabet[as.numeric(digit) + 1]
      })
      return(paste(nato_digits, collapse = " "))
    } else {
      return(word)
    }
  })

  nato_result <- paste(nato_words, collapse = " ")
  return(nato_result)
}

# Test the function
license_plate <- "QUANT 1"
nato_conversion <- license_plate_to_nato(license_plate)
cat("License Plate:", license_plate, "\n")
cat("NATO Phonetic:", nato_conversion, "\n")

What does that do? Here's what happens when I run it in R:

> license_plate <- "QUANT 1"
> nato_conversion <- license_plate_to_nato(license_plate)
> cat("License Plate:", license_plate, "\n")
License Plate: QUANT 1
> cat("NATO Phonetic:", nato_conversion, "\n")
NATO Phonetic: QUANT Bravo

Although it runs, the output is markedly incorrect: it does not even pass the test case that it cites!

I won't go into all of the reasons the code is wrong, but I would point out that it not only fails to iterate correctly, it also doesn't include the full NATO alphabet (which covers numbers as well as letters, such as converting "9" to "Niner"). It splits the string incorrectly, has incorrect lookup logic (matching input words against the NATO words themselves), and compounds that with an incorrect indexing procedure (matching "1" to "Bravo"). In short, it contains so many errors that one would be better off starting over. If a beginner managed to patch the code enough to make it work, it would quite likely still contain other bugs and unintended consequences.
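
For contrast, here is a minimal sketch of one workable approach. It is not the reference solution from the book, and the function name plate_to_nato() and the exact digit spellings are my own choices; it simply maps each character of the plate to its phonetic word, digits included.

plate_to_nato <- function(plate) {
  # Named lookup vector: letters plus digits, including "9" -> "Niner"
  phonetic <- c(
    A = "Alpha",   B = "Bravo",   C = "Charlie", D = "Delta",    E = "Echo",
    F = "Foxtrot", G = "Golf",    H = "Hotel",   I = "India",    J = "Juliett",
    K = "Kilo",    L = "Lima",    M = "Mike",    N = "November", O = "Oscar",
    P = "Papa",    Q = "Quebec",  R = "Romeo",   S = "Sierra",   T = "Tango",
    U = "Uniform", V = "Victor",  W = "Whiskey", X = "X-ray",    Y = "Yankee",
    Z = "Zulu",
    "0" = "Zero",  "1" = "One",   "2" = "Two",   "3" = "Three",  "4" = "Four",
    "5" = "Five",  "6" = "Six",   "7" = "Seven", "8" = "Eight",  "9" = "Niner"
  )
  chars <- strsplit(toupper(plate), "")[[1]]   # split the plate into single characters
  chars <- chars[chars %in% names(phonetic)]   # drop spaces and other symbols
  paste(phonetic[chars], collapse = " ")       # look up each character by name
}

plate_to_nato("QUANT 1")
# [1] "Quebec Uniform Alpha November Tango One"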

In this case, the omissions and errors are trivial only because the problem itself is trivial. Such errors could become dangerous in situations where the correctness of the output is not immediately inspectable, as in many statistical models (among other things).

That highlights the second problem: the skills necessary to use an LLM for coding are quite different from those needed to write code from scratch. They involve the expertise to debug some other entity's code. New programmers typically learn to write small sections of code first, and develop debugging skills gradually, becoming proficient only after many years. Put differently, learning to code with ChatGPT is a different problem than "learning to code."

The third problem is closely related: R is notorious for having many different ways to accomplish the same task, thanks to the odd nature of the language (I won't go into that except to say that it is really a functional language similar to Lisp that superficially mimics a procedural language like Python or C), and also thanks to the package ecosystem. This means two things: (1) generated code is that much more difficult to test, adding to the problems above; and (2) generated code is that much more difficult to understand.

For instance, the ChatGPT example above uses the vectorized sapply() function for iteration instead of a more basic for() loop construction. There are pros and cons to each, and an experienced R programmer might choose either (or some other option). The point is that R offers multiple solutions, and that imposes high cognitive complexity. One must have deep knowledge to understand every possible kind of solution (especially when the code is also incorrect, as ChatGPT's is for the sapply() iteration). The divergent solutions are syntactically and conceptually quite different from one another, at least for beginners.
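
To make the contrast concrete, here is an illustrative snippet (the small phonetic lookup vector is defined inline just for this example) showing the same character lookup written in both styles:

# Illustrative only: the same lookup written two ways
phonetic <- c(Q = "Quebec", U = "Uniform", A = "Alpha",
              N = "November", T = "Tango", "1" = "One")
chars <- c("Q", "U", "A", "N", "T", "1")

# Functional / apply-family style
sapply(chars, function(ch) phonetic[[ch]])

# Procedural for() loop style
out <- character(length(chars))
for (i in seq_along(chars)) {
  out[i] <- phonetic[[chars[i]]]
}
out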

This doesn't mean that it's impossible to learn R programming with major assistance from an LLM. But it does mean that if you are going to use an LLM for R code, you should expect to invest heavily in writing unit tests and to learn all the ins and outs of debugging R code. On the positive side, it will stretch your understanding of R debugging very rapidly!
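
As one example of that investment, here is a small illustration of the kind of unit test you might write with the testthat package (it assumes the plate_to_nato() sketch shown earlier in this post):

library(testthat)

test_that("license plates convert to NATO phonetics", {
  expect_equal(plate_to_nato("QUANT 1"),
               "Quebec Uniform Alpha November Tango One")
  expect_equal(plate_to_nato("9"), "Niner")
})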