Should you use polars in R? – Erik Gahner Larsen

If you work with big data in a tabular format in Python, it is difficult to avoid polars in the current environment. There are many good reasons to use polars in Python (performance and the API are the two most relevant ones), and for R users the question is whether polars is worth considering as well. There are a few different packages that you can use to work with polars in R, but for now I will say that the answer is no. That is, there is no good reason to start using polars in R.

The primary reason is that if you work in R you will most likely have better alternatives at your disposal both in terms of performance and API. The official {polars} package with R bindings for polars is not a good choice as it is in maintenance mode and a new package is in development. The main thing I do not like about the original package is that the functions are accessible via the pl$ prefix. Here is, for example, how you can create a data frame (from the package vignette):

library("polars")

pl$DataFrame(a = 1:5, b = letters[1:5])

I do not like this for two reasons. First, I do not think it is needed. If you are familiar with how R looks up function names, and in particular the importance of the order in which packages are loaded, you do not, in most cases, need to be too concerned about the namespace (and if you do, you should maybe consider alternative function names, e.g., pl_DataFrame() and pl_select()). Second, if you want to be explicit in the function calls to avoid namespace conflicts, why not stipulate the use of polars::DataFrame() rather than pl$DataFrame()?
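To illustrate the second point, here is a minimal sketch in base R of how the `::` operator already gives you explicit namespacing. The locally defined median() is a made-up stand-in for a package that exports a clashing name; it is not something from polars.

```r
# Hypothetical stand-in for a package exporting a clashing name:
# defining `median` here masks stats::median() for unqualified calls.
median <- function(x) "masked version"

masked   <- median(c(1, 5, 9))         # resolves to the definition found first
explicit <- stats::median(c(1, 5, 9))  # `::` always reaches the stats export

print(masked)
print(explicit)

rm(median)  # remove the mask so later code sees stats::median() again
```

The same pattern applies to any package: an explicit prefix with `::` sidesteps conflicts without routing every call through a single object such as pl.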

For the “next generation of Polars R API”, neo-r-polars, it does not look like anything is going to change in terms of the API. Here is a simple example of how the fields binding is available for Struct:

pl$Struct(a = pl$Int32)$fields

This might not look bad if you are mostly familiar with Python, but if you mostly use the tidyverse in R and do not rely on $ in most of your work, this is not easy code to read. Of course, there is a solution in the form of {tidypolars}. This package provides a polars backend for the tidyverse API. While this is promising, it is important to keep in mind that it still relies on the original polars package, which is only in maintenance mode, and for that reason alone I would not recommend using it right now.

The polars API is better than the pandas API in Python, especially as you do not need to work with the implicit row index in pandas, so there is no need for .loc[] and .iloc[]. Compared to the APIs you will find in R with {dplyr} and {data.table}, however, there is simply no argument to be made for polars in R in terms of the API.
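For readers less familiar with the two R APIs mentioned here, this is a minimal sketch of the same filter-and-aggregate step in {dplyr} and {data.table}. The sales data frame and its column names are made up for illustration, and the example assumes both packages are installed.

```r
library(dplyr)
library(data.table)

# Made-up example data
sales <- data.frame(
  region = c("north", "north", "south", "south"),
  amount = c(10, 20, 5, 40)
)

# {dplyr}: one verb per step, chained with the native pipe (R >= 4.1)
by_region_dplyr <- sales |>
  filter(amount > 5) |>
  group_by(region) |>
  summarise(total = sum(amount))

# {data.table}: filter, aggregate, and group in a single bracket call
sales_dt <- as.data.table(sales)
by_region_dt <- sales_dt[amount > 5, .(total = sum(amount)), by = region]

print(by_region_dplyr)
print(by_region_dt)
```

Neither version requires a package prefix object or method chaining with $, which is the main contrast with the pl$ style shown earlier.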

For performance, while polars is a great choice in Python, I have not seen any performance metrics for polars in R, and I doubt it will do better than the alternatives available in R, such as {data.table} and {duckdb}. That is, if performance is your main concern, you are most likely better off working with polars in Python (using lazy evaluation) or a better alternative in R.

In sum, while things can and most likely will change in the future, as of right now, I do not see a strong case for using polars in R.