One of the things I miss most when working in R is Python’s built-in string methods. The two I miss the most are startswith and endswith. Here’s an example of how they work in Python:
In [1]: "apple".endswith("e")
True
In [2]: "rubbersoul".startswith("rubber")
True
In [3]: "billiondollar".endswith("babies")
False
Everything in Python is an object, and all Python string objects expose these two methods (and many others). I wanted to make the same functionality available in R while maintaining the simplicity of the Python approach. I found a way to accomplish this using user-defined binary operators in R.
Binary Operators
User-defined binary operators in R consist of a string of characters between two % characters. Some frequently used built-in binary operators include %/% for integer division and %%, which represents the modulus operator. Declaring a binary operator is identical to declaring any other function, except for the name, which must be wrapped in backticks. Here’s an implementation of %startswith% and %endswith%:
# Example of declaring user-defined binary operators in R.
# Note: `testchars` is pasted into a regular expression, so regex
# metacharacters (such as "." or "$") are not matched literally.
`%startswith%` = function(teststr, testchars) {
    # `teststr`: The target string (or vector of strings).
    # `testchars`: The character(s) to test for at the start of `teststr`.
    return(grepl(paste0("^", testchars), teststr))
}

`%endswith%` = function(teststr, testchars) {
    # `teststr`: The target string (or vector of strings).
    # `testchars`: The character(s) to test for at the end of `teststr`.
    return(grepl(paste0(testchars, "$"), teststr))
}
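One caveat with the grepl approach is that the test characters are interpreted as a regular expression, so a character such as "." acts as a wildcard. If literal matching is what you want, a possible variant is sketched below. It assumes R 3.3.0 or later, where base R provides startsWith and endsWith; the operator names %startswith_lit% and %endswith_lit% are made up for this illustration.

```r
# Hypothetical literal-matching variants; the operator names are illustrative only.
`%startswith_lit%` <- function(teststr, testchars) {
    # startsWith() compares characters literally (no regex); needs R >= 3.3.0.
    startsWith(as.character(teststr), testchars)
}

`%endswith_lit%` <- function(teststr, testchars) {
    # endsWith() is the literal counterpart for trailing characters.
    endsWith(as.character(teststr), testchars)
}

# "." is a regex wildcard, so a grepl()-based test matches any leading character.
grepl(paste0("^", "."), "apple")    # TRUE
"apple" %startswith_lit% "."        # FALSE
".Rprofile" %startswith_lit% "."    # TRUE
"data.csv" %endswith_lit% ".csv"    # TRUE
```

Which version is preferable depends on whether you want regex patterns on the right-hand side; the grepl version allows them, while the literal version avoids surprises with punctuation.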
Once %startswith% and %endswith% have been loaded into the current session, either operator accepts a single string or a vector of strings and tests each element for the specified leading or trailing character(s). For example, if I had the following vector:
months = c("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")
And wanted to test whether or not the elements of months start with “J”, %startswith% could be used as follows:
> months %startswith% "J"
[1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
Similarly, to check whether elements of months end with “ber”, run:
> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
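The operators work the same way on a single string, which makes it easy to mirror the Python calls from the start of this post. The short sketch below repeats the grepl-based definitions so that it runs on its own, and also shows that an operator is an ordinary function that can be called by its backticked name.

```r
# Same grepl-based definitions as above, repeated so this block is self-contained.
`%startswith%` <- function(teststr, testchars) {
    grepl(paste0("^", testchars), teststr)
}
`%endswith%` <- function(teststr, testchars) {
    grepl(paste0(testchars, "$"), teststr)
}

# Single-string equivalents of the Python examples.
"apple" %endswith% "e"                # TRUE
"rubbersoul" %startswith% "rubber"    # TRUE
"billiondollar" %endswith% "babies"   # FALSE

# An operator is just a function, so prefix-style calls also work.
`%endswith%`("apple", "e")            # TRUE
```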
To obtain the indices of the elements of months ending with “ber”, we can use %endswith% in conjunction with which:
> months %endswith% "ber"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> which(months %endswith% "ber")
[1]  9 10 11 12
> months[which(months %endswith% "ber")]
[1] "September" "October"   "November"  "December"
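Since the operators return a logical vector, which is only needed when you want the positions themselves. As a small sketch (repeating the definition and the vector so that it runs on its own), the logical result can be used directly for subsetting or counting:

```r
# Same grepl-based definition as above, repeated so this block is self-contained.
`%endswith%` <- function(teststr, testchars) {
    grepl(paste0(testchars, "$"), teststr)
}

months <- c("January", "February", "March", "April", "May", "June", "July",
            "August", "September", "October", "November", "December")

# Logical indexing: keep only the elements for which the test is TRUE.
ber_months <- months[months %endswith% "ber"]
ber_months    # "September" "October" "November" "December"

# TRUE counts as 1 when summed, so this gives the number of matches.
sum(months %endswith% "ber")    # 4
```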