The runoff triangle is a data structure familiar to both pricing and reserving actuaries, commonly used to organize losses by date of occurrence (generally the vertical axis) and, in the case of paid loss triangles, by date of payment (horizontal axis). In practice, triangles present losses in one of two states: incremental loss triangles show the losses paid for a given accident period at a particular point in time, while cumulative loss triangles show the losses summed up to and including the development period at which they are evaluated.
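For a single accident period the two views differ only by a running sum. A minimal R illustration (the figures below are made up purely for demonstration):

# Illustrative only: incremental paid losses for one accident year
# across four development periods, and their cumulative counterpart.
incrLosses = c(1000, 400, 150, 50)
cumLosses  = cumsum(incrLosses)    # 1000 1400 1550 1600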
The following article demonstrates four approaches that can be used to create runoff triangles in R: using base R, foreach, data.table and the ChainLadder package. All examples reference a loss dataset made available by the Reinsurance Association of America, which can be downloaded here. This is the same dataset identified as RAA in the ChainLadder package; the only difference is that the dataset referenced above contains incremental (as opposed to cumulative) losses. The columns are:
ORIGIN: The year in which the loss occurred.
DEV: The development period. In the case of paid losses, the number of periods (months, years, etc.) after the occurrence that payment was made.
VALUE: In the case of paid losses, the paid loss amount (in dollars) at each valid cell representing the intersection of ORIGIN and DEV.
Method I: Base R
It is possible to convert transactional loss data into a triangle format without the use of 3rd-party libraries. The next example demonstrates one method:
# Loss Triangle Compilation in R: Method I.
raaPath = "https://gist.githubusercontent.com/jtrive84/976c80786a6e97cce7483e306562f85b/raw/06a5c8b1f823fbe2b6da15f90a672517fa5b4571/RAA.csv"

DF = read.table(
    file=raaPath, header=TRUE, sep=",", row.names=NULL,
    stringsAsFactors=FALSE
    )

triData1 = matrix(
    nrow=length(unique(DF$ORIGIN)),
    ncol=length(unique(DF$DEV)),
    dimnames=list(
        ORIGIN=sort(unique(DF$ORIGIN)), DEV=sort(unique(DF$DEV))
        )
    )

triData1[cbind(factor(DF$ORIGIN), DF$DEV)] = DF$VALUE
Inspecting triData1:

> triData1
      DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
  1981 5012 3257 2638  898 1734 2642 1828 599  54 172
  1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
  1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
  1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
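The assignment in the final line of the listing above relies on R's matrix indexing: when the subscript is a two-column matrix, each of its rows addresses a single (row, column) cell, and factor(DF$ORIGIN) maps the accident years 1981-1990 to row positions 1-10. A minimal sketch of the same idea on a toy matrix (not part of the original example):

# Each row of idx identifies one (row, column) cell of m to populate.
m = matrix(NA, nrow=2, ncol=3)
idx = cbind(c(1, 2, 2), c(1, 2, 3))
m[idx] = c(10, 20, 30)
m
#      [,1] [,2] [,3]
# [1,]   10   NA   NA
# [2,]   NA   20   30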
The base R method returns the triangle as a matrix of incremental losses. A triangle of cumulative losses can be obtained by running:
> triData1c = t(apply(triData1, 1, cumsum))
> dimnames(triData1c) = dimnames(triData1)
> triData1c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
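The reverse conversion can be sketched in base R as well: difference each row and keep the first development column. This is a minimal sketch, not part of the original write-up:

# Convert the cumulative triangle back to incremental losses by
# differencing across development periods within each origin year.
triData1i = t(apply(triData1c, 1, function(v) c(v[1], diff(v))))
dimnames(triData1i) = dimnames(triData1c)
all.equal(triData1i, triData1)    # should return TRUE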
Method II: Using foreach + data.table
The foreach package provides a looping construct for executing R code repeatedly. It is especially useful in developing platform-independent applications that distribute tasks across multiple cores. data.table facilitates fast aggregation of large datasets, fast ordered joins and fast add/modify/delete of columns by group using no copies. In this example, we use foreach along with data.table’s rbindlist
function to create a triangle of incremental losses:
# Loss Triangle Compilation in R: Method II.
library("data.table")
library("foreach")

DF = fread(file=raaPath, sep=",", stringsAsFactors=FALSE)

lossList = split(DF, by="ORIGIN")

triData2 = foreach(
    i=1:length(lossList),
    .final=function(ll) rbindlist(ll, fill=TRUE)
    ) %do% {
    iterList = setNames(
        as.list(lossList[[i]]$VALUE),
        nm=as.character(lossList[[i]]$DEV)
        )
    append(list(ORIGIN=names(lossList)[i]), iterList)
    }
Inspecting triData2:

> triData2
    ORIGIN    1    2    3    4    5    6    7   8   9  10
 1:   1981 5012 3257 2638  898 1734 2642 1828 599  54 172
 2:   1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
 3:   1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
 4:   1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
 5:   1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
 6:   1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
 7:   1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
 8:   1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
 9:   1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
10:   1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
In Method II, the transactional loss data is first split into a list of data.tables by common ORIGIN, and the result is bound to lossList. The foreach constructor then iterates over lossList, transforming each data.table into what amounts to a single row of the incremental triangle. Notice that within the foreach constructor, the .final parameter is set to a function: when specified, .final is a function of one argument that is called on the accumulated result to produce the final return value. data.table's rbindlist takes as input a list of data.tables/data.frames/lists and binds them row-wise, returning a single data.table.
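The fill=TRUE argument is what produces the NA cells in the lower-right portion of the triangle: rows whose lists are missing later development periods are padded with NA rather than raising an error. A toy illustration (values made up):

library("data.table")

# Two "rows" with differing development columns; fill=TRUE pads the
# missing column with NA.
rbindlist(
    list(
        list(ORIGIN="1981", `1`=100, `2`=50, `3`=25),
        list(ORIGIN="1982", `1`=80, `2`=40)
        ),
    fill=TRUE
    )
#    ORIGIN   1  2  3
# 1:   1981 100 50 25
# 2:   1982  80 40 NA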
When the resulting triangle of incremental losses is returned as a data.table/data.frame, we can generate the cumulative loss triangle as follows:
> devDF = triData2[,-c("ORIGIN")]
> devDF[,names(devDF):=Reduce(`+`, devDF, accumulate=TRUE)]
> triData2c = cbind(triData2[, .(ORIGIN)], devDF)
> triData2c
    ORIGIN    1     2     3     4     5     6     7     8     9    10
 1:   1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
 2:   1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
 3:   1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
 4:   1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
 5:   1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
 6:   1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
 7:   1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
 8:   1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
 9:   1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
10:   1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
First, ORIGIN is removed from triData2 so that devDF consists only of the development period columns. Next, Reduce is applied across those columns with accumulate=TRUE so that a running sum over development periods is calculated. Finally, ORIGIN is column-bound back to the cumulated losses and the result is assigned to triData2c.
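The accumulate=TRUE argument is what turns column-wise addition into a running total: Reduce returns every intermediate result of the fold rather than only the last one. A toy example of the same mechanism (values made up):

# Each element of the result is the elementwise sum of all columns
# encountered so far.
cols = list(a=c(1, 10), b=c(2, 20), c=c(3, 30))
Reduce(`+`, cols, accumulate=TRUE)
# Returns a list of three running sums: (1, 10), (3, 30), (6, 60).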
Method III: data.table
data.table's dcast function is a powerful and flexible utility used primarily for reshaping datasets. Field names are specified as column and row indices in terms of a formula expression of the form LHS ~ RHS. For Method III, dcast is used to convert the transactional loss data into an incremental loss triangle:
# Loss Triangle Compilation in R: Method III.
library("data.table")

DF = fread(file=raaPath, sep=",", stringsAsFactors=FALSE)

triData3 = dcast(DF, ORIGIN ~ DEV, value.var="VALUE", fill=NA)
A description of the arguments passed to dcast:

DF: The data.table to transform into a runoff triangle.

ORIGIN ~ DEV: A formula expression of the form LHS ~ RHS dictating how to transform DF. The LHS argument, in this example ORIGIN, represents the key of the resulting table. The RHS argument, in this example DEV, represents the column whose levels become column names in the new table.

fun.aggregate=sum: Although not required in this case, if the original data weren't already aggregated by ORIGIN and DEV, including this argument would perform the aggregation (see the sketch following this list). In our example, this argument has no effect.

fill=NA: Specifies how to populate missing cells. Setting aside any cells intentionally removed to provide a more robust example, for a given ORIGIN, development periods in excess of 1990 - ORIGIN + 1 represent future evaluation dates, which should be set to NA.
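To make the fun.aggregate=sum remark concrete, here is a small sketch using hypothetical, non-aggregated transactions; the two 1981/DEV 1 payments are rolled up into a single cell before casting:

library("data.table")

# Hypothetical transactional data with two payments in the same
# ORIGIN/DEV cell.
txn = data.table(
    ORIGIN=c(1981, 1981, 1981, 1982),
    DEV=c(1, 1, 2, 1),
    VALUE=c(3000, 2012, 3257, 106)
    )
dcast(txn, ORIGIN ~ DEV, value.var="VALUE", fun.aggregate=sum, fill=NA)
#    ORIGIN    1    2
# 1:   1981 5012 3257
# 2:   1982  106   NA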
Inspecting triData3:
> triData3
    ORIGIN    1    2    3    4    5    6    7   8   9  10
 1:   1981 5012 3257 2638  898 1734 2642 1828 599  54 172
 2:   1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
 3:   1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
 4:   1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
 5:   1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
 6:   1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
 7:   1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
 8:   1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
 9:   1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
10:   1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
Since dcast returns a data.table, the approach demonstrated in Method II can be used to obtain a triangle of cumulative losses, so it will not be repeated here. However, the data.table approach enables us to perform a task that is non-trivial to accomplish as easily with the other methods: converting a loss triangle back into a dataset of transactional losses. This is accomplished with data.table's melt function. In the next example, we convert triData3 back into the original RAA dataset:
# Convert a loss triangle back into transactional loss data.
# Assume triData3 exists and represents incremental loss
# data compiled from RAA.csv.
library("data.table")

DF2 = data.table::melt(
    triData3, id.vars="ORIGIN", variable.name="DEV",
    value.name="VALUE", value.factor=FALSE,
    variable.factor=FALSE, na.rm=TRUE
    )

# Convert DEV back to numeric and order records.
DF2[, DEV:=as.integer(DEV)]
setorderv(DF2, c("ORIGIN", "DEV"))
Test equivalence between DF and DF2:
> all.equal(DF, DF2)
[1] TRUE
Method IV: ChainLadder
The ChainLadder package provides various statistical methods typically used for the estimation of outstanding claims reserves in general insurance. It includes a suite of utilities that can be used to estimate outstanding claim liabilities, but those utilities will not be covered here. Of interest is the triangle class made available by the ChainLadder package. The as.triangle specification is provided below:

as.triangle(Triangle, origin="origin", dev="dev", value="value", ...)

Triangle represents the transactional loss dataset. For origin, dev and value, specify the corresponding field names present in the loss data. Referring again to the RAA dataset:
# Loss Triangle Compilation in R: Method IV.
library("data.table")
library("ChainLadder")

DF = fread(file=raaPath, header=TRUE, sep=",", stringsAsFactors=FALSE)

# Fieldnames in DF are "ORIGIN", "DEV", "VALUE".
triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
Inspecting triData4:
> triData4
      DEV
ORIGIN    1    2    3    4    5    6    7   8   9  10
  1981 5012 3257 2638  898 1734 2642 1828 599  54 172
  1982  106 4179 1111 5270 3116 1817 -103 673 535  NA
  1983 3410 5582 4881 2268 2594 3479  649 603  NA  NA
  1984 5655 5900 4211 5500 2159 2658  984  NA  NA  NA
  1985 1092 8473 6271 6333 3786  225   NA  NA  NA  NA
  1986 1513 4932 5257 1233 2917   NA   NA  NA  NA  NA
  1987  557 3463 6926 1368   NA   NA   NA  NA  NA  NA
  1988 1351 5596 6165   NA   NA   NA   NA  NA  NA  NA
  1989 3133 2262   NA   NA   NA   NA   NA  NA  NA  NA
  1990 2063   NA   NA   NA   NA   NA   NA  NA  NA  NA
The ChainLadder package comes with two convenience functions that allow for conversion from incremental-to-cumulative or cumulative-to-incremental triangles, identified as incr2cum and cum2incr respectively. Next, incr2cum is used to create the cumulative counterpart of triData4:
> triData4c = incr2cum(triData4)
> triData4c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
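cum2incr performs the opposite conversion, so the round trip should recover the original incremental triangle. A quick sketch (the check is illustrative, not part of the original write-up):

# Convert back to incremental losses and compare against the original.
triData4i = cum2incr(triData4c)
all.equal(triData4i, triData4)    # expected to return TRUE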
A point worth mentioning with respect to the behavior of incr2cum and cum2incr: after converting tabular loss data into a triangle object, there is no internal reference that tracks whether the data originally represented cumulative or incremental losses. If you have incremental losses that have been transformed into a triangle instance and then run incr2cum on that triangle, a triangle of cumulative losses is returned as expected. However, if you pass that cumulative loss triangle to incr2cum again, the function will cumulate the already-cumulated losses. For example:
# Read in RAA.csv.
> DF = fread(file=raaPath, header=TRUE, sep=",", stringsAsFactors=FALSE)
> triData4 = as.triangle(DF, origin="ORIGIN", dev="DEV", value="VALUE")
> triData4c = incr2cum(triData4)
> triData4c
      DEV
ORIGIN    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
# So far so good...Pass triData4c to incr2cum again.
> triData4c2 = incr2cum(triData4c)
> triData4c2
      DEV
ORIGIN    1     2     3     4     5      6      7      8      9     10
  1981 5012 13281 24188 35993 49532  65713  83722 102330 120992 139826
  1982  106  4391  9787 20453 34235  49834  65330  81499  98203     NA
  1983 3410 12402 26275 42416 61151  83365 106228 129694     NA     NA
  1984 5655 17210 32976 54242 77667 103750 130817     NA     NA     NA
  1985 1092 10657 26493 48662 74617 100797     NA     NA     NA     NA
  1986 1513  7958 19660 32595 48447     NA     NA     NA     NA     NA
  1987  557  4577 15523 27837    NA     NA     NA     NA     NA     NA
  1988 1351  8298 21410    NA    NA     NA     NA     NA     NA     NA
  1989 3133  8528    NA    NA    NA     NA     NA     NA     NA     NA
  1990 2063    NA    NA    NA    NA     NA     NA     NA     NA     NA
# Pass triData4c2 to incr2cum again.
> triData4c3 = incr2cum(triData4c2)
> triData4c3
      DEV
ORIGIN    1     2     3      4      5      6      7      8      9     10
  1981 5012 18293 42481  78474 128006 193719 277441 379771 500763 640589
  1982  106  4497 14284  34737  68972 118806 184136 265635 363838     NA
  1983 3410 15812 42087  84503 145654 229019 335247 464941     NA     NA
  1984 5655 22865 55841 110083 187750 291500 422317     NA     NA     NA
  1985 1092 11749 38242  86904 161521 262318     NA     NA     NA     NA
  1986 1513  9471 29131  61726 110173     NA     NA     NA     NA     NA
  1987  557  5134 20657  48494     NA     NA     NA     NA     NA     NA
  1988 1351  9649 31059     NA     NA     NA     NA     NA     NA     NA
  1989 3133 11661    NA     NA     NA     NA     NA     NA     NA     NA
  1990 2063    NA     NA     NA     NA     NA     NA     NA     NA     NA
This usually won’t be a problem, especially for the conscientious actuary. Just be sure to track the state of your data when relying on ChainLadder to convert between cumulative and incremental losses.
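One lightweight safeguard, shown here as a hypothetical convention rather than anything provided by ChainLadder, is to record the state of a triangle in an attribute and check it before converting:

# Hypothetical helper: tag a triangle as cumulative and refuse to
# cumulate it a second time.
toCumulative = function(tri) {
    if (isTRUE(attr(tri, "cumulative"))) {
        stop("Triangle is already cumulative.")
    }
    out = incr2cum(tri)
    attr(out, "cumulative") = TRUE
    out
}

triData4c = toCumulative(triData4)      # succeeds
# toCumulative(triData4c)               # would raise an error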