Extended Example: Extracting a Subtable

Part of the document: No Starch Press, The Art of R Programming (pages 157-160)

Let’s continue working with our voting example:

> cttab
          Voted.for.X.Last.Time
Vote.for.X No Yes
  No        2   0
  Not Sure  0   1
  Yes       1   1
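The section starts from an existing table named cttab. If you want to reproduce it, here is a minimal sketch that builds cttab with table() from five survey responses. The responses are an assumption, chosen only so that the counts match the output above.

```r
# Hypothetical raw responses, one row per respondent. These five rows are an
# assumption that reproduces the counts in cttab; they are not given here.
ct <- data.frame(
  Vote.for.X            = c("Yes", "Yes", "No", "Not Sure", "No"),
  Voted.for.X.Last.Time = c("Yes", "No",  "No", "Yes",      "No")
)

# table() cross-tabulates the columns: the column names become the dimension
# names, and the sorted distinct values become the level names.
cttab <- table(ct)
print(cttab)
```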

Suppose we wish to present this data at a meeting, concentrating on those respondents who know whether they will vote for X in the current election. In other words, we wish to eliminate the Not Sure entries and present a subtable that looks like this:

          Voted.for.X.Last.Time
Vote.for.X No Yes
  No        2   0
  Yes       1   1

The function subtable() below performs subtable extraction. It has two arguments:

• tbl: The table of interest, of class "table".

• subnames: A list specifying the desired subtable extraction. Each component of this list is named after a dimension of tbl, and the value of that component is a vector of the names of the desired levels. Since the code matches components to dimensions by position, the list needs one component for every dimension of tbl, in the same order.

So, let’s review what we have in this example before looking at the code.

The argument cttab will be a two-dimensional table, with dimension names Vote.for.X and Voted.for.X.Last.Time. Within those two dimensions, the level names are No, Not Sure, and Yes in the first dimension, and No and Yes in the second. Of those, we wish to exclude the Not Sure cases, so our actual argument corresponding to the formal argument subnames is as follows:

list(Vote.for.X=c("No","Yes"),Voted.for.X.Last.Time=c("No","Yes"))

We can now call the function.

> subtable(cttab,list(Vote.for.X=c("No","Yes"),
+    Voted.for.X.Last.Time=c("No","Yes")))
          Voted.for.X.Last.Time
Vote.for.X No Yes
  No        2   0
  Yes       1   1
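Typing the subnames list by hand works, but it can also be derived from the table itself, since dimnames() returns exactly this kind of named list. The following sketch (an illustration, not part of the book’s code) rebuilds cttab directly from the counts shown earlier and drops the Not Sure level from every dimension with setdiff().

```r
# Rebuild the example table directly from the counts shown above.
cttab <- as.table(matrix(c(2, 0, 1, 0, 1, 1), nrow = 3,
  dimnames = list(Vote.for.X = c("No", "Not Sure", "Yes"),
                  Voted.for.X.Last.Time = c("No", "Yes"))))

# dimnames() gives a named list: one component per dimension, holding that
# dimension's level names, in dimension order.
dimnames(cttab)

# Drop "Not Sure" wherever it occurs. lapply() keeps the dimension names,
# so the result has the shape the subnames argument expects.
subnames <- lapply(dimnames(cttab), function(levs) setdiff(levs, "Not Sure"))
str(subnames)
```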

Now that we have a feel for what the function does, let’s take a look at its innards.

 1 subtable <- function(tbl,subnames) {
 2    # get array of cell counts in tbl
 3    tblarray <- unclass(tbl)
 4    # we'll get the subarray of cell counts corresponding to subnames by
 5    # calling do.call() on the "[" function; we need to build up a list
 6    # of arguments first
 7    dcargs <- list(tblarray)
 8    ndims <- length(subnames) # number of dimensions
 9    for (i in 1:ndims) {
10       dcargs[[i+1]] <- subnames[[i]]
11    }
12    subarray <- do.call("[",dcargs)
13    # now we'll build the new table, consisting of the subarray, the
14    # numbers of levels in each dimension, and the dimnames() value, plus
15    # the "table" class attribute
16    dims <- lapply(subnames,length)
17    subtbl <- array(subarray,dims,dimnames=subnames)
18    class(subtbl) <- "table"
19    return(subtbl)
20 }

So, what’s happening here? To prepare for writing this code, I first did a little detective work to determine the structure of objects of class "table". Looking through the code of the function table(), I found that at its core, an object of class "table" consists of an array whose elements are the cell counts.

So the strategy is to extract the desired subarray, then add names to the dimensions of the subarray, and then bestow "table" class status on the result.
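A quick way to confirm this structure at the console is to strip the class with unclass() and look at what remains. This sketch rebuilds cttab from the counts above with as.table(), a shortcut assumed here so the block runs on its own.

```r
cttab <- as.table(matrix(c(2, 0, 1, 0, 1, 1), nrow = 3,
  dimnames = list(Vote.for.X = c("No", "Not Sure", "Yes"),
                  Voted.for.X.Last.Time = c("No", "Yes"))))

class(cttab)                 # "table"
tblarray <- unclass(cttab)   # remove the class attribute
is.array(tblarray)           # TRUE: what remains is an ordinary array
dim(tblarray)                # 3 2
dimnames(tblarray)           # the dimension and level names are still there

# Putting the class back gives the original table again.
rebuilt <- tblarray
class(rebuilt) <- "table"
identical(rebuilt, cttab)    # TRUE
```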

For the code here, then, the first task is to form the subarray corresponding to the user’s desired subtable, and this constitutes most of the code. To this end, in line 3, we first extract the full cell counts array, storing it in tblarray. The question is how to use that to find the desired subarray. In principle, this is easy. In practice, that’s not always the case.

To get the desired subarray, I needed to form a subsetting expression on the array tblarray, something like this:

tblarray[some index ranges here]

In our voting example, the expression is as follows:

tblarray[c("No","Yes"),c("No","Yes")]

This is simple in concept but difficult to do directly, since tblarray could have any number of dimensions (two, three, or more). Recall that R’s array subscripting is actually done via a function named "["(). This function takes a variable number of arguments: the array itself, followed by one index argument per dimension, so two index arguments for a two-dimensional array, three for a three-dimensional array, and so on.
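To see that subscripting really is a function call, we can call "[" directly, writing its name in backticks. This small check uses a plain matrix holding the cttab counts, plus a made-up 2 x 3 x 4 array to show that a third dimension simply means a third index argument.

```r
tblarray <- matrix(c(2, 0, 1, 0, 1, 1), nrow = 3,
  dimnames = list(Vote.for.X = c("No", "Not Sure", "Yes"),
                  Voted.for.X.Last.Time = c("No", "Yes")))

# Ordinary subscripting ...
s1 <- tblarray[c("No", "Yes"), c("No", "Yes")]
# ... is a call to the function "[": the array first, then one index
# argument per dimension.
s2 <- `[`(tblarray, c("No", "Yes"), c("No", "Yes"))
identical(s1, s2)   # TRUE

# A three-dimensional array takes three index arguments.
a <- array(1:24, dim = c(2, 3, 4))
`[`(a, 2, 3, 4)     # same as a[2, 3, 4], which is 24
```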

This problem is solved by using R’s do.call(). This function has the following basic form:

do.call(f,argslist)

where f is a function and argslist is a list of arguments on which to call f(). In other words, the preceding code basically does this:

f(argslist[[1]],argslist[[2]],argslist[[3]],...)

This makes it easy to call a function with a variable number of arguments.
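Here is do.call() in isolation. The first call is a toy example with paste(); the second builds the argument list for "[" at run time, joining the array and a list of index vectors with c(). The function in the text builds the same kind of list with a loop instead, and the 2 x 3 x 4 array here is made up for illustration.

```r
# do.call(f, argslist) calls f with the elements of argslist as arguments.
do.call(paste, list("a", "b", "c"))   # same as paste("a", "b", "c"): "a b c"

# Because the list is built at run time, the number of index arguments
# passed to "[" need not be known when the code is written.
a <- array(1:24, dim = c(2, 3, 4))
idx <- list(1:2, 3, 4)                # one index per dimension
do.call("[", c(list(a), idx))         # same as a[1:2, 3, 4]: 23 24
```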

For our example, we need to form a list consisting first of tblarray and then the user’s desired levels for each dimension. Our list looks like this:

list(tblarray,Vote.for.X=c("No","Yes"),Voted.for.X.Last.Time=c("No","Yes"))

Lines 7 through 11 build up this list for the general case, and line 12 passes it to do.call(), which gives us our subarray. Then we need to attach the names and set the class to "table". The former operation can be done via R’s array() function, which has the following arguments:

• data: The data to be placed into the new array. In our case, this is subarray.

• dim: The dimension lengths (number of rows, number of columns, number of layers, and so on). In our case, this is the value dims, computed in line 16.

• dimnames: The dimension names and the names of their levels, already given to us by the user as the argument subnames.
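Here is the array() step on its own, using the subtable counts shown earlier. The counts are listed column by column because array() fills the first dimension fastest. dims is computed as in line 16, with lapply(); array() accepts that list here since each component is a single number.

```r
# Counts of the 2 x 2 subtable, column by column:
# (No, No) = 2, (Yes, No) = 1, (No, Yes) = 0, (Yes, Yes) = 1
subarray <- c(2, 1, 0, 1)
subnames <- list(Vote.for.X = c("No", "Yes"),
                 Voted.for.X.Last.Time = c("No", "Yes"))

dims <- lapply(subnames, length)   # as in line 16: a named list of lengths
subtbl <- array(subarray, dims, dimnames = subnames)
class(subtbl) <- "table"           # as in line 18
subtbl
```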

This was a somewhat conceptually complex function to write, but it gets easier once you’ve mastered the inner structures of the "table" class.
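To check that the code handles tables of any dimension, here is a sketch applying subtable() to a three-dimensional table. The Region dimension and the counts 1 through 12 are hypothetical. Selecting a single Region level makes "[" drop that dimension, so do.call() returns a 2 x 2 matrix; the array() call in line 17 is what restores the 2 x 2 x 1 shape along with its names.

```r
# subtable() as given in the text (comments shortened), repeated here so
# that this block runs on its own.
subtable <- function(tbl, subnames) {
  tblarray <- unclass(tbl)                 # the array of cell counts
  dcargs <- list(tblarray)                 # argument list for "["
  ndims <- length(subnames)
  for (i in 1:ndims) {
    dcargs[[i+1]] <- subnames[[i]]         # one index vector per dimension
  }
  subarray <- do.call("[", dcargs)
  dims <- lapply(subnames, length)
  subtbl <- array(subarray, dims, dimnames = subnames)
  class(subtbl) <- "table"
  return(subtbl)
}

# Hypothetical three-way table of counts.
tbl3 <- as.table(array(1:12, dim = c(3, 2, 2),
  dimnames = list(Vote = c("No", "Not Sure", "Yes"),
                  Last = c("No", "Yes"),
                  Region = c("North", "South"))))

sub3 <- subtable(tbl3, list(Vote = c("No", "Yes"),
                            Last = c("No", "Yes"),
                            Region = "South"))
dim(sub3)   # lengths 2, 2 and 1
sub3
```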
