Jock S. Young and Richard L. Hutto
Division of Biological Sciences
University of Montana
Missoula, MT 59812, USA
Bird habitats are often described by categorizing the world into various "cover types" such as "ponderosa pine" or "clearcut" or "grassland." This is the method we used in the GTR we published in 1999. To use this approach for management, however, we would have to assume that all of the stands within each of those categories are pretty much the same, and all we have to do is maintain "ponderosa pine" or "grassland" and we'll be maintaining all of the birds that like that type. But we know that not all of the stands within each type are the same, and some species may require special elements that are not in all stands. So, it may be better to look at the world as a continuum, using "continuous" variables such as "canopy cover" and "percent ponderosa pine." These types of data are usually analyzed using regression, and the following paper discusses this type of approach.
I hope to make a more straightforward and user-friendly version of this soon, but in the meantime here is a paper we have written that shows our approach.
This paper has been published and should be cited as:
Young, J. S., and Hutto, R. L. 2002. Use of regional-scale exploratory studies to determine bird-habitat relationships. Pp. 107-119 (Chapter 8) in Scott, J.M., et al. (eds.). Predicting Species Occurrences: Issues of Scale and Accuracy. Island Press, Covelo, CA.
The wide geographic extent of regional bird monitoring programs usually makes them non-experimental and exploratory in nature. In such studies, variables are often chosen by expedience, and sources of variation are not controlled. Even so, there is great potential to learn something meaningful about bird-habitat relationships when bird distribution or abundance information is linked with additional information about vegetation characteristics at the sites (Wiens 1981, Ralph and colleagues 1995).
One goal of bird-habitat relationship studies is to identify environmental conditions that presumably control the distribution and abundance of a bird species. Knowledge of the biologically important variables would help us make more informed management decisions and relatively accurate predictions of bird occurrence at new, unsurveyed sites. Because we cannot measure all biologically important variables, however, the resulting models are heavily influenced by the choice of variables and the methods used for exploratory analysis. So, there is a critical need for discussion of the best approaches to get the most out of such observational data sets.
In this paper, we present analyses of data from the USFS Northern Region Landbird Monitoring Program (Hutto and Young 1999). Full-scale monitoring began under this program in 1994. One aim of the program was to conduct long-term monitoring, so a series of permanently marked transects were located randomly within a geographic stratification scheme. This is not the most efficient design for discerning habitat associations, however, because a preponderance of points lie in common cover types (Austin and Meyers 1996, Heglund and colleagues, this volume?). Nonetheless, we collected data on local vegetation characteristics in the area surrounding each point to obtain as much basic habitat-relationship information as possible to supplement the population monitoring data, and to help explain the possible reasons for any population declines that may be detected later.
The distributions of various landbird species across cover types in the northern Rocky Mountains have already been published (Hutto and Young 1999). However, the 18 cover types used in those analyses pooled together a diverse assemblage of vegetation structures. No bird species was detected at all points within any given cover type. Some of those absences may have been due to sampling error, population fluctuations, or chance, but we assume that there were also absences due to finer-resolution variation in habitat characteristics among the points within a single cover type. We need to know what these features are if we are to manage bird species successfully.
In this paper, we discuss methods for this second step: building regression models to expose finer-resolution patterns of occurrence due to the continuously variable nature of vegetation features. We apply our proposed model-building methods to a single example species, the Swainson's Thrush (Catharus ustulatus). We then examine several subsets of the data to determine the consistency and accuracy of our results under different conditions. Even though this is an exploratory exercise, and even though we make use of uncontrolled data not designed primarily to expose habitat relationships, the resulting descriptive models should at least suggest ecological relationships and help focus future research.
Sample points were distributed across 12 National Forests of the USFS Northern Region, in northern Idaho and western Montana. Transects were geographically stratified by 7.5-minute topographic quadrangle maps throughout non-wilderness Forest Service lands and part of the Potlatch Corporation lands in central Idaho (Fig. 1). Potential transect start points were located by positioning a random point within each quarter of the quadrangle maps and then finding the nearest point on an unpaved, secondary or tertiary, open or closed road or trail (Hutto and Young 1999). Transects were selected randomly from this potential set, subject to logistical constraints. The 10 points along each transect were placed at least 250 m (straight-line distance) apart. Each point was sampled once during the breeding season in each of three consecutive years (1994 to 1996). There were some changes in transect locations between years. In this paper we used the 428 transects that were visited in all three years, although point selection criteria discussed below resulted in only 292 transects being represented in the data set used for analyses.
Our field technique followed recommendations discussed by Ralph and colleagues (1995) and methods described by Hutto and colleagues (1986). All observers participated in a 1-week training session. Points were visited once each breeding season between mid-May and mid-July. All birds seen or heard during a 10-minute count period were recorded, and the distance to each was estimated (Hutto and Young 1999). Field observers generally began counts about 15 min after sunrise (after the pre-dawn chorus), and generally completed counts within four hours. Counts were not conducted on days with continuous rain or strong winds.
Field observers first determined whether a 100-m-radius circle around a survey point could be considered a homogeneous cover type. If so, further vegetation measurements were then taken on relatively continuous variables representing vegetation physiognomy and floristics (Hutto and Young 1999).
The selection of variables to measure was based largely on our understanding of avian ecology from the literature and personal observation. Vegetation variables involved structural characteristics of the vegetation at different layers as well as tree species composition. Emphasis was placed on variables that were of potential biological importance for one or more of the bird species analyzed, and could be collected quickly in the field by trained workers.
We estimated the tree species composition of the canopy layer because of the evidence that floristics may be important in habitat relationships (e.g. MacNally 1990b). Different tree species have different architecture and different invertebrate assemblages (e.g. Recher and colleagues 1991). Bird species forage non-randomly among plant species (Airola and Barrett 1985, Rotenberry 1985), and nests are often placed in some tree species preferentially over others (e.g., Martin 1992). General surveys of cover types (e.g. Hutto and Young 1999) have shown that many bird species are non-randomly distributed across stands of different tree species.
Because it had been determined qualitatively that the vegetation cover was homogeneous out to 100 m, we assumed that quantification of vegetation variables within a 30-m-radius circle (excluding the road corridor) was sufficient to represent the entire area. Therefore, all vegetation variables used in this report were estimated to 30 m, except for counts of large-dbh (>40 cm) trees (LGTREE), which were based on an 11.3-m-radius circle (excluding the road corridor). Ocular estimates of the following variables were conducted within the 30-m-radius circle (Hutto and Hoffland 1996, Hutto and Young 1999): HEIGHT--the typical height of the tree canopy layer; CANOPY--the percent cover of canopy trees (larger than saplings); SAPLING--the percent cover of sapling trees (between 5- and 10-cm dbh); SHRUB--the percent cover of tall shrubs (multi-stemmed woody plants greater than 1-m tall); BUSH--the percent cover of low shrubs (less than 1-m tall); and GROUND--the percent cover of grasses and forbs.
For tree species composition, we estimated the percent of the total canopy cover made up by each of the tree species indicated below (some associated species were lumped together): PIPO--Percent of canopy made up by ponderosa pine (Pinus ponderosa); PSME--Percent of canopy made up by Douglas-fir (Pseudotsuga menziesii); LAOC--Percent of canopy made up by Western Larch (Larix occidentalis); PICO--Percent of canopy made up by lodgepole pine (Pinus contorta); SPRFIR--Percent of canopy made up by spruce/fir (Picea engelmannii, Abies lasiocarpa); MESIC--Percent of total canopy cover made up by western redcedar (Thuja plicata), western hemlock (Tsuga heterophylla), and Grand Fir (Abies grandis); and DECID--Percent of total canopy cover made up by deciduous trees (Betula papyrifera, Populus tremuloides, and P. trichocarpa).
Each year, new field observers independently estimated the values of vegetation variables in conjunction with the collection of bird data (although tree species composition was estimated in 1994 only). Except when noted, the data from the three separate years were averaged for each point.
Points with more than one cover type within 100 m were excluded from the analyses, so the estimates of vegetation structure could reasonably be expected to represent an average of a relatively homogeneous area around the point. The bird data were also limited to 100 m, so that both the bird and vegetation data represented samples of the 3.14-ha (100-m-radius) area surrounding the point. Some authors recommend the use of a 50-m radius for bird data, but the effect of restricting count radius is not uniform across species (Wolf and colleagues 1995). Wide-ranging species with loud calls, such as the Common Raven (Corvus corax) and Pileated Woodpecker (Dryocopus pileatus), were detected within 50 m only 5-15% of the time. Small birds with soft or high-pitched songs, such as the Brown Creeper (Certhia americana) and Golden-crowned Kinglet (Regulus satrapa), were detected within 50 m 85-90% of the time. Some other species whose songs are unmistakable, such as the Olive-sided Flycatcher (Contopus cooperi) and Varied Thrush (Ixoreus naevius), were also identified at greater distances, perhaps due to observer confidence. Thus, it may be best to vary the cutoff radius for different species. Because between-species comparisons are not recommended for point counts (Wolf and colleagues 1995), the different radii should not be a concern. In the case of the Swainson's Thrush, only 35% of detections were within 50 m, whereas 80% were within 100 m. The song carries well and is easily identified, so we used all detections within 100 m. To test the effect of this decision, however, we also constructed a model based on a 50-m-radius plot.
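As a quick check on the plot sizes quoted in the methods, the area of each circular plot follows directly from its radius. The sketch below is illustrative arithmetic only; the function name `plot_area_ha` is ours and is not part of the field protocol.

```python
import math

def plot_area_ha(radius_m: float) -> float:
    """Area of a circular plot in hectares (1 ha = 10,000 m^2)."""
    return math.pi * radius_m ** 2 / 10_000

# Radii mentioned in the text: bird counts (100 m and 50 m),
# ocular vegetation estimates (30 m), and large-tree counts (11.3 m).
for radius in (100, 50, 30, 11.3):
    print(f"{radius:>5} m radius -> {plot_area_ha(radius):.3f} ha")
```

The 100-m circle comes to about 3.14 ha, matching the figure given above, and the 50-m circle covers one quarter of that area.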
To reduce the confounding effects of very different cover types, some of which we know this species would not occur in, we modeled the habitat associations of the Swainson's Thrush within the subset of conifer forest cover types. We included all points with some conifer trees, ranging from 5-35 m tall and 1-80% canopy coverage. By restricting the data set, we are changing the question from the distribution of a bird species across a wide array of cover types to a more refined distribution within a subset of cover types.
Although different points on a transect can be in different cover types, and can be argued to be independent choices in habitat selection made by bird species with territory sizes of a couple hectares or less, the relative health of the local population, local meteorological conditions, and so forth, will always produce some dependence in the data within each transect. Nonetheless, we used individual points as sample units because (1) combining data from all points on a transect would create meaningless sample units with respect to vegetation variables, given that transects run through a series of different cover types; (2) given a mixture of cover types on each transect, and the elimination of points near edges, we included, on average, only 3.8 points per transect; and (3) our emphasis on the relative importance of variables, rather than strict rules for inclusion of variables in the model, made the sample size a less pressing issue, although it was still important relative to the ability of the data to support a model with many parameters. We also present models based on one point per transect.
For the main model of our example species (Swainson's Thrush), we pooled the three years of data by averaging the vegetation estimates over the three years, and by counting the presence of the species in any of the three years as a presence for that point. This method allowed inclusion of sites where the species was simply missed in some of the three years. Alternative approaches are addressed below.
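The pooling rule described above (average the vegetation estimates, count a detection in any year as a presence) can be sketched as follows. The record layout and the name `pool_point` are hypothetical, chosen only for illustration, and the vegetation values are made up.

```python
from statistics import mean

def pool_point(yearly_records):
    """Pool one point's yearly records into a single sample unit.

    yearly_records: list of dicts, one per year, each with a 'detected'
    flag and a 'veg' dict of vegetation estimates (hypothetical layout).
    """
    veg_names = yearly_records[0]["veg"].keys()
    pooled_veg = {
        name: mean(rec["veg"][name] for rec in yearly_records)
        for name in veg_names
    }
    # The species counts as present if it was detected in any year.
    present = any(rec["detected"] for rec in yearly_records)
    return {"present": int(present), "veg": pooled_veg}

# Made-up estimates for one point over three years.
example = [
    {"detected": False, "veg": {"CANOPY": 40, "SHRUB": 10}},
    {"detected": True,  "veg": {"CANOPY": 50, "SHRUB": 20}},
    {"detected": False, "veg": {"CANOPY": 60, "SHRUB": 30}},
]
pooled = pool_point(example)
print(pooled)
```

A point where the species was missed in two of the three years is still treated as occupied, which is the behavior the pooled model relies on.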
The statistical importance of a variable in any modeling procedure depends on how close the mathematical form of the model is to the form of the true relationship. The simplicity of linear regression has led to its adoption as the typical method in wildlife habitat studies (Young 1996). However, a linear model assumes that a unit-unit relationship holds true for the entire range of an environmental attribute (Meents and colleagues 1983), so that if more is better then a lot more must be much better (Johnson 1981). Simple niche theory, in contrast, assumes that organisms respond to most important resource gradients in a unimodal fashion. Modeling unimodal responses has been standard procedure among plant ecologists for decades (e.g., Whittaker 1967, Austin 1976, ter Braak and Prentice 1988), but animal ecologists have been much slower to embrace it (Young 1996, Heglund and colleagues, this volume?; but see Meents 1983, Heglund and colleagues 1994, etc.). There is no particular reason why the relationship must be Gaussian or even symmetrical (Austin 1976); the mode may not even be the optimum (Austin 1980). We do not really know what the shape is likely to be in any particular case, so we used the simplest method possible to pick up at least some of any unimodal signal while adding only one parameter: a quadratic term (ter Braak and Prentice 1988).
A significant quadratic term can result from nonlinear but monotonic relationships, such as asymptotes, as well as unimodal relationships. Such response curves would be expected if there were a threshold in the response, or if we did not have the complete gradient for the variable. In such cases, however, the linear regression would also show a strong relationship, so the quadratic term may not be necessary to determine the importance of a variable (although it may help improve predictions). The inclusion of the quadratic term is even more important in cases where linear regression indicates no relationship.
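To make the role of the quadratic term concrete, here is a minimal sketch of a logistic response with one added squared term. The coefficients are hypothetical and chosen only to produce a peak inside a plausible percent-cover range; they are not estimates from this study.

```python
import math

def quadratic_logit_prob(x, b0, b1, b2):
    """P(presence) under logit(p) = b0 + b1*x + b2*x**2."""
    eta = b0 + b1 * x + b2 * x ** 2
    return 1.0 / (1.0 + math.exp(-eta))

def response_peak(b1, b2):
    """Location of the mode; defined only when b2 < 0 (unimodal case)."""
    if b2 >= 0:
        return None  # no interior maximum: monotonic or U-shaped
    return -b1 / (2.0 * b2)

# Hypothetical coefficients for a percent-cover variable.
b0, b1, b2 = -2.0, 0.16, -0.002
peak = response_peak(b1, b2)
for cover in (10, peak, 70):
    print(f"cover {cover:5.1f}% -> p = {quadratic_logit_prob(cover, b0, b1, b2):.3f}")
```

If the fitted peak falls beyond the sampled range of the variable, the curve is monotonic over the data, which corresponds to the asymptote-like case described above.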
If a bird species is associated with one tree species, it is likely to show a quantitative response to others because the proportions are interdependent. One way around this might be an ordination procedure, although this may result in gradients due to overall productivity and/or forest structure rather than direct effects of tree species composition. We opted to use the direct variables to more easily interpret direct effects of tree species. We did not consider quadratic terms for the tree species variables because sparse data (many zeros) would make the additional parameter less supportable, and because such relationships would have little logical interpretation.
We visited point counts only once per year. Because single visits do not commonly produce multiple detections of any one bird species, we used logistic regression (Hosmer and Lemeshow 1989) to analyze the effects of vegetation variables on the presence or absence of each bird species at the points. Also, biases due to detectability and observer variability should be less pronounced in presence/absence data than in abundance data.
To begin the model-building process, we discarded variables that had exceptionally high p-values in simple regressions, or variables that made no biological sense for the particular species (in this case the Swainson's Thrush). We then used the Akaike Information Criterion (AIC) for model selection (Akaike 1974, McQuarrie and Tsai 1998). AIC incorporates the tradeoff between bias and variance as variables are added to a model, and it provides a straightforward comparison between models that does not depend on a hypothesis-testing framework (Burnham and Anderson 1998). It moves the emphasis away from p-values and arbitrary cutoff criteria (Johnson 1999), and extracts more information from the data regarding the relative strength of evidence for each variable (and model).
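The AIC comparison can be illustrated with a small worked example. The sketch below uses the standard definition, AIC = -2 ln(L) + 2k, with a Bernoulli log-likelihood for presence/absence data. The observations and fitted probabilities are invented for illustration and are not maximum-likelihood fits.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y given fitted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Invented presence/absence data at five points.
y = [1, 0, 1, 1, 0]
p_null = [0.6] * 5                    # intercept-only model (1 parameter)
p_model = [0.8, 0.3, 0.7, 0.9, 0.2]   # hypothetical one-variable model (2 parameters)

aic_null = aic(bernoulli_log_likelihood(y, p_null), 1)
aic_model = aic(bernoulli_log_likelihood(y, p_model), 2)
print(f"null AIC = {aic_null:.2f}, model AIC = {aic_model:.2f}")
```

The model with the lower AIC is preferred. The size of the difference, rather than a pass/fail threshold, indicates the strength of evidence.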
When choosing among models using statistical inference, it is best to work with only a few select models chosen a priori on biological grounds (Burnham and Anderson 1998). However, our data set was both correlative and unstructured, and we knew little about the expected relationships for many species, so we must consider this to be an exploratory analysis. The best method of variable selection for modeling such a data set has been the subject of much discussion. Stepwise selection methods do not necessarily identify the "best" model even from a statistical perspective (James and McCulloch 1981). On the other hand, all-possible-subsets procedures will inevitably lead to overfitting of the data, because the model thus chosen will be highly specific to the data at hand (Burnham and Anderson 1998). In fact, no method can produce the "true" biological model from correlative data, and some overfitting is perhaps inevitable. We chose the most influential variables by forward selection, with AIC as the selection criterion for each step. As inclusion of variables became more uncertain, we modified the procedures to more closely resemble an all-possible-subsets methodology. This allowed the comparison of many likely models, and embraced the idea of alternative models and model-selection uncertainty. Although we report the model thus chosen, the goal was not to produce a single final model, but to determine the strength of evidence for the inclusion of each variable (Burnham and Anderson 1998: 202).
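The forward-selection step can be sketched generically. The code below covers only the greedy forward pass, not the later shift toward an all-possible-subsets comparison. The `aic_of` hook is a hypothetical stand-in for fitting a logistic regression and returning its AIC, and the toy AIC values are invented purely to make the example run.

```python
def forward_select(candidates, aic_of):
    """Greedy forward selection: add the variable that gives the lowest AIC.

    candidates: iterable of variable names.
    aic_of: callable taking a frozenset of names and returning that model's
            AIC (hypothetical hook; in practice it would fit a regression).
    """
    selected = frozenset()
    best_aic = aic_of(selected)
    remaining = set(candidates)
    while remaining:
        # Score every one-variable extension of the current model.
        trial_aic, trial_var = min(
            (aic_of(selected | {v}), v) for v in sorted(remaining)
        )
        if trial_aic >= best_aic:
            break  # no extension lowers AIC, so stop
        selected = selected | {trial_var}
        remaining.discard(trial_var)
        best_aic = trial_aic
    return selected, best_aic

# Invented numbers: each variable lowers -2 ln(L) by a fixed amount
# and costs the usual penalty of 2 per parameter.
toy_gain = {"SHRUB": 30.0, "SAPLING": 20.0, "LAOC": 8.0, "GROUND": 0.5}

def toy_aic(subset):
    return 100.0 - sum(toy_gain[v] for v in subset) + 2.0 * len(subset)

chosen, chosen_aic = forward_select(toy_gain, toy_aic)
print(sorted(chosen), chosen_aic)
```

In this toy setup the weakest variable is left out because its gain does not cover the two-unit penalty. Recording the AIC of every trial model, instead of only the winner, would be one way to keep the information about each variable's strength of evidence.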
Accuracy issues
We did not have independent data for testing the accuracy of our models, but we have analyzed the data in several different ways to get an indication of the robustness of the models in terms of the variables included. We performed a cross-validation procedure by splitting the full database in half. We sorted transects by latitude and longitude and selected every other transect for each subset of the data. We then built a logistic regression model for each subset, and compared the classification accuracy of each model when predicting the observed data for the other (test) subset relative to the training set. For a classification accuracy assessment that was independent of the cut-point threshold, we used receiver operating characteristic (ROC) plots (Swets 1988, Fielding and Bell 1997, Pearce and colleagues, this volume).
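The split-half procedure and a threshold-independent accuracy measure can be sketched as below. The exact sort key used for the transects is an assumption here (latitude, then longitude), the record layout is hypothetical, and the area under the ROC curve is computed by simple pairwise comparison rather than by any particular statistics package.

```python
def alternate_split(transects):
    """Sort transects by (lat, lon) and send every other one to each half."""
    ordered = sorted(transects, key=lambda t: (t["lat"], t["lon"]))
    return ordered[0::2], ordered[1::2]

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen presence point gets a
    higher score than a randomly chosen absence point (ties count 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up transects and predictions for illustration.
transects = [{"id": i, "lat": 46 + 0.1 * i, "lon": -114.0} for i in range(5)]
half_a, half_b = alternate_split(transects)

observed = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = roc_auc(observed, predicted)
print([t["id"] for t in half_a], [t["id"] for t in half_b], round(auc, 3))
```

Fitting a model on one half and computing `roc_auc` on the other half, then swapping the roles, gives the two test-set comparisons described above.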
We also checked the consistency of results by building a model for each of the three years separately. Examining each year separately is not an independent validation, of course, because we sampled the same points, and in some cases the same individual birds returning the next year (or their philopatric offspring). However, the results from three consecutive years should be consistent if we expect the models to perform well on independent data. We used the same methods as above to build multiple logistic regression models for 1994, 1995, and 1996 separately, and compared the classification accuracy to that of the original 3-year model using ROC plots. We used the same data set for the vegetation variables, with averages across all three years, so that any year-to-year variability was due to the bird data only, whether from sampling error or from actual changes in occupation of sites.
Sample unit considerations
We tested the sensitivity of our results to pseudoreplication issues by redoing the analyses with one randomly selected point from each transect. We had the luxury of doing this because of the large sample size in our regional program. We selected two subsamples, each with one randomly selected point from each transect. The second subsample was selected without making the points from the first set available for selection (i.e. sampling without replacement). The 58 transects with only one available point were randomly divided between the two subsamples. This produced two subsamples of 263 points, each with only one point per transect, and with no points in common. Multiple logistic regression models were built for these data sets using the same methods as above, and the classification accuracy was compared to the original model using ROC plots.
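The subsampling scheme above can be sketched as follows, assuming points are grouped by transect in a dictionary (a hypothetical layout). The counts in the text are consistent with this scheme: with 292 transects in the analysis set and 58 of them contributing a single point, each subsample receives 234 + 29 = 263 points.

```python
import random

def two_disjoint_subsamples(points_by_transect, seed=0):
    """Draw two subsamples with at most one point per transect and no overlap."""
    rng = random.Random(seed)
    first, second, singles = [], [], []
    for transect_id, points in sorted(points_by_transect.items()):
        if len(points) >= 2:
            # Drawing two distinct points is sampling without replacement.
            a, b = rng.sample(points, 2)
            first.append(a)
            second.append(b)
        elif points:
            singles.append(points[0])
    # Transects with one usable point are divided at random between the two.
    rng.shuffle(singles)
    half = len(singles) // 2
    first.extend(singles[:half])
    second.extend(singles[half:])
    return first, second

# Synthetic layout: 234 transects with four usable points, 58 with one.
layout = {t: [(t, i) for i in range(4)] for t in range(234)}
layout.update({t: [(t, 0)] for t in range(234, 292)})
sub_a, sub_b = two_disjoint_subsamples(layout)
print(len(sub_a), len(sub_b))
```

The fixed `seed` argument is a convenience for reproducibility and is not something the source specifies.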
Poisson regression
When we pooled the bird data from three years at each point, we obtained considerable variation in abundance for some bird species, which was lost when we converted the data to presence or absence for logistic regression (MacNally 1990a). A point where a Swainson's Thrush was detected in only one year (or was mistakenly identified) was given the same importance value as a point with many territorial thrushes singing every year. To better differentiate the relative use of the sites by this species, we reanalyzed the three-year data set using the summed abundances and Poisson regression (Jones and colleagues, this volume). Count data are more likely to follow a Poisson distribution than any other readily available distribution (but see White and Bennetts [1996] for a recommendation of the negative binomial distribution), and the method is fairly robust, requiring only that the variance in the data be proportional to the mean (McCullagh and Nelder 1989).
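The difference between the two response codings can be made concrete with a short sketch. It assumes the usual log link for Poisson regression, ln(mu) = b0 + b1*x; the counts and coefficients are made up for illustration.

```python
import math

def expected_count(x, b0, b1):
    """Log link: ln(mu) = b0 + b1*x, so the expected count is always positive."""
    return math.exp(b0 + b1 * x)

def poisson_log_likelihood(counts, means):
    """Sum of log Poisson probabilities: k*ln(mu) - mu - ln(k!)."""
    return sum(
        k * math.log(mu) - mu - math.lgamma(k + 1)
        for k, mu in zip(counts, means)
    )

# Made-up three-year summed counts at three points.
summed_counts = [0, 1, 5]
presence = [int(c > 0) for c in summed_counts]   # coding used for logistic regression
print(presence)   # the points with 1 and 5 detections become indistinguishable

# Hypothetical coefficients and shrub-cover values.
shrub_cover = [0.0, 10.0, 40.0]
means = [expected_count(x, -1.0, 0.06) for x in shrub_cover]
print(round(poisson_log_likelihood(summed_counts, means), 3))
```

Maximizing this log-likelihood over `b0` and `b1` is what a Poisson regression fit does. The presence/absence coding discards the contrast between a single detection and repeated detections at a point.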
Regional-scale considerations
In any study of habitat use, the set of "available" locations must be carefully chosen. If the species is not present in some areas for any reason other than the variables we have measured in the study, then it would be misleading to dilute the data with such absences, or "naughty naughts" (Austin and Meyers 1996), in potentially suitable habitat that is not occupied for other reasons (e.g. climate or landscape-scale factors, or current or historical dispersal barriers). If some measured vegetation variables also change across the same gradient, then it may look like those measured variables are controlling the distribution rather than the unmeasured factors that are truly limiting.
More than one scale is involved here. If data cover a large region, the geographic range of a species may not extend throughout the entire area. But even within a species' range there may be suitable habitat that is not occupied due to landscape-scale factors. In a study of local-scale factors, both of these problems might be addressed by using only those transects along which a particular bird species was detected. Those occupied transects are ones in which the range, landscape, and season are apparently appropriate for the bird species' presence. If landscape-scale factors are to be included in the models, then we would want to analyze all transects within the occupied range.
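Restricting a local-scale analysis to occupied transects, as described above, amounts to a simple filter. The sketch assumes each point record carries a transect identifier and a presence flag; both field names are hypothetical.

```python
def occupied_transect_points(points):
    """Keep only points on transects where the species was detected at least once.

    points: list of dicts with 'transect' and 'present' keys (hypothetical layout).
    """
    occupied = {p["transect"] for p in points if p["present"]}
    return [p for p in points if p["transect"] in occupied]

# Made-up records: the species was never detected on transect "B".
records = [
    {"transect": "A", "present": 1},
    {"transect": "A", "present": 0},
    {"transect": "B", "present": 0},
    {"transect": "B", "present": 0},
]
kept = occupied_transect_points(records)
print(len(kept))
```

Absences on occupied transects are retained, so the filtered data still contain the contrast between used and unused points that the local-scale model needs.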
Choosing the best approach to this problem may be a subjective exercise. For example, we detected the Swainson's Thrush only rarely in south-central Montana (Fig. 2). However, because this area was well within the geographic range of the species (Montana Bird Distribution Committee 1996), we felt that it still would have been present in appropriate habitat. It therefore would be reasonable to use all occupied transects as our method to control for landscape for this species. However, because this abundant species was found on about 85% of the transects, this alternative procedure was not likely to produce different results. This method may thus be more useful for less common species. We decided to simply restrict the data based on geographic area. Because we were interested in the effect of the geographic distribution of larch on the importance of that variable in the habitat models, we restricted the data to the geographic range of the larch (all forests west of the Continental Divide except for the Bitterroot NF), which was also the area where Swainson's Thrushes were most common. We built a new logistic regression model using this subset of the data.
Another regional-scale consideration is a potential change in habitat use in different parts of a species' range. We did not pursue this question because, with this kind of correlative data, it would be very difficult to show that the inevitable differences between the models for two areas were due to actual biological differences in habitat selection, rather than different competitive environments or simply sampling error.
The final data set included a total of 1102 points on transects visited in each of three years. We considered the area around a point to be relatively homogeneous (no edges) if only one (559) or none (543) of the three observers thought otherwise.
Almost any data set involving multiple vegetation variables will include a number of intercorrelations among the predictor variables. The highest correlations among the predictor variables in our data were among canopy cover, canopy height, and number of large trees, especially the latter two measures of tree size (Table 1). The proportion of mesic conifer species (western redcedar, western hemlock, and grand fir) in the canopy was also highly correlated with those three variables, especially canopy cover. Sites with more ponderosa pine had the lowest average canopy cover. The greatest understory development was under canopies of larch or, secondarily, spruce/fir. The proportions of ponderosa pine and Douglas-fir were negatively related to understory. Because we had already combined the most important species associations (spruce/fir and cedar/hemlock/grand fir), most of the correlations among conifer species variables were negative. The largest correlation coefficient (r) was less than 0.5 (Table 1), so with our sample size it should be possible to tease apart the effects of all variables, at least to some extent.
Swainson's Thrushes were detected in at least one year at 555 of the 1102 points. Therefore, the categories of presence and absence were nearly equal for the main analysis of the three-year data set using logistic regression. The most important variables in this main model (logistic regression of three-year averages; Table 2) appeared to be understory cover consisting of both tall shrubs and conifer saplings, positive associations with larch and mesic tree species and, to a lesser degree, canopy cover.
The classification accuracy (at a cut-point of 0.5) of the main model was about 72%. When only the strongest variables were used (Canopy, Sapling, Shrub, Shrub2, Laoc, and Mesic), then the classification accuracy was still 71%.
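Classification accuracy at a fixed cut-point is straightforward to compute from fitted probabilities. In the sketch below, treating a probability exactly equal to the cut-point as a predicted presence is our assumption, and the data are invented.

```python
def classification_accuracy(y_true, probs, cut_point=0.5):
    """Share of points whose predicted class (prob >= cut_point) matches y_true."""
    predicted = [1 if p >= cut_point else 0 for p in probs]
    correct = sum(int(a == b) for a, b in zip(y_true, predicted))
    return correct / len(y_true)

# Invented observations and fitted probabilities.
observed = [1, 0, 1, 0, 1]
fitted = [0.8, 0.4, 0.45, 0.6, 0.7]
acc_default = classification_accuracy(observed, fitted)
acc_lower = classification_accuracy(observed, fitted, cut_point=0.45)
print(acc_default, acc_lower)
```

Because the result changes with the chosen cut-point, a threshold-independent summary such as the ROC plot is a useful complement.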
When the data set was split in half for cross-validation, each half had an internal classification accuracy of about 73%. When each of the resulting models was used to predict presence in the other half of the data, the ROC plots (Fig. 3) indicated a classification accuracy for this test set that was nearly as high as for the training set. In fact, the internal classification accuracy of the models for each half was similar to that for the main model (Fig. 4a).
When each year was analyzed separately, Swainson's Thrushes were detected on 353 of the 1102 points in 1994, 294 points in 1995, and 281 in 1996. There were some differences in the apparent importance of the vegetation variables in models for the three different years (Table 2), most notably for canopy cover and tree species composition. The internal classification accuracy for these models was not quite as good as that for the main model (Fig. 4b).
To determine the sensitivity of our results to the use of points as sample units, we also randomly selected two subsets of the data that consisted of single points from each of 263 transects. Swainson's Thrushes were detected on 128 and 134 of the 263 points in these two separate subsets. The models based on these two data sets were quite different from one another (Table 2), with more variables being included in the second model (including tree size). This second model was also the only model that did not include western larch as a tree species associate (although it would have if only positive tree associations were allowed). Although the internal classification accuracy for these models was better than that for the main model (Fig. 4c), validation of each model using the other subset as testing data gave relatively poor accuracy (Fig. 5).
The restriction of data to detections within a 50-m radius did not change the core variables of the model (Table 2). The data supported fewer minor variables, with a shift to understory variables rather than tree size. The internal classification accuracy for this model was not quite as good as that for the main (100 m) model (Fig. 4d).
The abundance of Swainson's Thrushes at the 555 points varied between 1 and 11 (sum of three visits; some high numbers brought the accuracy of the abundance data into question). More than one individual was detected at 348 points. Therefore, there was considerable variation in counts for use in a Poisson regression analysis. The resulting model was similar to that obtained by logistic regression (Table 2), although it was less likely to indicate nonlinear relationships and it did not include tree-size variables.
We detected the Swainson's Thrush on 493 of 749 points in the northwestern part of the region. The resulting logistic regression model was similar to that obtained for the full data set of 1102 points (Table 2), although without canopy Height, Sprfir, and the quadratic term for canopy cover.
The AIC method usually indicated additional variables beyond those included by traditional hypothesis testing with alpha = 0.05. Because of the many models we examined in this exploratory analysis, it is likely that there was some overfitting of the data when the best model was chosen according to AIC.
DISCUSSION
Two main goals of building regression models in habitat relationships are to identify biologically important variables, and to predict the occurrence of bird species at previously unsampled sites. The first goal, identifying environmental conditions that a bird species needs to be present and successful, is of obvious scientific interest. In addition, it is only through understanding the true biological processes involved in a species' distribution that we can hope to reach meaningful management recommendations. Determining the important variables can be difficult for a number of reasons. We know we have not measured many potentially important biological variables, such as food resource availability (Hutto 1990) or specific nest sites (Martin 1992). In addition, the biological importance of the measured variables cannot be directly confirmed from a correlative analysis. It is necessary to assume that the larger biological effects will show up as important in the statistical model, especially if several subsets of the data are examined, but there is no way to know how much of the observed effect is due to actual biological processes or to sampling error. This inherent model uncertainty should encourage us to emphasize the strength of evidence for each variable, rather than trying to decide which quantitative model is "correct." The observed evidence of relative importance must then be used to form hypotheses for further investigation.
In this study, the various regression models of Swainson's Thrush habitat relationships were fairly consistent in that they all included the same set of strongly influential variables (Table 2). The variables that appeared to be the weakest predictors of Swainson's Thrush occurrence in any one model were the same variables that were less consistently included in the other models. All of the variables chosen by AIC but not by hypothesis testing methods were in this category. In fact, a model with only the strongest and most consistent variables had nearly the same predictive ability as the full model. This suggests that the weaker variables were either biologically unimportant or inconsistently correlated with the true controlling variables. The increased resolution necessary to understand the possible effects of these variables would require a much larger sample size or a more intensive, controlled study.
This is a first attempt at getting a list of variables, more or less in order of statistical (but not necessarily biological) importance, within this data set. We can never be sure if the model reflects true biological relationships without confirming the results with independent or experimental data, but managers can still benefit from such a model because it helps focus future studies and provides a first approximation of the important controlling variables, which can aid in management decisions.
A first step for managers would be to look closely at the variables that were most consistently included in the models. Clearly, the understory is critical for the Swainson's Thrush. Both tall shrub and conifer sapling cover were included in every model, usually with the first or second strongest associations. Because shrub cover and sapling cover were only weakly correlated (r = +0.06; Table 1), it seems that conifer saplings may provide an adequate substitute for tall shrubs as understory structure for this species. Also, there appeared to be a clear threshold (asymptote) in the relationship between bird occurrence and understory cover (Fig. 6), indicating that 20-30% understory cover provided maximum benefit, as might be expected for a shrub-nesting species that forages more generally (Ehrlich and colleagues 1988). Because management practices tend to increase the amount of land with this level of understory cover, this species is not likely to be of management concern.
We do not know of any particular biological reason for larch to be important to the Swainson’s Thrush, and this demonstrates the ambiguity of exploratory analyses. Larch cover was correlated with shrub cover (r = 0.24), but both variables were strongly significant in most multivariate models, so it is difficult to know whether this was a true biological relationship or an artifact of confounding variables. The fact that larch is restricted to west of the Continental Divide was our main reason for limiting the bird data to this western region for one model. In this way we discovered that larch was still an important variable for the thrush within the tree's geographic range, so the apparent association between the bird and tree species was not an artifact of geography. Further study would be necessary to determine if the retention of larch in the landscape is as important for this species as it is for many cavity-nesting birds (McClelland 1977), but this may be a good example of a relationship that was not apparent using simple cover type distributions (Hutto and Young 1999).
The negative association of Swainson's Thrush occurrence with ponderosa pine could be due simply to a positive association with other tree species, or perhaps the thrush does not do well in that type of tree architecture. Alternatively, it may have more to do with ponderosa pine stands typically having low canopy cover or minimal understory. Although the multivariate analyses may have been able to tease these apart, some residual effect probably remained.
We did not have an independent data set with which to test our models. However, we used a number of internal validation and classification accuracy procedures to explore the robustness of our results. We assumed that the most consistently included variables were more likely to have some biological foundation. This is not only of scientific interest, but should also increase the usefulness of the results for predicting the presence of the Swainson's Thrush at new sites, and for estimating its probable response to management decisions.
The cross-validation procedure seemed to show that we have created relatively robust and useful models. Models based on each of the two halves of the data set had classification accuracies nearly as high as that of the full model (Fig. 4a), and each predicted the other half of the data with encouraging consistency (Fig. 3). These results also suggest that doubling the sample size did not result in a greatly improved model.
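The split-half comparison can be sketched as follows. This is an illustration only: the presence/absence values and predicted probabilities are made up, the function name `roc_auc` is hypothetical, and the area under the ROC curve is computed with the pairwise (Mann-Whitney) formulation, which is not necessarily how the ROC plots in this paper were produced.

```python
def roc_auc(presence, predicted):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney).

    presence  : list of 0/1 observations
    predicted : list of model probabilities, same length as presence
    A tie between a presence and an absence counts as half a win.
    """
    pos = [p for y, p in zip(presence, predicted) if y == 1]
    neg = [p for y, p in zip(presence, predicted) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up probabilities from a model fitted to one half of the data,
# scored against that same half (training) and the other half (test).
train_obs = [1, 1, 1, 0, 0, 0]
train_pred = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2]
test_obs = [1, 1, 0, 1, 0, 0]
test_pred = [0.8, 0.3, 0.6, 0.7, 0.4, 0.1]

auc_train = roc_auc(train_obs, train_pred)
auc_test = roc_auc(test_obs, test_pred)
print(auc_train, auc_test)
```

A small gap between the training and test values corresponds to the consistency described above; a large drop on the test half would point to overfitting.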
The combination of three years of data appeared to improve the predictive power of the habitat model relative to the models based on single years. This is understandable given the sampling methods. A point count survey provides an incomplete sample of the birds in a given area. It is probably common for a species to be present but not detected. There is also true year-to-year variation in bird occupancy. It is important to design surveys to sample a representative cross-section of this variation (i.e., multiple years and places). In this respect, the Northern Region Landbird Monitoring Program may be unique. Samples were large in comparison with more controlled studies of habitat use, and data collection was repeated over several years. The results of these analyses suggest that improving the accuracy of data at each point may be more critical than increasing the total number of points. We are beginning to test these ideas by examining the accuracy of models for other bird species.
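A simple calculation shows why accumulating visits can reduce missed detections. Assume, purely for illustration, that a species occupying a point is detected with the same probability p on every visit and that visits are independent; neither assumption was tested here, and the calculation ignores true year-to-year changes in occupancy. Under those assumptions the chance of at least one detection in n visits is 1 - (1 - p)^n. The value p = 0.5 and the function name are hypothetical.

```python
def prob_detected_at_least_once(p_single_visit, n_visits):
    """Probability of one or more detections in n independent visits.

    Assumes a constant per-visit detection probability at an occupied
    point; this is an illustrative simplification, not a fitted model.
    """
    if not 0.0 <= p_single_visit <= 1.0:
        raise ValueError("detection probability must lie in [0, 1]")
    if n_visits < 0:
        raise ValueError("number of visits must be non-negative")
    return 1.0 - (1.0 - p_single_visit) ** n_visits


# Hypothetical per-visit detection probability of 0.5.
for n in (1, 2, 3):
    print(n, prob_detected_at_least_once(0.5, n))
```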
Vegetation variables are subject to measurement error and observer variability. There was considerable variation in the estimates by the three different observers at each point over the years. When the separate years were analyzed using the separate estimates of vegetation, rather than the 3-year average, there was a much greater difference between years than that shown in Table 2. This suggests that observer variability can be a serious problem, especially if vegetation is measured quickly by crews primarily trained to identify birds, or if only one year is available. For this reason we used the average of all three years, and the same concern prompted us to collect additional vegetation data at the points, using experienced forestry crews and additional plots.
A plot radius of 50 m is often recommended for comparison of bird abundance between different cover types because vegetation density can affect the detectability of individual birds. In addition, if both the bird and vegetation data were accurate, we would expect the 50-m-radius model to have greater classification accuracy because the bird data would be more tightly associated with the vegetation near the point. In this study, however, the 50-m-radius model was slightly less accurate than the main model using a 100-m radius. This suggests that a 50-m radius may have been insufficient to accurately represent occupancy in the stand. This is even more likely to be the case for less common species. We conclude that the 100-m-radius cutoff not only resulted in an adequate model, but it may be preferable in studies with only one or a few visits to a point, where we are most likely to have an incomplete inventory.
Most bird-habitat relationship models explain only a small proportion of the variance in bird presence or abundance (Maurer 1986, Morrison and colleagues 1987). This is due to a variety of factors that have often been mentioned (e.g., Rotenberry 1986, Wiens 1989), and most of these are probably exacerbated by the nature of large exploratory studies. It is important for us both to realize the limitations of the method and to design surveys and analyses to decrease the effects of these problems as much as possible.
In spite of the numerous reasons that regional-scale monitoring data might not be conducive to rigorous habitat analyses, we determined a suite of vegetation characteristics that were strongly correlated with the presence of Swainson's Thrush in forest stands (Table 2). We think it is very important that such data are used as fully as possible, as long as the results are not overinterpreted. Managers must be made aware of model uncertainties so that potential problems are not overlooked when final decisions are made (Conroy and Moore, this volume).
We may also wish to use these bird-habitat relationship models for the second main goal of building regression models -- predicting the likelihood of a particular bird species being present at new, unsurveyed sites. Such predictions would be more robust and less location-specific if the predictor variables were more relevant to biological processes (Austin and Meyers 1996). This is not absolutely necessary for a useful model, however, as long as the new sites requiring prediction have the same correlational linkages between the measured surrogate variables and the true variables that influence bird occurrence. In any case, the expense of measuring predictor variables over wide regions may be prohibitive, unless remotely sensed data can be used. There is little reason to develop models for region-wide prediction until we know what variables are likely to be available to managers over all target areas. We can then determine if models based on those variables are adequately robust for management needs.
LITERATURE CITED
Airola, D. A., and R. H. Barrett. 1985. "Foraging and habitat relationships of insect-gleaning birds in a Sierra Nevada mixed-conifer forest." Condor 87:205-216.
Akaike, H. 1974. "A new look at the statistical model identification." IEEE Transactions on Automatic Control AC-19:716-723.
Austin, M. P. 1976. "On non-linear species response models in ordination." Vegetatio 33:33-41.
Austin, M. P. 1980. "Searching for a model for vegetation analysis." Vegetatio 43:11-21.
Austin, M. P., and J. A. Meyers. 1996. "Current approaches to modelling the environmental niche of eucalypts: implications for management of forest biodiversity." Forest Ecology and Management 85:95-106.
Burnham, K. P., and D. R. Anderson. 1998. Model Selection and Inference: a practical information-theoretic approach. New York: Springer-Verlag.
Ehrlich, P. R., D. S. Dobkin, and D. Wheye. 1988. The Birder's Handbook: A field guide to the natural history of North American birds. New York: Simon & Schuster.
Fielding, A. H., and J. F. Bell. 1997. "A review of methods for the assessment of prediction errors in conservation presence/absence models." Environmental Conservation 24:38-49.
Heglund, P. J., J. R. Jones, L. H. Frederickson, and M. S. Kaiser. 1994. "Use of boreal forested wetlands by Pacific loons (Gavia pacifica Lawrence) and horned grebes (Podiceps auritus L.): relations with limnological characteristics." Hydrobiologia 279/280:171-183.
Hosmer, D. W., Jr., and S. Lemeshow. 1989. Applied Logistic Regression. New York: John Wiley and Sons.
Hutto, R. L. 1990. "Measuring the availability of food resources." Studies in Avian Biology 13:20-28.
Hutto, R. L., and J. R. Hoffland. 1996. "USDA Forest Service Northern Region Landbird Monitoring Project: Field Methods." In-house report.
Hutto, R. L., S. M. Pletschet, and P. Hendricks. 1986. "A fixed-radius point count method for nonbreeding and breeding season use." Auk 103:593-602.
Hutto, R. L., and J. S. Young. 1999. Habitat relationships of landbirds in the Northern Region, USDA Forest Service. USDA Forest Service General Technical Report RMRS-32. Ogden: Rocky Mountain Research Station.
James, F. C., and C. E. McCulloch. 1990. "Multivariate analysis in ecology and systematics: panacea or Pandora’s box?" Annual Review of Ecology and Systematics 21:129-166.
Johnson, D. H. 1981. "The use and misuse of statistics in wildlife habitat studies." In The use of multivariate statistics in studies of wildlife habitat, edited by D. E. Capen. USDA Forest Service General Technical Report RM-87, 11-19. Fort Collins, Colo: Rocky Mountain Research Station.
Johnson, D. H. 1999. "The insignificance of statistical significance testing." Journal of Wildlife Management 63:763-772.
MacNally, R. 1990a. "An analysis of density responses of forest and woodland birds to composite physiognomic variables." Australian Journal of Ecology 15:267-275.
MacNally, R. 1990b. "The role of floristics and physiognomy in avian community composition." Australian Journal of Ecology 15:321-327.
Martin, T. E. 1992. "Breeding productivity considerations: what are the appropriate habitat features for management?" In Ecology and Conservation of Neotropical Migrant Land Birds, edited by J. M. Hagan and D. W. Johnston, 455-473. Washington: Smithsonian Inst. Press.
Maurer, B. A. 1986. "Predicting habitat quality for grassland birds using density-habitat correlations." Journal of Wildlife Management 50:556-566.
McClelland, B. R. 1977. "Relationships between hole-nesting birds, forest snags, and decay in western larch-Douglas-fir forests of the northern Rocky Mountains." PhD dissertation. Missoula, MT: University of Montana, 489 p.
McCullagh, P. and J. A. Nelder. 1989. Generalized linear models. Second edition. New York: Chapman and Hall.
McQuarrie, A. D. R., and C-L. Tsai. 1998. Regression and time series model selection. Singapore: World Scientific Publishing Company.
Meents, J. K., J. Rice, B. W. Anderson, and R. D. Ohmart. 1983. "Nonlinear relationships between birds and vegetation." Ecology 64:1022-1027.
Montana Bird Distribution Committee. 1996. P. D. Skaar's Montana Bird Distribution, Fifth edition. Special Publication No. 3. Helena, MT: Montana Heritage Program.
Morrison, M. L., I. C. Timossi, and K. A. With. 1987. "Development and testing of linear regression models predicting bird-habitat relationships." Journal of Wildlife Management 51:247-253.
Ralph, C. J., S. Droege, and J. R. Sauer. 1995. "Managing and monitoring birds using point counts: standards and applications." In Monitoring bird populations by point counts, edited by C. J. Ralph, J. R. Sauer, and S. Droege. USDA Forest Service General Technical Report PSW-149, pp. 161-168. Albany, CA: Pacific Southwest Research Station.
Recher, H. F., J. Majer, and H. A. Ford. 1991. "Temporal and spatial variation in the abundance of eucalypt canopy arthropods: the response of forest birds." Proceedings of the International Ornithological Congress 20:1568-1575.
Rotenberry, J. T. 1985. "The role of habitat in avian community composition: physiognomy or floristics?" Oecologia 67:213-217.
Rotenberry, J. T. 1986. "Habitat relationships of shrubsteppe birds: even 'good' models cannot predict the future." In Wildlife 2000: Modeling habitat relationships of terrestrial vertebrates, edited by J. Verner, M. L. Morrison, and C. J. Ralph, 217-221. Madison: University of Wisconsin Press.
Swets, J. A. 1988. "Measuring the accuracy of diagnostic systems." Science 240:1285-1293.
Ter Braak, C. J. F. and I. C. Prentice. 1988. "A theory of gradient analysis." Advances in Ecological Research 18:271-317.
White, G. C., and R. E. Bennetts. 1996. "Analysis of frequency count data using the negative binomial distribution." Ecology 77:2549-2557.
Whittaker, R. H. 1967. "Gradient analysis of vegetation." Biological Reviews 42:207-264.
Wiens, J. A. 1981. "Censusing and the evaluation of avian habitat occupancy." Studies in Avian Biology 6:522-532.
Wiens, J. A. 1989. The ecology of bird communities, Vol. 1 - Foundations and patterns. New York: Cambridge University Press.
Wolf, A. T., R. W. Howe, and G. J. Davis. 1995. "Detectability of forest birds from stationary points in northern Wisconsin." In Monitoring bird populations by point counts, edited by C. J. Ralph, J. R. Sauer, and S. Droege. USDA Forest Service General Technical Report PSW-149, pp. 19-23. Albany, CA: Pacific Southwest Research Station.
Young, J. S. 1996. "Nonlinear bird-habitat relationships in managed forest of the Swan Valley, Montana." M.S. thesis. Missoula, MT: University of Montana.
TABLE 1. Non-parametric correlation coefficients (Kendall's tau-b) for bivariate comparisons of predictor variables.
| | Sapling | Shrub | Bush | Ground | Height | Lgtree | Pipo | Psme | Laoc | Pico | Sprfir | Mesic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Canopy | +0.09 | -0.04 | -0.03 | -0.12 | +0.35 | +0.31 | -0.12 | +0.02 | -0.01 | -0.04 | -0.06 | +0.24 |
| Sapling | | +0.06 | +0.02 | -0.03 | -0.11 | -0.13 | -0.18 | -0.15 | +0.15 | +0.04 | +0.10 | +0.16 |
| Shrub | | | +0.47 | +0.03 | <0.01 | <0.01 | -0.02 | +0.03 | +0.25 | -0.16 | +0.08 | +0.10 |
| Bush | | | | +0.10 | +0.01 | -0.02 | <0.01 | <0.01 | +0.21 | -0.07 | +0.09 | +0.04 |
| Ground | | | | | -0.03 | -0.03 | <0.01 | +0.03 | +0.08 | <0.01 | <0.01 | -0.04 |
| Height | | | | | | +0.37 | +0.04 | +0.06 | -0.04 | -0.15 | -0.03 | +0.23 |
| Lgtree | | | | | | | +0.10 | +0.09 | -0.04 | -0.24 | +0.02 | +0.16 |
| Pipo | | | | | | | | +0.09 | -0.13 | -0.24 | -0.22 | -0.15 |
| Psme | | | | | | | | | -0.05 | -0.24 | -0.31 | -0.23 |
| Laoc | | | | | | | | | | +0.03 | +0.07 | -0.03 |
| Pico | | | | | | | | | | | +0.07 | -0.27 |
| Sprfir | | | | | | | | | | | | -0.12 |
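For readers unfamiliar with the statistic used in Table 1, the following is a minimal sketch of Kendall's tau-b, which adjusts for tied values such as the many zeros that occur in percent-cover data. The function name and the example values are hypothetical, and the simple pairwise loop is meant for illustration rather than for large data sets.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, with tie correction.

    tau-b = (C - D) / sqrt((n0 - n1) * (n0 - n2)), where C and D are the
    numbers of concordant and discordant pairs, n0 is the total number of
    pairs, and n1 and n2 are the numbers of pairs tied in x and in y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n0 = n * (n - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Made-up percent-cover values for two vegetation variables at four points.
cover_a = [0, 0, 10, 20]
cover_b = [5, 10, 10, 30]
tau = kendall_tau_b(cover_a, cover_b)
print(tau)
```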
TABLE 2. Order of selection for variables included in multiple regression models of the habitat relationships of the Swainson's Thrush, using AIC. Asterisks indicate variables included under traditional hypothesis-testing methods. The second row after some variables is for the quadratic term. All models used logistic regression except for the one labelled "Poisson". All models used the same vegetation data, with the variables averaged over three years. The first three models used accumulated data for abundance or presence of Swainson's Thrush over all three years. The models designated by dates used presence data from each of the three years separately. The "Tr A and B" models are based on two separate sets with one randomly chosen point per transect, and the last column is based on the subset of transects from west of the Continental Divide.
Variable | Poisson N = 1102 | Logit 100 m | Logit 50 m | 1994 N = 1102 | 1995 N = 1102 | 1996 N = 1102 | Tr A N = 263 | Tr B N = 263 | West N = 749 | |
Canopy | + | 7* | 4* | 6* | 3* | 3* | 4* | 4* | ||
-/+ | 8* | 9* | 5* | 4* | ||||||
Sapling | + | 2* | 2* | 2* | 2* | 2* | 3* | 2* | 1* | 1* |
+/- | 5* | 2* | ||||||||
Shrub | + | 1* | 1* | 1* | 1* | 1* | 1* | 1* | 2* | 3* |
+/- | 6* | 3* | 4* | 6* | 4* | 7* | ||||
Bush | + | 7* | 7 | 11 | 7 | 5* | ||||
+/- | 8* | 8 | 12 | 8 | ||||||
Ground | ||||||||||
Height | + | 9 | 9 | 8* | 3* | |||||
-/+ | 11 | 10 | 9* | 3* | ||||||
Lgtree | - | 12* | 11 | 8 | 6* | |||||
Psme | - | |||||||||
Laoc | + | 3* | 3* | 5* | 5* | 4* | 2* | 4* | 2* | |
Pipo | - | 10* | 10 | 6 | 7* | |||||
Pico | - | 6* | ||||||||
Sprfir | + | 13 | 7* | |||||||
Mesic | + | 4* | 4* | 6* | 3* | 5* | ||||
Decid
FIGURE 1. Map depicting the distribution of permanent landbird monitoring transects in northern Idaho and western Montana.
FIGURE 2. The geographic distribution of the Swainson's Thrush across all 1102 points used in the analyses. Closed circles indicate a presence of the Swainson's Thrush in any of the three years and open circles indicate absences in all years. Points within transects are nearly congruent in this depiction.
FIGURE 3. ROC plots showing the classification accuracy of the models based on each half of the data set, in predicting the data for the same half (training data) and the other half (test data).
FIGURE 4. ROC plots comparing the internal classification accuracy (resubstitution) of different models to the main model (see text). a) accuracy of models for each half of the data compared with main model; b) accuracy of models for each year of data compared with main (3 yr) model; c) accuracy of models for each of two subsets with one point per transect, compared with main model; and d) accuracy of model based on 50-m radius compared with main (100 m) model.
FIGURE 5. ROC plots showing the classification accuracy of the models based on each of two subsets of data with one point per transect, in predicting the data for the same subset (training data) and the other subset (test data).
FIGURE 6. Occurrence of the Swainson's Thrush along an environmental gradient representing percent cover of tall understory vegetation (sum of tall shrub and conifer sapling variables). Absence = 0; Presence = 1; curve generated by LOWESS smoothing.