Re: Reporting Sightings
Ellen Paul (epaul@dclink.com)
Thu, 21 Jan 1999 06:37:55 -0500
> Hi Folks!
> 
> The reality is that millions of sets of data would tell us more than
> thousands. 
Before I am once again accused of criticizing Jim and Marshall, or their
efforts, or Field Notes, let me make this abundantly clear:  I think it
is absolutely great that they are doing what they are doing.  That
doesn't mean, however, that there isn't room for discussion and it also
doesn't mean that we can't question some of the assumptions and
statements that are made.  Good science requires that we examine these
assumptions, along with experimental methods and statistical analysis. 
If someone asserts that the data collected can be used in a scientific
manner, then it is subject to the same analysis as all scientific work.
So, as to the claim that millions of lines of data would tell us more
than thousands - No, it wouldn't.  For instance, how many people saw the
crossbills and siskins that showed up in Carroll County last winter? 
Probably hundreds.  What would hundreds of reports tell you that a few
(enough to confirm the sightings and identification) wouldn't tell you?
Furthermore, are there really fewer robins this year than last, or just
fewer observer hours?  Or maybe the geographic coverage wasn't quite
the same as last year.  How do people report location?  You need a great
deal of uniformity in the way things are entered into each field.
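
To make the observer-hours question concrete, here is a minimal sketch in Python.  The counts, the hours, and the function name birds_per_hour are all invented for illustration and come from no real data set.  It only shows how a drop in the raw count can vanish once the count is divided by effort.

```python
def birds_per_hour(count, observer_hours):
    """Return birds seen per observer hour (a simple effort-adjusted rate)."""
    if observer_hours <= 0:
        raise ValueError("observer_hours must be positive")
    return count / observer_hours

# Hypothetical robin totals and effort for two winters (illustrative only).
last_year = {"count": 1200, "observer_hours": 400}
this_year = {"count": 900, "observer_hours": 300}

rate_last = birds_per_hour(**last_year)
rate_this = birds_per_hour(**this_year)

# The raw count fell by 300, but the per-hour rate is the same in both years.
print(f"raw change: {this_year['count'] - last_year['count']}")
print(f"rate last year: {rate_last:.1f}/h, this year: {rate_this:.1f}/h")
```

Of course, this only works if observer hours were recorded at all, and recorded the same way by everyone.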
> Because the data set becomes huge, many of the protocols required
> by small numbers of observers using short periods of time [e.g. The Breeding
> Bird Survey] need not be used. The sheer mass will produce significant
> results.
> 
I believe that the massive number of observations, like all observation
conducted without a question in mind, will produce interesting
information that will, in turn, produce interesting research questions. 
In other words, the observations are the raw material that feeds the
hypothetico-deductive scientific process.  Patterns are seen that lead
to further investigations.
Example:  800,000 people report that at the same time the roosters crow,
the sun comes up.  From those data, all I can conclude is that the two
things happen at the same time.  I can't conclude that the sun comes up
because the roosters crow, nor can I conclude that the roosters crow
because the sun comes up.  Instead, having seen this interesting pattern
in the observations, I can formulate the following questions:  do the
roosters crow because the sun comes up, does the sun come up because the
roosters crow, or are they entirely independent events that just happen
to occur at the same time of day?
> Editing is also less problematic.
I disagree.  It isn't just misidentification that is a problem.  It's
all sorts of data entry problems.  Someone enters the time of day as
a.m. rather than p.m. or vice versa. Someone enters the wrong state
code.  Suppose I enter the code for Arkansas when I intended
to use the code for Alaska.  Bingo.  Instant rarities.  Steller's Eider
in Little Rock.  The same goes for the bird code.  Typos happen.
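
As a rough sketch of the kind of check that would catch the Arkansas/Alaska slip, assume a lookup of the states where each species is expected.  The table EXPECTED_STATES, the record layout, and the function name flag_suspect_records are all hypothetical, made up for this example.

```python
# Hypothetical lookup of states where each species is expected
# (illustrative only, not a real range table).
EXPECTED_STATES = {
    "Steller's Eider": {"AK"},
    "American Robin": {"AK", "AR", "MD"},
}

def flag_suspect_records(records):
    """Return records whose state code is outside the species' expected states."""
    suspects = []
    for rec in records:
        expected = EXPECTED_STATES.get(rec["species"])
        # Species missing from the lookup are not flagged here;
        # they would need separate handling.
        if expected is not None and rec["state"] not in expected:
            suspects.append(rec)
    return suspects

reports = [
    {"species": "Steller's Eider", "state": "AR"},  # likely a typo for AK
    {"species": "Steller's Eider", "state": "AK"},
    {"species": "American Robin", "state": "AR"},
]

suspects = flag_suspect_records(reports)
print(suspects)
```

A check like this only catches the implausible errors.  It would also flag a genuine rarity, and a wrong code that happens to be plausible sails right through, so someone still has to look at the flagged records.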
> Misidentification will become background
> noise. There are always dubious reports, but by-and-large, these are
> insignificant. In general, most birders act the same way: they stay in an area
> and bird until returns diminish. This is a kind of "normalization." There are
> many sorts of non-parametric statistical techniques that can be applied to the
> data.
It isn't just a question of which statistical techniques you can use. 
It's a question of what interpretation can be made of the results,
regardless of which test you use.  Virtually anything can be analyzed by
some statistical test or another, but that doesn't mean that the results
make any sense.  Statistical techniques can separate out statistical
error, but in the case of these data, the potential for
observer/reporter error is so large that it may swamp out the results. 
Again, all I would use these data for is to find patterns, and even
then, I'd try to find other, corroborating data before running off to
investigate a pattern that I had found in these data.
Real-life example:  many, many reports had been made of Kirtland's
Warbler in the Bahamas over many years.  There had also been a number of
specimens collected (OH NO! Not the dreaded collecting argument again!),
and several years of research using playback.  To determine habitat use
of these birds, researchers looked at the sight records, specimen data,
and playback results.  There was real concern that the sight records
were questionable, as some people apparently confused wintering
Kirtland's with another species in the islands.  There is also no way to
verify sightings records.  So, the researchers regarded the sightings
records as only confirming the same patterns of habitat use that became
evident from the specimen data and playback data.
 
I realize you can't prevent abuse of the data by others, but the danger
is that someone will use these data to say things like "species A is
recovering nicely" or "species B is declining precipitously" when in
fact, the numbers simply reflect that certain areas weren't
covered, or weren't covered as thoroughly, in certain years.
Question:  did you and Marshall actually enter all those data from
photocopies yourselves?  Good grief!  How much info other than birder's
identification and species seen was entered?  
Ellen Paul
epaul@declink.com