> Hi Folks!
>
> The reality is that millions of sets of data would tell us more than
> thousands.

Before I am once again accused of criticizing Jim and Marshall, or their efforts, or Field Notes, let me make this abundantly clear: I think it is absolutely great that they are doing what they are doing. That doesn't mean, however, that there isn't room for discussion, and it also doesn't mean that we can't question some of the assumptions and statements that are made. Good science requires that we examine these assumptions, along with experimental methods and statistical analysis. If someone asserts that the data collected can be used in a scientific manner, then those data are subject to the same analysis as all scientific work.

So, as to the claim that millions of lines of data would tell us more than thousands: no, it wouldn't. For instance, how many people saw the crossbills and siskins that showed up in Carroll County last winter? Probably hundreds. What would hundreds of reports tell you that a few (enough to confirm the sightings and identification) wouldn't? Furthermore, are there really fewer robins this year than last, or just fewer observer hours? Or maybe the geographic coverage wasn't quite the same as last year. How do people report location? You need a great deal of uniformity in the way things are entered into each field.

> Because the data set becomes huge, many of the protocols required
> by small numbers of observers using short periods of time [e.g. The Breeding
> Bird Survey] need not be used. The sheer mass will produce significant
> results.

I believe that the massive number of observations, like all observation conducted without a question in mind, will produce interesting information that will, in turn, produce interesting research questions. In other words, the observations are the raw material that feeds the hypothetico-deductive scientific process. Patterns are seen that lead to further investigations.
Example: 800,000 people report that at the same time the roosters crow, the sun comes up. From those data, all I can conclude is that the two things happen at the same time. I can't conclude that the sun comes up because the roosters crow, nor can I conclude that the roosters crow because the sun comes up. Instead, having seen this interesting pattern in the observations, I can formulate the following questions: do the roosters crow because the sun comes up, does the sun come up because the roosters crow, or are they entirely independent events that just happen to occur at the same time of day?

> Editing is also less problematic.

I disagree. It isn't just misidentification that is a problem. It's all sorts of data entry problems. Someone enters the time of day as a.m. rather than p.m., or vice versa. Someone enters the wrong state code. If I enter the code for Arkansas when I intended to use the code for Alaska, bingo. Instant rarities. Steller's Eider in Little Rock. Or the wrong bird code. Typos happen.

> Misidentification will become background
> noise. There are always dubious reports, but by and large, these are
> insignificant. In general, most birders act the same way: they stay in an area
> and bird until returns diminish. This is a kind of "normalization." There are
> many sorts of non-parametric statistical techniques that can be applied to the
> data.

It isn't just a question of which statistical techniques you can use. It's a question of what interpretation can be made of the results, regardless of which test you use. Virtually anything can be analyzed by some statistical test or another, but that doesn't mean that the results make any sense. Statistical techniques can separate out statistical error, but in the case of these data, the potential for observer/reporter error is so large that it may swamp the results.
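
To make the wrong-state-code problem concrete, here is a minimal sketch of the kind of plausibility check a reports database could run before accepting a record. Everything in it is an assumption for illustration: the `EXPECTED_STATES` table, the `flag_suspect_reports` function, and the record fields are hypothetical, not anything Field Notes is known to use, and the range table is deliberately tiny.

```python
# Hypothetical lookup: states where each species is plausibly reported.
# Illustrative values only, not a real range table.
EXPECTED_STATES = {
    "Steller's Eider": {"AK"},
    "American Robin": {"AK", "AR", "MD"},
}


def flag_suspect_reports(reports):
    """Return the reports whose state code falls outside the expected set.

    A flagged report is not necessarily wrong. It only needs a human look:
    it could be a genuine vagrant, a misidentification, or a typo such as
    AR entered for AK.
    """
    suspects = []
    for report in reports:
        expected = EXPECTED_STATES.get(report["species"])
        # Species missing from the lookup cannot be checked, so skip them.
        if expected is not None and report["state"] not in expected:
            suspects.append(report)
    return suspects


# Made-up sample records (assumed field names: species, state, observer).
reports = [
    {"species": "Steller's Eider", "state": "AR", "observer": "A"},
    {"species": "Steller's Eider", "state": "AK", "observer": "B"},
    {"species": "American Robin", "state": "MD", "observer": "C"},
]

for r in flag_suspect_reports(reports):
    print(f"check: {r['species']} reported in {r['state']}")
```

A check like this would catch the Arkansas-for-Alaska slip, but it would also flag a real vagrant, and it does nothing for an a.m./p.m. mix-up, so it narrows the editing job rather than removing it.
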
Again, all I would use the data for is to find patterns, and even then, I'd try to find other, correlative data before running off to investigate a pattern that I had found in these data.

Real-life example: many, many reports had been made of Kirtland's Warbler in the Bahamas over many years. There had also been a number of specimens collected (OH NO! Not the dreaded collecting argument again!), and several years of research using playback. To determine habitat use of these birds, researchers looked at the sight records, specimen data, and playback results. There was real concern that the sight records were questionable, as apparently some people confused wintering Kirtland's with another species in the islands. There is also no way to verify sight records. So, the researchers regarded the sight records as only confirming the same patterns of habitat use that became evident from the specimen data and playback data.

I realize you can't prevent abuse of the data by others, but the danger is that someone will use these data to say things like "species A is recovering nicely" or "species B is declining precipitously" when in fact the numbers resulted from certain areas not being covered, or not being covered as thoroughly, in certain years.

Question: did you and Marshall actually enter all those data from photocopies yourselves? Good grief! How much info other than the birder's identification and species seen was entered?

Ellen Paul
epaul@declink.com