Identification Quality Experiment Update

Thanks to everyone who signed up for the iNaturalist Identification Quality Experiment.

We're still working the kinks out of the study before kicking it into full gear, but we do have a small sample of 1,156 expert IDs to start playing with. Comparing the taxon suggested by the expert with the taxon associated with the observation just before the expert's suggestion, three things can happen:
1) match (88%): the expert's suggested taxon matches or is more precise than the taxon previously associated with the observation (e.g. observation is ID'd as Taricha or Taricha torosa and expert suggests Taricha torosa).
2) maverick (8%): the expert's suggested taxon is sister (maverick) to the taxon previously associated with the observation (e.g. observation is ID'd as Taricha torosa and expert suggests Ensatina eschscholtzii).
3) too precise (4%): the expert's suggestion is less precise than the taxon previously associated with the observation (e.g. observation is ID'd as Taricha torosa and expert suggests Taricha).
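For concreteness, this three-way comparison can be sketched as a check on ancestor paths in the taxonomy. This is a simplified illustration only; the function name and the path representation are hypothetical, not iNaturalist's actual implementation:

```python
def classify(expert_path, prior_path):
    """Bin an expert ID against the taxon on the observation just before it.

    Each taxon is represented by its ancestor path, e.g.
    ["Caudata", "Taricha", "Taricha torosa"] (a hypothetical encoding).
    """
    if prior_path == expert_path[: len(prior_path)]:
        # The expert's taxon matches, or is a descendant of, the prior taxon.
        return "match"
    if expert_path == prior_path[: len(expert_path)]:
        # The expert's taxon is an ancestor of the prior taxon,
        # so the prior ID is counted as "too precise".
        return "too precise"
    # Otherwise the two taxa sit on different branches of the tree.
    return "maverick"

# Examples mirroring the post:
print(classify(["Taricha", "Taricha torosa"], ["Taricha"]))        # match
print(classify(["Ensatina", "Ensatina eschscholtzii"],
               ["Taricha", "Taricha torosa"]))                     # maverick
print(classify(["Taricha"], ["Taricha", "Taricha torosa"]))        # too precise
```

Note that the "too precise" branch is exactly the assumption discussed below: a coarser expert ID is counted as a disagreement with the finer prior ID.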

Two issues have become clear in these early days:
1) Ambiguous subjects: Because expert IDs are made blind, context that observers often use to describe the subject of their observation, like the description, may be missing. That can lead to situations like this one, where the expert thought the subject was the Laurel Sumac rather than the California Brittlebush. We're considering functionality to make the subject of an observation more explicit, but for now, skip observations if the subject is ambiguous.

2) Ambiguous disagreement: We are assuming that an expert's ID is the most precise ID that ANYONE can make based on the evidence provided. Thus, if an expert IDs an observation to genus that was previously identified to species, we're interpreting this as though the previous identification was 'wrong' by being too precise. Here's an example:

But since the expert IDs are made blind, the experts can't see how precisely observations are already ID'd. In some cases, the expert's intent with a coarser ID was not to explicitly disagree with the more precise ID. For example, here's an observation that was ID'd to the subspecies level, where the expert added an ID at the species level. This species-level ID was not intended to explicitly disagree with the subspecific ID, but that's how the system is currently counting it (i.e. the community's ID was 'too precise').

We're working on functionality to make it possible to distinguish explicit disagreements (e.g. 'no one can ID this to species from the evidence provided') from the alternative (e.g. 'I can only ID this to genus, but someone else might be able to ID it to species'). But for now, if you think someone else might be able to provide a more precise identification than you could, skip the observation.

Thanks again for your help. If you skip observations as described above (those with ambiguous subjects or when others might be able to make a more precise ID than yourself), then please proceed with making IDs for the experiment. I'll check back in when we have improvements to deal with these situations.

I'll also check in with more updates on the analysis as we have more data. Please contact me if you have any questions or concerns.



@charlie @fabiocianferoni @vermfly @jennformatics @d_kluza @arachnojoe @cesarpollo5 @ryanandrews @aztekium @lexgarcia1 @harison @juancarlosgarciamorales1 @garyallport @echinoblog @jrwatkin68 @bryanpwallace @wolfgang_wuster @bobdrewes

Posted on April 5, 2017 11:19 PM by loarie


Awesome, and thanks for involving me in this! It's been really interesting... and very revealing as to how much I do use context and comments and the user's suggested ID...

Posted by charlie over 7 years ago

Quite interesting! Thanks Scott.

Posted by sambiology over 7 years ago

I just had a case of ambiguous disagreement, from an ID I made prior to this study. In this case, I know that the person who photographed the specimen and originally made the ID is perfectly capable of making the ID from the actual specimen. I apparently added the ID unaware of WHO made the ID from the actual specimen.

I was only able to go on what I could see in the photo. The photo did not allow me to be as specific as his ID, but now that I know who made the original ID, I do not wish to contradict that ID. Rather than contradict it, I withdrew my own ID. (I can't myself confirm his ID from the photo.)

Posted by arachnojoe over 7 years ago

We now have a sample of 2,552 expert IDs, and so far the proportions in the pie chart above are unchanged. Some folks have noticed a bug in Windows 10 that we're working to fix.

Posted by loarie over 7 years ago

Are there studies on the error rate of conventional 'professional' or academic sampling? I've always said that iNat research grade is about as accurate as a realistic assessment of 'professional' field work. The expertise is lower, but at the end of a hot field day after trudging through a swamp, etc., the poor low-paid biotechs and interns get tired and frustrated and make mistakes too. And usually there are no photo vouchers. This seems to bear that out. Especially since the 'too precise' IDs aren't wrong per se, and some of the 'maverick' IDs are going to turn out to be correct.

Of course this is just the experts; do you have similar stats on non-expert IDs? Maybe you aren't able to release them yet, and that's fine of course... but I'm really curious.

Posted by charlie over 7 years ago

The stats in the pie chart are the accuracy of 'the crowd' assuming the experts are always correct, which, as you say, is not always the case. If we have enough data, we can also estimate statistics on how correct the experts are, which would be interesting.

Here's the only pseudo-relevant study I could find:
they claim accuracy rates <60% and about the same for experts/non-experts. But it's kind of apples and oranges.

Posted by loarie over 7 years ago

Oh, interesting... I get it. I was looking at it backwards. Thanks!

Posted by charlie over 7 years ago

I was going to start looking again, when I read this in the sidebar: "If you think someone else could identify more specifically than you can, please abstain and just mark as reviewed."

That would prevent me from identifying any species. Someone who has ID'd extensively in their local area is going to know what species they have and what their morphs are. For example, I've been collecting Mecaphesa in Austin for maybe 6 years. I know what to expect here and when something is surprising. But when I see something similar elsewhere, where other species might be found, my confidence drops to zero, because I don't know the locality.

Contextual cues can also take something to species that a phone alone cannot. "This spider dropped out of a web that looked like such-and-such, and I've here photographed it on the ground."

Posted by arachnojoe over 7 years ago

"phone" => "photo"

Posted by arachnojoe over 7 years ago

In short, I'm not comfortable making IDs without all the relevant information. You've probably already missed much of the information I can offer this study, anyway, because I had already previously gone through and corrected the species that most people were inclined to name.

Posted by arachnojoe over 7 years ago

iNaturalist is great for groups whose field marks are well understood, such as butterflies, birds, herps, odonates. For other groups, IDs require expertise that is not available to the general public without extensive effort. The lack of information doesn't stop people from making IDs.

Posted by arachnojoe over 7 years ago

A clarification. Those I could take to species for a locality would get IDs, and I wouldn't ID anything else. Most spiders can only be taken to genus at this point. So I'd be examining a lot of spiders and ignoring most of them. Not the best use of my time. Given all the information that might pertain, I'd be comfortable reporting genus, but not blind.

I'm thinking that the only thing we should be "blinded" from are other people's IDs. That way I would skip specimens posted by experts, because I wouldn't know if they took it farther than I could for their area.

Posted by arachnojoe over 7 years ago

One thing of note here, @arachnojoe, is that they are working on changing how the ID process works so you could add genus IDs without it registering as a disagreement. It seems like for spiders, it may be better for you to wait until that is implemented before participating in this. I really hope you stick around though! I'd be really excited to get any genus-level ID of spiders, because I know very little about them and they are amazing. And of course I will return the favor with plants, but I can't offer much in Austin; you'd have to be in New England or California.

Posted by charlie over 7 years ago

Hi Charlie. Yes, that's exactly what I need.

I've been going through IDing Austin spiders for the city challenge, getting a better sense of things. It seems that the present experiment has the following requirement:

It must be possible to ID the specimen from photos to species in that locale, and the distinguishing information must exist in published form rather than just in the minds of local experts.

A prerequisite to IDing specimens to species is knowing what all the species of the genus look like, at least for those known to be in or near the locale. This just is not the case for most North American crab spiders. There are a few though, and prior to this experiment I systematically went through confirming other people's IDs on many of these. So unfortunately, I can't be helpful to this experiment.

Posted by arachnojoe over 7 years ago

Ok, I love the idea of this study, but am sorry to say that as an "expert" I made numerous errors (so you may need to throw out my data : ). Some errors were from accidentally selecting the wrong organism in the drop-down, some due to my iPad cropping the right sides out of wide photos, and some because I was trying to stretch myself a little far without having read the part about skipping those that could be ID'd further by others. Although most can be blamed on carelessness of some fashion, I think it is worth pointing out that all "experts" should be calibrated against known information, regardless of credentials.

Charlie alluded to this already in one of his posts, but it is so true. We professionals are prone to errors, sloppiness, laziness, and being overzealous. Some "experts" may be careless, while others may be so careful they are afraid to make any ID from a photograph. Some "experts" may not really be experts at all, while another without any so-called expertise may be a taxonomic genius (I see this over and over on iNat). You may be the foremost in the world in your field, but just be prone to technological errors. Every professional should both know this and be humble enough to submit to testing for this type of study.

My agency uses volunteers to do a lot of Citizen Science projects, but we calibrate both ourselves and volunteers so we have good comparable data. Again, I love the idea of this study and hope it is successful, which is why I would strongly recommend that you consider calibrating every one of your experts against a subset of known observations.

Posted by rcurtis about 7 years ago
Posted by loarie about 7 years ago

@loarie I just came across this post (not involved in the project), and something I've talked about locally is that as iNat grows, the number of IDs for an individual observation also grows (hopefully), and ultimately the accuracy of the 'crowd'. Right now my observations get an average of 1-2 IDs, but if there were 4 or 5 it would theoretically improve ID accuracy, as well as for those observations that make it to Research Grade. I'm sure you've considered this, but I just thought I'd mention it. This is an interesting project and I'm glad to see it being studied!

Posted by kimberlietx about 7 years ago

@kimberlietx that will only happen if the number of IDs available grows faster than the number of observations. I could be wrong, but I don't see that trend right now.

Posted by charlie about 7 years ago

For what it's worth, I wanted to say that my error rate doing the blind test is a LOT higher than when I do normal IDs. Not having any of the comments, or having the observer tell you what they thought they saw, is brutal on my IDs. If I sit down and do 50 IDs with this, usually I have two or three that are wrong and I get called out on (and I also annoy people by not having comments along with them). Whereas if I do 50 normal IDs, that doesn't happen. I don't think it is because I am agreeing with things I shouldn't because of suggestion; I think it's because it's wicked hard to do IDs without context. I've dialed my certainty way up before adding blind IDs and still feel like I'm struggling. (All of my IDs here are plants.)

Honestly, without some sort of tag to go along with my IDs or something, I think I am going to stop doing this. I feel like I am annoying the iNat community and harming my 'reputation' by adding wrong IDs without explanation, or offering dissenting IDs without being able to say why.

Posted by charlie about 7 years ago
