When To Say ‘Good Enough’

One of the most common questions I’m asked when helping a collaborator with an image analysis project is:

“How do I know when my analysis workflow is doing well enough at finding the objects or measuring the things I care about?”

Unfortunately, it’s also one of the hardest questions to answer! In an ideal world, we’d achieve perfect recognition and/or segmentation of our biological objects every time and get perfect data out! Alas, biology is almost never so accommodating, even ignoring the effects of technical artifacts.

Ultimately, “what is the universal truth” and “how close to universal truth must we be for something to still be called true” are philosophical questions.  In analyzing finite samples of data, we are attempting to create a model of what we think is going on in the real world — however, as the statistician George Box said, “all models are wrong”. 

Good enough to be useful?

But Box also later said: “all models are wrong, but some are useful”. How right do our image analysis workflows have to be in order to be useful? As much as I wish this question had a simple answer, it’s pretty much always case dependent. As scientists, we always want to report answers as accurately as we can. That drive, though, can make it hard to sense when we’re approaching a point of diminishing returns (there are more than a few hours of my life that might have been better spent watching a movie or having coffee with friends than improving a pipeline’s accuracy from 93% to 96%), or when the images we’re trying to analyze are so unsuitable that we’d spend less time just retaking them.

If you, like me, find it hard to know when to set the keyboard down and walk away, here’s a rule of thumb: for every change you consider making to your analysis workflow (which I’ll refer to here as a “pipeline”, though it can be any way you process images) you should consciously weigh the following factors:

  1. How wrong is my current pipeline output?  
  2. How close to 100% accurate is it possible to get?
  3. How close to accurate do I need to be to assess the % or fold change I expect to see in this experiment?
  4. How important is the accuracy of this segmentation to my overall hypothesis?
  5. What else could I do in the time it will take me to make my pipeline maximally accurate?

Finding “good enough”

Here I’ll talk about two major aspects of image analysis (thresholding and object segmentation), discuss common pitfalls and how we try to get the “least wrong” answer, and discuss how we typically weigh the above factors in our lab. While I’ll largely discuss these things in the context of CellProfiler, these principles apply to all (classical) image analysis, in any software! And although my examples below are of nuclei, the same principles apply to everything from an organelle to an organism.

There are two major steps to segment objects. First, you determine the threshold of “signal” that distinguishes foreground from background — “signal” often refers to the amount of a fluorescent dye present, but a probability map from ilastik or FIJI’s WEKA plugin can also be a great input! 

In the example below, DAPI intensity has been thresholded at an algorithmically determined value (center), 0.5X that value (left), or 2X that value (right). Too low a threshold includes too much background and too high a threshold excludes parts of nuclei, so ideally we want to hit a “Goldilocks” value somewhere in between.

fruit fly nuclei thresholded at intensity values of 0.03, 0.06, or 0.12.
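
To make the 0.5X/1X/2X comparison concrete, here is a toy version in plain Python. It assumes Otsu's method as the "algorithmically determined" threshold (CellProfiler offers Otsu among several thresholding methods, but the figure doesn't say which one was used). The 6×6 intensity grid, with two bright nuclei and a couple of dim rim pixels, is made up, and the helper names are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Return the intensity that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))
    w_bg, sum_bg = 0, 0.0
    best_score, best_t = -1.0, lo
    for i in range(bins - 1):
        w_bg += hist[i]
        sum_bg += hist[i] * centers[i]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Unnormalized between-class variance for splitting after bin i.
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_score, best_t = score, lo + (i + 1) * width
    return best_t


def foreground_fraction(image, threshold):
    """Fraction of pixels in a 2D list whose intensity is above the threshold."""
    pixels = [p for row in image for p in row]
    return sum(p > threshold for p in pixels) / len(pixels)


# Hypothetical DAPI-like intensities scaled to 0..1.
image = [
    [0.01, 0.02, 0.02, 0.01, 0.02, 0.01],
    [0.02, 0.09, 0.11, 0.02, 0.01, 0.02],
    [0.01, 0.10, 0.14, 0.05, 0.02, 0.01],
    [0.02, 0.02, 0.05, 0.02, 0.12, 0.10],
    [0.01, 0.02, 0.01, 0.02, 0.09, 0.13],
    [0.02, 0.01, 0.02, 0.01, 0.02, 0.01],
]

t = otsu_threshold([p for row in image for p in row])
for name, factor in (("0.5X", 0.5), ("1X", 1.0), ("2X", 2.0)):
    print(name, round(factor * t, 3), foreground_fraction(image, factor * t))
```

On this grid the 0.5X threshold also picks up the dim rim pixels, while the 2X threshold drops some pixels inside the nuclei, mirroring the left and right panels of the figure.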

Second, you need to determine how you will break the areas that have been called “foreground” into discrete objects; in our terminology, we often refer to this as “declumping”. When what should have been one object is broken into two or more objects, this is often called “oversegmentation” or “splitting”; when what should have been two or more objects is called only one object, this is often referred to as “undersegmentation” or “merging”.
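
Before any declumping, the simplest way to turn a foreground mask into discrete objects is connected-component labeling: every touching group of foreground pixels becomes one object. The plain-Python sketch below (hypothetical mask and helper name) shows why that alone tends to undersegment, since two touching nuclei come out as a single object. CellProfiler's declumping goes further by using intensity or shape cues to draw dividing lines; this sketch does not attempt that.

```python
from collections import deque


def label_connected_components(mask):
    """Give each 4-connected group of foreground pixels its own positive label."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                # Breadth-first flood fill from this seed pixel.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


# Hypothetical thresholded mask: two touching nuclei on the left, one on the right.
mask = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
labels, n_objects = label_connected_components(mask)
print(n_objects)
```

By eye the mask holds three nuclei, but only two labels come back; that missing object is exactly the kind of merge that declumping is meant to fix.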

Recent work from our group suggests that neural networks may be less prone to these sorts of errors than classical methods, but neural networks still do make both kinds of segmentation errors.

adapted from Evaluation of Deep Learning Strategies for Nucleus Segmentation in Fluorescence Images, Caicedo et al., Cytometry A 2019. https://doi.org/10.1002/cyto.a.23863

Thresholding and declumping parameters are easy to determine for any single object, but can be hard to set globally for a whole image, and especially hard for a whole experiment. Let’s consider our five factors in the context of segmentation:

1. How wrong is my current pipeline output?

If you have manually annotated ground truth, you can answer this quantitatively (in CellProfiler, with the MeasureObjectOverlap module). While it’s usually not that hard to make manually annotated ground truth, it can take a VERY long time, so most people don’t bother for most experiments. As an intermediate measure you can hand-label a small test set, but in most cases, when we’re prototyping, we assess this qualitatively.
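
With a hand-labeled test set, even a simple pixel-level comparison gives you numbers to track as you tweak the pipeline. The sketch below computes precision, recall, F1, and the Jaccard index between a ground-truth mask and a pipeline mask. It is a simplified illustration, not MeasureObjectOverlap's implementation, and the tiny masks and function name are hypothetical. Because it only compares foreground against background, it cannot see splits or merges.

```python
def pixel_overlap_scores(ground_truth, predicted):
    """Compare two same-sized 2D masks/label images pixel by pixel (nonzero = foreground)."""
    tp = fp = fn = 0
    for gt_row, pred_row in zip(ground_truth, predicted):
        for gt, pred in zip(gt_row, pred_row):
            gt_fg, pred_fg = gt > 0, pred > 0
            if gt_fg and pred_fg:
                tp += 1  # foreground in both
            elif pred_fg:
                fp += 1  # pipeline called foreground, annotator did not
            elif gt_fg:
                fn += 1  # annotator called foreground, pipeline missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Hypothetical hand annotation vs. pipeline output.
ground_truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
predicted = [[0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(pixel_overlap_scores(ground_truth, predicted))
```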

In order to do this qualitative assessment, in our lab we typically try to look at the following:

  • Do I generally agree with most of the object segmentations from my analysis workflow? If not, the rest of the questions below likely don’t matter too much.
  • Overall across my experiment, do I have roughly as many regions/images where the algorithm’s chosen threshold is a bit too low as regions/images where it is a bit too high?
  • Overall across my experiment, do I have an approximately equal number of oversegmentations/splits and undersegmentations/merges?
  • Very important: do the second and third points hold true for both my negative control images and my positive control images (or the images of my most extreme expected phenotype(s))?
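
If you do have a few hand-labeled images from each control, the split/merge balance in the bullets above can be tallied instead of eyeballed. Below is one simple way to do it, an assumption rather than a standard definition: match each object to the object it overlaps most in the other label image. A predicted object that is the best match for two or more ground-truth objects counts as a merge, and a ground-truth object that is the best match for two or more predicted objects counts as a split. The function names and label grids are hypothetical.

```python
from collections import Counter


def overlap_counts(labels_a, labels_b):
    """Count pixels shared by each (label_a, label_b) pair, ignoring background (0)."""
    counts = Counter()
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                counts[(a, b)] += 1
    return counts


def best_matches(counts):
    """Map each first-position label to the second-position label it overlaps most."""
    best = {}
    for (a, b), n in counts.items():
        if a not in best or n > best[a][1]:
            best[a] = (b, n)
    return {a: b for a, (b, _) in best.items()}


def count_splits_and_merges(ground_truth, predicted):
    counts = overlap_counts(ground_truth, predicted)
    gt_to_pred = best_matches(counts)
    pred_to_gt = best_matches(Counter({(p, g): n for (g, p), n in counts.items()}))
    # Predicted objects claimed by 2+ ground-truth objects: merges.
    merges = sum(1 for n in Counter(gt_to_pred.values()).values() if n >= 2)
    # Ground-truth objects claimed by 2+ predicted objects: splits.
    splits = sum(1 for n in Counter(pred_to_gt.values()).values() if n >= 2)
    return splits, merges


# Hypothetical labels: GT objects 1 and 2 were merged, GT object 3 was split.
ground_truth = [[1, 1, 0, 2, 2], [1, 1, 0, 2, 2], [0, 0, 0, 0, 0], [3, 3, 3, 3, 0]]
predicted = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0], [2, 2, 3, 3, 0]]
print(count_splits_and_merges(ground_truth, predicted))
```

Running it separately on negative and positive control images lets you check whether the two error types stay balanced in both.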

2. How close to 100% accurate is it possible to get?

To some degree, this depends on knowing your objects and/or your field a bit. If your objects have been studied by microscopy a lot, there are hopefully pretty good pipelines already; if not, there may not be. If your objects are pretty “standardized” in their appearance, you’re more likely to have a higher possible accuracy than if they’re really variable. Good images are also critical here: garbage in, garbage out.

3. How close to accurate do I need to be to assess the % or fold change I expect to see in this experiment?

Do you expect the phenotypes you care about to be 20% different from negative control? 2000% different? How much variability do you expect, and how many samples will you have? Ultimately, this question can be answered with a power analysis and a few reasonable guesses.
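
As a rough illustration of how segmentation accuracy feeds into a power analysis, the sketch below uses the standard normal-approximation sample-size formula for comparing two group means, n = 2 * (z_(1-alpha/2) + z_power)^2 * sigma^2 / delta^2 per group. It assumes that segmentation error adds independent noise whose variance adds to the biological variance, and that the effect size and standard deviations are in the same units (here, fractions of the control mean). The function name and numbers are hypothetical; for real planning, a t-test-based power calculator is more appropriate, especially at small n.

```python
import math
from statistics import NormalDist


def samples_per_group(effect, sd_biological, sd_measurement=0.0, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Assumes biological and measurement (e.g. segmentation) noise are
    independent, so their variances add.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance = sd_biological ** 2 + sd_measurement ** 2
    return math.ceil(2 * (z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical: 20% effect, 30% biological SD, with and without 20% segmentation noise.
print(samples_per_group(0.2, 0.3))
print(samples_per_group(0.2, 0.3, sd_measurement=0.2))
# A 2000% effect.
print(samples_per_group(20.0, 0.3))
```

With these made-up numbers, a 20% effect with 30% biological variability needs about 36 samples per group, and adding segmentation noise with a 20% standard deviation raises that to about 52. A 2000% effect rounds down to a single sample, a regime where the normal approximation isn't trustworthy, but it makes the point that a huge effect tolerates a much rougher pipeline.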

4. How important is this aspect of my experiment to my overall hypothesis?

This is hard to put a number on, but qualitatively:

  • If you’re trying to test whether overexpressing GeneA causes cells to stop dividing, cell size (and therefore accurate cell borders) is probably really important!
  • If you’re trying to tell if overexpressing GeneA causes GFP-GeneB to be overexpressed (and GFP-GeneB is diffuse in the cytoplasm), a rough cell outline is probably sufficient since you really care more about the mean intensity of GFP-GeneB.  
  • If you’re trying to test if Drug123 causes mCherry-GeneC to translocate into the nucleus, the exact outlines of the nucleus are very important!
  • If you’re trying to test if Drug123 causes mCherry-GeneC to translocate into the mitochondria, the exact outlines of your nucleus are probably not that critical (but in that case, mitochondrial segmentation will be pretty important!). 

5. What else could I do in the time it will take me to make my pipeline maximally accurate?

You may want to spend a lot of time optimizing your pipeline if any or all of the following conditions are met:

  • You have a small number of samples
  • You have a pipeline that’s currently really wrong 
  • Your pipeline is wrong in a way that might obscure the features most important to testing your overall hypothesis (because it treats your negative and positive controls quite differently, because the most important structures aren’t identified accurately, etc.)

If all three are true, it might even be worth annotating some data by hand so that you can quantitatively track how well your pipeline segments your most important objects.

Try to set yourself benchmarks ahead of time, though: it might be worth spending 6 more hours or even 6 more days working on this, but will it really be worth 6 more weeks?

If your ultimate goal isn’t super dependent on precise segmentation, and your pipeline works pretty well on most cases (and equally well across your most phenotypically different cases), stop working on this, and go do other cool science (or non-science things)! 

Announcing CellProfiler 3.1.9

Hello all! It’s been a crazy last few months for the CellProfiler team, as we’ve been hiring new team members and working hard on the transition to Python 3, which will bump us into CellProfiler version 4. Keep your eyes peeled for exciting content in the future!

We did want to bring you one last release in the 3.X series, though, especially for OSX 10.14 users who have been left without a usable build. This release also has some minor bugfixes in RelateObjects and IdentifySecondaryObjects.

As usual, you will find this new release (and links to all our old releases) on our releases page.

Thanks very much to Allen Goodman, Matthew Bowden, Anne Carpenter, Jan Eglinger, and GitHub user “cloudsforest” for their contributions on this release.

Announcing CellProfiler 3.1.8

Happy holidays to everyone! We here at the CellProfiler team got you a little end-of-year treat in the form of CellProfiler 3.1.8. This is primarily a bugfix release, getting rid of some bugs in MeasureObjectIntensity, MeasureColocalization, ExportToSpreadsheet, CorrectIlluminationCalculate, and Smooth. We’ve also updated how we package for Windows, so those of you who had JAVA_HOME issues with 3.1.5 should now experience much smoother sailing (if your JAVA_HOME environment variable is set to your 3.1.5 install, feel free to unset it now!). As usual, you will find this new release (and links to all our old releases) on our releases page.

Thanks very much to Allen Goodman, Matthew Bowden, Vito Zanotelli, and Christian Clauss for their contributions on this release.

On behalf of the whole CellProfiler team, may the season treat you well, and we wish you a happy end of 2018 and beginning of 2019!

Announcing CellProfiler 3.1

I’m excited to announce the release of CellProfiler 3.1.

Our focus for CellProfiler 3.1 was polishing features and squashing bugs introduced in CellProfiler 3.0. We also started laying down the foundation for our next release, CellProfiler 4.0, which will transition CellProfiler from Python 2 to Python 3, improve multiprocessing, and overhaul the interface.

There are a few noteworthy changes that some users might enjoy, like UTF-8 pipeline encoding, a simpler application bundle (that won’t require installing Java), and a variety of documentation improvements.

You can download CellProfiler 3.1 from the cellprofiler.org website. If you have feedback or questions, please let us know on the CellProfiler Forum or message one of us on Twitter.

Of course, this would not have been possible without the hard work of our software engineers and all our contributors. Allen Goodman, Claire McQuin, Matthew Bowden, Vasiliy Chernyshev, Kyle Karhohs, Jane Hung, Chris Allan, Vito Zanotelli, Carla Iriberri, and Christoph Moehl, take a bow!

The CellProfiler 2 User’s Guide to CellProfiler 3.0, Part II: Converting your pipelines

For those of you who’ve been with us for a long time, though, the obvious next question after learning how to use the new test mode is: will my old CellProfiler pipelines work in the new version? We feel the same way: the pipelines you’ve accumulated over the years are precious resources! The good and bad news is that the answer is yes, mostly. In order to facilitate the speedup and continue streamlining the code, a few things had to go; for the sake of user-friendliness going forward, we also removed some options we felt were causing “option fatigue”.

Help! Why do my output images seem all black?

Double-clicking on the output images produced by CellProfiler sometimes opens them in your operating system’s default image viewer looking all black. This can make it seem like your pipeline didn’t work or didn’t produce the right output. However, an all-black appearance doesn’t necessarily mean something went wrong; it can happen for a couple of reasons:

(a) If you’re exporting objects and have only a few objects in your image
(b) If you’re exporting 16-bit images
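
One likely mechanism behind both cases (our assumption, not stated in this post) is that the viewer maps the full 16-bit range, 0 to 65535, onto black-to-white. An object label image stores each object as a small integer (1, 2, 3, ...), and a microscope image may use only a fraction of the 16-bit range (a 12-bit camera, for example, tops out at 4095), so almost every pixel lands near black. The plain-Python sketch below, with hypothetical helper names, shows that arithmetic and a min-max stretch for viewing; it doesn't change the data CellProfiler saved.

```python
def fraction_of_white(value, bit_depth=16):
    """Brightness (0..1) a viewer would show if it maps 0..2**bit_depth - 1 to black..white."""
    return value / (2 ** bit_depth - 1)


def rescale_to_8bit(image):
    """Min-max stretch a 2D list of integers into 0..255 for display only."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round(255 * (p - lo) / (hi - lo)) for p in row] for row in image]


# A label image with three objects: brightest pixel is label 3.
print(fraction_of_white(3))
# 12-bit camera data saved in a 16-bit file.
print(fraction_of_white(4095))
# Stretching the same labels makes each object visible.
print(rescale_to_8bit([[0, 1], [2, 3]]))
```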

Help! Why does CellProfiler say it can’t find any valid image sets?

Defining the input to CellProfiler can be the hardest part of getting your pipeline set up and your analysis underway. Incoming images are configured in the first four modules of CellProfiler (Images, Metadata, NamesAndTypes, and Groups), which offer lots of flexibility. But it’s sometimes confusing what each one does, and it’s not always obvious which ones you need for your experiment.
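
One way to see how image sets can come out empty or incomplete: CellProfiler's Metadata module can extract fields from file names with regular expressions using Python's named-group syntax, `(?P<name>...)`. The sketch below uses the same syntax in plain Python to group files into per-site image sets and flag the ones that are incomplete or that the pattern never matched. The filename convention, channel names, and helper are hypothetical, and this is a simplification, not how NamesAndTypes is implemented.

```python
import re
from collections import defaultdict

# Hypothetical naming convention, e.g. Plate1_A01_s1_DAPI.tif
FILENAME_PATTERN = re.compile(
    r"^(?P<Plate>[^_]+)_(?P<Well>[A-P]\d{2})_s(?P<Site>\d+)_(?P<Channel>DAPI|GFP)\.tif$"
)
REQUIRED_CHANNELS = {"DAPI", "GFP"}


def build_image_sets(filenames):
    """Group files by (Plate, Well, Site) and split complete sets from problem cases."""
    sets = defaultdict(dict)
    unmatched = []
    for name in filenames:
        m = FILENAME_PATTERN.match(name)
        if m is None:
            unmatched.append(name)  # the regex never fired: no metadata at all
            continue
        sets[(m["Plate"], m["Well"], m["Site"])][m["Channel"]] = name
    complete = {k: v for k, v in sets.items() if set(v) == REQUIRED_CHANNELS}
    incomplete = {k: v for k, v in sets.items() if set(v) != REQUIRED_CHANNELS}
    return complete, incomplete, unmatched


files = [
    "Plate1_A01_s1_DAPI.tif",
    "Plate1_A01_s1_GFP.tif",
    "Plate1_A02_s1_DAPI.tif",
    "Plate1_A02_s1_gfp.tif",  # lowercase channel name does not match the pattern
]
complete, incomplete, unmatched = build_image_sets(files)
print(complete, incomplete, unmatched, sep="\n")
```

Here a single lowercase channel name leaves well A02 with only a DAPI image, which is the kind of mismatch worth checking for first.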

Making it easier to run image analysis in the cloud: announcing Distributed-CellProfiler

There’s nothing more exciting than getting back a big batch of data from your automated microscope: finally, you have the results of your screen, your timelapse, or whatever you’ve spent the last weeks or months preparing. That excitement can quickly turn to sadness, though, when you realize that neither your laptop nor the old general-use computer in the lab is up to analyzing thousands (or tens of thousands, or hundreds of thousands!) of images. But, congratulations! You’ve reached an elite level of CellProfiler users when you outgrow processing on a single local computer.