A CellProfiler Approach to Analyzing Tissue Data

Imaging tissue slices provides a wealth of data about the spatial composition and number of the various cell types that make up a tissue. Interactions among cells within a tissue are crucial to understanding the role of the inflammation that is triggered by the invasion of cancerous cells. The strength of the inflammatory response has been linked to the prognosis of certain cancers such as lymphoma.

Quantifying the spatial relationship among cells in the crowded environment of a tissue requires reliable segmentation of several cell types. In lymph node sections, the cells include representatives of the immune system, epithelial tissue, connective tissue, and cancer. Quantifying the cell locations makes it possible to gauge the degree to which the cancer has invaded a tissue and how the immune system is interacting with the leading edge of a tumor.

The ability to precisely measure this relationship will give a deeper understanding of the progression of cancer and might yield new insight into when and how the immune system is involved. Ultimately the aim is to define various configurations of this interaction that are predictive of patient outcome or the likelihood of success for a given treatment, such as immunotherapy.

CellProfiler and Tissue Data

In collaboration with the Margaret Shipp and Scott Rodig labs, we developed a pipeline in CellProfiler that addresses unique challenges presented by imaging tissue slices. Consider the image of a representative tissue slice (below), which reveals a field of view with a high cell density. The nuclei of all cells are stained (blue). Two cell types of particular interest in Hodgkin’s Lymphoma have also been stained: Reed-Sternberg cells (RS, referred to as HRS cells in the steps below; green) and tumor-associated macrophages (TAM; red).

CellProfiler tissue analysis: Tissue slice with nuclei stained blue, Reed-Sternberg cells (RS in green), tumor-associated macrophages (TAM in red)

The greatest challenge in quantifying the spatial relationships among these cells, and the others surrounding them, is the identification of individual cell boundaries – a process known as segmentation. The density of the cells makes segmentation complicated as there is extensive overlapping between cell types, which is much greater than that seen even in dense monolayers of cultured cells. This overlapping stems from the fact that tissue slices reveal a plane from a 3D volume of cells from an excised portion of tissue. There are many cells that are not centered in this plane, i.e. their nuclei are not entirely captured within the slice. This increases the variety of nucleus size and intensity as some nuclei are only partially captured. Cytoplasmic regions of cells whose nuclei were not captured in the tissue slice can reside above or below fully captured nuclei; this increases the chance of mis-classifying cells. The positioning artifacts of nuclei described above are a complication to the analysis of a tissue image because most image analysis pipelines rely upon clear nucleus signal for seeding the segmentation of cytoplasmic regions.

In addition to the variety created by the mechanics of acquiring a tissue slice, tissue also contains more natural variety than cell lines. For example, RS cells are physically much larger than any of the neighboring cells. This size difference is a defining characteristic of this type of cell. Furthermore, other cells within a tissue have their own unique characteristics that add to the heterogeneity of size and shape. The two sources of variety mentioned thus far both complicate segmentation and quantification of the cells in a tissue slice.

We’ve developed a pipeline that addresses the tissue-specific challenges outlined above. The key innovation, as compared to pipelines that work well for monolayer cells, is prioritizing cell types based upon the quality of their marker and their size, and identifying them sequentially. Below is an overview of the method and pipeline:

1. Identify the nuclei of all cells:

The nuclei can be identified from a nucleus stain such as DAPI. The segmentation of the nuclei will create a pool of “seeds”, or starting points, for the segmentation and classification of the various cell types within a tissue. Many of the nuclei will overlap, because the sectioning of a tissue captures cells through a volume. When a volume is projected into a 2D image, cells that are separated in Z will overlap. This can challenge segmentation of the nuclei. To improve results, the DAPI image is enhanced with a filter that strengthens the signal of round objects of a typical diameter using the EnhanceOrSuppressFeatures module.

CellProfiler tissue analysis: DAPI image enhanced with a filter EnhanceOrSuppressFeatures module.

2. Classify cells in the order from most certain to least certain:

Prior knowledge of the tissue and the cells within it is imparted to the pipeline through the ordering of the segmentation steps. Stains with good signal and markers that strongly highlight a particular cell type are segmented with higher certainty than stains that are noisy or less specific. Beyond staining quality, features unique to a cell type (its size, for example) can be used to remove low-certainty objects with the FilterObjects module, which further increases the certainty of the segmentation. Through the ordering of modules, the cells with the highest confidence are segmented before cells with lower confidence, and the segmentations of the higher-confidence cells help guide the segmentation of the lower-confidence cells. In this example, we identify the HRS cells first because they are the largest and the stain for HRS cells also gives the strongest signal.

CellProfiler tissue analysis: HRS cells identified

HRS cells’ nuclei are often fragmented. Using the nuclei as seeds leads to a fragmented segmentation of any given HRS cell. This is to be expected, so these fragmented regions are then “glued” together. Any two fragments of an HRS cell that touch are assumed to come from the same cell. This strategy works well with this cell type, because the spacing between HRS cells is generally large.

CellProfiler tissue analysis: HRS fragmented regions "glued" together
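The “gluing” rule can be illustrated outside of CellProfiler with a few lines of plain Python. This is only a sketch of the idea, not the pipeline's actual modules: it assumes the fragments are stored in a small label image (a list of lists, 0 for background), treats two fragments as touching when they share a 4-connected pixel border, and uses a hypothetical helper name, `glue_touching_fragments`.

```python
def glue_touching_fragments(labels, target_ids):
    """Merge touching fragments of one cell type into single objects.

    labels: 2D list of ints, 0 = background, >0 = fragment id.
    target_ids: set of fragment ids that belong to the cell type being glued.
    Returns a new 2D list in which touching target fragments share one id.
    """
    # Union-find: every fragment starts as its own group.
    parent = {lab: lab for lab in target_ids}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller id

    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            if a not in parent:
                continue
            # Looking right and down visits every 4-connected pixel pair once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if b in parent and b != a:
                        union(a, b)

    return [[find(v) if v in parent else v for v in row] for row in labels]


# Fragments 1 and 2 touch; fragment 3 is isolated.
fragments = [
    [1, 1, 2, 0, 0],
    [1, 2, 2, 0, 3],
    [0, 0, 0, 0, 3],
]
glued = glue_touching_fragments(fragments, {1, 2, 3})
```

In this toy image, fragments 1 and 2 end up as a single object while fragment 3 is left alone. The rule is only safe when cells of the glued type rarely touch each other, which is the reason given above for applying it to HRS cells.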

The same process is then used for TAM cells, which are also larger than average and have a strong staining signal.

CellProfiler tissue analysis: TAM cells "glued" together

3. Additive Masking:

The remaining cells, those not already covered by the regions assigned to the larger HRS and TAM cell types, are then classified based upon the strength of their staining. To avoid counting the same cell as two different cell types (where such exclusivity is appropriate), a mask is built up step by step that prevents the next cell types on the list from being identified in space already occupied by previously identified, higher-confidence cell types. First, candidate cells for a particular cell type are found by expanding the region defined as the nuclei to capture regions that include staining for the respective cell type.

CellProfiler tissue analysis: candidate cells found by expanding the region defined as the nuclei to capture regions that include staining for the respective cell type.

Then a mask, which is the union of the areas occupied by upstream cell types, is applied to the candidate cells to remove those that have already been classified. What remains are the cells that have gone unclaimed.

CellProfiler tissue analysis: mask applied to the candidate cells

The mask grows after each round of segmentation, incorporating the cell types found before. After the final cell type has been segmented and classified, the remaining cells are segmented and classified as “unknown”.
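The bookkeeping behind the growing mask can be sketched in plain Python. This is an illustration under stated assumptions rather than the pipeline itself: each candidate cell is represented as a set of (row, column) pixels, a candidate is rejected if it shares any pixel with the mask (the real pipeline may use a different overlap rule), and the function name `classify_sequentially` and the type name "other" are made up for the example.

```python
def classify_sequentially(candidates_by_type, priority):
    """Assign candidate cells to cell types in priority order.

    candidates_by_type: {type_name: {cell_id: set of (row, col) pixels}}
    priority: type names ordered from most certain to least certain.
    Returns ({type_name: [accepted cell ids]}, final mask as a set of pixels).
    """
    mask = set()       # pixels already claimed by upstream cell types
    accepted = {}
    for cell_type in priority:
        kept = []
        claimed_this_round = set()
        for cell_id, pixels in sorted(candidates_by_type[cell_type].items()):
            if pixels & mask:
                continue  # overlaps a more confident cell type: drop it
            kept.append(cell_id)
            claimed_this_round |= pixels
        accepted[cell_type] = kept
        # The mask only grows after the whole round is finished, so cells
        # of the same type do not block one another.
        mask |= claimed_this_round
    return accepted, mask


candidates = {
    "HRS": {1: {(0, 0), (0, 1)}},
    "TAM": {2: {(0, 1), (0, 2)}, 3: {(3, 3)}},
    "other": {4: {(3, 3), (3, 4)}, 5: {(5, 5)}},
}
accepted, mask = classify_sequentially(candidates, ["HRS", "TAM", "other"])
```

Here candidate 2 is dropped because it overlaps the HRS cell, and candidate 4 is dropped because it overlaps the accepted TAM cell, so each pixel ends up claimed by at most one cell type.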

Finally, the (x,y) location of each cell is exported to a spreadsheet. This table of locations can be analyzed to describe the spatial relationship among cells using downstream software applications such as R or Matlab.
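As one example of such a downstream analysis, here is a small sketch in Python (used here instead of R or Matlab purely for illustration) that computes, for each HRS cell, the distance to the nearest TAM cell. The column names `cell_type`, `x`, and `y`, the sample values, and the helper `nearest_distances` are assumptions made for the example; the actual columns in the exported spreadsheet will differ.

```python
import csv
import io
import math

# Hypothetical export: one row per cell with its type and centroid in pixels.
exported = """cell_type,x,y
HRS,10,10
HRS,50,40
TAM,13,14
TAM,80,80
"""


def nearest_distances(rows, source_type, target_type):
    """For each source cell, return the distance to its nearest target cell."""
    sources = [(float(r["x"]), float(r["y"]))
               for r in rows if r["cell_type"] == source_type]
    targets = [(float(r["x"]), float(r["y"]))
               for r in rows if r["cell_type"] == target_type]
    # Brute force is fine for a sketch; it assumes at least one target cell.
    return [
        min(math.hypot(sx - tx, sy - ty) for tx, ty in targets)
        for sx, sy in sources
    ]


rows = list(csv.DictReader(io.StringIO(exported)))
distances = nearest_distances(rows, "HRS", "TAM")
```

For a real export, `io.StringIO(exported)` would be replaced by an open file handle, and for images with many thousands of cells a spatial index would be a better choice than the brute-force search.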

Announcing CellProfiler 3.1.9

Hello all! It’s been a crazy last few months for the CellProfiler team, as we’ve been hiring some new members to the team and working hard on the transition to Python 3, which will bump us into CellProfiler version 4. Keep your eyes peeled for exciting content in the future!

We did want to bring you one last release in the 3.X series, though, especially for OSX 10.14 users who have been left without a useable build. This release also has some minor bugfixes in RelateObjects and IdentifySecondaryObjects.

As usual, you will find this new release (and links to all our old releases) on our releases page.

Thanks very much to Allen Goodman, Matthew Bowden, Anne Carpenter, Jan Eglinger, and GitHub user “cloudsforest” for their contributions on this release.

Browser-based Apps for Data Visualization

Have you ever stumbled across some amazing data visualization tools that run entirely in a web browser (such as this and many others), and wished you could plug in your own data and visualize it? Or, as a biologist, you may know of a good analytic tool, but it either costs too much, requires programming expertise, or requires bundled installations of many other dependencies that might not be compatible with your system… And then you spend more time fixing the tool than using it. In this blog post, we share how to use browser-based applications to perform tasks in multivariate data analysis and image processing and to visualize data (like the one below!), with much less hassle.

Continue reading

Announcing CellProfiler 3.1.8

Happy holidays to everyone- we here at the CellProfiler team got you a little end-of-year treat in the form of CellProfiler 3.1.8.  This is primarily a bugfix release, getting rid of some bugs in MeasureObjectIntensity, MeasureColocalization, ExportToSpreadsheet, CorrectIlluminationCalculate, and Smooth.  We’ve also updated how we package for Windows, so those of you who had JAVA_HOME issues with 3.1.5 (feel free to now unset that in your environment variables if it’s set to your 3.1.5 install!) should now experience much smoother sailing.  As usual, you will find this new release (and links to all our old releases) on our releases page.

Thanks very much to Allen Goodman, Matthew Bowden, Vito Zanotelli, and Christian Clauss for their contributions on this release.

On behalf of the whole CellProfiler team, may the season treat you well, and we wish you a happy end of 2018 and beginning of 2019!

ScienceSnippets: Building communication skills and sharing what you love

Clearly communicating the impact of your research is one of the most important skills you need to develop as a scientist, and yet typically it is only taught by doing (and if you are lucky, feedback – especially critical feedback). Clear communication is important to get funding and resources for your work, to publish it, to entice collaborators, to impress colleagues and supervisors, … and to not be boring at parties when asked “So what do you do?”

Continue reading

Tricks for maintaining your CV/resume with Google Docs: easy to edit, immediately published

You’ve earned degrees, authored papers, mentored supervisees, and traveled far and wide to speak about your work… And ideally it’s nicely showcased in your resume or curriculum vitae (CV), all updated and ready to go. But, if you’re like most academics, your CV is a sorely outdated PDF and upon its request, you always find yourself scrambling to dig up recent accomplishments to prove you’ve not just been lounging around for the last 6 months (or years). And updating it requires locating an elusive latest version of a Word doc, editing HTML on your lab website, or compiling and PDF-ifying your LaTeX file. Continue reading

Announcing CellProfiler 3.1

I’m excited to announce the release of CellProfiler 3.1.

Our focus for CellProfiler 3.1 was polishing features and squashing bugs introduced in CellProfiler 3.0. We also started laying down the foundation for our next release, CellProfiler 4.0, that will transition CellProfiler from Python 2 to Python 3, improve multiprocessing, and overhaul the interface.

There are a few noteworthy changes that some users might enjoy, like UTF-8 pipeline encoding, a simpler application bundle (that won’t require installing Java), and a variety of documentation improvements.

You can download CellProfiler 3.1 from the cellprofiler.org website. If you have feedback or questions, please let us know on the CellProfiler Forum or message one of us on Twitter.

Of course this would not have been possible without the hard work of our software engineers and all our contributors- Allen Goodman, Claire McQuin, Matthew Bowden, Vasiliy Chernyshev, Kyle Karhohs, Jane Hung, Chris Allan, Vito Zanotelli, Carla Iriberri, and Christoph Moehl, take a bow!

Annotating Images with CellProfiler and GIMP

Annotated image data is valuable for assessing the performance of an image processing pipeline and as training data for machine learning methods such as deep learning. When assessing the performance of a CellProfiler pipeline, for example a pipeline that segments nuclei, the annotated image data are used as the ground truth. The performance of the pipeline can be quantified by comparing the segmentation output to the ground truth and calculating a comparison metric, such as the Jaccard Index or F1 Score. Annotated images are also essential for deep learning applications as training data, for example see the 2018 Data Science Bowl; an in-depth discussion on how the Data Science Bowl images were annotated can be found on the Kaggle forum. Continue reading

The CellProfiler 2 User’s Guide to CellProfiler 3.0, Part II: Converting your pipelines

For those of you who’ve been with us for a long time, though, the obvious next question after “how do I use the new test mode?” is “will my old CellProfiler pipelines work in the new version?” We feel the same way: the pipelines you’ve accumulated over the years are precious resources! The good and bad news is that the answer is “Yes, mostly”. In order to facilitate the speedup and continue the process of streamlining the code, a few things had to go; we also removed some things we felt were causing “option fatigue” for the sake of user friendliness going forward.

Continue reading