Making it easier to run image analysis in the cloud: announcing Distributed-CellProfiler

There’s nothing more exciting than getting back a big batch of data from your automated microscope: finally, you have the results of your screen, your timelapse, or whatever you’ve spent the last weeks or months preparing.  That excitement can quickly turn to sadness, though, when you realize that neither your laptop nor the old general-use computer in the lab is up to analyzing thousands (or tens of thousands, or hundreds of thousands!) of images.  But congratulations! Outgrowing a single local computer means you’ve reached an elite level of CellProfiler users.

Hopefully, your institution has access to a large server or cluster, along with an IT department that can help you get CellProfiler installed on it and your images processing at top speed.  If, sadly, that’s not true for you, we’ve been working on a tool that may help: Distributed-CellProfiler.

Distributed-CellProfiler takes advantage of Amazon Web Services (AWS), which lets you upload and store files, rent computing power, and much more.  This means that once your images are uploaded to the cloud, you can run your analyses from anywhere and don’t need to buy or maintain any hardware of your own.  Full instructions on what you’ll need, how to get started, and how to use it are on our wiki, but we know you may have some questions:

  • Is this free?  AWS does have a free tier of resources, but if you’re working at this scale you’ll likely have to pay something for storage and computing.  The good news is that you only pay for what you use, and you can ‘bid’ how much you’re willing to pay for computer time, so you should be able to find an option that works for your budget.  You’re also saving the money you would otherwise have spent on a big new computer or on paying into a local cluster, and there’s no upkeep time, fees, or hassle to worry about!
  • Won’t everyone see my data if I put it in the cloud?  Not at all!  You can configure your privacy settings however you like.
  • I’m not good at computers.  Will I be able to do this?  We think so!  You will have to install some things and work a bit from the command line, but we provide step-by-step instructions and helpful hints to get you started.  If you were able to learn your microscope’s software and how to make your CellProfiler pipeline, after investing a small amount of time you can definitely learn to do this too.
  • I have an idea for a cool addition to Distributed-CellProfiler.  What can I do?  Like everything else we make, Distributed-CellProfiler is free and open-source, so we welcome input and code contributions from the whole community.  Feel free to file a feature request or make your own fork of the code to add it yourself.  The more input we have from you, the better the software will become!
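
To make the "you only pay for what you use" point concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the per-hour bid, the per-GB storage price, and the per-image processing time are made-up placeholders, not real AWS prices or CellProfiler benchmarks, and it ignores data transfer and billing granularity.

```python
def machine_hours(n_images: int, seconds_per_image: float) -> float:
    """Total compute time in machine-hours, however many machines share the work."""
    return n_images * seconds_per_image / 3600.0


def estimate_cost(
    n_images: int,
    seconds_per_image: float,      # hypothetical: measure this with your own pipeline
    price_per_machine_hour: float, # hypothetical: the amount you are willing to bid
    gb_stored: float,
    price_per_gb_month: float,     # hypothetical storage price
    months_stored: float = 1.0,
) -> dict:
    """Split a rough bill into compute and storage parts."""
    compute = machine_hours(n_images, seconds_per_image) * price_per_machine_hour
    storage = gb_stored * price_per_gb_month * months_stored
    return {"compute": compute, "storage": storage, "total": compute + storage}


def wall_clock_hours(n_images: int, seconds_per_image: float, n_machines: int) -> float:
    """How long the job takes if the work divides evenly across machines."""
    return machine_hours(n_images, seconds_per_image) / n_machines


if __name__ == "__main__":
    # Placeholder numbers only; substitute your own measurements and current prices.
    bill = estimate_cost(
        n_images=100_000,
        seconds_per_image=36,
        price_per_machine_hour=0.05,
        gb_stored=500,
        price_per_gb_month=0.02,
    )
    print(bill)
    print(wall_clock_hours(100_000, 36, n_machines=50), "hours on 50 machines")
```

Under these simple assumptions the compute bill depends on total machine-hours rather than on how many machines you rent, so adding machines shortens the wait without changing the estimated cost.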

What else would you like to know?  Are there other ways you’ve found to process big image sets?

Notable Replies

  1. This looks great! I’m following the instructions on the wiki to get set up, but am having trouble on step 1.1. Specifically, for the first item, generating security credentials, one first needs to create a user in the Identity and Access Management (IAM) console. As I’ve just signed up for an AWS account, there are no users listed. When I try to create the user, it first asks for “Access type”, for which I choose “Programmatic access” (as opposed to “AWS Management Console access”). Then on the next screen, “Permissions”, it asks me to create a group and shows a large list of policies. I can’t skip the step of choosing the group, as it gives a warning:

    You haven’t given this user any permissions. This means that the user has no access to any AWS service or resource.
    Consider returning to the previous step and adding some type of permissions.

    What set of AWS group policies does CellProfiler require to work?

  2. Yup, it’s that first one. We’d love your input on the steps taken; we had set up AWS for the lab long before DCP was even conceived, so while we did our best to archaeologically figure out which things were needed to get DCP running, in practice it’s always a bit hard to trace.

  3. jmc says:

    Hi, one comment regarding getting DCP to work: once AWS is set up (good luck!), how to actually start a job is spread out over the pages of the wiki and wasn’t clear to me initially. The README in the code actually has a good synopsis of what to do. See: Distributed-CellProfiler/

  4. jmc says:

    Another comment, related to the question about what “policies” CP needs to work: I think what is happening is that CP is using “Roles” to carry out tasks. The wiki mentions that two roles need to be created and that they will be invoked in the spot fleet request.
    I’ve been struggling to get DCP to work with S3 and added “AmazonS3FullAccess” along the way.

    Here’s a screenshot of my IAM Roles pages for the two in question:

    <img src="/uploads/cellprofiler/original/2X/e/e232b0d1b59380e6c80f66effd86a72221302bb9.png" width="630" height="500">
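
For anyone weighing the “AmazonS3FullAccess” route mentioned above, a narrower alternative might be a policy scoped to a single bucket. The sketch below builds such a policy document in Python using the standard IAM JSON policy layout. The helper name and the bucket name are hypothetical, and the thread does not confirm which S3 actions DCP actually needs, so treat the action list as a starting assumption rather than a tested configuration.

```python
import json


def s3_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to one S3 bucket (illustrative only)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                # Assumption: read/write is enough; add actions if jobs fail with access errors.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-imaging-bucket" is a placeholder; use your own bucket name.
    print(json.dumps(s3_bucket_policy("my-imaging-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console’s policy editor and attached to the relevant user or role in place of the full-access policy.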
