Biology Cell by AI: Humans are skilled at comparing images and identifying patterns. They can organize a collection of pictures by color, ear size, face shape, and so on. Comparing images quantitatively is much harder, and the more intriguing question is whether a machine can extract information from images that humans cannot.
Scientists at the Chan Zuckerberg Biohub, a research institute affiliated with Stanford University, have now developed a machine learning algorithm that can quantitatively analyze, compare, and contrast images (in this instance, microscopy images of proteins) without any prior knowledge. The algorithm, named “cytoself,” yields rich, detailed information about protein locations and functions within cells, as reported in Nature Methods. This capability could help cell biologists speed up drug discovery and screening.
“This is very thrilling – we’re using AI to solve a challenging problem that humans know,” said Loic Royer, co-corresponding author of the study. “We could use this technology for different types of images in the future. It opens up so many possibilities.”
Cytoself not only demonstrates the power of machine learning algorithms, it also yields insights into cells, the basic building blocks of life, and into proteins, the molecular components of cells. Every cell contains approximately 10,000 different types of protein. “A cell has a much more spatially organized structure than we ever thought,” said Manuel Leonetti, a co-author of the study, adding that this is an important biological finding about the wiring of the human cell.
Like all tools created at CZ Biohub, cytoself is free and open source. Leonetti said he hopes it will inspire others to try similar algorithms on their own image analysis problems.
Machines Are Capable of Learning on Their Own
Cytoself is an example of self-supervised learning, which means that humans do not teach the algorithm anything about the protein images. Supervised learning, by contrast, requires teaching the machine example by example, said Hirofumi Kobayashi, the study’s lead author. That is tedious and a lot of work, and it can introduce bias if the machine is limited to the categories that humans teach it.
“Manu [Leonetti] believed there was information already in the images,” Kobayashi said. “We wanted to see what the machine could find out on its own.”
Keith Cheveralls, a software engineer at CZ Biohub, was also part of the team. The researchers were surprised by how much information the algorithm was able to extract from the images.
“The degree of detail in protein localization was much higher than we would have thought,” said Leonetti, whose group develops tools and technologies to understand cell architecture. “The machine converts every protein image into a mathematical vector. The machine then ranks images that appear similar. This allowed us to predict proteins that interact in the cell with high precision by simply comparing images. It was quite surprising.”
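The idea Leonetti describes, turning each image into a vector and then ranking images by how alike their vectors are, can be illustrated with a small Python sketch. This is not cytoself's code: the protein names, the three-dimensional vectors, and the choice of cosine similarity as the comparison are all assumptions made for illustration, and a real model would produce much longer vectors learned from the images.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity between row vectors."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def rank_similar(names, vectors, query):
    """Rank every other item by cosine similarity to `query`, most similar first."""
    sims = cosine_similarity_matrix(vectors)
    q = names.index(query)
    # Stable sort so that ties keep their original order.
    order = np.argsort(-sims[q], kind="stable")
    return [(names[i], float(sims[q, i])) for i in order if i != q]


# Hypothetical, hand-made vectors standing in for learned image representations.
names = ["protein_A", "protein_B", "protein_C", "protein_D"]
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_similar(names, vectors, "protein_A")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example protein_B ranks first for protein_A because their vectors point in nearly the same direction. Under the assumption that similar vectors reflect similar localization, such a pair would be a candidate for proteins that work together in the cell.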
A Unique AI
Previous studies have applied self-supervised or unsupervised models to protein images, but self-supervised learning has never been applied so successfully to such a large dataset, according to Kobayashi, an expert in machine learning and high-speed imaging. The dataset comprises over 1,000,000 images covering over 1,300 proteins.
Leonetti led the OpenCell project at CZ Biohub to create images of the human cell. The effort aims to characterize the roughly 20,000 types of proteins in our cells. Results for the first 1,310 proteins the team characterized were published in Science. They included images of each protein (produced with a type of fluorescent tag) and maps of the proteins’ interactions.
Cytoself, which provided quantitative and granular information about protein localization, was key to OpenCell’s success.
“The question of where a protein can localize in a cell – all possible places and all kinds of combinations – is fundamental,” Royer said. Over the past decades, biologists have attempted to determine all possible locations and structures for proteins within cells, but this has always been done by humans looking at the data.
How Much Are Human Biases and Limitations Affecting This Process?
Next, the team aims to track how small changes in protein localization can be used to recognize different cell states, for example to distinguish a normal cell from a cancerous one. This could be key to better understanding many diseases and facilitating drug discovery.
Kobayashi said that screening drugs is essentially trial and error. “But with cytoself, this is an enormous leap because you won’t need to perform experiments one-by-one with thousands of proteins. This low-cost method could help increase research speed.”