A deep learning algorithm outperforms some board-certified dermatologists in diagnosis of skin cancer

Deep learning has been touted for its potential to enhance the diagnosis of diseases, and now a team of researchers at Stanford has developed a deep-learning algorithm that may make this vision a reality for skin cancer.*

A dermatologist uses a dermatoscope, a type of handheld microscope, to look at skin. Stanford AI scientists have created a deep convolutional neural network algorithm for skin cancer that matched the performance of board-certified dermatologists. (credit: Matt Young)

The researchers, led by Dr. Sebastian Thrun, an adjunct professor at the Stanford Artificial Intelligence Laboratory, reported in the January 25 issue of Nature that their deep convolutional neural network (CNN) algorithm performed as well as or better than 21 board-certified dermatologists at diagnosing skin cancer. (See “Skin cancer classification performance of the CNN (blue) and dermatologists (red)” figure below.)

Diagnosing skin cancer begins with a visual examination. A dermatologist usually looks at the suspicious lesion with the naked eye and with the aid of a dermatoscope, which is a handheld microscope that provides low-level magnification of the skin. If these methods are inconclusive or lead the dermatologist to believe the lesion is cancerous, a biopsy is the next step. This deep learning algorithm may help dermatologists decide which skin lesions to biopsy.

“My main eureka moment was when I realized just how ubiquitous smartphones will be,” said Stanford Department of Electrical Engineering’s Andre Esteva, co-lead author of the study. “Everyone will have a supercomputer in their pockets with a number of sensors in it, including a camera. What if we could use it to visually screen for skin cancer? Or other ailments?”

It is projected that there will be 6.3 billion smartphone subscriptions by the year 2021, according to the Ericsson Mobility Report (2016), which could potentially provide low-cost universal access to vital diagnostic care.

Creating the deep convolutional neural network (CNN) algorithm

Deep CNN classification technique. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using Google Inception v3 CNN architecture pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on the team’s own dataset of 129,450 skin lesions comprising 2,032 different diseases. (credit: Andre Esteva et al./Nature)

Rather than building an algorithm from scratch, the researchers began with an algorithm developed by Google that had already been trained on 1.28 million images spanning 1,000 object categories. It was designed primarily to differentiate cats from dogs, but the researchers needed it to differentiate benign from malignant lesions. So they collaborated with dermatologists at Stanford Medicine, as well as Helen M. Blau, professor of microbiology and immunology at Stanford and co-author of the paper.
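The transfer-learning idea described here can be sketched with a toy example: treat a pretrained network as a frozen feature extractor and train only a new classifier head on the new labeled data. Everything below (the random-projection "backbone", the synthetic data, and all names) is a hypothetical stand-in for illustration, not the study's code or data.

```python
import numpy as np

# Toy sketch of transfer learning (not the study's code): a pretrained
# backbone is treated as a frozen feature extractor, and only a small
# classifier head is trained on the new labeled dataset.

def frozen_backbone(images):
    """Stand-in for a pretrained CNN: a fixed nonlinear projection of the
    input to feature vectors. The backbone's weights are never updated."""
    proj = np.random.default_rng(42).normal(size=(images.shape[1], 16))
    return np.tanh(images @ proj)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                    # hypothetical 'images'
y = (X @ rng.normal(size=64) > 0).astype(float)   # benign=0 / malignant=1

feats = frozen_backbone(X)            # features from the frozen backbone
w, b = np.zeros(feats.shape[1]), 0.0  # only this new head is trained

def head_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial = head_loss(w, b)
for _ in range(1000):                 # plain gradient descent on the head
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.1 * feats.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)
final = head_loss(w, b)
print(f"classifier-head loss: {initial:.3f} -> {final:.3f}")
```

The fine-tuning in the actual study updates the full Inception v3 network rather than a frozen one; the sketch only shows why starting from pretrained weights, rather than from scratch, lets a relatively small labeled dataset go a long way.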

The algorithm was trained with nearly 130,000 images representing more than 2,000 different diseases with an associated disease label, allowing the system to overcome variations in angle, lighting, and zoom. The algorithm was then tested against 1,942 images of skin that were digitally annotated with biopsy-proven diagnoses of skin cancer. Overall, the algorithm identified the vast majority of cancer cases with accuracy rates that were similar to expert clinical dermatologists.

However, during testing, the researchers used only high-quality, biopsy-confirmed images provided by the University of Edinburgh and the International Skin Imaging Collaboration Project that represented the most common and deadliest skin cancers — malignant carcinomas and malignant melanomas.

Skin cancer classification performance of the CNN (blue) and dermatologists (red).** (credit: Andre Esteva et al./Nature)

The 21 dermatologists were asked whether, based on each image, they would proceed with biopsy or treatment, or reassure the patient. The researchers evaluated success by how well the dermatologists were able to correctly diagnose both cancerous and non-cancerous lesions in more than 370 images.***

However, Susan Swetter, professor of dermatology and director of the Pigmented Lesion and Melanoma Program at the Stanford Cancer Institute and co-author of the paper, notes that “rigorous prospective validation of the algorithm is necessary before it can be implemented in clinical practice, by practitioners and patients alike.”

* Every year there are about 5.4 million new cases of skin cancer in the United States, and while the five-year survival rate for melanoma detected in its earliest stages is around 97 percent, that drops to approximately 14 percent if it’s detected in its latest stages.

** “Skin cancer classification performance of the CNN and dermatologists. The deep learning CNN outperforms the average of the dermatologists at skin cancer classification using photographic and dermoscopic images. Our CNN is tested against at least 21 dermatologists at keratinocyte carcinoma and melanoma recognition. For each test, previously unseen, biopsy-proven images of lesions are displayed, and dermatologists are asked if they would: biopsy/treat the lesion or reassure the patient. Sensitivity, the true positive rate, and specificity, the true negative rate, measure performance. A dermatologist outputs a single prediction per image and is thus represented by a single red point. The green points are the average of the dermatologists for each task, with error bars denoting one standard deviation.” — Andre Esteva et al./Nature

*** The algorithm’s performance was measured through the creation of a sensitivity-specificity curve, where sensitivity represented its ability to correctly identify malignant lesions and specificity represented its ability to correctly identify benign lesions. It was assessed through three key diagnostic tasks: keratinocyte carcinoma classification, melanoma classification, and melanoma classification when viewed using dermoscopy. In all three tasks, the algorithm matched the performance of the dermatologists, with the area under the sensitivity-specificity curve amounting to at least 91 percent of the total area of the graph. An added advantage of the algorithm is that, unlike a person, it can be made more or less sensitive, allowing the researchers to tune its response depending on what they want it to assess. This ability to alter the sensitivity hints at the depth and complexity of the algorithm. The underlying architecture, pretrained on seemingly unrelated photos (including cats and dogs), helps it better evaluate the skin lesion images.
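The sensitivity-specificity analysis described in the footnote can be made concrete with a minimal sketch on hypothetical classifier scores (not the study's data): each threshold on the score yields one (sensitivity, specificity) point, and sweeping the threshold traces out the curve.

```python
# Sensitivity = true positive rate on malignant lesions;
# specificity = true negative rate on benign lesions.
# Scores and labels below are made up purely for illustration.

def sens_spec(scores, labels, threshold):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores: higher means 'more likely malignant'.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]

# Lowering the threshold makes the classifier more sensitive but less
# specific, which is exactly the tunability the footnote describes.
for t in (0.25, 0.65):
    se, sp = sens_spec(scores, labels, t)
    print(f"threshold={t}: sensitivity={se:.2f}, specificity={sp:.2f}")
```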

Abstract of Dermatologist-level classification of skin cancer with deep neural networks

Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Deep convolutional neural networks (CNNs) show potential for general and highly variable tasks across many fine-grained object categories. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images—two orders of magnitude larger than previous datasets—consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 and can therefore potentially provide low-cost universal access to vital diagnostic care.


  • http://www.nature.com/nature/journal/vaop/ncurrent/full/nature21056.html

Generative adversarial networks recover features in astrophysical images of galaxies beyond the deconvolution limit

Observations of astrophysical objects such as galaxies are limited by various sources of random and systematic noise from the sky background, the optical system of the telescope and the detector used to record the data. Conventional deconvolution techniques are limited in their ability to recover features in imaging data by the Shannon–Nyquist sampling theorem. Here, we train a generative adversarial network (GAN) on a sample of 4550 images of nearby galaxies at 0.01 < z < 0.02 from the Sloan Digital Sky Survey and conduct 10× cross-validation to evaluate the results. We present a method using a GAN trained on galaxy images that can recover features from artificially degraded images with worse seeing and higher noise than the original with a performance that far exceeds simple deconvolution. The ability to better recover detailed features such as galaxy morphology from low signal to noise and low angular resolution imaging data significantly increases our ability to study existing data sets of astrophysical objects as well as future observations with observatories such as the Large Synoptic Survey Telescope (LSST) and the Hubble and James Webb space telescopes.


Any telescope observation of astrophysical objects is limited by the noise in the image, driven by both the detector used and the sky background. Similarly, the observation is limited in angular resolution by the resolving power of the telescope (R ∼ λ/D) and, if taken from the ground, by the distortions caused by the moving atmosphere (the ‘seeing’). The total blurring introduced by the combination of the telescope and the atmosphere is described by the point spread function (PSF). An image taken by a telescope can therefore be thought of as a convolution of the true light distribution with this point spread function plus the addition of various sources of noise. The Shannon–Nyquist sampling theorem (Nyquist 1928; Shannon 1949) limits the ability of deconvolution techniques in removing the effect of the PSF, particularly in the presence of noise (Magain, Courbin & Sohy 1998; Courbin 1999; Starck, Pantin & Murtagh 2002).

Deconvolution has long been known as an ‘ill-posed’ inverse problem because there is often no unique solution if one follows the signal processing approach of backwards modelling (Magain et al. 1998, 2007; Letawe et al. 2007, 2008; Cantale et al. 2016). Another standard practice in tackling inverse problems like these is integrating priors using domain knowledge in forward modelling. For example, if the algorithm knows what a galaxy should look like or it knows the output needs to have certain properties such as being ‘sharp’, it will make more informative decisions when choosing among all possible solutions. In this paper, we demonstrate a method using machine learning to automatically introduce such priors. This method can reliably recover features in images of galaxies. We find that machine learning techniques can go beyond this limitation of deconvolutions – by training on higher quality data, a machine learning system can learn to recover information from poor-quality data by effectively building priors.


Our general method is agnostic as to the specific machine learning algorithm used. In this paper, we choose to use conditional generative adversarial networks (GAN), a state-of-the-art deep learning algorithm for image-to-image translation. In this work, we adopted a standard GAN architecture; therefore, we only briefly introduce GAN and interested readers can consult Reed et al. (2016) and Goodfellow et al. (2014) for details.

In the training phase, the GAN takes as input a set of image pairs – in our case, one image that is degraded (by this we mean: convolved with a worse PSF, or blurred, and with added noise) and the same image without such degradation. The GAN then tries to ‘learn’ to recover the degraded image by minimizing the difference between the recovered image and the non-degraded image. The function that measures the difference between the two images, often called the loss function, can be something simple such as the Euclidean distance, but it can also be a more sophisticated function. In the case of a GAN, this function is another neural network (hence the name adversarial) whose goal is to distinguish the recovered image from a non-degraded image. These two neural networks are trained at the same time. This allows the system to learn sophisticated loss functions automatically without hand-engineering.1 In the testing phase, the GAN takes a different set of degraded images and recovers them.
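The adversarial loss structure described above can be sketched numerically. The sketch below uses the standard GAN objective (Goodfellow et al. 2014), with D(x) denoting the discriminator's probability that an image is real (non-degraded); it is an illustration of the loss terms only, not the paper's network.

```python
import math

# Standard GAN objective as scalar losses. D(x) is the discriminator's
# probability that x is a real (non-degraded) image; 'fake' here means a
# recovered image produced by the generator.

def discriminator_loss(d_real, d_fake):
    # The discriminator wants d_real -> 1 and d_fake -> 0.
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss(d_fake):
    # The generator wants its recovered image to fool D, i.e. d_fake -> 1.
    return -math.log(d_fake)

# When D confidently spots the recovered image, D's loss is small and the
# generator's loss is large, pushing the generator to improve ...
print(discriminator_loss(0.99, 0.01))
print(generator_loss(0.01))
# ... and at the 0.5/0.5 equilibrium, D can no longer tell the recovered
# image from a real one (loss = 2*log 2).
print(discriminator_loss(0.5, 0.5))
```

Training alternates gradient steps on these two losses, which is what lets the system act as a learned, rather than hand-engineered, measure of image difference.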

One remaining challenge is how to generate pairs of images with and without degradation for the training phase. In our framework, we take advantage of the centuries of study of the noise introduced by the telescope and the atmosphere to weakly supervise a GAN network by simulating the blurring process automatically. This allows us to easily harvest a large training set automatically without any human intervention. Furthermore, it allows us to automatically scale our system, and arguably achieve better quality, when future large-scale sky survey data from e.g. LSST (LSST Science Collaboration et al. 2009) or Euclid (Laureijs et al. 2011) are available. We outline the method in Fig. 1.


Figure 1.
Schematic illustration of the training process of our method. The input is a set of original images. From these, we automatically generate degraded images, and train a GAN. In the testing phase, only the generator will be used to recover images.


We select a sample of 4550 galaxies from the Sloan Digital Sky Survey Data Release 12 (York et al. 2000; Alam et al. 2015) in the redshift range 0.01 < z < 0.02 and conduct 10× cross-validation for all of our experiments (each fold contains 4095 images for training and 455 for testing). We obtain the g, r and i-band images for these objects and process them using an asinh stretch (y = asinh(10x)/3) to produce three-band RGB images. Using the asinh stretch rather than keeping the linear data follows the best practice of making the range of input values for a neural network comparable across images. This step has been shown to make the training procedure faster and easier in other applications (Sola & Sevilla 1997). We note that this process involves clipping extreme pixel values and in principle makes it impossible to fully recover the original flux calibration; exploring how to avoid this while not degrading the neural net training performance is an interesting project.
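The asinh stretch used above, y = asinh(10x)/3, is a simple per-pixel transform: roughly linear near zero and logarithmic for bright pixels, which compresses the dynamic range into values a neural network handles well. A minimal sketch:

```python
import math

# Per-pixel asinh stretch as described in the text: y = asinh(10x)/3.
# Faint flux values are left nearly linear; bright values are compressed
# roughly logarithmically.

def asinh_stretch(x):
    return math.asinh(10.0 * x) / 3.0

for flux in (0.0, 0.01, 0.1, 1.0, 10.0):
    print(f"{flux:>6}: {asinh_stretch(flux):.4f}")
```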

In order to test the performance of the GAN, we generate a grid of training sets from the galaxy sample. In each training set, we convolve the images with a Gaussian PSF with a full-width at half-maximum (FWHM) of [1.4, 1.8, 2.0, 2.5] arcsec. The median seeing of the SDSS images is ∼1.4 arcsec, so we explore images from effectively the same resolution all the way to a significantly worse seeing of 2.5 arcsec. After convolving the images with a Gaussian filter representing worse seeing, we adjust the noise level, first restoring it to that of the original image, and then increasing it so that σ_new = [1.0, 1.2, 2.0, 5.0, 10.0] × σ_original to mimic shallower images. We train the GAN using open source code released by Reed et al. (2016) with TITAN X PASCAL GPUs. Training finishes in 2 h per setting per fold (200 h in total).
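The degradation pipeline described above (Gaussian PSF convolution followed by added noise) can be sketched as follows. The grid values and the toy point-source image are illustrative stand-ins for the SDSS cutouts.

```python
import numpy as np

# Sketch of the degradation step: blur with a Gaussian PSF of a chosen
# FWHM, then add Gaussian noise scaled relative to the original noise
# level sigma_orig. Direct 2-D convolution keeps the example dependency-free.

def gaussian_kernel(fwhm_pix, size=15):
    sigma = fwhm_pix / 2.355                     # FWHM = 2.355 * sigma
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()                           # normalize: flux-preserving

def degrade(img, fwhm_pix, noise_factor, sigma_orig, rng):
    k = gaussian_kernel(fwhm_pix)
    pad = k.shape[0] // 2
    padded = np.pad(img, pad, mode="reflect")
    blurred = np.zeros_like(img)
    for i in range(img.shape[0]):                # direct 2-D convolution
        for j in range(img.shape[1]):
            blurred[i, j] = np.sum(padded[i:i + k.shape[0],
                                          j:j + k.shape[1]] * k)
    return blurred + rng.normal(0.0, noise_factor * sigma_orig, img.shape)

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[16, 16] = 100.0                              # a toy point source
degraded = degrade(img, fwhm_pix=5.0, noise_factor=2.0, sigma_orig=0.1, rng=rng)
print(f"peak {img.max():.1f} -> {degraded.max():.2f} after degradation")
```

Because the kernel is normalized, blurring spreads the point source's flux without destroying it, so the pair (original, degraded) gives the GAN a well-posed recovery target.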


We evaluate our method both quantitatively and qualitatively. Quantitatively, we measure the peak signal-to-noise ratio (PSNR; Xu et al. 2014) of the blurred image and the recovered image. The PSNR is a popular quantitative measure for image recovery (Xu et al. 2014). It is defined as the ratio between the maximum possible power of a signal (original image) and the power of the noise (difference between recovered and original image). When the blurred image contains substantial noise, a case that is known to be a challenge for deconvolution-based approaches, our method can achieve a PSNR of 37.2 dB. Blind deconvolution (Bell & Sejnowski 1995)2 achieves a PSNR of 19.9 dB and Lucy–Richardson deconvolution (Richardson 1972; Lucy 1974)3 achieves a PSNR of 18.7 dB. We compare our approach with classic deconvolution approaches in Table 1. We also perform an experiment on the impact of the training set size, reported in Table 1(d): the PSNR achieved increases as we increase the training set size from five images to 2000.
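The PSNR definition above reduces to a one-line formula: 10·log10(peak² / MSE), where MSE is the mean squared residual between the recovered and original images. A minimal sketch on made-up values:

```python
import math

# PSNR as defined in the text: the ratio between the maximum possible
# signal power (peak^2) and the power of the residual between the
# recovered and original image (the mean squared error), in decibels.

def psnr(original, recovered, peak):
    mse = sum((o - r) ** 2 for o, r in zip(original, recovered)) / len(original)
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical flattened 'images' with peak value 255 (not the paper's data):
original  = [10.0, 200.0, 255.0, 30.0]
recovered = [12.0, 198.0, 251.0, 31.0]
print(f"PSNR = {psnr(original, recovered, 255.0):.1f} dB")
```

A perfect recovery drives the MSE toward zero and the PSNR toward infinity, which is why higher dB values (such as the 37.2 dB reported above versus ~19 dB for deconvolution) indicate better recovery.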




Darwinian evolution and neural networks

‘Designless’ nanoscale logic circuits resemble Cognitive Science/Neuroscience

September 30, 2015

Illustration of a nanoparticle network (about 200 nanometers in diameter). By applying electrical signals at the electrodes (yellow), and using artificial evolution, this disordered network can be configured into useful logic circuits. (credit: University of Twente)

Researchers at the University of Twente in The Netherlands have designed and demonstrated working electronic logic circuits produced using methods that resemble Darwinian evolution and neural networks like the human brain.

In a radical “designless” approach, the researchers used a 200-nanometer-wide cluster of 20-nanometer gold nanoparticles. They applied a series of voltages to eight electrodes and showed that the cluster could be configured into any of the 16 different two-input Boolean logic gates.

Artificial evolution

Instead of designing logic circuits with specified functions, as with conventional transistors, this approach works around — or can even take advantage of — any material defects.

To do this, the researchers used an artificial evolution model — one that runs in less than an hour, rather than millions of years. “Natural evolution has led to powerful ‘computers’ like the human brain, which can solve complex problems in an energy-efficient way,” the researchers note. “Nature exploits complex networks that can execute many tasks in parallel.”
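The artificial-evolution idea can be sketched with a toy genetic algorithm: candidate sets of control "voltages" are scored by how well a fixed nonlinear device model reproduces a target gate's truth table, and the best candidates survive and mutate. The device model below is entirely hypothetical, standing in for the real nanoparticle network.

```python
import random

# Toy genetic algorithm in the spirit of the researchers' approach:
# evolve control 'voltages' so a fixed nonlinear device model realizes a
# target two-input Boolean gate. Device model and parameters are made up.

random.seed(1)
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGET = [0, 0, 0, 1]                      # AND gate as the evolution target

def device(a, b, volts):
    # Hypothetical device: a thresholded nonlinear mix of inputs and controls.
    s = volts[0] * a + volts[1] * b + volts[2] * a * b + volts[3]
    return 1 if s > 0 else 0

def fitness(volts):
    # Number of truth-table rows the configured device gets right (0..4).
    return sum(device(a, b, volts) == t for (a, b), t in zip(INPUTS, TARGET))

pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
history = []
for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    survivors = pop[:5]                    # elitism: best genomes survive
    pop = survivors + [
        [v + random.gauss(0, 0.2) for v in random.choice(survivors)]
        for _ in range(15)
    ]
best = max(pop, key=fitness)
print(f"best fitness: {history[0]} -> {fitness(best)} / 4")
```

Because the best genomes are carried over unchanged each generation, the best fitness never decreases; the same search loop, with the model replaced by measurements of the physical network, is what "configures" the disordered material into a chosen gate.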

“This is the first time that scientists have succeeded in realizing robust electronics with small dimensions that can compete with commercial technology.”

Schematic device layout of the disordered network of gold nanoparticles, separated by ~1 nm 1-octanethiols, in between eight titanium-gold electrodes, as shown in the scanning electron micrograph (top inset). The gold nanoparticles act as single-electron transistors (SETs) at low temperature (<15 K). (credit: S. K. Bose, et al./Nature Nanotechnology)

Conventional transistors are limited to a handful of atoms. It would be a major challenge to produce chips in which the millions of transistors required have the same characteristics, according to the researchers from the Twente MESA+ Institute for Nanotechnology and the CTIT Institute for ICT Research. Current transistor designs are also limited by their energy consumption, which is reaching unacceptable levels.

According to University of Twente prof. Wilfred van der Wiel, the logic circuits they discovered currently have limited computing power. “But with this research we have delivered a proof of principle. By scaling up the system, real added value will be produced in the future. This type of circuitry uses much less energy, both in production and use.” The researchers anticipate a wide range of applications — for example, in portable electronics and in the medical world.

“By choosing a smaller nanoparticle diameter, and scaling down the electrode geometry accordingly, our network would not only further reduce area, but room-temperature operation would come into sight as well,” the researchers note in a paper in the journal Nature Nanotechnology.

Mimicking brain-like systems

The researchers also contrast their “designless” reconfigurable approach with massively parallel (but still design-constrained) architectures such as IBM’s TrueNorth brain-inspired chip.

“An especially interesting avenue to explore is the suitability of this system for advanced functionality that is hard (or expensive) to realize in a conventional architecture, such as pattern recognition by mimicking brain-like systems, or simulations of complex physical systems,” the researchers note in the paper. “Our evolutionary approach works around device-to-device variations at the nanoscale and the accompanying uncertainties in performance, which is becoming more and more a bottleneck for the miniaturization of conventional electronic circuits.”

Abstract of Evolution of a designless nanoparticle network into reconfigurable Boolean logic

Natural computers exploit the emergent properties and massive parallelism of interconnected networks of locally active components. Evolution has resulted in systems that compute quickly and that use energy efficiently, utilizing whatever physical properties are exploitable. Man-made computers, on the other hand, are based on circuits of functional units that follow given design rules. Hence, potentially exploitable physical processes, such as capacitive crosstalk, to solve a problem are left out. Until now, designless nanoscale networks of inanimate matter that exhibit robust computational functionality had not been realized. Here we artificially evolve the electrical properties of a disordered nanomaterials system (by optimizing the values of control voltages using a genetic algorithm) to perform computational tasks reconfigurably. We exploit the rich behaviour that emerges from interconnected metal nanoparticles, which act as strongly nonlinear single-electron transistors, and find that this nanoscale architecture can be configured in situ into any Boolean logic gate. This universal, reconfigurable gate would require about ten transistors in a conventional circuit. Our system meets the criteria for the physical realization of (cellular) neural networks: universality (arbitrary Boolean functions), compactness, robustness and evolvability, which implies scalability to perform more advanced tasks. Our evolutionary approach works around device-to-device variations and the accompanying uncertainties in performance. Moreover, it bears a great potential for more energy-efficient computation, and for solving problems that are very hard to tackle in conventional architectures.

New Device Could Greatly Improve Speech and Image Recognition


Researchers have demonstrated pattern recognition using a magnonic holographic memory device

RIVERSIDE, Calif. (www.ucr.edu) — Researchers at the University of California, Riverside Bourns College of Engineering and the Russian Academy of Sciences have successfully demonstrated pattern recognition using a magnonic holographic memory device, a development that could greatly improve speech and image recognition hardware.

Pattern recognition focuses on finding patterns and regularities in data. The uniqueness of the demonstrated work is that the input patterns are encoded into the phases of the input spin waves.

Spin waves are collective oscillations of spins in magnetic materials. Spin wave devices are advantageous over their optical counterparts because they are more scalable due to a shorter wavelength. Also, spin wave devices are compatible with conventional electronic devices and can be integrated within a chip.

The researchers built a prototype eight-terminal device consisting of a magnetic matrix with micro-antennas to excite and detect the spin waves. Experimental data collected for several magnonic matrices show that unique output signatures correspond to specific phase patterns. The micro-antennas allow the researchers to generate and recognize any input phase pattern, a big advantage over existing practices.


Clockwise: photo of the prototype device; schematic of the eight-terminal magnonic holographic memory prototype; collection of experimental data obtained for two magnonic matrices.

The spin waves then propagate through the magnetic matrix and interfere. Some input phase patterns produce a high output voltage, while other combinations result in a low output voltage, where “high” and “low” are defined relative to a reference voltage (i.e., the output is high if the output voltage is above 1 millivolt, and low if it is below 1 millivolt).
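The interference readout can be sketched with a toy model: treat the output as the coherent sum of waves arriving with the input phases, so aligned phases add constructively (high output) while an alternating 0/π pattern cancels (low output). This is an illustration of wave interference only, not a model of the actual device physics.

```python
import cmath

# Toy interference model: the output amplitude is the magnitude of the
# coherent sum of eight unit-amplitude spin waves, one per input port,
# each carrying one of the encoded input phases.

def output_amplitude(phases):
    return abs(sum(cmath.exp(1j * p) for p in phases))

aligned     = [0.0] * 8               # all eight inputs in phase
alternating = [0.0, cmath.pi] * 4     # neighbors half a cycle apart

print(f"aligned:     {output_amplitude(aligned):.2f}")
print(f"alternating: {output_amplitude(alternating):.2f}")
```

Thresholding this amplitude against a reference is the toy analogue of the device's high/low voltage readout, and because all ports contribute to one sum, every input is processed in parallel.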

It takes about 100 nanoseconds for recognition, which is the time required for spin waves to propagate and to create the interference pattern.

The most appealing property of this approach is that all of the input ports operate in parallel. It takes the same amount of time to recognize patterns (numbers) from 0 to 999, and from 0 to 10,000,000. Potentially, magnonic holographic devices can be fundamentally more efficient than conventional digital circuits.

The work builds upon findings published last year by the researchers, who showed that a 2-bit magnonic holographic memory device can recognize internal magnetic memory states via spin wave superposition. That work was recognized as a top-10 physics breakthrough by Physics World magazine.

“We were excited by that recognition, but the latest research takes this to a new level,” said Alex Khitun, a research professor at UC Riverside, who is the lead researcher on the project. “Now, the device works not only as a memory but also a logic element.”

The latest findings were published in a paper called “Pattern recognition with magnonic holographic memory device” in the journal Applied Physics Letters. In addition to Khitun, authors are Frederick Gertz, a graduate student who works with Khitun at UC Riverside, and A. Kozhevnikov, Y. Filimonov and G. Dudko, all from the Russian Academy of Sciences.

Holography is a technique based on the wave nature of light, which allows the use of wave interference between an object beam and a coherent background. It is commonly associated with images made from light, such as on driver’s licenses or paper currency. However, this is only a narrow field of holography.

Holography has also been recognized as a future data-storage technology with unprecedented storage capacity and the ability to write and read large amounts of data in a highly parallel manner.

The main challenge associated with magnonic holographic memory is the scaling of the operational wavelength, which requires the development of sub-micrometer scale elements for spin wave generation and detection.

The research was supported in part by: The Center for Function Accelerated nanoMaterial Engineering (FAME), which is funded with $35 million from the Semiconductor Research Corporation (SRC), a consortium of semiconductor industry companies; the Defense Advanced Research Projects Agency; and the National Science Foundation under the NEB2020 Grant ECCS-1124714.

