Due Diligence

Combating compression

Kaoru Sakabe
May 1, 2017

I love reading blog posts about what the next generation will never experience because of changes in technology. Isn’t it crazy that, at one point in time, data storage meant floppy disks holding less than 1 MB? Or that connecting to the internet involved using a phone line to dial into a server? Nowadays, cloud-based servers and online collaborative programs allow researchers to share large amounts of raw data quickly. External hard drives that can store terabytes of data can be purchased cheaply.

With the decreased cost of storing digital data and the ability to rapidly share files electronically, minimizing file size should no longer be a factor when deciding in what format to store your data. What you should keep in mind is that your electronic data should be stored in a universal format that does not alter the original information in any way, thus preserving your high-quality image data. In other words, you should save your files using lossless compression. For images, your go-to should be TIFF, or Tagged Image File Format, and not JPEG, the format named for the Joint Photographic Experts Group. Although there are other lossless file types, such as RAW, BMP or PNG, ideally you should save as a TIFF, because it is uniformly supported across different software platforms.
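To make "lossless" concrete, here is a minimal sketch using Python's built-in zlib module, which implements DEFLATE, the same compression algorithm PNG uses. The pixel values are made up for illustration, and this is not how any particular imaging program writes its files; the point is only that the data come back byte for byte.

```python
import zlib

# Hypothetical raw 8-bit grayscale image: 64 rows of 64 pixels,
# each row showing a dark band (0) on a white background (255).
row = bytes([255] * 24 + [0] * 16 + [255] * 24)
pixels = row * 64

compressed = zlib.compress(pixels, 9)   # lossless compression
restored = zlib.decompress(compressed)  # exact inverse

print(len(pixels), "bytes raw,", len(compressed), "bytes compressed")
print("identical after round trip:", restored == pixels)
```

The file gets smaller, yet every pixel value survives the round trip. That is the property a lossy method such as JPEG gives up.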

Because disk space and transfer speed were great limitations many years ago, scores of authors chose to save their images as JPEG files. But beware: JPEG files can compromise your hard-earned data. Technically, JPEG is not a file format but rather a method that specifies how the image will be compressed. You will see the extension JPG or JPEG when you save files this way, but there is no difference between these two extensions. When an image is saved using JPEG compression, it is broken up into 8x8 pixel blocks, and a transformation is then applied to each block independently of the rest of the image to reduce the file size. The method also separates the color information from the brightness and discards more of the color information. Ultimately, JPEG is a lossy compression method (see the Due Diligence column in the January issue of ASBMB Today), which means that every time you save the file, you are discarding information. I’ll demonstrate the reasons why you should avoid this format, and hopefully I can convince you to avoid using JPEGs altogether.
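The block transformation described above can be sketched in a few lines of plain Python. This is a toy model, not a real JPEG encoder: it handles a single grayscale 8x8 block, uses a discrete cosine transform (the transform JPEG is built on), and rounds every coefficient to a multiple of one hypothetical step size, whereas real encoders use a table with a different step for each frequency and add color handling and entropy coding. The pixel values are invented for illustration.

```python
import math

N = 8  # JPEG processes the image in 8x8 pixel blocks

# Orthonormal DCT-II basis: C[k][n] is frequency k sampled at position n.
C = [
    [
        (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
        * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for n in range(N)
    ]
    for k in range(N)
]


def dct2(block):
    """Forward 2-D DCT of an 8x8 block: C * block * C^T."""
    tmp = [[sum(C[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]


def idct2(coeff):
    """Inverse 2-D DCT: C^T * coeff * C."""
    tmp = [[sum(C[u][x] * coeff[u][v] for u in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(tmp[x][v] * C[v][y] for v in range(N)) for y in range(N)]
            for x in range(N)]


def lossy_round_trip(block, step):
    """Round every DCT coefficient to a multiple of `step`, then decode.

    `step` is a hypothetical single quantization step; real JPEG encoders
    use a table with a different step per frequency.
    """
    quantized = [[round(c / step) * step for c in row] for row in dct2(block)]
    decoded = idct2(quantized)
    # Round back to whole gray levels and clamp to the 8-bit range.
    return [[min(255, max(0, round(p))) for p in row] for row in decoded]


# Hypothetical block: a dark band (35) on a light background (210).
band = [[210, 210, 210, 35, 35, 210, 210, 210] for _ in range(N)]
result = lossy_round_trip(band, step=40)

changed = sum(a != b for ra, rb in zip(band, result) for a, b in zip(ra, rb))
print(changed, "of 64 pixels changed")
print("first row after round trip:", result[0])
```

The transform by itself is reversible: `idct2(dct2(band))` returns the original values up to floating-point error. The information is lost in the rounding step, and a larger `step` (in JPEG terms, a lower quality setting) throws away more.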

First, saving as a JPEG fundamentally alters the image in a way that cannot be restored. Take, for example, the original TIFF image shown in Figure 1. In last month’s column, we discussed how informative histograms can be. Looking at the histogram of the TIFF image, we can see that the image contains many white pixels, some black pixels and a few pixels of various shades of gray. For JPEGs, high quality means little compression and a larger file size; low quality means high compression and smaller file size. Saving the same image as a JPEG at different quality levels introduces pixels that were not present in the original, creating a distorted image.
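If you want to check your own images for this effect, comparing histograms takes only a few lines. The sketch below assumes you have already read both versions of an image into flat lists of 8-bit gray values; the function names and the two short pixel rows are hypothetical, and the "resaved" numbers are invented stand-ins rather than output from a real JPEG encoder.

```python
from collections import Counter


def histogram(pixels):
    """Count how many pixels have each gray level (0 = black, 255 = white)."""
    return Counter(pixels)


def introduced_levels(original, resaved):
    """Gray levels present in the resaved image but absent from the original."""
    return sorted(set(histogram(resaved)) - set(histogram(original)))


# Hypothetical 8-pixel row: a black band on a white background.
original = [255, 255, 255, 0, 0, 255, 255, 255]
# Invented values standing in for the same row after a lossy save.
resaved = [255, 251, 243, 12, 9, 240, 253, 255]

print(introduced_levels(original, resaved))  # [9, 12, 240, 243, 251, 253]
```

An empty list means no new gray levels appeared, though two images can share the same set of levels and still differ pixel by pixel. Any entries in the list are pixel values that were never part of your original data.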

Figure 1. Saving your data as a JPEG changes the pixels in your image.

Now how does this translate into a scientific image? In Figure 2, I’ve taken a TIFF image and saved it at three different JPEG qualities. Visually, there doesn’t appear to be a huge difference between the TIFF and the high-quality JPEG; however, if you analyze the image with a surface-plot analysis, you’ll notice appreciable differences between the two images. As you compress the image further, blocks start to appear, the background looks less like a real experiment and the bands seem pasted in. These artifacts occur especially in areas of high contrast, such as a dark band on a clean background.

(Left) Figure 2. JPEG compression introduces artifacts.
(Right) Figure 3. Repeatedly saving as a JPEG introduces artifacts.

Another issue is that each time a JPEG image is saved, the compression is applied again. Repeatedly saving an image during editing can therefore introduce artifacts. For example, in Figure 3, I’ve taken an image of a dividing cell and saved it 100 times at maximum quality. By the 100th save, several anomalies have appeared, and it no longer looks the same as the original. While this exercise is almost certainly an exaggeration of what’s happening in the lab, it illustrates that each time you save in the JPEG format, you are changing your data.

Finally, remember that by snapping the picture of the cell or scanning your film, you are recording the results of your experiment. Saving the image in a lossy file format, such as JPEG, distorts the actual results you obtained. Don’t get stuck assembling a figure with muddled data. By saving your image initially in a lossless format, such as TIFF, you will be doing your due diligence in preserving your data.

Kaoru Sakabe is the data integrity manager at the ASBMB.
