Tuesday, June 28, 2011

Data Compression

What is data compression?
Data compression, also called source coding or bit-rate reduction, is the process of encoding information using fewer bits than the original representation would use.
What is a data compression ratio?
Data compression ratio, also known as compression power, is a computer-science term used to quantify the reduction in data-representation size produced by a data compression algorithm. It is analogous to the physical compression ratio used to measure the physical compression of substances, and it is defined as the ratio between the uncompressed size and the compressed size, so a 10:1 ratio means the compressed data is one tenth the size of the original: [1]
$$\text{Compression Ratio} = \frac{\text{Uncompressed Size}}{\text{Compressed Size}}$$
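A minimal Python sketch of the calculation, assuming the uncompressed-over-compressed form behind figures such as 10:1 used later in this post (the reciprocal gives the fraction of the original size that remains). The sample input is made up, and `zlib` is Python's standard binding to DEFLATE, the method most commonly used by gzip and ZIP.

```python
import zlib

def compression_ratio(uncompressed_size: int, compressed_size: int) -> float:
    """Return uncompressed size / compressed size (e.g. 10.0 means 10:1)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size

# Made-up, highly repetitive input so the effect is easy to see.
original = b"data compression " * 1000
compressed = zlib.compress(original, 9)
ratio = compression_ratio(len(original), len(compressed))
print(f"{len(original)} -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
```

Real files are far less repetitive than this sample, so their ratios will be much lower.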
Why is data compression an important technique for the online world?
It reduces the amount of disk space needed to store data and the bandwidth needed to transmit it.
What is the difference between Lossy and Lossless data compression?
A lossy data compression method is one where compressing data and then decompressing it retrieves data that may well be different from the original, but is "close enough" to be useful in some way. Lossy data compression is used frequently on the Internet and especially in streaming media and telephony applications. These methods are typically referred to as codecs in this context. Most lossy data compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression.

Types of lossy compression

There are two basic lossy compression schemes:
  • In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded.
  • In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.
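As a rough sketch of the transform-then-quantize steps in the first bullet, the Python below applies a hand-written orthonormal DCT-II (the kind of transform JPEG uses) to one made-up segment of eight samples, quantizes the coefficients, and transforms back. The step size of 10 is an arbitrary assumption, and the entropy-coding stage is left out.

```python
import math

def dct(x):
    """Orthonormal DCT-II: move a segment of samples into a frequency basis."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct(c):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def quantize(coeffs, step):
    """Round each coefficient to a whole number of steps: this is where detail is lost."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [q * step for q in levels]

segment = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up sample values
step = 10  # assumed quantizer step; larger steps discard more detail
levels = quantize(dct(segment), step)        # small integers, ready for entropy coding
restored = idct(dequantize(levels, step))
print(levels)
print([round(v, 1) for v in restored])
```

The restored samples are close to, but not exactly, the originals; that gap is the "loss" in lossy compression.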

In some systems the two techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage.
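The predictive scheme in the second bullet can be sketched as simple DPCM (differential pulse-code modulation). This Python version assumes the crudest possible predictor, "the previous decoded sample", and a starting prediction of 0; real codecs use much better predictors. Note that the encoder tracks the decoder's reconstruction, so quantization errors do not pile up.

```python
def dpcm_encode(samples, step):
    """Predict each sample from the previously decoded one and quantize the error."""
    codes = []
    prediction = 0  # assumed starting prediction
    for s in samples:
        q = round((s - prediction) / step)  # quantized prediction error (lossy)
        codes.append(q)
        prediction += q * step              # mirror what the decoder will rebuild
    return codes

def dpcm_decode(codes, step):
    out = []
    prediction = 0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out

samples = [100, 102, 105, 107, 108, 108, 106, 103]  # made-up sample values
step = 2
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print(codes)
print(decoded)
```

After the first value, the codes are small numbers clustered around zero, which an entropy coder can store in few bits; each decoded sample is within half a step of the original.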

Lossless data compression uses algorithms that allow the exact original data to be reconstructed from the compressed data. This contrasts with lossy data compression, which does not allow the exact original data to be reconstructed. Lossless data compression is used in many applications: for example, in the popular ZIP file format and in the Unix tool gzip. It is also often used as a component within lossy data compression technologies. The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method while still meeting the requirements of the application.
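A quick way to see the "exact reconstruction" property is a gzip round trip with Python's standard `gzip` module. The sample text is made up; the point is that decompression returns every byte unchanged.

```python
import gzip

text = ("Lossless compression must give back every byte. " * 200).encode("utf-8")
packed = gzip.compress(text)
unpacked = gzip.decompress(packed)

print(len(text), "->", len(packed), "bytes")
print("identical:", unpacked == text)
```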

Lossy methods are most often used for compressing sound, images or videos. The compression ratio (that is, how much smaller the compressed file is than the uncompressed one) of lossy video codecs is nearly always far higher than that of their audio and still-image equivalents. Audio can be compressed at 10:1 with no noticeable loss of quality, and video can be compressed immensely, e.g. 300:1, with little visible quality loss. Lossily compressed still images are often compressed to 1/10th of their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection.

When a user acquires a lossily compressed file (for example, to reduce download time), the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many methods focus on the idiosyncrasies of human perception, taking into account, for example, that the human eye can see only certain frequencies of light. The psychoacoustic model describes how sound can be highly compressed without degrading its perceived quality. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artifacts.
Lossless compression algorithms usually exploit statistical redundancy in such a way as to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
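The letter-frequency point can be checked with a few lines of Python. This sketch counts letters in one short sentence and computes the Shannon entropy of that letter distribution, which roughly indicates how many bits per letter an ideal code based only on these frequencies would need. One sentence is far too small a sample for real English statistics, so treat the numbers as illustrative only.

```python
import math
from collections import Counter

def letter_statistics(text: str):
    """Count letters and compute the entropy (bits per letter) of their distribution."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return counts, entropy

sample = ("Lossless compression is possible because most real-world data "
          "has statistical redundancy.")
counts, entropy = letter_statistics(sample)
print("e:", counts["e"], "z:", counts["z"])
print(f"about {entropy:.2f} bits per letter, versus 8 bits per ASCII character")
```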

Another kind of compression, called lossy data compression, is possible if some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly. Similarly, two clips of audio may be perceived as the same to a listener even though one is missing details found in the other. Lossy data compression algorithms introduce relatively minor differences and represent the picture, video, or audio using fewer bits.

Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression. However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore usually result in an expansion, as will attempts to compress encrypted data.
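To see the expansion described above, this Python sketch feeds random bytes (standing in for data with no discernible patterns) to `zlib`, then compresses the result a second time. The input size is arbitrary.

```python
import os
import zlib

# Random bytes give the compressor no patterns to exploit.
random_bytes = os.urandom(100_000)
once = zlib.compress(random_bytes)
# Compressing data that has already been compressed.
twice = zlib.compress(once)

print(len(random_bytes), "->", len(once), "->", len(twice), "bytes")
```

Each pass comes out slightly larger than its input, because the compressor still has to add its own headers and block markers.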

In practice, lossy data compression will also reach a point where compressing again does not help, although an extremely lossy algorithm, which for example always removes the last byte of a file, will keep shrinking a file until it is empty.
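The "always remove the last byte" example can be written as a toy Python function (the name `drop_last_byte` is made up for illustration). Applied repeatedly, it always "compresses", but only by destroying the data.

```python
def drop_last_byte(data: bytes) -> bytes:
    """Toy, extremely lossy 'compressor': always discards the final byte."""
    return data[:-1]

data = b"hello"
rounds = 0
while data:
    data = drop_last_byte(data)
    rounds += 1
print(rounds, data)  # 5 b'' (one byte removed per pass until nothing remains)
```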

Which method offers the greatest compression ratio?
The method that offers the greatest compression ratio is lossy compression, and the compression ratio of lossy video codecs is generally higher than that of audio and still-image codecs:
  • Video can be compressed at about 100:1.
  • Audio can often be compressed at 10:1.
  • Still images are also often compressed at 10:1.
A lossily compressed file is typically about 5-6% of its original size, while lossless compression typically produces files that are 50 to 60% of the original size.
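Assuming the percentages above mean "compressed size as a percentage of the original", they convert to N:1 ratios by dividing 100 by the percentage. A small Python sketch of that arithmetic (`percent_to_ratio` is a made-up helper name):

```python
def percent_to_ratio(percent_of_original: float) -> float:
    """Convert 'compressed size as % of original' into the N of an N:1 ratio."""
    return 100 / percent_of_original

for label, pct in [("lossy, 5%", 5), ("lossy, 6%", 6),
                   ("lossless, 50%", 50), ("lossless, 60%", 60)]:
    print(f"{label}: about {percent_to_ratio(pct):.1f}:1")
```

So 5-6% corresponds to roughly 17:1 to 20:1, and 50-60% to roughly 1.7:1 to 2:1.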
Why are human psychology and perception important factors in lossy compression?
Lossy compression throws away part of the information in a file, so the decompressed file is not identical to the original. Knowing how people see and hear lets a codec discard the details we are least likely to notice, so the result still looks or sounds like the original. If we examine the file closely enough, however, we may still be able to spot the errors (compression artifacts) that lossy compression introduces.
Name lossy file formats for audio, still images and video.
Audio:
MP3 - .mp3
MPEG-1 Audio Layer II - .mp2
WMA - .wma

Still Image:
CPC - .cpc
DjVu - .djvu .djv
JPEG - .jpg .jpeg

Video:
DV - .dv .dif
MPEG-1 - .mpg .mpeg .mp2 .mp3 .mpa
MPEG-2 - .mpg .mpeg .mp2 .mp3 .m2v
