Thursday, 29 November 2012

Lecture 8: Video Processing

Video Processing

Scottish inventor John Logie Baird (1888-1946) was a pioneer in video processing, inventing the television; he demonstrated the first working version in 1926. His method for displaying a stream of video used mechanical picture scanning with an electronic transmitter and receiver.




Baird with his invention, the first television

An old idea referred to as Persistence of Vision suggested that an afterimage persists on the retina for one twenty-fifth of a second. This has since been dispelled and is now regarded as a myth: it is no longer considered true that humans perceive motion as the result of a persisting image.



Motion perception is probably more accurately and relatably described by the following two definitions. Phi phenomenon: the optical illusion of perceiving continuous motion between separate objects viewed in rapid succession. Beta movement: the optical illusion that fixed images appear to move, despite not actually moving.


When a viewer is shown images at a rate of more than four per second, the human eye receives the impression of movement.


Storage


For video, copious amounts of storage are needed for certain types of files. The largest type that springs to mind is uncompressed HD video. Storing it typically uses approximately 1 gigabyte every 3 seconds. This estimate is based on the file needing 3 bytes per pixel, at a resolution of 1920 x 1080 and 60 frames per second, which totals roughly 373.2 megabytes per second.
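The arithmetic behind those figures is simple enough to sketch; the following Python snippet (my own illustration, using the lecture's numbers) reproduces them:

```python
# Raw video data rate = width x height x bytes-per-pixel x frames-per-second.
def uncompressed_rate(width, height, bytes_per_pixel, fps):
    """Return the data rate of raw video in megabytes per second."""
    return width * height * bytes_per_pixel * fps / 1_000_000

rate = uncompressed_rate(1920, 1080, 3, 60)
print(f"{rate:.1f} MB per second")                    # 373.2 MB per second
print(f"1 GB every {1024**3 / (rate * 1e6):.1f} s")   # ~2.9 s, roughly 3
```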


Even in today's world of technological advancement, that amount of storage is far too bulky and awkward. To combat this, there is a large variety of compression algorithms and standards which can significantly reduce data usage in the storage, processing, streaming and transmission of video.


Video Processing Vital Terminology:


Bit rate: How many bits per second are used to represent the video portion of the file. These vary vastly, typically from 300-8000 kbps. As expected, a lower bit rate means a lower quality of video, just as with sound files.


Interlacing or Progressive Video:


Interlaced video is the best method to make use of limited bandwidth for video transmission, and was especially important when analogue transmissions were prominent. The receiver tricks the viewer by drawing the odd-numbered lines twenty-five times per second; the even-numbered lines then appear in the next field, and the process repeats. Progressive video avoids interlacing, drawing every line of each frame in sequence, and as a result appears much sharper.


Resolution:


Resolution is the number of pixels across by pixels down needed to represent an image. At first, in the analogue days, video was represented at a resolution of 352 x 240 in North America and 352 x 288 in Europe. Advancements mean that today high definition television is represented by 1920 x 1080 pixels. The movement away from analogue signals has meant televisions can now produce Blu-ray quality and at times even double up as computer monitors.




This image shows a scale of definition, with the highest to the left

Video File Formats


MPEG-1:



  • Development started in 1988 and finalised in 1992, when the first MPEG-1 decoder became available
  • It could compress video to 26:1 and audio to 6:1
  • The format was designed to compress VHS-quality digital video and CD audio while compromising quality as little as possible.
  • It is currently the most widely compatible lossy compression format in the world
  • It is part of the same standard as the MP3 audio format
  • MPEG-1 video and layer I/II audio can now be used in applications legally for free, as the patents expired in 2003, meaning royalties and license fees were no longer applicable
MPEG-2

  • Work began on this format in 1990, before MPEG-1 was written
  • Its intention was to extend the MPEG-1 format, using higher bit rates (3-5 Mbits per second) to provide full broadcast-quality video.
MPEG-4
  • A patented collection of methods made to define compression of audio and video, creating a standard for a number of audio and video codecs (coder/decoders)
  • Shares many features with MPEG-1 and MPEG-2, whilst enabling 3D rendering, Digital Rights Management and other interactive features
QuickTime
  • Was first on the scene in 1991, produced by Apple, beating Microsoft by a year in their attempts to add a video format to Windows.
  • Approved in 1998 by the ISO as the basis of the MPEG-4 file format
AVI (Audio Video Interleave)
  • First seen in 1992, implemented by Microsoft as part of its Video for Windows technology.
  • It takes the form of a file container, allowing synchronized audio and video playback
  • Sometimes files can appear stretched or squeezed and lose definition because the files do not contain aspect ratio information, which can lead to them being rendered with square pixels.
  • However, certain types of software such as VLC and MPlayer have features which can solve problems with AVI file playback.
  • Despite being an older format, using AVI files can be beneficial: their longevity means they can be played back on a wide range of systems, with only MPEG-1 being better in that respect.
  • The format is also very well documented, both by its creators Microsoft and various third parties.
WMV (Windows Media Video)

  • Made by Microsoft with various proprietary codecs
  • Files tend to be wrapped in Advanced Systems Format (ASF) and are not encoded
  • The ASF wrapper is frequently responsible for supporting Digital Rights Management
  • Files can also be placed inside an AVI container when based on Windows Media 9
  • Can be played on PCLinuxOS using software including the aforementioned VLC and MPlayer
3GP
  • Two similar formats: 3GPP (a container for GSM phones, as used by carriers such as T-Mobile) and 3GPP2 (a container for CDMA phones, as used by carriers such as Verizon)
  • 3GPP files frequently carry a 3GP file extension, whereas 3GPP2 files carry a 3G2 extension
  • 3GP/3G2 store video files using an MPEG-4 format. Some mobile phones use MP4 to represent 3GP.
  • This method decreases bandwidth and storage whilst still trying to deliver a high quality of video.
  • Again, the aforementioned VLC and MPlayer help Linux operating systems support 3GP files, which can be encoded and decoded with the FFmpeg library
FLV (Flash Video)

  • File container used to transmit video over the internet
  • Used by household internet names such as YouTube, Google, Yahoo and MetaCafe
  • While the format itself is open, the codecs used in production are generally patented
Video Quality versus Speed of Access

  • The more compression, the more information is lost, leading to more distortion in the picture
Algorithms that compress video still have problems with unpredictable, highly detailed content, the prime example being live sports events. Automatic Video Quality Assessment could provide a solution to this.

Sunday, 25 November 2012

Lab 4: Creating Signals

This lab is a demonstration of how to generate signals using Cool Edit Pro. The object of the task was to create a wave and then mix it with other waves to alter its shape. Firstly, I created the initial wave at 8 bits.


| Harmonic Number | Amplitude relative to Fundamental | Amplitude (dB) | Frequency (Hz) |
|---|---|---|---|
| 1 | 1 | 0 | 400 |
| 2 | 0 | - | - |
| 3 | 1/3 | -9.54 | 1200 |
| 4 | 0 | - | - |
| 5 | 1/5 | -13.9 | 2000 |
| 6 | 0 | - | - |
| 7 | 1/7 | -16.9 | 2800 |
| 8 | 0 | - | - |
| 9 | 1/9 | -19.1 | 3600 |

(The even harmonics have zero amplitude, so no dB value or frequency applies to them.)

The task was then to create a wave for the fundamental tone (the first harmonic) and then for the third, fifth, seventh and ninth harmonics. The duration of each wave is 0.2 seconds. Images of the five waves are below, in order. As you can see, the waves decrease in height and become more dense (compact).

The above table shows the theory behind why the waves take these shapes. The increase in frequency (number of cycles per second) with each harmonic explains why the waves become more compact: at a higher frequency, more cycles are completed per second, so the image shows more instances of the wave.


The decrease in height visible below relates to the amplitude. The amplitude determines the loudness of the sound that the wave generates, and is measured in decibels. In Cool Edit Pro, the amplitude is depicted by the height of the wave. The decrease in height relates directly to the Amplitude (dB) column of the table. As you can see, the third harmonic sees a decrease from 0 dB (the fundamental) to -9.54 dB. This is worked out using a logarithm, with this formula:

dB = 20 × log10(A / A_fundamental)

For the third harmonic, for example, 20 × log10(1/3) ≈ -9.54 dB.
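As a quick sanity check of the table, here is a minimal Python sketch (not part of the lab itself) applying that formula to each odd harmonic:

```python
# Verifying the decibel figures in the table above, using the standard
# amplitude-ratio formula dB = 20 * log10(A / A_ref).
import math

for n in (1, 3, 5, 7, 9):          # the odd harmonics used in the lab
    ratio = 1 / n                  # amplitude relative to the fundamental
    db = 20 * math.log10(ratio)
    print(f"harmonic {n}: {db:.2f} dB")
# harmonic 1: 0.00 dB, harmonic 3: -9.54 dB, harmonic 5: -13.98 dB,
# harmonic 7: -16.90 dB, harmonic 9: -19.08 dB (the table rounds these)
```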
In chronological order, displayed below are the fundamental, the third harmonic, the fifth harmonic, the seventh harmonic and the ninth harmonic.







After creating waves for the fundamental and the third, fifth, seventh and ninth harmonics, the next task was to mix the waves together. In the end result the wave was beginning to take on a square shape, as shown below. Mixing the waves together is done using the Mix Paste option in Cool Edit Pro: simply highlight the wave or the part of the wave that you wish to use, click Edit then Copy, open another wave, then click Edit, followed by Mix Paste.
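For readers without Cool Edit Pro, the mixing step can also be sketched mathematically: each harmonic is a sine wave, and mix-pasting sums them sample by sample. A minimal Python/NumPy sketch follows; the function name synth_harmonics and the 8000 Hz sample rate are my own assumptions, not part of the lab.

```python
# A sketch of what mix-pasting the harmonics does mathematically:
# summing sine waves at the table's frequencies and relative amplitudes.
import numpy as np

def synth_harmonics(f0, harmonics, duration=0.2, sample_rate=8000):
    """Sum sine waves; harmonics is a list of (harmonic_number, amplitude)."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    wave = sum(a * np.sin(2 * np.pi * f0 * n * t) for n, a in harmonics)
    return wave / np.max(np.abs(wave))   # normalise into the -1..1 range

# Odd harmonics at 1/n amplitude approximate a square wave, as in the lab
square = synth_harmonics(400, [(n, 1 / n) for n in (1, 3, 5, 7, 9)])
```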


The second part of the lab required a task similar to above, a mix of the below:

| Harmonic Number | Amplitude relative to Fundamental | Amplitude (dB) | Frequency (Hz) |
|---|---|---|---|
| 1 | 1 | 0 | 400 |
| 2 | 1/2 | -6.02 | 800 |
| 3 | 1/3 | -9.54 | 1200 |
| 4 | 1/4 | -12.04 | 1600 |
| 5 | 1/5 | -13.9 | 2000 |

I then generated the 2nd and 4th harmonics and began to mix all the waves together. As you can see, by the end they were beginning to take the shape of a mountain.
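Using the same hypothetical synth_harmonics helper from the earlier sketch, this second series is every harmonic at 1/n amplitude, which is the Fourier series of a sawtooth, hence the mountain-like shape:

```python
# Every harmonic at 1/n amplitude (the second table) approximates a
# sawtooth wave - the "mountain" shape seen in the mixed result.
saw = synth_harmonics(400, [(n, 1 / n) for n in (1, 2, 3, 4, 5)])
```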







Lab 6: Image Manipulation

Welcome to my lab on image manipulation. The chosen software for this lab is Adobe Fireworks. My first step was to download the image I was instructed to use and open it in the Fireworks software.

The image appeared as seen below. It was an image of a church against a backdrop of blue sky, at a sideways angle. In this blog I am going to explore the various ways, using Fireworks, that I can edit the appearance of this image.


Firstly, I identified that the angle of the image was unhelpful and inconvenient to the viewer: to look at the image in its natural state they would have to turn sideways. Thankfully, Fireworks has features which can help you get round this. I was able to rotate the image by 90 degrees, which made it a lot easier, clearer and more user friendly. There are various other angles you can rotate images by in Fireworks, in both clockwise and anti-clockwise directions, but I decided that 90 degrees was the most fit for purpose for this lab.


After I had edited the image to show it at a better angle, my next task was to show some of the custom filters that Fireworks has to offer. The two I decided to work on were motion blur and invert colours. Firstly, motion blur creates the illusion that the image was taken in motion, such as by someone on the move. The result is that the detail of the image is heavily compromised, as you can see below: the time on the clock is more difficult to make out and the individual bricks aren't quite as visible.


The next filter I chose to try out was inverting the colours. This operation transforms each colour in the photograph into its complete opposite, which leaves the image looking far from natural.


After this I decided to edit the brightness of the photograph to see the results. The brightness is set to zero in its default state, so I first edited the image to its minimum brightness, which in the Fireworks package is -100. The reduction in brightness gave the image the appearance that it was taken just prior to sunset, dulling the sky in a way that gives it a night-time feel.


My next step was to try the complete opposite, increasing the brightness to the maximum level, which expectedly is 100, given the minimum is -100. This transformation, as shown below, makes the image appear as if it was taken in broad daylight, when the sun was at its most prominent. Although the original image already appears to have been taken in daylight, this one increases that effect dramatically, as shown by the clearly visible change in the shades of the blue sky.


My next step was to edit the contrast of the image. I first gave it the maximum contrast. This removes virtually all subtlety from the image and gives it an animated, almost cartoon-like look.


Next I went for the opposite effect by reducing the contrast as far as practical. Unlike the brightness, it was not sensible to reduce the contrast all the way to -100, as that simply makes the image appear totally grey and doesn't give as good an indication of the influence of reduced contrast as setting it to -85, as I did below. This shows how a reduction in contrast really makes the image dull: the sky becomes darker and less visible, as do all of the other elements.
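These operations all have simple pixel-level definitions. As a rough illustration (not how Fireworks implements them), here is a Python/Pillow sketch; "church.png" is a placeholder filename and the enhancement factors are my own guesses at settings comparable to the ones used above:

```python
# Approximate Pillow equivalents of the Fireworks edits above.
# Invert maps each channel value v to 255 - v; brightness and contrast
# scale pixel values towards black or mid-grey respectively.
from PIL import Image, ImageEnhance, ImageOps

img = Image.open("church.png").convert("RGB")
rotated = img.rotate(-90, expand=True)                    # 90 degrees clockwise
inverted = ImageOps.invert(rotated)                       # v -> 255 - v
darker = ImageEnhance.Brightness(rotated).enhance(0.3)    # towards -100
brighter = ImageEnhance.Brightness(rotated).enhance(1.7)  # towards +100
high_contrast = ImageEnhance.Contrast(rotated).enhance(2.5)
low_contrast = ImageEnhance.Contrast(rotated).enhance(0.15)  # like -85
low_contrast.save("church_low_contrast.png")
```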

For me, there are various reasons these types of filtering could be used for the processing of digital images. The image may have been taken at an undesired time of day or in undesired weather conditions. Editing the brightness can give the image the appearance that it was taken at a different time of day and in different amounts of sunlight. By the same effect, reducing the contrast can aid this too, due to its dulling effects. Adding contrast can also aid the process, to an extent, but excessively doing so makes the image appear unnatural, although it could be argued this could be another potential use for filtering, as the user may be trying to obtain an unnatural and animated effect from the image. Filtering could also be used to accentuate a particular part of an image.









Thursday, 22 November 2012

Lecture 6: Digital Image Processing

The Benefits of Digital Image Processing

  • Allows greater licence to edit images, free of the scales and chemicals of darkroom work.
  • Allows the user scope to experiment with images, with its flexibility providing an environment for various changes.
  • It is a significantly enhanced product in comparison to traditional darkroom photography, offering more options to enhance, transform and manipulate images
Digital Camera Imaging Systems

An image capture system contains a lens and a detector, which is often a charge-coupled device (CCD): a linear or matrix array of photosensitive electronic elements. A traditional film frame normally measures 36 x 24 mm, while a CCD array is typically six times smaller in each dimension, measuring 6 x 4 mm. As a result of the reduced frame, a digital camera's lens system must be of sufficient quality to allow the condensation of the image to an area 36 times smaller.


Digital Camera Image Capture


On an area array sensor, thousands of microscopic photocells are placed on a grid. These analyse small portions of the image formed by the lens system, creating picture elements by sensing light intensity.


Sensor Spatial Resolution


"Pixelization" occurs when the resolution of the sensor array is too low, giving a blurry effect. Increasing the number of cells in the sensor array increases the resolutions of the captured image. Sensor devices today tend to have more than one million cells.


Digital Camera Colour


Filters are placed over the photocells to capture images as a combination of red, green and blue. Each channel is assigned an eight-bit number, giving 256 values per colour, typically in the range 0-255. Each colour is a combination of red, green and blue: red, for example, is 255-0-0, green is 0-255-0 and blue is 0-0-255.


Shades of green, blue and red which aren't quite as vibrant can be achieved by reducing the value: changing the above red example from 255-0-0 to 128-0-0, for instance, makes the red roughly half as strong.


Achieving colours other than red, green or blue is done by combining at least two of the red, green and blue channels. For example, the colour purple is a combination of red and blue.
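As a small illustration of this additive mixing, the following Pillow sketch writes out solid colour swatches for a few of the triples mentioned above (the filenames are my own):

```python
# Solid 50x50 swatches built from the RGB triples discussed above.
from PIL import Image

swatches = {
    "red": (255, 0, 0),        # full-strength red
    "half_red": (128, 0, 0),   # roughly half-strength red
    "purple": (128, 0, 128),   # red and blue combined, no green
}
for name, rgb in swatches.items():
    Image.new("RGB", (50, 50), rgb).save(f"{name}.png")
```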


Digital Camera Optics


Before the light collected by the lens is focused on to the sensor array, it is passed through an optical low-pass filter, which serves to:



  • Exclude any picture data beyond the sensor's resolution
  • Compensate for false coloration caused by drastic changes in colour contrast
  • Reduce infrared and other sources of non-visible light, which may disturb the imaging process carried out by the sensor


Moire Prevention and Removal


  • Moiré is a repetitive pattern of wavy lines or circles which can appear on objects in digital captures.
  • It tends to happen when the pattern of the imaging chip in the camera matches the fibres or fine parallel details in an object.
  • Some cameras incorporate anti-aliasing filters, which slightly blur tiny details of objects, although others don't, as this may compromise image sharpness.
  • Regardless of whether the filters exist, digital cameras have the ability to create moiré.
Digital Image Fundamentals

  • Digital images are called bitmaps or raster-scan and are composed of an array (grid or matrix) of smaller units called pixels (picture elements)
  • Every pixel in the digital image is a uniform patch of colour, but on the display screen it is a phosphor dot or stripe consisting of a mixture of red, green and blue
The Pixel

  • The smallest digital image element manipulated by image processing software
  • They are individually coloured but as a result of their finite size, the colouring of a subject is only approximate.
Bit Map Graphics

A bit-mapped colour image is represented in digital memory as an ordered array of groups of bits. Each group codes the colour of a single pixel on the screen, meaning each pixel requires 24 bits - 8 for red, 8 for green and 8 for blue.

If the resolution of the file is 640 x 480, with each pixel represented by 24 bits, the image size is as follows:

640 x 480 x 24 = 7,372,800 bits - approximately 7.4 Mbits, or about 900 KB
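The same arithmetic as a short sketch, which also works for the other bit depths discussed below:

```python
# Bitmap size = width x height x bits per pixel.
def bitmap_bits(width, height, bits_per_pixel):
    return width * height * bits_per_pixel

bits = bitmap_bits(640, 480, 24)
print(bits)              # 7372800 bits
print(bits / 8 / 1024)   # 900.0 KB, i.e. about 0.9 MB
```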

Dynamic Range

In a visual scene, the dynamic range is typically the number of colours or shades of grey represented. In a digitised image, however, it is fixed by the number of bits used to represent each pixel. This determines the maximum number of colours or shades of grey in the image palette, which is formed by the specific colours used.

Bit Depths:

1-bit depth: Only has two values, black or white. A process named halftoning can help simulate grey by the way it spaces the black and white pixels.
8-bit depth (grey): Can represent 256 (2 to the power of 8) shades of grey.
8-bit depth (colour): Similar to the above, except it represents 256 colours rather than shades of grey.
24-bit depth: Known as true colour; 8 bits represent each of the three additive primary colours (red, green and blue), so each pixel can take over 16 million (2 to the power of 24) values. It also removes the contouring which is visible at lower bit depths.

Colour Palette:

A system palette is used when the computer system predetermines the palette, and those colours (for example 256 in an 8-bit image) are used for all images. An image's appearance can be improved by instead selecting the 256 colours most appropriate to that image. However, this adaptive palette can cause problems when multiple images are to be displayed at once: one palette has to be chosen and stuck to, regardless of how appropriate it is for each image, so it may be advisable to use foresight when choosing a palette.

An optimised palette is better to use than a non-optimised palette. The colours are more natural and true to life, and there is also less contouring, making the image appear clearer. The colours in a non-optimised palette tend to neglect natural effects such as shading, so using an optimised palette makes the image altogether more realistic.

The Four Categorisations of Digital Image Processing:


  1. Analysis: Operations that provide information, such as colour count and intensity.
  2. Manipulation: Content-altering operations, such as cropping and colour changing.
  3. Enhancement: Quality-improving operations, such as increasing contrast or lightening images.
  4. Transformation: Operations that alter geometry, such as rotation.

Processing Digital Images:

Firstly, the image is converted from analogue to digital (digitisation) and placed in a frame buffer. From there, the digital image processing operations take place in the computer before the result is passed back out to a frame buffer. Following this, the colours (red, green and blue) are looked up, each is converted back to analogue, and the image is displayed.

Histogram:

A histogram is a graph which analyses the intensity levels of an image; the graph ranges from 0 to 255 on a typical 8-bit scale. An image with good contrast and dynamic range shows full use of the intensity range, an image with reasonable contrast shows some vacant intensities, and a low-contrast image shows a high number of vacant intensities. Histograms tend to showcase the fact that pixel intensity is either high or low; it is very rarely in the middle.
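A histogram like this is straightforward to compute; here is a minimal NumPy sketch for an 8-bit greyscale image ("scan.png" is a placeholder filename):

```python
# Count how many pixels fall at each of the 256 intensity levels,
# and how many levels go unused ("vacant" intensities).
import numpy as np
from PIL import Image

pixels = np.asarray(Image.open("scan.png").convert("L"))   # 8-bit grey
counts, _ = np.histogram(pixels, bins=256, range=(0, 256))
vacant = int(np.sum(counts == 0))
print(f"{vacant} of 256 intensity levels are unused")
```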

Transformation:

Digital image processing allows rotation and free rotation of images. Rotation is changing the position by a 90-degree angle, or a multiple of 90 degrees; this is achieved by remapping pixel positions in the rows and columns. Free rotation moves an image by an angle of your choice, which often changes the shape of the image; interpolation then works out an appropriate colour value for each pixel position in the output image.
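The row-and-column remapping for 90-degree rotation needs no interpolation at all, which is why it is lossless; NumPy's rot90 does exactly this remapping:

```python
# A 90-degree rotation is a pure remapping of rows and columns.
import numpy as np

image = np.arange(12).reshape(3, 4)   # a toy 3x4 "image"
rotated = np.rot90(image, k=-1)       # one 90-degree turn clockwise
print(rotated.shape)                  # (4, 3) - rows and columns swapped
```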

Manipulation: Block fill is achieved by selecting an area to change; the pixel addresses within it are tested and modified.

Enhancement: Filtering puts a kernel to use, moving it over the image in pixel-by-pixel steps. At each step, the elements of the kernel multiply the pixel values beneath them and the results are tallied up to give the new output pixel value. Depth of field can be useful to accentuate a particular part of an image, as it blurs the surroundings: the smaller the depth, the more accentuated the subject becomes and the more blurred the surroundings. Motion blur can also be added to images to give the effect that an image was taken at a time of high movement.
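As a sketch of that kernel step (a plain implementation of the description above, not what any particular package uses internally):

```python
# At each position the kernel's elements multiply the pixel neighbourhood
# beneath them and the products are tallied into one output pixel.
import numpy as np

def kernel_filter(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

blur = np.ones((3, 3)) / 9            # a simple 3x3 averaging (blur) kernel
blurred = kernel_filter(np.random.rand(8, 8), blur)
```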





Tuesday, 20 November 2012

Lab 2: Exploring Cool Edit Pro

Welcome to the review of my first lab session in Audio, Image and Video Processing. This blog will be all about putting into practice the theory explained in the lectures. Cool Edit Pro is the application we will be using to edit sound, and the purpose of this lab was to get acquainted with the features of the program.

The first step was to download wav files to edit. A quick internet search took me to a website called http://www.wavsource.com which provides internet users with free wav files to download, mainly clips from films.

The task specified identifying four or five files, up to 150KB in size. I did so, with the largest of my five being a 99KB extract from Apocalypse Now, and also noted their durations, which were as follows:

  1. Back to the Future "you can achieve anything" - 5 seconds
  2. Braveheart "they may take our lives" - 5 seconds
  3. Apocalypse Now "insane" - 30 seconds
  4. Cast Away "Wilson" - 12 seconds
  5. Shrek "Singing" - 5 seconds

I opened up the Cast Away file and played it. In Cool Edit Pro, the sound waves appeared as follows:



By looking carefully, you can see a yellow cursor at the start of the wave. This can be moved by the user: if it is moved and you click play, playback will commence from where the yellow cursor starts. The image below depicts the yellow cursor positioned further along.


As you can see, the yellow cursor is now stationed further along the wave. Another way of noticing that the sound is set to only be partially played is the number underneath the wave. In the first image it is "0:00.000" whereas in the second it is "0:06.790", meaning that upon clicking play the sound will be played from roughly 6.8 seconds until the end.

This part of the file can be saved on its own by highlighting the area using the mouse (similar to how you would in a word processing package) and then clicking File > Save Selection. It is crucial to give it at least a slightly different file name, so your first file does not become overwritten. I simply added "_2" to the file name whilst saving, making it "cast_away_wilson_2.wav".


Here is the result of saving the selection. The wave now appears in a different shape, and the total duration has been trimmed from 12 seconds to under five seconds.


The task then recommended trying the standard edit commands in Cool Edit Pro:

  • Copy
  • Paste
  • Delete
  • Trim
  • Select Entire Wave
  • Undo



The above wave incorporates the cut and paste features. Highlighting a section as I described before and right-clicking, I clicked "Cut", then chose where I wished the section to play. For simplicity, I chose to add it to the end of the file, right-clicking and selecting "Paste". You can audibly hear the difference: the end of the clip contained music which gave a "fade-out" atmosphere, so in the edited wave you hear the music fade out before the moved section plays. I then tested the undo function, clicking "Edit > Undo Paste" followed by "Edit > Undo Cut", which reverted the wave to its original state.

I then tested the "Copy" command, taking the same section that I previously "Cut" and again placing it on the end of the file. This time, as I chose "Copy" and not "Cut" instead of the sound being moved to the end of the wave, it's repeated at the start of the wave, now playing twice, increasing the duration from twelve seconds to roughly sixteen. The wave appears as such:


I then attempted the "trim" function next. This function crops a wav file, leaving only the highlighted part and making the rest of the file disappear. This can be handy if you are looking to extract a small part of a file, a single word perhaps. This is how my attempt at trimming appears:


The final command was "select entire wave". This one was fairly self explanatory, it just highlights the entire wave. This can be done by using the edit menu, or by pressing "ctrl" and "a" which is almost universal in computer programs for meaning "select all".


The next task was to use "File > Open Append" to append two other sounds onto the one originally loaded. Upon doing this, the system highlights with red and blue markers where each sound starts and finishes. This basically adds the two files imported through the Open Append function to the file which is already open.

By default, Cool Edit Pro measures time in seconds, but I'm going to experiment with a different time measurement. This is done by clicking "View/Display Time", which gives you a list of options. I chose to change from "decimal" (seconds) to "samples" 


The waveform itself does not change, but the units of measurement do. In samples, the scale ranges from 0 to roughly 55000, whereas in decimal the range is from 0 to roughly 50.
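The two scales describe the same timeline and are related by the file's sample rate. The 44,100 Hz figure below is an assumption for illustration; the lab doesn't record the actual rate of my wav file:

```python
# Converting between the decimal (seconds) scale and the sample scale.
def seconds_to_samples(seconds, sample_rate=44100):
    return round(seconds * sample_rate)

print(seconds_to_samples(0.2))    # 8820 samples in a 0.2-second clip
```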

Next I explored the different options available in the vertical axis. Below are screenshots of each of the settings for the vertical axis, which depicts the volume of the sound at each point of the wave. The difference in values is visible on the right hand column next to the wave.

1. Sample Values (Default)



2. Normalised Values (Ranging from -1 to 1)



3. Percentage (0% being the minimum, 100% being the maximum)


4. Decibels (dB standard measurement)



For the next part of the exercise, I opted to open another wav file. This part required selecting parts of the wave and zooming in and out. The wave appeared as follows, without any alterations:


Zooming in on a certain section, by using the buttons underneath the wave leaves it looking like this:


This zooms in on the first word in the clip. Words are recognisable in Cool Edit Pro as the parts with high amplitude; prolonged sections of low amplitude are gaps in speech. Although I was able to determine where the first word would start and finish, I double-checked by playing the sound back.

If you zoom in far enough you can see each individual sample, as shown below. After exploring this, I zoomed out to return the wave to its original state.


I then went on to try some effects with the sound, which were as follows:

Invert - flips the waveform vertically, reversing the polarity of each sample; played on its own, the inverted sound is indistinguishable from the original.


Reverse - plays the sound backwards.


Silence - flattens the amplitude of the selected sound to zero, so it plays as silence.


Modifying the amplitude of a wave is also a feature of Cool Edit Pro. In its original state, the peak amplitude of the waveform is 100%; clicking Effects > Amplitude > Constant Amplitude and choosing the 6dB cut option decreases the amplitude, making the peak 50%. Both images are shown below, the original followed by the edited one. Having played the second one back, the sound is not quite as loud. Afterwards I boosted the amplification by 3dB, which showed a peak of approximately 70%, and then added 3dB once again to bring it back to its original state. The third image is also shown below. The user can normalise the amplitude by clicking Effects/Amplitude/Normalise and selecting a percentage to normalise to.
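Those percentages follow directly from the decibel formula; here is a quick check (my own sketch, not a Cool Edit feature):

```python
# A change of x dB scales the amplitude by a ratio of 10 ** (x / 20).
def db_to_ratio(db):
    return 10 ** (db / 20)

print(db_to_ratio(-6))   # ~0.50 -> a 6 dB cut leaves a 50% peak
print(db_to_ratio(-3))   # ~0.71 -> boosting 3 dB from 50% gives ~70%
```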




Fade In/Out

Two other effects of Cool Edit Pro are to "Fade In" the amplitude and to "Fade Out". The former means the sound is edited so that it starts off quietly and increases towards the end. "Fade Out" is the opposite: it starts loud but quietens down as the sound progresses. Both are shown below, "Fade In" first, followed by "Fade Out". Notice that in the first one, the maximum amplitude is towards the end of the wave, whereas in the second it is fairly close to the beginning.