Wednesday, 12 December 2012

Lab 5: Audio Signal Processing

The purpose of this lab was to download a sound file, entitled "speechtone.wav", and edit it to improve the sound. On first listen, there was a high-pitched tone interfering with the clip, and the speech was very muffled and incoherent. In Cool Edit Pro, the speech appeared like this in its original state:



I was then left with the following wave, in which the speech was by this point far more coherent, although the high-pitched sound still remained. The improvement was achieved by applying various notch filters at 440 Hz and 2 dB.
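To illustrate what a notch filter does to a tone like this, here is a minimal sketch in Python. It is not how Cool Edit Pro implements its filter, which I don't know; it uses a standard second-order (biquad) notch design, and the sample rate, the Q value and all of the function names are my own assumptions for the example.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Return normalised biquad notch coefficients (b, a) centred on f0 Hz.

    fs is the sample rate in Hz; q controls how narrow the notch is.
    Both values used below are assumptions, not settings from the lab.
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Apply a second-order recursive filter to a list of samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def tone(freq, fs, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

FS = 8000  # assumed sample rate in Hz
b, a = notch_coefficients(440.0, FS)

# Measure the level once the filter has settled (last 2000 samples).
rms_440 = rms(biquad(tone(440.0, FS, FS), b, a)[-2000:])
rms_1000 = rms(biquad(tone(1000.0, FS, FS), b, a)[-2000:])
print("440 Hz level after notch:", rms_440)
print("1000 Hz level after notch:", rms_1000)
```

The 440 Hz tone is almost completely removed while the 1000 Hz tone passes through nearly unchanged, which is why a notch can take out an interfering tone without damaging the rest of the speech.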


I then continued to add notch filters as before, but supplemented this by adding hiss reduction, as well as lower tones and upper tones. There was a massive drone at the start and at the end of the sound, so I decided to significantly reduce the amplitude of both of these parts, three decibels at a time, until they were quiet enough. The wave now appeared like this:
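As a rough guide to what a change in decibels means for the waveform, the sketch below converts a decibel change into an amplitude multiplier and applies it to part of a list of samples. The function names and the sample list are made up for illustration; the editor does all of this internally.

```python
def db_to_gain(db):
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def apply_gain(samples, db, start, end):
    """Return a copy of samples with samples[start:end] scaled by db decibels."""
    gain = db_to_gain(db)
    return samples[:start] + [s * gain for s in samples[start:end]] + samples[end:]

cut = db_to_gain(-3.0)   # roughly 0.71 of the original amplitude
boost = db_to_gain(6.0)  # roughly double the original amplitude

# Reduce only the first two samples of a made-up clip by 3 dB.
quieter_start = apply_gain([1.0, 1.0, 1.0, 1.0], -3.0, 0, 2)
```

Each 3 dB cut therefore multiplies the amplitude by about 0.71, so repeating it a few times brings a drone down quickly.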


The next part of the lab was to make the speaker appear to be angry. I aimed to do this by making the beginning of each sentence louder, boosting the amplitude by 6 decibels. I also added a 6 decibel boost to the last section of speech so that the angry tone was maintained to the end. I strengthened the angry tone further, making it appear as if the speaker had raised their voice, by applying an amplitude feature named "fade in" with a 10 decibel setting.



The next part of the lab was to make it sound as though the speech had taken place in a church. I did this by adding a "reverb" delay effect and choosing the "large empty hall" option. As the name suggests, it gives the effect of being in an empty hall, which a church can often sound like. The speech sounds more booming, as if it is about to produce an echo.
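The basic idea behind a delay effect like this can be sketched with a single feedback delay, where each output sample has a quieter copy of an earlier output added to it. This is only a simplified illustration: a real reverb preset such as "large empty hall" combines many delays and filters, and the function name, delay length and decay value here are my own assumptions.

```python
def feedback_echo(samples, delay, decay=0.5):
    """Apply y[n] = x[n] + decay * y[n - delay] to a list of samples.

    delay is measured in samples and must be at least 1.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    out = []
    for n, x in enumerate(samples):
        echo = decay * out[n - delay] if n >= delay else 0.0
        out.append(x + echo)
    return out

# A single click followed by silence shows the repeating, fading echoes.
impulse = [1.0] + [0.0] * 9
response = feedback_echo(impulse, delay=3, decay=0.5)
print(response)
```

The click comes back every three samples at half the level each time, which is the fading repetition the ear hears as a large, empty space.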


Finally, I had to incorporate a bell sound into the file. I downloaded a wav file from the internet which was a repeated church bell. I opened this file in Cool Edit Pro and then selected "Copy" and "Mix Paste". After I performed this action the bells continued long after the speech, so I decided to cut that part out so the file wasn't too prolonged. Adding the bells not only gave the sound the desired effect, but also helped mask the slight remaining high-pitched sound. The final shot looked like this:







Thursday, 6 December 2012

Lab 7: Video Processing

The first step of the lab was to download the video files that would be used. The purpose of this lab is to show our understanding of how to edit video files. The particulars of the task are to make a video which is exactly one minute long, adding appropriate background music and aiming to incorporate themes of drama and cuteness.

Having analysed all of the videos which were suitable to edit, I carefully considered how I could make the video appear "cute" and "dramatic". All of the footage was about ducks and swans, and by reputation swans are more aggressive, so I decided to create the drama by trying to replicate a game of "hide and seek", with the swan coming to find the ducks.

The video had to be exactly a minute long and consist of footage from at least three of the files we had to download. The first of the three was a clip of various ducks scurrying away from swans across a pond. I followed this with a second clip of a solitary swan looking around, as if searching for where the ducks might be hiding. I moved from this to my final clip, which showed a duck swimming into a bushy area of the water, creating the effect that in this area he would be invisible to the onlooking swan.

Finally, I added music to make the video more complete. I chose piano music that created a dramatic effect but kept a light-hearted feel, to maintain the cuteness requested.

Another thing I considered throughout the lab was making sure that, when cutting to a scene from a different video, the transition appeared as natural as possible and gave the viewer the impression that the footage was all from one video.

The video can be viewed on YouTube at the link below:

http://www.youtube.com/watch?v=LseWTfAahXQ

Tuesday, 4 December 2012

Lecture 7: Looking at Light, Part Two

Light:


Light is a form of energy detected by the eye which, depending on certain factors, can appear in the form of a wave or a stream of particles. As we have learned previously, sound needs a medium to travel through, but this is not the case with light: it can travel from the sun through space to reach earth.

Having previously covered longitudinal and transverse waves regarding sound, light falls into the second category, where the waves resemble a ripple in water. The vibrations occur at right angles to the direction of movement from the source which created the wave. Light waves are electromagnetic waves, the same kind of wave as a radio wave, and when the eye recognises such a wave it takes the part of the radio receiver. Light which is visible to humans tends to lie between 400 THz and 750 THz (a terahertz is a hertz multiplied by 10 to the power of 12). This range of frequencies appears as a vast range of colours: typically low frequencies appear red and high frequencies appear violet, with various other stages in between. The colour white appears as a mixture of many different colours.

Velocity of Light



In a vacuum, light has a constant speed of 300,000 kilometres per second. It is slightly slower in air and drops to approximately 200,000 kilometres per second in glass. This reduction in speed in glass is what makes it useful for lenses. Light travels about a million times faster in air than sound does.

The time taken for light to travel a length of 6 m can be found using the following formula.

Time taken = Distance / Velocity

= 6 / (300000000)
= 20 ns
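The same calculation can be written as a small Python sketch. It uses the rounded speeds quoted above (300,000 km/s for a vacuum, which is also used here for air, and 200,000 km/s for glass); the function and dictionary names are made up for the example.

```python
# Rounded speeds of light from the lecture notes, in kilometres per second.
SPEED_KM_PER_S = {"vacuum": 300_000, "glass": 200_000}

def travel_time_ns(distance_m, medium="vacuum"):
    """Time in nanoseconds for light to cover distance_m metres."""
    speed_m_per_s = SPEED_KM_PER_S[medium] * 1000
    return distance_m / speed_m_per_s * 1e9

time_vacuum = travel_time_ns(6)           # about 20 ns
time_glass = travel_time_ns(6, "glass")   # about 30 ns
```

Covering the same 6 m inside glass takes about half as long again, because the light is travelling at only two thirds of its vacuum speed.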

The frequency of a light wave is the number of complete cycles per second, which is independent of the medium it is travelling through. The velocity is found by multiplying the frequency by the wavelength. It is the same formula discussed at length in lecture one.

The wavelength of a 500 THz light wave in air can be found by rearranging this formula.

Wavelength = Velocity (300 x 10 to the power of 6) / Frequency (500 x 10 to the power of 12)

= 600 nm
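As a check on the arithmetic, here is a short Python sketch of wavelength = velocity / frequency. Dividing 300 x 10 to the power of 6 metres per second by 500 x 10 to the power of 12 hertz gives 6 x 10 to the power of -7 metres, which is 600 nanometres. The function name is my own, and the speeds are the rounded values from the notes.

```python
def wavelength_nm(frequency_thz, speed_km_per_s=300_000):
    """Wavelength in nanometres for a given frequency in terahertz."""
    speed_m_per_s = speed_km_per_s * 1000
    frequency_hz = frequency_thz * 1e12
    return speed_m_per_s / frequency_hz * 1e9

in_air = wavelength_nm(500)               # about 600 nm
in_glass = wavelength_nm(500, 200_000)    # about 400 nm, same frequency

# The visible range of 400 THz to 750 THz expressed as wavelengths in air.
longest = wavelength_nm(400)   # about 750 nm (red end)
shortest = wavelength_nm(750)  # about 400 nm (violet end)
```

The wavelength gets shorter in glass because the wave slows down, but the frequency, and therefore the colour, stays the same.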

The chart below shows the frequencies at which each colour occurs, so we can assume this wave will be orange. It would still appear the same if the medium were to change from air to glass since, as previously mentioned, the frequency is independent of the medium.



White light, originating from the sun, is a combination of all of the colours in the visible spectrum. Some colours have more presence than others, and this varies greatly from source to source.



The above diagram shows the Visible Light Spectrum in context, showing which kinds of waves have which frequencies, from the low-frequency radio waves and microwaves up to the higher-frequency X-rays and gamma rays. Below is another diagram, which gives an indication of the frequency of each individual colour in the spectrum. The different colours of visible light have different frequencies.



The majority of light sources don't radiate single-colour light, which is referred to as monochromatic. A rare exception is the yellow (sodium) street light, which gives the effect of total colour-blindness, with everything seen in what are effectively shades of grey.

The brightness of light is measured in units called the candela and the lumen. These are used more by scientists than by photographers, who use light exposure meters calibrated against a scientific standard. The brightness of an object is an extremely subjective quality and is very much dependent on reflectance, colour and surroundings, as well as the state of the onlooking eye.

Environmental Effects On Light:

Both the atmosphere and surrounding objects affect light waves by the following means:


  1. Transmission
  2. Reflection
  3. Absorption
  4. Scattering
  5. Refraction
A large fraction of the incident light is transmitted by transparent or translucent objects. For example, what is seen in glass is the reflection of its surroundings on its surface; were there none, the glass would be invisible. Light reflected from glass on to an object can make the object's shape appear different.

Light intensity varies:

The intensity of light received from a source varies inversely with the square of the distance R from the source, that is, as 1/R squared. This means that light reflected from an object will have one twenty-fifth of the intensity when observed from a distance of five metres that it would have from one metre.
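The inverse square relationship can be checked with a few lines of Python. The function name and the choice of one metre as the reference distance are assumptions made for the example.

```python
def relative_intensity(distance, reference_distance=1.0):
    """Intensity at distance, as a fraction of that at reference_distance."""
    if distance <= 0 or reference_distance <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance / distance) ** 2

at_five_metres = relative_intensity(5.0)  # one twenty-fifth, i.e. 0.04
at_two_metres = relative_intensity(2.0)   # one quarter
```

Doubling the distance already cuts the intensity to a quarter, so light falls away much faster than the distance itself grows.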

Editing the colours and the contrast of an image can affect the perception of distance. For example, as the amount of blue in an image is increased and the amount of contrast is decreased, the image will appear further and further away.

When energy travels between mediums, typically some is passed on while some is reflected. Reflections from a flat boundary appear as in a mirror, but reflections from a curved surface tend to be focused to a point, line or area.

Specular reflection normally takes a mirror-like form, whereas diffuse light occurs as a result of light scattering in various directions. Adjacent objects typically colour each other through their reflected light, as well as occasionally casting shadows on each other. An example of diffuse light would be sunlight making a piece of clothing appear brighter.

It is often asked why the sky is blue when the air around us appears to have no colour. Various things in the air (molecules, vapour and dust particles) scatter sunlight, and short-wavelength light is scattered far more than long-wavelength light. As previously described, blue light occurs at short wavelengths, with longer wavelengths moving towards red. The sky appears more red at sunset because the sunlight is travelling a longer path through denser air, with more dust and vapour, so much of the blue is scattered away before it reaches us. This shows how a sunset effect can be simulated on an image: as the image is made up of red, green and blue, it makes sense to increase the red and reduce the other two to reach the desired effect.
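The sunset idea can be sketched in Python by boosting the red channel of each pixel and reducing the green and blue channels. This is only an illustration of the principle: the gain values and function names are my own assumptions, and an image editor would offer this through its colour balance controls.

```python
def clamp(value):
    """Round to the nearest whole number and keep it in the 0-255 range."""
    return max(0, min(255, round(value)))

def sunset_tint(pixel, red_gain=1.3, other_gain=0.8):
    """Boost red and reduce green and blue for one (r, g, b) pixel."""
    r, g, b = pixel
    return (clamp(r * red_gain), clamp(g * other_gain), clamp(b * other_gain))

def sunset_image(rows, red_gain=1.3, other_gain=0.8):
    """Apply the tint to an image stored as rows of (r, g, b) tuples."""
    return [[sunset_tint(p, red_gain, other_gain) for p in row] for row in rows]

grey = sunset_tint((100, 100, 100))   # a mid grey becomes reddish
bright = sunset_tint((250, 10, 10))   # red is capped at the maximum of 255
tiny_image = sunset_image([[(100, 100, 100), (0, 0, 200)]])
```

Clamping matters because a channel that is already bright cannot go above 255, so strong reds simply saturate.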

Refraction is the bending of light as it passes between mediums, and occurs because the speed of light varies between materials. When light hits a boundary at an angle, it decreases or increases in speed as it passes into the new material, which changes its direction.
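The amount of bending can be worked out from the two speeds using Snell's law, which is the standard rule for refraction although it was not written out in my notes: the sine of the angle in the new material divided by the sine of the incoming angle equals the ratio of the two speeds. The Python sketch below uses the rounded speeds for air and glass given earlier; the function name is made up for the example.

```python
import math

def refraction_angle_deg(incidence_deg, speed_in, speed_out):
    """Angle of the refracted ray in degrees, measured from the normal.

    Uses sin(angle_out) / sin(angle_in) = speed_out / speed_in.
    Returns None when no refracted ray exists (total internal reflection).
    """
    s = math.sin(math.radians(incidence_deg)) * speed_out / speed_in
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

AIR = 300_000    # km/s, rounded value from the notes
GLASS = 200_000  # km/s, rounded value from the notes

into_glass = refraction_angle_deg(30.0, AIR, GLASS)   # bends towards the normal
straight_on = refraction_angle_deg(0.0, AIR, GLASS)   # no bending at all
out_of_glass = refraction_angle_deg(60.0, GLASS, AIR) # None: light is reflected
```

Light slowing down on entering glass bends towards the normal, and light hitting the boundary head-on is not bent at all.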

Monday, 3 December 2012

Lecture 4: Speech

Human Speech:



Human speech is a learned communication system. It consists of three major components: voice, articulation and language skills. Voice (also known as phonation) is the sound produced when air moving through the voice box (larynx) sets the vocal cords vibrating. The vocal cords consist of elastic connective tissue covered by folds of mucous membrane. Vibration can occur as air passes in or out of the lungs, and in this respect it is apt to liken the cords to the reed of a harmonica. The pitch at which the vibrations take place can be modified by muscles that shorten and tighten the vocal cords for high-pitched tones and leave them looser and longer for low-frequency tones. The frequency range of speech is approximately 80 Hz to 8 kHz, meaning the highest frequency in high-quality speech is one hundred times higher than the lowest.

There is a difference between speech and singing. A properly trained singer can produce a wider range of sounds than are produced in normal speech. Singers also aim to control their breath and have more regulated tension in their vocal cords. Yodelling, for example, is something that wouldn't be achievable in normal speech, as it constantly alternates between high and low registers.

Articulation is the process by which the air flow modified by the larynx is shaped into phonemes (basic speech sounds), which are then combined to form the words of a language. The movements of the tongue, lips, lower jaw and soft palate articulate speech, interrupting and shaping the voiced and unvoiced airflow. There are approximately 40 phonemes in the English language, which are classified into four categories: vowels, nasals, plosives and fricatives. This is specific to English and may vary for other languages.

Phoneme Types:

Vowels: Generated from oscillatory excitation of the vocal tract. In this process, the articulators remain static and the sound radiates from the mouth. Vowels are acoustically characterised by the first three or four vocal tract resonances, for which the correct term is formants.

Nasals: Generated when sound is radiated from the nasal cavity due to the raising of the tongue and the lowering of the velum (soft palate).

Plosives: Produced by a sudden burst of pressure, released after the front of the vocal tract has been shut by the tongue or the lips.

Fricatives: Similar to plosives, but formed by a partial restriction of the vocal tract rather than a full closure of it.

Time Domain Features:

Speech showcases bursts of localised and differentiated activity and as a rule is never stationary. Although there are often no gaps between words in speech, there can still be silent periods within words, the word "speech" itself containing two. This can sometimes aid the diagnosis of noise characteristics. Amplitude can vary in speech, which can be regarded as a modulation process. Speech modulation tends to range from 0.25 Hz to 25 Hz, with the peak typically between 3 Hz and 6 Hz.

Voice Production Summary:

The lungs take the role of air reservoir and bellows, forcing air between the vocal cords of the larynx. Unvoiced sounds form when the cords are relaxed and fail to vibrate; in simplified terms, this can be treated as silence.

A Model of Voice Production:

This is often known as signal generation and can be used to create synthetic speech. Learned patterns in the brain interact with the nervous system, which in turn interacts with the local musculature. This drives three elements: the lungs as a power supply, the vocal cords to generate signals, and the resonators and articulators to process those signals.

Assumptions of Linear Predictive Coding

The characteristics of the vocal tract mean that its shape cannot change particularly quickly, so the current sound can often be derived from the sound produced shortly before, within approximately 25 ms. Over this short period of time, new speech can therefore be predicted to an extent, which makes it possible to model it with a recursive digital filter.
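The idea of predicting each new sample from the ones just before it can be sketched in Python. This is a deliberately tiny example and not a full LPC coder: it fits only two predictor coefficients by least squares, and it uses a single 200 Hz sine wave as a stand-in for one 25 ms frame of voiced speech. The sample rate, the test signal and the function names are all assumptions made for the illustration; practical LPC uses more coefficients and real speech.

```python
import math

def fit_predictor(x):
    """Least-squares fit of x[n] ~= a1 * x[n-1] + a2 * x[n-2]."""
    r11 = r12 = r22 = p1 = p2 = 0.0
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        p1 += x[n] * x[n - 1]
        p2 += x[n] * x[n - 2]
    # Solve the 2x2 normal equations directly.
    det = r11 * r22 - r12 * r12
    a1 = (p1 * r22 - p2 * r12) / det
    a2 = (p2 * r11 - p1 * r12) / det
    return a1, a2

def prediction_error(x, a1, a2):
    """Difference between each sample and its prediction from the previous two."""
    return [x[n] - (a1 * x[n - 1] + a2 * x[n - 2]) for n in range(2, len(x))]

FS = 8000                      # assumed sample rate in Hz
FRAME = int(0.025 * FS)        # 25 ms frame = 200 samples
W = 2.0 * math.pi * 200.0 / FS # 200 Hz test tone, in radians per sample
frame = [math.sin(W * n) for n in range(FRAME)]

a1, a2 = fit_predictor(frame)
worst_error = max(abs(e) for e in prediction_error(frame, a1, a2))
print("coefficients:", a1, a2)
print("largest prediction error:", worst_error)
```

For this simple signal, two coefficients are enough to predict every sample in the 200-sample frame almost perfectly, which shows in miniature why describing a frame by its predictor coefficients takes far less data than storing the samples themselves.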

As well as generating synthetic speech, the LPC approach has a coding advantage: the number of parameters needed to specify the content of a speech frame is significantly smaller than the number of samples contained in that frame. This reduces the storage requirement for digitised speech signals and the bandwidth needed for communication links in systems such as mobile phones and the internet.