The fact is that the real world is continuous (quantum mechanics is far beyond the scope of our subject!) and the digital world is not. More precisely, time, distance, and pressure are all continuous quantities: they can take an infinite number of values and are infinitely divisible.
When you make an analog recording, you stay in the continuous world. When recording to vinyl, for example, the pressure level versus time is mapped onto the depth of the groove versus the position along the groove, which are both continuous quantities (no muss, no fuss). When you make a digital recording, however, you have to map time onto memory cells and the pressure level onto integer values, which are both discrete quantities.
To achieve this, we have to sample the signal. This simply means taking a finite number of samples of the original signal and converting each of them into an integer to be stored in memory. The sampling rate is the number of equally spaced samples taken per second of signal, and the bit depth is the number of bits used to encode each sample in memory.
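As a rough illustration, the two steps (taking equally spaced samples, then storing each one as an integer) can be sketched in Python. The sine-wave input, the assumption that the signal stays within -1.0..1.0 (full scale), and the function name `sample_and_quantize` are all hypothetical choices for this example; a real converter works on an analog voltage, not on a Python function.

```python
import math
from typing import Callable, List

def sample_and_quantize(signal: Callable[[float], float],
                        duration_s: float,
                        sample_rate: int,
                        bit_depth: int) -> List[int]:
    """Sample `signal` (a function of time in seconds, assumed to stay
    within -1.0..1.0) at `sample_rate` Hz and store each sample as a
    signed integer of `bit_depth` bits."""
    n_samples = round(duration_s * sample_rate)   # a finite number of samples
    max_code = 2 ** (bit_depth - 1) - 1           # e.g. 32767 for 16 bits
    min_code = -(2 ** (bit_depth - 1))            # e.g. -32768 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                       # equally spaced instants
        code = round(signal(t) * max_code)        # nearest integer value
        samples.append(max(min_code, min(max_code, code)))  # clip to range
    return samples

# One millisecond of a 1 kHz sine at 48 kHz / 16 bits gives 48 integers.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
codes = sample_and_quantize(tone, 0.001, 48_000, 16)
print(len(codes), codes[:4])
```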
For example, a sampling rate of 48 kHz and a bit depth of 16 bits means that we take 48 000 equally spaced samples per second of signal, and each of those samples gets an integer value between -32 768 and 32 767. For signed integers, the min and max values are given by:
min = -2^(bit_depth − 1)
max = 2^(bit_depth − 1) − 1
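The two formulas translate directly into code. This small sketch (the function name `signed_range` is hypothetical) assumes the usual two's-complement representation of signed integers.

```python
from typing import Tuple

def signed_range(bit_depth: int) -> Tuple[int, int]:
    """Min and max values of a two's-complement signed integer."""
    return -(2 ** (bit_depth - 1)), 2 ** (bit_depth - 1) - 1

for bits in (8, 16, 24, 32):
    lo, hi = signed_range(bits)
    print(f"{bits:2d}-bit: {lo} .. {hi}")
```

For 16 bits this prints `-32768 .. 32767`, the range quoted above; for 24 bits it gives `-8388608 .. 8388607`.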
For the sampling rate, the answer can be derived from the Shannon (Nyquist-Shannon) sampling theorem: if an analog signal has no frequency components at or above a given frequency, then sampling it at twice that frequency lets you reconstruct the original analog signal exactly, by passing the sampled signal through a perfect lowpass filter cutting off at that frequency. As the human ear cannot perceive audio signals above 22 kHz, if you want to reproduce exactly all the frequencies below 22 kHz, you have to use a sampling rate twice as high, which yields 44 kHz, and you must be absolutely sure that the analog signal you are sampling has no frequency components above 22 kHz.
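To see why the "no components above half the sampling rate" condition matters, here is a short sketch of what happens when it is violated: the tone is not lost but folds back (aliases) to a lower frequency. The fold-back formula is the standard one; the function name `alias_frequency` and the 30 kHz test tone are illustrative assumptions.

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Frequency (between 0 and fs/2) at which a tone of frequency f
    appears after being sampled at rate fs."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

fs = 48_000
print(alias_frequency(30_000, fs))  # a 30 kHz tone folds down to 18 kHz

# The samples really are indistinguishable: a 30 kHz sine sampled at 48 kHz
# gives the same values as an 18 kHz sine with inverted sign.
for n in range(5):
    t = n / fs
    hi = math.sin(2 * math.pi * 30_000 * t)
    lo = -math.sin(2 * math.pi * 18_000 * t)
    assert math.isclose(hi, lo, abs_tol=1e-9)
```

Once the samples are stored, nothing can tell the 30 kHz tone apart from an 18 kHz one, which is why the filtering must happen before sampling.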
Bit depth is a matter of the dynamic range we can actually hear. The human ear can perceive sound levels between 0 dB SPL and 130 dB SPL (i.e. loud enough to seriously screw up your ears!!!).
If a piece of music is between 10 and 100 dB SPL, it yields a dynamic range of 90 dB (= 100 − 10).
Hence, a bit depth of 16 yields 20 × log10(2^(16 − 1)) ≈ 90.3 dB (great!)
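The same formula can be evaluated for any bit depth. This sketch simply applies 20 × log10(2^(bit_depth − 1)) as used above (the function name `dynamic_range_db` is hypothetical); each extra bit adds about 6.02 dB.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Dynamic range in dB following the formula above:
    20 * log10(2 ** (bit_depth - 1))."""
    return 20 * math.log10(2 ** (bit_depth - 1))

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

This prints `16-bit: 90.3 dB` and `24-bit: 138.5 dB`.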
This is why standard CD quality (44.1 kHz, 16-bit) is supposed to be enough to perfectly reproduce any piece of music.
Bit depth will not affect CPU performance: in a computer, all operations are carried out on 32-bit integers or 32-/64-bit floats, and the CPU and the floating-point unit are designed for that. Thus, you'll have no problem dealing with 24-bit or even 32-bit floating-point files.
Sampling rate is a different story altogether, though. It will literally take twice as much CPU power to process audio at a sampling rate of 96 kHz as to process the same audio at 48 kHz, simply because there are twice as many samples to process.
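A back-of-the-envelope sketch makes the point concrete for real-time processing: doubling the sampling rate doubles the number of samples per second and halves the time available for each one. The 512-sample buffer size and both function names are hypothetical choices for illustration.

```python
def per_sample_budget_us(sample_rate: int) -> float:
    """Time available per sample, in microseconds, to keep up in real time."""
    return 1e6 / sample_rate

def buffer_duration_ms(buffer_size: int, sample_rate: int) -> float:
    """How long a buffer of `buffer_size` samples lasts, in milliseconds."""
    return 1e3 * buffer_size / sample_rate

for fs in (48_000, 96_000):
    print(f"{fs} Hz: {per_sample_budget_us(fs):.2f} us per sample, "
          f"{buffer_duration_ms(512, fs):.2f} ms per 512-sample buffer")
```

At 48 kHz each sample gets about 20.83 µs; at 96 kHz only about 10.42 µs, for the same amount of work per sample.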
Based on what I've said above, it should be obvious that 44.1 kHz, 16-bit should be enough for recording or playing back digital audio in the majority of applications, as only truly professional gear is able to achieve a dynamic range greater than 90 dB and a frequency response wider than DC to 22 kHz. I'm speaking specifically here of microphones, ADCs, DACs, loudspeakers, etc. However, there are some reasons that may lead professionals to choose more accurate formats. In the past, when all digital devices worked in fixed point, every operation along the signal path (mixing, filtering, adding effects, etc.) resulted in a slight loss of dynamic range, as well as numerical distortion due to computation (rounding) errors.
For example, let's say we have a 16-bit sample value of 1244 and attenuate it by a factor of 10, rounding to the nearest integer. We then boost the result (124) by a factor of 2 and get 248, which is incorrect: 1244 divided by 5 (10 ÷ 2) is 248.8, which rounds to 249, not 248. That is the numerical error. If you do the same with a sample value less than 5, you will get 0. That is the loss of dynamic range.
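The example above can be reproduced with a short sketch that rounds to the nearest integer after every processing stage, as a fixed-point device without extra headroom bits might. The helper `scale_int` and its round-half-away-from-zero rule are assumptions for illustration, not the behavior of any particular device.

```python
def scale_int(sample: int, num: int, den: int) -> int:
    """Integer-only gain: sample * num / den, rounded to the nearest
    integer (halves rounded away from zero)."""
    prod = sample * num
    q = (2 * abs(prod) + den) // (2 * den)
    return q if prod >= 0 else -q

x = 1244
chained = scale_int(scale_int(x, 1, 10), 2, 1)  # attenuate by 10, then boost by 2
direct = scale_int(x, 1, 5)                      # same overall gain in one step
print(chained, direct)                           # 248 249

# Small signals vanish entirely after the attenuation stage.
print([scale_int(scale_int(v, 1, 10), 2, 1) for v in range(5)])  # [0, 0, 0, 0, 0]

# In floating point the intermediate value keeps its fractional part.
print(round(x / 10 * 2))                         # 249
```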
This clearly shows that in this situation your final mix will not have a 90 dB dynamic range. The solution is to increase the bit depth to compensate for the losses in the mix and keep the final result above 90 dB of dynamic range, and then convert it to 16-bit if we want to burn it to CD. Note that this problem is avoided if all elements of your mixing chain work in 32-bit fixed point or 32-bit floating point, as is the case with almost all music production software at this point. Increasing the bit depth from 16 to 24 bits then only matters for your input and/or output signal, where it provides a theoretical dynamic range of about 138 dB (20 × log10(2^23)), more than the 130 dB range of the ear. This will make you absolutely sure that digitizing the signal in your studio will not result in a loss of dynamic range.
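The last step described above, converting a high-precision mix to 16-bit for CD, can be sketched as follows. This assumes a floating-point mix where full scale is ±1.0 and uses plain rounding and clipping only (no dither or noise shaping); the function name `float_to_int16` is hypothetical. Note that Python's `round` rounds exact halves to the nearest even integer.

```python
from typing import List

def float_to_int16(mix: List[float]) -> List[int]:
    """Convert a floating-point mix (assumed full scale = +/-1.0) to 16-bit
    integers: scale, round to nearest, clip. No dither is applied."""
    out = []
    for x in mix:
        code = round(x * 32767)                   # scale to the 16-bit range
        out.append(max(-32768, min(32767, code)))  # clip anything over full scale
    return out

# The float mix can hold values far below one 16-bit step (about 3e-5) and
# even above full scale; both are only resolved at this final conversion.
mix = [0.5, -0.25, 1e-6, 1.2, -1.0]
print(float_to_int16(mix))
```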
As for the sampling rate, the quality of ADCs and DACs has improved greatly these days, mainly thanks to oversampling techniques, which have made implementing the anti-aliasing filters a non-issue. In the past these filters needed a very sharp roll-off, which made them difficult to implement, and they often generated audible artifacts. Now I think it's safe to say that good converters are able to reproduce audio with virtually no audible artifacts over the entire frequency range, from DC to half the sampling rate. So I would say that you only need a sampling rate higher than 48 kHz if you want to record spectral components above 24 kHz.
In conclusion, when we talk about 24/96 and the associated dynamic and frequency range, it must be clear that we're talking about the dynamic and frequency range of the digital signal. But the digital signal doesn't exist before being tracked (recorded), and it is totally useless if it is not converted to analog, amplified, and reproduced through speakers. If you want to know the dynamic and frequency range that your studio can provide, you have to take into account every device in the signal path. I have looked at the technical specifications of high-quality 24-bit A/D converters on the net, and the majority have a dynamic range of around 100 dB. As they are 24-bit, they could theoretically provide a dynamic range of about 138 dB, but in the end their quality is limited by that of their analog components.
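One rough way to see how every device in the path counts is to add up the noise of each stage. This sketch assumes a simplified model in which every stage has the same maximum level and independent (uncorrelated) noise, so noise powers add; real chains with gain staging are more complicated. The function name and the chain figures are hypothetical: 100 dB matches the converter specs mentioned above, and 138.5 dB is the theoretical 24-bit figure.

```python
import math
from typing import Sequence

def chain_dynamic_range_db(stage_ranges_db: Sequence[float]) -> float:
    """Approximate dynamic range of a signal chain, assuming each stage has
    the same maximum level and independent noise, so noise powers add."""
    total_noise = sum(10 ** (-dr / 10) for dr in stage_ranges_db)
    return -10 * math.log10(total_noise)

# Hypothetical chain: 100 dB ADC, 24-bit digital path, 100 dB DAC.
chain = [100.0, 138.5, 100.0]
print(f"{chain_dynamic_range_db(chain):.1f} dB")
```

Under these assumptions the 24-bit digital path contributes almost nothing: the result (about 97 dB) is set by the two analog converters.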
Almost all the soft synths I know do their internal processing with 32-bit floats. That is to say that they will not suffer any audible loss of quality from internal computation, because a 32-bit float keeps 24 bits of precision at any signal level and offers a far larger dynamic range than 16- or 24-bit integers. The sampling rate is trickier. In fact, there are several things you have to handle with care when developing a soft synth. I'm thinking especially of band-limited interpolation (for pitch shifting/time stretching), band-limited waveform generation (for generating analog-like waveforms), and analog filter implementation. These operations can generate artifacts if you do not handle them correctly or if you oversimplify your algorithm. These problems mainly arise at high frequencies: for example, a VCO can generate aliasing above a certain frequency, or a resonant filter can have a biased Q factor at higher frequencies. Oversimplification can also be responsible for muddy signals. In these cases, increasing the sampling rate pushes these artifacts to higher frequencies, making them less audible and the signal cleaner. But I don't think it is wise to use a simplified algorithm and simply increase the sampling rate to hide its artifacts; I prefer to make synths that generate as few artifacts as possible, even at low sampling rates.
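To illustrate band-limited waveform generation, here is a sketch contrasting a naive sawtooth (whose harmonics run far past half the sampling rate and therefore alias) with one built by additive synthesis from only the harmonics below half the sampling rate. Additive synthesis is just one simple band-limiting technique, chosen here for clarity rather than efficiency; it is not necessarily what any given synth uses, and all function names are hypothetical.

```python
import math
from typing import List

def naive_saw(f0: float, fs: float, n: int) -> List[float]:
    """Trivial sawtooth rising from -1 to 1; its harmonics extend
    far beyond fs/2, so they alias."""
    return [2.0 * ((f0 * i / fs) % 1.0) - 1.0 for i in range(n)]

def harmonics_below_nyquist(f0: float, fs: float) -> int:
    """Number of harmonics k * f0 strictly below fs / 2."""
    k = 0
    while (k + 1) * f0 < fs / 2:
        k += 1
    return k

def bandlimited_saw(f0: float, fs: float, n: int) -> List[float]:
    """Same sawtooth from its Fourier series, -(2/pi) * sum(sin(2*pi*k*phase)/k),
    truncated to the harmonics below fs/2."""
    k_max = harmonics_below_nyquist(f0, fs)
    out = []
    for i in range(n):
        phase = f0 * i / fs
        s = sum(math.sin(2 * math.pi * k * phase) / k for k in range(1, k_max + 1))
        out.append(-2.0 / math.pi * s)
    return out

# A 5 kHz saw keeps only 4 harmonics at 48 kHz but 9 at 96 kHz.
print(harmonics_below_nyquist(5000, 48_000), harmonics_below_nyquist(5000, 96_000))
```

Raising the sampling rate lets more harmonics through cleanly, but the band-limited version avoids aliasing at any rate, which matches the preference expressed above.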
I don't know. I've never had the chance to be in a good enough listening environment to hear it. The only relevant experience I've had was making A/B comparisons between professional DAT and professional analog recorders. I wasn't allowed to see my own results, but the overall result of the tests was that people couldn't tell the difference.
I'm certainly not the right person to ask about that. I suspect that sound quality is not the only reason for changing the formats... So now we're at 24-bit/192 kHz; well... I think that it's quite a challenge for our ears! [Not to mention our dog's ears! —ed.]
More seriously, it seems quite convenient to measure the quality of things with one or two numbers, but we all know it is more complicated than that. I would say that if you want to judge the quality of your converters, it's probably wiser to go by price rather than by bit depth/sampling rate. To sum up, if you want good results, the best advice is to use only high-quality components and, whenever possible, avoid bit depth and sampling rate conversions.