How do I use audio sample data from Java Sound?




I'm using `javax.sound.sampled` for playback and/or recording, but I want to do something with the audio.


How do I access the audio sample data to do that with Java Sound?


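This is how you get the actual sample data from the currently playing sound. The other excellent answer will tell you what the data means. I haven't tried it on any OS other than my Windows 10 machine, so YMMV. For me, it pulls from the current system default recording device. On Windows, set that device to "Stereo Mix" instead of "Microphone" to capture the sound being played. You may have to toggle "Show Disabled Devices" to see "Stereo Mix".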

import javax.sound.sampled.*;

public class SampleAudio {

    // Sign-extends the low bitsPerSample bits of temp to a full 64-bit long.
    private static long extendSign(long temp, int bitsPerSample) {
        int extensionBits = 64 - bitsPerSample;
        return (temp << extensionBits) >> extensionBits;
    }

    public static void main(String[] args) throws LineUnavailableException {
        float sampleRate = 8000;
        int sampleSizeBits = 16;
        int numChannels = 1; // Mono
        // Signed, big-endian PCM.
        AudioFormat format = new AudioFormat(sampleRate, sampleSizeBits, numChannels, true, true);
        TargetDataLine tdl = AudioSystem.getTargetDataLine(format);
        // The line has to be opened and started before it delivers any data.
        tdl.open(format);
        tdl.start();
        if (!tdl.isOpen()) {
            System.exit(1);
        }
        // sampleRate * 10 bytes holds 5 seconds of 16-bit mono audio.
        byte[] data = new byte[(int) sampleRate * 10];
        int read = tdl.read(data, 0, data.length);
        if (read > 0) {
            // Two bytes per sample, high byte first (big-endian).
            for (int i = 0; i < read - 1; i = i + 2) {
                long val = ((data[i] & 0xffL) << 8L) | (data[i + 1] & 0xffL);
                long valf = extendSign(val, 16);
                System.out.println(i + "\t" + valf);
            }
        }
        tdl.close();
    }
}
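As a side note, the manual shift-and-mask loop in the capture code can also be expressed with `java.nio.ByteBuffer`. The following is only a sketch of that alternative, assuming the same 16-bit signed big-endian format; the class name `ShortSamples` and the method `toShorts` are made up for illustration and are not part of the answer's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ShortSamples {
    /**
     * Converts the first `length` bytes of 16-bit signed big-endian PCM
     * into an array of shorts. A trailing odd byte is ignored.
     */
    static short[] toShorts(byte[] data, int length) {
        int usable = length - (length % 2);
        short[] out = new short[usable / 2];
        ByteBuffer.wrap(data, 0, usable)
                  .order(ByteOrder.BIG_ENDIAN)
                  .asShortBuffer()
                  .get(out);
        return out;
    }

    public static void main(String[] args) {
        // Bytes 0x00 0x01 form the sample 1; bytes 0xFF 0xFE form the sample -2.
        byte[] data = {0x00, 0x01, (byte) 0xFF, (byte) 0xFE};
        for (short s : toShorts(data, data.length)) {
            System.out.println(s);
        }
    }
}
```

With the capture code, this would be called as `toShorts(data, read)` after `tdl.read` returns.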





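    So, I'll enumerate the types of `AudioFormat.Encoding` and describe how to decode them yourself. This answer doesn't cover how to encode them, but encoding is included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.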




    [Figure: a sine wave sampled and quantized to 4 bits.]

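    (Note that in two's complement representation, the most positive value is 1 smaller in magnitude than the most negative value. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)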
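To make the overflow concrete, here is a small sketch for 16-bit samples. The class `ClipDemo` and the method `clip16` are hypothetical names chosen for this example; the point is only that the positive limit is 32767 while the negative limit is -32768.

```java
final class ClipDemo {
    /** Clamps a value to the signed 16-bit range before narrowing it to a short. */
    static short clip16(int value) {
        if (value > Short.MAX_VALUE) {
            return Short.MAX_VALUE; // +32767, not +32768
        }
        if (value < Short.MIN_VALUE) {
            return Short.MIN_VALUE; // -32768
        }
        return (short) value;
    }

    public static void main(String[] args) {
        int tooLoud = 32768; // 2^15, one past the largest positive 16-bit value
        System.out.println((short) tooLoud); // prints -32768: the cast wraps around
        System.out.println(clip16(tooLoud)); // prints 32767
    }
}
```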


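    To decode PCM samples, we don't care much about the sample rate or the number of channels, so I won't say much about them here. Channels are usually interleaved, so if we had an array of samples, they'd be stored like this: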

    Index 0: Sample 0 (Left Channel)
    Index 1: Sample 0 (Right Channel)
    Index 2: Sample 1 (Left Channel)
    Index 3: Sample 1 (Right Channel)
    Index 4: Sample 2 (Left Channel)
    Index 5: Sample 2 (Right Channel)
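If you need one array per channel, the interleaved layout shown above can be split with simple index arithmetic. This is a minimal sketch, assuming the samples have already been decoded to `float`; `Deinterleave` and `split` are illustrative names only.

```java
final class Deinterleave {
    /** Splits interleaved samples [L0, R0, L1, R1, ...] into one array per channel. */
    static float[][] split(float[] interleaved, int channels) {
        int frames = interleaved.length / channels;
        float[][] out = new float[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                // Frame f starts at index f * channels; channel c is offset c within it.
                out[c][f] = interleaved[f * channels + c];
            }
        }
        return out;
    }
}
```

For stereo, `split(samples, 2)[0]` would hold the left channel and `split(samples, 2)[1]` the right channel.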



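    The code examples in this answer assume the following declarations:

    • `byte[] bytes;` The byte array, read from the `AudioInputStream`.
    • `float[] samples;` The output sample array that we're going to fill.
    • `float sample;` The sample that we're currently working on.
    • `long temp;` An interim value used for general manipulation.
    • `int i;` The position in the `byte` array where the current sample's data starts.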



    sample = (float) (sample / fullScale(bitsPerSample));

    where full scale is 2^(bitsPerSample - 1), i.e. `Math.pow(2, bitsPerSample - 1)`.
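As a quick sanity check of the scaling, here is a sketch that normalizes a few 16-bit values. `fullScale` mirrors the method of the same name in the complete code at the bottom; the class `Normalize` and the method `normalize` are names invented for this example.

```java
final class Normalize {
    /** 2^(bitsPerSample - 1), returned as a double so that 64-bit samples don't overflow. */
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    /** Scales an already sign-extended integer sample to the range -1f <= sample < 1f. */
    static float normalize(long signedValue, int bitsPerSample) {
        return (float) (signedValue / fullScale(bitsPerSample));
    }

    public static void main(String[] args) {
        System.out.println(normalize(-32768, 16)); // prints -1.0
        System.out.println(normalize(16384, 16));  // prints 0.5
    }
}
```

Note that the most positive 16-bit value, 32767, maps to slightly less than 1.0, which is the two's complement asymmetry again.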

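    The `byte` array contains the sample frames split up and all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the `byte`s in each sample packet.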


      24-bit sample as big-endian:
     bytes[i]   bytes[i + 1] bytes[i + 2]
     ┌──────┐     ┌──────┐     ┌──────┐
     00000000     00100111     00001111
     24-bit sample as little-endian:
     bytes[i]   bytes[i + 1] bytes[i + 2]
     ┌──────┐     ┌──────┐     ┌──────┐
     00001111     00100111     00000000

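    They hold the same binary values; however, the `byte` orders are reversed.

    • In big-endian, the more significant `byte`s come before the less significant `byte`s.
    • In little-endian, the less significant `byte`s come before the more significant `byte`s.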


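    To concatenate the `byte`s and put them into our `long temp` variable, we:

    1. Bitwise AND each `byte` with the mask `0xFF` (which is `0b1111_1111`) to avoid sign extension when the `byte` is automatically promoted. (`char`, `byte` and `short` are promoted to `int` when arithmetic is performed on them.)
    2. Bit-shift each `byte` into position.
    3. Bitwise OR the `byte`s together.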


    // 24-bit example
    long temp;
    if (isBigEndian) {
        temp = (
              ((bytes[i    ] & 0xffL) << 16)
            | ((bytes[i + 1] & 0xffL) <<  8)
            |  (bytes[i + 2] & 0xffL)
        );
    } else {
        temp = (
               (bytes[i    ] & 0xffL)
            | ((bytes[i + 1] & 0xffL) <<  8)
            | ((bytes[i + 2] & 0xffL) << 16)
        );
    }


    This can also be generalized to a loop, as seen in the complete code at the bottom of this answer. (See the `unpackAnyBit` and `packAnyBit` methods.)
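Here is a small, self-contained sketch of the loop form, applied to the two byte layouts from the endianness diagram. Both layouts hold the bit pattern 0x00270F, which is 9999 in decimal. The class `UnpackDemo` and its `unpack` method are written just for this illustration and merge the two branches of `unpackAnyBit` into one loop.

```java
final class UnpackDemo {
    /** Concatenates bytesPerSample bytes starting at index i into a long (not yet sign-extended). */
    static long unpack(byte[] bytes, int i, boolean isBigEndian, int bytesPerSample) {
        long temp = 0;
        for (int b = 0; b < bytesPerSample; b++) {
            // Big-endian: the first byte is the most significant, so it gets the largest shift.
            int shift = isBigEndian ? 8 * (bytesPerSample - b - 1) : 8 * b;
            temp |= (bytes[i + b] & 0xffL) << shift;
        }
        return temp;
    }

    public static void main(String[] args) {
        byte[] big    = {0x00, 0x27, 0x0F};
        byte[] little = {0x0F, 0x27, 0x00};
        System.out.println(unpack(big, 0, true, 3));     // prints 9999
        System.out.println(unpack(little, 0, false, 3)); // prints 9999
    }
}
```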



    int bitsToExtend = Long.SIZE - bitsPerSample;
    float sample = (temp << bitsToExtend) >> bitsToExtend;


    To understand how this works, here's a diagram of sign-extending 8 bits to 16 bits:

     11111111 is the byte value -1, but the upper bits of the short are 0.
     Shift the byte's MSB in to the MSB position of the short.
     0000 0000 1111 1111
     <<                8
     1111 1111 0000 0000
     Shift it back and the right-shift fills all the upper bits with 1s.
     We now have the short value of -1.
     1111 1111 0000 0000
     >>                8
     1111 1111 1111 1111
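The same shift-left, shift-right trick works in a 64-bit `long` for any sample size. The following sketch repeats the `extendSign` helper from the complete code inside a throwaway class, `SignExtendDemo`, so that it can be run on its own.

```java
final class SignExtendDemo {
    /** Sign-extends the low bitsPerSample bits of temp to all 64 bits of the long. */
    static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        // << moves the sample's sign bit into bit 63; the arithmetic >> copies it back down.
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    public static void main(String[] args) {
        System.out.println(extendSign(0xFFL, 8));      // prints -1
        System.out.println(extendSign(0x7FL, 8));      // prints 127
        System.out.println(extendSign(0x800000L, 24)); // prints -8388608
    }
}
```

The right shift has to be the signed `>>`; the unsigned `>>>` would fill the upper bits with zeros and undo the effect.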




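    Recall that in the previous section (How do I coerce the byte array into meaningful data?) we used `b & 0xFF` to prevent sign extension from occurring. If you simply leave the `& 0xFF` off the highest `byte`, sign extension happens automatically.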


    // 16-bit big-endian samples: two bytes per sample
    for (int i = 0; i + 1 < bytes.length; i += 2) {
        int sample = (bytes[i] << 8) // high byte is sign-extended
                   | (bytes[i + 1] & 0xFF); // low byte is not
        // ...
    }


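    • An unsigned value of 0 corresponds to the most negative signed value.
    • An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value 0.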


    float sample = (float) (temp - fullScale(bitsPerSample));
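For 8-bit unsigned audio, the offset is 128, so the subtraction maps 0..255 onto -128..127. Here is a minimal sketch; `UnsignedDemo` is a hypothetical class name, and it computes the offset with a bit shift instead of `fullScale`, which is an assumption that only holds for sample sizes below 64 bits.

```java
final class UnsignedDemo {
    /** Removes the unsigned offset of 2^(bitsPerSample - 1). Assumes bitsPerSample < 64. */
    static long unsignedToSigned(long temp, int bitsPerSample) {
        return temp - (1L << (bitsPerSample - 1));
    }

    public static void main(String[] args) {
        System.out.println(unsignedToSigned(0, 8));   // prints -128
        System.out.println(unsignedToSigned(128, 8)); // prints 0
        System.out.println(unsignedToSigned(255, 8)); // prints 127
    }
}
```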



    In practice, floating-point PCM is usually either IEEE 32-bit or IEEE 64-bit and is already normalized to the range of ±1.0. The samples can be obtained with the utility methods `Float#intBitsToFloat` and `Double#longBitsToDouble`.

    // IEEE 32-bit
    float sample = Float.intBitsToFloat((int) temp);
    // IEEE 64-bit
    double sampleAsDouble = Double.longBitsToDouble(temp);
    float sample = (float) sampleAsDouble; // or just use double for arithmetic
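Putting the byte concatenation and `Float.intBitsToFloat` together for the 32-bit case might look like the sketch below. `FloatPcmDemo` and `decode32` are illustrative names; the bit pattern 0x3F000000 used in `main` is the IEEE 754 encoding of 0.5f.

```java
final class FloatPcmDemo {
    /** Reads four bytes starting at index i and reinterprets them as an IEEE 754 float. */
    static float decode32(byte[] bytes, int i, boolean isBigEndian) {
        int bits = 0;
        for (int b = 0; b < 4; b++) {
            int shift = isBigEndian ? 8 * (3 - b) : 8 * b;
            bits |= (bytes[i + b] & 0xff) << shift;
        }
        // No sign extension or scaling: the bits already are the float value.
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] big    = {0x3F, 0x00, 0x00, 0x00};
        byte[] little = {0x00, 0x00, 0x00, 0x3F};
        System.out.println(decode32(big, 0, true));     // prints 0.5
        System.out.println(decode32(little, 0, false)); // prints 0.5
    }
}
```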


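    You can conceptualize A-law and μ-law as if they were a floating-point format. These are PCM formats, but the range of values is non-linear.


    For both, the compressed data is 8-bit. By the standards, A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.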


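    Before the formula can be applied, three things have to be dealt with:

    1. Some of the bits are inverted for storage by the standards, for a reason involving data integrity.
    2. The values are stored as sign and magnitude (rather than two's complement).
    3. The formula expects a range of ±1.0, so the 8-bit value has to be scaled.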


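    For μ-law, all of the bits are inverted:

    temp ^= 0xffL; // 0xff == 0b1111_1111

    (Note that we can't use `~`, because we don't want to invert the high bits of the `long`.)

    For A-law, every other bit is inverted:

    temp ^= 0x55L; // 0x55 == 0b0101_0101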



    1. Check whether the sign bit is set.
    2. If so, clear the sign bit and negate the number.

    // 0x80 == 0b1000_0000
    if ((temp & 0x80L) != 0) {
        temp ^= 0x80L;
        temp = -temp;
    }


    sample = (float) (temp / fullScale(8));



    sample = (float) (
        signum(sample)
            * (1.0 / 255.0)
            * (pow(256.0, abs(sample)) - 1.0)
    );
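The three preparation steps and the μ-law formula can be chained into a single method. The sketch below does that for one encoded `byte`. It follows the continuous formula used in this answer rather than the table-based decoding of the telephony standard, and the names `MuLawDemo` and `decode` are made up for the example.

```java
final class MuLawDemo {
    /** Decodes one mu-law byte to a float in the range of -1.0 to 1.0. */
    static float decode(byte b) {
        long temp = b & 0xffL;
        temp ^= 0xffL;                // 1. undo the bit inversion used for storage
        if ((temp & 0x80L) != 0) {    // 2. sign and magnitude -> two's complement
            temp = -(temp ^ 0x80L);
        }
        double sample = temp / 128.0; // 3. scale by full scale for 8 bits (2^7)
        return (float) (
            Math.signum(sample)
                * (1.0 / 255.0)
                * (Math.pow(256.0, Math.abs(sample)) - 1.0)
        );
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0xFF)); // prints 0.0 (silence)
        System.out.println(decode((byte) 0x80)); // largest positive value, a bit below 1.0
        System.out.println(decode((byte) 0x00)); // largest negative value, same magnitude
    }
}
```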


    float signum = signum(sample);
    sample = abs(sample);
    if (sample < (1.0 / (1.0 + log(87.7)))) {
        sample = (float) (
            sample * ((1.0 + log(87.7)) / 87.7)
        );
    } else {
        sample = (float) (
            exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
        );
    }
    sample = signum * sample;
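The A-law steps can be chained the same way. This sketch uses the constant 87.7 exactly as it appears in this answer's code and applies the two-branch formula to one encoded `byte`; `ALawDemo` and `decode` are names invented for the illustration, and the result is the continuous-formula value, not a table lookup.

```java
final class ALawDemo {
    static final double A = 87.7;          // constant as used in this answer's code
    static final double LN_A = Math.log(A);

    /** Decodes one A-law byte to a float in the range of -1.0 to 1.0. */
    static float decode(byte b) {
        long temp = (b & 0xffL) ^ 0x55L;   // undo the every-other-bit inversion
        if ((temp & 0x80L) != 0) {         // sign and magnitude -> two's complement
            temp = -(temp ^ 0x80L);
        }
        double sample = temp / 128.0;      // scale by full scale for 8 bits (2^7)
        double sign = Math.signum(sample);
        sample = Math.abs(sample);
        if (sample < 1.0 / (1.0 + LN_A)) {
            sample = sample * ((1.0 + LN_A) / A);              // linear segment
        } else {
            sample = Math.exp(sample * (1.0 + LN_A) - 1.0) / A; // exponential segment
        }
        return (float) (sign * sample);
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0x55)); // prints 0.0 (silence)
        System.out.println(decode((byte) 0x2A)); // largest positive value, a bit below 1.0
    }
}
```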

    Below is the complete example code for the `SimpleAudioConversion` class.

    package mcve.audio;

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioFormat.Encoding;

    import static java.lang.Math.*;

    /**
     * <p>Performs simple audio format conversion.</p>
     *
     * <p>Example usage:</p>
     *
     * <pre>{@code  AudioInputStream ais = ... ;
     * SourceDataLine  line = ... ;
     * AudioFormat      fmt = ... ;
     *
     * // do setup
     *
     * for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
     *     int slen;
     *     slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
     *
     *     // do something with samples
     *
     *     blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
     *     line.write(bytes, 0, blen);
     * }}</pre>
     *
     * @author Radiodef
     * @see <a href="http://stackoverflow.com/a/26824664/2891664">Overview on Stack Overflow</a>
     */
    public final class SimpleAudioConversion {
        private SimpleAudioConversion() {}

        /**
         * Converts from a byte array to an audio sample float array.
         *
         * @param bytes   the byte array, filled by the AudioInputStream
         * @param samples an array to fill up with audio samples
         * @param blen    the return value of AudioInputStream.read
         * @param fmt     the source AudioFormat
         * @return the number of valid audio samples converted
         * @throws NullPointerException if bytes, samples or fmt is null
         * @throws ArrayIndexOutOfBoundsException
         *         if bytes.length is less than blen or
         *         if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
         */
        public static int decode(byte[]      bytes,
                                 float[]     samples,
                                 int         blen,
                                 AudioFormat fmt) {
            int   bitsPerSample = fmt.getSampleSizeInBits();
            int  bytesPerSample = bytesPerSample(bitsPerSample);
            boolean isBigEndian = fmt.isBigEndian();
            Encoding   encoding = fmt.getEncoding();
            double    fullScale = fullScale(bitsPerSample);

            int i = 0; // index into bytes
            int s = 0; // index into samples

            while (i < blen) {
                long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
                float sample = 0f;

                if (encoding == Encoding.PCM_SIGNED) {
                    temp = extendSign(temp, bitsPerSample);
                    sample = (float) (temp / fullScale);
                } else if (encoding == Encoding.PCM_UNSIGNED) {
                    temp = unsignedToSigned(temp, bitsPerSample);
                    sample = (float) (temp / fullScale);
                } else if (encoding == Encoding.PCM_FLOAT) {
                    if (bitsPerSample == 32) {
                        sample = Float.intBitsToFloat((int) temp);
                    } else if (bitsPerSample == 64) {
                        sample = (float) Double.longBitsToDouble(temp);
                    }
                } else if (encoding == Encoding.ULAW) {
                    sample = bitsToMuLaw(temp);
                } else if (encoding == Encoding.ALAW) {
                    sample = bitsToALaw(temp);
                }

                samples[s] = sample;

                i += bytesPerSample;
                s++;
            }

            return s;
        }

        /**
         * Converts from an audio sample float array to a byte array.
         *
         * @param samples an array of audio samples to encode
         * @param bytes   an array to fill up with bytes
         * @param slen    the return value of the decode method
         * @param fmt     the destination AudioFormat
         * @return the number of valid bytes converted
         * @throws NullPointerException if samples, bytes or fmt is null
         * @throws ArrayIndexOutOfBoundsException
         *         if samples.length is less than slen or
         *         if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
         */
        public static int encode(float[]     samples,
                                 byte[]      bytes,
                                 int         slen,
                                 AudioFormat fmt) {
            int   bitsPerSample = fmt.getSampleSizeInBits();
            int  bytesPerSample = bytesPerSample(bitsPerSample);
            boolean isBigEndian = fmt.isBigEndian();
            Encoding   encoding = fmt.getEncoding();
            double    fullScale = fullScale(bitsPerSample);

            int i = 0; // index into bytes
            int s = 0; // index into samples

            while (s < slen) {
                float sample = samples[s];
                long temp = 0L;

                if (encoding == Encoding.PCM_SIGNED) {
                    temp = (long) (sample * fullScale);
                } else if (encoding == Encoding.PCM_UNSIGNED) {
                    temp = (long) (sample * fullScale);
                    temp = signedToUnsigned(temp, bitsPerSample);
                } else if (encoding == Encoding.PCM_FLOAT) {
                    if (bitsPerSample == 32) {
                        temp = Float.floatToRawIntBits(sample);
                    } else if (bitsPerSample == 64) {
                        temp = Double.doubleToRawLongBits(sample);
                    }
                } else if (encoding == Encoding.ULAW) {
                    temp = muLawToBits(sample);
                } else if (encoding == Encoding.ALAW) {
                    temp = aLawToBits(sample);
                }

                packBits(bytes, i, temp, isBigEndian, bytesPerSample);

                i += bytesPerSample;
                s++;
            }

            return i;
        }

        /**
         * Computes the block-aligned bytes per sample of the audio format,
         * using Math.ceil(bitsPerSample / 8.0).
         * <p>
         * Round towards the ceiling because formats that allow bit depths
         * in non-integral multiples of 8 typically pad up to the nearest
         * integral multiple of 8. So for example, a 31-bit AIFF file will
         * actually store 32-bit blocks.
         *
         * @param  bitsPerSample the return value of AudioFormat.getSampleSizeInBits
         * @return The block-aligned bytes per sample of the audio format.
         */
        public static int bytesPerSample(int bitsPerSample) {
            return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
        }

        /**
         * Computes the largest magnitude representable by the audio format,
         * using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
         * audio, the largest positive value is one less than the return value of
         * this method.
         * <p>
         * The result is returned as a double because in the case that
         * bitsPerSample is 64, a long would overflow.
         *
         * @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
         * @return the largest magnitude representable by the audio format
         */
        public static double fullScale(int bitsPerSample) {
            return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
        }

        private static long unpackBits(byte[]  bytes,
                                       int     i,
                                       boolean isBigEndian,
                                       int     bytesPerSample) {
            switch (bytesPerSample) {
                case  1: return unpack8Bit(bytes, i);
                case  2: return unpack16Bit(bytes, i, isBigEndian);
                case  3: return unpack24Bit(bytes, i, isBigEndian);
                default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
            }
        }

        private static long unpack8Bit(byte[] bytes, int i) {
            return bytes[i] & 0xffL;
        }

        private static long unpack16Bit(byte[]  bytes,
                                        int     i,
                                        boolean isBigEndian) {
            if (isBigEndian) {
                return (
                      ((bytes[i    ] & 0xffL) << 8)
                    |  (bytes[i + 1] & 0xffL)
                );
            } else {
                return (
                       (bytes[i    ] & 0xffL)
                    | ((bytes[i + 1] & 0xffL) << 8)
                );
            }
        }

        private static long unpack24Bit(byte[]  bytes,
                                        int     i,
                                        boolean isBigEndian) {
            if (isBigEndian) {
                return (
                      ((bytes[i    ] & 0xffL) << 16)
                    | ((bytes[i + 1] & 0xffL) <<  8)
                    |  (bytes[i + 2] & 0xffL)
                );
            } else {
                return (
                       (bytes[i    ] & 0xffL)
                    | ((bytes[i + 1] & 0xffL) <<  8)
                    | ((bytes[i + 2] & 0xffL) << 16)
                );
            }
        }

        private static long unpackAnyBit(byte[]  bytes,
                                         int     i,
                                         boolean isBigEndian,
                                         int     bytesPerSample) {
            long temp = 0;

            if (isBigEndian) {
                for (int b = 0; b < bytesPerSample; b++) {
                    temp |= (bytes[i + b] & 0xffL) << (
                        8 * (bytesPerSample - b - 1)
                    );
                }
            } else {
                for (int b = 0; b < bytesPerSample; b++) {
                    temp |= (bytes[i + b] & 0xffL) << (8 * b);
                }
            }

            return temp;
        }

        private static void packBits(byte[]  bytes,
                                     int     i,
                                     long    temp,
                                     boolean isBigEndian,
                                     int     bytesPerSample) {
            // Each case needs a break; otherwise it falls through to the next one.
            switch (bytesPerSample) {
                case  1: pack8Bit(bytes, i, temp);
                         break;
                case  2: pack16Bit(bytes, i, temp, isBigEndian);
                         break;
                case  3: pack24Bit(bytes, i, temp, isBigEndian);
                         break;
                default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                         break;
            }
        }

        private static void pack8Bit(byte[] bytes, int i, long temp) {
            bytes[i] = (byte) (temp & 0xffL);
        }

        private static void pack16Bit(byte[]  bytes,
                                      int     i,
                                      long    temp,
                                      boolean isBigEndian) {
            if (isBigEndian) {
                bytes[i    ] = (byte) ((temp >>> 8) & 0xffL);
                bytes[i + 1] = (byte) ( temp        & 0xffL);
            } else {
                bytes[i    ] = (byte) ( temp        & 0xffL);
                bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
            }
        }

        private static void pack24Bit(byte[]  bytes,
                                      int     i,
                                      long    temp,
                                      boolean isBigEndian) {
            if (isBigEndian) {
                bytes[i    ] = (byte) ((temp >>> 16) & 0xffL);
                bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
                bytes[i + 2] = (byte) ( temp         & 0xffL);
            } else {
                bytes[i    ] = (byte) ( temp         & 0xffL);
                bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
                bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
            }
        }

        private static void packAnyBit(byte[]  bytes,
                                       int     i,
                                       long    temp,
                                       boolean isBigEndian,
                                       int     bytesPerSample) {
            if (isBigEndian) {
                for (int b = 0; b < bytesPerSample; b++) {
                    bytes[i + b] = (byte) (
                        (temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
                    );
                }
            } else {
                for (int b = 0; b < bytesPerSample; b++) {
                    bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
                }
            }
        }

        private static long extendSign(long temp, int bitsPerSample) {
            int bitsToExtend = Long.SIZE - bitsPerSample;
            return (temp << bitsToExtend) >> bitsToExtend;
        }

        private static long unsignedToSigned(long temp, int bitsPerSample) {
            return temp - (long) fullScale(bitsPerSample);
        }

        private static long signedToUnsigned(long temp, int bitsPerSample) {
            return temp + (long) fullScale(bitsPerSample);
        }

        // mu-law constant
        private static final double MU = 255.0;
        // A-law constant
        private static final double A = 87.7;
        // natural logarithm of A
        private static final double LN_A = log(A);

        private static float bitsToMuLaw(long temp) {
            temp ^= 0xffL;
            if ((temp & 0x80L) != 0) {
                temp = -(temp ^ 0x80L);
            }

            float sample = (float) (temp / fullScale(8));

            return (float) (
                signum(sample)
                    * (1.0 / MU)
                    * (pow(1.0 + MU, abs(sample)) - 1.0)
            );
        }

        private static long muLawToBits(float sample) {
            double sign = signum(sample);
            sample = abs(sample);

            sample = (float) (
                sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
            );

            long temp = (long) (sample * fullScale(8));

            if (temp < 0) {
                temp = -temp ^ 0x80L;
            }

            return temp ^ 0xffL;
        }

        private static float bitsToALaw(long temp) {
            temp ^= 0x55L;
            if ((temp & 0x80L) != 0) {
                temp = -(temp ^ 0x80L);
            }

            float sample = (float) (temp / fullScale(8));

            float sign = signum(sample);
            sample = abs(sample);

            if (sample < (1.0 / (1.0 + LN_A))) {
                sample = (float) (sample * ((1.0 + LN_A) / A));
            } else {
                sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
            }

            return sign * sample;
        }

        private static long aLawToBits(float sample) {
            double sign = signum(sample);
            sample = abs(sample);

            if (sample < (1.0 / A)) {
                sample = (float) ((A * sample) / (1.0 + LN_A));
            } else {
                sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
            }

            sample *= sign;

            long temp = (long) (sample * fullScale(8));

            if (temp < 0) {
                temp = -temp ^ 0x80L;
            }

            return temp ^ 0x55L;
        }
    }

