Voice-over-IP (VoIP) software is widespread and pervasive. An often-overlooked drawback is that VoIP transmits other sounds along with the voice, such as keystroke sounds, which can reveal what someone is typing on a keyboard. In general, circumventing cryptography-based data protection requires compromising one of the end hosts to capture plaintext before it is encrypted. A more convenient way to capture plaintext before encryption, without compromising a system, is to eavesdrop on the unintentional physical emanations that devices often produce during regular operation, including electromagnetic, visual, tactile, and acoustic emanations. I/O peripherals (e.g., keyboards, mice, touch-screens, and printers) are convenient targets for physical eavesdropping attacks because they directly leak information about the unencrypted input or output text. Exploiting keyboard acoustic emanations has already been proven effective for reconstructing typed input: the attacker learns what a victim is typing by analyzing the sound produced by the keystrokes. Keystrokes are recorded either directly, using microphones, or by exploiting other sensors (e.g., accelerometers). Once collected through eavesdropping, the audio stream is typically analyzed using techniques such as supervised and unsupervised machine learning or triangulation to fully or partially reconstruct the victim’s input. Until recently, all proposed keyboard acoustic eavesdropping attacks required a compromised (i.e., adversary-controlled) microphone near the victim’s keyboard, and hence physical access, which strongly limits the applicability of such attacks and reduces their real-world feasibility. A recent proposal, the Skype&Type attack (S&T for short), relaxes the physical proximity requirement by exploiting VoIP applications to place the adversary in a remote setting.
Such attacks are premised on the observation that people involved in VoIP calls often engage in secondary activities, many of which involve using the keyboard (e.g., entering a password). VoIP software automatically captures all acoustic emanations, including those of the keyboard, and faithfully transmits them to all other parties involved in the call. This gives one or more malicious parties an opportunity to determine what the user typed based on keystroke sounds. Such an adversary is realistic because two parties engaged in a VoIP call do not always trust each other, e.g., lawyers on opposite sides of a legal case or negotiators representing different parties. Additionally, the pervasiveness of VoIP software gives an attacker a huge attack surface, which makes it hard to defend effectively with previous approaches. Microsoft Skype alone, one very popular VoIP application, has about 300 million monthly active users. A VoIP call conveys enough audio information to reconstruct the victim’s input, i.e., the keystrokes typed on the remote keyboard. These facts have, to a certain extent, made eavesdropping on keyboard input an active and popular area of research. In this paper, the authors present and assess a new keyboard acoustic eavesdropping attack that involves VoIP, called Skype&Type (S&T). Unlike previous attacks that assume a stronger adversary model, S&T is more practical and feasible in many real-world settings: it requires neither physical proximity to the victim (either in person or with a recording device) nor precise profiling of the victim’s typing style and keyboard. Moreover, S&T can work with a very small number of leaked keystrokes, which is the likely case during a VoIP call.
The experiments show that S&T attains a top-5 accuracy of 91.7% in guessing a random key pressed by the victim, and that S&T is effective with many different recording devices (such as laptop microphones, headset microphones, and smartphones located in proximity of the target keyboard) and with diverse typing styles and speeds. In particular, S&T achieves a higher attack success rate when the victim is typing in a known language. The contributions of this paper are summarized as follows: (1) The authors demonstrate the S&T attack, based on remote keyboard acoustic eavesdropping over VoIP software, with the goal of recovering text typed by the user during a VoIP call with the attacker, including random text such as randomly generated passwords or PINs. (2) The S&T attack is highly accurate with minimal profiling of the victim’s typing style and keyboard and remains quite accurate even if no profiling is available to the adversary; it is therefore more feasible and applicable to real-world settings under realistic assumptions. (3) Extensive experiments show that S&T works well with different common and inexpensive recording devices and across a great variety of typing styles and speeds, and that it is robust to VoIP-related issues, such as limited available bandwidth that degrades call quality, as well as human speech over keystroke sounds. (4) Based on insights from the design and evaluation phases of this work, the authors propose a countermeasure to S&T and similar attacks that exploit spectral properties of keystroke sounds. The countermeasure is transparent to the user, does not severely impact the quality of the voice during the call, and disrupts spectral features, making data previously collected by an adversary useless. The novel contributions of this work, compared to the preliminary version, lie in a greatly extended experimental evaluation, improvements to the performance of S&T, and the proposed countermeasure.
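
To make the top-5 accuracy figure concrete, the following minimal Python sketch shows how a top-k accuracy can be computed, assuming a classifier that assigns a score to every candidate key for each recorded keystroke. The function names (`top_k_guesses`, `top_k_accuracy`) and the example scores are hypothetical and chosen only for illustration; they are not the authors’ implementation or data.

```python
from typing import Dict, List, Sequence


def top_k_guesses(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate keys with the highest classifier score."""
    # Sort by descending score; ties are broken alphabetically so the
    # ranking is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:k]]


def top_k_accuracy(
    all_scores: Sequence[Dict[str, float]],
    true_keys: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of keystrokes whose true key is among the k best guesses."""
    if len(all_scores) != len(true_keys):
        raise ValueError("need exactly one true key per scored keystroke")
    if not true_keys:
        raise ValueError("need at least one keystroke")
    hits = sum(
        1
        for scores, truth in zip(all_scores, true_keys)
        if truth in top_k_guesses(scores, k)
    )
    return hits / len(true_keys)


# Hypothetical scores for three keystrokes over a four-key alphabet
# (made-up numbers for illustration, not results from the paper).
example_scores = [
    {"a": 0.50, "s": 0.30, "d": 0.15, "f": 0.05},
    {"a": 0.10, "s": 0.20, "d": 0.30, "f": 0.40},
    {"a": 0.40, "s": 0.35, "d": 0.20, "f": 0.05},
]
example_truth = ["a", "s", "f"]

for k in (1, 3, 4):
    print(f"top-{k} accuracy: {top_k_accuracy(example_scores, example_truth, k):.3f}")
```

Under this definition, a top-5 accuracy of 91.7% means that in 91.7% of the tested keystrokes the key actually pressed was among the five highest-ranked guesses. That matters for an attacker because a short list of candidates per keystroke sharply reduces the search space for a password or PIN.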