CocoaAsyncSocket Documentation 2: Common Pitfalls

汪坚
2023-12-01

Original: https://github.com/robbiehanson/CocoaAsyncSocket/wiki/CommonPitfalls

Common Pitfalls - Don’t Be A Victim

Over the years we’ve noticed that many issues arise from general confusion about the TCP protocol. Arm yourself with knowledge so you don’t lose time in the future.

TCP is a stream


The TCP protocol is modeled on the concept of a single continuous stream of unlimited length. This is a very important concept to understand, and is the number one cause of confusion that we see.

What exactly does this mean, and how does it affect developers?

Imagine that you’re trying to send a few messages over the socket. So you do something like this (in pseudocode):

socket.write("Hi Sandy.");
socket.write("Are you busy tonight?");

How does the data show up on the other end? If you think the other end will receive two separate sentences in two separate reads, then you’ve just fallen victim to a common pitfall! Gasp! Read on.

TCP does not treat the writes as separate data. TCP considers all writes to be part of a single continuous stream. So when you issue the above writes, TCP will simply copy the data into its buffer:

TCP_Buffer = "Hi Sandy.Are you busy tonight?"

and then proceed to send the data as fast as possible. To send data over the network, TCP and other networking protocols must break that data into small pieces that can be transmitted over the medium (Ethernet, WiFi, etc.). In doing so, TCP may break apart the data in any way it sees fit. Here are some examples of how that data might be broken apart and sent:

"Hi San" , "dy.Ar" , "e you " , "busy to" , "night?"
"Hi Sandy.Are you busy" , " tonight?"
"Hi Sandy.Are you busy tonight?"

The above examples also demonstrate how the data will arrive at the other end. Let’s consider example 1 for a moment.

Sandy has issued a socket.read() command, and is waiting for data to arrive. So the result of her first read might be “Hi San”. Sandy will likely begin to process that data. And while the application is processing the data, the TCP stream continues to receive the 2nd and 3rd packets. Sandy then issues another socket.read() command, and this time she gets “dy.Are you ”.

This highlights the continuous stream nature of TCP. The TCP protocol, at the developer API level, has absolutely no concept of packets or separation of data.
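
The effect is easy to reproduce. The following is a minimal Python sketch, not CocoaAsyncSocket code: it assumes a local `socket.socketpair()` as a stand-in for a real TCP connection, and the helper name `read_in_chunks` is made up for illustration. It performs the two writes from the example, then reads the stream back a few bytes at a time.

```python
import socket


def read_in_chunks(chunk_size: int) -> list:
    """Write two messages, then read the stream back chunk_size bytes at a time."""
    # socketpair() returns two connected stream sockets in one process.
    # It is used here only as a convenient stand-in for a TCP connection.
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"Hi Sandy.")
        writer.sendall(b"Are you busy tonight?")
        writer.close()  # end of stream, so recv() eventually returns b""

        chunks = []
        while True:
            data = reader.recv(chunk_size)  # returns at most chunk_size bytes
            if not data:
                break
            chunks.append(data)
        return chunks
    finally:
        writer.close()  # closing twice is harmless
        reader.close()
```

Calling `read_in_chunks(6)` returns pieces that bear no relation to the two original writes. Only their concatenation, the full stream, is guaranteed to match what was sent.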

But isn’t this a major shortcoming? How do all those other protocols that use TCP work?

HTTP is a great example because it’s so simple, and because most everyone has seen it before. When a client connects to a server and sends a request, it does so in a very specific manner. It sends an HTTP header, and each line of the header is terminated with a CRLF (carriage return, line feed). So something like this:

GET /page.html HTTP/1.1
Host: google.com

Furthermore, the end of the HTTP header is signaled by two CRLFs in a row. Since the protocol specifies the terminators, it is easy to read data from a TCP socket until the terminators are reached.

Then the server sends the response:

HTTP/1.1 200 OK
Content-Length: 216

{ Exactly 216 bytes of data go here }

Again, the HTTP protocol makes it easy to use TCP. Read data until you get back-to-back CRLFs. That’s your header. Then parse the Content-Length value from the header, and now you can simply read exactly that many bytes.
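
Those two steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a real HTTP client: the function name `parse_http_response` is hypothetical, the incoming data is modeled as an iterable of byte chunks split at arbitrary points, and the response is assumed to carry a Content-Length header (chunked transfer encoding is not handled).

```python
def parse_http_response(chunks):
    """Reassemble one HTTP response from arbitrarily split chunks.

    Returns (header_text, body_bytes).
    """
    source = iter(chunks)

    def more():
        # Fetch the next chunk, or fail clearly if the stream ended early.
        chunk = next(source, None)
        if chunk is None:
            raise ConnectionError("stream ended before the message was complete")
        return chunk

    # Step 1: keep reading until the back-to-back CRLFs that end the header.
    buffer = b""
    while b"\r\n\r\n" not in buffer:
        buffer += more()
    header, _, body = buffer.partition(b"\r\n\r\n")

    # Step 2: find Content-Length among the header lines (skip the status line).
    length = None
    for line in header.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header")

    # Step 3: read exactly `length` bytes of body.
    while len(body) < length:
        body += more()
    return header.decode("ascii"), body[:length]
```

The parser gives the same result whether the response arrives in one piece or in seven-byte fragments, because it buffers until each boundary defined by the protocol is reached.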

Returning to our original example, we could simply use a designated terminator for our messages:

socket.write("Hi Sandy.\n");
socket.write("Are you busy tonight?\n");

And if Sandy were using AsyncSocket she would be in luck, because AsyncSocket provides really easy-to-use read methods that allow you to specify the terminator to look for. AsyncSocket does the rest for you, and would deliver two separate sentences in two separate reads!
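
To make the idea concrete, here is a rough Python sketch of what a terminator-based read has to do internally. It is not AsyncSocket's implementation; the class `TerminatorFramer` and the helper `split_messages` are hypothetical names used only for this illustration.

```python
class TerminatorFramer:
    """Buffers incoming bytes and hands back only complete messages."""

    def __init__(self, terminator=b"\n"):
        self.terminator = terminator
        self.buffer = b""

    def feed(self, chunk):
        """Add received bytes; return the messages completed by this chunk."""
        self.buffer += chunk
        # Everything before the last terminator is complete; whatever follows
        # it is a partial message that stays in the buffer for next time.
        *complete, self.buffer = self.buffer.split(self.terminator)
        return complete


def split_messages(chunks, terminator=b"\n"):
    """Run a whole sequence of chunks through a framer."""
    framer = TerminatorFramer(terminator)
    messages = []
    for chunk in chunks:
        messages.extend(framer.feed(chunk))
    return messages
```

Feeding it the fragments `b"Hi San"`, `b"dy.\nAr"`, `b"e you "`, `b"busy to"`, `b"night?\n"` yields the two original sentences as two separate messages, no matter where the stream was split.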

Writes


What happens when you write data to a TCP socket? When the write is complete, does that mean the other party received that data? Can we at least assume the computer has sent the data? The answer is NO and NO.

Recall two things:

  • All data sent and received must get broken into little pieces in order to send it over the network.
  • TCP handles a lot of complicated issues such as resending lost packets, and providing in-order delivery so information arrives in the proper sequence.

So when you issue a write, the data is simply copied into an underlying buffer within the OS networking stack. At that point the TCP software will begin its magic, which consists of all the cool stuff mentioned earlier such as:

  • breaking the data into small pieces such that they can be sent over the network
  • ensuring that lost pieces get properly resent
  • ensuring that your data arrives at the remote destination in the proper order
  • watching out for congestion in the network
  • employing fancy algorithms to accomplish all of this as fast as possible

So when you issue the command, “write this data” the operating system responds with “I have your data, and I will do everything in my power to deliver this to the remote destination.”
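
A small Python sketch shows how little a completed write actually tells you. It assumes a local `socket.socketpair()` in place of a real network connection, and the function name `write_then_read_later` is made up for this example.

```python
import socket


def write_then_read_later(payload: bytes) -> bytes:
    """Complete a write before the other side has read anything."""
    sender, receiver = socket.socketpair()
    try:
        # sendall() returns once the OS has accepted the bytes into its own
        # buffers. At this moment nothing has called recv() on the other end.
        sender.sendall(payload)

        # Only now does the "remote application" get around to reading.
        # Had it crashed or quit first, the write above would still have
        # completed without any error.
        return receiver.recv(len(payload))
    finally:
        sender.close()
        receiver.close()
```

The write finishes successfully whether or not the peer ever reads, which is why a completed write cannot be taken as proof of delivery.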

BUT… how do I know when the remote destination has received my data?

And this is exactly where most people run into problems. A good way to think about it is like this:

Imagine you want to send a letter to a friend. Not an email, but traditional snail mail. You know, through the post office. So you write the letter and put it in your mailbox. The mailman later comes by and picks it up. You can rest assured at this point that the post office will make every effort to deliver the letter to your friend. But how do you know for sure that your friend received the letter? I suppose if the letter came back with “return to sender” stamped on it, you can be certain your friend didn’t receive it. But what if it doesn’t come back? Is it enough to know that it made it into your friend’s mailbox? (Assume this is a really, really important letter.) The answer is no. Maybe it never leaves the mailbox. Maybe his roommate picks it up and accidentally throws it away. And if the roommate was responsible and left the letter on your friend’s desk? Would that be enough? What if your friend was on vacation and your letter got lost in a pile of junk mail? So the only way to truly know that your friend received the letter is to receive their response.

This is a great metaphor for sockets. When you write data to a socket, that is like putting the letter in the mailbox. The operating system is like the local mailman that comes by and picks up the letter. The giant post office system that routes the letter toward its destination is like the network. And the mailman that drops off your letter in your friend’s mailbox is like the operating system on your friend’s computer. It is then up to the application on your friend’s computer to read the data from the OS and process it (fetch the letter from the mailbox, and actually read it).

So how do I know when the remote destination has received my data? This is not something that TCP can tell you. At best, it can only tell you that the letter was delivered into their mailbox. It can’t tell you if the application has read that data and processed it. Maybe the application on the remote side crashed. Or maybe the remote user quit the application before it had a chance to read the data. Or maybe the remote user experienced a power outage. Long story short, it is up to the application layer to answer this question if need be.
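
One common way for the application layer to answer that question is an explicit acknowledgment: the receiver replies only after it has actually processed a message. The Python sketch below illustrates the idea under several assumptions. The newline-terminated messages, the `ACK` reply, and all function names are invented for this example, and a local `socket.socketpair()` plus a thread stands in for two machines.

```python
import socket
import threading


def receiver_loop(sock, processed):
    """Read newline-terminated messages; acknowledge each one after processing."""
    buffer = b""
    while True:
        data = sock.recv(1024)
        if not data:  # peer closed the connection
            break
        buffer += data
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            processed.append(line.decode())  # "process" the message
            sock.sendall(b"ACK\n")           # only then confirm it


def send_and_wait_for_ack(sock, text, timeout=2.0):
    """Return True only if the peer application confirmed the message."""
    sock.settimeout(timeout)
    sock.sendall(text.encode() + b"\n")
    reply = b""
    try:
        while not reply.endswith(b"\n"):
            data = sock.recv(1024)
            if not data:
                return False  # connection closed before any confirmation
            reply += data
    except socket.timeout:
        return False          # no confirmation arrived in time
    return reply == b"ACK\n"


def demo():
    ours, theirs = socket.socketpair()
    processed = []
    worker = threading.Thread(target=receiver_loop, args=(theirs, processed))
    worker.start()
    try:
        acked = send_and_wait_for_ack(ours, "Are you busy tonight?")
    finally:
        ours.close()   # lets receiver_loop see end of stream and exit
        worker.join()
        theirs.close()
    return acked, processed
```

The sender treats a message as delivered only when the reply arrives. A timeout or a closed connection means the outcome is unknown, exactly like a letter that never gets a response.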

