Eight (Not 10) Things an R User Will Find Frustrating When Trying to Learn Python

彭正谊
2023-12-01
by Andy Nicholls

When speaking with clients and other R users at events such as LondonR and EARL, I've noticed an increasing trend in people looking to learn some python as the next step in their data science journey. At Mango, most of our consultants are pretty happy using either language, but as an R user of 12 or so years I've only ever dabbled with python. Recently, however, I found myself having to learn it quickly, so I thought I'd share some of my observations.

Before you stop reading, I should say that I am fully aware there are many blog posts covering the high-level pros and cons of each language. For this post I thought I'd get down to the nitty-gritty. What does an R user really experience when trying to pick up python? In particular, what does an R user who comes from a statistics background experience?

Personally I found eight (I wanted 10 but python is too good) and here they are:

  1. Lack of Hadley (Wickham). So there is a Wes (McKinney, who created pandas), but there is a lot of duplication in functionality between packages. To start with, you import statistics and find the mean function, only to discover that it has been rewritten for pandas. Later you find that everyone has their own idea of the best way to implement cross-validation. It's all very confusing when you start out. This brings me on to:
  2. Plotting. I had heard a lot of good things about matplotlib and seaborn but ggplot2 is streets ahead (IMHO).  I would even go as far as to say that ggplot2 has a shallower learning curve.
  3. IDEs. Hats off to RStudio for changing the R world when it comes to IDEs. I remember a time before RStudio when the R GUI, StatET and Tinn-R were the norm. How things have improved. Sadly, python is not quite there yet. As an RStudio user I opted for Spyder. It's OK, but the script editor needs some work. From chatting with colleagues, the integration in Jupyter Notebook seems much better, but I'm just not a big fan of notebooks.
  4. Namespaces. I've lost count of the number of times I've told trainees on an intro to R course that masking very rarely trips you up as a user (unless you're building packages, it really doesn't). Let's just say that in python you have to be careful. Bring too much in and you'll overwrite your own objects and cause chaos. This means you bring things in as and when you need them. Having to explicitly import OS utilities just to change the working directory and so on is frustrating. That said, python's capabilities in this area are a little better than R's.
  5. Object Orientation. I’ve grown to love R’s flexible S3 classes with lines like:

> x <- 5
> class(x) <- "just_made_this_up"
> x
[1] 5
attr(,"class")
[1] "just_made_this_up"

In python I am never quite sure what methods exist for an object and when to just go functional. You also really have to know about classes to work effectively with python, whereas a casual R user can get by without even knowing that R has a class system.
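For comparison, here is a rough python analogue of the S3 snippet above. It is a minimal sketch using only built-ins, not a recommended pattern: python won't let you re-tag an existing int with a new class the way R does, so the closest equivalent is a throwaway subclass (the name JustMadeThisUp simply mirrors the made-up R class). The last lines show one way to answer "what methods exist for this object?" by listing its public names with dir().

```python
class JustMadeThisUp(int):
    """Throwaway subclass standing in for R's class(x) <- "just_made_this_up"."""


x = JustMadeThisUp(5)

print(x)                    # prints 5: it still prints like a normal int
print(type(x).__name__)     # prints JustMadeThisUp
print(isinstance(x, int))   # prints True: it is still an int underneath

# Unlike R, an existing built-in object cannot simply be re-classed in place.
plain = 5
try:
    plain.__class__ = JustMadeThisUp
    reclassed = True
except TypeError:
    reclassed = False
print(reclassed)            # prints False

# One way to discover what an object offers: list its public attributes/methods.
public_names = [name for name in dir(x) if not name.startswith("_")]
print(public_names)
```

The subclass route is closer to how python code normally attaches behaviour to values; R's approach of just setting an attribute has no direct equivalent for built-in types.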

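Going back to the namespaces point in item 4, the sketch below shows the kind of masking that makes python users cautious. It uses only the standard library; the variable e and its string value are hypothetical. A wildcard import from math silently replaces your own e (and the built-in pow()), while a qualified import leaves your objects alone. The second half shows the explicit os import needed just to do what R's getwd() and setwd() do; the temporary directory is there only so the example is safe to run.

```python
import os
import tempfile

# Your own object...
e = "my experiment results"

# ...silently overwritten by a wildcard import, because math also defines e.
# This also swaps the built-in pow() for math.pow(), which always returns a float.
from math import *  # deliberately bad practice, for illustration only

print(e)          # prints 2.718281828459045, not your string

# The usual python habit: keep names qualified so nothing gets overwritten.
import math

results = "my experiment results"
print(math.e, results)

# Changing the working directory needs an explicit import of os,
# roughly the equivalent of R's getwd()/setwd().
original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                # like setwd(tmp) in R
    try:
        now_in = os.getcwd()     # like getwd() in R
    finally:
        os.chdir(original_dir)   # go back before the directory is removed
```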

  6. Reliance on R. On my recent project I was using the best of python's statistical capabilities. First off, I should say that it's basically all there (except for stepwise GLMs, for some bizarre reason). However, although I've always known that most of the statistical modelling capabilities in python have been ported from R, the documentation is pretty lazy and most of it just points you at the R documentation. The example datasets are even the same! Speaking of the documentation:
  7. Help documentation. I can only speak for the more popular packages in the two languages, but the R documentation is much more plentiful and generally contains a lot more examples.
  8. Zero-based arrays. I couldn't write a list without this coming up. I do love it when smug coders who have developed in other languages tell me that R is the exception here by indexing from 1. However, as a human being I count from 1, and this will always make more sense to me. Ending at n-1 (the last of n elements sits at index n-1, and ranges stop one short of their stated end) is also confusing. Compare:

# R
x = seq(2, 10, by = 2)
x[1:3] # Select first 3 elements
[1] 2 4 6

# Python
x = list(range(2, 11, 2))
x[0:3] # Select first 3 elements
[2, 4, 6]
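If the 0-based, end-exclusive convention keeps tripping you up, one workaround is a tiny helper that accepts R-style 1-based, inclusive positions and converts them into a python slice. The function r_range is hypothetical, made up for this sketch and not part of any library; idiomatic python code would normally just learn the native slice rules shown in the last few lines.

```python
def r_range(x, start, end):
    """Hypothetical helper: elements start..end of x, R-style (1-based, inclusive).

    r_range(x, 1, 3) behaves like x[1:3] in R.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Shift the start down by one; the end needs no shift because
    # python slices already stop *before* the end index.
    return x[start - 1:end]


x = list(range(2, 11, 2))   # [2, 4, 6, 8, 10], like seq(2, 10, by = 2) in R

print(r_range(x, 1, 3))     # [2, 4, 6], same as x[1:3] in R
print(x[0:3])               # [2, 4, 6], the native python spelling
print(x[:3])                # [2, 4, 6], the start can be omitted
print(x[len(x) - 1])        # 10: the last of n elements is at index n - 1
print(x[-1])                # 10: negative indices count from the end
                            # (in R, x[-1] would instead drop the first element)
```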

What impressed me was how extensively R's statistical capabilities have been ported to python (for example, I wasn't expecting the mixed modelling or survival analysis capabilities to be anything like those in R). However, as an existing R user, there really is no point in switching to python for statistics. The only benefit would be if you were using python for, say, extensive web-scraping and wanted to be consistent. If that's your reason, though, let me point you towards Chris Musselle's blog post, "Integrating Python and R Part II – Executing R from Python and Vice Versa". And don't forget that you can also just use rvest.

Source: https://www.pybloggers.com/2016/10/eight-not-10-things-an-r-user-will-find-frustrating-when-trying-to-learn-python/