updateterew.blogg.se

https://cran.r-project.org









I would probably prefer data.table to dplyr in that use case as well. The creator of data.table clearly comes from that background and wrote it for those kinds of workloads. I will also admit that the latest data.table tutorials suggest a lot of improvement over time. data.table made some truly WTF decisions in its early versions and has backtracked on all of them. The join API is much more reasonable now and it supports non-equi joins, which for many people could be the decider vs dplyr just by itself.

The dplyr API has only evolved so much because Hadley set insanely high standards for how powerful and intuitive it should be. So personally I don't count it against them that they didn't get it 100% right the first time, even though I have been burned a couple of times by all the changes. I think it's worth it for what they have achieved. That said, tidyverse stack traces have become kind of horrible.
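For readers who haven't seen one, here is a minimal sketch of the kind of non-equi join in data.table mentioned above. The tables `trades` and `bands` and their columns are made up for illustration; only the `on = .(...)` syntax with inequality operators comes from data.table itself.

```r
library(data.table)

# Hypothetical toy data: trades, and the price bands they should fall into.
trades <- data.table(id = 1:4, price = c(5, 12, 18, 30))
bands  <- data.table(band = c("low", "mid", "high"),
                     lo   = c(0, 10, 20),
                     hi   = c(10, 20, 40))

# Non-equi join: for each trade, find the band where lo <= price < hi.
# In on = .(...), the left side of each condition is a column of `bands`
# and the right side is a column of `trades`.
res <- bands[trades, on = .(lo <= price, hi > price)]

# Note: in the result, the join columns `lo` and `hi` hold the matched
# `price` values from `trades`, not the original band limits.
print(res[, .(id, band)])
```

Rows come back in the order of `trades`, one per trade, since each price falls in exactly one band in this toy data.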


It's my hypothesis that pretty much everyone who loves data.table is a finance/trading type person who, as you say, needs to quickly write tons of throwaway exploratory code to analyze large stock price datasets or the like. Obviously not everyone needs the speed, but a lot of us do! And as the h2o benchmarks show, data.table is still quite a bit faster than dplyr. Execution speed on medium-sized data is important, because a few extra seconds on every run matter a lot when you are running 500 micro-batches of analysis code a day. Less typing is good, because you're trying to move as fast as possible to explore hypotheses.


Rapid iteration for you personally is the point.


I think you are also maybe assuming everyone has the same use case as you for data manipulation libraries. If you are coming from a non-programming context and picking up R for the first time, no doubt tidyverse is the way to do that. The verbosity is obviously a benefit if you're having to read someone else's code and are not interested in learning a DSL just to understand what columns are being filtered on or dropped or whatever. But if you are doing data analysis full time and are writing thousands of lines of throwaway EDA code a week, most of it only to be seen by yourself, the concision and speed that data.table offers is basically second to none, in any language.


I did go check out what's new in the tidyverse after your comment and was pleased to see new functions like pivot_wider and pivot_longer replacing the extremely confusing mess of spread and gather. So it's great to see the ecosystem evolving toward better usability. However, I would hardly count it as a victory when late in the game you have to change the API for some core data manipulation functions because you made them too confusing the first time around.
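As a rough illustration of the pivot functions mentioned above, here is a small sketch using tidyr. The data frame `wide` and its column names are invented for the example; `pivot_longer()` and `pivot_wider()` are the actual tidyr functions.

```r
library(tidyr)

# Hypothetical wide table: one row per ticker, one column per year.
wide <- data.frame(ticker = c("AAA", "BBB"),
                   `2019` = c(10, 20),
                   `2020` = c(11, 22),
                   check.names = FALSE)

# Wide -> long: every year column becomes a (year, price) row.
long <- pivot_longer(wide, cols = -ticker,
                     names_to = "year", values_to = "price")

# Long -> wide: spread the year values back out into columns.
wide_again <- pivot_wider(long, names_from = year, values_from = price)

print(long)
print(wide_again)
```

The argument names (`names_to`, `values_to`, `names_from`, `values_from`) say which direction the data is moving, which is arguably the usability gain over the older pair of functions.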


Doing data manipulation one action at a time in a piped sequence is easiest to reason about because the state right before you apply a new operation is always clear. data.table, on the other hand, is a fancy clever gadget with many knobs and buttons you have to turn and press just so to get the desired result. It's only simple if all you do is filter, group by, and summarize.

To illustrate, let's look at what you have to do in data.table in order to achieve the equivalent of a grouped filter in dplyr (from the dtplyr translation vignette). Compared to the simple, declarative feel of the dplyr version, there's a lot of weird stuff going on in the data.table version. What is. You have to put DT inside itself? (And yes, I know precisely what is going on in the data.table version, I just think it's ugly and illustrates my point about composability and legibility extremely well.)

The reason data.table has all these independent knobs is that it wants you to cram your entire query into a single command, so it can optimize the query more easily and squeeze out every drop of performance. NOT because it's more understandable, because it isn't. The best of both worlds - an optimizable query and one-action-at-a-time syntax - can be achieved with a lazy system like Apache Spark or dtplyr.
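The code for the grouped filter discussed above did not survive on this page. The sketch below is a reconstruction of the general pattern, not the vignette's exact code. It assumes the built-in `mtcars` data and keeps, within each `cyl` group, the rows whose `mpg` is below the group mean.

```r
library(data.table)
library(dplyr)

# dplyr: group, then filter within each group.
by_dplyr <- mtcars %>%
  group_by(cyl) %>%
  filter(mpg < mean(mpg)) %>%
  ungroup()

# data.table: the inner call collects the row numbers (.I) that pass the
# per-group test; the outer call subsets dt by those row numbers.
# This is the "DT inside itself" shape.
dt <- as.data.table(mtcars)
by_dt <- dt[dt[, .I[mpg < mean(mpg)], by = cyl]$V1]

# Both approaches select the same rows, though not in the same order.
print(nrow(by_dplyr))
print(nrow(by_dt))
```

The dplyr version reads as a sequence of steps, while the data.table version has to be read from the inside out.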









