The choice of programming language for a machine learning task (in both academic and industrial settings) is typically driven by the goal of minimizing the algorithm's execution time.
For a machine learning task whose algorithm is largely sequential, such as deep learning, the bottleneck is the sequential part of the computation. In this situation, implementing the algorithm in the language with the fastest sequential (single-threaded) execution speed, in this case the C programming language, yields the lowest execution time attainable, provided the implementation is well written.
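This reasoning is commonly summarized by Amdahl's law, stated here only for context (the symbols p, N, and S are introduced for this illustration and do not appear elsewhere in the text): if a fraction p of the total work can be executed in parallel on N workers, the overall speedup is bounded by

\[
S(N) = \frac{1}{(1 - p) + \frac{p}{N}} .
\]

When p is small, S(N) stays close to 1 no matter how many workers are added, so raw single-threaded speed is the only remaining lever; when p is large, parallel execution pays off, which is the situation described next.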
When the algorithm contains substantial non-sequential components, the execution time of those parts can be reduced in proportion to the degree of parallelism achieved. Thus, for highly parallelizable algorithms such as TF-IDF, in which the non-sequential computation is the bottleneck, a speedup can be obtained by distributing the parallel computations across CPU hardware threads or across multiple CPUs. Distributed computation, however, raises a maintainability problem: how to conveniently manage a computation cluster whose nodes may not share the same operating system or CPU architecture. JVM-based programming languages solve this problem elegantly by abstracting the computation environment away from the operating system and the physical machine.
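As a concrete illustration (a minimal sketch, not the implementation discussed here), the following Java program computes TF-IDF over a small in-memory corpus and uses the standard library's parallel streams to spread the per-document work across the available hardware threads. The corpus, class name, and helper method are illustrative assumptions only.

```java
import java.util.*;
import java.util.stream.*;

public class ParallelTfIdf {

    public static void main(String[] args) {
        // Illustrative corpus; in practice this would be loaded from storage.
        List<String> corpus = List.of(
            "the cat sat on the mat",
            "the dog sat on the log",
            "cats and dogs"
        );

        // Term frequencies per document: embarrassingly parallel, one task per document.
        List<Map<String, Double>> tf = corpus.parallelStream()
            .map(ParallelTfIdf::termFrequencies)
            .collect(Collectors.toList());

        // Document frequency of each term (how many documents contain it).
        Map<String, Long> df = tf.stream()
            .flatMap(m -> m.keySet().stream())
            .collect(Collectors.groupingBy(t -> t, Collectors.counting()));

        int n = corpus.size();

        // TF-IDF per document, again computed in parallel across documents.
        List<Map<String, Double>> tfIdf = tf.parallelStream()
            .map(freqs -> freqs.entrySet().stream()
                .collect(Collectors.toMap(
                    Map.Entry::getKey,
                    e -> e.getValue() * Math.log((double) n / df.get(e.getKey())))))
            .collect(Collectors.toList());

        tfIdf.forEach(System.out::println);
    }

    // Relative frequency of each term within a single document.
    static Map<String, Double> termFrequencies(String doc) {
        String[] tokens = doc.toLowerCase().split("\\s+");
        Map<String, Double> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1.0, Double::sum);
        }
        counts.replaceAll((t, c) -> c / tokens.length);
        return counts;
    }
}
```

Because the program compiles to JVM bytecode, the same class file runs unchanged on any node that provides a JVM, regardless of its operating system or CPU type, which is precisely the portability property described above.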