There is a rising demand for inference from data, coupled with the arrival of new software and hardware technologies. Seth Brown evaluates the current availability of tools across different statistical languages, both domain-specific varieties such as R and Stata and general-purpose computing languages like Python. He writes that we are on the cusp of great innovation and that the development of better tools will only hasten this progress, but there must be careful consideration of the proprietary limits of these programmes to ensure they meet our future needs.
I’ve been thinking about the future of data analysis lately and which statistical language du jour will rise to prominence. I’m using the term statistical as a catch-all adjective to encompass statistics, machine learning, and other types of data analysis and inference. On one side, there are languages built for doing statistics, which have some rudimentary programming capabilities, and, on the other side, there are languages built for programming, which have rudimentary statistical capabilities. This schism requires statisticians and scientists to be fluent in multiple languages, impairs the development of better tools, leads to feature duplication across languages, and generates needless technical debt.
The burgeoning demand for a deeper understanding of our world through data is highlighting the need for better tools. Lowering the friction of data analysis workflows by closing the schism between existing language paradigms is a critical step towards the development of better tools. A contemporary statistical language is needed that can bridge this divide and provide an efficient, modern data analysis workflow.