I dislike the term “big data” because it implies a class of problems that is immune to normal statistical considerations. I will admit that certain concepts, such as the p-value, become meaningless when you have millions of observations. But other concepts, like selection bias, become even more important for big data.
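Here is a minimal sketch of the p-value point, using only the Python standard library. The function name, the effect size of 0.01 standard deviations, and the sample sizes are all made up for illustration; they do not come from any real study. It computes a two-sided p-value for a difference in means between two equal-sized groups, assuming a known common standard deviation.

```python
import math


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two
    equal-sized groups, assuming a known common standard deviation.
    (Hypothetical helper written for this illustration.)"""
    # Standard error of the difference between two independent means.
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# The same trivially small difference, tested at three sample sizes.
for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}  z = {z:5.2f}  p = {p:.3g}")
```

The difference between the groups never changes, yet the p-value goes from nowhere near significance at 100 per group to vanishingly small at a million per group. With enough observations, almost any nonzero difference is “significant,” so the p-value tells you nothing about whether the difference matters.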
Anyway, I now have a second publication that is directly tied to the big data movement.
It is a chapter on the Centers for Disease Control and Prevention for the Encyclopedia of Big Data. This is a Springer project, and I’m not sure what form it will end up taking. If the work is not trapped behind a paywall, I will put a link up here.
I have a second chapter under review for this project, but I have not yet heard whether it will be published as well.
A couple of years ago, I wrote a chapter, “R for Big Data Analysis,” for the book “Big Data Analysis for Bioinformatics and Biomedical Discoveries,” edited by my colleague (and soon-to-be boss) Shui Qing Ye. The book was published by Chapman and Hall in 2015.
You can find an earlier version of this page on my blog.