About

I'm a Computational Journalist at ProPublica, where I use data science and code to do investigative journalism. Before this, I was a Machine Learning Engineer at Atrium, where I wrote software to automatically extract and analyze legal terms from contracts.

I've worked on stories that have:

exposed online disinformation by the Chinese government, both within China and abroad;
identified undercounts in the Covid-19 death toll in the early days of the pandemic;
found concerning errors in algorithms used to surveil students in the name of school safety;
exposed the business practices of ransomware recovery companies.

My stories have appeared in the New York Times, Wired, and the Guardian, among other publications.

My 2017 blog post, which uncovered more than a million astroturf comments on the FCC's net neutrality regulations, unexpectedly went viral. It was covered in the Washington Post, Fortune, and Engadget, and I was invited onto Science Friday to explain my work. I’ve also been interviewed about my data science work in the New York Times and Forbes.

My Experience

Before working in a newsroom, I gained experience at small startups, at large international law firms, and in government. The common thread tying those roles together has been my ability to quickly analyze, contextualize, and synthesize data, and to communicate the results clearly.

I have a law degree from Columbia Law School, where I was also the Editor-in-Chief of the Columbia Science and Technology Law Review, and a bachelor's degree in systems engineering from the University of Waterloo, specializing in cognitive science.

My Data Science Philosophy

I believe in using models that are reproducible and, to the extent possible, interpretable. I use deep learning only when appropriate to the problem at hand, not as a first resort.

I believe in style guides (for both natural and programming languages), but not in blindly following rules at the expense of broader principles (for both natural and programming languages).

I believe in writing DRY code (and thinking about why), and I believe that data science projects for fun should have whimsical names. :-)

Most of all, I believe in paying attention to the details, in being clear about my assumptions, in being aware of the limitations of statistics, and in doing great work that matters.

Get in touch

I'd love to hear your feedback or questions about my work. Shoot me an email if you have an opportunity in mind or an idea for collaboration. You can reach me at jeff.kao at propublica.org, at +1 646 789 5351 on Signal, or through the contact page.

Jeff Y. Kao

Data Scientist, Journalist, Language Nerd
