About

I took the above picture at sunrise last year in Death Valley National Park. You can barely make out my wife in the photo: a tiny black speck about 1/3 of the width in from the right margin.

I'm a Computational Journalist at ProPublica. Previously, I worked as a Machine Learning Engineer at Atrium. I'm most passionate about work involving natural language processing; however, I am equally comfortable solving data problems using old-school statistical models, modern data science techniques or deep learning.

Recently, I've built systems that:

  • automatically extract structured data out of legal documents;

  • cluster and analyze millions of comments submitted to the FCC;

  • use machine learning predictions to give context on news spread through social media;

  • predict NBA players' stats based on past performance.

My blog post about the FCC project unexpectedly went viral! I got some media coverage in the Washington Post, Fortune, and Engadget. I was also invited onto Science Friday to explain my work! I've also been interviewed about my data science work in the New York Times and Forbes.

My Experience

I've worked in small startups, large international law firms and in government. The common strength I've had in each of those jobs is my ability to quickly analyze, contextualize and synthesize data, and to clearly communicate the results.

I have a law degree from Columbia, where I was also the Editor-in-Chief of the Columbia Science and Technology Law Review, and a systems engineering degree from the University of Waterloo with an option in cognitive science.

My Data Science Philosophy

I believe in using models that are reproducible and, to the extent possible, interpretable. I use deep learning only when appropriate to the problem at hand, not as a first resort.

I believe in style guides (for both natural and programming languages), but not in blindly following rules at the expense of broader principles (for both natural and programming languages).

I believe in writing DRY code (and thinking about why), and I believe that data science projects for fun should have whimsical names. :-)

Most of all, I believe in paying attention to the details, in being clear about my assumptions, in being aware of the limitations of statistics, and in doing great work that matters.

Get in touch

I'd love to hear your feedback or questions about my work. I'd also love to work on meaningful projects involving NLP, so shoot me an email if you have an opportunity in mind or an idea for collaboration. Feel free to reach out to me at jeff.kao at propublica.org or through my contact page. You can also message me on Signal at +1 646 789 5351.