What can we learn from 1.1 billion GitHub events and 42 TB of code? (English)
Data Science

Overview

Anyone can easily analyze more than five years of GitHub metadata and 42+ terabytes of open source code. We'll use this data to understand the community and code around any language or project. The talk is relevant for open source creators, users, and people choosing between projects.

Abstract

“Data gives us insights into how people build software, and the activities of open source communities on GitHub represent one of the richest datasets ever created of people working together at scale.”—GitHub Universe 2016

With Google BigQuery, anyone can easily analyze more than five years of GitHub metadata and 42+ terabytes of open source code. Felipe Hoffa explains how to leverage this data to understand the community and code around any language or project. Whether you create open source, use it, or are choosing between projects, this is data you can use to make better decisions.

Topics include:

  • How it’s run
  • How coding patterns have changed through time
  • Guiding your project design decisions based on actual usage of your APIs
  • How to request features based on data
  • The most effective phrasing to request changes
  • Effects of social media on a project’s popularity
  • Who starred your project and what other projects interest them
  • Measuring community health
  • Running static code analysis at scale
  • Tabs or spaces? Where should commas go?
  • Africa and open source: a data-based analysis
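The talk runs its analyses in BigQuery over the GitHub dataset. As a small local illustration of the kind of per-file check behind "Tabs or spaces?", the Python sketch below classifies each file by a majority vote over the leading character of its indented lines, then tallies files per style. The function names `indentation_style` and `tally`, the majority-vote heuristic, and the tie-breaking rule are hypothetical choices for illustration, not the method used in the talk.

```python
from collections import Counter
from typing import Dict


def indentation_style(source: str) -> str:
    """Classify a file as 'tabs', 'spaces', or 'none'.

    Hypothetical heuristic: majority vote over the first character of
    each indented, non-blank line.
    """
    counts = Counter()
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank and whitespace-only lines
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
    if not counts:
        return "none"  # no indented lines at all
    # Ties go to 'spaces'; this is an arbitrary, assumed choice.
    return "tabs" if counts["tabs"] > counts["spaces"] else "spaces"


def tally(files: Dict[str, str]) -> Counter:
    """Count how many files (path -> contents) use each indentation style."""
    return Counter(indentation_style(text) for text in files.values())


if __name__ == "__main__":
    sample = {
        "main.go": "func main() {\n\tfmt.Println(1)\n}\n",
        "util.py": "def f():\n    return 1\n",
        "notes.txt": "no indentation here\n",
    }
    print(tally(sample))
```

At the scale discussed in the talk, the same per-file classification would have to run inside the query engine rather than in a local loop; the sketch only shows the logic of the check.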