❯ Guillaume Laforge

bigquery

What can we learn from millions of (groovy) source files in Github

What can you learn from millions of (Groovy) source files stored on Github? In this presentation, I analyzed source files in the Github archives stored on BigQuery, in particular Groovy source files, but also Gradle build files and Grails controllers and services. What kind of questions can we answer? How many Groovy files are there on Github? What are the most popular Groovy file names? How many lines of Groovy source code are there? Read more...

Gradle vs Maven and Gradle in Kotlin or Groovy

Once in a while, when talking about Gradle with developers, at conferences or within the Groovy community (but with the wider Java community as well), I hear questions about Gradle, in particular about Gradle vs Maven, or about whether developers adopt the Kotlin DSL for Gradle builds. In the past, I blogged several times about using BigQuery and the Github dataset to analyze open source projects hosted on Github, by running some SQL queries against that dataset. Read more...

Analyzing half a million Gradle build files

Gradle is becoming the build automation solution of choice among developers, in particular in the Java ecosystem. With the Github archive published as a Google BigQuery dataset, it’s possible to analyze those build files, and see if we can learn something interesting about them! This week, I was at the G3 Summit conference, and presented on this topic: I covered the Apache Groovy language, as per my previous article, but I expanded my queries to also look at Grails applications and Gradle build files. Read more...

What can we learn from million lines of Groovy code on Github?

Github and Google recently announced and released the Github archive to BigQuery, liberating a huge dataset of source code in multiple programming languages, and making it easier to query it and discover some insights. Github explained that the dataset comprises over 3 terabytes of data, for 2.8 million repositories, 145 million commits over 2 billion file paths! The Google Cloud Platform blog gave some additional pointers about what’s possible with the querying capabilities of BigQuery. Read more...