“That sounds way too complex!” – Challenge Accepted!

Yesterday Adam Machanic called for new #tsql2sday hosts. I applaud his continued efforts in organizing this monthly blog party. It inspires new bloggers, gently guilts those who have lapsed, and provides a topic and schedule for the uncertain. But it has always surprised me that there was no canonical list of past events with a link to the current topic.

There were, of course, Adam’s debut post and the recent rules-of-engagement update, and although some folks owned relevant domains, those domains didn’t point anywhere. The closest we had to an official archive was Steve Jones’ T-SQL Tuesday Topic List. I’m grateful he was diligent in maintaining it, but its existence hinted at a problem. I appreciate that the event is supposed to be a little organic, driving blog and Twitter traffic with hashtags and attention, but this comment by Robert Bishop sums up a friction that might have been limiting its growth:

So is the twitter tag #tsql2sday the best way to learn about the next blog party? I always seem to find out about them after the fact and can’t get a blog post out in time.

To start fixing this, today Brent Ozar transferred ownership of tsqltuesday.com to Steve Jones, who spun up a new WordPress site (on Azure) to host the list.

That’s great, but as data professionals, we should want this history in some sort of structured data format. Which brings us to Adam’s “challenge” to me, quoted above.

For some time, I’ve wanted an easy way to host structured data publicly (for free or very cheaply), where the community could contribute and discuss suggested changes but trusted caretakers would make the final editing decisions. Wikipedia comes close, but (1) it has rules on the type of content that can be hosted, (2) it’s a little too open for my tastes, and (3) it presents data in a human-readable format, not a machine-readable one. Hosting my own wiki would solve (1) and (2), but not (3), and it introduces a new issue: (4) I have to become a wiki software host. There are some free or cheap third-party wiki hosting services, but my quick review didn’t find any that were machine-readable.

I would love to see a SaaS (software as a service) version of this: a hosted, structured database with a GUI for admins to design and configure the database, an API layer to extract information, a nice-looking GUI for the public to review the data, a GUI for contributors to suggest edits (complete with discussion and maybe voting), and a GUI for moderators to approve them. If something like this exists and I’ve missed it, oops. Let me know in the comments. But I don’t know of one, and my initial searches didn’t turn anything up, so I thought about how I could make some of this happen.

GitHub is Git repository hosting as a service. Git is designed as a source control system, but it can store pretty much any data. GitHub has made a name in the industry with its free hosting of open source projects. Put these together and GitHub is a free host for pretty much any public data. But that’s not all. It also has a way to control and delegate editing access. Check! It has a GUI for contributors to suggest edits (pull requests). Check! And you can host data in any format, including CSV, JSON, or XML, i.e., structured data. Check! If only there were a GUI for users to conveniently review the data.

Enter GitHub Pages. In addition to hosting repositories, GitHub hosts static web pages (with certain restrictions). Now, technically HTML is structured, so we could just maintain the data as an HTML table that humans could view nicely and machines could consume, but I’ve heard somewhere that parsing HTML is not fun. Ideally, our data would be stored and maintained in a real structured format but still be displayed nicely in a web page. JSON is a structured format, and it works well with web pages because its syntax is derived from JavaScript object literals. Although I have a special place in my heart for XML, JSON has displaced it, and SQL Server 2016 finally supports it (it also seems much faster than XML).
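To make the JSON-versus-HTML point concrete, here is a minimal TypeScript sketch of the machine-readable side. The `TopicEntry` shape and its field names (`number`, `date`, `host`, `topic`, `url`) are my own hypothetical assumptions, not the schema of the actual repository. The point is that reading JSON takes one `JSON.parse` call plus a shape check, where scraping an HTML table would need an HTML parser and fragile selectors.

```typescript
// Hypothetical record shape for one T-SQL Tuesday event.
// These field names are an assumption, not the real repository's schema.
interface TopicEntry {
  number: number;
  date: string; // assumed ISO 8601 date string
  host: string;
  topic: string;
  url: string;
}

// Type guard: true only when every assumed field has the expected type.
function isTopicEntry(value: unknown): value is TopicEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.number === "number" &&
    typeof v.date === "string" &&
    typeof v.host === "string" &&
    typeof v.topic === "string" &&
    typeof v.url === "string"
  );
}

// Parse a JSON array of events and keep only the well-formed entries.
// Throws if the document is not valid JSON or is not an array.
function parseTopics(json: string): TopicEntry[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of events");
  }
  return data.filter(isTopicEntry);
}
```

Silently dropping malformed entries is just one design choice; a contributor-facing check might prefer to report them so a bad pull request gets caught during review.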

So that’s the challenge: create a GitHub repository that stores T-SQL Tuesday data in JSON format, T-SQL that can be used to review and adjust the JSON-formatted data, and a web page that consumes and displays that data cleanly. And here’s the proof-of-concept repository and its visual display.
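For the display half of the challenge, a page could turn JSON records into an HTML table. This is a rough TypeScript sketch, not the code in the proof-of-concept repository. It assumes a hypothetical `TopicEntry` record with `number`, `date`, `host`, `topic`, and `url` fields; those names, along with `escapeHtml` and `renderTopicsTable`, are illustrative inventions.

```typescript
// Hypothetical record shape; field names are an assumption,
// not the actual repository's schema.
interface TopicEntry {
  number: number;
  date: string;
  host: string;
  topic: string;
  url: string;
}

// Escape HTML-significant characters so data values render as text.
// "&" must be replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render events as an HTML table string, highest event number first.
// Copies the input array so the caller's ordering is left untouched.
function renderTopicsTable(entries: TopicEntry[]): string {
  const rows = [...entries]
    .sort((a, b) => b.number - a.number)
    .map(
      (e) =>
        `<tr><td>${e.number}</td><td>${escapeHtml(e.date)}</td>` +
        `<td>${escapeHtml(e.host)}</td>` +
        `<td><a href="${escapeHtml(e.url)}">${escapeHtml(e.topic)}</a></td></tr>`
    )
    .join("\n");
  return (
    "<table>\n<thead><tr><th>#</th><th>Date</th><th>Host</th><th>Topic</th></tr></thead>\n" +
    `<tbody>\n${rows}\n</tbody>\n</table>`
  );
}
```

In a browser, the page could fetch the JSON file published alongside it and assign the result to a container element's `innerHTML`. Escaping every field first keeps characters like `<`, `&`, or quotes in a topic title from being interpreted as markup.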

Now, what’s great about this public hosting is that you, the community, can embrace this project and move it forward. Use the provided scripts to load the data into SQL Server and add more history; my long-term goal is for it to include the individual blog posts for each topic. Or, if you have some web chops, improve the bare-bones human-readable display. You can check out my first foray using this concept (a dormant project to list tech conferences) for some ideas. Now let’s show the other tech communities that the SQL Server crowd can rock GitHub just as well as those JavaScripters.
