Categories: Case Studies, How To, Implementation, | Tue, 14 Jul 2020 02:28:41 GMT
Blogs > https://www.sociallyconstructed.online/blogs/week-6-from-airtable-to-google-sheets
This series follows Ria Gupta, a talented software developer we’re working with for Google Summer of Code.
She’s been selected by CHAOSS project and SociallyConstructed.Online to implement our Social Currency Metrics System In their analytics platform, Grimoire Lab. In this series she is providing us a step by step account of her progress as cross posted from her personal blog! |
Series Blog Contents:
Check here for all published blogs. Announcement: Ria’s Journey begins! Week 1: Ria’s 1st week Week 2: The SCMS in Airtable Week 3: Preparations & Superheroes Week 4: Putting Code to PIxel Week 5: The SCMS Data’s Alive! Week 6: Airtable to Google Sheets Week 7: Our 1st Visualizations |
Airtable’s limitations
I didn’t realise it until just last week but Airtable has a limit of 1200 records per base only for a free version and the SCMS could be implemented in any spreadsheet software as far as GrimoireLab is concerned.
Our initial goal was to collect as much data as possible to represent the community’s sentiments holistically and make SCMS usable by other open-source communities as well.
So we decided to shift the implementation to Google Sheets. Google Sheets has a limit at 5 million cells, which is adequately enormous [yeah….0.o].
So, I collected all data from Github, Mailing lists of Grimoirelab and IRC channel of CHAOSS. Then, randomly tagged it (as done in Week 2), and exported to Google Sheets using an API.
After looking at the data carefully, we also noticed that we had Github comments by Coveralls indicating the coverage increased or decreased. Other than this, we had lots of IRC messages indicating a person has joined the channel or has left the channel. This data is not providing us with any additional information about the community and is not a user sentiment. So, we planned to remove all those unnecessary records. We collected around 5.6k filtered data from Github, Mailing list, IRC channel.
The Codex
Codices help us to rely on Qualitative data in an objective sense over time much like Quantitative data. To reduce the subjectivity of this qualitative data, it is imperative to define a codex table which can help in tagging data points accurately by better defining and honing in on the purpose of tags. It also helps in collaborating with the team to keep similar ideas running.
Codices contain the definitions of metrics within the organisation (here it is Grimoirelab), and an example to illustrate the definition. It also consists of “when to use” and “when-not-to-use” a metric.
Other than this, since we have a full version of data present in google sheets, I converted this entire Excel sheet data to the ElasticSearch index (precisely as done in Week 2). The only difference is the number of records being used. Earlier, I could only have limited records containing the “extra_scms_data” in the Enriched index.
Now, every meaningful record (i.e. ignoring comments by coveralls) has an additional field present in its ElasticSearch index.
And the weekly meeting…
We discussed Dashboarding and expanding on the SCMS. We’ll be having a collaboration meeting on Friday,26 for discussing the findings of tagged data. Till then, in the next week, I’ll be focusing on writing tests and making a basic dashboard. ?
That’s it for this week.? Make sure you have a look at the project updates on Github #ria18405/GSoC. All questions and comments are welcomed! Stay tuned for more weekly updates. ?