This article is originally published at https://lcolladotor.github.io/
Like most scientific departments, we have a seminar (weekly over here) where very bright people come to talk to us about their work. Being a Biostatistics department, we mostly get faculty from Biostatistics departments at other universities. This week was quite different. Amy Heineike from Quid gave us a talk describing their product, which fits perfectly into what is now called “data science”. You can see Amy at the end of the table in the picture below.
So what is Quid? It’s a tech startup that provides either its software or reports derived from it to help big companies (a) analyze a field, (b) look at what the competition is doing, and (c) make informed decisions (helpful for marketing). The short video below describes Quid in a more general way. Check it out!
As Amy Heineike described in her talk, the three common decision-making pathways are:
- Someone follows their own intuition. Say, a big shot who thinks he knows where the world is going.
- Someone with decision power asks others to generate reports for her/him. That is, lots of manual work where people read, consult others, etc., and then summarize the information in a report.
- Similar to the one above: lots of people gather the information, but then a program is run and the decision is pretty much made by the computer.
The Quid paradigm is to use the computer to gather all the information and then have one or more humans explore a network with a very cool 3D tool, assimilate the information, and decide for themselves. The argument is that the human brain is very powerful at visual pattern recognition and can outperform computers.
At first I felt that you could do the network part with software like Cytoscape, which I find to be very powerful for network analysis. But the pipeline used by Quid is much more extensive, and it’s an all-in-one bundle.
Another key argument in favor of Quid is that most information is shared in a list format: Google search results, PowerPoint bullet points, your Facebook feed, etc. But who came up with the ranking? How are things related? That’s when you need a network representation.
I recommend taking a look at their technical overview page, where they have the main steps outlined. Needless to say, they depend strongly on the early natural language processing steps. Their 3D tool looked very interesting and I would love to play with it. Amy Heineike actually teased us by showing a video of a short session with the software, designed so that we would want to have a go with Quid. I, like many others, was hooked! Sadly, Quid’s software is not the kind that academics can go and buy for now.
I found the example using “synthetic biology” as the query to be pretty interesting. Sadly I don’t have a picture, but one of the features that seems very powerful appears when you change to a 2D display. In it, you have time on the X-axis and the number of articles (well, any kind of input file Quid can use) on the Y-axis. By clicking on a point (which corresponds to a node in the 3D network environment) you can then visualize all the nodes that are directly connected to it. Thus you have a scatterplot with a 2D network on top of it. That can be really useful for understanding the flow of information. The specific example was how someone proposed years ago that a specific kind of application was possible, some time later grants on the subject were announced, and closer to the present he got a grant, and then other grants and results were publicized.
Now, Quid has some flaws. For instance, one hot question was how to control the threshold that determines whether two nodes are connected or not. The answer was something like this: experts in their fields have validated the results for queries related to them. Not very convincing for a biostat crowd. Another question was how to control, remove, or correct for bias. Amy Heineike replied that you need to learn what the data used by Quid is like. For example, when looking at companies, the number of news articles that mention a company is linked to how efficient or big its public relations office is.
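To make the threshold question a bit more concrete, here is a minimal Python sketch of why that single number matters. This is not Quid's method, which was not detailed in the talk. It assumes documents are compared by cosine similarity of simple word counts, and the documents, the function names, and the `threshold` parameter are all made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(docs, threshold):
    """Connect two documents when their similarity is at least `threshold`."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    edges = []
    for u, v in combinations(sorted(vectors), 2):
        if cosine_similarity(vectors[u], vectors[v]) >= threshold:
            edges.append((u, v))
    return edges


# Toy documents, invented for this example (assumption, not real data).
docs = {
    "a": "synthetic biology gene circuits",
    "b": "synthetic biology grant funding",
    "c": "grant funding marketing",
}

# The same documents give a different network for each threshold.
for threshold in (0.4, 0.55, 0.8):
    print(threshold, build_edges(docs, threshold))
```

With these toy documents, moving the threshold from 0.4 to 0.55 drops one of the two edges, and at 0.8 the network has no edges at all. The picture a human sees depends directly on that choice, which is why the question came up.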
Nevertheless, Quid’s product is very interesting. Plus, I feel that part of our toolbox as biostatisticians is visualizing data in ways that allow us to understand what is going on. As for working at Quid or doing anything along those lines, we definitely need to learn more about computer science. After all, you need incredibly fast algorithms and code to work with enormous data sets.
PS Amy Heineike might develop a PubMed scraper for Quid, meaning that Quid would be able to access citation data. Then it would be very cool to use a few “seed” papers that you are interested in to find the complete history behind them and any other papers similar to them. There might be another group out there working in your field that you don’t know about! I think that happens more frequently than you would think, especially if you don’t look abroad.
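The “seed” paper idea can be sketched in a few lines of Python. This is only my guess at how such a search could work, not anything Quid has built. The citation data is a made-up dictionary, and `history`, `similar_papers`, and `min_shared` are hypothetical names. “History” here means everything reachable by following citations backwards, and “similar” means sharing at least `min_shared` direct references with a seed.

```python
from collections import deque

# Hypothetical citation data: each paper maps to the papers it cites.
cites = {
    "seed": ["p1", "p2"],
    "p1": ["p0"],
    "p2": ["p0"],
    "p0": [],
    "other_group": ["p1", "p2"],
    "unrelated": ["x"],
    "x": [],
}


def history(seeds, cites):
    """Breadth-first walk back through citations: everything the seeds build on."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        for cited in cites.get(paper, []):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return seen - set(seeds)


def similar_papers(seeds, cites, min_shared=2):
    """Papers sharing at least `min_shared` direct references with any seed."""
    found = set()
    for seed in seeds:
        seed_refs = set(cites.get(seed, []))
        for paper, refs in cites.items():
            if paper not in seeds and len(seed_refs & set(refs)) >= min_shared:
                found.add(paper)
    return found


print(sorted(history(["seed"], cites)))
print(sorted(similar_papers(["seed"], cites)))
```

In this toy data, `other_group` never cites the seed paper and is never cited by it, yet it shows up as similar because it relies on the same references. That is exactly the kind of group you might not know about.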
Edit: I had completely forgotten that I had read about Amy Heineike before in her Simply Statistics interview. There’s more about her in this video and in The Phenomlist.