Attempt at network stress testing in R

I’ve been asked by reviewers to stress test two networks following Albert, Jeong & Barabási (2000). Critically, the reviewers asked for an exploration of how network diameter changes as progressively larger numbers of nodes are randomly dropped from the networks.

Although the netboot library makes it trivial to do a case-drop bootstrap on a network, it reports a limited set of network statistics, and diameter is not one of them.

Here’s an attempt to run a stress test on network diameter for a small (1000-node) randomly generated ring network. I’m sure there are more efficient ways of doing this, and I’m concerned that the algorithm might struggle with the large real-world networks I’ll be applying it to, but I’m proud of the pretty output for now:

library(tidygraph)  # create_ring()
library(igraph)     # V(), induced_subgraph(), diameter()

# graphdropstats() accepts a graph object and the number of cases to drop,
# drops ndrop cases (vertices) chosen uniformly at random,
# then returns a statistic on the subgraph - in this case, diameter.
# V(graph) gives the vertex sequence of the graph;
# vcount(graph) gives the number of vertices (equivalently, length(V(graph)))

graphdropstats <- function(graph, ndrop) {
 keepnodes <- V(graph)                            # vertex sequence of graph
 droplist <- sample(seq_along(keepnodes), ndrop)  # positions in keepnodes to drop
 samplegraph <- induced_subgraph(graph, keepnodes[-droplist])
 diameter(samplegraph)
}
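As a quick sanity check (a standalone sketch, with the function restated so the snippet runs on its own; `make_ring` is igraph’s own ring constructor): dropping two nodes from a ten-node ring leaves one or two path segments, so the diameter should land somewhere between 3 and 7:

```r
library(igraph)

# restated here so the snippet runs standalone
graphdropstats <- function(graph, ndrop) {
 keepnodes <- V(graph)                            # vertex sequence of graph
 droplist <- sample(seq_along(keepnodes), ndrop)  # positions in keepnodes to drop
 diameter(induced_subgraph(graph, keepnodes[-droplist]))
}

set.seed(1)
g <- make_ring(10)    # intact diameter is 5
graphdropstats(g, 2)  # between 3 and 7, depending on which nodes drop
```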

# generate graph for testing
graph1 <- create_ring(1000)

## For each value of ndrop from 1 to ndropstop, run nreps replications,
## dropping ndrop nodes at random and saving the diameter of the subgraph
nreps <- 100
ndropstop <- 100
allresults <- matrix(0, nrow = ndropstop, ncol = nreps)

for (ndrop in 1:ndropstop) {
 for (i in 1:nreps) {
  allresults[ndrop, i] <- graphdropstats(graph1, ndrop)
 }
}

matplot(allresults, type = "p", pch = 15, col = "gray70",
        xlab = "N vertices dropped at random", ylab = "Network diameter")
lines(1:ndropstop, rowMeans(allresults), col = "red", lwd = 2)

#Edit 27/3/2018: bugfix

This gives us this plot:

… which is pretty much what I’m looking for. It shows, as expected, that ring networks are highly vulnerable to node dropout. Compare this to a 1000-node scale-free network:
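For the record, the comparison network can be generated with igraph’s preferential-attachment model and fed through the same loop in place of graph1 (a sketch; `sample_pa` with `power = 1` and `m = 1` gives a basic Barabási–Albert graph, and those settings are my assumptions rather than anything fixed):

```r
library(igraph)

set.seed(42)
# 1000-node scale-free (Barabási-Albert) network, undirected
graph2 <- sample_pa(1000, power = 1, m = 1, directed = FALSE)
vcount(graph2)  # 1000
```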

Fingers crossed that it’s efficient enough to run on large co-authorship networks!

 

  • R. Albert, H. Jeong, and A.-L. Barabási, “Error and attack tolerance of complex networks,” Nature, vol. 406, no. 6794, pp. 378–382, 2000. doi:10.1038/35019019

 

Transcription nirvana? Automatic transcription with R & Google Speech API

For as long as I’ve been doing qualitative analysis I’ve been looking for ways to automate transcription. When I was doing my master’s I spent more time (fruitlessly) looking for technical solutions than actually doing transcription. Speech recognition has come a long way since then; perhaps it’s time to try again?

I came across a blog post recently that suggested it’s becoming possible using the Google Speech API. This is the same deep-learning model that powers Android speech recognition, so it seems promising.

After setting up a GCloud account (currently with $300 of free credit; not sure how long that will last), installing the R library and running some tests is simple:

# install the package; run the first time, or to update the package
#devtools::install_github("ropensci/googleLanguageR")
library(googleLanguageR)

Once you’ve authorized with GCloud (a single line of code), the transcription itself requires a single command:

gl_speech("path to audio clip")

I tested it with a really challenging task: a 15-second clip of the Fermanagh Rose from the 2017 Rose of Tralee:

Then run the transcription:

audioclip <- "<<path to audio file>>"
testresult <- gl_speech(audioclip,
                        encoding = "FLAC",
                        sampleRateHertz = 22050,
                        languageCode = "en-IE",
                        maxAlternatives = 2L,
                        profanityFilter = FALSE,
                        speechContexts = NULL,
                        asynch = FALSE)
testresult

Which spat out:

 startTime endTime word
1 0s 1.500s things
2 1.500s 1.600s are
3 1.600s 2.600s boyfriend
4 2.600s 2.700s and
5 2.700s 3.200s see
6 3.200s 3.600s uncle
7 3.600s 7.100s supposed
8 7.100s 7.300s to
9 7.300s 7.400s be
10 7.400s 7.500s on
11 7.500s 12.200s something
12 12.200s 12.700s instead
13 12.700s 13s so
14 13s 14.600s Big
15 14.600s 14.900s Brother
16 14.900s 15.400s big
17 15.400s 15.800s buzz
18 15.800s 16.300s around
19 16.300s 17.300s Broad
20 17.300s 17.600s range
21 17.600s 17.900s at
22 17.900s 24.300s Loughborough
23 24.300s 24.700s bank
24 24.700s 25.100s whereabouts
25 25.100s 25.100s in
26 25.100s 25.600s Fermanagh
27 25.600s 27.700s between
28 27.700s 28.300s Fermanagh
29 28.300s 28.800s Cavan
30 28.800s 29.700s and
31 29.700s 29.800s I
32 29.800s 30.100s live
33 30.100s 30.400s action
34 30.400s 30.600s the
35 30.600s 30.900s road
36 30.900s 31.500s on
37 31.500s 31.800s for
38 31.800s 32s the
39 32s 32.200s Marble
40 32.200s 32.300s Arch
41 32.300s 32.400s Caves
42 32.400s 33.800s and
43 33.800s 34.400s popular
44 34.400s 34.600s culture
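To turn that word-level table into a plain-text draft for hand-editing, the word column can simply be collapsed (a base-R sketch on a stand-in data frame shaped like the output above; the same one-liner applies to the word table gl_speech returns):

```r
# stand-in for the first few rows of the output above
words <- data.frame(
  startTime = c("0s", "1.500s", "1.600s"),
  endTime   = c("1.500s", "1.600s", "2.600s"),
  word      = c("things", "are", "boyfriend"),
  stringsAsFactors = FALSE
)

# collapse the word column into a single editable string
draft <- paste(words$word, collapse = " ")
draft  # "things are boyfriend"
```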

Honestly, that’s not bad, although not quite usable. It’s certainly a good base to start transcribing from. I was not expecting it to deal so well with fast speech and regional dialects. Perhaps transcription nirvana will arrive soon; it’s not quite here yet, but it’s quite astonishing that such powerful language processing is so easily accomplished.

Managing my publication page on WordPress with Papercite

This is a very exciting find: a way to automatically generate a publications page on a WordPress blog from a bibtex file.

I’ve used Jabref to manage my own publication record for years now. Papercite pulls the most recent version of the Jabref database (a bibtex file) via a Dropbox link and automatically generates my publication page (see it in action here). Here’s the shortcode in the WordPress page that does the work:

{bibtex highlight="Michael Quayle|M. Quayle|Mike Quayle|Quayle|Quayle M." template=av-bibtex format=APA show_links=1 process_titles=1 group=year group_order=desc file=https://www.dropbox.com/s/2ol9lo2rh52bo6c/1.MQPublications.bib?dl=1}

(Note: I’ve replaced the square brackets with curly braces so that the publications page doesn’t render inside this post about the publications page; the curly braces above need to be square brackets for the shortcode to run.)

Now, when I update my bibtex record with new publications (which I would be doing anyway) my publications page automatically shows the most recent updates.

Fingers crossed that this continues to work when Dropbox changes its web-rendering policy in September…

 

VIAPPL symposium & workshop coming soon

I’m excited to be travelling to South Africa for the VIAPPL symposium at the first Pan-African Psychology Union Congress in Durban on the 19th of September.

We’ll follow that up with a VIAPPL researchers’ conference with collaborators from the University of KwaZulu-Natal, the University of Groningen and the University of Limerick (me).

Contact me (mike.quayle<<at>>ul.ie) for more information on these.