Attempt at network stress testing in R

I’ve been asked by reviewers to stress test two networks following Albert, Jeong & Barabási (2000). Critically, the reviewers asked for an exploration of how network diameter changed as progressively larger numbers of nodes were randomly dropped from the networks.

Although the netboot library makes it trivial to do a case-drop bootstrap on a network, it reports a limited set of network statistics and diameter is not one of them.

Here’s an attempt to run a stress test on network diameter for a small (1000 node) randomly generated ring network. I’m sure there are more efficient ways of doing this, and I’m concerned that the algorithm might struggle with the large real-world networks I’ll be applying it to, but I’m proud of the pretty output for now:


#Function graphdropstats accepts a graph object and the number of cases to drop
#drops ndrop cases (vertices), using a uniform random distribution to identify nodes to drop
#then returns a statistic on the subgraph, in this case diameter
# V(graph) gives list of nodes in graph
# vcount(graph) gives number of vertices, but more efficient to get this from length of V(graph)

library(igraph)

graphdropstats <- function(graph, ndrop) {
  keepnodes <- V(graph) #vector of vertex IDs in graph
  droplist <- sample(length(keepnodes), ndrop) #vector of positions in keepnodes to drop
  keepnodes <- keepnodes[-droplist] #vertices that survive the drop
  diameter(induced_subgraph(graph, keepnodes)) #diameter of the remaining subgraph
}

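#generate graph for testing
#(the original generation code and settings are missing; make_ring and the values below are assumptions)
graph1 <- make_ring(1000) #1000 node ring network
nreps <- 100 #replications per value of ndrop
ndropstop <- 100 #largest number of vertices to drop
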
## sampling with nreps replications, dropping ndrop nodes at random and saving statistics;
## and incrementing ndrop each time until ndropstop
## allresults has one row per value of ndrop and one column per replication
allresults <- matrix(NA_real_, nrow = ndropstop, ncol = nreps)

for (ndrop in 1:ndropstop) {
  for (i in 1:nreps) {
    allresults[ndrop, i] <- graphdropstats(graph1, ndrop)
  }
}

index <- 1:ndropstop #x axis: number of vertices dropped
matplot(index, allresults, type='p', pch=15, col="gray70", xlab="N vertices dropped at random", ylab="Network diameter")
lines(index, rowMeans(allresults), col = "red", lwd = 2)

#Edit 27/3/2018: bugfix

This gives us this plot:

...which is pretty much what I’m looking for. It shows, as expected, that ring networks are highly vulnerable to node dropout. Compare to a 1000 node scale-free network:
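A comparison of this kind could be set up along the following lines. This is only a rough sketch: it assumes that igraph's sample_pa (a preferential-attachment generator) is an acceptable stand-in for "scale-free", and the helper drop_diameter, the drop sizes and the number of replications are all made up for illustration rather than taken from the analysis above.

```r
library(igraph)

set.seed(42) # fixed seed so the sketch is repeatable

# Assumption: a preferential-attachment graph stands in for "scale-free"
sf_graph <- sample_pa(1000, directed = FALSE)

# Hypothetical helper: diameter after deleting ndrop vertices chosen at random
drop_diameter <- function(graph, ndrop) {
  dropped <- sample(vcount(graph), ndrop)
  diameter(delete_vertices(graph, dropped))
}

ndrops <- c(1, 10, 50, 100) # assumed drop sizes
nreps <- 20                 # assumed replications per drop size

# One row per drop size, one column per replication
sf_results <- t(sapply(ndrops, function(n) replicate(nreps, drop_diameter(sf_graph, n))))
rownames(sf_results) <- ndrops

# Mean diameter at each drop size
rowMeans(sf_results)
```

Note that diameter() by default reports the longest finite path when the graph has fallen apart into several components, so the values stay finite even after heavy dropout.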

Fingers crossed that it’s efficient enough to run on large co-authorship networks!


  • Albert, R., Jeong, H., & Barabási, A.-L. (2000). Error and attack tolerance of complex networks. Nature, 406(6794), 378–382. DOI: 10.1038/35019019


Transcription nirvana? Automatic transcription with R & Google Speech API

For as long as I’ve been doing qualitative analysis I’ve been looking for ways to automate transcription. When I was doing my masters I spent more time (fruitlessly) looking for technical solutions than actually doing transcription. Speech recognition has come a long way since then; perhaps it’s time to try again?

I came across a blog post recently that suggested it’s becoming possible using the Google Speech API. This is the same deep-learning model that powers Android speech recognition, so it seems promising.

After setting up a GCloud account (currently with $300 free credit; not sure how long that will last), installing the R libraries and running a test is simple:

#install package; run first time or to update package
install.packages("googleLanguageR")
library(googleLanguageR)

Once you’ve authorized with GCloud (a single line of code) the transcription itself requires a single command:

gl_speech("path to audio clip")

I tested it with a really challenging task: a 15-second clip of the Fermanagh Rose from the 2017 Rose of Tralee:

Then run the transcription:

audioclip <- "<<path to audio file>>"
testresult<-gl_speech(audioclip, encoding = "FLAC", sampleRateHertz = 22050, languageCode = "en-IE", maxAlternatives = 2L, profanityFilter = FALSE, speechContexts = NULL, asynch = FALSE)

Which spat out:

 startTime endTime word
1 0s 1.500s things
2 1.500s 1.600s are
3 1.600s 2.600s boyfriend
4 2.600s 2.700s and
5 2.700s 3.200s see
6 3.200s 3.600s uncle
7 3.600s 7.100s supposed
8 7.100s 7.300s to
9 7.300s 7.400s be
10 7.400s 7.500s on
11 7.500s 12.200s something
12 12.200s 12.700s instead
13 12.700s 13s so
14 13s 14.600s Big
15 14.600s 14.900s Brother
16 14.900s 15.400s big
17 15.400s 15.800s buzz
18 15.800s 16.300s around
19 16.300s 17.300s Broad
20 17.300s 17.600s range
21 17.600s 17.900s at
22 17.900s 24.300s Loughborough
23 24.300s 24.700s bank
24 24.700s 25.100s whereabouts
25 25.100s 25.100s in
26 25.100s 25.600s Fermanagh
27 25.600s 27.700s between
28 27.700s 28.300s Fermanagh
29 28.300s 28.800s Cavan
30 28.800s 29.700s and
31 29.700s 29.800s I
32 29.800s 30.100s live
33 30.100s 30.400s action
34 30.400s 30.600s the
35 30.600s 30.900s road
36 30.900s 31.500s on
37 31.500s 31.800s for
38 31.800s 32s the
39 32s 32.200s Marble
40 32.200s 32.300s Arch
41 32.300s 32.400s Caves
42 32.400s 33.800s and
43 33.800s 34.400s popular
44 34.400s 34.600s culture

Honestly, that’s not bad, although not quite usable. It’s certainly a good base to start transcribing from. I was not expecting it to deal so well with fast speech and regional dialects. Perhaps transcription nirvana will arrive soon; it’s not quite here yet, but it’s quite astonishing that such powerful language processing is so easily accomplished.
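To use output like this as a base for transcribing, the word timings can be collapsed into a draft transcript and the suspicious stretches flagged for checking by ear. The sketch below assumes the timings are available as a data frame with the three columns printed above; it uses four rows copied from that output, and the 1.5 second threshold and the extra column names (start_sec, end_sec, check) are made up for illustration.

```r
# A few rows copied from the output above
timings <- data.frame(
  startTime = c("25.100s", "25.600s", "27.700s", "28.300s"),
  endTime   = c("25.600s", "27.700s", "28.300s", "28.800s"),
  word      = c("Fermanagh", "between", "Fermanagh", "Cavan"),
  stringsAsFactors = FALSE
)

# Strip the trailing "s" so the times can be treated as numbers
timings$start_sec <- as.numeric(sub("s$", "", timings$startTime))
timings$end_sec   <- as.numeric(sub("s$", "", timings$endTime))

# Collapse the words into a draft transcript to edit by hand
draft <- paste(timings$word, collapse = " ")

# Flag words with an unusually long span (assumed threshold) as places
# where speech may have been missed
timings$check <- (timings$end_sec - timings$start_sec) > 1.5

print(draft)
print(timings[timings$check, c("startTime", "endTime", "word")])
```

Long spans attached to a single word, such as "between" here, are a cheap hint that the recogniser skipped over something and the audio is worth replaying at that point.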