Blog Posts Tagged with: Nathaniel Beck

1. Replication redux and Facebook data

Introduction from Michael Alvarez, co-editor of Political Analysis

Recently I asked Nathaniel Beck to write about his experiences with research replication. His essay, published on 24 August 2014 on the OUPblog, concluded with a brief discussion of his attempt to obtain replication data from the authors of a recent study published in PNAS on an experiment run on Facebook regarding social contagion. Since then, the story of Neal's efforts to obtain this replication material has taken a few interesting twists and turns, so I asked Neal to provide an update; the lessons from his efforts to get the replication data from this PNAS study are useful for the continued discussion of research transparency in the social sciences.

Replication redux, by Nathaniel Beck

When I last wrote about replication for the OUPblog in August ("Research Replication in Social Science"), there were one smallish open question (about my own work) and one biggish one: whether I would ever see the replication file for Kramer et al., "Experimental evidence of massive-scale emotional contagion through social networks," which was "in the mail." The Facebook story is interesting, so I start with that.

After not hearing from Adam Kramer of Facebook, even after contacting PNAS, I persisted with both the editor of PNAS (Inder Verma, who was most kind) and with the NAS through "well connected" friends. (Getting replication data should not depend on knowing NAS members!) I was finally contacted by Adam Kramer, who offered that I could come out to Palo Alto to look at the replication data. Since Facebook did not offer to fly me out, I said no. I was then offered a chance to look at the replication files in the Facebook office four blocks from NYU, so I accepted. Let me stress that all dealings with Adam Kramer were highly cordial, and I assume that the delays were due to Facebook higher-ups who were dealing with the human subjects firestorm related to the Kramer piece.

When I got to the Facebook office I was asked to sign a standard non-disclosure agreement, which I declined. To my surprise this was not a problem, the only consequence being that a security officer would have had to escort me to the bathroom. I was then put in a room with a secure Facebook notebook loaded with the data and RStudio; Adam Kramer was there to answer questions, and I was also joined by a security person and an external relations person. All were quite pleasant, and the security person and I could even discuss the disastrous season being suffered by Liverpool.

I was given a replication file, a data frame with approximately 700,000 rows (one for each respondent) and 7 columns: the number of positive and negative words used by each respondent, the total word count of each respondent, percentages based on these numbers, the experimental condition, and a variable marking respondents omitted in producing the tables. This is exactly the data frame that would have been put in an archive, since it contained all the data needed to replicate the article. I was also given the R code that produced every item in the article. I was allowed to do anything I wanted with the data, and I could copy the results into a file. That file was then checked by Facebook people, and about two weeks later I received the entire file I had created. All good, or at least as good as it is going to get.
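To make that description concrete, here is a minimal mock-up in R of a data frame with that structure. Every column name and generating process is invented for illustration; only the general shape (one row per respondent; counts, percentages, condition, and an omission flag) comes from the description above.

    # Mock-up of the described data frame: one row per respondent.
    # All names and values are invented; the real frame had ~700,000 rows.
    set.seed(1)
    n <- 1000
    df <- data.frame(
      pos_words   = rpois(n, 5),        # count of positive words posted
      neg_words   = rpois(n, 3),        # count of negative words posted
      total_words = rpois(n, 100) + 1,  # total word count per respondent
      condition   = factor(sample(c("control", "treatment"), n, replace = TRUE))
    )
    df$pct_pos <- 100 * df$pos_words / df$total_words  # percentage columns
    df$pct_neg <- 100 * df$neg_words / df$total_words
    df$omit    <- df$total_words < 5   # flag used to drop some respondents
    str(df)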

Intel team inside Facebook data center. Intel Free Press. CC BY 2.0 via Wikimedia Commons.

The data frame I played with was based on aggregating user posts, so each user had one row of data regardless of the number of posts (and the data frame did not contain anything more than the total number of words posted). I can understand why Facebook did not want to give me the data frame, innocuous as it seemed; those who specialize in re-identifying de-identified data and reverse engineering code are quite good these days, and I can surely understand Facebook's reluctance to have this raw data out there. And I understand why they could not give me all the actual raw data, which included how feeds were changed and so forth; this is the secret sauce that they would not like reverse engineered.

I got what I wanted. I could see their code, play with density plots to get a sense of the words used, and change the number of extreme points dropped, and I could have moved to a negative binomial instead of a Poisson. Satisfied, I left after about an hour; there are only so many things one can do with one experiment on two outcomes. I felt bad that Adam Kramer had to fly to New York, but I guess this is not so horrible. Had the data been more complicated I might have felt that I could not do everything I wanted, and running a replication with three other people in a room is not ideal (especially given my typing!).

My belief is that PNAS and the authors could simply have had a different replication footnote. It would have said that the code used (about 5 lines of R, basically a call to a Poisson regression using GLM, of the kind sketched below) is available at a dataverse. In addition, they could have noted that the GLM call used the data frame I described, along with the summary statistics for that data frame. Readers could then see what was done, and I can see no reason for such a procedure to bother Facebook (though I do not speak for them). I also note that a clear statement on a dataverse would have obviated the need for some of this discussion. Since bytes are cheap, the dataverse could also contain whatever policy statement Facebook has on replication data. This (IMHO) is much better than the "contact the authors for replication data" footnote that was published. It is obviously up to individual editors whether this is enough to satisfy replication standards, but at least it is better than the status quo.
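Here is a hedged sketch of what such a five-line analysis might look like, run against the mock data frame above. This is emphatically not Facebook's actual code; the variable names are the invented ones from the mock-up, and the negative-binomial variant mentioned earlier is included for comparison.

    # Sketch only, not the authors' code: Poisson regression of a word count
    # on experimental condition, with total words as an exposure offset.
    fit_pois <- glm(pos_words ~ condition + offset(log(total_words)),
                    family = poisson, data = df[!df$omit, ])
    summary(fit_pois)

    # The negative-binomial alternative, for overdispersed counts.
    library(MASS)
    fit_nb <- glm.nb(pos_words ~ condition + offset(log(total_words)),
                     data = df[!df$omit, ])
    summary(fit_nb)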

What if I didn't work four blocks from Astor Place? Fortunately I did not have to confront this horror. How many other offices does Facebook have? Would Adam Kramer have flown to Peoria? I batted this around, but I did most of the batting and the Facebook people mostly offered no comment. So someone else will have to test this issue. But for me, the procedure worked. Obviously we are all analyzing lots more proprietary data, and (IMHO) this is a good thing. So Facebook et al., and journal editors and societies, have many details to work out. But, based on this one experience, it can be done. So I close with thanks to Adam Kramer (but do remind him that I have had auto-responders to email for quite a while now).

On the more trivial issue of my own dataverse, I am happy to report that almost everything that was once on a private ftp site is now on my Harvard dataverse. Some of this was already up because of various co-authors who always cared about replication. And for the stuff that was not up, I was lucky to have a co-author like Jonathan Katz, who has many skills I do not possess (and is a stickler for RCS and the like, which beats my "I have a few TB and the stuff is probably hidden there somewhere"). So everything is now on the dataverse, except for one data set that we were given for our 1995 APSR piece (and which Katz never had). Interestingly, I checked the original authors' web sites (one no longer exists, one did not go back nearly that far) and failed to make contact with either author. Twenty years is a long time! So everyone should do both themselves and all of us a favor, and build the appropriate dataverse files contemporaneously with the work. Editors will increasingly demand this, but even without that coercion, it is just good practice. I was shocked (shocked) at how bad my own practice was.

Heading image: Wikimedia Foundation Servers-8055 24 by Victorgrigas. CC BY-SA 3.0 via Wikimedia Commons.

The post Replication redux and Facebook data appeared first on OUPblog.

2. Research replication in social science: reflections from Nathaniel Beck

Introduction from Michael Alvarez, co-editor of Political Analysis:

Questions about data access, research transparency and study replication have recently become heated in the social sciences. Professional societies and research journals have been scrambling to respond; for example, the American Political Science Association established the Data Access and Research Transparency committee to study these issues and to issue guidelines and recommendations for political science. At Political Analysis, the journal that I co-edit with Jonathan N. Katz, we require that all of the papers we publish provide replication data, typically before we send the paper to production. These replication materials get archived at the journal’s Dataverse, which provides permanent and easy access to these materials. Currently we have over 200 sets of replication materials archived there (more arriving weekly), and our Dataverse has seen more than 13,000 downloads of replication materials.

Due to the interest in replication, data access, and research transparency in political science and other social sciences, I’ve asked a number of methodologists who have been front-and-center in political science with respect to these issues to provide their thoughts and comments about what we do in political science, how well it has worked so far, and what the future might hold for replication, data access, and research transparency. I’ll also be writing more about what we have done at Political Analysis.

The first of these discussions consists of reflections from Nathaniel Beck, Professor of Politics at NYU, who is primarily interested in political methodology as applied to comparative politics and international relations. Neal is a former editor of Political Analysis, chairs our journal's Advisory Board, and is now heading up the Society for Political Methodology's own committee on data access and research transparency. Neal's reflections provide some interesting perspectives on the importance of replication for his research and teaching efforts, and shed some light more generally on what professional societies and journals might consider for their policies on these issues.

Research replication in social science: reflections from Nathaniel Beck

Replication and data access have become hot topics throughout the sciences. As a former editor of Political Analysis and the chair of the Society for Political Methodology's Data Access and Research Transparency (DA-RT) committee, I have been thinking about these issues a lot lately. But here I simply want to share a few recent experiences (two happy, one at this moment less so) which have helped shape my thinking on some of these issues. I note that in none of these cases was I concerned that the authors had done anything wrong, though of course I was concerned about the sensitivity of results to key assumptions.

The first happy experience relates to an interesting paper by Meyerson, published recently in Econometrica, on the impact of having an Islamic mayor on educational outcomes in Turkey. I first heard about the piece from some students, who wanted my opinion on the methodology. Since I am teaching a new (for me) course on causality, I wanted to dive more deeply into the regression discontinuity design (RDD) as used in this article. Coincidentally, a new method for doing RDD was presented at the recent (2014) meetings of the Society for Political Methodology by Rocio Titiunik, and I wanted to see how her R code worked with interesting comparative data. All recent Econometrica articles are linked to both replication and supplementary materials on the Econometrica web site. It took perhaps 15 minutes to make sure that I could run Stata on my desktop and get the same results as in the article. So thanks to both Meyerson and Econometrica for making things so easy.
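For readers who have not run an RDD, here is a minimal sketch in R, assuming code along the lines of the rdrobust package that Titiunik co-authored with Calonico and Cattaneo; the simulated running variable and outcome are invented stand-ins, not Meyerson's actual variables.

    # Minimal RDD sketch on simulated data (not Meyerson's variables).
    library(rdrobust)

    set.seed(42)
    margin  <- runif(2000, -1, 1)   # running variable: Islamist party win margin
    outcome <- 1 + margin + 0.5 * (margin > 0) + rnorm(2000)  # toy outcome, jump at 0

    est <- rdrobust(y = outcome, x = margin, c = 0)  # local-polynomial RD estimate
    summary(est)
    rdplot(y = outcome, x = margin, c = 0)           # binned scatter around the cutoff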

I gained from this process, getting a much better feel for real RDD data analysis, so I can say more to my students than "the math is correct." My students gain by seeing a first-rate application that interests them (not a toy, and not yet another piece on American elections). And Meyerson gains a few readers who would not normally peruse Econometrica, and perhaps more cites in the ethnicity literature. And thanks to Titiunik for making her R code easily accessible.

The second happy experience was similar to the first, but also opened my eyes to my own inferior practice. At the same Society meetings, I was the discussant on a paper by Grant and Lebo on using fractional integration methods. I had not thought about such methods in a very long time, and believed (based on intuition and no evidence to the contrary) that using fractional integration methods led to no changes in substantive findings. But clearly one should base arguments on evidence and not intuition. I decided to compare the results of a fractional integration study by Box-Steffensmeier and Smith with the results of a simpler analysis. Their piece had a footnote saying the data were available through the ICPSR (excellent by the standards of 1998). Alas, on going to the ICPSR web site I could not find the data (noting that lots of things have happened since 1998, and who knows if my search was adequate). Fortunately I know Jan, so I wrote to her, and she kindly replied that the data were on her Dataverse at Harvard. A minute later I had the data and was ready to see if my intuitions might indeed be supported by evidence.
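Such a comparison might look roughly like the following sketch, which uses the fracdiff package on a simulated long-memory series standing in for the real data: fit a fractionally integrated (ARFIMA) model, fit a simpler short-memory model, and ask whether the substantive story changes.

    # Sketch of the fractional-integration vs. simpler-model comparison,
    # on simulated data standing in for the real series.
    library(fracdiff)

    set.seed(7)
    y <- fracdiff.sim(n = 500, d = 0.3)$series  # long-memory series with d = 0.3

    fit_fi   <- fracdiff(y, nar = 1)            # ARFIMA: estimates d with an AR(1)
    fit_arma <- arima(y, order = c(1, 0, 1))    # short-memory ARMA(1,1) benchmark

    summary(fit_fi)  # is the estimated d large enough to change conclusions?
    fit_arma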

Typing on Keyboard – Male Hand by Dave Dugdale. CC BY-SA 2.0 via Flickr.

This experience made me think: could someone find my replication data sets? For as long as I can remember (at least back to 1995), I have always posted my replication data sets somewhere. Articles written up to 2003 sent readers to my public ftp site at UCSD. But UCSD has changed the name and file structure of that server several times since 2003, and for some reason they did not feel obligated to keep my public ftp site going (and I was not worried enough about replication to think of moving that ftp site to NYU). Fortunately I can usually find the replication files if anyone writes me, and if I cannot, my various more careful co-authors can find the data. But I am sure that I am not the only person to have replication data on obsolete servers. Thankfully Political Analysis has required me to put my data on the Political Analysis Dataverse, so I no longer have to remember to be a good citizen. And my resolution is to get as many replication data sets from old pieces as I can onto my own Harvard Dataverse. I will feel less hypocritical once that is done. It would be very nice if other authors emulated Jan!

The possibly less happy outcome relates to the recent article in PNAS on a Facebook experiment on social contagion. The authors said, in a footnote, that replication data were available by writing to them. I wrote twice, giving them a full month, but heard nothing. I then wrote to the editor of PNAS, who informed me that the lead author had been on vacation and was overwhelmed with responses to the article. I am promised that the check is in the mail.

What editor wants to be bothered by fielding inquiries about replication data sets? What author wants to worry about going on vacation (and forgetting to set a vacation message)? How much simpler the world would have been for the authors, the editor, and me if PNAS simply followed the good practice of Political Analysis, the American Journal of Political Science, the Quarterly Journal of Political Science, Econometrica, and (if rumors are correct) soon the American Political Science Review: demanding that authors post all replication materials, either on the journal web site or the journal Dataverse, before an article is actually published. Why doesn't every journal do this?

A distant second best is to require authors to post their replication materials on their personal websites. As we have seen from my experience, this often leads to lost or non-working URLs. While the simple solution here is the Dataverse, at a minimum authors should surely provide a standard Digital Object Identifier (DOI), which should persist even as machine names change. But the Dataverse solution does this and so much more, so it seems odd in this day and age for any journal not to use it. And we can all be good citizens and put our own pre-replication-standard datasets on our own Dataverses. All of this is as easy as (and maybe easier than) maintaining private data web pages, and one can rest easy that one's data will be available until either Harvard goes out of business or the sun burns out.
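As an illustration of how painless DOI-based retrieval can be, here is a hedged sketch using the dataverse R client; the server setting follows the package's documented convention, but the DOI and file name are placeholders, not a real dataset.

    # Sketch: pull a replication file by its persistent DOI via the
    # dataverse R client. The DOI and file name below are placeholders.
    library(dataverse)
    Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

    dat <- get_dataframe_by_name(
      filename = "replication.tab",         # hypothetical file name
      dataset  = "doi:10.7910/DVN/XXXXXX",  # hypothetical dataset DOI
      .f       = readr::read_tsv            # reader for the ingested tabular file
    )
    head(dat)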

Featured image: BalticServers data center by Fleshas. CC BY-SA 3.0 via Wikimedia Commons.

The post Research replication in social science: reflections from Nathaniel Beck appeared first on OUPblog.
