Battle of the Bots: A Personal Experience with Emerging Issues in Psychological Research and Recommendations for Future Research

Imagine my shock when, less than 24 hours after sharing my research study on social media, over 2,000 responses had already been recorded. The study targeted a very specific population (sexual minority Latinx youth), so I was optimistically expecting as many as 100 participants to complete my study over the course of several weeks. Needless to say, I was beyond surprised to see such a tremendous number of responses so soon after sending out the call for participants. Immediately, I knew something had gone awry. In a panic, I called Qualtrics to see if they had any idea what was going on. They did not. My most immediate solution was to close the survey and shut down data collection.

I spent the following days thinking about what went wrong. I began sifting through the responses, and almost immediately it became apparent that most of these respondents were not true participants. Anecdotally, I had heard that a colleague had encountered issues with “bots” taking over his survey. When first designing my survey, I had expressed these concerns to the Qualtrics support team members who were helping me put it together, and they had mentioned a few solutions (see below). Since I had not heard of many other researchers having issues with bots, I assumed that I had taken the necessary precautions to launch the survey. That assumption was incorrect. In an effort to turn lemons into lemonade, I am sharing my experience in the hope that others can use the strategies I have found helpful and have a less frustrating experience.

Bots Defined

What are bots? There is no easy way to answer this question. For the purposes of this article, I use the term bot to refer to responses generated by a computer program coded to take a survey in an automated and repetitive way. This definition is based on accounts by other researchers, as well as patterns of responses I found in my data. For example, I noticed that the first time some of these identified bots took the survey, they would take longer to complete it (sometimes 20-40 minutes); I assume this is because that time was spent writing the program. On subsequent attempts, the same bots would complete the survey in anywhere from seconds to just a few minutes. However, this was not always the case; some responses that appeared to be generated by the same bot (based on patterns of responses) took approximately the same amount of time on every attempt.

The Motivation Behind Bots

So, who makes bots? Where do they come from? What do they want? These are all great questions, and ones I asked when I initially encountered this issue. I will start with more background on my particular study to help answer them.

Incentives. As most researchers can relate, I had a limited budget allotted for participant incentives. The participants I was recruiting hold multiple marginalized identities (ethnic and sexual minorities, minors); therefore, it was important to me that they receive some form of compensation for their time. Due to California regulations, my university’s Office of Human Subjects informed me that I could not do a gift card raffle with minors because it would be considered gambling. I adjusted my budget and strategy and developed a plan to compensate each participant with a $5 gift card. While to most this is a minimal amount, it can quickly (and I do mean quickly) add up if you are able to write a program that completes the survey hundreds of times over; this is where the bots come into play. If there is money to be made, chances are someone has tried to find a way to make it faster and more lucratively than ever before (thank you, capitalism).

The Rise of the Bots

There has been a documented surge of bots on Amazon’s Mechanical Turk (MTurk) starting around February 2018. Kennedy and colleagues (2018) found that these bots were real people in Venezuela who, in the face of economic crisis and desperation, had few other options to meet their needs and began taking surveys to receive gift-card incentives and exchange them for cash. However, this is not always the case. Using geoIP location data and other tools (more on that below), I identified bots from all over the world with varied patterns of responses, which suggests that different users in different parts of the world were creating them.

I cannot emphasize enough just how inventive and sophisticated these bots are. Every time I thought I had developed a solution, I would relaunch my survey only to find that they had taken over once again. Even after following Qualtrics’ suggestions to password-protect the survey and include a CAPTCHA, I was still inundated with responses. Each time there were fewer and fewer of them, but it still gave me the experiential terror of playing real-life whack-a-mole with bots.

Tools for Battling Bots

So, what can you do to avoid the challenging experience I am describing here? My recommendations focus on Qualtrics, since that is the platform my university uses, but many of these solutions apply to other platforms as well. Here are the main recommendations I have found helpful in the battle against bots:

  • Plan ahead. Create a quality check protocol that will help you determine which responses are genuine and which are fraudulent. Establishing that plan ahead of time will guide your decisions along the way.
  • Include a CAPTCHA in your study. While some bots can bypass CAPTCHAs, this still blocks many of them from accessing your study.
  • Select Prevent Ballot Box Stuffing. This feature can be found under Survey Options and prevents participants from answering your survey more than once. Although there is a way around it (using different IP addresses), it makes cheating more difficult.
  • Add password protection to the survey. Include the password in recruitment materials so that participants are able to access the survey. This significantly reduced my number of bot responses, from thousands down to hundreds. Password protection can be found under Survey Options.
  • Set quotas. Quotas can be helpful if you monitor your survey responses closely. I set a quota of a maximum of 100 participants so that, at the very least, I would not be inundated with an overwhelming number of responses before I had the chance to sift through them. Quotas allow you to temporarily stop data collection after a certain number of responses and evaluate their quality before deciding to continue.
  • Add text entry questions. Free-response items help catch bots after they have taken the survey. Prompts like “tell me about your favorite animal,” especially ones that require complete sentences rather than one-word answers, are more complicated for bots to answer convincingly. These items will also help you find patterns in the data; this was the primary way I was able to filter out the bots in my study, by finding patterns in capitalization, punctuation, and syntax (see the first sketch after this list).
  • Use quality check questions. Survey items such as “select 5 on the sliding scale below” can be helpful and are another way to screen out bots.
  • Incorporate validity check questions. For example, asking both “how old are you?” and “what year were you born?” will help you screen for potential bots. A respondent who claims to be 17 years old and born in 1990 is likely to be a bot (see the second sketch after this list).
  • Use the Metadata function. Qualtrics offers the option of collecting metadata: add a new question and, for the question type, select “Meta Info Question.” This collects the browser type and version, as well as other information such as the respondent’s operating system. If you become suspicious of a particular response, you can use this tool to help verify that the person is not a bot.
  • Monitor completion time. Discard any survey response completed in less time than would be reasonable for your survey. This is done by adding a filter to the data, selecting survey metadata, and then duration (in seconds). To set my threshold, I asked a few friends of similar age and background to the participants I was hoping to recruit to complete the survey; their times ranged from 12 to 15 minutes. After opening the survey for data collection, I had responses that took seconds to just a couple of minutes to complete, so it was reasonable to assume that responses completed in under 7 minutes were likely bots (see the third sketch after this list).
  • Create a separate survey for incentives. This will help you verify responses and compensate only true participants. If you choose to do this, be sure to include settings that only allow participants coming from the study to access the incentives survey; this can be done under Survey Options by selecting HTTP Referrer Verification.
  • Use geoIP location to detect bots. If you look at any one particular survey response, towards the end of the page you will see location data: Qualtrics provides a location estimate based on the respondent’s IP address. You can use this to verify that a participant is actually located within your inclusion criteria, if you have such criteria. Alternatively, I used it to verify responses to survey questions such as “what state do you live in?” See Bai (2018) for more information on using location to identify low-quality responses on MTurk, and Burleigh, Kennedy, and Clifford (2018) for how to screen out international respondents in Qualtrics.
  • Plan for bots in your informed consent form. Include language in your IRB protocols and consent forms about bots, what you will do to address them, and a disclosure that participants who are found to be bots will not be compensated. Here is the language I included in my IRB protocol and consent forms: “Participant IP addresses will be collected in Qualtrics for data quality purposes. IP addresses will be discarded at the conclusion of the study. If responses to the survey are found to be fraudulent (e.g., bots, en masse responses, etc.), multiple responses are made by the same user, or an individual is found to be purposely manipulating the survey (e.g., participating in a survey for which they are not eligible, giving dishonest responses), payment will be withheld.”
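
To make the text-pattern screening concrete, here is a minimal Python sketch of the kind of duplicate-answer check described above. It assumes the Qualtrics export has been saved as responses.csv with a single header row, and the column name favorite_animal is hypothetical; substitute your own free-response item.

```python
import csv
import re
from collections import Counter

# Hypothetical column name; substitute the free-response item from your survey.
TEXT_COLUMN = "favorite_animal"

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace so trivially
    varied bot answers (e.g., 'Dog!', 'dog', ' DOG ') match each other."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

with open("responses.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Count how often each normalized free-text answer appears across responses.
counts = Counter(normalize(row[TEXT_COLUMN]) for row in rows)

# Identical free text shared across several "participants" is a strong bot signal.
for row in rows:
    if counts[normalize(row[TEXT_COLUMN])] >= 3:
        print(f"Suspicious duplicate answer: {row[TEXT_COLUMN]!r}")
```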
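The age/birth-year validity check can be scripted the same way. This sketch again uses hypothetical column names (age, birth_year) and allows a one-year tolerance for respondents whose birthday had not yet occurred in the survey year.

```python
import csv

SURVEY_YEAR = 2020  # the year the survey was fielded
# Hypothetical column names; match them to your own survey items.
AGE_COLUMN = "age"
BIRTH_YEAR_COLUMN = "birth_year"

with open("responses.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        try:
            age = int(row[AGE_COLUMN])
            birth_year = int(row[BIRTH_YEAR_COLUMN])
        except ValueError:
            print(f"Non-numeric answer, review manually: {row}")
            continue
        # Allow a one-year tolerance for respondents whose birthday
        # had not yet occurred in the survey year.
        if abs((SURVEY_YEAR - birth_year) - age) > 1:
            print(f"Inconsistent age/birth year: age={age}, born={birth_year}")
```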
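Finally, the completion-time screen can be applied outside of Qualtrics as well. The duration column label below is the one Qualtrics typically uses in CSV exports, but confirm it against your own file; the 7-minute cutoff reflects my pilot testers’ times and should be recalibrated for your own survey.

```python
import csv

# Qualtrics CSV exports typically label this column "Duration (in seconds)";
# confirm the exact label in your own export before running this.
DURATION_COLUMN = "Duration (in seconds)"
MIN_SECONDS = 7 * 60  # anything under 7 minutes was suspect in my study

with open("responses.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Split responses into plausible and implausibly fast completions.
kept = [r for r in rows if int(r[DURATION_COLUMN]) >= MIN_SECONDS]
flagged = [r for r in rows if int(r[DURATION_COLUMN]) < MIN_SECONDS]

print(f"Kept {len(kept)} responses; flagged {len(flagged)} as too fast to be genuine.")
```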

Bots and the Future of Research

Online platforms have come to play an increasingly important role in counseling psychology studies, particularly as a promising avenue for conducting psychological research with hard-to-reach populations, such as LGBTQ+ people. Platforms such as Qualtrics and Amazon’s Mechanical Turk (MTurk) have facilitated the inclusion and recruitment of these hard-to-reach populations. As new avenues for research arise, so will new challenges.

Bots have become an emerging issue in our field, posing a real threat to data quality and ultimately endangering our research studies. The solutions provided in this document are in no way definitive. Screening for bots is a time-consuming process, and we need more strategies to stop them altogether.

It is likely that as technology changes, bots will find new ways to compromise our studies. Given the scope of the problem and the speed of technological change, our response must be simultaneously rapid, comprehensive, and continually evolving.

The battle of the bots is one that can only be won if we work collaboratively and quickly. The development of institutional support, such as a taskforce or working group dedicated to addressing the issue of bots, could provide tremendous benefits to our field. This issue could also be addressed within APA with the support of relevant entities, such as the Science Directorate. Resources outside of APA, such as the National Science Foundation, may also be beneficial.

 

Sam del Castillo is a fifth-year doctoral candidate in the Counseling, Clinical, and School Psychology Department at the University of California.



Posted on: November 5th, 2020