The list below contains some of the electronic resources used by TREC members in empirical and experimental research in translation. If you want to know more about their experiences, visit our Publications section.
Subject profiling is a method used to analyze interindividual differences, that is to say variability in terms of research participants' profiles. Research participants have different characteristics in terms of subject matter knowledge, interests, personality, and strategic processing. Their behaviour is likely to be affected by cognitive and noncognitive factors which configure differently within individuals.
Subject profiling is often undertaken by means of tests or questionnaires intended to assess various aspects of a person. For example, these tests can be used to measure intelligence, or outline core personality traits. Many tests, especially those designed to identify personality traits, ask the test-taker to answer multiple-choice questions in relation to typical behaviours. Other types of tests, such as those intended to measure intelligence, ask the test-taker to answer test questions correctly.
In translation and interpreting process research, there is growing interest in how research participants’ individual differences influence task performance. Distinct and informative participant profiles can be obtained with a number of different aptitude and personality instruments. The tests listed below are just some of the available tests that can be used to characterize participants’ mental abilities, personality traits and language skills.
- Dialang tests
- Myers-Briggs Type Indicator (MBTI)
- Test of English as a Foreign Language (TOEFL)
- Wechsler Adult Intelligence Scale (WAIS)
Quantitative methods collect and analyse quantitative data on variables, that is to say, on characteristics that can have different values. They are focused on quantity, and aim to predict, control, describe, confirm and test hypotheses. A number of instruments and techniques are available to collect quantitative data. Among them, one of the most representative and commonly used is the survey, which is used to obtain, systematically, information regarding the variables bearing on a research study.
Various tools can be used for data collection within the framework of a survey, including interviews and tests, but the most widely used technique in the research community is the questionnaire. A questionnaire needs to be well structured, but it requires less involvement from the researcher during administration, especially when data are collected online.
Online questionnaires are an effective alternative to traditional paper surveys administered in person or by mail. Their advantages include the speed with which data can be collected, their cost-effectiveness, and the confidential and anonymous treatment of responses.
An eye tracker is a device which registers eye movements. The most common type is the video-based eye tracking system, which typically uses infrared light to create a corneal reflection that is recorded by a camera. The eye movement data allow the experimenter to calculate fixation count and duration, saccades, regressions and pupillary movements. For translation experiments, remote eye tracking systems are preferred as they allow the participant free head movement and visual access to the keyboard. Depending on the equipment, remote eye tracking systems register eye movements with a frequency of between 120 Hz and 1000 Hz, and they are accurate to 0.5 of a degree, which corresponds to up to 1 cm of inaccuracy.
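The relation between angular accuracy and on-screen error depends on how far the participant sits from the monitor. The following is a minimal sketch of that conversion using the standard visual-angle formula; the viewing distances are assumptions chosen for illustration, not values reported for any particular eye tracker.

```python
import math

def angular_error_to_cm(error_deg: float, viewing_distance_cm: float) -> float:
    """On-screen extent (cm) subtended by a visual angle at a given viewing distance."""
    return 2 * viewing_distance_cm * math.tan(math.radians(error_deg) / 2)

# Assumed viewing distances; 0.5 degrees is the accuracy figure quoted above.
for distance_cm in (60, 80, 100):
    error_cm = angular_error_to_cm(0.5, distance_cm)
    print(f"{distance_cm} cm from the screen: about {error_cm:.2f} cm of error")
```

The error grows with distance: it is roughly half a centimetre at 60 cm, which matters when deciding how large text and areas of interest need to be in an experiment.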
MONITORING PHYSIOLOGICAL REACTIONS
This category includes Electroencephalography (EEG) and Electrocardiography (ECG).
The investigation of coherent electric activity in the brain has opened up new possibilities for advancing our knowledge of human cognition. Electroencephalography (EEG) is the recording of electrical activity along the scalp. Different EEG techniques are used to investigate cognitive processing:
EEG Coherence is the oscillatory coupling between two channels, occurring with several oscillations, in a narrow frequency band, for a given period of time. EEG Coherence measures electric signal correlation between regions and over trials. Two types of coherent activity can be detected in oscillatory signals picked up from two brain or scalp areas: Synchrony and Asynchrony. EEG Coherence is used to quantify the existence of coherent brain activity differences between conditions.
Event-Related Potentials (ERPs) are very small voltages generated in brain structures in response to stimuli, and are thus taken to be the direct result of cognitive processes. ERPs are time-locked, i.e. a point in time is marked when a stimulus occurs, and averaged over a large number of trials in order to cancel out noise in the data. In the domain of language processing, cognitive ERPs such as the N400 and P600 are related to information processing. The N400, a peak in electrical activity measured about 400 milliseconds (ms) after the stimulus, is usually associated with how expected a given word is as the ending of a sentence. The P600, generated about 600 ms after the stimulus, is typically elicited by grammatical errors and syntactic anomalies.
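The averaging step can be illustrated with simulated data. The sketch below is not taken from any EEG package: the sampling rate, the shape and size of the component, the noise level and the number of trials are all assumptions made for the example. It shows how a small stimulus-locked deflection, invisible in a single trial, emerges once many trials are averaged point by point.

```python
import math
import random

SAMPLING_RATE_HZ = 100   # assumed: one sample every 10 ms
EPOCH_SAMPLES = 100      # each epoch covers 0-990 ms after stimulus onset

def true_erp(t_ms):
    """Hypothetical N400-like component: a negative peak of -2 microvolts at 400 ms."""
    return -2.0 * math.exp(-((t_ms - 400) ** 2) / (2 * 50 ** 2))

def simulate_epoch(rng, noise_sd=5.0):
    """One stimulus-locked trial: the small ERP buried in much larger random noise."""
    step_ms = 1000 / SAMPLING_RATE_HZ
    return [true_erp(i * step_ms) + rng.gauss(0, noise_sd) for i in range(EPOCH_SAMPLES)]

def average_epochs(epochs):
    """Point-by-point mean across trials; uncorrelated noise shrinks as trials are added."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

rng = random.Random(0)
epochs = [simulate_epoch(rng) for _ in range(1000)]
erp = average_epochs(epochs)

# Locate the most negative point of the averaged waveform.
peak_index = min(range(EPOCH_SAMPLES), key=lambda i: erp[i])
peak_ms = peak_index * 1000 / SAMPLING_RATE_HZ
print(f"Most negative point of the averaged waveform: {peak_ms:.0f} ms")
```

Because the noise is independent from trial to trial while the component is locked to the stimulus, the noise in the average falls roughly with the square root of the number of trials, which is why ERP studies need many trials per condition.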
An ECG is used to measure the rate and regularity of heartbeats. ECGs performed for research purposes serve to establish a connection between neurovegetative activity (level of alertness, emotion, physical involvement) and cognitive processing. Performing information processing tasks, such as sentence generation, has been found to lead to an accelerated heart rate, corresponding to a high level of vegetative activity.
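Rate and regularity are usually derived from the times of successive heartbeats. The sketch below assumes that the beat (R-peak) times have already been extracted from the ECG signal; the two short series are invented for illustration and do not come from any study.

```python
from statistics import mean, stdev

def rr_intervals(r_peak_times_s):
    """Time between successive heartbeats, in seconds."""
    return [later - earlier for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]

def heart_rate_bpm(r_peak_times_s):
    """Rate: mean number of beats per minute."""
    return 60 / mean(rr_intervals(r_peak_times_s))

def rr_variability_ms(r_peak_times_s):
    """Regularity: standard deviation of the beat-to-beat intervals, in milliseconds."""
    return stdev(rr_intervals(r_peak_times_s)) * 1000

# Invented beat times (seconds) for a resting baseline and a task condition.
rest = [0.00, 0.85, 1.70, 2.55, 3.40]
task = [0.00, 0.70, 1.38, 2.10, 2.78]

print(f"rest: {heart_rate_bpm(rest):.1f} bpm")
print(f"task: {heart_rate_bpm(task):.1f} bpm, variability {rr_variability_ms(task):.1f} ms")
```

Comparing a task condition against a resting baseline recorded from the same participant is one simple way to express the acceleration described above.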
Keystroke logging, or keylogging, is the action of recording (or logging) the keys struck on a keyboard.
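A keystroke log is essentially a list of time-stamped key events. The sketch below uses a hypothetical log structure, not the file format of any particular logging tool, and shows one common analysis: finding pauses between keystrokes. The one-second pause threshold is an assumption chosen for illustration.

```python
from typing import NamedTuple

class KeyEvent(NamedTuple):
    time_ms: int   # time elapsed since the start of the session
    key: str       # the key that was struck

def find_pauses(log, threshold_ms=1000):
    """Return (gap_ms, key_before, key_after) for each inter-key interval at or above the threshold."""
    return [
        (after.time_ms - before.time_ms, before.key, after.key)
        for before, after in zip(log, log[1:])
        if after.time_ms - before.time_ms >= threshold_ms
    ]

# Invented log of someone typing "The cat" with a hesitation after the first word.
log = [
    KeyEvent(0, "T"), KeyEvent(180, "h"), KeyEvent(350, "e"),
    KeyEvent(2400, " "), KeyEvent(2590, "c"), KeyEvent(2750, "a"), KeyEvent(2900, "t"),
]

typed_text = "".join(event.key for event in log)
print(typed_text)
print(find_pauses(log))
```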
Speech recognition (SR) is the process of converting spoken input into written text using a computer application. SR allows the user to generate text without the use of a computer keyboard, and it can potentially reduce the amount of time used to produce a text or a translation. Users typically need to train their SR system in order to improve the recognisability of the spoken input and thus the quality of the written output.
RECORDING ON-SCREEN ACTIVITY
Screen recording software captures all on-screen activity that takes place during the course of translation in video format (usually AVI or Flash). Such activity includes the use of CAT tools, online information retrieval, patterns in text generation, revisions, and overall translation workflow. Most screen recording applications also capture keystroke and mouse click data. Screen recording videos can be played in real time, or at speeds of up to 16 times the original. Additional playback features, such as pause, rewind, and fast-forward, are also standard.
A major advantage of screen recording as a research tool is its relative preservation of ecological validity. Translators are not required to do anything they would otherwise not do while translating in their natural environments and the software can be readily installed on any computer. The software runs in the background during task completion in an unobtrusive manner.
To date, screen recording has been utilized as a method for researching such phenomena as translator style, problem-solving, and revision tendencies.
Remote monitoring software makes it possible to see what users are doing and typing in real time from any web browser. It documents everything from the words typed to the web searches made and the programs used. Its remote access and monitoring features include remote installation, configuration and removal; anytime access; centralized tracking; real-time activity viewing; invisibility; logging of keystrokes, application usage and website activity; recording of all online searches; and access to all files and documents opened from within the operating system. It is also easy to use.
Caution: Although monitoring software is legal, it must only be used with the consent of the subjects under study so that it is not confused with spyware.
Tagging texts is the most frequent form of corpus annotation. It usually refers to Part-of-Speech (PoS) tagging, i.e. the grammatical annotation of language data which can be retrieved by search engines developed by computational linguists. Tagged texts contain identification fields (tags) usually in XML format. These tagged segments are usually enclosed between the symbols < and /> which refer to the specific part of a text assigned to a particular PoS. There are several taggers available on the web, some of which can be downloaded for free.
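As an illustration of what a search over tagged text involves, the sketch below parses a small XML-tagged sentence with Python's standard library. The element name `w`, the attribute name `pos` and the tag labels are assumptions made for the example; each tagger defines its own tagset and markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical PoS-tagged sentence; real taggers use their own element names and tagsets.
TAGGED = """<s id="1">
  <w pos="DET">The</w>
  <w pos="NOUN">translator</w>
  <w pos="VERB">revised</w>
  <w pos="DET">the</w>
  <w pos="NOUN">draft</w>
</s>"""

sentence = ET.fromstring(TAGGED)

# Pair each word with its PoS tag.
tokens = [(w.text, w.get("pos")) for w in sentence.iter("w")]

# Retrieve all words carrying a given tag, and count how often each tag occurs.
nouns = [word for word, pos in tokens if pos == "NOUN"]
pos_counts = Counter(pos for _, pos in tokens)

print(nouns)
print(pos_counts)
```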
The notion of parallel text alignment derives from Harris’ (1988) concept of bitext, and refers to (a) establishing links between linguistic units of at least two texts, (b) representing such links on screen, and (c) storing them for future access. Some aligners, such as Animalign, work at subsentential levels, but most of them align sentences. Generally, a text aligner imports files with (almost) identically formatted texts and connects them in a new file, usually only two files at a time. Results are presented in columns or on a split screen. An editing function usually lets users correct the automatic alignment and merge and split aligned sentences, or leave empty slots.
Aligned texts may be analyzed, searched, and manipulated for research or professional purposes. Hence, text aligners usually feature as subsidiary tools mainly in concordancers and in translation memory systems, although there are some standalone programs as well, both commercial (e.g., ABBYY Aligner) and free. In research, aligners may be used to easily contrast an original with its translation or to compare alternative translations of a single source text, and also to study bilingual terminology and phraseology. Professional translators often use them to build translation memories from stored translations. Text aligners in concordancers usually offer many possibilities of analysis, whereas those in translation memory packages typically offer a wider range of input and output formats.
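The three steps (linking, displaying and storing) and the manual correction of an automatic alignment can be sketched as follows. This is a deliberately naive illustration that pairs sentences by position; real aligners rely on cues such as sentence length, formatting or lexical correspondences, and all function names and sentences here are hypothetical.

```python
from itertools import zip_longest

def align_by_position(source, target):
    """Naive 1:1 alignment: pair sentences in order, padding with empty slots."""
    return list(zip_longest(source, target, fillvalue=""))

def merge_with_next(sentences, i):
    """Manual correction: join sentence i with the sentence that follows it."""
    return sentences[:i] + [sentences[i] + " " + sentences[i + 1]] + sentences[i + 2:]

def to_tsv(pairs):
    """Store the links as tab-separated text, one aligned pair per line."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)

source = [
    "The committee met on Monday and approved the budget.",
    "The vote was unanimous.",
]
# The translator split the first source sentence into two target sentences.
target = [
    "El comité se reunió el lunes.",
    "Aprobó el presupuesto.",
    "La votación fue unánime.",
]

automatic = align_by_position(source, target)   # misaligned, ends with an empty slot
corrected = align_by_position(source, merge_with_next(target, 0))
print(to_tsv(corrected))
```

The automatic result pairs the second source sentence with the wrong target sentence and leaves an empty slot at the end; merging the first two target sentences restores the correspondence.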
Harris, Brian. 1988. Bi-text, a new concept in translation theory. Language Monthly 54: 8–10.
There is a wide range of software packages including standard operations available to the public. Some standard operations are the creation of concordances and wordlists and the provision of statistics.
A concordancer is a piece of software which allows users to retrieve all occurrences of a particular word or phrase in a corpus, together with the segment of text in which they are located. This segment is called the concordance line. The basic investigation procedure for querying text corpora consists of producing multiple concordance lines, also known as KWIC concordance, where KWIC stands for Key Word In Context (‘Key word’ refers to the searched word). Apart from listing all the occurrences of a word (some programs also provide sentence and paragraph length concordances), and revealing the contexts in which it appears, KWIC concordances can also be sorted in different ways (i.e. alphabetically, to the left or to the right of the key word). They can also be expanded to reveal more of the context, and some programs even offer a collocational profile of the key word in a set of concordance lines. Other facilities that some software packages offer are, for instance, the listing of all the word-forms in a corpus, in frequency or alphabetical order. Word frequency profiles give statistical information that is useful for investigating different measures in a corpus, such as type-token ratio and lexical density.
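The basic operations described above can be sketched in a few lines. The tokenizer, the three-word context window and the sample text are simplifying assumptions made for illustration; dedicated concordancers handle punctuation, case, encodings and large corpora far more carefully.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return (left context, key word, right context) for every occurrence of the key word."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of running words."""
    return len(set(tokens)) / len(tokens)

text = ("The translator reads the source text. The translator drafts "
        "the target text and revises the text.")
tokens = tokenize(text)

concordance = kwic(tokens, "text")
sorted_right = sorted(concordance, key=lambda line: line[2])  # sort by right-hand context

for left, key, right in concordance:
    print(f"{left:>25} | {key} | {right}")
print(f"type-token ratio: {type_token_ratio(tokens):.4f}")
```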
ANALYSING QUALITATIVE DATA
Audio data (e.g. from interviews, think-aloud commentaries, retrospective commentaries) and audiovisual data (e.g. from screen recordings with voice-overs, video recordings, films) can be analysed by coding various segments and then combining codes to form categories that can contribute to theory-building or testing hypotheses. The applications listed below can all be used for the type of data collected in translation process research and provide various possibilities for calculating descriptive statistics.
MANAGING STATISTICAL INFORMATION
Quantitative research with clear predictive or confirmatory objectives often requires different kinds of statistical operations. When the quantitative data gathered retain a qualitative aspect, or when a study is only looking for general tendencies, descriptive statistics are sufficient. However, if we want to know the statistical significance of differences observed in a sample or the exact magnitude of the relation between parameters, and to be able to extrapolate our findings to a whole population, we need to apply inferential statistics. The computer programs described below help with organising and validating quantitative databases, provide a wide range of data transformation facilities, support descriptive and inferential statistical operations and make it possible to generate graphs in order to visualise partial or total quantitative results.
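The difference between the two kinds of statistics can be illustrated with a small example. The task times below are invented, and Welch's t-test is used here only as one possible inferential procedure; in practice the statistic and its p-value would be obtained from one of the statistical packages described below.

```python
import math
from statistics import mean, stdev, variance

def describe(sample):
    """Descriptive statistics: summarise the sample itself."""
    return {"n": len(sample), "mean": mean(sample), "sd": stdev(sample)}

def welch_t(a, b):
    """Inferential step: Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented task times (minutes) for two hypothetical groups of participants.
novices = [42, 38, 45, 50, 41, 47]
professionals = [30, 35, 28, 33, 31, 36]

print(describe(novices))
print(describe(professionals))

t, df = welch_t(novices, professionals)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The descriptive summaries only characterise these twelve participants; the t statistic, compared against the t distribution with the computed degrees of freedom, is what supports a claim about the populations from which they were drawn.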
DIALANG is a language diagnosis system developed by many European higher education institutions. It reports your level of skill against the Common European Framework (CEF) for language learning. DIALANG’s languages are Danish, Dutch, English, Finnish, French, German, Greek, Icelandic, Irish Gaelic, Italian, Norwegian, Portuguese, Spanish and Swedish. DIALANG has instructions and tests in all these languages. The test results are provided for each skill: writing, grammar, vocabulary, listening and reading. DIALANG is not an exam and does not issue certificates.
The MBTI is a personality test based on Carl Gustav Jung's theory of personality types. It is a self-report measure which sorts people into one of 16 personality types, and indicates one's personality preference for Extraversion/Introversion, Sensing/Intuition, Thinking/Feeling and Judging/Perceiving.
TOEFL is an English proficiency test which aims to test an individual's ability to use and understand English in an academic setting. It is often used to assess the English language proficiency of non-native speakers, and consists of four sections measuring basic language skills: reading, listening, speaking, writing.
The WAIS is an intelligence test for adults and older adolescents which produces an Intelligence quotient (IQ) score. It is based on the belief that intelligence involves a variety of skills, and intends to measure human intelligence reflected in both verbal and performance abilities. Tasks on the WAIS include, for instance, questions of general knowledge, arithmetic and vocabulary.
Dropbox Forms integrates Dropbox with the WYSIWYG web form builder JotForm. JotForm allows you to create online forms and post a link for people to fill them in. All responses will be sent straight to Dropbox.
Google Forms is a tool to create free online surveys. Respondents can be invited by email, and they can use any web browser, including smartphone or tablet browsers. Responses are collected in an online spreadsheet.
LimeSurvey is an open source and free application to easily create, distribute and analyse questionnaire based surveys online. Designed to be user-friendly, it enables users to develop multi-question and multi-lingual surveys, send email invitations or reminders and export survey results to various formats. LimeSurvey also provides statistical and graphical analysis with optional graphs.
SurveyMonkey is an online survey tool. With an account you can design your survey, set up an audience project (SurveyMonkey allows you to purchase a group of specific respondents to take the survey) and analyse results.
Users can choose among four options, from the Basic Free one (with limitations such as 10 questions and 100 responses per survey) to the Platinum account for 800 € per year.
Tobii is a manufacturer of eye tracking hardware and software. Tobii manufactures the standalone T60, T120 & TX300 series and they also manufactured the discontinued T1750 series.
Tobii Studio is eye tracking software which is used with the Tobii eye trackers to collect and analyse eye tracking data. Tobii Studio provides graphical visualisations, such as heat maps and gaze plots, as well as quantitative metrics, such as fixation count, fixation duration and pupil size.
SMI Vision produces eye and gaze tracking systems such as SMI RED, RED250 and RED500. It also manufactures mobile wireless eye tracking glasses. The SMI Experiment Center and SMI BeGaze software packages make it possible to create, run and analyse experiments.
SR Research is an eye tracker manufacturer that develops high-speed eye tracking systems such as the EyeLink 1000 and the EyeLink II. SR Research offers software packages to build and execute experiments (Experiment Builder) and analyse eye tracking data (Data Viewer).
Besa provides new advanced technologies for the analysis and visualization of human brain activity using expert knowledge in Clinical Neurophysiology, Neuroimaging and Neuroscience. They develop and distribute innovative software for research and clinical applications in fields such as Electroencephalography (EEG). Among Besa's products is EEGFocus, which combines digital EEG review with advanced analysis features such as 3D whole head mapping, brain source montages and images, spike pattern search and averaging, spectral analysis, DSA trend analysis, on-line correction of eye and EKG artifacts etc. EEGFocus provides a user-friendly interface for immediate analysis of abnormal patterns during review, e.g. spikes or rhythmic EEG activities. Similar patterns can be searched for in the whole EEG and averaged, for example, to analyze a spike or seizure onset.
g.MOBIlab+ is a tool for recording multimodal biosignal data on a standard Pocket PC, PC or notebook. This allows investigation of brain-, heart-, and muscle- activity, eye movement, respiration, galvanic skin response, pulse and other body signals. g.MOBIlab+ is not a medical device.
Inputlog is a logging tool that logs all types of input modes: keyboard, mouse & speech recognition. Researchers make frequent use of keystroke logging tools to describe online writing or translation processes in detail.
Translog (2000, 2006) is a computer program that logs the keyboard activity involved in a writing process. Translog was developed specifically for translation research, and has one window displaying the source text and another in which the target text is typed.
Translog II is based on the earlier Translog versions. In addition to the keylogging capabilities of Translog, Translog II can record the position of the eyes on the computer monitor during a translation process. Translog II is compatible with Tobii’s eye tracking hardware systems.
Dragon NaturallySpeaking is a speech recognition software package which allows the user to create text documents and control the computer through voice input.
Windows Speech Recognition is a computer application which allows the user to control the computer by giving specific voice commands and to dictate text through speech. The application is included in Windows Vista, Windows 7 and Windows 8.
A free screen recorder, offering all the basic features. It only runs on a PC.
Screen recording software available for both PC and Mac users. Users must pay a license fee. Offers more advanced features such as clip editing (cutting, splicing, etc.).
The acronym CLAWS stands for Constituent Likelihood Automatic Word-Tagging System. This tagger has been developed since the early 1980s at UCREL, the University Centre for Computer Corpus Research on Language at the University of Lancaster, United Kingdom. The latest version of the tagger is CLAWS4 and it was used to tag parts of speech (POS) in approximately 100 million words of the British National Corpus (BNC).
Litterae is a tagger developed at LETRA, the Laboratory for Experimentation in Translation, at Federal University of Minas Gerais, Brazil. It has been especially designed to annotate translation process data, including not only texts but also temporal reference to textual occurrences in the production process. Litterae can be used to tag translation units, including interim renditions in the course of text production, and assign a PoS tag to them.
The software may be used only with permission, which is granted on request.
PetraTAG is an open-source Spanish tagger and lemmatizer written in C++ and developed by the PETRA Research Group at Universidad de Las Palmas de Gran Canaria, Spain. PetraTAG has been specially developed to tag bilingual texts. It includes a lemmatizer to find the canonical form of each word of a text and a PoS tagger to tag words according to their PoS function in texts. It is free and available as a standalone application, with some basic concordancing capabilities, and also as a Java object that can be integrated in applications such as spelling/grammar checkers, chatbots, computer-aided revision tools, etc. A new release that adds statistical graphics is in the works.
A commercial, Windows-based multilingual concordancer that recently changed its semi-automatic alignment function (build 269). It can be used with any language and may work simultaneously with up to four texts. Searches and frequency data are not saved.
A commercial, Windows-based tool embedded in Trados 2011. WinAlign identifies titles, sentences, list elements, proper names, numbers, dates, table cells, captions, index entries, footnotes, formatting or tags. It supports many office, desktop publishing and markup formats, some of which may be displayed in WYSIWYG fashion. It currently has some issues when running that may be avoided by converting files into HTML format.
A commercial, Windows-based suite of programs to analyze texts that may run on Apple and Linux operating systems under certain conditions. It can be used to align texts in any language and provides sentence-by-sentence or paragraph-by-paragraph display of two or more aligned texts, and has powerful customizing options, but it does not tag texts.
This is a simple, online, automatic text aligner that will handle files of up to 1 MB each. It supports many input formats and offers output in tabbed TXT and HTML formats. Registration needed. The company, Terminotix, also offers other alignment products that offer batch alignment of multiple files.
AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning. AntConc contains seven tools: Concordance Tool, Concordance Plot Tool, File View Tool, Clusters (N-Grams), Collocates, Word List, and Keyword List.
WordSmith Tools is an integrated suite of programs for looking at how words behave in texts. You will be able to use the tools to find out how words are used in your own texts, or those of others. The WordList tool lets you see a list of all the words or word-clusters in a text, set out in alphabetical or frequency order. The concordancer, Concord, gives you a chance to see any word or phrase in context -- so that you can see what sort of company it keeps. With KeyWords you can find the key words in a text. (extract from WST Manual)
Qualitative data analysis package that allows annotation and coding of text, video and audio files. In addition, information about code frequency can be exported directly to SPSS for further analysis.
EdEt (Editor for Ethnographers), conceived by Iwona Kaliszewska (copyright belonging to the Institute of Ethnology and Cultural Anthropology, Warsaw University, Poland), supports work with textual and visual qualitative material, such as interviews (biographical or ethnographical interviews, especially with various participants), everyday narratives, participant or non-participant observation charts, field notes and fragments of texts. It is useful for editing, organising and analysing different kinds and sources of data in order to obtain triangulated qualitative content results. Furthermore, it allows for coding and code hierarchy establishment. EdEt’s main advantages are that it is free, lightweight and easy to use; it allows for collaborative work and team research supervision; it facilitates archiving; and it makes searching in collected material easy. The program has special interview features, such as semi-automatic participant statement division and scope for introducing characteristics corresponding to people or situations. Additionally, it offers an advanced query system, through which it is possible to retrieve statements with particular features, made by particular people, on particular topics and from particular interviews. EdEt could be helpful in the first, exploratory step of research, when specific research issues have yet to be chosen and researchers need to know what their collected data shows, or during the main process of in-depth qualitative analysis. Polish, English and Spanish versions of the program are available.
ELAN, developed by the Max Planck Institute for Psycholinguistics, is an annotation tool for audio and/or video streams. An annotation can either be time-aligned to the media or it can refer to other existing annotations. The textual content of annotations is always in Unicode and the transcription is stored in an XML format.
HyperRESEARCH is a software program for Windows and Mac OS X that helps you perform qualitative data analysis. You can use HyperRESEARCH to examine and organize textual, audio, video, and image data. You tag selections of your source data with key phrases—called codes—for later analysis and retrieval. (extract from User's guide)
NVivo is software that helps you easily organize and analyze unstructured information. NVivo handles Word documents, PDFs, audio files, database tables, spreadsheets, videos, pictures and web data. Information can be interchanged between NVivo and other applications like Microsoft Word and Excel, IBM SPSS Statistics, Survey Monkey, EndNote, Evernote and OneNote. (extracts from User handbook)
An R graphical user interface (GUI) for everyone, Deducer is designed to be a free, easy-to-use quantitative data analysis software package. This powerful data analysis tool has a menu system for common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the Deducer project is twofold: (1) to provide an intuitive GUI for R, encouraging non-technical users to learn and perform analyses without programming getting in their way; and (2) to enable expert R users to perform common tasks more efficiently by simplifying facilities. Deducer is designed to work with the Java-based R console JGR, although it supports a number of other R environments (e.g. Windows RGUI and RTerm).
PSPP is a program for the statistical analysis of sampled data. It can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. PSPP can be used with its graphical interface or more traditional syntax commands. A list of some of the program’s features follows:
- support for over a billion cases and over a billion variables;
- syntax and data files compatible with IBM SPSS Statistics;
- choice of terminal or graphical user interface;
- choice of text, postscript, pdf or html output formats;
- interoperability with free software;
- easy data import from spreadsheets, text files and database sources;
- user interface translated to multiple languages;
- fast statistical procedures, even with very large data sets;
- free software, licensed under GPLv3 or later;
- cross-platform (runs on many different computers and operating systems).
PSPP is particularly aimed at statisticians, social scientists and students requiring fast, convenient analyses of sampled data.
IBM SPSS Statistics is one of the world’s leading statistical software packages. It is an integrated family of products which addresses the entire analytical process, from planning and data collection to analysis, reporting and deployment. With more than a dozen integrated modules, IBM SPSS Statistics makes it possible to dig deep into quantitative data via data manipulation and statistical procedures, making sense of complex patterns and associations, and enabling end users to draw conclusions and make predictions. IBM SPSS Statistics comes in three editions: (1) IBM SPSS Statistics Standard (essential analytical tools for the most common projects); (2) IBM SPSS Statistics Professional (a comprehensive set of features and tools to address the entire analytic lifecycle); and (3) IBM SPSS Statistics Premium (designed for enterprise businesses with needs across all advanced analytics efforts). The first edition, IBM SPSS Statistics Standard, is the most frequently used in Translation Studies research. Its key capabilities include the generation of linear models (making analyses more accurate and conclusions more dependable) and nonlinear models (making it possible to apply more sophisticated models to quantitative data), simulation modelling (providing scope for building better models and assessing risk when inputs are uncertain, using Monte-Carlo simulation techniques) and table customisation (allowing for the slicing and dicing of data for easy analysis and reporting).