A. The Language Test
Language testing is the practice and study of evaluating an individual's proficiency in using a particular language effectively. The purpose of a language test is to determine a person's knowledge and/or ability in the language and to discriminate that person's ability from that of others. Such ability may be of different kinds: achievement, proficiency, or aptitude. Tests, unlike scales, consist of specified tasks through which language abilities are elicited. The term language test is also used somewhat more widely to include, for example, classroom testing for learning and institutional examinations.
From the explanation above, the writer tries to develop a more specific explanation of the language test so that it is easy to understand.
1. The Understanding of the Test
A test is one of the important methods of determining the extent of student success in the learning and teaching process. Without tests, the teacher cannot obtain useful and accurate information related to students' achievement. To learn more about the test, the writer will first elaborate its definition.
A test is an assessment, often administered on paper, intended to measure the test-takers' or respondents' knowledge, skills, aptitudes, or classification in many other topics.
A test is any standardized procedure for measuring sensitivity, memory, intelligence, aptitude, personality, et cetera that was standardized on a large sample of students.
Amir Daien Indrakusuma, as quoted in Arikunto, says that a test is a tool or a systematic and objective procedure to obtain data or desired particulars about a person in a way that may be regarded as precise and quick.
Robert L. Linn and Norman E. Gronlund state that a test is a particular type of assessment that typically consists of a set of questions administered during a fixed period of time under reasonably comparable conditions for all students.
In Business Dictionary, test is explained as following:
Examination, evaluation, observation, or trial used (under actual or simulated environmental or operating conditions) to determine and document (1) the capabilities, characteristics, effectiveness, reliability, and/or suitability of a material, product, or system, or (2) the ability, aptitude, behavior, skill level, knowledge, or performance of a person.
Based on the definitions above, the writer can conclude that the test is an instrument, assessment, systematic or standardized procedure for measuring a sample of behavior by posing a set of questions in a uniform manner.
2. The Importance of the Test
Tests play an important role in institutional studies; if there were no tests, most graduation decisions would have been less well informed than they are today.
Testing, in education, is an attempt to measure a student’s knowledge, intelligence, or other characteristics in a systematic way. Teachers give tests to discover the learning abilities of their students. They also give tests to see how well students have learned a particular subject. Some tests help people choose a vocation, and other tests help them understand their own personality.
3. The Types of the Tests
A test’s specifications provide the official statement about what the test tests and how it tests it. The specifications are the blueprint to be followed by test and item designers, and they are also essential in establishing the test’s construct validity (see the sub-chapter on construct validity).
Test designers need guidance on practical matters that will assist test construction. Therefore, before teachers take the right steps in constructing tests, they must know in advance the types of tests that will be used with the students. In other words, teachers must have clear and detailed information about the purpose of the test so that it can be genuinely useful to students. Many types of tests exist to determine the level of student performance.
Norman E. Gronlund classifies a test into four types. Those are placement tests, formative tests, diagnostic tests, and summative tests.
Jack R. Frankel and Norman E. Wallen also classify a test into four types: achievement tests, aptitude tests, performance tests, and projective devices.
Meanwhile, Wilmar Tinambunan says that there are two types of test used in determining a person’s abilities: aptitude tests and achievement tests.
Generally, there is no deep difference among the classifications made by the experts above; they differ only in the terms and scope of each type of test. Therefore, the writer will discuss achievement tests, aptitude tests, proficiency tests, and placement tests.
a. Achievement tests
Achievement, or ability, tests measure an individual’s knowledge or skill in a given area or subject. The primary goal of achievement tests is to measure past learning, that is, the accumulated knowledge and skills of an individual in a particular field.
Achievement test scores are often used in an educational system to determine what level of instruction for which a student is prepared. High achievement scores usually indicate a mastery of grade-level material, and the readiness for advanced instruction. Low achievement scores can indicate the need for remediation or repeating a course grade.
It is not unusual for public school systems to use achievement testing to identify students who are prepared to move on to more advanced courses of study or who need some type of remedial instruction. Using an achievement test to measure each student’s grade level is not intended to reflect the general intelligence of the individual. Rather, the purpose of the testing is to ensure each student is placed in a classroom situation offering the best opportunity to learn and assimilate material in an organized fashion that prepares them to move on to more advanced material.
For example, a student who does not do well with basic mathematics on an achievement test is likely to be placed in a remedial learning situation. Doing so provides the student with the opportunity to master the basics before attempting to learn more advanced mathematical concepts like algebra or geometry. At a later date, the student will have the chance to take a second test; should the results indicate that the student is sufficiently prepared to move on to something more complicated, he or she can be reassigned to a more challenging course of study.
While widely utilized, achievement tests are not universally supported by educators. Proponents point to the success of the tests in placing students into learning situations that are in line with their current level of knowledge rather than overwhelming them with additional information that they may or may not be able to assimilate. Individuals and groups that oppose the use of achievement tests claim that the exams are not structured in a manner that accounts for the general aptitude of each student, resulting in an overall learning environment that pigeonholes rather than nurtures each student and promotes productive learning.
b. Aptitude Tests
Another familiar type of ability test that the writer would like to discuss is the so-called aptitude or intelligence test. Such tests assess intellectual abilities that are not, in most cases, specifically taught in school.
Aptitude tests are intended to measure an individual’s potential to achieve; in actuality, they measure present skills or abilities. They differ from achievement tests in their purpose and often in content, usually including a wider variety of skills or knowledge. The same test may be either an aptitude or an achievement test, depending on the purpose for which it is used.
An English achievement test, for example, may also measure aptitude for further English learning or courses. Such tests are used primarily by counselors to help individuals identify areas in which they may have potential.
c. Proficiency Tests
Proficiency tests are designed to measure test takers’ ability in a language regardless of any training they may have had in that language. In contrast to achievement tests, the content of proficiency tests is not based on the syllabus or instructional objectives of language courses. Rather, it is based on a specification of what candidates or test takers have to be able to do in the language in order to be considered proficient.
Proficiency tests normally measure a broad range of language skills and competences, including structure, phonology, vocabulary, integrated communication skills, and cultural insight. There are also proficiency tests which include the appropriateness of language usage in its specified social context; in other words, communicative competence.
If we compare proficiency and achievement tests, we find that the difference lies in the source of the materials used in their preparation and in the use to be made of the test results. Whereas achievement tests are used to obtain measures of formal study during a specified time, proficiency tests serve principally to obtain measures of the degree of knowledge of a particular language at a particular time and for a particular purpose.
Proficiency tests differ in content and level of difficulty. Some are designed to measure whether someone has sufficient command of the language for a specific purpose. An example would be someone who will follow a particular subject area at a particular university. The content therefore will, or should, reflect the purpose for which the test has been prepared.
Other proficiency tests, by contrast, do not have any particular occupation or course of study in mind. For this form, the concept of proficiency is more general. Such a test is intended to show whether candidates have reached a certain standard with respect to certain specified abilities. Examples are the Cambridge examinations (the First Certificate Examination and the Proficiency Examination).
d. Placement tests
Placement tests are designed to assess students’ level of language ability so that they can be placed in the appropriate course or class. Such a test may be based on aspects of the syllabus taught at the institution concerned, or may be based on unrelated material. In some language centers, students are placed according to their rank in the test results so that, for example, the students with the top eight scores might go into the top class. In other centers the students’ ability in different skills, such as reading and writing, may need to be identified.
A placement test is intended to determine the student’s entry performance: whether or not the student possesses the knowledge and skills needed to begin the planned instruction, and to what extent the student has already mastered the objectives of the planned instruction.
The test results enable the teacher to sort students into teaching groups, that is, to determine the position in the instructional sequence and the mode of instruction that is most likely to benefit each student.
4. The characteristics of Good Test
A good test, as a measuring instrument, must meet certain requirements, namely validity, reliability, objectivity, practicability, and economy. Validity refers to the adequacy and appropriateness of the interpretations made from tests with regard to a particular use. Data can be said to be valid if it accords with actual circumstances. The writer will explain validity further in the following subchapter.
The second characteristic of a good test is reliability. Reliability refers to the consistency of test results. If teachers obtain quite similar scores when the same test procedure is used with the same students on two different occasions, they can conclude that their results have a high degree of reliability from one occasion to another. Similarly, if different teachers independently rate student performances on the same test task and obtain similar ratings, they can conclude that the ratings are reliable. In short, a test can be said to be reliable if it gives consistent results when administered to students many times.
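The test-retest notion of reliability described above can be sketched numerically. The following is a minimal illustration, not an authoritative procedure; all scores are invented, and reliability is estimated here simply as the Pearson correlation between two administrations of the same test to the same students.

```python
# Hypothetical test-retest reliability check: the same five students sit
# the same test twice, and we correlate the two sets of scores.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

first_sitting = [72, 85, 60, 90, 78]    # invented scores, occasion 1
second_sitting = [70, 88, 62, 91, 75]   # same students, occasion 2

reliability = pearson_r(first_sitting, second_sitting)
print(round(reliability, 2))  # a value near 1.0 indicates consistent results
```

In practice, reliability is also estimated by other methods (parallel forms, split-half, internal consistency), but the test-retest correlation shown here matches the "two different occasions" case described in the paragraph above.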
B. The Validity
1. The Understanding of the Validity
Validity indicates the ability of an instrument to measure what it should measure. Someone who wants to measure height must use a meter; to measure weight, one must use scales. The meter and the scales are valid measures in these cases. In a study involving variables or concepts that cannot be measured directly, the issue of validity is not so simple, since it also involves translating concepts from the theoretical level to the empirical level (indicators); nevertheless, a research instrument must be valid for its results to be trusted.
Test validity is the most critical factor to be judged in foreign language testing as a whole. A test is valid when it measures effectively what it is intended to measure, whether that is achievement, aptitude, or proficiency in the language. A test may be designed to measure integrative abilities or discrete items within the subsystems of a language.
For example, if a test is designed to measure aural comprehension, it must do exactly this and not attempt to measure another skill such as reading comprehension. If a test is intended to measure a person’s ability to speak the language, it is valid only if speaking skills and not writing ability are the specific measurable skills emphasized.
Validity is not a simple concept; rather, it comprises a number of aspects. In other words, there are several kinds of validity elaborated by experts.
Arthur Hughes classifies validity into four: content validity, criterion-related validity, face validity, and construct validity. According to Julian C. Stanley, there are five types of validity. They are substitutive validity, predictive validity, content validity, construct validity, and factorial validity. In addition, J. Charles Anderson, Caroline Clapham and Dianne Wall say that there are three types of validity. The first is internal validity, which consists of face validity, content validity, and response validity. The second is external validity, which consists of concurrent and predictive validity. The third is construct validity.
2. The Types of The Validity
Based on the explanation above, the writer will discuss some types of validity: face validity, content validity, criterion validity, and construct validity.
a. Face validity
Face validity is the property of a test of appearing to measure what it is intended to measure. It is the validity of a test at face value. In other words, a test can be said to have face validity if it “looks like” it is going to measure what it is supposed to measure.
For instance, if a teacher prepares a test to measure whether students can perform multiplication and it looks like a good test of multiplication ability for them, he has shown the face validity of his test.
Face validity refers more to the form and appearance of instruments. According to Djamaludin Ancok, as quoted in Arikunto, it is very important in measuring the abilities of individuals, such as the measurement of honesty, intelligence, talent, and skill.
Substantially, there is no difference of view among the definitions above. They all indicate that a test is regarded as having face validity if its appearance is acceptable, it is clearly readable, and it has clear instructions for answering the test.
Therefore, the test maker should pay attention to certain criteria before making a test, especially those related to face validity. There are several considerations to be borne in mind in making a test:
a. A test maker should consider spelling in constructing test items. Wrong spelling must be avoided because it can trouble a testee taking the test.
b. The test maker should pay attention to punctuation marks such as the period, comma, colon, semicolon, question mark, exclamation mark, etc. Although this seems a simple thing, in practice it helps a test taker to understand the test items.
c. In constructing tests, the test maker should consider the composition of the test items. The first point is grammatical sentences. Grammar obviously has an important role, and the test maker should consider it in constructing a test item. If a test item is written in an ungrammatical sentence, the test taker will be confused by it, and this becomes a constraint in comprehending the question. The second point is the space between lines: the test items should be placed in an appropriate position, with fitting space between lines. The last point is whether the sentence is logical or not; the test maker should keep away from illogical statements in a test item.
d. The last thing the test maker should think over is the test instructions. The instructions must be given in a clear and simple form, so the testee can directly understand what a question means.
b. Content Validity
Content validity is concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes. Moreover, the test should reflect the instructional objectives or subject matter. However, it is not expected that every piece of knowledge or every skill will always appear in the test; there may simply be too many things for all of them to appear in a single test.
If the test given to students does not have content validity, there will be consequences. The first consequence is that students cannot demonstrate skills that they possess if those skills are not tested. The second consequence is that irrelevant items are presented, which students will likely answer incorrectly only because the content was not taught. These two consequences tend to lower the test scores. As a result, the test score is not an adequate measure of student performance relative to the content covered by instruction.
Content validity is assured by checking all items in the test to make certain that they correspond to the instructional objectives of the course. In other words, a test can be judged as having content validity by comparing the test’s specification with its content. Ideally, these judgments should be made by people who have experience in language teaching and testing, i.e., experts. A common way is for them to analyze the content of a test and to compare it with a statement of what the content ought to be. Such a content statement may be the test’s specification, a formal teaching syllabus or curriculum, or a domain specification.
Content validity is important for two reasons. First, the greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure. Second, a test that lacks content validity is likely to have a harmful backwash effect: areas which are not tested are likely to become areas ignored in teaching and learning. The best safeguard against this is to construct full test specifications and to ensure that the test content is a fair reflection of these.
c. Criterion Validity
Criterion validity is a measure of how well one variable or set of variables predicts an outcome based on information from other variables; it is achieved if a set of measures from a personality test relates to a behavioral criterion on which psychologists agree. A typical way to achieve this is to examine the extent to which a score on a personality test can predict future performance or behavior. Another way involves correlating test scores with those of another established test that measures the same personality characteristic.
Whether a test has criterion validity can be traced in two ways: predictive validity and concurrent validity.
1) Predictive Validity
Predictive validity applies if there is an intervening period (e.g., three or six months) between the time of testing and the collection of data on the criterion.
Operationally, the timing of criterion data collection is the distinction between the two types of criterion validity. Specifically, the question of concurrent validity is whether or not the test scores estimate a specified present performance; that of predictive validity is whether or not the test scores predict a specified future performance.
The simplest form of predictive validation is to give students a test, and then at some appropriate point in the future give them another test of the ability the initial test was intended to predict. A common use for a proficiency test like IELTS (International English Language Testing System) or the TOEFL (Test of English as a Foreign Language) is to identify students who might be at risk when studying in an English-medium setting because of weaknesses in their English. Predictive validation would involve giving students the IELTS test before they leave their home country for overseas study, and then, once they have all arrived in the host study setting and had time to settle down, giving them a test of their use of English in that study setting. A high correlation between the two scores would indicate a high degree of predictive validity for the IELTS test.
Another example of a predictive validation study might be the validation of a test of language competence for student teachers of that language. In this example, such students have to pass the test before they are allowed to enter the Teaching Practice component of their course, during which they will need a high level of foreign language competence. Predictive validation of the test involves following up those students who pass the test, and getting their pupils, their fellow teachers and their teacher-observers to rate them for their language ability in the classroom.
The predictive validity of the test would be the correlation between the test results and the ratings of their language ability in class.
2) Concurrent Validity
Concurrent validity is the comparison of the test scores with some other measure for the same candidates taken at roughly the same time as the test. Concurrent validity applies if data on the two measures – test and criterion – are collected at or about the same time.
M. Ngalim Purwanto also says that if a test result correlates highly with the results of other measuring devices in the same field at the same time, the test is said to have concurrent validity.
The other measure may be scores from a parallel version of the same test or from some other test; the candidates’ self-assessment of their language abilities; or ratings of the candidate on relevant dimension by teachers, subject specialists or other informants.
A mechanism for ascertaining concurrent validity could follow a pattern such as the following: a new language test is administered to students in the course for which the test was developed, and scores are recorded for each student. These scores are then compared to the criterion test grades or to teachers’ ratings. If the individuals with the highest criterion test grades or teachers’ ratings score highest on the new test, and those with the lowest grades and/or ratings also score lowest on the new test, then it is highly probable that the new test measures what it is designed to measure. The correlation between the two is a measure of concurrent validity.
d. Construct Validity
A test, part of a test, or a testing technique is said to have construct validity if it can be demonstrated that it measures just the ability or trait which it is supposed to measure. The word ‘construct’ here refers to any underlying ability or trait that is hypothesized in a theory of language ability. One might hypothesize, for example, that the ability to read includes a number of sub-abilities, such as the ability to find the main idea of a text.
Determining construct validity involves both logical and mathematical operations, and there are several steps. The first step is to decide what traits or abilities are being tested, and then to deduce what sorts of behaviors, abilities, or achievements would be typical of people who possess a great deal of the trait but would be unusual among people with little of it.
The next step is to decide on some behavior, ability or achievement that would be unrelated to the trait one is trying to measure. The mathematical operation is to correlate test scores with the hypothetically related behavior and the hypothetically unrelated behavior. Construct validation is demonstrated when the hypothetical relationships are shown by the correlations; that is, the correlation between the test and the related behavior is high, but the correlation between the test and the unrelated behavior is low.
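The mathematical operation described above can be sketched as follows. Everything here is invented for illustration: a hypothetical reading test is correlated with a hypothetically related behavior (ratings of text summaries) and with a hypothetically unrelated one (sprint times), and construct validation is supported when the first correlation is high and the second is low.

```python
# Hypothetical construct-validation sketch: correlate reading-test scores
# with a related behavior (high r expected) and an unrelated behavior
# (near-zero r expected). All data are invented.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

reading_test = [60, 72, 55, 90, 80, 65]
summary_rating = [58, 75, 50, 88, 82, 63]                # related behavior
sprint_time = [14.75, 15.00, 13.65, 14.05, 13.20, 13.35]  # unrelated behavior

related = pearson_r(reading_test, summary_rating)
unrelated = pearson_r(reading_test, sprint_time)
print(round(related, 2), round(unrelated, 2))
# construct validity is supported when related is high and unrelated is near zero
```

This is the convergent/discriminant pattern the paragraph describes: the hypothesized relationships are confirmed when the related correlation is high and the unrelated one is low.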