
Intelligence 32 (2004) 349–362

The end of the Flynn effect? A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century

Jon Martin Sundet a,*, Dag G. Barlaug b, Tore M. Torjussen b

a Institute of Psychology, University of Oslo, P.O. Box 1094, Blindern N-317 Oslo, Norway
b Psychological Services, Norwegian Armed Forces, Norway

Received 22 October 2003; received in revised form 5 April 2004; accepted 8 June 2004

Abstract

The present paper reports secular trends in the mean scores of a language, mathematics, and a Raven-like test together with a combined general ability (GA) score among Norwegian (male) conscripts tested from the mid 1950s to 2002 (birth cohorts c. 1935–1984). Secular gains in standing height (indicating improved nutrition and health care) were also investigated. Substantial gains in GA were apparent from the mid 1950s (test years) to the end of the 1960s–early 1970s, followed by a decreasing gain rate and a complete stop from the mid 1990s. The gains seemed to be mainly caused by decreasing prevalence of low scorers. From the early 1970s, the secular gains in GA were almost exclusively driven by gains on the Raven-like test. However, even the means on this particular test stopped increasing after the mid to late 1990s. It is concluded that the Flynn effect may have come to an end in Norway. Height gains were strongly correlated with intelligence gains until the cessation of height gains in the conscript cohorts towards the end of the 1980s. Contrary to the intelligence gains, the height gains (conscript cohorts 1969–2002) were most pronounced in the upper half of the distribution. Evidence indicating decreasing intercorrelations between tests is reported.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Flynn effect; Intelligence; Norwegian conscripts

* Corresponding author. E-mail address: [email protected] (J.M. Sundet).
0160-2896/ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.intell.2004.06.004

J.M. Sundet et al. / Intelligence 32 (2004) 349–362

1. Introduction

It has long been known among intelligence test users that test performance improves from one generation to the next (the Flynn effect), necessitating new and stricter norms from time to time. Scientific interest in secular increases of intelligence test scores virtually exploded after the publication of the seminal paper by Flynn (1987), reviewing data showing substantial gains in 14 industrialized countries in Europe, North America, and the Far East. Later, secular gains have been observed in Sweden (Emanuelsson, Reuterberg, & Svensson, 1993; Emanuelsson & Svensson, 1990), Denmark (Teasdale & Owen, 1989), Israel (Flynn, 1998a), and in urban regions in Brazil and China (Flynn, 1998b). The average gain seems to be about 3–5 IQ points per decade. Recently, a diminishing growth rate in the birth cohorts 1940–1980 of male conscripts has been observed in Denmark (Teasdale & Owen, 2000). In Sweden, the secular trends may have more or less leveled out in the birth cohorts between 1972 and 1977 (Emanuelsson et al., 1993).

The secular gains seem to be largest on tests not clearly related to school curricula and presumably measuring fluid intelligence (Cattell, 1987). On Raven’s Progressive Matrices and Raven-like tests, gains in the neighborhood of 18–20 IQ points in a generation seem to be quite typical in many industrialized countries (Flynn, 1999). Kenyan 7-year-old school children in a rural district showed the most dramatic gains on the Raven Progressive Matrices Test ever observed (estimated to be at least 0.8 IQ points per year) over a 14-year period from 1984 to 1998 (Daley, Whaley, Sigman, Espinosa, & Neumann, 2003).

At least in some countries, the secular gains seem to have been unevenly distributed over ability levels. Teasdale and Owen (1989, 2000) found that the secular gains were mainly caused by a lower prevalence of low scorers. In Britain, the same tendency was found for some tests (including Raven’s Progressive Matrices), but not for others (Lynn & Hampson, 1986). In other countries, the gains seem to be evenly distributed across ability levels (Flynn, 1998b).

Detterman and Daniels (1989) reported smaller correlations between different tests in high-scoring than in low-scoring groups, indicating that the secular gains may be accompanied by a changing factor structure of intelligence test scores. Recently, Kane and Oakland (2000) reported that the test intercorrelations in the U.S. Wechsler tests were lower in more recent standardization samples relative to older ones (time span 20–50 years). Indications of declining intercorrelations have also been found in France (Lynn & Cooper, 1993). Data on Danish conscripts (males) over a 10-year period showed only small downward changes in the intercorrelation pattern (Teasdale & Owen, 2000).

The main aim of the present paper is to report data on the secular trends of mean general intelligence test scores and subtest scores of a large number of Norwegian (male) conscripts who were tested in the years from 1954 to 2002, inclusive (birth cohorts c. 1935–1984). Changes in the distribution of test scores have also been scrutinized. In particular, we have looked for possible secular trends in the variation pattern of test scores, and whether the secular gains have been evenly distributed across ability levels. To investigate possible changes in the prominence of the g factor, we have studied secular trends in the intercorrelations between tests.

In addition, secular trends of standing height have been analyzed. Standing height is a useful indicator of nutrition and health status, and secular gains in height have, to some extent, occurred in tandem with IQ gains (Martorell, 1998). Lynn (1990) has proposed that nutrition and health care improvements are among the main causal factors of IQ gains.
It seems clear that nutrition and health care factors may not tell the whole story. Thus, the Flynn effect has outlasted height gains by a decade or so in many countries (Martorell, 1998). Nevertheless, the nutrition–health care hypothesis deserves further attention. Other potential contributing factors, like more complex societies, increasing access to mass media, computer games, smaller families, changing rearing styles, to name some of the more prominent proposals, have been extensively discussed elsewhere (e.g., Neisser, 1998) and will not be further elaborated in the present paper.

2. Methods and materials

In Norway, military service is compulsory for every able young man. Before they actually enter the service, the young men are required to meet before a draft board, where their medical and psychological suitability, including intellectual ability, for military service is assessed. A great majority of the men meeting before the draft board (about 95%) are examined between their 18th and 20th birthday. The physically or psychologically disabled are exempted from these investigations. In addition, seamen and others who are abroad at the normal conscript age are normally exempted.

2.1. Test materials

The draft board assessment of intellectual ability includes three speeded tests: Arithmetic (25 min), Word Similarities (8 min), and Figures (20 min). The Arithmetic test (30 items), presented in prose, purports to measure not only arithmetic and elementary algebraic ability but also logical reasoning ability, and is quite similar to the Arithmetic test in WAIS. The contents of the Arithmetic test were slightly modified in 1963. This change was mainly a modernization of some of the items, but the difficulty seems to be about the same. In the mid 1990s, the Arithmetic test was changed from open-ended answers to multiple choice (five alternatives). The Word Similarities Test (akin to the Vocabulary Test in WAIS) is a multiple-choice test (54 items). A key word is given, and the task is to find the synonym among six alternatives.
The Figures Test (36 multiple-choice items with six or eight alternatives) was constructed to be very similar to the Raven Progressive Matrices, except that the Raven test is organized in groups, with increasingly difficult items within each group, whereas the items in the Figures Test were constructed to provide a linear increase in difficulty (which is also the case with regard to the other tests). The Word Similarities and Figures tests have remained unchanged since 1954.

The test–retest reliabilities of the Arithmetic, Figures, and Word Similarities tests as calculated from a sample (N ≈ 800) in the mid 1950s were .84, .72, and .90, respectively (Sundet, Tambs, Magnus, & Berg, 1988). The alpha coefficients of the Arithmetic, Figures, and Word Similarities tests calculated for the draft cohorts 1993–2002 were .81, .80, and .90, respectively. A general ability (GA) score is a combined measure of the performance on the three tests seen together, obtained by transforming the raw scores in a standardization sample into normally distributed F scores (M = 50, S.D. = 20). The F scores are added and subsequently transformed into stanine scores. In a small sample (N = 48), the correlation between GA and the WAIS IQ has been found to be .73 (cf. Sundet et al., 1988).

2.2. Participants and data

We have retrieved intelligence test scores from several data sets. One data set comprised GA scores for the draft cohorts from 1969 through 2001 (GA data for 1957–1959 and 2002 have been calculated from raw scores in the other data set), but no data on the separate tests. Altogether, we have accessed GA data on more than 960,000 young men over a period of 45 years. Mostly due to a greater number of seamen, the proportions having intelligence test scores were somewhat lower in the older cohorts. The average proportion over all cohorts was 0.85 (S.D. = .07).

In this file, standing height data (measured as part of the medical examination) were also available. Height data for other relevant cohorts have been compiled from the Statistical Yearbook published by Statistics Norway.

Data on the separate tests have been retrieved for the draft cohorts 1957 through 1959 and 1993 through 2002 (a small number from 2003 was included in the 2002 cohort). Altogether, subtest data were available for approximately 210,000 conscripts. In the 1957–1959 draft cohorts, test data were available for 80–85% (about 52,000) of the men eligible for drafting these years. Data for the draft cohorts 1993–2002 comprised about 55% (N ≈ 158,000) of the young men appearing before the draft board during this period. In Norway, the drafting is done separately in several regions, and it was not possible to retrieve data from all the regions for all the cohorts. In the 1993 and 1994 cohorts, there were data sets from three and five out of seven regions, respectively. Data from 1995–1997, inclusive, stemmed from four regions. Data sets from the whole country were available for the cohorts from 1998 through 2000, and for the two most recent draft cohorts, data sets from six and four regions, respectively, were retrieved. On average, about 85% of the cohorts from the relevant regions had intelligence test data.

2.3. Scales and norms

The data sets from 1957 and 1993 through 2002 contained raw scores on the tests. In the 1957–1959 data, GA in stanine scores (M = 5, S.D. = 2) according to the 1954 norms on all three tests were available. The GA scores in the 1969–2001 draft cohorts were also given on a stanine scale. In the present paper, scores have often been transformed into IQ equivalents (M = 100, S.D. = 15).

In the analyses of data, both raw score statistics and statistics on scaled scores have been used. When scaled scores have been analyzed, the test norms from 1954 have been used as the reference point for comparisons. In all the analyses applying raw scores (either in the analyses of the raw scores themselves or as basis for scaling), zero scorers have been excluded. Raw scores on the Word Similarities and Figures Tests were scaled according to the 1954 norms directly. In connection with the modernization of the contents of the Arithmetic test in 1963, stricter norms on this test were introduced. We had access only to the 1963 norms for this particular test. Flynn (1987, pp. 174–175), who included Norwegian data (cf. Rist, 1982) in his review of Western intelligence test scores, estimated that this norm change corresponded to 7.5 IQ points. This is probably a slight overcorrection. In the 1957 data set, we calculated GA in stanine units from the raw scores according to the 1963 norms for the Arithmetic test and the 1954 norms on the other two tests (Figures and Word Similarities). The GA scores calculated in this manner were compared with the GA scores according to the 1954 norms on all three tests. The difference between the mean GA scores was 0.28 stanine points (2.1 IQ points), corresponding to 6.3 IQ points on the Arithmetic test. The scaled scores on the Arithmetic test have accordingly been elevated by this amount in the data sets from 1957 and 1993–2002. In the data set comprising the 1969–2001 draft cohorts, GA scores were corrected upwards by 2.1 IQ points.

In the 1958 and 1959 draft cohorts, the scores on each test were given on an 11-point scale according to norms from 1961, which have been unavailable to us.
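The scale conversions used in this section can be checked with a short sketch. It assumes a plain linear mapping between the stanine scale (M = 5, S.D. = 2) and the IQ-equivalent scale (M = 100, S.D. = 15); the function names are illustrative only. The factor of three linking the GA shift to the Arithmetic shift is an assumption (equal weighting of the three tests in GA) that happens to reproduce the 6.3 IQ points quoted above; it is not a procedure stated in the text.

```python
STANINE_MEAN, STANINE_SD = 5.0, 2.0   # stanine scale as described in the text
IQ_MEAN, IQ_SD = 100.0, 15.0          # IQ-equivalent scale as described in the text


def stanine_to_iq(stanine: float) -> float:
    """Map a stanine score to its IQ equivalent by matching z scores."""
    z = (stanine - STANINE_MEAN) / STANINE_SD
    return IQ_MEAN + z * IQ_SD


def stanine_diff_to_iq(diff: float) -> float:
    """Convert a difference between two stanine means into IQ points."""
    return diff * IQ_SD / STANINE_SD


# Difference between the two GA means reported for the 1957 data set.
ga_shift_iq = stanine_diff_to_iq(0.28)

# Assumption: GA weights the three tests equally, so a shift on one test
# appears in GA at one third of its size.
arithmetic_shift_iq = 3 * ga_shift_iq

print(round(ga_shift_iq, 1), round(arithmetic_shift_iq, 1))
```

Under these assumptions, one stanine point corresponds to 7.5 IQ points.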
The 1954 score equivalents were calculated by taking the differences between the 1954- and 1961-normed scores in the 1957 data set. These differences were added to the means in the 1958 and 1959 data sets.

The test battery was restandardized in 1980 on the basis of data from 1974 (Storsve, 1975). The mean GA scores according to the new norms were lowered 1.02 stanine points relative to the 1954/1963 norms. In the 1969–2001 draft cohorts, where only GA scores were available, we have corrected these upwards by 7.5 IQ points for the draft cohorts from 1980 and later.

3. Results

Fig. 1 displays the secular trends in GA mean scores (IQ units) from 1954 (IQ mean set to 100) through 2002.

The GA means seem to have increased more or less linearly from the 1954 to the 1969 draft cohort. In this period, the gain in mean GA was 8.6 IQ points, corresponding to an average gain of nearly 0.6 IQ points each year. From 1970 to 1976, inclusive, the gain was 1.4 IQ points (approximately 0.2/year). From 1978 to the beginning of the 1980s, there was a remarkable decline in the mean GA scores, corresponding roughly to 1.2 IQ points, which is almost the whole increase during the period 1970–1976. From the beginning of the 1980s to the mid 1990s there was a more or less steady increase in the GA means amounting to approximately 3 IQ points, corresponding to a gain of about 0.2 IQ points each year. From the mid 1990s or so, the GA means were declining again. During the whole period 1954–2002, the gain has been 10.8 IQ points, or an average yearly increase of 0.23 IQ points, which is on the low side compared with the 0.3 gain rate found in many other industrialized countries.

Fig. 1. GA of Norwegian conscripts (in IQ score units) by year of testing [data for 1965 and 1968 (corrected) have been adopted from Flynn, 1987, Table 4].
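The yearly gain rates quoted in this section are simple averages of the total gain over the number of years in the period. A minimal sketch of that arithmetic, using only the totals and years stated in the text (the helper name `yearly_gain` is illustrative):

```python
def yearly_gain(total_gain_iq: float, start_year: int, end_year: int) -> float:
    """Average gain in IQ points per year between two test years."""
    years = end_year - start_year
    if years <= 0:
        raise ValueError("end_year must be later than start_year")
    return total_gain_iq / years


# (total gain in IQ points, first test year, last test year), as given in the text
periods = {
    "1954-1969": (8.6, 1954, 1969),
    "1970-1976": (1.4, 1970, 1976),
    "1954-2002": (10.8, 1954, 2002),
}

rates = {label: yearly_gain(*args) for label, args in periods.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.3f} IQ points per year")
```

Dividing by the difference between the end and start years is one reasonable convention; counting both end years as full years would give slightly lower rates.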

The time trends in raw scores (means, standard deviations, skewness, and kurtosis) on the Arithmetic, Figures, and Word Similarities tests for the draft cohorts 1957, 1968, 1974, 1977, and 1993 through 2002 are shown in Table 1. The data for 1968 and 1977 are from Rist (1982; cf. Flynn, 1987) and the 1974 data are from Storsve (1975).

The mean raw scores on the Arithmetic test increased steadily from 1957 to about the end of the 1960s. Keeping in mind that the content of the Arithmetic test was slightly altered in 1963, the change in mean scores between the 1957 and 1968 draft cohorts (1.3 raw score points) may be interpreted as an increase in arithmetic ability in this period. A peculiar and substantial drop of the means occurred from about 1968, followed by slight gains from the late 1970s to the early 1990s. A more or less steady decline of the Arithmetic mean scores was apparent in the draft cohorts 1993–2002. Notably, the highest mean was in the 1968 draft cohort. The gains in Word Similarities mean scores seemed to last until the mid 1970s. In the period from the mid 1970s to the early 1990s, the means remained more or less the same, followed by a quite steady decline until 2002. The mean scores on the Figures Test behaved differently. There was a quite steady increase in mean scores on this test until the late 1990s. No systematic changes were apparent from the late 1990s to 2002. Possible ceiling effects (see below) may have depressed the means of the Figures Test to some extent.

To get a clearer picture of the relative changes of the three tests, the raw scores have been transformed to F scores and then to IQ scores (Fig. 2). The 1954 norms have been used as reference in Fig. 2, allowing the inclusion of data from the 1958 and 1959 cohorts. Only summary statistics on the raw scores were available for the test years 1968, 1974, and 1977. For these years, data from Flynn (1987, Table 4) with corrected Arithmetic scores were adopted.
An extra bonus was the inclusion of data from 1963 (on Arithmetic) and from 1980, not available otherwise.

Table 1
Mean, standard deviation, skewness, and kurtosis of the raw scores on the Arithmetic, Figures, and Word Similarities tests by year of …
[Table body not recoverable from the source; only fragments of the per-year sample sizes (n) survive.]
a From Rist (1982).
b From Storsve (1975).

Fig. 2. Scores on each of the three ability tests of Norwegian conscripts (in IQ units) by year of testing [the data points for 1963, 1968, 1974, 1977, and 1980 (with corrections of the Arithmetic scores) have been adopted from Flynn, 1987, Table 4].

Fig. 2 confirms the impression that all the test means increased more or less in tandem until the late 1960s–early 1970s (test years). Thus, the relatively large gains on the GA means in this period (approximately 0.6 IQ points per year, cf. Fig. 1) were due to gains on all the tests. It seems also clear that the decrease in the GA means from the mid 1970s to the early 1980s is mainly caused by the decreasing means on the Arithmetic test, together with slightly declining mean scores on the Word Similarities Test from 1974 to 1980. The increase of GA means in the draft cohorts from the early 1980s to the mid 1990s was almost exclusively due to gains on the Figures Test (with due consideration of missing information in the years between). The decline of the GA means from the mid 1990s was apparently mainly caused by a relatively sharp decline in both the Arithmetic and Word Similarities means. During the whole period from 1954 to 2002, the gains on the Arithmetic, Word Similarities, and Figures Tests were 2.5 (0.05 IQ points per year), 9.6 (0.2 IQ points per year), and 17 (0.35 IQ points per year) IQ points, respectively. The general picture seems to be a quite steady but decreasing gain rate of the means on the Figures Test until the mid to late 1990s, but no gains later. The mean gains on the Arithmetic test scores came to a more or less complete stop somewhere in the 1960s, followed by a period of decline until about 1980. Small gains until 1993 were followed by a period of more or less steady decline. The Word Similarities Test showed steady mean gains until the mid 1970s, followed by a small decline until about 1980. The means on this test were quite unchanged from about 1980 to 1993, and declined for the rest of the observation period.

Mean changes were accompanied by changes in standard deviations, skewness, and kurtosis (Table 1). The scores on all three tests showed decreasing trends in the standard deviations in the 1993–2002 draft cohorts relative to the 1957 cohort (Table 1). On the Arithmetic test, the standard deviation in the 1993–2002 cohorts was about 85% of that in the 1957 cohort. The corresponding numbers for the Figures and Word Similarities tests were 73% and 75%, respectively. The score distributions on the Figures and Word Similarities tests both showed increasing skewness from the 1957 to the 1993–2002 conscript cohorts. These changes may either be due to ceiling effects or real-world changes of the score distributions, or both. All the means were larger than 50% of the maximum obtainable scores (that is, above 15, 18, and 27 for the Arithmetic, Figures, and Word Similarities tests, respectively). The mean of the Arithmetic test did not increase much (from 17.5 in the 1957 cohort to 17.9 in the 1993–2002 cohorts). The means of the 1968 and 1974 cohorts were higher than the 1993–2002 mean, and so were the standard deviations (Table 1). Similarly, the means of the Word Similarities scores in the 1974 and 1977 cohorts were of the same order of magnitude as in the 1993–2002 cohorts, but the standard deviations were larger. This pattern does not indicate appreciable ceiling effects in the 1993–2002 cohorts relative to the 1957 cohort with regard to the Arithmetic and Word Similarities tests. The mean score on the Figures Test in the 1957 cohort was 66% of the maximum score, increasing to 72–73% in the 1968–1977 cohorts and 77.5% in the 1993–2002 cohorts, and increasing means seem to be associated with decreasing standard deviations (Table 1). Seen in isolation, this pattern may indicate ceiling effects in the more recent cohorts relative to the older ones. On the other hand, the relative decreases of the standard deviations of the Figures scores were of the same order of magnitude as for the Word Similarities scores, indicating that at least some of the changes in the Figures standard deviations are due to real-world distribution changes.

We have addressed the question of whether the mean changes have been unevenly distributed over different ability levels by calculating the mean scores below and above the median of the 1957–1959 (only 1957 for raw scores) and the 1993–2002 groups and studying the difference between corresponding means in the two cohort groups.
The results of this analysis are shown in Table 2.

It is clear from Table 2 that the gains over cohort groups tended to be largest below the median. For instance, with regard to the Word Similarities Test, the difference between the mean scores of the 1993–2002 and the 1957–1959 draft cohorts below the median was about 11 IQ points (8.5 raw score points). Above the median, the corresponding difference was about 3 IQ points (2.4 raw score points). The corresponding differences also decreased on the Figures Test, but not to the same extent (about 19 IQ points below the median and 12 IQ points above the median). Due to possible ceiling effects on this particular test, the gains in means above the median may, to some extent, have been suppressed. With regard to the Arithmetic test, there was a gain in the mean scores below the median, and a slight loss in the above median mean scores in the 1993–2002 cohorts relative to the 1957–1959 cohorts, indicating a lower prevalence of both low and high scorers in the more recent cohorts.

Table 2
Mean scores in IQ units and raw scores (in brackets) below and above the median for the Arithmetic, Figures, and Word Similarities tests in the 1957–1959 (1957) and 1993–2002 draft cohorts

              Mean scores below median                   Mean scores above median
Test year     Arith        Figures       Word Sim       Arith         Figures       Word Sim
1957–1959     90.7 (13.5)  87.4 (19.3)   90.6 (17.6)    115.8 (22.5)  114.9 (27.6)  115.9 (37.6)
1993–2002     93.6 (14.3)  106.5 (25.1)  101.5 (26.1)   114.5 (21.9)  126.3 (30.8)  119.0 (40.7)
Difference    2.9 (0.8)    19.1 (5.8)    10.9 (8.5)     −1.3 (−0.6)   11.4 (3.2)    3.1 (3.1)

Arith = arithmetic. Word Sim = word similarities.
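The difference row of Table 2 is the later cohort mean minus the earlier one, taken separately below and above the median. The sketch below recomputes it from the IQ-unit means as transcribed from the table; the dictionary layout and the name `cohort_gain` are illustrative choices, not part of the original analysis.

```python
# IQ-unit means from Table 2: (1957-1959 cohorts, 1993-2002 cohorts)
table2_iq = {
    "Arith": {"below": (90.7, 93.6), "above": (115.8, 114.5)},
    "Figures": {"below": (87.4, 106.5), "above": (114.9, 126.3)},
    "Word Sim": {"below": (90.6, 101.5), "above": (115.9, 119.0)},
}


def cohort_gain(pair: tuple[float, float]) -> float:
    """Later cohort mean minus earlier cohort mean, rounded to one decimal."""
    early, late = pair
    return round(late - early, 1)


gains = {
    test: {half: cohort_gain(pair) for half, pair in halves.items()}
    for test, halves in table2_iq.items()
}

for test, by_half in gains.items():
    print(f"{test}: below median {by_half['below']:+.1f}, "
          f"above median {by_half['above']:+.1f}")
```

A larger gain below the median than above it is what the text reads as a declining prevalence of low scorers; the negative value for Arithmetic above the median corresponds to the slight loss noted there.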

Fig. 3. Mean standing height and mean GA (both in z-score units + 5) by year of testing.

The mean standing height among Norwegian conscripts increased from about 176.5 cm in the mid 1950s (Statistical Yearbook, 1960) to about 178.8 cm in 1969. The height means further increased to about 179.6 cm in 1987. From about 1987 or so, no systematic change was apparent. Standard deviations of the height distributions increased slightly in the draft cohorts 1969–2002. Fig. 3 shows the means of standing height and GA (both transformed to z scores in an aggregated file, with 5 added to remove negative numbers) according to draft year.

It can be seen from Fig. 3 that the mean standing height and mean GA follow each other quite closely. The within-person correlation between standing height and GA showed a continuous decrease from .17 to .14 in the period 1969–2002.

Fig. 4. Mean height below and above the median height by year of testing.
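The transformation behind Fig. 3 (standardize, then add 5 so that no plotted value is negative) can be sketched as follows. The three height means are the approximate values quoted in the text; standardizing over yearly means with the population standard deviation is an assumption made for illustration, since the text does not specify how the aggregated file was standardized.

```python
from statistics import mean, pstdev


def shifted_z_scores(values: list[float], shift: float = 5.0) -> list[float]:
    """Standardize values to z scores and add a constant shift."""
    m = mean(values)
    sd = pstdev(values)  # population S.D.; an assumption, see lead-in
    if sd == 0:
        raise ValueError("values must not all be identical")
    return [(v - m) / sd + shift for v in values]


# Approximate mean heights (cm) quoted in the text: mid 1950s, 1969, 1987
height_means_cm = [176.5, 178.8, 179.6]

height_z_plus_5 = shifted_z_scores(height_means_cm)
print([round(z, 2) for z in height_z_plus_5])
```

Putting height and GA on this common scale is what allows the two series to be drawn on the same axis and compared directly.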

Table 3
Intercorrelations between the test scores in IQ units and raw scores (in brackets) in the 1957–1959 (1957) draft cohorts (lower triangle) and in the 1993–2002 cohorts (upper triangle)

              (1)          (2)          (3)
(1) Arith     –            .54 (.56)    .57 (.60)
(2) Fig       .64 (.68)    –            .48 (.49)
(3) Word S    .73 (.75)    .61 (.64)    –

Arith = arithmetic. Fig = figures. Word S = word similarities.

Fig. 4 displays the change trends above and below the median height relative to the 1969 draft cohort. The gains above the median were at least as large as the gains below the median in all the cohorts from 1969 to 2002.

The intercorrelations between the tests in raw scores and IQ units are displayed in Table 3. It is seen that the correlations were somewhat lower in the 1993–2002 than in the 1957–1959 (1957) draft cohorts. The intercorrelations (raw scores) reported by Storsve (1975) were about midway between the correlations shown in Table 3.

The possible range restrictions of the Figures scores complicate the interpretation of the declining intercorrelations. Assuming that range restrictions are the main cause of the declines, the largest decline should be observed in the correlations between Figures and the other two tests, whereas the Arithmetic–Word Similarities correlation should remain approximately the same. This is not in accordance with the observed declines seen in Table 3. Thus, the correlation between the Arithmetic and Word Similarities tests in the 1993–2002 cohorts declined by 22% (20% in the raw scores) relative to the 1957–1959 (1957) cohorts, whereas the correlation decline was 16% (18% in the raw scores) between Arithmetic and Figures. The observed decline in the correlation between Figures and Word Similarities scores was 21% on the IQ-unit scale and 24% on the raw score scale. This pattern indicates that most of the observed declines in the intercorrelations are real-world declines, entailing a less pronounced g factor in the more recent cohorts.

4. Discussion

The tests used to assess the intellectual ability among Norwegian conscripts are representative of subtests regularly included in standard intelligence tests. Thus, the mathematics and language tests are similar to the subtests in WAIS, and the nonverbal test was explicitly constructed to be similar to the Raven Progressive Matrices Test. In the Cattell (1987) system, the first two tests measure crystallized intelligence, whereas the last one measures fluid intelligence. The GA score, obtained by combining the scores on the three tests, correlates substantially (in the .70s) with the WAIS IQ scores. The GA scores reported in the present paper are thus quite comparable with the scores obtained on standard intelligence tests.

Test scores have been obtained from very large samples comprising a large proportion of Norwegian (male) conscripts. The selection due to the exclusion of the physically or psychologically disabled has probably been more or less the same from year to year and is not likely to affect the observed secular trends in intelligence test scores.

The potentially most serious threat to the present results seems to be the possible ceiling effects on the Raven-like Figures Test that may have suppressed the means and standard deviations of the scores in newer cohorts relative to older ones.

Earlier reports have monitored secular trends up to the birth cohorts around 1980 (Colom, Juan-Espinosa, & García, 2001; Emanuelsson et al., 1993; Teasdale & Owen, 2000). The present results extend the time window to the birth cohorts in the mid 1980s. The mean scores of all the three tests show quite substantial gain rates in the conscript cohorts from the mid 1950s to the late 1960s–mid 1970s (Fig. 2 in the present paper, and Table 4, Flynn, 1987), and the mean GA scores increased accordingly in this period (Fig. 1).

The peculiar drop in Arithmetic mean scores in the conscript cohorts from about the end of the 1960s to about the mid 1970s (Fig. 2 and Table 1) was so extensive that it caused a drop in the GA scores (Fig. 1). Rist (1982) convincingly argued that this particular decrease is due to the teaching of modern mathematics (more algebra) at the expense of training in arithmetic operations. This program was terminated after a few years. Despite some gains from about 1975 to 1993, the means on the Arithmetic test never quite reached the 1968 level. The means of the Word Similariti
