d_g11

: HH16fesd; Spoken discussion on problems between kids/people. The text was elicited during the first session held with the subject, immediately after the written text was collected. The investigator sat opposite the subject, said the subject's name into the tape recorder, and then said: "Now I would like you to talk about the subject (problems between people). Discuss it and present your ideas as if you are standing in front of a class. Do not tell a story but discuss the subject. You can take some time to think, and when you are ready, start speaking." If the subject asked questions such as "Must I give my opinion?", "Should I give reasons why it's not okay?", or "Should I say what I think about such things?", the response was: "You can do that too / as well." There was no dialogue or interaction during tape recording. When subjects asked questions during tape recording, the investigator shrugged or nodded, or repeated the instructions given. The recording took place in the morning, in a separate classroom at the school.
The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (7 levels), Age level (4 levels), Genre (2 levels), and Modality (2 levels). The seven languages were Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s who were graduate-level university students. The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see Berman's final report to the Spencer Foundation, Chicago, Developing Literacy In Different Contexts and In Different Languages (Tel Aviv University, October 2001).
Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions. The questionnaire that followed text elicitation was written and answered in Hebrew.
This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Expository', Performance = 'Unspecified'. These values have been added as Keys to the Content information.
This is the second part of the first meeting with the subject, who was asked to talk about problems between kids/people. The recording took place at school. The topic was presented by the investigator, who gave a short explanation and showed the subject a short, wordless video tape at the beginning of the session. After writing about the topic, the subject was asked to give a talk discussing problems between people, and to present his/her ideas as if standing in front of a class. The subject was asked not to tell a story but to discuss the topic. The subject was given time to think and was told to start speaking when ready. Once the subject began talking and the tape recorder was switched on, there was no further verbal interaction; questions were answered with a nod, a shrug, or a repetition of the instructions. The recorded text was transcribed and coded according to the CHAT format.
H16 was the only participant in the session. The investigator participated only at the beginning of the recording, to give instructions.
The mirror orthographic file is not available to the public.
The orthographic annotation is a replica or mirror version of the tape-recorded spoken text. This version contains general information about the subject and a broad phonemic transcription (in Roman characters) of what the subject said, including disfluencies (pauses, hesitation phenomena, prosodic information, false starts, repetitions) and exchanges between the subject and the investigator. The transcription system was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. Any exchange between the subject and the investigator is set apart from the main body of the text, with the general header SBJ marking the subject's tier and the general header INV marking the investigator's tier. The phonemic transcription follows both the CHAT conventions (MacWhinney, 1995) and special conventions formulated by the project for transcribing spoken and written Hebrew texts, to make them accessible to cross-linguistic analysis. The text itself is divided into clauses based on the criteria specified in Berman & Slobin (1994: 660-663), who defined a clause as a unified predicate, including aspectual and modal modifications. Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%.
The phonemic transcription file is not available to the public.
The phonemic annotation is a standardized version of the spoken text, stripped of disfluencies. This version contains a broad phonemic transcription (in Roman characters) of what the subject said, but omits, and in some cases corrects or standardizes, deviations from normative linguistic form and use, such as hesitations, false starts, and repairs. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morphosyntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on the criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription.
The morphological analysis is not available to the public.
The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes.
Each word was coded semi-automatically with the help of a unique lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun is in the genitive case, marked in Hebrew by a suffix that varies by gender, number, and person; for verbs, whether the verb is lexical, aspectual, copular, etc., together with its root, verb pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of TTR (type-token ratio) and VOCD calculations.
The syntactic analysis is not available to the public.
The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, or Gapped. Six types of links were used to code the relationships between each pair of clauses, among them finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain the coding of a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay, 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used to compare Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002).
The media file is not available to the public.
Aparici, M., Tolchinsky, L., & Rosado, E. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. Spain: University of Barcelona, 95-122.
Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University.
Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University.
Berman, R. A., & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, 5, 1-44.
Katzenberger, I. E. In press. The development of clause packaging in spoken and written texts. Journal of Pragmatics.
Katzenberger, I. E., & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184.
MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.
Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, 5, 135-162.
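The records above report that clause division reached nearly 95% inter-judge agreement on a 10% sample of texts, but they do not say how agreement was computed. One common option is simple percent agreement over candidate boundary positions. The Python sketch below assumes that method; the function name `boundary_agreement`, the word count, and the boundary positions are hypothetical illustrations, not project data.

```python
def boundary_agreement(n_words: int, coder_a: set[int], coder_b: set[int]) -> float:
    """Percent agreement on clause boundaries between two coders.

    A boundary at position p means "a new clause starts after word p".
    Every gap between adjacent words (1 .. n_words - 1) is a decision point,
    and the coders agree on a gap if both mark it or both leave it unmarked.
    """
    if n_words < 2:
        raise ValueError("need at least two words to have a boundary decision")
    gaps = range(1, n_words)
    agreed = sum((p in coder_a) == (p in coder_b) for p in gaps)
    return agreed / len(gaps)


# Hypothetical 21-word text: the coders differ only on the gap after word 12,
# so they agree on 19 of the 20 gaps.
score = boundary_agreement(21, {4, 9, 15}, {4, 9, 12, 15})
print(f"{score:.0%}")  # 95%
```

Simple percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be a stricter check, but the source does not state which measure the project used.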

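The %MOR description above says that every word carries its base form (lexeme), so that different forms of the same lexeme count as a single type in TTR (type-token ratio) calculations. The Python sketch below illustrates that counting rule. It assumes a simplified tier layout of whitespace-separated `pos|stem` tokens, with `-` or `&` introducing inflectional codes; the project's lexicon file and exact codes are not public, so the helper names and sample tokens are hypothetical. VOCD, which fits a model curve to TTR values from repeated random samples of the text, is not reproduced here.

```python
import re


def mor_tokens(tier: str) -> list[str]:
    """Return the word tokens of one %MOR tier, skipping the label and punctuation."""
    # Drop a leading "%mor:" / "%MOR:" label if present.
    if tier.lstrip().startswith("%"):
        _, _, tier = tier.partition(":")
    # Keep only tokens shaped like pos|stem; bare punctuation such as "." is skipped.
    return [tok for tok in tier.split() if "|" in tok]


def lexeme(token: str) -> str:
    """Reduce 'pos|stem-SUFFIX&FEATURE' to 'pos|stem' (assumed token layout)."""
    pos, _, rest = token.partition("|")
    stem = re.split(r"[-&]", rest, maxsplit=1)[0]
    return f"{pos}|{stem}"


def ttr(tiers: list[str], by_lexeme: bool = True) -> float:
    """Type-token ratio over all tiers; types are lexemes or surface tokens."""
    tokens = [tok for tier in tiers for tok in mor_tokens(tier)]
    if not tokens:
        raise ValueError("no tokens found")
    types = {lexeme(t) if by_lexeme else t for t in tokens}
    return len(types) / len(tokens)


# Hypothetical tiers: yeled 'child' appears in singular and plural forms,
# and one verb appears in past and present forms.
sample = [
    "%mor:\tn|yeled-PL v|hitvakeax&PAST .",
    "%mor:\tn|yeled v|hitvakeax&PRES prep|im n|yeled .",
]
print(round(ttr(sample), 3))                   # 0.5   (3 lexemes / 6 tokens)
print(round(ttr(sample, by_lexeme=False), 3))  # 0.833 (5 surface forms / 6 tokens)
```

The gap between the two printed values is the point of lexeme coding: counting surface forms would treat the singular and plural nouns, and the two verb forms, as separate types and inflate the ratio.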
: HH16fewd; Written discussion on problems between kids/people. The text was elicited during the first session held with the subject. At the start of the session, the investigator gave a general introduction, saying: "We are conducting international research for the university. We are collecting material for this, and you will be asked to talk and also to write. I hope you don't mind if we record you / being recorded / if it is recorded." Following this, the investigator said: "Now I am going to ask you to speak and to write about situations of conflict / problems between people / problems in school (life) / different kinds of predicaments. First I will show you a short video that was filmed in a school to give you an idea of what I mean / It will give you an idea what I mean." These instructions were followed by the investigator and subject watching together a three-minute wordless video that showed vignette scenes of conflict (moral, social, and physical) between young people in a culturally unidentifiable school setting. The investigator then sat down next to the subject and said: "You have seen different kinds of problems / conflicts / situations / predicaments in the video. We would like to know what children / people think about this subject. We are making a collection / are collecting compositions / essays on the topic of problems between people / in school life. So I would like you to write about the subject. Discuss the topic, and present your ideas in writing. Do not write a story, but a composition. You can take some time to think, and then start writing." If the subject asked "Must I give my opinion?", "Should I give reasons why it's not okay?", "Should I write what I think about such things?", or "Should I write what I feel about it?", the response was: "You can do that too / as well." Subjects were made to feel that they could include such ideas and information, but that this was not exactly what was being asked of them and was not all they should write. If the subject asked "Can I take notes?", the response was: "If you like." There was no interaction during writing. In cases where subjects asked questions, the investigator shrugged or nodded, or simply repeated the instructions given earlier. The elicitation of the written text took place in the morning, in a separate classroom at the school.
The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (7 levels), Age level (4 levels), Genre (2 levels), and Modality (2 levels). The seven languages were Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s who were graduate-level university students. The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see Berman's final report to the Spencer Foundation, Chicago, Developing Literacy In Different Contexts and In Different Languages (Tel Aviv University, October 2001).
Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions. The questionnaire that followed text elicitation was written and answered in Hebrew.
This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Expository', Performance = 'Unspecified'. These values have been added as Keys to the Content information.
This is the first meeting with the subject, who was asked to write an essay discussing problems between kids/people. The meeting took place at school. The topic was presented by the investigator, who gave a short explanation and showed the subject a short, wordless video tape at the beginning of the session. The subject was then asked to write about problems between people, discuss them, and present his/her ideas. The subject was asked not to write a story but to discuss the topic. The subject was then given time to think and was asked to start writing when ready. Once the subject began writing, there was no interaction; questions were answered with a nod, a shrug, or a repetition of the instructions. The written text was transcribed and coded according to the CHAT format.
H16 was the only participant in the session.
The mirror orthographic file is not available to the public.
The orthographic annotation is a replica or mirror version of the written Hebrew text. This version contains general information about the subject and a computerized replica of the subject's text, which was handwritten in Hebrew characters. This mirror version (also in Hebrew orthography) of what the subject wrote includes spelling errors, erasures, repairs, and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlining, and special characters as in the original written text.
The phonemic transcription file is not available to the public.
The phonemic annotation is a standardized or stripped version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morphosyntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on the criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription.
The morphological analysis is not available to the public.
The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes.
Each word was coded semi-automatically with the help of a unique lexicon file so that the following information is available: the lexical category of the word; if Noun, whether it is in Genitive case, in Hebrew a suffix that varies by gender, number and person; if Verb - whether lexical, aspectual, copular, etc; if Verb - verb root, verb pattern, and tense; and for all words -- base form (lexeme) so that different forms of the same lexeme are coded as a single type, for purposes of TTR and VOCD calculations., The syntactic analysis is not available to the public., The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses: finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, follwing CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberg and Cahana-Amitai 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a major in linguistics, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used for the comparison of Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitai, val Hell, Kriz & Viguié-Simon, 2002)., The original written text is not available to the public., Aparici, M., L. Tolchinsky & E. Rosado (2000). On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. 
Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A (ed.), working papers in developing literacy across genres, modalities, and languages. Tel-Aviv University. Berman, R.A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R.A., and Verhoeven,L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E. & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40-6, 1161-1184. Verhoeven, L,. Aparici, M., Cahana-Amitai, D., val Hell, J., Kriz. S., & Viguie-Simon A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, Volume 5, 135-162 MacWhinney, Brian. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum., The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel-Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are find shared or different trends depending on the language they use. The study devolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were: Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. 
The four age groups were gradeschool (G), junior highschool (J), highschool (H), and adults in their 20s and 30s, graduate-level university students. The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001), Hebrew is the only language used in this session, both by the participant and by the investigator, during text production and in the preceding instructions. The questionnaire that followed text elicitation was written and answered in Hebrew., This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre is replaced by two elements named "Genre" and "SubGenre". The original content of Genre substructure was: Interactional = 'Unspecified', Discursive = 'Expository', Performance = 'Unspecified'. These values have been added as Keys to the Content information., This is the first meeting with the subject, who was asked to write an essay discussing problems between kids/people. The meeting took place at school. The topic was presented by the investigator, who gave a short explanation and showed a short, wordless video tape to the subject at the beginning of the session. The subject was then asked to write about problems between people, discuss them and present his/her ideas. The subject was asked not to write a story but to discuss the topic. The subject was then given time to think and was requested to start writing when ready. After the subject began writing there was no interaction, questions were answered with a nod, a shrug or by repeating the instructions. 
The written text was transcribed and coded according to CHAT format.
H16 was the only participant in the session.
The mirror orthographic file is not available to the public.
The orthographic annotation is a replica, or mirror version, of the written Hebrew text. This version contains general information about the subject and a computerized replica of the subject's text, which was handwritten in Hebrew characters. This mirror version (also in Hebrew orthography) reproduces what the subject wrote, including spelling errors, erasures, repairs, and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlining, and special characters.
The phonemic transcription file is not available to the public.
The phonemic annotation is a standardized, or stripped, version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663).
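In a CHAT file, each main tier (here one *SBJ line per clause) can be followed by dependent tiers such as %mor and %syn, with file headers marked by @. A minimal Python sketch of how such tiers might be read back into per-clause records follows. It assumes one tier per line (CHAT continuation lines are not handled), and the sample lines, the %syn codes, and the name `group_tiers` are hypothetical, since the H16 annotation files are not public.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    """One main-tier line (e.g. *SBJ) plus its dependent tiers."""
    speaker: str
    text: str
    dependents: dict = field(default_factory=dict)


def group_tiers(lines):
    """Attach each %-dependent tier to the main tier that precedes it.

    Assumes one tier per line: main tiers start with '*', dependent
    tiers with '%', headers with '@'. Continuation lines are not handled.
    """
    clauses = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            clauses.append(Clause(speaker, text.strip()))
        elif line.startswith("%") and clauses:
            tier, _, value = line[1:].partition(":")
            clauses[-1].dependents[tier] = value.strip()
        # '@' headers and anything else are ignored in this sketch
    return clauses


# Hypothetical fragment: placeholder clause text and invented %syn codes.
sample = [
    "@Begin",
    "*SBJ:\tfirst clause here .",
    "%syn:\tMAIN",
    "*SBJ:\tsecond clause here .",
    "%syn:\tSUB",
    "@End",
]

for clause in group_tiers(sample):
    print(clause.speaker, clause.dependents.get("syn"), clause.text)
```

Because each *SBJ line holds a clause rather than a turn, a record like this corresponds to one clause, which is the unit the %SYN coding and the clause-package segmentation build on.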
Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription.
The morphological analysis is not available to the public.
The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes. Each word was coded semi-automatically with the help of a dedicated lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun is in the genitive case (in Hebrew, a suffix that varies by gender, number, and person); for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of type-token ratio (TTR) and VOCD calculations.
The syntactic analysis is not available to the public.
The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis).
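The clause-division reliability above is reported as a single percentage ("nearly 95%") without a formula. One common reading is simple percentage agreement over the positions where a boundary could fall; the Python sketch below computes that, together with Cohen's kappa, which corrects for chance agreement. The word-by-word boundary encoding and the sample data are assumptions for illustration, not the project's documented procedure.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Percentage of items on which two coders made the same decision."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must judge the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b) / 100.0
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if both coders labelled items independently.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one and the same category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical boundary decisions: 1 = "a clause ends after this word".
coder_1 = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
coder_2 = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

print(percent_agreement(coder_1, coder_2))       # 95.0
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.875
```

When boundaries are sparse, the many agreed "no boundary" positions inflate raw percentage agreement, which is why kappa comes out lower for the same data.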
Of the three clause-package coders, two worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used for the comparison of Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon 2002).
Aparici, M., Tolchinsky, L., & Rosado, E. 2000. On defining longer units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. Spain: University of Barcelona, 95-122.
Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University.
Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University.
Berman, R. A., & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, 5, 1-44.
Katzenberger, I. E. In press. The development of clause packaging in spoken and written texts. Journal of Pragmatics.
Katzenberger, I. E., & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184.
MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.
Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, 5, 135-162.

: HH16fnsd; Spoken personal-experience narrative about a situation of interpersonal conflict with other kids/people. The text was elicited during the second part of the second session held with the subject, immediately following elicitation of the written text. After the subject finished writing, the investigator sat next to the subject and said: "Now I would like you to please tell me the story." If the subject asked questions like 'Should I tell the same story?', the response was: 'Talk about the same incident'. The subject was again asked not to tell what s/he saw in the video, but to tell a (personal) story about something that s/he had experienced. The subject was then given time to think before speaking. If the subject said something like "But I never had any such problem" or asked "Should I talk about problems that were shown in the video?", the response was to repeat the instructions. There was no dialogue or conversational interaction during recording. If subjects asked questions during the recording, the investigator shrugged or nodded, or repeated the instructions she had given before. The recording took place in the morning, in a separate classroom at the school, at most three days after the first session.
The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults (graduate-level university students in their 20s and 30s). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001).
Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions.
This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The Genre substructure was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Narrative', Performance = 'Unspecified'. These values have been added as Keys to the Content information.
This is the second half of the second meeting with the subject, who tells a personal story about a problem s/he had with other kids/people. The meeting took place at school. The topic was presented by the investigator, who gave a short explanation and showed a short, wordless videotape to the subject at the beginning of the session. After first writing a story about a personal problem s/he had encountered with other people, the subject was asked to tell the story. The subject was asked not to tell what s/he saw in the video, but to describe a personal experience. The subject was given time to think and was asked to start speaking when ready.
After the subject began talking, there was no interaction; questions were answered with a nod, a shrug, or by repeating the instructions. The spoken text was transcribed and coded according to CHAT format.
H16 was the only participant in the session. The investigator appears only at the beginning of the recording, giving instructions.
The mirror orthographic file is not available to the public.
The orthographic annotation is a replica, or mirror version, of the tape-recorded spoken text. This version contains general information about the subject and a broad phonemic transcription (in Roman characters) of what the subject said, including disfluencies (pauses, hesitation phenomena, prosodic information, false starts, repetitions) and interchanges between the subject and the investigator. The transcription system was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. Any exchange between the subject and the investigator is set apart from the main body of the text, with the header SBJ marking the subject's tier and the header INV marking the investigator's tier. The phonemic transcription follows both the CHAT conventions (MacWhinney 1995) and special conventions formulated by the project for transcribing spoken and written Hebrew texts to make them accessible to cross-linguistic analysis. The text itself is divided into clauses based on criteria specified in Berman & Slobin (1994: 660-663), who defined a clause as a unified predicate including aspectual and modal modifications. Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%.
The phonemic transcription file is not available to the public.
The phonemic annotation is a standardized version of the spoken text, stripped of disfluencies.
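The contrast between the mirror version (disfluencies kept) and the standardized version (disfluencies stripped) can be illustrated mechanically. The Python sketch below removes a few general CHAT markers: retracings such as [/] and [//] (with their scope), pauses such as (.) or (1.5), and &-prefixed codes such as fillers. These patterns are assumptions based on general CHAT conventions, not on the project's Hebrew-specific conventions, the sample lines are hypothetical English placeholders, and the project's standardization also involved corrections that no regular expression reproduces.

```python
import re

# Assumed subset of general CHAT markers (not the project's Hebrew-specific set).
RETRACED_GROUP = re.compile(r"<[^>]*>\s*\[/{1,3}\]")  # <we were> [/]  or  <he went> [//]
RETRACED_WORD = re.compile(r"\S+\s*\[/{1,3}\]")       # the [/]  (scope = preceding word)
PAUSE = re.compile(r"\((?:\.{1,3}|\d+(?:\.\d+)?)\)")  # (.)  (..)  (...)  (1.5)
AMPERSAND_CODE = re.compile(r"&\S+")                  # fillers like &-uh; also fragments and &=events
SPACES = re.compile(r"\s+")


def strip_disfluencies(line: str) -> str:
    """Turn a mirror-style CHAT line into a rough 'stripped' line."""
    text = RETRACED_GROUP.sub(" ", line)   # drop retraced multi-word groups first
    text = RETRACED_WORD.sub(" ", text)    # then single retraced words
    text = PAUSE.sub(" ", text)
    text = AMPERSAND_CODE.sub(" ", text)
    return SPACES.sub(" ", text).strip()


# Hypothetical mirror-version lines.
print(strip_disfluencies("&-uh <we were> [/] we were (.) sitting &-um outside ."))  # we were sitting outside .
print(strip_disfluencies("<he went> [//] he came home ."))                          # he came home .
print(strip_disfluencies("the the [/] boy left ."))                                 # the boy left .
```

Multi-word retracings are removed before single-word ones so that a bracketed group is deleted as a whole rather than only its last word.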
The standardized version contains a broad phonemic transcription (in Roman characters) of what the subject said, but omits, and in some cases corrects or standardizes, deviations from normative linguistic form and use, such as hesitations, false starts, and repairs. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription.
The morphological analysis is not available to the public.
The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes.
Each word was coded semi-automatically with the help of a dedicated lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun is in the genitive case (in Hebrew, a suffix that varies by gender, number, and person); for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of type-token ratio (TTR) and VOCD calculations.
The syntactic analysis is not available to the public.
The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used for the comparison of Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon 2002).
The media file is not available to the public.
Aparici, M., Tolchinsky, L., & Rosado, E. 2000. On defining longer units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. Spain: University of Barcelona, 95-122.
Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University.
Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University.
Berman, R. A., & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, 5, 1-44.
Katzenberger, I. E. In press. The development of clause packaging in spoken and written texts. Journal of Pragmatics.
Katzenberger, I. E., & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184.
MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.
Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, 5, 135-162.

: HH16fnwd; Written personal-experience narrative about a situation of interpersonal conflict with other kids/people. The text was elicited during the second session held with the subject. The investigator sat opposite the subject and said: "Yesterday / a couple of days ago / a little while ago / before lunch you saw a video that showed different kinds of problems between people / in school life / situations where people do not agree. We are making a collection of stories about problems between people / problems in school life / conflicts / situations / predicaments. So, I would like you to write about a time / an incident when you had / encountered a problem with someone. Don't write about what you saw in the video, write a (personal) story about something that happened to you / something you experienced. You can take your time." If the subject said something like "But I never had any such problem" or asked "Should I talk about problems that were shown in the video?", the response was to repeat the instructions. If the subject asked 'Can I take notes, make a draft?', the response was: 'Yes, if you like'. If the subject asked 'How long should it be?', the response was: 'Whatever you like'. The investigator then said: "Let me know when you are done" and either left the room or remained in the room, engaged in some other activity. There was no dialogue or interaction during writing. If subjects asked questions, the investigator shrugged or nodded, or repeated the instructions. The session took place in the morning, in a separate classroom at the school, at most three days after the first session.
The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI.
The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults (graduate-level university students in their 20s and 30s). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001).
Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions.
This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The Genre substructure was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Narrative', Performance = 'Unspecified'. These values have been added as Keys to the Content information.
This is the second meeting with the subject, who writes a personal story about a problem s/he had with other kids/people. The meeting took place at school, one day after the first meeting. The topic was presented by the investigator.
Since this was the second meeting, the video was recalled (not shown again) at the beginning of the session, and the subject was asked to write a story about a personal problem s/he had encountered with other people. The subject was asked not to write about what s/he saw in the video, but to describe a personal experience. The subject was given time to think and was asked to start writing when ready. After the subject began writing, there was no interaction; questions were answered with a nod, a shrug, or by repeating the instructions. The written text was transcribed and coded according to CHAT format.
H16 was the only participant in the session.
The mirror orthographic file is not available to the public.
The orthographic annotation is a replica, or mirror version, of the written Hebrew text. This version contains general information about the subject and a computerized replica of the subject's text, which was handwritten in Hebrew characters. This mirror version (also in Hebrew orthography) reproduces what the subject wrote, including spelling errors, erasures, repairs, and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlining, and special characters.
The phonemic transcription file is not available to the public.
The phonemic annotation is a standardized, or stripped, version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions.
The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription.
The morphological analysis is not available to the public.
The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes. Each word was coded semi-automatically with the help of a dedicated lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun is in the genitive case (in Hebrew, a suffix that varies by gender, number, and person); for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of type-token ratio (TTR) and VOCD calculations.
The syntactic analysis is not available to the public.
The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition.
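Because the %MOR tier records a base form (lexeme) for every word, TTR can be computed over lexemes rather than surface forms, so that inflected forms of one lexeme count as a single type. The Python sketch below contrasts the two counts. It assumes %mor items of the common CHAT shape `pos|stem`, with inflection marked after `&` or `-`; the project's lexicon codes are not public, so the sample items (English, for readability) and the helper names are hypothetical. VOCD, a measure designed to be less sensitive to text length, is not reproduced here.

```python
import re


def lexeme_of(mor_item: str) -> str:
    """Reduce a CHAT-style %mor item such as 'v|go&PAST' to 'v|go'.

    Assumption: the stem ends at the first '&' (fusional suffix) or
    '-' (affix); prefixes, clitics and compounds are not handled.
    """
    pos, _, rest = mor_item.partition("|")
    stem = re.split(r"[&-]", rest, maxsplit=1)[0]
    return f"{pos}|{stem}"


def type_token_ratio(tokens) -> float:
    """Number of distinct types divided by number of tokens."""
    tokens = list(tokens)
    if not tokens:
        raise ValueError("cannot compute TTR for an empty text")
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: surface words with their %mor items.
surface = ["I", "went", "to", "school", "they", "go", "to", "schools"]
mor = ["pro|I", "v|go&PAST", "prep|to", "n|school",
       "pro|they", "v|go", "prep|to", "n|school-PL"]

surface_ttr = type_token_ratio(w.lower() for w in surface)
lexeme_ttr = type_token_ratio(lexeme_of(m) for m in mor)
print(surface_ttr, lexeme_ttr)  # 0.875 0.625
```

Collapsing "went"/"go" and "school"/"schools" into two lexemes lowers the ratio from 7/8 to 5/8, which is the effect of coding different forms of the same lexeme as a single type.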
The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers code a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used to compare Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002)., The original written text is not available to the public., Aparici, M., Tolchinsky, L. & Rosado, E. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University. Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R. A. & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E. & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184. Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S. & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis.
Written Language and Literacy, Volume 5, 135-162. MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum., The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were: Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s (graduate-level university students). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001)., Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions., This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Narrative', Performance = 'Unspecified'.
These values have been added as Keys to the Content information., This is the second meeting with the subject, who was asked to write a personal story about a problem s/he had with other kids/people. The meeting took place at school, one day after the first meeting. The topic was presented by the investigator. Since this was the second meeting, the video was recalled (not shown again) at the beginning of the session, and the subject was asked to write a story about a personal problem s/he had encountered with other people. The subject was asked not to write about what s/he had seen in the video, but to describe a personal experience. The subject was given time to think and was asked to start writing when ready. Once the subject began writing there was no interaction; questions were answered with a nod, a shrug or by repeating the instructions. The written text was transcribed and coded according to CHAT format., H16 was the only participant in the session., The mirror orthographic file is not available to the public., The orthographic annotation is a replica, or mirror version, of the written Hebrew text. This version contains general information about the subject and a computerized replica of the subject's text, which was handwritten in Hebrew characters. This mirror version (also in Hebrew orthography) of what the subject wrote preserves spelling errors, erasures, repairs and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlining, and special characters as in the original written text., The phonemic transcription file is not available to the public., The phonemic annotation is a standardized, or stripped, version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes.
The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications, combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system used was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on the criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was carried out by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription., The morphological analysis is not available to the public., The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation codes whole words, not morphemes. Each word was coded semi-automatically with the help of a dedicated lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun carries a genitive suffix, which in Hebrew varies by gender, number and person; for verbs, whether the verb is lexical, aspectual, copular, etc., together with its root, verb pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of type-token ratio (TTR) and VOCD calculations., The syntactic analysis is not available to the public., The syntactic coding appears on a %SYN tier, following CLAN conventions.
This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, or Gapped. Six types of links were used to code the relationship between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers code a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used to compare Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002)., Aparici, M., Tolchinsky, L. & Rosado, E. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3. Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University. Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R. A. & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E.
& Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184. Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S. & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, Volume 5, 135-162. MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.

: HH18mesd; Spoken discussion on problems between kids/people. The text was elicited during the first session held with the subject, immediately after the written text was collected. The investigator sat opposite the subject, said the subject's name into the tape recorder, and then said: "Now I would like you to talk about the subject (problems between people). Discuss it and present your ideas as if you are standing in front of a class. Do not tell a story but discuss the subject. You can take some time to think, and when you are ready, start speaking." If the subject asked questions like "Must I give my opinion?", "Should I give reasons why it's not okay?" or "Should I say what I think about such things?", the response was: "You can do that too / as well". There was no dialogue or interaction during tape-recording. When subjects asked questions during tape-recording, the investigator shrugged or nodded, or repeated the instructions given. The recording took place in the morning, in a separate classroom at the school., The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were: Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish.
The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s (graduate-level university students). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001)., Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions. The questionnaire that followed text elicitation was written and answered in Hebrew., This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Expository', Performance = 'Unspecified'. These values have been added as Keys to the Content information., This is the second part of the first meeting with the subject, who was asked to talk about problems between kids/people. The recording took place at school. The topic was presented by the investigator, who gave a short explanation and showed a short, wordless videotape to the subject at the beginning of the session. After writing about the topic, the subject was asked to give a talk discussing problems between people, and to present his/her ideas as if standing in front of a class. The subject was asked not to tell a story but to discuss the topic. The subject was given time to think and was told to start speaking when ready. Once the subject began talking and the tape recorder was switched on, there was no further verbal interaction; questions were answered by a nod, a shrug or by repeating the instructions.
The recorded text was transcribed and coded according to CHAT format., H18 was the only participant in the session. The investigator participated only at the beginning of the recording, to give instructions., The mirror orthographic file is not available to the public., The orthographic annotation is a replica, or mirror version, of the tape-recorded spoken text. This version contains general information about the subject and a broad phonemic transcription (in Roman characters) of what the subject said, including disfluencies (pauses, hesitation phenomena, prosodic information, false starts, repetitions) and interchanges between the subject and the investigator. The transcription system used was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. Any exchange between the subject and the investigator is set apart from the main body of the text, with the general header SBJ marking the subject's tier and the general header INV marking the investigator's tier. The phonemic transcription follows both the CHAT conventions (MacWhinney, 1995) and special conventions formulated by the project for transcribing spoken and written Hebrew texts, to make them accessible to cross-linguistic analysis. The text itself is divided into clauses based on the criteria specified in Berman & Slobin (1994: 660-663), who defined a clause as a unified predicate including aspectual and modal modifications. Division into clauses was carried out by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%., The phonemic transcription file is not available to the public., The phonemic annotation is a standardized version of the spoken text, stripped of disfluencies.
This version contains a broad phonemic transcription (in Roman characters) of what the subject said, but omits, and in some cases corrects or standardizes, deviations from normative linguistic form and use, such as hesitations, false starts, and repairs. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications, combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system used was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on the criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was carried out by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription., The morphological analysis is not available to the public., The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation codes whole words, not morphemes.
Each word was coded semi-automatically with the help of a dedicated lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun carries a genitive suffix, which in Hebrew varies by gender, number and person; for verbs, whether the verb is lexical, aspectual, copular, etc., together with its root, verb pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of type-token ratio (TTR) and VOCD calculations., The syntactic analysis is not available to the public., The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, or Gapped. Six types of links were used to code the relationship between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers code a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used to compare Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002)., The media file is not available to the public., Aparici, M., Tolchinsky, L. & Rosado, E. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3.
Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University. Berman, R. A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R. A. & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E. & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184. Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S. & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, Volume 5, 135-162. MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.

: HH18mewd; Written discussion on problems between kids/people. The text was elicited during the first session held with the subject. At the start of the session, the investigator gave a general introduction, saying: "We are conducting international research for the university. We are collecting material for this, and you will be asked to talk and also to write. I hope you don't mind if we record you / being recorded / if it is recorded." The investigator then said: "Now I am going to ask you to speak and to write about situations of conflict / problems between people / problems in school (life) / different kinds of predicaments. First I will show you a short video that was filmed in a school to give you an idea of what I mean / It will give you an idea what I mean." These instructions were followed by the investigator and subject watching together a three-minute wordless video that showed vignettes of conflict (moral, social, and physical) between young people in a culturally unidentifiable school setting. The investigator then sat down next to the subject and said: "You have seen different kinds of problems / conflicts / situations / predicaments in the video. We would like to know what children / people think about this subject. We are making a collection / are collecting compositions / essays on the topic of problems between people / in school life. So I would like you to write about the subject. Discuss the topic, and present your ideas in writing. Do not write a story, but a composition. You can take some time to think, and then start writing." If the subject asked "Must I give my opinion?", "Should I give reasons why it's not okay?", "Should I write what I think about such things?" or "Should I write what I feel about it?", the response was: "You can do that too / as well".
Subjects were made to feel that they could include such ideas and information, but that this was not exactly what was asked of them and not all they should write. If the subject asked "Can I take notes?", the response was: "If you like". There was no interaction during writing. In cases where subjects asked questions, the investigator shrugged or nodded, or simply repeated the instructions given earlier. The elicitation of the written text took place in the morning, in a separate classroom at the school., The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were: Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s (graduate-level university students). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A.
Berman, Tel Aviv University (October 2001), Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions. The questionnaire that followed text elicitation was written and answered in Hebrew., This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre has been replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Expository', Performance = 'Unspecified'. These values have been added as Keys to the Content information., This is the first meeting with the subject, who was asked to write an essay discussing problems between kids/people. The meeting took place at school. The topic was presented by the investigator, who gave a short explanation and showed a short, wordless video tape to the subject at the beginning of the session. The subject was then asked to write about problems between people, discuss them and present his/her ideas. The subject was asked not to write a story but to discuss the topic. The subject was then given time to think and was requested to start writing when ready. After the subject began writing there was no interaction; questions were answered with a nod, a shrug or by repeating the instructions. The written text was transcribed and coded according to the CHAT format., H18 was the only participant in the session., The mirror orthographic file is not available to the public., The orthographic annotation is a replica or mirror version of the written Hebrew text. This version contains general information about the subject, and a computerized replica of the subject's text, which was handwritten in Hebrew characters.
This mirror version (also in Hebrew orthography) of what the subject wrote includes spelling errors, erasures, repairs and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlinings, and special characters as in the original written text., The phonemic transcription file is not available to the public., The phonemic annotation is a standardized or stripped version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system used was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription., The morphological analysis is not available to the public., The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes.
Each word was coded semi-automatically with the help of a unique lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun carries a genitive suffix, which in Hebrew varies by gender, number and person; for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, verb pattern and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of TTR and VOCD calculations., The syntactic analysis is not available to the public., The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses: finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberger and Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used for the comparison of Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002)., The original written text is not available to the public., Aparici, M., L. Tolchinsky & E. Rosado. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3.
Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University. Berman, R.A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R.A., and Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E. & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184. MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum. Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, Volume 5, 135-162.
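The %MOR coding described for this record stores a base form (lexeme) for every word so that different forms of the same lexeme count as a single type. As a minimal sketch, assuming the lexemes have already been extracted from the %MOR tier into a Python list of strings, a lexeme-based type-token ratio (TTR) is the number of distinct lexemes divided by the total number of lexeme tokens. The function name `lexeme_ttr` and the sample lexemes are hypothetical; this is not CLAN's own implementation, and it does not compute VOCD, which estimates lexical diversity from repeated random samples of tokens rather than from a single ratio.

```python
def lexeme_ttr(lexemes):
    """Type-token ratio computed over lexemes (base forms).

    Tokens are all lexeme occurrences in the text; types are the distinct
    lexemes, so inflected forms coded with the same base form count once.
    """
    if not lexemes:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(lexemes)) / len(lexemes)


# Hypothetical base forms extracted from a %MOR tier (illustrative only).
# Two different inflected forms of one verb were both coded "rav", and
# "yeled" stands for two occurrences of the same noun lexeme.
sample = ["yeled", "rav", "im", "yeled", "rav", "ze"]
print(f"{lexeme_ttr(sample):.3f}")  # 4 types / 6 tokens -> 0.667
```

Because raw TTR tends to fall as texts get longer, it is most meaningful when comparing texts of similar length; that sensitivity is one reason a measure such as VOCD is used alongside it.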

: HH18mnsd; Spoken personal-experience narrative about a situation of interpersonal conflict with other kids/people. The text was elicited during the second part of the second session held with the subject, immediately following elicitation of the written text. After the subject finished writing, the investigator sat next to the subject and said: "Now I would like you to please tell me the story." If the subject asked questions like "Should I tell the same story?", the response was: "Talk about the same incident". The subject was again asked not to tell what s/he saw in the video, but to tell a (personal) story about something that s/he experienced. The subject was then given time to think before speaking. If the subject said something like "But I never had any such problem" or asked "Should I talk about problems that were shown in the video?", the response was to repeat the instructions. There was no dialogue or conversational interaction during recording. If subjects asked questions during the recording, the investigator shrugged or nodded, or repeated the instructions she had given before. The recording took place in the morning, in a separate classroom at the school, at most three days after the first session., The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel Aviv University as PI. The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy in order to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use.
The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were: Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s (graduate-level university students). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001), Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions., This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre has been replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Narrative', Performance = 'Unspecified'. These values have been added as Keys to the Content information., This is the second half of the second meeting with the subject, who tells a personal story about a problem s/he had with other kids/people. The meeting took place at school. The topic was presented by the investigator, who gave a short explanation and showed a short, wordless video tape to the subject at the beginning of the session. After first writing a story about a personal problem s/he had encountered with other people, the subject was asked to tell the story orally. The subject was asked not to tell what s/he saw in the video, but to describe a personal experience. The subject was given time to think and was requested to start speaking when ready.
After the subject began talking there was no interaction; questions were answered with a nod, a shrug or by repeating the instructions. The spoken text was transcribed and coded according to the CHAT format., H18 was the only participant in the session. The investigator appears only at the beginning of the recording, giving instructions., The mirror orthographic file is not available to the public., The orthographic annotation is a replica or mirror version of the tape-recorded spoken text. This version contains general information about the subject, and a broad phonemic transcription (in Roman characters) of what the subject said, including disfluencies (pauses, hesitation phenomena, prosodic information, false starts, repetitions) and interchanges between the subject and the investigator. The transcription system used was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. Any exchange between the subject and the investigator is set apart from the main body of the text, with the general header SBJ marking the subject's tier and the general header INV marking the investigator's tier. The phonemic transcription follows both the CHAT conventions (MacWhinney, 1995) and special conventions formulated by the project for transcribing both spoken and written Hebrew texts to make them accessible to cross-linguistic analysis. The text itself is divided into clauses based on criteria specified in Berman & Slobin (1994: 660-663), who defined a clause as a unified predicate including aspectual and modal modifications. Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%., The phonemic transcription file is not available to the public., The phonemic annotation is a standardized version of the spoken text, stripped of disfluencies.
This version contains a broad phonemic transcription (in Roman characters) of what the subject said, but omits, and in some cases corrects or standardizes, deviations from normative linguistic form and use, such as hesitations, false starts, repairs, etc. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions. The phonemic transcription follows the CHAT specifications combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system used was detailed enough to allow for morphological analysis, but not necessarily for phonological or even morphophonological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the investigator was removed from the transcription., The morphological analysis is not available to the public., The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes.
Each word was coded semi-automatically with the help of a unique lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun carries a genitive suffix, which in Hebrew varies by gender, number and person; for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, verb pattern and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for the purposes of TTR and VOCD calculations., The syntactic analysis is not available to the public., The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses: finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition. The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberger and Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used for the comparison of Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002)., The media file is not available to the public., Aparici, M., L. Tolchinsky & E. Rosado. 2000. On defining Longer Units in narrative and expository Spanish texts. In: Aparici, M. et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3.
Spain: University of Barcelona, 95-122. Baruch, E. 1999. Observations from the field. In: Aisenman, R. A. (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University. Berman, R.A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages. Tel Aviv University. Berman, R.A., and Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, Volume 5, 1-44. Katzenberger, I. E. in press. The Development of Clause Packaging in Spoken and Written Texts. Journal of Pragmatics. Katzenberger, I. E. & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184. MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum. Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis. Written Language and Literacy, Volume 5, 135-162.

: HH18mnwd; Written personal-experience narrative about a situation of interpersonal conflict with other kids/people. The text was elicited during the second session held with the subject. The investigator sat opposite the subject and said: "Yesterday / a couple of days ago / a little while ago / before lunch you saw a video that showed different kinds of problems between people / in school life / situations where people do not agree. We are making a collection of stories about problems between people / problems in school life / conflicts / situations / predicaments. So, I would like you to write about a time / an incident when you had / encountered a problem with someone. Don't write about what you saw in the video, write a (personal) story about something that happened to you / something you experienced. You can take your time." If the subject asked questions such as "But I never had any such problem" or "Should I talk about problems that were shown in the video?", the response was to repeat the instructions. If the subject asked "Can I take notes, make a draft?", the response was "Yes, if you like". If the subject asked "How long should it be?", the response was "Whatever you like". The investigator then said "Let me know when you are done" and either left the room or remained in the room, engaged in some other activity. There was no dialogue or interaction during writing. If subjects asked questions, the investigator shrugged, nodded, or repeated the instructions. The session took place in the morning, in a separate classroom at the school, at most three days after the first session.

The Spencer project was a cross-linguistic study conducted in seven countries, funded by a major grant from the Spencer Foundation, Chicago, USA (1997-2000), with Ruth Berman of Tel-Aviv University as PI.
The project goals were to: 1) understand how school children of different ages (as compared with adults) construct texts, in the sense of monologic pieces of discourse; 2) examine what linguistic, cognitive, and communicative resources they deploy to adapt their texts to different circumstances, in narrative and expository discourse and in writing compared with speech; and 3) ascertain whether and where there are shared or different trends depending on the language they use. The study revolved around four independent variables: Language (X7), Age-level (X4), Genre (X2), and Modality (X2). The seven languages were Dutch, English, French, Hebrew, Icelandic, Spanish, and Swedish. The four age groups were grade school (G), junior high school (J), high school (H), and adults in their 20s and 30s (graduate-level university students). The two genres were personal-experience narrative and expository discussion, and the two modalities were speech and writing. For further details, see the final report to the Spencer Foundation, Chicago: Developing Literacy In Different Contexts and In Different Languages / Ruth A. Berman, Tel Aviv University (October 2001).

Hebrew was the only language used in this session, by both the participant and the investigator, during text production and in the preceding instructions.

This file was generated from an IMDI 1.9 file and transformed to IMDI 3.0. The substructure of Genre was replaced by two elements named "Genre" and "SubGenre". The original content of the Genre substructure was: Interactional = 'Unspecified', Discursive = 'Narrative', Performance = 'Unspecified'. These values have been added as Keys to the Content information.

This is the second meeting with the subject, who wrote a personal story about a problem s/he had with other kids/people. The meeting took place at school, one day after the first meeting. The topic was presented by the investigator.
Since this was the second meeting, the subject was reminded of the video (it was not shown again) at the beginning of the session and was asked to write a story about a personal problem s/he had encountered with other people. The subject was asked not to write about what s/he saw in the video, but to describe a personal experience. The subject was given time to think and was asked to start writing when ready. After the subject began writing, there was no interaction; questions were answered with a nod, a shrug, or by repeating the instructions. The written text was transcribed and coded in CHAT format.

H18 was the only participant in the session.

The mirror orthographic file is not available to the public.

The orthographic annotation is a replica, or mirror version, of the written Hebrew text. This version contains general information about the subject and a computerized replica of the subject's text, which was handwritten in Hebrew characters. This mirror version (also in Hebrew orthography) of what the subject wrote includes spelling errors, erasures, repairs, and other forms of editing, as well as the original punctuation, page layout, division into lines and paragraphs, headings, underlinings, and special characters as in the original written text.

The phonemic transcription file is not available to the public.

The phonemic annotation is a standardized, or stripped, version of the written text. This version contains a broad phonemic transcription (in Roman characters) of what the subject wrote, but omits, and in some cases corrects or standardizes, any deviations from normative linguistic form and use, such as spelling mistakes. The standardized transcripts provide the information relevant to within-language and especially cross-linguistic comparisons of morpho-syntactic and lexical structure and referential content, and they also allow for analysis of the interaction between linguistic forms and discourse functions.
The phonemic transcription follows the CHAT specifications, combined with special transcription conventions formulated for Hebrew texts to make them accessible to cross-linguistic analysis. The transcription system that was used was detailed enough to allow for morphological analysis. The annotation file contains the basic headers required by the CLAN program. The text itself is divided so that each *SBJ tier contains a clause rather than a turn, based on criteria specified in Berman & Slobin (1994: 660-663). Division into clauses was conducted by two native speakers of Hebrew with training in linguistics. Reliability was tested on 10% of all texts, and inter-judge agreement reached nearly 95%. Any exchange between the subject and the collector was removed from the transcription.

The morphological analysis is not available to the public.

The morphological annotation appears on a %MOR tier, following CLAN conventions. This annotation contains information about whole words, not morphemes. Each word was coded semi-automatically with the help of a unique lexicon file, so that the following information is available: the lexical category of the word; for nouns, whether the noun is in the genitive case (in Hebrew, marked by a suffix that varies by gender, number, and person); for verbs, whether the verb is lexical, aspectual, copular, etc., as well as its root, verb pattern, and tense; and for all words, the base form (lexeme), so that different forms of the same lexeme are coded as a single type for purposes of TTR and VOCD calculations.

The syntactic analysis is not available to the public.

The syntactic coding appears on a %SYN tier, following CLAN conventions. This annotation contains information on clause type and clause linkage. Each clause was assigned to one of the following categories: Main, Subordinate, Coordinate, Juxtaposed, Gapped. Six types of links were used to code relationships between each pair of clauses, including finite linking, nonfinite linking, noun complementation, coordination, and juxtaposition.
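Coding every word on the %MOR tier with its base form (lexeme) matters for lexical-diversity measures: counted by surface form, inflected variants of one lexeme each add a new type. A minimal sketch of a type-token ratio (TTR) computed both ways, assuming each token has already been paired with its base form; the token list is a hypothetical transliterated Hebrew example, not corpus data, and the pairing format is an assumption rather than the actual %MOR syntax. VOCD, a more elaborate measure, is not reproduced here.

```python
from typing import Sequence, Tuple

# Each token: (surface form, base form/lexeme). Hypothetical transliterated
# example ("I went with the children and they went with the boy").
Token = Tuple[str, str]


def ttr(items: Sequence[str]) -> float:
    """Type-token ratio: number of distinct items divided by number of items."""
    if not items:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(items)) / len(items)


tokens: list[Token] = [
    ("ani", "ani"), ("halaxti", "halax"), ("im", "im"), ("ha-yeladim", "yeled"),
    ("ve-hem", "hem"), ("halxu", "halax"), ("im", "im"), ("ha-yeled", "yeled"),
]

form_ttr = ttr([form for form, _ in tokens])      # 7 types / 8 tokens
lexeme_ttr = ttr([lemma for _, lemma in tokens])  # 5 types / 8 tokens
print(f"TTR by surface form: {form_ttr:.3f}")
print(f"TTR by lexeme:       {lexeme_ttr:.3f}")
```

Here "halaxti" and "halxu" collapse into one lexeme, as do "ha-yeladim" and "ha-yeled", which is exactly the effect the base-form coding is meant to capture.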
The information on each %SYN tier was used to create @GEM tiers, following CLAN conventions. These tiers contain coding of a new unit of text analysis termed Clause Packages (Katzenberger & Cahana-Amitay 2002). Segmentation into clause packages was done by three native speakers of Hebrew with training in linguistics and discourse analysis (a linguistics major, a linguist, and an expert in narrative analysis). Two coders worked together on segmenting all texts, and the third segmented them independently, yielding approximately 90% inter-judge agreement. This coding was used to compare Clause Packages with T-Units (Verhoeven, Aparici, Cahana-Amitay, van Hell, Kriz & Viguié-Simon, 2002).

The original written text is not available to the public.

Aparici, M., Tolchinsky, L., & Rosado, E. 2000. On defining longer units in narrative and expository Spanish texts. In M. Aparici et al. (eds.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages, Vol. 3, 95-122. Barcelona: University of Barcelona.
Baruch, E. 1999. Observations from the field. In R.A. Aisenman (ed.), Working Papers in Developing Literacy Across Genres, Modalities, and Languages. Tel Aviv University.
Berman, R.A. 2001. Final report to the Spencer Foundation, Chicago: Developing Literacy in Different Contexts and in Different Languages. Tel Aviv University.
Berman, R.A., & Verhoeven, L. 2002. Crosslinguistic perspectives on the development of text production abilities in speech and writing. Written Language and Literacy, 5, 1-44.
Katzenberger, I. E. In press. The development of clause packaging in spoken and written texts. Journal of Pragmatics.
Katzenberger, I. E., & Cahana-Amitay, D. 2002. Segmentation marking in text production. Linguistics, 40(6), 1161-1184.
Verhoeven, L., Aparici, M., Cahana-Amitay, D., van Hell, J., Kriz, S., & Viguié-Simon, A. 2002. Clause packaging in writing and speech: A cross-linguistic developmental analysis.
Written Language and Literacy, 5, 135-162.
MacWhinney, B. 1995. The CHILDES Project. Hillsdale, NJ: Erlbaum.
