TY - JOUR
T1 - Designing CoSIH
T2 - The Corpus of Spoken Israeli Hebrew
AU - Izre’el, Shlomo
AU - Hary, Benjamin
AU - Rahav, Giora
PY - 2001
Y1 - 2001
AB - This paper describes the initial design of the Corpus of Spoken Israeli Hebrew (CoSIH). CoSIH will attempt to include a representation of most varieties of spoken Hebrew as it is used in Israel today. CoSIH is designed to consist of two complementary corpora: a main corpus and a supplementary corpus. The main corpus, which will comprise about 90% of the entire collection, will be sampled statistically. For analytical purposes it will use a conceptual tool in the form of a multidimensional matrix combining demographic and contextual tiers. The combined demographic and contextual design will be capable of showing the distribution of speech types in various subgroups of the population. The supplementary corpus will include about 10% of the collected data, and will add to the statistically-sampled corpus some targeted demographically sampled texts and a contextually designed collection. This design is culturally dependent to suit the special structure of the Israeli Hebrew speech community and thus includes both native and non-native speakers of Hebrew. Nonetheless, the principles governing this design are such that they would service study of many other speech communities, to the extent that the design itself may be employed for other corpora with only slight modifications.
UR - http://www.scopus.com/inward/record.url?scp=84989382478&partnerID=8YFLogxK
U2 - 10.1075/ijcl.6.2.01izr
DO - 10.1075/ijcl.6.2.01izr
M3 - Article
AN - SCOPUS:84989382478
SN - 1384-6655
VL - 6
SP - 171
EP - 197
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
IS - 2
ER -