Guide to database

This is a guide to using the database. Here we note the conventions that we used when entering the data. This is supposed to help the user in searching the data optimally.

Go to the database.


© 2024 Daniel Aremu, Katharina Hartmann, Anke Himmelreich, Johannes Mursell. Some rights reserved.

Terms of use

You may use the data of this database for your own work, if you cite the work appropriately.

How to cite

Aremu, Daniel, Katharina Hartmann, Anke Himmelreich and Johannes Mursell (2024): Database for the VP-periphery in Mabia languages. [Data set] Available online at https://mabia-vp.com/tiki-index.php?page=Database, Accessed on DATE OF ACCESS.

Origin of the data

The data in this database were collected during fieldwork in Ghana and Germany between 2021 and 2024. The Mabia languages in this database do have writing systems; however, neither the standardization of writing nor the development of writing systems for the various dialects of each language is complete. For this reason, the database stores the data as the speakers wrote them for us. Differences in word separation and grapheme-phoneme correspondence are therefore possible.

Additionally, some data have tone markings. These tones were marked by the speakers according to their intuitions. For some data, we have audio recordings, which represent the phonetic tones.

How to search for data

To search the corpus for data, the user can first optionally set the filters and enter search terms (words, glosses, POS tags). After this, the user has to press the button "Search" and wait until all data are displayed below. The result will be a list of datapoints, each consisting of one or more glossed and tagged sentences with additional information on the speaker, dialect, and date of elicitation. Concretely, a datapoint contains more than one sentence if one sentence provides the context for another sentence (e.g. question-answer pairs) or if the sentences are variants of each other (e.g. different word orders or grammatical vs. ungrammatical versions). The results can be downloaded in different file formats (xml, tex, docx and pdf).

Note that the data contain Unicode characters, which might cause problems in LaTeX or Word. For LaTeX, use xelatex or lualatex as the compiler, or replace the problematic characters with their LaTeX commands if you must use (pdf)latex. For Word, make sure that your encoding is set to UTF-8.

The print to PDF button will lead you to the print option in your browser. Make sure that your settings are set to "Save to PDF" if you would like to download a PDF.

Note that you have two "Search" buttons after the first search: "New search" and "Add to search results". If you press "New search", all former search results will be overwritten on the server. If you press "Add to search results", the new results will be appended to the previous results. For example, if you want to find all ditransitives and serial verb constructions, but no intransitives or simple transitives, first search for "Ditransitive"; then choose "Serial verb construction" and press "Add to search results". We recommend that you save your results in between. The files containing your search results are deleted after 24 hours.

Make sure that you have the newest version of your browser installed. Some functions might not be fully available if your browser version is too old.


Searching for words, glosses or POS tags

The database can be searched for certain words, glosses and part of speech tags (henceforth "search terms"). These can be searched for in the field "Free search". The free search is completely case-insensitive.

Search terms including or excluding tone marking

Tones in the data are marked by an acute accent (high tone, e.g. "á"), a grave accent (low tone, e.g. "à"), or a macron (mid tone, e.g. "ā"). If you do not use these diacritics, the search result will include toned AND untoned versions of the search term. If you use the accents, you will only find the search items with exactly the accents you searched for. For example, you can search for only high-toned versions of a particle "la" by entering the search item "lá". If you want to find all occurrences of "la", independent of tone, you have to search for "la".
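The tone behaviour described above can be illustrated with a small Python sketch. It is not the database's own code; the function names `strip_tones` and `tone_match` are hypothetical, and the sketch assumes that an unprefixed search is a substring match.

```python
import unicodedata

# Combining marks used for tone: acute (high), grave (low), macron (mid).
TONE_MARKS = {"\u0301", "\u0300", "\u0304"}


def strip_tones(text: str) -> str:
    """Remove the three tone diacritics and keep every other character."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", stripped)


def tone_match(query: str, word: str) -> bool:
    """Hypothetical matcher that mirrors the documented search behaviour."""
    query = unicodedata.normalize("NFC", query).lower()
    word = unicodedata.normalize("NFC", word).lower()
    if strip_tones(query) == query:
        # Untoned query: toned and untoned versions of the word both match.
        return query in strip_tones(word)
    # Toned query: the accents have to be exactly the ones searched for.
    return query in word


print(tone_match("la", "lá"))  # untoned query finds the high-toned form
print(tone_match("lá", "là"))  # high-toned query does not find the low-toned form
```

Normalizing to a single Unicode form first matters because "á" can be stored either as one character or as "a" plus a combining accent.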

Finding only glosses or parts-of-speech (gl:x, pos:x)

The search function allows you to prefix your search term either with "gl:" for gloss or "pos:" for part-of-speech. Using a prefix will lead to a more specified search where you can focus on finding only datapoints with specific glosses or parts-of-speech. For example, the term "pos:adv" (= "pos:ADV") will give you all datapoints that contain words tagged as adverbs. The term "gl:pst" (= "gl:PST") will give you all datapoints that contain words with the gloss "pst" for past tense.

Note that, in contrast to an unprefixed search, the prefixes lead to an exact search. That means that if you search simply for "q", you will find all datapoints that contain a "q" somewhere (not including the sentence name "Q:" that precedes all questions). But if you search for "gl:q" or "pos:q", you will only find datapoints with a question particle. Similar contrasts hold for other search terms.
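The difference between prefixed and unprefixed terms can be sketched as follows. This is only an illustration in Python, not the server's implementation: `Term`, `parse_term` and `matches` are hypothetical names, the unprefixed search is assumed to look at the word, its gloss and its POS tag, and a prefixed term is compared with the whole gloss or tag (the guide does not say how composite glosses such as "hest.pst" are treated).

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    field: Optional[str]  # "gl", "pos", or None for an unprefixed term
    value: str            # lower-cased, because the free search ignores case


def parse_term(raw: str) -> Term:
    """Split an optional "gl:" or "pos:" prefix from a search term."""
    raw = raw.strip().lower()
    for prefix in ("gl", "pos"):
        if raw.startswith(prefix + ":"):
            return Term(prefix, raw[len(prefix) + 1:])
    return Term(None, raw)


def matches(term: Term, word: dict) -> bool:
    """`word` has the keys "or", "gl" and "pos", like a stored word triple."""
    if term.field is not None:
        # Prefixed term: exact comparison with one field (assumption: whole field).
        return word[term.field].lower() == term.value
    # Unprefixed term: substring match in any of the three fields.
    return any(term.value in value.lower() for value in word.values())


focus_particle = {"or": "la", "gl": "foc", "pos": "FOC"}
print(matches(parse_term("pos:foc"), focus_particle))  # same as "pos:FOC"
print(matches(parse_term("f"), focus_particle))        # substring of the gloss
print(matches(parse_term("gl:f"), focus_particle))     # exact search, no match
```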

Sequences of search terms (* = same datapoint, + = adjacent words)

You can search for multiple terms at once. If the elements do not have to be adjacent, but simply occur in the same datapoint (but not necessarily in the same sentence), the two terms must be connected by the symbol "*". For example, the sequence "Adam*work" will give you all datapoints that contain "Adam" and "work". The sequence "pos:FOC*pos:ADV" will give you all datapoints that contain an adverb and a focus particle independent of their order. The sequence "Adam*gl:pst" will find all datapoints that contain the term "Adam" and the gloss "pst".

Next, you can connect two search terms with the symbol "+". That means they must be adjacent, with the second term following the first term. For example, the sequence "Adam+work" will give you all datapoints where "Adam" is immediately followed by "work". The sequence "pos:FOC+pos:ADV" will give you all datapoints where an adverb is immediately preceded by a focus particle. The sequence "Adam+gl:pst" will find all datapoints where "Adam" is immediately followed by the gloss "pst".

Finally, a simple space between two terms can serve two purposes. First, you can use a space to find a word together with its gloss. For example, "la foc" will find all instances of "la" that are focus particles. Second, you can use spaces to search for the translation of a specific English sentence. The database is comparative, so you can e.g. compare how the sentence "Adam worked." is translated in different Mabia languages.
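The operators "*" and "+" described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the database's actual search code: `Word`, `word_matches` and `datapoint_matches` are hypothetical names, a query is assumed to use only one of the two operators, and the translation line is ignored.

```python
from typing import NamedTuple


class Word(NamedTuple):
    form: str   # word in the original language
    gloss: str
    pos: str


def word_matches(term: str, word: Word) -> bool:
    """Prefixed terms match one field exactly; plain terms match as substrings."""
    term = term.strip().lower()
    if term.startswith("gl:"):
        return word.gloss.lower() == term[3:]
    if term.startswith("pos:"):
        return word.pos.lower() == term[4:]
    return any(term in field.lower() for field in word)


def datapoint_matches(query: str, sentences: list[list[Word]]) -> bool:
    """Evaluate a query that uses either "+" or "*" on one datapoint."""
    if "+" in query:
        terms = query.split("+")
        # "+": the terms must match consecutive words of one sentence, in order.
        return any(
            all(word_matches(t, sent[i + k]) for k, t in enumerate(terms))
            for sent in sentences
            for i in range(len(sent) - len(terms) + 1)
        )
    # "*": every term must match some word anywhere in the datapoint.
    words = [w for sent in sentences for w in sent]
    return all(any(word_matches(t, w) for w in words) for t in query.split("*"))


# The sentence "Adam worked yesterday." from the stored example in this guide.
datapoint = [[
    Word("Adam", "Adam", "N"),
    Word("fe", "hest.pst", "TNS"),
    Word("tun", "work", "V"),
    Word("(fenna)", "yesterday", "ADV"),
]]

print(datapoint_matches("Adam*work", datapoint))      # both occur in the datapoint
print(datapoint_matches("Adam+work", datapoint))      # not adjacent: "fe" intervenes
print(datapoint_matches("pos:TNS+pos:V", datapoint))  # adjacent and in this order
```

The first and third queries match this datapoint; the second does not, because the tense particle stands between "Adam" and the verb.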


Filtering the data for certain constructions


Additionally, the data can currently be filtered for various constructions. If a filter is set to "Select all", the data are not filtered for the specific property.

In the future, further filters might be added.

Language


The filter finds data only from a certain language.

Current options: Dagaare, Dagbani, Gurene, Kasem, Kusaal, Likpakpaanl, Sisaali

Audio


The filter finds either data that come with an audio recording or data that do not have one.

Sentence embedding


The filter finds data which are either simple clauses or are of a certain type of complex clause.

Current options:

  • Simple matrix clause (e.g. Adam worked, She is singing in the church)
  • Matrix + embedded subject clause: a clause is the subject of the matrix clause (e.g. [subj Ama receiving money ] helped her family or [subj That my husband talks so much ] bothers me)
  • Matrix + embedded object clause: a clause is the object of the matrix clause (e.g. I don't know [obj when John slaughtered the fowl ], Peter said [obj that John slaughtered a fowl ])
  • Matrix + adverbial clause: a clause acts as an adverb to the matrix clause (e.g. Ama lost money [adv after she fed her child ], Ama lost money [adv because a thief stole her bag ])
  • Clause conjunction: two sentences are connected by "and" or "but" (e.g. [ Ayuo ate fufu ] and [ Bayuo slaughtered a fowl ], [ Adam will not work in France ] but [ he will work in Germany ]).
  • Clause disjunction: two sentences are connected by "or" (e.g. [ Ayuo ate fufu ] or [ Bayuo slaughtered a fowl ])
  • Matrix + relative clause: a noun in the example is modified by a relative clause (e.g. the subject can be modified by a relative clause: the man [rel that slaughtered a fowl ] saw me, the object can be modified by a relative clause: I saw the man [rel that slaughtered a fowl ])

Sentence type


The filter finds data which are either declaratives or a certain type of question, or an imperative. In complex sentences, the matrix clause and the embedded clauses can have different sentence types, e.g. with embedded questions.

Current options:

  • Declarative: a "normal sentence". This also includes sentences with focused constituents in them.
  • In-situ question: an open-ended question without fronting of the wh-word (e.g. John slaughtered what?, I don't know [ John slaughtered what ])
  • Ex-situ question: an open-ended question where the wh-word is fronted (e.g. What did John slaughter?, I don't know [ what John slaughtered ])
  • Polar question: a yes-no question (e.g. Did John slaughter the fowl?, I don't know [ if John slaughtered the fowl])
  • Imperative clause: an imperative (e.g. Slaughter the fowl!)

Focus type


The filter finds data that have no focus, new information focus, or contrastive focus. Something is tagged as in focus if it bears pragmatic focus. It does not necessarily have a focus marker.

Current options: no focus, in-situ new information focus, ex-situ new information focus, in-situ contrastive focus, ex-situ contrastive focus

Wh/rel/foc element


The filter finds the data where wh-movement, relativization or focalization targets only the chosen constituent.

Current options: No target, entire clause, truth value, subject, direct object, indirect object, adverb, verb, verb phrase

Transitivity


The filter finds data with intransitive, transitive, or ditransitive verbs or serial verb constructions. "Multiple object construction" finds all ditransitives and serial verb constructions.

Current options: intransitive, transitive, ditransitive, serial verb construction, multiple object construction

Aspect


The filter finds data with a certain type of aspect (e.g. imperfective, perfective). Progressive and habitual are seen as subtypes of imperfectives, completive is a subtype of perfective.

Current options: Imperfective, progressive, habitual, perfective, completive

Tense


The filter finds data in a certain tense (e.g. past, present, future).

Current options: Past, present, future

Polarity


The filter finds either affirmative or negative sentences.

Tone


The filter finds either data that do not contain tonal marking or data that are at least partially toned.


Audiofiles

Some examples in the database come with audiofiles in which speakers were recorded pronouncing the respective example. If your browser allows it, a media player is displayed at the respective example (next to "date"). Here you can play the file and download it. The audiofiles are all mp3 files named after the respective example key (e.g. "Likpakpaanl-24.mp3").


Glosses

Conventions

Some closed-class words can be glossed by abbreviations or by English translations. The following lists the conventions that are used in the database.

  1. All glosses including functional glosses are in lower case. Formatting to small caps is done through formatting commands if needed.
  2. Pronouns are NOT glossed by English translations (i.e. "I", "you", "he" etc.), but by the phi-features they express (i.e. 1sg, 2sg, 3sg, etc.).
  3. If a language differentiates different types of locatives (e.g. in vs. on), the English translations are used. If there is no distinction, the gloss is "loc".
  4. Particles that are used for marking focus are glossed as "foc" throughout, even if they are used in non-focus contexts.

List of glosses

Both when entering data and when searching for data, the glosses are required to follow the Leipzig Glossing Rules (LGR). Glosses that are not in the LGR should be listed here under "own convention". Importantly, there should be no divergences from the list.

Gloss    Meaning                   Source
1        first person              Leipzig Glossing Rules
2        second person             Leipzig Glossing Rules
3        third person              Leipzig Glossing Rules
acc      accusative                Leipzig Glossing Rules
anim     animate                   own convention
comp     complementizer            Leipzig Glossing Rules
compl    completive                Leipzig Glossing Rules
cj       conjoined                 own convention
conj     conjunction, conjoined    own convention
cop      copula                    Leipzig Glossing Rules
def      definite                  Leipzig Glossing Rules
dem      demonstrative             Leipzig Glossing Rules
dir      directional               own convention
dj       disjoined                 own convention
emph     emphatic                  own convention
foc      focus                     Leipzig Glossing Rules
fut      future                    Leipzig Glossing Rules
hest     hesternal                 own convention
hum      human                     own convention
ipfv     imperfective              Leipzig Glossing Rules
loc      locative                  Leipzig Glossing Rules
nom      nominative                Leipzig Glossing Rules
nc       noun class                own convention
neg      negation, negative        Leipzig Glossing Rules
pfv      perfective                Leipzig Glossing Rules
poss     possessive                Leipzig Glossing Rules
pro      pronoun                   own convention
prog     progressive               Leipzig Glossing Rules
pst      past                      Leipzig Glossing Rules
pl       plural                    Leipzig Glossing Rules
ptcl     particle                  own convention
q        question particle/marker  Leipzig Glossing Rules
rel      relative                  Leipzig Glossing Rules
rel.pro  relative pronoun          own convention
rel.det  relative determiner       own convention
sg       singular                  Leipzig Glossing Rules
tns      tense                     own convention
top      topic                     Leipzig Glossing Rules


Part-of-speech tagging


Part-of-speech tagging works similarly to glossing. The users are instructed to adhere to the list of POS tags below. The tags correspond to abbreviations commonly used in linguistics.


Conventions

  1. All POS tags are uppercase throughout.

List of POS tags

POS   Meaning                             Example
ADJ   adjective                           red
ADV   adverb                              yesterday
ART   article                             the, this, a
ASP   independent aspect particle
COMP  (subordinating) complementizer      that, whether, because
CONJ  (coordinating) conjunction          and, or, but
COP   copula                              is
DEM   demonstrative                       this, that, those, these
FOC   independent focus particle
N     noun (incl. proper names)           house, Adam
NEG   negation                            not
P     preposition                         in
PART  particle other than tense, focus, or aspect, or question particle
POSS  possessive pronoun                  his
PRO   personal pronoun                    he
Q     question particle, question tag     right?
REL   relative pronoun, relative marker   that, which
SW    sentence word                       yes, no
TNS   independent tense particle
V     verb                                slaughter
WH    wh-pronoun or determiner            which, who


Word and morpheme separation

Conventions

  1. Focus particles are separate words throughout. They are not marked as suffixes to the words.
  2. Elements that are clearly suffixes are separated by a hyphen from the stem (e.g. conjoint, disjoint marking).


Tone marking

Notation

  1. High tone: acute accent, e.g. "á"
  2. Low tone: grave accent, e.g. "à"
  3. Mid tone: macron, e.g. "ā"

How the data are stored in the database


The database is xml-based, which means that the data are stored in xml format. The following shows how a single example (i.e. a "datapoint") is stored:

<datapoint ID="Likpakpaanl-1">
      <language>Likpakpaanl</language>
      <dialect>--</dialect>
      <speaker>SA</speaker>
      <date>2021-10-28</date>
      <audio>audio</audio>
      <audiofile>Likpakpaanl-1.mp3</audiofile>
      <construction>
            <embedding>simple cl.</embedding>
            <type>declarative</type>
            <focus>no.focus</focus>
            <target>no.target</target>
            <transitivity>intransitive</transitivity>
            <aspect>perfective</aspect>
            <tense>past</tense>
            <polarity>affirmative</polarity>
            <tone>no.tone</tone>
      </construction>
      <example>
            <sentence>
                        <name/>
                        <judgment/>
                        <or_gloss>
                                    <word><or>Adam </or><gl>Adam </gl><pos>N </pos></word>
                                    <word><or>fe </or><gl>hest.pst </gl><pos>TNS </pos></word>
                                    <word><or>tun </or><gl>work </gl><pos>V </pos></word>
                                    <word><or>(fenna) </or><gl>yesterday </gl><pos>ADV </pos></word>
                        </or_gloss>
                        <translation>Adam worked yesterday.</translation>
            </sentence>
      </example>
</datapoint>


An example a.k.a. datapoint consists of three parts:

  1. Some metadata: language, dialect, speaker, date of elicitation and possibly an audiofile
  2. Information about the construction: see filters above.
  3. The example itself: It can consist of multiple sentences (e.g. question-answer pair and/or grammatical and ungrammatical versions of the same sentence). A sentence consists of triples for each word: the original word in the respective language, the gloss and the POS tag. After this a translation is added.


Each datapoint has a unique key to identify it.
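A datapoint in this format, for instance one taken from a downloaded xml file, can be read with Python's standard library. The following is a minimal sketch: the function name `read_datapoint` and the shape of the returned dictionary are illustrative choices, and the sample is an abridged copy of the datapoint shown above. It assumes that every word has the three elements "or", "gl" and "pos".

```python
import xml.etree.ElementTree as ET

# Abridged copy of the datapoint shown above (some elements omitted).
SAMPLE = """
<datapoint ID="Likpakpaanl-1">
  <language>Likpakpaanl</language>
  <construction>
    <transitivity>intransitive</transitivity>
    <tense>past</tense>
  </construction>
  <example>
    <sentence>
      <name/>
      <judgment/>
      <or_gloss>
        <word><or>Adam </or><gl>Adam </gl><pos>N </pos></word>
        <word><or>fe </or><gl>hest.pst </gl><pos>TNS </pos></word>
        <word><or>tun </or><gl>work </gl><pos>V </pos></word>
        <word><or>(fenna) </or><gl>yesterday </gl><pos>ADV </pos></word>
      </or_gloss>
      <translation>Adam worked yesterday.</translation>
    </sentence>
  </example>
</datapoint>
"""


def read_datapoint(xml_text: str) -> dict:
    """Return key, language, construction info and the word triples."""
    root = ET.fromstring(xml_text)
    construction_el = root.find("construction")
    construction = {}
    if construction_el is not None:
        construction = {child.tag: child.text for child in construction_el}
    sentences = []
    for sent in root.iter("sentence"):
        # Each word is a triple: original word, gloss, POS tag.
        # The stored values carry a trailing space, so they are stripped here.
        words = [
            tuple((w.findtext(tag) or "").strip() for tag in ("or", "gl", "pos"))
            for w in sent.iter("word")
        ]
        sentences.append({"words": words, "translation": sent.findtext("translation")})
    return {
        "key": root.get("ID"),
        "language": root.findtext("language"),
        "construction": construction,
        "sentences": sentences,
    }


dp = read_datapoint(SAMPLE)
print(dp["key"], dp["construction"]["tense"])
for original, gloss, pos in dp["sentences"][0]["words"]:
    print(original, gloss, pos)
```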

The database uses xml, xsl, php, javascript, html, and css.


Contributors to this page: admin .
Page last modified on Monday February 5, 2024 15:44:23 CET by admin.
