The VP-periphery in Mabia languages

This is a guide to using the database. It documents the conventions that we followed when entering the data, so that users can search the data effectively.

Go to the database.

Copyright disclaimer for the database content

Terms of use

You may use the data of this database for your own work, if you cite the work appropriately.

How to cite

Aremu, Daniel, Katharina Hartmann, Anke Himmelreich and Johannes Mursell (2024): Database for the VP-periphery in Mabia languages. [Data set] Available online at https://mabia-vp.com/tiki-index.php?page=Database, Accessed on DATE OF ACCESS.

Origin of the data

The data of this database were collected during fieldwork in Ghana and Germany between 2021 and 2024. The Mabia languages in this database have writing systems; however, neither the standardization of writing nor the development of writing systems for the various dialects of each language is complete. The database therefore stores the data as the speakers wrote them for us, so differences in word separation and grapheme-phoneme correspondence are possible.

Additionally, some data have tone markings. These tones were marked by the speakers according to their intuitions. For some data, we have audio recordings, which represent the phonetic tones.

How to search for data

To search the corpus, the user can first optionally set filters and enter search items (words, glosses, POS tags). The user then presses the "Search" button and waits until all data are displayed below. The result is a list of datapoints, each consisting of one or more glossed and tagged sentences with additional information on the speaker, dialect, and date of elicitation. Concretely, a datapoint contains more than one sentence if one sentence provides the context for another sentence (e.g. question-answer pairs) or if the sentences are variants of each other (e.g. different word orders or grammatical vs. ungrammatical versions). The results can be downloaded in different file formats (xml, tex, docx and pdf).

Note that the data contain Unicode characters, which might cause problems for LaTeX or Word. For LaTeX, compile with xelatex or lualatex, or replace the problematic characters with their LaTeX commands if you must use (pdf)latex. For Word, make sure that your encoding is set to UTF-8.

The print to PDF button will lead you to the print option in your browser. Make sure that your settings are set to "Save to PDF" if you would like to download a PDF.

Note that after the first search you have two search buttons: "New search" and "Add to search results". If you press "New search", all previous search results are overwritten on the server. If you press "Add to search results", the new results are appended to the previous results. For example, if you want to find all ditransitives and serial verb constructions, but no intransitives or simple transitives, first search for "Ditransitive"; then choose "Serial verb construction" and press "Add to search results". We recommend that you save your results in between. The files containing your search results are deleted after 24 hours.

Make sure that you have the newest version of your browser installed. Some functions might not be fully available if your browser version is too old.

Searching for words, glosses or POS tags

The database can be searched for certain words, glosses and part of speech tags (henceforth "search terms"). These can be searched for in the field "Free search". The free search is completely case-insensitive.

Search terms including or excluding tone marking

Tones in the data are marked by an acute accent (high tone, e.g. "á"), a grave accent (low tone, e.g. "à"), or a macron (mid tone, e.g. "ā"). If you do not use these diacritics, the search result will include toned AND untoned versions of the search term. If you use the diacritics, you will only find the search items with exactly the diacritics you searched for. For example, you can search for only high-toned versions of a particle "la" by entering the search item "lá". If you want to find all occurrences of "la", independent of tone, you have to search for "la".
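The tone behaviour described above can be illustrated with a minimal Python sketch. It is not the database's actual implementation (which is not shown in this guide); the function names `strip_tones`, `has_tones` and `tone_match` are hypothetical, and for simplicity the sketch compares whole words, whereas the real free search may also match parts of words.

```python
import unicodedata

# Combining marks for the three tone diacritics: acute, grave, macron.
TONE_MARKS = {"\u0301", "\u0300", "\u0304"}


def strip_tones(text: str) -> str:
    """Remove tone diacritics and keep all other characters."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", stripped)


def has_tones(text: str) -> bool:
    """Return True if the text carries at least one tone diacritic."""
    return any(ch in TONE_MARKS for ch in unicodedata.normalize("NFD", text))


def tone_match(query: str, word: str) -> bool:
    """Case-insensitive match; tones only matter if the query has them."""
    q = unicodedata.normalize("NFC", query).lower()
    w = unicodedata.normalize("NFC", word).lower()
    if has_tones(q):
        # Toned query: the word must carry exactly the same diacritics.
        return q == w
    # Untoned query: compare against the word with its tones removed.
    return q == strip_tones(w)
```

Under these assumptions, `tone_match("la", "lá")` and `tone_match("la", "la")` both succeed, while `tone_match("lá", "là")` does not.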

Finding only glosses or parts-of-speech (gl:x, pos:x)

The search function allows you to prefix your search term either with "gl:" for gloss or "pos:" for part-of-speech. Using a prefix leads to a more specific search that finds only datapoints with particular glosses or parts-of-speech. For example, the term "pos:adv" (= "pos:ADV") will give you all datapoints that contain words tagged as adverbs. The term "gl:pst" (= "gl:PST") will give you all datapoints that contain words with the gloss "pst" for past tense.

Note that, in contrast to an unprefixed search, the prefixes lead to an exact search. That means that if you simply search for "q", you will find all datapoints that contain a "q" anywhere (not including the sentence name "Q:" that precedes all questions). But if you search for "gl:q" or "pos:q", you will only find datapoints with a question particle. Similar contrasts hold for other search terms.
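The contrast between prefixed and unprefixed terms can be sketched in Python as follows. This is only an illustration under assumptions: each word is modelled as a dictionary with the keys "or", "gl" and "pos", the function names are hypothetical, and the sketch compares a prefixed term against the whole gloss or tag, which may differ from how the database treats composite glosses such as "hest.pst".

```python
def parse_term(term: str) -> tuple[str, str]:
    """Split a search term into (field, value).

    The field is "gl" or "pos" for prefixed terms and "any" otherwise.
    """
    lowered = term.strip().lower()
    for prefix in ("gl", "pos"):
        if lowered.startswith(prefix + ":"):
            return prefix, lowered[len(prefix) + 1:]
    return "any", lowered


def term_matches_word(term: str, word: dict[str, str]) -> bool:
    """Check one search term against one word (original, gloss, POS)."""
    field, value = parse_term(term)
    if field == "any":
        # Unprefixed: the value may occur anywhere in any of the three tiers.
        return any(value in word[tier].lower() for tier in ("or", "gl", "pos"))
    # Prefixed: the whole gloss or POS tag must equal the value.
    return word[field].lower() == value


# Hypothetical words for illustration only, not taken from the database.
adverb = {"or": "x", "gl": "quickly", "pos": "ADV"}
question_particle = {"or": "y", "gl": "q", "pos": "Q"}
```

With these hypothetical words, the unprefixed term "q" matches both entries (because "quickly" contains a "q"), whereas "gl:q" and "pos:q" match only the question particle.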

Sequences of search terms (* = same datapoint, + = adjacent words)

You can search for multiple terms at once. If the elements do not have to be adjacent, but simply occur in the same datapoint (but not necessarily in the same sentence), the two terms must be connected by the symbol "*". For example, the sequence "Adam*work" will give you all datapoints that contain "Adam" and "work". The sequence "pos:FOC*pos:ADV" will give you all datapoints that contain an adverb and a focus particle independent of their order. The sequence "Adam*gl:pst" will find all datapoints that contain the term "Adam" and the gloss "pst".

Next, you can connect two search terms with the symbol "+". That means they must be adjacent, with the second term following the first term. For example, the sequence "Adam+work" will give you all datapoints where "Adam" is immediately followed by "work". The sequence "pos:FOC+pos:ADV" will give you all datapoints where an adverb is immediately preceded by a focus particle. The sequence "Adam+gl:pst" will find all datapoints where "Adam" is immediately followed by the gloss "pst".

Finally, a simple space between two terms serves two purposes. First, you can use a space to find a word together with its gloss. For example, "la foc" will find all instances of "la" that are focus particles. Second, you can use spaces to search for a specific English sentence in the translations. Because the database is comparative, you can, for example, compare how the sentence "Adam worked." is translated in different Mabia languages.
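The difference between the "*" and "+" connectors can be sketched in Python. This is an illustration rather than the database's actual code: a datapoint is modelled as a list of sentences, each sentence as a list of word dictionaries, and the function names are hypothetical. The sketch also assumes that "+" requires both words to be in the same sentence, which the guide does not state explicitly. The sample words are taken from the datapoint "Likpakpaanl-1" shown in the section on data storage.

```python
Word = dict[str, str]
Sentence = list[Word]


def matches(term: str, word: Word) -> bool:
    """Match one term: exact for gl:/pos: prefixes, substring otherwise."""
    term = term.lower()
    if term.startswith("gl:"):
        return word["gl"].lower() == term[3:]
    if term.startswith("pos:"):
        return word["pos"].lower() == term[4:]
    return any(term in word[tier].lower() for tier in ("or", "gl", "pos"))


def cooccur(query: str, datapoint: list[Sentence]) -> bool:
    """'a*b': every term occurs somewhere in the datapoint, in any order."""
    words = [w for sentence in datapoint for w in sentence]
    return all(any(matches(t, w) for w in words) for t in query.split("*"))


def adjacent(query: str, datapoint: list[Sentence]) -> bool:
    """'a+b': a word matching a is directly followed by one matching b."""
    terms = query.split("+")
    for sentence in datapoint:
        for start in range(len(sentence) - len(terms) + 1):
            window = sentence[start:start + len(terms)]
            if all(matches(t, w) for t, w in zip(terms, window)):
                return True
    return False


# The sentence of datapoint "Likpakpaanl-1".
datapoint = [[
    {"or": "Adam", "gl": "Adam", "pos": "N"},
    {"or": "fe", "gl": "hest.pst", "pos": "TNS"},
    {"or": "tun", "gl": "work", "pos": "V"},
    {"or": "(fenna)", "gl": "yesterday", "pos": "ADV"},
]]
```

For this datapoint, `cooccur("Adam*work", datapoint)` succeeds, but `adjacent("Adam+work", datapoint)` fails because the tense particle intervenes; `adjacent("Adam+pos:TNS", datapoint)` succeeds.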

Filtering the data for certain constructions

Additionally, the data can currently be filtered for various constructions. If a filter is set to "Select all", the data are not filtered for the specific property.

In the future, further filters might be added.

Language

The filter finds data only from a certain language.

Current options: Dagaare, Dagbani, Gurene, Kasem, Kusaal, Likpakpaanl, Sisaali

Audio

The filter finds either data that come with an audio recording or data without one.

Sentence embedding

The filter finds data which are either simple clauses or are of a certain type of complex clause.

Current options:

Simple matrix clause (e.g. Adam worked, She is singing in the church)
Matrix + embedded subject clause: a clause is the subject of the matrix clause (e.g. [_subj Ama receiving money ] helped her family or [_subj That my husband talks so much ] bothers me)
Matrix + embedded object clause: a clause is the object of the matrix clause (e.g. I don't know [_obj when John slaughtered the fowl ], Peter said [_obj that John slaughtered a fowl ])
Matrix + adverbial clause: a clause acts as an adverb to the matrix clause (e.g. Ama lost money [_adv after she fed her child ], Ama lost money [_adv because a thief stole her bag ])
Clause conjunction: two sentences are connected by "and" or "but" (e.g. [ Ayuo ate fufu ] and [ Bayuo slaughtered a fowl ], [ Adam will not work in France ] but [ he will work in Germany ]).
Clause disjunction: two sentences are connected by "or" (e.g. [ Ayuo ate fufu ] or [ Bayuo slaughtered a fowl ])
Matrix + relative clause: a noun in the example is modified by a relative clause (e.g. the subject can be modified by a relative clause: the man [_rel that slaughtered a fowl ] saw me, the object can be modified by a relative clause: I saw the man [_rel that slaughtered a fowl ])

Sentence type

The filter finds data which are either declaratives or a certain type of question, or an imperative. In complex sentences, the matrix clause and the embedded clauses can have different sentence types, e.g. with embedded questions.

Current options:

Declarative: a "normal sentence". This also includes sentences with focused constituents in them.
In-situ question: an open-ended question without fronting of the wh-word (e.g. John slaughtered what?, I don't know [ John slaughtered what ])
Ex-situ question: an open-ended question where the wh-word is fronted (e.g. What did John slaughter?, I don't know [ what John slaughtered ])
Polar question: a yes-no question (e.g. Did John slaughter the fowl?, I don't know [ if John slaughtered the fowl])
Imperative clause: an imperative (e.g. Slaughter the fowl!)

Focus type

The filter finds data that have no focus, new information focus, or contrastive focus. Something is tagged as being in focus if it bears pragmatic focus. It does not necessarily have a focus marker.

Current options: no focus, in-situ new information focus, ex-situ new information focus, in-situ contrastive focus, ex-situ contrastive focus

Wh/rel/foc element

The filter finds the data where wh-movement, relativization or focalization targets only the chosen constituent.

Current options: No target, entire clause, truth value, subject, direct object, indirect object, adverb, verb, verb phrase

Transitivity

The filter finds data with intransitive, transitive, or ditransitive verbs or serial verb constructions. "Multiple object construction" finds all ditransitives and serial verb constructions.

Current options: intransitive, transitive, ditransitive, serial verb construction, multiple object construction
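The behaviour of this filter, including the umbrella option and the "Select all" setting, can be sketched in Python. The names `TRANSITIVITY_GROUPS` and `passes_transitivity_filter` are hypothetical, and the sketch assumes that the stored values are spelled like the option labels, which may not hold in the actual data.

```python
# Assumed expansion of the umbrella option, following the description above.
TRANSITIVITY_GROUPS = {
    "multiple object construction": {"ditransitive", "serial verb construction"},
}


def passes_transitivity_filter(selected: str, value: str) -> bool:
    """Return True if a datapoint's transitivity value passes the filter."""
    if selected == "Select all":
        # No filtering for this property.
        return True
    # An umbrella option expands to its members; other options match themselves.
    allowed = TRANSITIVITY_GROUPS.get(selected, {selected})
    return value in allowed
```

Under these assumptions, selecting "multiple object construction" accepts both "ditransitive" and "serial verb construction" but rejects "transitive".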

Aspect

The filter finds data with a certain type of aspect (e.g. imperfective, perfective). Progressive and habitual are seen as subtypes of imperfectives, completive is a subtype of perfective.

Current options: Imperfective, progressive, habitual, perfective, completive

Tense

The filter finds data in a certain tense (e.g. past, present, future).

Current options: Past, present, future

Polarity

The filter finds either affirmative or negative sentences.

Tone

The filter finds either data that do not contain tonal marking or data that are at least partially toned.

Audiofiles

Some examples in the database come with audiofiles in which speakers were recorded pronouncing the respective example. If your browser allows it, a media player is displayed at the respective example (next to "date"). Here you can play the file and download it. The audiofiles are all mp3 files named after the respective example key (e.g. "Likpakpaanl-24.mp3").

Glosses

Conventions

Some closed-class words can be glossed by abbreviations or by English translations. The following lists the conventions that are used in the database.

All glosses including functional glosses are in lower case. Formatting to small caps is done through formatting commands if needed.
Pronouns are NOT glossed by English translations (i.e. "I", "you", "he" etc.), but by the phi-features they express (i.e. 1sg, 2sg, 3sg, etc.).
If a language differentiates different types of locatives (e.g. in vs. on), the English translations are used. If there is no distinction, the gloss is "loc".
Particles that are used for marking focus are glossed as "foc" throughout, even if they are used in non-focus contexts.

List of glosses

When entering the data and searching for data, the glosses are required to follow the Leipzig Glossing Rules (LGR). Glosses that are not in the LGR should be listed here under "own convention". Importantly, there should be no divergences from the list.

Gloss	Meaning	Source
1	first person	Leipzig Glossing Rules
2	second person	Leipzig Glossing Rules
3	third person	Leipzig Glossing Rules
acc	accusative	Leipzig Glossing Rules
anim	animate	own convention
comp	complementizer	Leipzig Glossing Rules
compl	completive	Leipzig Glossing Rules
cj	conjoined	own convention
conj	conjunction, conjoined	own convention
cop	copula	Leipzig Glossing Rules
def	definite	Leipzig Glossing Rules
dem	demonstrative	Leipzig Glossing Rules
dir	directional	own convention
dj	disjoined	own convention
emph	emphatic	own convention
foc	focus	Leipzig Glossing Rules
fut	future	Leipzig Glossing Rules
hest	hesternal	own convention
hum	human	own convention
ipfv	imperfective	Leipzig Glossing Rules
loc	locative	Leipzig Glossing Rules
nom	nominative	Leipzig Glossing Rules
nc	noun class	own convention
neg	negation, negative	Leipzig Glossing Rules
pfv	perfective	Leipzig Glossing Rules
poss	possessive	Leipzig Glossing Rules
pro	pronoun	own convention
prog	progressive	Leipzig Glossing Rules
pst	past	Leipzig Glossing Rules
pl	plural	Leipzig Glossing Rules
ptcl	particle	own convention
q	question particle/marker	Leipzig Glossing Rules
rel	relative	Leipzig Glossing Rules
rel.pro	relative pronoun	own convention
rel.det	relative determiner	own convention
sg	singular	Leipzig Glossing Rules
tns	tense	own convention
top	topic	Leipzig Glossing Rules

Part-of-speech tagging

Part-of-speech tagging works similarly to glossing. The users are instructed to adhere to the list of POS tags below. The tags correspond to abbreviations commonly used in linguistics.

Conventions

All POS tags are uppercase throughout.

List of POS tags

POS	Meaning	Example
ADJ	adjective	red
ADV	adverb	yesterday
ART	article	the, this, a
ASP	independent aspect particle
COMP	(subordinating) complementizer	that, whether, because
CONJ	(coordinating) conjunction	and, or, but
COP	copula	is
DEM	demonstrative	this, that, those, these
FOC	independent focus particle
N	noun (incl. proper names)	house, Adam
NEG	negation	not
P	preposition	in
PART	particle other than tense, focus, or aspect, or question particle
POSS	possessive pronoun	his
PRO	personal pronoun	he
Q	question particle, question tag	right?
REL	relative pronoun, relative marker	that, which
SW	sentence word	yes, no
TNS	independent tense particle
V	verb	slaughter
WH	wh-pronoun or determiner	which, who

Word and morpheme separation

Conventions

Focus particles are separate words throughout. They are not marked as suffixes to the words.
Elements that are clearly suffixes are separated by a hyphen from the stem (e.g. conjoint, disjoint marking).

Tone marking

Notation

High tone: acute accent, e.g. "á"
Low tone: grave accent, e.g. "à"
Mid tone: macron, e.g. "ā"

How the data are stored in the database

The database is xml-based, which means that the data are stored in xml format. The following shows how an example (i.e. a "datapoint") is stored:

<datapoint ID="Likpakpaanl-1">
      <language>Likpakpaanl</language>
      <dialect>--</dialect>
      <speaker>SA</speaker>
      <date>2021-10-28</date>
      <audio>audio</audio>
      <audiofile>Likpakpaanl-1.mp3</audiofile>
      <construction>
            <embedding>simple cl.</embedding>
            <type>declarative</type>
            <focus>no.focus</focus>
            <target>no.target</target>
            <transitivity>intransitive</transitivity>
            <aspect>perfective</aspect>
            <tense>past</tense>
            <polarity>affirmative</polarity>
            <tone>no.tone</tone>
      </construction>
      <example>
            <sentence>
                        <name/>
                        <judgment/>
                        <or_gloss>
                                    <word><or>Adam </or><gl>Adam </gl><pos>N </pos></word>
                                    <word><or>fe </or><gl>hest.pst </gl><pos>TNS </pos></word>
                                    <word><or>tun </or><gl>work </gl><pos>V </pos></word>
                                    <word><or>(fenna) </or><gl>yesterday </gl><pos>ADV </pos></word>
                        </or_gloss>
                        <translation>Adam worked yesterday.</translation>
            </sentence>
      </example>
</datapoint>

An example a.k.a. datapoint consists of three parts:

Some metadata: language, dialect, speaker, date of elicitation and possibly an audiofile
Information about the construction: see filters above.
The example itself: It can consist of multiple sentences (e.g. question-answer pair and/or grammatical and ungrammatical versions of the same sentence). A sentence consists of triples for each word: the original word in the respective language, the gloss and the POS tag. After this a translation is added.

Each datapoint has a unique key to identify it.
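As an illustration of this structure, the following Python sketch reads a datapoint with the standard library module `xml.etree.ElementTree`. It is not part of the database software; the function name `read_datapoint` is hypothetical, and the sample is an abridged copy of the datapoint shown above. Trailing spaces inside the word tiers are stripped.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the datapoint "Likpakpaanl-1" shown above.
SAMPLE = """<datapoint ID="Likpakpaanl-1">
  <language>Likpakpaanl</language>
  <speaker>SA</speaker>
  <date>2021-10-28</date>
  <construction>
    <transitivity>intransitive</transitivity>
    <tense>past</tense>
  </construction>
  <example>
    <sentence>
      <name/>
      <judgment/>
      <or_gloss>
        <word><or>Adam </or><gl>Adam </gl><pos>N </pos></word>
        <word><or>fe </or><gl>hest.pst </gl><pos>TNS </pos></word>
        <word><or>tun </or><gl>work </gl><pos>V </pos></word>
      </or_gloss>
      <translation>Adam worked yesterday.</translation>
    </sentence>
  </example>
</datapoint>"""


def read_datapoint(xml_text: str) -> dict:
    """Turn one <datapoint> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    sentences = []
    for sentence in root.iter("sentence"):
        # Each word is a triple: original word, gloss, POS tag.
        words = [
            {tier: (word.findtext(tier) or "").strip() for tier in ("or", "gl", "pos")}
            for word in sentence.iter("word")
        ]
        sentences.append(
            {"words": words, "translation": sentence.findtext("translation")}
        )
    return {
        "id": root.get("ID"),  # the unique key of the datapoint
        "language": root.findtext("language"),
        "construction": {child.tag: child.text for child in root.find("construction")},
        "sentences": sentences,
    }


datapoint = read_datapoint(SAMPLE)
```

The resulting dictionary exposes the key as `datapoint["id"]`, the construction properties as `datapoint["construction"]`, and the word triples of each sentence as `datapoint["sentences"][i]["words"]`.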

The database uses xml, xsl, php, javascript, html, and css.
