linguistic research – Mark Norris

October 15, 2021 (updated January 29, 2023)

Reflections on research after academic jobs

When I left the job where I was paid to do linguistic research (among other things), I told myself that I was going to continue to do research (i) as long as I had time for it and (ii) as long as I found it fun (or, you know, as long as I wanted to). I anticipated that within 2-5 years, the sun would set on my ability to contribute new research to generative and typological linguistics. I left my academic job about 2.5 years ago (in May 2019), and I have been working in industry for almost 11 months. As I continue to put down roots in this next phase of my working life, these thoughts have been creeping back into my mind.

Between deciding to leave and actually leaving: anticipatory grieving

Some post-academic folks I speak to seem to feel little loss over their academic research interests, but this was not me. I felt very sad about saying goodbye to those things. In particular, I recall feeling saddest about the changing relationship to Estonian, my primary research language. Leaving my academic post meant saying goodbye to biannual trips to Estonia (for fieldwork and swimming), and it meant less occasion to contact the friends who taught me about their language. I might say that I was worried I would miss Estonian and Estonia so much that I would regret my choice to leave my academic job, but really I think I was just sad that things were going to change.

In the middle: sometimes a source of comfort and purpose and sometimes a distraction

My transition to industry took longer than I expected: 14 months after I relocated to SF (11-12 of those spent actually searching), I got my first industry offer. I have shared this before, but it bears repeating (just so people know): I experienced some of the lowest/darkest moments of my life during that time. Searching for jobs ALWAYS sucks, and so does trying to make a career transition. During this period, doing a little bit of linguistic research would help temper the feeling of purposelessness I often felt. For example, for a few months, I had a weekly reading group with Ruth Kramer, who has been both a dear friend and research advisor essentially since we met in 2008. I spent maybe 20-30% of my “working time” doing linguistics, just because it gave me something to do that I felt like I knew how to do.

There were other times when doing linguistics felt like an indulgence. It felt like it wasn’t quite the thing I was “supposed to do” in order to make myself more competitive for a job. In retrospect, it wasn’t that doing linguistics was EITHER helpful for me or a distraction; it’s that sometimes I needed it, and sometimes I didn’t.

Now completely in industry: I liked it then and I like it still, but I don’t regret my choice

I’m now just over 6 months into my first permanent position in industry, working on problems I find interesting with a team of people I truly enjoy. I did actually have some research output over the last 12 months, and (to my surprise, honestly) I was recently invited to give a colloquium and contribute to another handbook, so I think it’s clear that I’m still doing research at this point. BUT GOODNESS, it’s even harder to make time for it now! After I wrapped up my joint paper with Kyle Mahowald and Dan Jurafsky, I didn’t have any research deadlines, and weeks without doing linguistics passed before I realized it. It’s not that I no longer enjoy it; it’s just that it’s one of many things I enjoy that I have to make time for outside of work.

This made me think about how I felt before I left: would I miss it? Would I be sad about letting these things go? At this point in my post-academic life, it seems the answer is “No.” That could be because I’ve let go gradually. It could also be because I still haven’t completely let go: I have a handbook chapter that is still set to come out (handbooks are… slow), and I was just asked to contribute to a different handbook (hello, deadline). But I think a large part of it is that (i) I’ve had space to move on and (ii) I have a new career that is providing plenty of intellectual stimulation. AGAIN, I must stress that this doesn’t mean I didn’t like it then or don’t like it anymore! I’m still very happy I spent 11 years of my working life dedicated to linguistics teaching and research. I’m also happy about learning to do new language-related things!

Future: What’s actually worth my investment? Can I walk away from unanswered questions?

Since leaving, I have realized that even if I continue to do theoretical/typological research when it is no longer part of my job description, it will not look the same as it did before. There were many research-related activities I did when I was a professor:

Read theoretical papers: both to stay current and to try to find inspiration when solving a particular puzzle
Write papers: to share knowledge and proposals in a permanent form
Present at conferences: to share knowledge and proposals
Give invited talks: both colloquia and working group talks
Review articles: if I’m going to keep writing, I should keep reviewing

That’s a lot of tasks! And realistically, in a busy research week, I can probably spend about 5 hours on this. Deadlines have become significantly more motivating than they were in the past: I have been able to complete necessary work and not much else. For example, I have had to be more selective about reviewing, and I barely read enough to support my own projects. Forget about staying current!

At some point, it will be time to effectively stop. I have a pipe dream of writing a book and just making it available, published or not. There are too many things I’ve learned (especially about concord) to just leave them in my brain. There are questions that I want to know the answers to, and if I don’t find the answers to these questions, then I don’t get to know what the answers are (because either nobody else will find them, or they won’t tell me if they do). Trying to get all of my knowledge on paper is one way to possibly avoid that, but I also think I will have to leave some of these questions unanswered. I suppose that’s also just part of moving on from jobs more generally: letting go of in-progress work.

January 17, 2021

Case and K

Before you read too far, let me issue this disclaimer: when I say case in this blogpost, I am never talking about syntactic Case with a capital C.

If you ask anybody who works on generative nominal morphosyntax where case is, my guess is that most of them will bring up KP, a head that (most of the time) is assumed to take a DP complement. The earliest citation I’m aware of for KP is Lamontagne & Travis (1987) (see also Travis & Lamontagne, 1992), but since as early as 2005, people have been using KP without citation. I think many/most NP generative syntax folks would not disagree with a statement like, “KP is the location of case features,” but there are in fact very few works that carefully explore the connection between K and case morphemes.

Case = K: Case particles

I can’t get too in the weeds with this (b/c blog), but here’s one example where the connection between case and K is brought up. At the beginning of their paper, which is mostly about nominal licensing (but does use KP), Bittner and Hale (1996:4) suggest that the order of case particles and nominal phrases tracks that of verb and object.

In Mískito (ISO miq; Misumalpan, Honduras/Nicaragua), verbs follow objects and case particles follow NPs (Bittner and Hale, 1996:4).

In Khasi (ISO kha; Austroasiatic, Bangladesh/India), verbs precede objects and case particles precede NPs (Bittner and Hale, 1996:4).

But they do not cite or report on the results of a typological study. And Dryer’s sample of case affixes does not include case particles. However, the border between adpositions and case is fuzzy, and since adpositions also closely track VO order, I expect it’s true that case particles do likewise (to the extent that a border between case particles and adpositions can be established).

Case ≠ K: Case concord

The mapping between case and K is clearest in these case particle languages, because there is one case morpheme and one syntactic locus (and 1, as they say, = 1). There are also languages where case is marked multiple times per NP (languages with case concord), and for these it is not clear what the connection between K and case is. Take, for example, Estonian:

In Estonian (ISO ekk; Uralic, Estonia), case is marked on many of the words inside NP. In this example, inessive case -s appears on each word (Norris, 2018: 539).

In my NLLT paper (and in my other work on case concord), I do treat case as originating on K in some sense, but I do not specify how the K head itself is realized. I believe the same is true for Ingason (2016), who discusses case concord in Icelandic (hmm, where have I heard of that before?). When people would ask me about this when I was in grad school, I would provide a joke answer of, “Oh, I like to pretend the K head explodes and rains down its pieces on the heads below,” but I did not have a real answer. The only plausible answer (that I don’t anticipate working out any time soon) is that K is realized as case on the noun.

There are some other approaches, too, like treating case as a feature assigned to a phrase rather than originating on a head (Baker and Kramer, 2014:148), or treating case morphemes as inserted postsyntactically (eg, Embick and Noyer, 2001).

Case ≟ K: other case suffixes

The missing piece of this investigation, imo, is languages with case suffixes but no case concord (or at least, no robust case concord). I have wondered: in a language with no case concord but a case suffix on N, how does case end up on N? Sometimes, nothing special needs to be said. In an N-final language like Turkish, case could be a suffix in K and just end up landing on N because they happen to be adjacent. So I went looking in my concord sample for languages that were [-Nfinal, +case, -case concord] to see where the case morpheme ended up.
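A query like [-Nfinal, +case, -case concord] is just a three-way filter over the sample. Here is a minimal sketch of that filter, assuming a hypothetical record format: the field names and the placeholder languages below are illustrative only, not the actual Conctypo coding.

```python
from dataclasses import dataclass


@dataclass
class LanguageRecord:
    """One language in a concord sample (hypothetical fields, not the real Conctypo coding)."""
    name: str
    noun_final: bool        # True if N is always the last word of the NP
    has_case: bool          # True if nominals carry case morphology
    has_case_concord: bool  # True if case shows up on more than one word in the NP


def non_final_case_without_concord(sample):
    """Return the languages coded [-Nfinal, +case, -case concord]."""
    return [
        lang
        for lang in sample
        if not lang.noun_final and lang.has_case and not lang.has_case_concord
    ]


# Placeholder records for illustration only; these are not real codings.
sample = [
    LanguageRecord("Language A", noun_final=True, has_case=True, has_case_concord=False),
    LanguageRecord("Language B", noun_final=False, has_case=True, has_case_concord=False),
    LanguageRecord("Language C", noun_final=False, has_case=True, has_case_concord=True),
    LanguageRecord("Language D", noun_final=False, has_case=False, has_case_concord=False),
]

for lang in non_final_case_without_concord(sample):
    print(lang.name)  # prints "Language B"
```

A language like the placeholder "Language A" drops out because a case suffix in K could simply land on an adjacent final N. Whatever survives the filter still has to be checked by hand in the grammars to see where the case morpheme actually attaches.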

A number of these languages are coded by Dryer as having postpositional clitics: the case marker ends up on whatever word is last in the NP (or perhaps there are some restrictions, but the idea is that case can attach to a variety of bases). There were also a number of languages reported as having case suffixes, but when I looked more closely, I found that in fact many of these languages might actually have postpositional clitics instead. I only say “many” because I don’t have clear data for some of them, but importantly: I don’t have any examples showing a bound case formative on a non-final N in a language without case concord. (!!) What! Here are a couple of examples to show what I mean:

Yuchi (ISO yuc; isolate, North America)

Yuchi is coded as having case suffixes, but in Mary Linn’s grammar of the language, I found a couple examples where the case morpheme was not on the noun, but on its modifier. This would be evidence for labeling Yuchi case as a postpositional clitic. (Dryer’s data come from a different source for Yuchi, so I can’t say for sure what the discrepancy is.)

From Linn’s grammar of Yuchi, the locative case marker ‘-le’ meaning “back to” attaches to a numeral, not the noun.

Fur (ISO fvr; Fur (controversially Nilo-Saharan), CAF/Chad/Sudan)

Fur is also characterized as having case suffixes, but in examples from Tucker & Bryan (1966), the case marker attaches after postnominal adjectives.

From Tucker & Bryan (1966), but alas, I don’t have a page number! Look at the second sentence where “-si” attaches to “futa.” In the third row, we again see a case marker (this time “-ŋ”) attaching to postnominal “futa”.

Again, if these examples are representative (and, of course, assuming it’s reasonable to treat adjectives as different from nouns in Fur), then these are more like postpositional clitics, too.

Hang on, I’ve lost the thread.

Blogs are hard. The point is this: Nobody has (to my knowledge) a worked-out demonstration of what needs to be said to maintain that case morphemes are connected to a high head in nominal phrases (“call it K, if you like”). There should be one, or there should be something explaining why that can’t work. I say that because there is such interesting work on gender and number in this domain. Why not case? I guess it could be because case has a more indirect relationship to the noun and thus has less noun-related idiosyncrasy, by and large. Or it could be that in many languages, cases are just tiny and/or dependent adpositions, and there’s not a lot of morphology to adpositions generally.

If I wanted to take the time to re-write this (not really how blog posts work), it might look like this:

Are case formatives realizations of K?
Easiest stuff: case particles
Pretty easy stuff: peripheral case affixes/clitics in N-final languages
Pretty hard stuff: Case concord
Pretty does-it-exist stuff: puppy-ACC fuzzy (a case suffix on a non-final noun) is a pattern we don’t expect, and importantly, we don’t see it (very often)

Somebody get into case formatives! End of blog post.

December 27, 2020

Kinds of hybrid agreement and analyses thereof

How is a linguistics blog post different from a linguistics article? I think one key thing is that they’re short. So ommina try to keep this short!

Hybrid agreement (in gender)

The example below demonstrates the complexities of what is starting-to-be-standardly called “Hybrid Agreement.”

BCS hybrid agreement: some words are masculine, other words are feminine

What’s particularly of note is that the adjective stare ‘old’ is feminine, but the demonstrative ovi ‘these’ is masculine. Note there is optionality here: the demonstrative could also be feminine. Hybrid agreement has been front and center in the debate around the headedness of nominal phrases. Salzmann (2018) argues on the basis of hybrid agreement that NPs cannot be headed by N, and Bruening (2020) reanalyzes the data in a framework where N is the head. I’m not going to recapitulate the discussion here (because this is a blog post!), but there are some key properties of this pattern in BCS (as well as the non-BCS patterns of hybrid agreement that are sometimes discussed, e.g., by Landau (2016)):

Lexical: only certain lexical items show this hybrid behavior
Construction-general: This hybrid behavior shows up in a variety of syntactic contexts (eg, NP internal, verbs, pronouns)
Optionality/variation: Hybrid agreement occurs “optionally”, which I use here to mean “presence of identifiable hybrid agreement is not required for grammaticality.”

Because of these properties, the debates around hybrid agreement have always involved the question of how much information is encoded in the lexical representation of a noun. From the seminal monograph by Wechsler and Zlatić (2003) to Bruening’s (2020) update of the broad strokes of that approach, capturing hybrid agreement via additional lexical information explains (or some other word if you don’t like “explains” here) the three properties in the following ways.

Lexical: lexical information is known to vary from word to word. If hybrid behavior is lexically-encoded, we expect it to be localized to certain lexical items but not others.
Construction-general: Lexical properties are most compelling when they are not affected by the syntactic contexts in which they appear (that’s why they’re lexical). If hybrid behavior is lexically encoded, we expect that hybrid agreement would be visible in many syntactic constructions.
Optionality/variation: The two parts of hybrid agreement—e.g., masculine and feminine features in the case of this BCS pattern—are not encoded in exactly the same way. We expect to see different behavior (or it’s at least not a surprise to see it) because of how processes access lexical information (e.g., what kinds of encoding they pay attention to). This can result in surface variation or optionality.

Finnish/Estonian hybrid agreement in number

In Finnish and Estonian (and possibly other Finnic languages where the patterns are not well documented), another kind of hybrid agreement pattern occurs.

Estonian hybrid agreement: some words are singular, other words are plural

The catalyst for this hybrid agreement is a numeral (anything other than ‘one’). Material to the right of the numeral is singular in form, and material to the left is plural in form. The numeral itself is also singular in form (yes, numerals in Finnish and Estonian clearly distinguish plural and singular forms; see my LSA paper for some examples and references). There is also a case distinction here, but only sometimes, and I’m not going to talk about it, since this is my blog and I will not be entertaining a lexical treatment of case in this post or ever. But Finnic hybrid number is rather different from the more well-beaten paths of hybrid gender.

Not lexical: nearly every noun that can be counted in Finnish/Estonian exhibits this number split. The exceptions that exist are in fact nouns which exceptionally do not show hybrid agreement—they’re plural on both sides (see my LSA paper on this, for example).
Construction-specific: this is a property of numeral-noun constructions and numeral-noun constructions only (or, if you twist my arm, fine, we could just say it’s in vaguely non-universal quantificational contexts). We do not see this number split in other areas (e.g., not in simple NPs).
Obligatory: As far as I know, this property of Estonian and Finnish is fully obligatory. It is ungrammatical to count plural nouns with singular numerals, and to the best of my knowledge, it is ungrammatical (if not completely, then very nearly so) to use a singular demonstrative in a numeral-noun construction with a numeral ≠ ‘one’. Perhaps a rigorous corpus study would reveal examples in some corner of the data, but my own fieldwork and the normative grammars certainly suggest that the only option is a plural demonstrative.

And just like that, the blog post is over.

Well, this is already verging on too long for a blog post, so let me try to concisely say what the point is. The Finnish/Estonian form of hybrid agreement is not a lexical pattern. (Landau (2016) actually does touch briefly on Finnish in his excellent work on hybrid agreement, but as I discuss in my LSA paper, the analysis is really only sketched. And anyway, Landau’s analysis of these patterns is also not actually lexical.) It’s thus not obvious how the lexicalist analyses of hybrid agreement—which I have not discussed in detail, because this is a blog post—can generalize to the Finnish/Estonian form of hybrid agreement. I have a 3/4 (ha! Hybrid agreement joke) completed squib on the topic—posting this in part to make sure I’m not missing any obvious beeves.

Of course, “well, just because you call them the same thing does not mean they’re the same thing.” I’m not saying the Finnish/Estonian pattern and the BCS (etc.) pattern must have the same analysis because both can be called “hybrid agreement.” But I am saying that the Finnish/Estonian pattern must have an analysis. If your analysis of BCS (etc.) hybrid agreement is part of a bigger point about the architecture of the grammar, then I contend it is important to consider how Finnish/Estonian fit into that architecture, too, now that you know the pattern exists.

August 21, 2020 (updated August 26, 2020)

Conctypo: What, why, and how

In the past year or so since leaving my academic position and relocating to San Francisco, I have spent my “work time” learning and doing different things. Some of that has been developing my technical skills, and some of it has been continuing the research program I developed while in academia. In particular, nominal concord continues to be an obsession of mine, and I still have unanswered questions that I think nobody will find the answers to if not me. The most satisfying result is when I’m able to marry these two pursuits by using my increased technical skills to improve my research effectiveness. Today, I’m going to introduce the research project I’ve spent the most time with, my typological sample of nominal concord, aka Conctypo.

I’ve debated whether to kick off this series from the very beginning, but I decided instead to start where I am right now. I’ll dig into the past in some subsequent posts.

What is nominal concord and what is Conctypo?

If you’ve studied a European language before, you’ve probably encountered the phenomenon that I (and others) call nominal concord. For example, in the Spanish phrases la casa blanca ‘the white house’ and el edificio blanco ‘the white building’, the words for ‘the’ (la/el) and ‘white’ (blanca/blanco) change their form based on the noun that they modify. In this instance, it’s because the noun casa ‘house’ is feminine and the noun edificio ‘building’ is masculine. This is an example of nominal concord.

A graphical representation of nominal concord for gender in Spanish: orange lines connect the feminine noun casa ‘house’ to its modifiers, and green lines connect the masculine noun edificio ‘building’ to its modifiers.

More technically, nominal concord is the agreement process in language whereby modifiers of a noun (eg, adjectives, numerals, or demonstratives) must match the noun they modify in particular features (eg, gender, number, or case). Nominal concord is a well-known process in linguistics, perhaps because it is widespread in Indo-European languages. But it also exists well beyond Indo-European: it is found on all 6 inhabited continents (sorry, Antarctica, but you don’t count as inhabited).

Conctypo is a typological sample of nominal concord in the world’s languages. As of this writing, I (with the help of research assistants while I was at OU) have collected data on 244 languages. The first time I presented on the project was at the LSA meeting in 2019. The paper and the entire data set (restricted to 174 languages for better genetic/geographic balance) are available in the SHAREOK archive here: A typological perspective on nominal concord. Since then, I have stopped managing the data with spreadsheets and now store the information in JSON files. All I have to do to update the database is add the JSON files for the new languages and run the Python scripts I’ve written to pull relevant numbers. But more on that in a later post!
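The post doesn’t describe the JSON layout or the scripts themselves, so what follows is only a sketch of the general “one JSON file per language, then tally” workflow, under an assumed schema: each file is taken to hold a name plus a list of concord features for each of the three word classes, where an empty list means no concord was found. The file layout and key names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical schema: each file holds one language, e.g.
# {"name": "...", "concord": {"demonstrative": ["gender"], "numeral": [], "adjective": ["number"]}}
WORD_CLASSES = ("demonstrative", "numeral", "adjective")


def load_sample(data_dir):
    """Load every *.json file in data_dir as one language record (sorted by filename)."""
    records = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.append(json.load(f))
    return records


def count_concord(records):
    """Count how many languages show concord on each word class, and on any of them."""
    counts = Counter()
    for rec in records:
        concord = rec.get("concord", {})
        # A non-empty feature list means concord was attested for that word class.
        attested = [wc for wc in WORD_CLASSES if concord.get(wc)]
        counts.update(attested)
        if attested:
            counts["any"] += 1
    return counts
```

Adding a language then means dropping one more file into the directory and calling `count_concord(load_sample(...))` again. Keeping one file per language also means each addition is a small, self-contained diff, which is part of what makes this easier to maintain than a spreadsheet.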

Why build this typological database?

While there are many broad tendencies in language structure—go tool around WALS if you never have—languages also have plenty of idiosyncratic properties. When devising models of language structure, a reasonable approach (to my mind) would be to use common properties as the foundations of the theory. In order to build a theory of nominal concord, we would need an understanding of what the common properties of nominal concord are. That’s where Conctypo comes in— the cross-linguistic sample can tell us what is common in concord systems. In turn, when looking at the concord system of a particular language, we can correctly identify idiosyncratic properties as idiosyncratic (instead of mistaking them for plausibly general properties of concord).

A map showing languages with nominal concord (green dots) and languages without (gray dots). Made in R with the lingtypology package. See this tweet thread for more concord map images.

To put a finer point on this, let me discuss Indo-European briefly. Often, when a researcher brings up nominal concord, they use data from an Indo-European language to highlight its behavior. Concord systems in other languages are compared to Indo-European systems, with the implicit assumption that Indo-European systems are normal or common. Yet without a cross-linguistic understanding of concord systems, we can’t be sure this is true! Concord in Indo-European languages is robust and regular; commonly, gender and number are represented on nearly every word modifying a noun. But concord in the world’s languages could be more sporadic. It could involve, for example, number on some words and gender on other words.

Schematic representations of two kinds of concord systems. On top, a system like Spanish, where both the article and the adjective must agree with the noun’s feminine and plural features. On the bottom, a hypothetical system where the article only agrees in gender and the adjective only agrees in number.

In that world, Indo-European concord would perhaps be overzealous. The only way to know (well, the only way to feel more assured) is to go to the data.

At the time I started gathering data, I knew of no other typological investigation of nominal concord. Thus, if I wanted to know the answers to these questions, I had to find them myself. After about a year and a half, I found the work of Ranko Matasović and İsa Kerem Bayırlı, who have collected their own concord or concord-related typological samples. We do not all document the same properties, though, so the more, the merrier!

How did we collect the data?

We look for linguistic examples involving three different kinds of words:

Demonstratives: words like this/these or that/those
Cardinal numerals greater than ‘one’: number words like two, three, or seven. We specifically avoid ordinal numerals like second, third, or seventh as these often behave like adjectives. We also avoid one because it shows idiosyncratic behaviors in some languages—the goal was to try as much as possible to look at numerals as a distinct category.
Adjectives: words like green, tall, old, etc. This can get tricky as some languages lack a clearly defined adjective class.

To find the examples needed, we look in published sources, including PhD dissertations. Ideally, this would be a grammar, i.e., a reference guide to the linguistic properties of the language. Failing a suitable grammar, we will use other types of writing (ideally published, but for some languages, the only available material may be unpublished). We look through the grammar to find suitable attested examples, where “suitable” means something like “if the language had concord, we would be able to see it in this example.” I take pictures, take screenshots, or copy the text of the example and save it in a Google Doc for archival purposes (and in case I ever want to check my work).

In Pondi (Ulmapo; Papua New Guinea), adjectives show concord in number (Barlow, 2020:77)

Once I have finished documenting all three word classes, I can update the database. I wrote a program in Python that asks me the requisite questions and then creates a properly formatted JSON file and saves it in the proper place (more on this program later!).
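The post doesn’t show that program, so here is a minimal sketch of what a question-driven record builder could look like. It assumes one yes/no question per word class (demonstrative, numeral, adjective) and feature (gender, number, case), and a hypothetical output layout of one JSON file per language. The `ask` parameter defaults to Python’s built-in `input()`, but it can be swapped for scripted answers, which keeps the function testable.

```python
import json
from pathlib import Path

# Hypothetical question set: three word classes x three features.
WORD_CLASSES = ("demonstrative", "numeral", "adjective")
FEATURES = ("gender", "number", "case")


def build_record(name, ask=input):
    """Ask a yes/no question per word class and feature; return a JSON-ready dict.

    `ask` defaults to input() but can be replaced with scripted answers.
    """
    concord = {}
    for wc in WORD_CLASSES:
        agreeing = []
        for feat in FEATURES:
            answer = ask(f"{name}: does the {wc} show concord for {feat}? [y/n] ")
            if answer.strip().lower().startswith("y"):
                agreeing.append(feat)
        concord[wc] = agreeing
    return {"name": name, "concord": concord}


def save_record(record, data_dir):
    """Write the record to data_dir/<name>.json (creating data_dir if needed)."""
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / (record["name"].lower().replace(" ", "_") + ".json")
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps language names with diacritics readable in the file
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path
```

Asking the questions in a fixed order means every file ends up with the same keys. That kind of consistency is the main advantage of generating the JSON programmatically instead of typing it by hand.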

And the work continues…

My work on this project continues in the form of data collection, computational streamlining, and pursuing theoretical implications. Until next time!