Language use and interactions on social media are geographically biased. In this work
we utilise this bias in predictive models of user geolocation and lexical dialectology.
User geolocation is an important component of applications such as personalised
search and recommendation systems. We propose text-based and network-based
geolocation models and compare them on benchmark datasets, where they achieve
state-of-the-art performance. We also propose hybrid and joint text and network
geolocation models that improve upon text-only or network-only models, and show that
the joint models achieve reasonable performance in minimal-supervision scenarios,
which are common in real-world datasets. Finally, we propose the use of continuous representations
of location, which enables regression modelling of geolocation and lexical dialectology.
We show that our proposed data-driven lexical dialectology model provides qualitative
insights into geographical lexical variation.