postsjilo.blogg.se

Coherence meaning for NLP

Human communication often combines imagery and text into integrated presentations, especially online. In this paper, we show how image–text coherence relations can be used to model the pragmatics of image–text presentations in AI systems. In contrast to alternative frameworks that characterize image–text presentations in terms of the priority, relevance, or overlap of information across modalities, coherence theory postulates that each unit of a discourse stands in specific pragmatic relations to other parts of the discourse, with each relation involving its own information goals and inferential connections. Text accompanying an image may, for example, characterize what is visible in the image, explain how the image was obtained, offer the author's appraisal of or reaction to the depicted situation, and so forth. The advantage of coherence theory is that it provides a simple, robust, and effective abstraction of communicative goals for practical applications. To argue this, we review case studies describing coherence in image–text data sets, predicting coherence from few-shot annotations, and coherence models of image–text tasks such as caption generation and caption evaluation.

The internet has become a multimodal information ecosystem, where units of content (news articles, web pages, posts to social media) regularly tie together written words, emoji and other icons, static and dynamic imagery, and links to yet more multimodal content. Faced with the heterogeneity of online information, Artificial Intelligence (AI) researchers have increasingly characterized problems of information access from the perspective of multimodality: for example, producing text captions that make visual information more accessible (e.g., Lin et al., 2014; Young et al., 2014) or taking both text and image content into account in information retrieval (e.g., Funaki and Nakayama, 2015; Chowdhury et al., 2019). At the same time, this heterogeneity has empowered AI researchers to compile vast multimodal datasets (e.g., Sharma et al., 2018) and to build large-scale “foundation” models (e.g., Lu et al., 2019; Radford et al., 2021) trained to capture cross-modal patterns and make cross-modal predictions. Applications of such models, like the DALL-E system for synthesizing imagery from text (Ramesh et al., 2021), have captured the imagination of researchers and the public alike. They serve as high-profile examples of the ability of representation learning to drive surprisingly rich AI capabilities.

The rapid progress in multimodal AI brings new urgency to the challenge of better understanding the data, tasks, model architectures, and performance metrics in the field. In fact, even the basic data points, “image–text pairs” scraped from the web as in Radford et al. (2021), involve diverse and surprising juxtapositions. In Figure 1, for example, we see image–text pairs where the image depicts the reason for the text contribution: the depicted traffic is why the author will be late, and the possibility of encountering bears in nature is why repellent is needed. Though such text is “grounded” in imagery only in a very abstract way, such inferences seem like common sense to human readers.

Figure 1: People combine text and images creatively to communicate. In these two unrelated image–text pairs, the images provide explanations for what is described in the text.

How good, then, are large vision–language models at capturing the varied implicit generalizations and relationships that connect text and imagery? How robust to this variation are machine learning approaches that use natural language as a supervision signal for multimodal inference? Can we design models that better understand and reason about these inferential links? More generally, what concepts and methods do AI researchers need to explore such questions in precise and effective ways? In this paper, we address these challenges through the lens of theories of discourse coherence (Phillips, 1977; Hobbs, 1979, 1985; Asher and Lascarides, 2003).
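The coherence relations mentioned above can be recorded as labels on image–text pairs. Those relations are text that characterizes what is visible, text that explains how the image was obtained, text that offers the author's appraisal or reaction, and an image that explains what the text says. Here is a minimal sketch of that idea. It assumes nothing about the annotation scheme actually used in the case studies. The enum members `VISIBLE`, `META`, `SUBJECTIVE`, and `EXPLANATION`, the `ImageTextPair` class, and the `relation_counts` helper are hypothetical names chosen for illustration. They are not an established label set or library API.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoherenceRelation(Enum):
    # Hypothetical label set; each value paraphrases the information goal
    # of one relation described in the text.
    VISIBLE = "text characterizes what is visible in the image"
    META = "text explains how the image was obtained"
    SUBJECTIVE = "text offers the author's appraisal of or reaction to the scene"
    EXPLANATION = "image depicts the reason for what the text says"


@dataclass
class ImageTextPair:
    # Hypothetical record for one annotated image–text presentation.
    image_id: str
    text: str
    # Stored as a set so that one text can stand in several relations to its image.
    relations: set[CoherenceRelation] = field(default_factory=set)

    def has(self, relation: CoherenceRelation) -> bool:
        return relation in self.relations


def relation_counts(pairs: list[ImageTextPair]) -> dict[CoherenceRelation, int]:
    # Tally how often each relation is annotated across a collection,
    # reporting zero for relations that never occur.
    counts = {relation: 0 for relation in CoherenceRelation}
    for pair in pairs:
        for relation in pair.relations:
            counts[relation] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical paraphrases of the two Figure 1 examples, not the actual captions.
    pairs = [
        ImageTextPair("traffic", "Running late today.", {CoherenceRelation.EXPLANATION}),
        ImageTextPair("bears", "Don't forget the repellent.", {CoherenceRelation.EXPLANATION}),
    ]
    for relation, n in relation_counts(pairs).items():
        print(relation.name, n)
```

Keeping a set of relations per pair, rather than a single label, is a design choice in this sketch. It leaves room for a text that, for example, both explains an image and expresses the author's reaction to it.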