
The Brown corpus in tabular format, tokenized and POS-tagged, as distributed at https://www.nltk.org/nltk_data/. Headings and sentence boundaries are currently not preserved.

Usage

brown

Format

A data frame with five variables (genre_id, doc_id, sentence_id, word, and pos) and two string attributes (contents and readme).
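As a minimal sketch of working with these columns, the snippet builds a small stand-in frame from the rows that head(brown) prints in the Examples and tabulates the pos column. The name toy is hypothetical and not part of the package, and the sketch assumes the columns are plain character vectors. In the Brown tagset the -tl suffix marks a word occurring in a title, so stripping it gives the base tag.

```r
# Hypothetical stand-in for `brown`: same five columns, filled with the
# rows printed by head(brown). Length-1 values are recycled to 6 rows.
toy <- data.frame(
  genre_id    = "news",
  doc_id      = "ca01",
  sentence_id = 1L,
  word        = c("The", "Fulton", "County", "Grand", "Jury", "said"),
  pos         = c("at", "np-tl", "nn-tl", "jj-tl", "nn-tl", "vbd"),
  stringsAsFactors = FALSE
)

# Cross-tabulate genre against the full tag (suffix included).
tag_counts <- table(toy$genre_id, toy$pos)
print(tag_counts)

# Drop the "-tl" (title) suffix to collapse tags to their base form.
toy$base_pos <- sub("-tl$", "", toy$pos)
base_counts <- table(toy$base_pos)
print(base_counts)
```

On the real data the same calls would apply to brown directly, for example table(brown$genre_id, brown$pos).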

Details

For documentation, see http://korpus.uib.no/icame/brown/bcm.html. The raw README and CONTENTS files are also included as the attributes readme and contents.
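A sketch of reading the attached files, assuming the lowercase attribute names given under Format. It uses a hypothetical one-row stand-in (toy) with placeholder text rather than the real brown object. Note that attribute names are case-sensitive, so "README" does not match "readme".

```r
# Hypothetical stand-in carrying the two string attributes named in Format;
# the attribute values are placeholders, not the real file contents.
toy <- data.frame(word = "The", pos = "at", stringsAsFactors = FALSE)
attr(toy, "readme")   <- "Raw README text\nsecond line"
attr(toy, "contents") <- "Raw CONTENTS text"

# Attribute lookup is case-sensitive, so the upper-case name finds nothing.
is.null(attr(toy, "README"))   # TRUE
class(attr(toy, "readme"))     # "character"

# Print the first few lines of the README text.
readme_lines <- strsplit(attr(toy, "readme"), "\n")[[1]]
cat(head(readme_lines, 5), sep = "\n")
```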

Examples


data(brown)
head(brown)
#>   genre_id doc_id sentence_id   word   pos
#> 1     news   ca01           1    The    at
#> 2     news   ca01           1 Fulton np-tl
#> 3     news   ca01           1 County nn-tl
#> 4     news   ca01           1  Grand jj-tl
#> 5     news   ca01           1   Jury nn-tl
#> 6     news   ca01           1   said   vbd

class(attr(brown, "readme"))
#> [1] "character"