NLP Functions

nlp_get_entities

nlp_get_entities(text, label=None)

Extracts entities from natural language text.



Args:

    text (str): the text of interest

    label (str): filters for a specific kind of entity, such as PERSON or ORG. Defaults to None, which gets all entity types.



Returns:

    A dictionary containing the entities extracted from the text.



Examples:

    nlp_get_entities('The Massachusetts Institute of Technology is a private research university in Cambridge, Massachusetts, United States.') ->



    {
      'entities': [
        {'char_pos': {'end': 41, 'start': 0},
         'entity': 'The Massachusetts Institute of Technology',
         'label': 'ORG',
         'word_pos': {'end': 5, 'start': 0}},
        {'char_pos': {'end': 87, 'start': 78},
         'entity': 'Cambridge',
         'label': 'GPE',
         'word_pos': {'end': 12, 'start': 11}},
        {'char_pos': {'end': 102, 'start': 89},
         'entity': 'Massachusetts',
         'label': 'GPE',
         'word_pos': {'end': 14, 'start': 13}},
        {'char_pos': {'end': 117, 'start': 104},
         'entity': 'United States',
         'label': 'GPE',
         'word_pos': {'end': 17, 'start': 15}}
      ],
      'status': 'OK'
    }
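A sketch of how the return value could be assembled from raw entity spans may help clarify the two position fields: `char_pos` counts characters, while `word_pos` counts tokens (with punctuation split off as its own token, which is why `'Massachusetts'` starts at word 13 rather than 12). This is an illustration, not the function's actual implementation; the `(start, end, label)` spans would come from an NER model and are assumed inputs here.

```python
import re

def build_entity_dict(text, spans):
    """Assemble an nlp_get_entities-style result.

    spans: list of (char_start, char_end, label) triples,
    assumed to come from an NER model.
    """
    # Token offsets, splitting punctuation into separate tokens.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    entities = []
    for start, end, label in spans:
        # First token beginning at or after the span start...
        word_start = next(i for i, (s, _) in enumerate(tokens) if s >= start)
        # ...and first token ending at or after the span end (exclusive bound).
        word_end = next(i for i, (_, e) in enumerate(tokens) if e >= end) + 1
        entities.append({
            'char_pos': {'start': start, 'end': end},
            'entity': text[start:end],
            'label': label,
            'word_pos': {'start': word_start, 'end': word_end},
        })
    return {'entities': entities, 'status': 'OK'}
```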

nlp_token_clean

nlp_token_clean(text, model=None, model_config=None)

Cleans a token according to the provided model.



Args:

    text (str): the token of interest

    model (str): the name of a valid token model

    model_config (map): A map of options to configure the model



Returns:

    The input token, cleaned according to the logic of the token model.



Examples:

    nlp_token_clean('20IB0A1B', model='matcher:only-digits') -> '20180418'
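One plausible reading of this example is that `matcher:only-digits` cleaning maps OCR-confusable letters to the digits they resemble. The actual substitution table is not documented here; the mapping below is an assumption chosen to reproduce the example above.

```python
# Assumed substitution table: letters commonly misread for digits.
OCR_DIGIT_MAP = {'O': '0', 'I': '1', 'L': '1', 'Z': '2',
                 'A': '4', 'S': '5', 'G': '6', 'B': '8'}

def clean_only_digits(token):
    """Replace digit-like letters with digits; leave other characters alone."""
    return ''.join(OCR_DIGIT_MAP.get(ch, ch) for ch in token.upper())
```

With this table, `clean_only_digits('20IB0A1B')` yields `'20180418'`, matching the example.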

nlp_token_find

nlp_token_find(text, model=None, separator=None, tokenizer=None, model_config=None, tokenizer_config=None)

Tokenizes the input string and returns the best scoring token according to the provided matcher.



If a tokenizer is specified, it is used. Otherwise, the default tokenizer set in the token matcher class is used; if no default is set, the unigram tokenizer is used.




Args:

    text (str): the text assumed to contain the token of interest

    model (str): the name of a valid token matcher

    separator (str): the string on which to split the text into tokens

    tokenizer (str): the name of a valid tokenizer

    model_config (map): A map of options to configure the model

    tokenizer_config (map): A map of options to configure the tokenizer





Returns:

    The best scoring token according to the provided matcher logic.



Examples:

    nlp_token_find('Due on 20IB-0A-1B', model='matcher:only-digits', separator=' ') -> '20IB-0A-1B'
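The find logic can be sketched as: split on the separator, score each token, return the best-scoring one. The digit-fraction score below is an illustrative assumption, not the real matcher's scoring, but it suffices to pick `'20IB-0A-1B'` out of the example text.

```python
def score_only_digits(token):
    """Assumed score: fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_find(text, separator=' '):
    """Return the highest-scoring token after splitting on the separator."""
    return max(text.split(separator), key=score_only_digits)
```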

nlp_token_find_all

nlp_token_find_all(text, model=None, separator=None, threshold=None, tokenizer=None, model_config=None, tokenizer_config=None)

Tokenizes the input string and returns all tokens whose score exceeds the threshold, according to the provided matcher.



If a tokenizer is specified, it is used. Otherwise, the default tokenizer set in the token matcher class is used; if no default is set, the unigram tokenizer is used.



Args:

    text (str): the text assumed to contain the token of interest

    model (str): the name of a valid token matcher

    separator (str): the string on which to split the text into tokens

    threshold (float): the threshold for determining whether a token fits the
      model. Defaults to 0.8.

    tokenizer (str): the name of a valid tokenizer

    model_config (map): A map of options to configure the model

    tokenizer_config (map): A map of options to configure the tokenizer



Returns:

    A list of all tokens scoring above the threshold, according to the provided matcher logic.



Examples:

    nlp_token_find_all('ID: 20I80A1B', model='matcher:only-digits', separator=' ') -> ['20I80A1B']
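The threshold filter can be sketched the same way: split, score, and keep every token at or above the threshold. Note that the naive digit-fraction score used here (an assumption, as above) gives `'20I80A1B'` only 5/8 = 0.625, so this sketch needs a lower threshold than the documented 0.8 default to reproduce the example; the real matcher evidently scores OCR-confusable characters more generously.

```python
def score_only_digits(token):
    """Assumed score: fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_find_all(text, separator=' ', threshold=0.8):
    """Return every token whose score meets the threshold."""
    return [t for t in text.split(separator)
            if score_only_digits(t) >= threshold]
```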

nlp_token_score

nlp_token_score(text, model=None, model_config=None)

Scores a token from 0 to 1.0 according to the provided matcher.



Args:

    text (str): the token of interest

    model (str): the name of a valid token matcher

    model_config (map): A map of options to configure the model



Returns:

    A score for the input token, from 0 to 1.0, according to the logic

    of the token matcher.



Examples:

    nlp_token_score('20IB0A1B', model='matcher:only-digits') -> 0.75

nlp_token_select

nlp_token_select(*args: Any)

Returns the best scoring token, among provided inputs, according to the provided matcher.



Args:

    args: the candidate tokens, followed by keyword options:

        text1 .. textN (str): the tokens of interest

        model (str): the name of a valid token matcher

        model_config (map): A map of options to configure the model



Returns:

    The best scoring token according to the provided matcher logic.



Examples:

    nlp_token_select('20IB-0A-1B', '2018-01-20', model='matcher:only-digits') -> '2018-01-20'
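Selection is simply an argmax over the provided candidates. Using the same assumed digit-fraction score as in the sketches above, `'2018-01-20'` (8/10 digits) beats `'20IB-0A-1B'` (4/10), which matches the documented result.

```python
def score_only_digits(token):
    """Assumed score: fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_select(*tokens):
    """Return the candidate token with the highest score."""
    return max(tokens, key=score_only_digits)
```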