API Reference

This document describes the API of the pandasvalidation module.

Module for validating data with the pandas library.

exception pandasvalidation.ValidationWarning

Bases: Warning
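Because ValidationWarning derives from Warning, callers can presumably record, filter or escalate it with the standard warnings module. The sketch below uses a local stand-in class and a hypothetical validator (check_positive) so that it runs without the library installed; only the warnings calls are standard.

```python
import warnings


class ValidationWarning(Warning):
    """Local stand-in for pandasvalidation.ValidationWarning (assumption)."""


def check_positive(values):
    """Hypothetical validator that warns instead of raising."""
    bad = [v for v in values if v <= 0]
    if bad:
        warnings.warn(
            "{} value(s) are not positive".format(len(bad)), ValidationWarning
        )
    return bad


# Record warnings so they can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ValidationWarning)
    check_positive([1, -2, 3])
recorded_categories = [w.category for w in caught]

# Escalate the same category to an exception, e.g. in a strict pipeline.
with warnings.catch_warnings():
    warnings.simplefilter("error", ValidationWarning)
    try:
        check_positive([0])
        escalated = False
    except ValidationWarning:
        escalated = True
```

Filtering by category keeps unrelated warnings from other libraries untouched.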

pandasvalidation.mask_nonconvertible(series, to_datatype, datetime_format=None, exact_date=True)

Return a boolean object of the same size as series, indicating which values cannot be converted.

Parameters:
  • series (pandas.Series) – Values to check.
  • to_datatype (str) – Datatype to which values should be converted. Available values are ‘numeric’ and ‘datetime’.
  • datetime_format (str) – strftime format used to parse the time, e.g. ‘%d/%m/%Y’. Note that ‘%f’ parses all the way up to nanoseconds. Optional.
  • exact_date (bool) –
    • If True (default), require an exact format match.
    • If False, allow the format to match anywhere in the target string.
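As a rough illustration of what such a mask can look like, the following sketch builds one with plain pandas. The helper name sketch_mask_nonconvertible is hypothetical, and the choice to leave missing input values unflagged is an assumption, not documented library behaviour.

```python
import pandas as pd


def sketch_mask_nonconvertible(series, to_datatype, datetime_format=None,
                               exact_date=True):
    """Hypothetical stand-in, not the library's implementation."""
    if to_datatype == "numeric":
        converted = pd.to_numeric(series, errors="coerce")
    elif to_datatype == "datetime":
        converted = pd.to_datetime(
            series, format=datetime_format, exact=exact_date, errors="coerce"
        )
    else:
        raise ValueError("to_datatype must be 'numeric' or 'datetime'")
    # Flag values that were present in the input but are missing after
    # coercion. Values that were already missing are not flagged (assumption).
    return converted.isna() & series.notna()


raw = pd.Series(["1", "2.5", "abc", None])
numeric_mask = sketch_mask_nonconvertible(raw, "numeric")

dates = pd.Series(["31/12/2020", "2020-12-31"])
date_mask = sketch_mask_nonconvertible(
    dates, "datetime", datetime_format="%d/%m/%Y"
)
```

Here only "abc" is flagged in numeric_mask, and only the ISO-formatted string is flagged in date_mask because it does not match ‘%d/%m/%Y’.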
pandasvalidation.to_datetime(arg, dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, coerce=None, unit='ns', infer_datetime_format=False)

Convert argument to datetime and set nonconvertible values to NaT.

This function calls pandas.to_datetime() with errors='coerce' and issues a warning if any values cannot be converted.
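The coerce-and-warn behaviour described above can be approximated with plain pandas. This is a minimal sketch under assumptions: sketch_to_datetime is a hypothetical name, UserWarning stands in for ValidationWarning, and the warning text is invented.

```python
import warnings

import pandas as pd


def sketch_to_datetime(values, **kwargs):
    """Hypothetical stand-in: coerce failures to NaT and warn about them."""
    converted = pd.to_datetime(values, errors="coerce", **kwargs)
    # Present before, NaT afterwards: these values failed to convert.
    failed = converted.isna() & values.notna()
    if failed.any():
        warnings.warn(
            "{} value(s) cannot be converted to datetime".format(
                int(failed.sum())
            ),
            UserWarning,
        )
    return converted


raw = pd.Series(["2020-01-31", "2020-02-30", "not a date"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dates = sketch_to_datetime(raw, format="%Y-%m-%d")
```

The impossible date "2020-02-30" and the free text both become NaT, and a single warning reports the count.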

pandasvalidation.to_numeric(arg)

Convert argument to numeric type and set nonconvertible values to NaN.

This function calls pandas.to_numeric() with errors='coerce' and issues a warning if any values cannot be converted.
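For numeric data, a caller often wants to know which raw values were lost to coercion. The sketch below shows one way to surface them in the warning text; sketch_to_numeric, the UserWarning category and the message wording are all assumptions made for illustration.

```python
import warnings

import pandas as pd


def sketch_to_numeric(values):
    """Hypothetical stand-in: coerce failures to NaN and name them."""
    converted = pd.to_numeric(values, errors="coerce")
    failed = converted.isna() & values.notna()
    if failed.any():
        # List the offending raw values so they are easy to track down.
        offending = values[failed].tolist()
        warnings.warn(
            "cannot convert to numeric: {!r}".format(offending), UserWarning
        )
    return converted


raw = pd.Series(["1", "2.5", "n/a", None])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    numbers = sketch_to_numeric(raw)
```

The missing input value stays NaN without being reported, since it was never a conversion failure.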

pandasvalidation.to_string(series, float_format='%g', datetime_format='%Y-%m-%d')

Convert values in a pandas Series to strings.

Parameters:
  • series (pandas.Series) – Values to convert.
  • float_format (str) – Format string for floating point number. Default: ‘%g’.
  • datetime_format (str) – Format string for datetime type. Default: ‘%Y-%m-%d’.
Returns:
  • converted (pandas.Series) – Values converted to strings.
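To make the two format strings concrete, here is a small sketch of per-value conversion. The helper sketch_to_string is hypothetical; in particular, how the library treats missing values and non-float numbers is not documented here, so the example avoids them.

```python
import datetime

import pandas as pd


def sketch_to_string(series, float_format="%g", datetime_format="%Y-%m-%d"):
    """Hypothetical stand-in, not the library's implementation."""
    def convert(value):
        if isinstance(value, float):
            # printf-style formatting, e.g. '%g' drops trailing zeros.
            return float_format % value
        if isinstance(value, datetime.date):
            # Also covers datetime.datetime and pandas.Timestamp.
            return value.strftime(datetime_format)
        return str(value)

    return series.map(convert)


mixed = pd.Series(
    [1.50, 1000000.0, datetime.date(2020, 1, 5), "text"], dtype=object
)
converted = sketch_to_string(mixed)
```

With the default ‘%g’, 1.50 becomes "1.5" and 1000000.0 becomes "1e+06", which may or may not be the representation a caller wants.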

pandasvalidation.validate_date(series, nullable=True, unique=False, min_date=None, max_date=None, return_type=None)

Validate a pandas Series with values of type datetime.date. Values of a different data type will be replaced with NaN prior to the validation.

Parameters:
  • series (pandas.Series) – Values to validate.
  • nullable (bool) – If False, check for NaN values. Default: True.
  • unique (bool) – If True, check that values are unique. Default: False.
  • min_date (datetime.date) – If defined, check for values before min_date. Optional.
  • max_date (datetime.date) – If defined, check for values later than max_date. Optional.
  • return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
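The checks listed above can be pictured as one boolean column per rule. The sketch below is a guess at that shape: sketch_date_checks and the column names are hypothetical, and whether the library's ‘mask_frame’ looks like this is an assumption.

```python
import datetime

import pandas as pd


def _is_date(value):
    # pandas.NaT is technically a datetime subclass, so exclude it.
    return isinstance(value, datetime.date) and not pd.isna(value)


def sketch_date_checks(series, nullable=True, unique=False,
                       min_date=None, max_date=None):
    """Hypothetical stand-in returning one boolean column per check."""
    # Replace values of a different type with a missing value first.
    cleaned = pd.Series(
        [v if _is_date(v) else None for v in series],
        index=series.index, dtype=object,
    )
    checks = {}
    if not nullable:
        checks["isnull"] = cleaned.isna()
    if unique:
        checks["nonunique"] = cleaned.duplicated(keep=False) & cleaned.notna()
    if min_date is not None:
        checks["too_low"] = pd.Series(
            [_is_date(v) and v < min_date for v in cleaned], index=series.index
        )
    if max_date is not None:
        checks["too_high"] = pd.Series(
            [_is_date(v) and v > max_date for v in cleaned], index=series.index
        )
    return pd.DataFrame(checks, index=series.index)


dates = pd.Series(
    [datetime.date(2019, 12, 31), datetime.date(2020, 6, 1),
     datetime.date(2020, 6, 1), "2020-07-01"],
    dtype=object,
)
frame = sketch_date_checks(
    dates, nullable=False, unique=True, min_date=datetime.date(2020, 1, 1)
)
```

The string in the last row is not a datetime.date, so it is treated as missing and shows up under the null check rather than the range check.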
pandasvalidation.validate_datetime(series, nullable=True, unique=False, min_datetime=None, max_datetime=None, return_type=None)

Validate a pandas Series containing datetimes.

Deprecated since version 0.5.0: validate_datetime() will be removed in version 0.7.0. Use validate_date() or validate_timestamp() instead.

Parameters:
  • series (pandas.Series) – Values to validate.
  • nullable (bool) – If False, check for NaN values. Default: True.
  • unique (bool) – If True, check that values are unique. Default: False.
  • min_datetime (str) – If defined, check for values before min_datetime. Optional.
  • max_datetime (str) – If defined, check for values later than max_datetime. Optional.
  • return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
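When moving away from this deprecated function, note that its bounds are documented as strings, while validate_date() and validate_timestamp() document datetime.date and pandas.Timestamp bounds. A minimal sketch of converting existing string bounds, with example values chosen purely for illustration:

```python
import datetime

import pandas as pd

# Bounds as they might have been passed to the deprecated function.
min_datetime = "2020-01-01"
max_datetime = "2020-12-31 23:59:59"

# Timestamp bounds, the type documented for validate_timestamp().
min_timestamp = pd.Timestamp(min_datetime)
max_timestamp = pd.Timestamp(max_datetime)

# Date bounds, the type documented for validate_date().
# Note that .date() discards the time of day.
min_date = min_timestamp.date()
max_date = max_timestamp.date()
```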
pandasvalidation.validate_numeric(series, nullable=True, unique=False, integer=False, min_value=None, max_value=None, return_type=None)

Validate a pandas Series containing numeric values.

Parameters:
  • series (pandas.Series) – Values to validate.
  • nullable (bool) – If False, check for NaN values. Default: True.
  • unique (bool) – If True, check that values are unique. Default: False.
  • integer (bool) – If True, check that values are integers. Default: False.
  • min_value (int) – If defined, check for values below minimum. Optional.
  • max_value (int) – If defined, check for values above maximum. Optional.
  • return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
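The integer and range checks can be expressed with vectorised pandas operations. The following is a sketch only: sketch_numeric_checks and its column names are hypothetical, and treating NaN as passing these particular checks is an assumption.

```python
import pandas as pd


def sketch_numeric_checks(series, integer=False, min_value=None,
                          max_value=None):
    """Hypothetical stand-in returning one boolean column per check."""
    checks = {}
    if integer:
        # A value is integral when it has no fractional part.
        # NaN is excluded here and left to the null check (assumption).
        checks["noninteger"] = series.notna() & (series % 1 != 0)
    if min_value is not None:
        checks["too_low"] = series < min_value
    if max_value is not None:
        checks["too_high"] = series > max_value
    return pd.DataFrame(checks, index=series.index)


values = pd.Series([1.0, 2.5, -3.0, float("nan"), 100.0])
frame = sketch_numeric_checks(values, integer=True, min_value=0, max_value=10)
```

Comparisons against NaN evaluate to False, so the missing value is not reported as out of range.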
pandasvalidation.validate_string(series, nullable=True, unique=False, min_length=None, max_length=None, case=None, newlines=True, trailing_whitespace=True, whitespace=True, matching_regex=None, non_matching_regex=None, whitelist=None, blacklist=None, return_type=None)

Validate a pandas Series with strings. Non-string values will be converted to strings prior to validation.

Parameters:
  • series (pandas.Series) – Values to validate.
  • nullable (bool) – If False, check for NaN values. Default: True.
  • unique (bool) – If True, check that values are unique. Default: False.
  • min_length (int) – If defined, check for strings shorter than minimum length. Optional.
  • max_length (int) – If defined, check for strings longer than maximum length. Optional.
  • case (str) – Check for a character case constraint. Available values are ‘lower’, ‘upper’ and ‘title’. Optional.
  • newlines (bool) – If False, check for newline characters. Default: True.
  • trailing_whitespace (bool) – If False, check for trailing whitespace. Default: True.
  • whitespace (bool) – If False, check for whitespace. Default: True.
  • matching_regex (str) – Check that strings match the given regular expression. Optional.
  • non_matching_regex (str) – Check that strings do not match some regular expression. Optional.
  • whitelist (list) – Check that values are in whitelist. Optional.
  • blacklist (list) – Check that values are not in blacklist. Optional.
  • return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
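Several of the string rules map directly onto pandas string methods. The sketch below covers a subset of them under stated assumptions: sketch_string_checks and the column names are hypothetical, the regular expression is anchored at the start of the string via Series.str.match, and "trailing whitespace" is taken to mean whitespace at the end only.

```python
import pandas as pd


def sketch_string_checks(series, max_length=None, case=None,
                         trailing_whitespace=True, matching_regex=None,
                         whitelist=None):
    """Hypothetical stand-in returning one boolean column per check."""
    # Non-string values are converted to strings before checking.
    text = series.astype(str)
    checks = {}
    if max_length is not None:
        checks["too_long"] = text.str.len() > max_length
    if case == "lower":
        checks["wrong_case"] = text != text.str.lower()
    if not trailing_whitespace:
        checks["trailing_whitespace"] = text != text.str.rstrip()
    if matching_regex is not None:
        checks["regex_mismatch"] = ~text.str.match(matching_regex)
    if whitelist is not None:
        checks["not_in_whitelist"] = ~text.isin(whitelist)
    return pd.DataFrame(checks, index=series.index)


names = pd.Series(["alpha", "Beta ", "gamma_long", "x1"])
frame = sketch_string_checks(
    names, max_length=6, case="lower", trailing_whitespace=False,
    matching_regex=r"[a-z_]+$", whitelist=["alpha", "x1"],
)
```

A single value can fail several rules at once; "Beta " here breaks the case, whitespace, regex and whitelist rules.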
pandasvalidation.validate_timestamp(series, nullable=True, unique=False, min_timestamp=None, max_timestamp=None, return_type=None)

Validate a pandas Series with values of type pandas.Timestamp. Values of a different data type will be replaced with NaT prior to the validation.

Parameters:
  • series (pandas.Series) – Values to validate.
  • nullable (bool) – If False, check for NaN values. Default: True.
  • unique (bool) – If True, check that values are unique. Default: False.
  • min_timestamp (pandas.Timestamp) – If defined, check for values before min_timestamp. Optional.
  • max_timestamp (pandas.Timestamp) – If defined, check for values later than max_timestamp. Optional.
  • return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
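The replace-with-NaT step followed by range checks might look like the sketch below. The helper sketch_timestamp_checks, its column names and the extra "replaced" column are hypothetical and only illustrate the idea.

```python
import pandas as pd


def sketch_timestamp_checks(series, min_timestamp=None, max_timestamp=None):
    """Hypothetical stand-in returning one boolean column per check."""
    # Keep only genuine pandas.Timestamp values; everything else becomes NaT.
    cleaned = pd.Series(
        [v if isinstance(v, pd.Timestamp) else pd.NaT for v in series],
        index=series.index, dtype="datetime64[ns]",
    )
    # Values that were present but had the wrong type.
    checks = {"replaced": cleaned.isna() & series.notna()}
    if min_timestamp is not None:
        checks["too_early"] = cleaned < min_timestamp
    if max_timestamp is not None:
        checks["too_late"] = cleaned > max_timestamp
    return pd.DataFrame(checks, index=series.index)


stamps = pd.Series(
    [pd.Timestamp("2019-12-31"), pd.Timestamp("2020-06-01"),
     "2020-07-01", pd.Timestamp("2021-01-01")],
    dtype=object,
)
frame = sketch_timestamp_checks(
    stamps,
    min_timestamp=pd.Timestamp("2020-01-01"),
    max_timestamp=pd.Timestamp("2020-12-31"),
)
```

Comparisons with NaT are False, so the replaced string is reported once, as a type problem, and not again as out of range.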