API Reference
This document describes the API of the pandasvalidation module.
Module for validating data with the pandas library.

pandasvalidation.mask_nonconvertible(series, to_datatype, datetime_format=None, exact_date=True)
Return a boolean same-sized object indicating whether values cannot be converted.
Parameters:
- series (pandas.Series) – Values to check.
- to_datatype (str) – Datatype to which values should be converted. Available values are ‘numeric’ and ‘datetime’.
- datetime_format (str) – strftime format used to parse the values, e.g. ‘%d/%m/%Y’. Note that ‘%f’ parses all the way up to nanoseconds. Optional.
- exact_date (bool) – If True (default), require an exact format match. If False, allow the format to match anywhere in the target string.
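The idea behind such a mask can be sketched with plain pandas. The helper name sketch_mask_nonconvertible is hypothetical, and the logic (coerce, then flag values that were present but became NaN/NaT) is an assumption about the behaviour described above, not the module's actual implementation.

```python
import pandas as pd


def sketch_mask_nonconvertible(series, to_datatype, datetime_format=None, exact_date=True):
    """Hypothetical illustration only; not the pandasvalidation source."""
    if to_datatype == "numeric":
        converted = pd.to_numeric(series, errors="coerce")
    elif to_datatype == "datetime":
        converted = pd.to_datetime(
            series, format=datetime_format, exact=exact_date, errors="coerce"
        )
    else:
        raise ValueError("to_datatype must be 'numeric' or 'datetime'")
    # Assumption: a value counts as nonconvertible if it was present in the
    # input but turned into NaN/NaT during coercion.
    return series.notna() & converted.isna()


values = pd.Series(["1", "2.5", "abc", None])
mask = sketch_mask_nonconvertible(values, "numeric")
print(mask.tolist())
```

In this sketch a value that is already missing is not flagged, because nothing was lost in the conversion; whether the module treats missing values the same way is not stated above.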

pandasvalidation.to_datetime(arg, dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, coerce=None, unit='ns', infer_datetime_format=False)
Convert argument to datetime and set nonconvertible values to NaT.
This function calls pandas.to_datetime() with errors='coerce' and issues a warning if values cannot be converted.
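The coerce-and-warn pattern described here might look roughly like the following. The wrapper name sketch_to_datetime, the warning text, and the use of UserWarning are assumptions made for illustration; only pandas.to_datetime() with errors='coerce' comes from the description above.

```python
import warnings

import pandas as pd


def sketch_to_datetime(series, **kwargs):
    """Hypothetical wrapper: coerce to datetime and warn about failures."""
    converted = pd.to_datetime(series, errors="coerce", **kwargs)
    # Values that were present but could not be parsed end up as NaT.
    failed = series.notna() & converted.isna()
    if failed.any():
        warnings.warn(
            f"{int(failed.sum())} value(s) could not be converted to datetime.",
            UserWarning,  # assumed warning category
        )
    return converted


raw = pd.Series(["2020-01-31", "not a date", None])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sketch_to_datetime(raw, format="%Y-%m-%d")
print(result)
```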

pandasvalidation.to_numeric(arg)
Convert argument to numeric type and set nonconvertible values to NaN.
This function calls pandas.to_numeric() with errors='coerce' and issues a warning if values cannot be converted.

pandasvalidation.to_string(series, float_format='%g', datetime_format='%Y-%m-%d')
Convert values in a pandas Series to strings.
Parameters:
- series (pandas.Series) – Values to convert.
- float_format (str) – Format string for floating point numbers. Default: ‘%g’.
- datetime_format (str) – Format string for datetime type. Default: ‘%Y-%m-%d’.
Returns: converted
Return type:
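To show how the two format strings could be applied, here is a minimal sketch. The helper sketch_to_string is hypothetical, and leaving missing values untouched is an assumption; the text above does not say how the module handles them.

```python
import datetime

import pandas as pd


def sketch_to_string(series, float_format="%g", datetime_format="%Y-%m-%d"):
    """Hypothetical illustration of per-value string conversion."""

    def convert(value):
        if pd.isna(value):
            return value  # assumption: missing values are passed through
        if isinstance(value, float):
            return float_format % value
        # pandas.Timestamp and datetime.datetime both subclass datetime.date.
        if isinstance(value, datetime.date):
            return value.strftime(datetime_format)
        return str(value)

    return series.map(convert)


mixed = pd.Series([2.0, pd.Timestamp("2020-01-31"), "x"], dtype=object)
converted = sketch_to_string(mixed)
print(converted.tolist())
```

With the default '%g' format, a float such as 2.0 is rendered without a trailing ".0", which is usually what is wanted when numeric identifiers have been read as floats.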

pandasvalidation.validate_date(series, nullable=True, unique=False, min_date=None, max_date=None, return_type=None)
Validate a pandas Series with values of type datetime.date. Values of a different data type will be replaced with NaN prior to the validation.
Parameters:
- series (pandas.Series) – Values to validate.
- nullable (bool) – If False, check for NaN values. Default: True.
- unique (bool) – If True, check that values are unique. Default: False.
- min_date (datetime.date) – If defined, check for values before min_date. Optional.
- max_date (datetime.date) – If defined, check for values later than max_date. Optional.
- return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.

pandasvalidation.validate_datetime(series, nullable=True, unique=False, min_datetime=None, max_datetime=None, return_type=None)
Validate a pandas Series containing datetimes.
Deprecated since version 0.5.0: validate_datetime() will be removed in version 0.7.0. Use validate_date() or validate_timestamp() instead.
Parameters:
- series (pandas.Series) – Values to validate.
- nullable (bool) – If False, check for NaN values. Default: True.
- unique (bool) – If True, check that values are unique. Default: False.
- min_datetime (str) – If defined, check for values before min_datetime. Optional.
- max_datetime (str) – If defined, check for values later than max_datetime. Optional.
- return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.

pandasvalidation.validate_numeric(series, nullable=True, unique=False, integer=False, min_value=None, max_value=None, return_type=None)
Validate a pandas Series containing numeric values.
Parameters:
- series (pandas.Series) – Values to validate.
- nullable (bool) – If False, check for NaN values. Default: True.
- unique (bool) – If True, check that values are unique. Default: False.
- integer (bool) – If True, check that values are integers. Default: False.
- min_value (int) – If defined, check for values below minimum. Optional.
- max_value (int) – If defined, check for values above maximum. Optional.
- return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
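One plausible reading of these checks is that each enabled parameter yields a boolean column marking the offending values. The sketch below builds such a frame with plain pandas. The function name, the column names, and the way the frame is reduced to a per-row mask and to the invalid values are all assumptions for illustration; the exact shape of each return_type is not specified above.

```python
import pandas as pd


def sketch_numeric_mask_frame(series, nullable=True, unique=False, integer=False,
                              min_value=None, max_value=None):
    """Hypothetical illustration: one boolean column per enabled check."""
    masks = {}
    if not nullable:
        masks["isnull"] = series.isna()
    if unique:
        # Flag every member of a duplicated group, ignoring missing values.
        masks["nonunique"] = series.duplicated(keep=False) & series.notna()
    if integer:
        masks["noninteger"] = series.notna() & (series % 1 != 0)
    if min_value is not None:
        masks["too_low"] = series < min_value
    if max_value is not None:
        masks["too_high"] = series > max_value
    return pd.DataFrame(masks, index=series.index)


series = pd.Series([1.0, 2.5, 3.0, None, 10.0])
frame = sketch_numeric_mask_frame(series, nullable=False, integer=True,
                                  min_value=0, max_value=5)
mask_series = frame.any(axis=1)   # True where at least one check fails
invalid = series[mask_series]     # the offending values themselves
print(frame)
```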

pandasvalidation.validate_string(series, nullable=True, unique=False, min_length=None, max_length=None, case=None, newlines=True, trailing_whitespace=True, whitespace=True, matching_regex=None, non_matching_regex=None, whitelist=None, blacklist=None, return_type=None)
Validate a pandas Series with strings. Non-string values will be converted to strings prior to validation.
Parameters:
- series (pandas.Series) – Values to validate.
- nullable (bool) – If False, check for NaN values. Default: True.
- unique (bool) – If True, check that values are unique. Default: False.
- min_length (int) – If defined, check for strings shorter than minimum length. Optional.
- max_length (int) – If defined, check for strings longer than maximum length. Optional.
- case (str) – Check for a character case constraint. Available values are ‘lower’, ‘upper’ and ‘title’. Optional.
- newlines (bool) – If False, check for newline characters. Default: True.
- trailing_whitespace (bool) – If False, check for trailing whitespace. Default: True.
- whitespace (bool) – If False, check for whitespace. Default: True.
- matching_regex (str) – Check that strings match some regular expression. Optional.
- non_matching_regex (str) – Check that strings do not match some regular expression. Optional.
- whitelist (list) – Check that values are in whitelist. Optional.
- blacklist (list) – Check that values are not in blacklist. Optional.
- return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
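A few of these string checks can be approximated with the pandas .str accessor. The sketch below covers only a subset of the parameters. The function and column names are hypothetical, and using str.contains for the regular expression check is an assumption, since the text above does not say whether a full or partial match is required.

```python
import pandas as pd


def sketch_string_mask_frame(series, max_length=None, case=None,
                             trailing_whitespace=True, matching_regex=None,
                             whitelist=None):
    """Hypothetical illustration covering a subset of the string checks."""
    text = series.astype(str)  # non-string values are converted first
    masks = {}
    if max_length is not None:
        masks["too_long"] = text.str.len() > max_length
    if case in ("lower", "upper", "title"):
        # Compare each string with its case-normalised form.
        masks["wrong_case"] = text != getattr(text.str, case)()
    if not trailing_whitespace:
        masks["trailing_whitespace"] = text.str.contains(r"\s+$", regex=True)
    if matching_regex is not None:
        masks["regex_mismatch"] = ~text.str.contains(matching_regex, regex=True)
    if whitelist is not None:
        masks["not_in_whitelist"] = ~text.isin(whitelist)
    return pd.DataFrame(masks, index=series.index)


fruits = pd.Series(["apple", "Banana ", "cherry pie"])
string_frame = sketch_string_mask_frame(
    fruits, max_length=8, case="lower", trailing_whitespace=False,
    matching_regex=r"^[A-Za-z]+$",
)
print(string_frame.any(axis=1).tolist())
```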

pandasvalidation.validate_timestamp(series, nullable=True, unique=False, min_timestamp=None, max_timestamp=None, return_type=None)
Validate a pandas Series with values of type pandas.Timestamp. Values of a different data type will be replaced with NaT prior to the validation.
Parameters:
- series (pandas.Series) – Values to validate.
- nullable (bool) – If False, check for NaN values. Default: True.
- unique (bool) – If True, check that values are unique. Default: False.
- min_timestamp (pandas.Timestamp) – If defined, check for values before min_timestamp. Optional.
- max_timestamp (pandas.Timestamp) – If defined, check for values later than max_timestamp. Optional.
- return_type (str) – Kind of data object to return; ‘mask_series’, ‘mask_frame’ or ‘values’. Default: None.
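The replacement of non-Timestamp values with NaT followed by range checks could be sketched as below. The helper name and the returned column names are hypothetical; only the replace-then-check behaviour is taken from the description above.

```python
import pandas as pd


def sketch_timestamp_masks(series, min_timestamp=None, max_timestamp=None):
    """Hypothetical illustration: keep Timestamps, replace the rest with NaT."""
    cleaned = pd.Series(
        [v if isinstance(v, pd.Timestamp) else pd.NaT for v in series],
        index=series.index,
        dtype="datetime64[ns]",
    )
    masks = {}
    # Comparisons with NaT evaluate to False, so replaced values are not
    # reported by the range checks in this sketch.
    if min_timestamp is not None:
        masks["too_early"] = cleaned < min_timestamp
    if max_timestamp is not None:
        masks["too_late"] = cleaned > max_timestamp
    return cleaned, pd.DataFrame(masks, index=series.index)


stamps = pd.Series(
    [pd.Timestamp("2020-01-01"), "2020-06-01", pd.Timestamp("2021-03-01")],
    dtype=object,
)
cleaned, range_frame = sketch_timestamp_masks(
    stamps,
    min_timestamp=pd.Timestamp("2020-02-01"),
    max_timestamp=pd.Timestamp("2020-12-31"),
)
print(range_frame)
```

Note that the string "2020-06-01" is replaced with NaT here even though it looks like a date, because only values that are already pandas.Timestamp instances are kept.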