ProCoDA Parser

aguaclara.research.procoda_parser.column_of_data(path, start, column, end=None, units='')[source]

This function extracts a column of data from a ProCoDA data file.

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • path (string) – The file path of the ProCoDA data file

  • start (int) – Index of first row of data to extract, inclusive

  • end (int, optional) – Index of last row of data to extract until, exclusive. Defaults to extracting all rows.

  • column (int or string) – Index or label of the column that you want to extract

  • units (string, optional) – The units you want to apply to the data, e.g. ‘mg/L’. Defaults to “” (dimensionless).

Returns:

The column of data

Return type:

numpy.ndarray in units of [units]

Examples:

data = column_of_data("Reactor_data.txt", 0, 1, -1, "mg/L")
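
The indexing conventions above (column 0 is time, start is inclusive, end is exclusive) can be illustrated without the library. The following is a minimal sketch using only the standard library. The helper extract_column and the sample text are hypothetical and not part of aguaclara, and the sketch assumes the first line of the file is a header row. The real function also attaches units and returns a numpy.ndarray, which this sketch does not attempt.

```python
import csv
import io

# Hypothetical tab-delimited sample: a header row, then time in column 0.
SAMPLE = "Day fraction\tTurbidity\n0.10\t5.0\n0.20\t6.0\n0.30\t7.0\n0.40\t8.0\n"

def extract_column(text, start, column, end=None):
    """Return rows start (inclusive) to end (exclusive) of one column as floats."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Accept either an integer index or a column label.
    index = header.index(column) if isinstance(column, str) else column
    return [float(row[index]) for row in body[start:end]]

turbidity = extract_column(SAMPLE, 1, 1, 3)        # rows 1 and 2 of column 1
by_label = extract_column(SAMPLE, 0, "Turbidity")  # all rows, located by label
```
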
aguaclara.research.procoda_parser.column_of_time(path, start, end=None, units='day')[source]

This function extracts the column of times as elapsed times from a ProCoDA data file.

Parameters:
  • path (string) – The file path of the ProCoDA data file.

  • start (int) – Index of first row of data to extract from the data file

  • end (int, optional) – Index of last row of data to extract from the data file. Defaults to the last row.

  • units (string, optional) – The units of the returned times. Defaults to ‘day’.

Returns:

Experimental times starting at 0

Return type:

numpy.ndarray in units of days or hours, as specified by the units parameter

Examples:

time = column_of_time("Reactor_data.txt", 0)
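
"Experimental times starting at 0" means that each time is reported relative to the first extracted row. The following is a minimal sketch of that idea in plain Python. The helper elapsed_times and the sample timestamps are hypothetical, and the conversion to hours assumes the raw times are stored in days.

```python
def elapsed_times(times, start, end=None):
    """Slice rows start:end and shift them so the first value is 0."""
    window = times[start:end]
    if not window:
        return []
    first = window[0]
    return [t - first for t in window]

# Hypothetical raw timestamps, assumed to be in days.
raw = [736500.25, 736500.5, 736500.75, 736501.0]

elapsed_days = elapsed_times(raw, 1)
elapsed_hours = [t * 24 for t in elapsed_days]  # assumed day-to-hour conversion
```
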
aguaclara.research.procoda_parser.plot_columns(path, columns, x_axis=None)[source]

Plot columns of data, located by labels, in the given data file.

Parameters:
  • path (string) – The file path of the ProCoDA data file

  • columns (string or string list) – A single column label or list of column labels

  • x_axis (string, optional) – The label of the x-axis column (defaults to None)

Returns:

A list of Line2D objects representing the plotted data

Return type:

matplotlib.lines.Line2D list

aguaclara.research.procoda_parser.iplot_columns(path, columns, x_axis=None)[source]

Plot columns of data, located by indexes, in the given data file.

Parameters:
  • path (string) – The file path of the ProCoDA data file

  • columns (int or int list) – A single column index or list of column indexes

  • x_axis (int, optional) – The index of the x-axis column (defaults to None)

  • sep (string) – The separator, or delimiter, of the data file. Use ‘,’ for CSV files and ‘\t’ for TSV files.

Returns:

A list of Line2D objects representing the plotted data

Return type:

matplotlib.lines.Line2D list

aguaclara.research.procoda_parser.notes(path)[source]

This function extracts any experimental notes from a ProCoDA data file. Use this to identify the section of the data file that you want to extract.

Parameters:

path (string) – The file path of the ProCoDA data file.

Returns:

The rows of the data file that contain text notes inserted during the experiment.

Return type:

pandas.DataFrame

aguaclara.research.procoda_parser.remove_notes(data)[source]

Omit notes from a DataFrame object, where notes are identified as rows with non-numerical entries in the first column.

Parameters:

data (pandas.DataFrame) – DataFrame object to remove notes from

Returns:

DataFrame object with no notes

Return type:

pandas.DataFrame
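
The rule used by notes and remove_notes (a row is a note when its first column is non-numerical) can be sketched in plain Python. The helpers is_number and split_notes and the sample rows are hypothetical; the library itself works on pandas DataFrame objects rather than lists of strings.

```python
def is_number(text):
    """Return True if text can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True

def split_notes(rows):
    """Separate data rows from note rows based on the first column."""
    data = [row for row in rows if is_number(row[0])]
    notes = [row for row in rows if not is_number(row[0])]
    return data, notes

# Hypothetical rows as they might be read from a tab-delimited file.
rows = [["0.1", "5.0"], ["Started pump", ""], ["0.2", "6.0"]]
data_rows, note_rows = split_notes(rows)
```
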

aguaclara.research.procoda_parser.get_data_by_time(path, columns, dates, start_time='00:00', end_time='23:59', extension='.tsv', units='', elapsed=False)[source]

Extract columns of data over one or more ProCoDA data files based on date and time. Valid only for files whose names are automatically generated by date, i.e. of the form “datalog_M-D-YYYY”.

Note: Column 0 is time. The first data column is column 1. Results for the time column are adjusted for multi-day experiments.

Parameters:
  • path (string) – The path to the folder containing the ProCoDA data file(s)

  • columns (int or int list) – A single column index or a list of column indexes

  • dates (string or string list) – A single date or list of dates, formatted “M-D-YYYY”

  • start_time (string, optional) – Starting time of data to extract, formatted ‘HH:MM’ (24-hour time)

  • end_time (string, optional) – Ending time of data to extract, formatted ‘HH:MM’ (24-hour time)

  • extension (string, optional) – File extension of the data file(s). Defaults to ‘.tsv’

  • units (string, optional) – A single unit or list of units to apply to each column, e.g. ‘mg/L’ or [‘hr’, ‘mg/L’]. Defaults to ‘’ (dimensionless).

  • elapsed (boolean, optional) – If True, results for the time column are given in elapsed time. Defaults to False.

Returns:

The single column of data or a list of the columns of data (in the order of the indexes given in the columns variable)

Return type:

1D or 2D float list

Examples:

data = get_data_by_time(path='/Users/.../ProCoDA Data/', columns=4, dates=['6-14-2018', '6-15-2018'], start_time='12:20', end_time='10:50')
data = get_data_by_time(path='/Users/.../ProCoDA Data/', columns=[0,4], dates='6-14-2018', start_time='12:20', end_time='23:59')
data = get_data_by_time(path='/Users/.../ProCoDA Data/', columns=[0,3,4], dates='6-14-2018')
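
Because the function relies on automatically generated file names, it can help to see how a date string maps to a file. The following is a minimal sketch, assuming the name is exactly “datalog_” followed by the date and the extension. The helpers datalog_paths and day_fraction are hypothetical, and treating a time of day as a fraction of a day is an assumption about how a time window might be compared with the time column, not a description of the library's internals.

```python
import os

def datalog_paths(folder, dates, extension=".tsv"):
    """Build one file path per date, assuming the 'datalog_M-D-YYYY' naming."""
    if isinstance(dates, str):
        dates = [dates]  # accept a single date as well as a list
    return [os.path.join(folder, "datalog_" + date + extension) for date in dates]

def day_fraction(clock_time):
    """Convert an 'HH:MM' string (24-hour time) to a fraction of a day."""
    hours, minutes = clock_time.split(":")
    return (int(hours) * 60 + int(minutes)) / 1440

paths = datalog_paths("ProCoDA Data", ["6-14-2018", "6-15-2018"])
noon = day_fraction("12:00")
```
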
aguaclara.research.procoda_parser.get_data_by_state(path, dates, state, column, extension='.tsv')[source]

Reads a ProCoDA file and extracts the time and data column for each iteration of the given state.

Note: Column 0 is time. The first data column is column 1. Results for the time column are given in elapsed time.

Parameters:
  • path (string) – The path to the folder containing the ProCoDA data file(s), defaults to the current directory

  • dates (string or string list) – A single date or list of dates for which data was recorded, formatted “M-D-YYYY”

  • state (int) – The state ID number for which data should be extracted

  • column (int or string) – The integer index of the column that you want to extract OR the header of the column that you want to extract

  • extension (string, optional) – File extension of the data file(s). Defaults to ‘.tsv’

Returns:

A list of lists of the time and data columns extracted for each iteration of the state. For example, if “data” is the output, data[i][:,0] gives the time column and data[i][:,1] gives the data column for the ith iteration of the given state and column. data[i][0] would give the first [time, data] pair.

Return type:

3D float list

Examples:

data = get_data_by_state(path='/Users/.../ProCoDA Data/', dates=["6-19-2013", "6-20-2013"], state=1, column=28)
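
The nested return structure is easier to follow with a small model of it. The following is a minimal sketch in plain Python that groups consecutive rows sharing the requested state into iterations of [time, data] pairs. The helper split_by_state and the (time, state, value) row layout are hypothetical, and the sketch leaves times unshifted. With the arrays described above, data[i][:,0] selects the time column; with the plain lists used here, the equivalent is a list comprehension.

```python
def split_by_state(rows, state):
    """Group consecutive rows whose state matches into lists of [time, value] pairs."""
    iterations = []
    current = []
    for time, row_state, value in rows:
        if row_state == state:
            current.append([time, value])
        elif current:
            # The state just ended, so close the current iteration.
            iterations.append(current)
            current = []
    if current:
        iterations.append(current)
    return iterations

# Hypothetical rows: (time, state ID, measured value).
rows = [(0.0, 0, 1.0), (1.0, 1, 2.0), (2.0, 1, 3.0), (3.0, 0, 4.0), (4.0, 1, 5.0)]
data = split_by_state(rows, 1)

first_pair = data[0][0]                      # first [time, data] pair
first_times = [pair[0] for pair in data[0]]  # time column of iteration 0
```
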
aguaclara.research.procoda_parser.read_state(dates, state, column, units='', path='', extension='.tsv')[source]

Reads a ProCoDA file and outputs the data column and time vector for each iteration of the given state.

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • dates (string or string list) – A single date or list of dates for which data was recorded, formatted “M-D-YYYY”

  • state (int) – The state ID number for which data should be extracted

  • column (int or string) – Index of the column that you want to extract OR header of the column that you want to extract

  • units (string, optional) – The units you want to apply to the data, e.g. ‘mg/L’. Defaults to “” (dimensionless)

  • path (string, optional) – The file path of the ProCoDA data file. Defaults to “”.

  • extension (string, optional) – The file extension of the tab delimited file. Defaults to “.tsv”

Returns:
  • time (numpy.ndarray) – Times corresponding to the data (with units)

  • data (numpy.ndarray) – Data in the given column during the given state (with units)

Examples:

time, data = read_state(["6-19-2013", "6-20-2013"], 1, 28, "mL/s")
aguaclara.research.procoda_parser.average_state(dates, state, column, units='', path='', extension='.tsv')[source]

Outputs the average value of the data for each instance of a state in the given ProCoDA files

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • dates (string or string list) – A single date or list of dates for which data was recorded, formatted “M-D-YYYY”

  • state (int) – The state ID number for which data should be extracted

  • column (int or string) – Index of the column that you want to extract OR header of the column that you want to extract

  • units (string, optional) – The units you want to apply to the data, e.g. ‘mg/L’. Defaults to “” (dimensionless)

  • path (string, optional) – The file path of the ProCoDA data file. Defaults to “”.

  • extension (string, optional) – The file extension of the tab delimited file. Defaults to “.tsv”

Returns:

A list of averages for each instance of the given state

Return type:

float list

Examples:

data_avgs = average_state(["6-19-2013", "6-20-2013"], 1, 28, "mL/s")
aguaclara.research.procoda_parser.perform_function_on_state(func, dates, state, column, units='', path='', extension='.tsv')[source]

Applies the given function to the data in the given column for each instance of the given state and outputs the result for each instance.

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • func (function) – A function that will be applied to data from each instance of the state

  • dates (string or string list) – A single date or list of dates for which data was recorded, formatted “M-D-YYYY”

  • state (int) – The state ID number for which data should be extracted

  • column (int or string) – Index of the column that you want to extract OR header of the column that you want to extract

  • units (string, optional) – The units you want to apply to the data, e.g. ‘mg/L’. Defaults to “” (dimensionless)

  • path (string, optional) – The file path of the ProCoDA data file. Defaults to “”.

  • extension (string, optional) – The file extension of the tab delimited file. Defaults to “.tsv”.

Requires:

func takes in a list of data with units and outputs the correct units

Returns:

The outputs of the given function for each instance of the given state

Return type:

list

Examples:

import numpy as np

def avg_with_units(lst):
    num = np.size(lst)
    acc = 0
    for i in lst:
        acc = i + acc

    return acc / num

data_avgs = perform_function_on_state(avg_with_units, ["6-19-2013", "6-20-2013"], 1, 28, "mL/s")
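
The pattern shared by average_state and perform_function_on_state is to call one function once per instance of the state and collect the results in a list. The following is a minimal sketch of that pattern in plain Python. The helper apply_to_instances and the sample values are hypothetical, and the sketch ignores units, which the library's version carries through.

```python
from statistics import mean

def apply_to_instances(func, instances):
    """Call func once per state instance and collect the results in order."""
    return [func(values) for values in instances]

# Hypothetical data: one list of readings per instance of the state.
instances = [[2.0, 4.0], [5.0]]

averages = apply_to_instances(mean, instances)
maxima = apply_to_instances(max, instances)
```
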
aguaclara.research.procoda_parser.read_state_with_metafile(func, state, column, path, metaids=[], extension='.tsv', units='')[source]

Takes in a ProCoDA metafile and performs a function on all data of a certain state in each of the experiments (denoted by file paths in the metafile).

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • func (function) – A function that will be applied to data from each instance of the state

  • state (int) – The state ID number for which data should be extracted

  • column (int or string) – Index of the column that you want to extract OR header of the column that you want to extract

  • path (string) – The file path of the ProCoDA data file (must be tab-delimited)

  • metaids (string list, optional) – A list of the experiment IDs you’d like to analyze from the metafile

  • extension (string, optional) – The file extension of the tab delimited file. Defaults to “.tsv”

  • units (string, optional) – The units you want to apply to the data, e.g. ‘mg/L’. Defaults to “” (dimensionless)

Returns:
  • ids (string list) – The list of experiment IDs given in the metafile

  • outputs (list) – The outputs of the given function for each experiment

Examples:

import numpy as np

def avg_with_units(lst):
    num = np.size(lst)
    acc = 0
    for i in lst:
        acc = i + acc

    return acc / num

path = "../tests/data/Test Meta File.txt"
ids, answer = read_state_with_metafile(avg_with_units, 1, 28, path, [], ".tsv", "mg/L")
aguaclara.research.procoda_parser.write_calculations_to_csv(funcs, states, columns, path, headers, out_name, metaids=[], extension='.tsv')[source]

Writes each output of the given functions on the given states and data columns to a new column in the specified output file.

Note: Column 0 is time. The first data column is column 1.

Parameters:
  • funcs (function or function list) – A function or list of functions which will be applied in order to the data. If only one function is given it is applied to all the states/columns

  • states (string or string list) – The state ID numbers for which data should be extracted. A list should be in order of calculation. If only one state is given, it is used for all the calculations

  • columns (int, string, int list, or string list) – The index of a column, the header of a column, a list of indexes, OR a list of headers of the column(s) that you want to apply calculations to

  • path (string) – Path to your ProCoDA metafile (must be tab-delimited)

  • headers (string list) – List of the desired header for each calculation, in order

  • out_name (string) – Desired name for the output file. Can include a relative path

  • metaids (string list, optional) – A list of the experiment IDs you’d like to analyze from the metafile

  • extension (string, optional) – The file extension of the tab delimited file. Defaults to “.tsv”

Requires:

funcs, states, columns, and headers are all of the same length if they are lists. Mixing lists and single values is allowed.

Returns:
  • out_name.csv (CSV file) – A CSV file with each column being a new calculation and each row being a new experiment on which the calculations were performed

  • output (pandas.DataFrame) – pandas DataFrame holding the same data that was written to the output file
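
The length requirement on funcs, states, columns, and headers amounts to a simple broadcasting rule: single values are repeated, and lists must agree in length. The following is a minimal sketch of that rule in plain Python. The helper broadcast_arguments is hypothetical, and raising ValueError on a mismatch is an assumption, since the documentation does not state how the library reacts.

```python
def broadcast_arguments(funcs, states, columns, headers):
    """Pair up one (func, state, column, header) tuple per calculation."""
    count = len(headers)  # headers is documented as a list, one per calculation
    expanded = []
    for argument in (funcs, states, columns):
        if isinstance(argument, list):
            if len(argument) != count:
                raise ValueError("list arguments must all have the same length")
            expanded.append(argument)
        else:
            # A single value is reused for every calculation.
            expanded.append([argument] * count)
    return list(zip(*expanded, headers))

plan = broadcast_arguments(max, [1, 2], 28, ["Peak state 1", "Peak state 2"])
```
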

aguaclara.research.procoda_parser.intersect(x, y1, y2)[source]

Returns the intersections of two lines represented by a common set of x coordinates and two sets of y coordinates as three numpy arrays: the x coordinates of the intersections, the y coordinates of the intersections, and the indexes in x, y1, y2 immediately after the intersections.

Parameters:
  • x (numpy.ndarray) – common set of x coordinates for the two lines

  • y1 (numpy.ndarray) – the y coordinates of the first line

  • y2 (numpy.ndarray) – the y coordinates of the second line

Requires:

x has no repeating values and is in ascending order

Returns:
  • x_points (numpy.ndarray) – The x coordinates where intersections occur

  • y_points (numpy.ndarray) – The y coordinates where intersections occur

  • crossings (numpy.ndarray) – The indexes immediately after the intersections occur
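
One common way to locate such intersections is to look for a sign change in y1 - y2 between neighboring points and then interpolate linearly within that interval. The following is a minimal sketch of that approach in plain Python, with lists standing in for the numpy arrays. The function find_intersections is hypothetical and is not claimed to match the library's implementation. In particular, it only counts strict sign changes, so points where the lines merely touch are ignored.

```python
def find_intersections(x, y1, y2):
    """Return x coordinates, y coordinates, and the indexes just after each crossing."""
    x_points, y_points, crossings = [], [], []
    for i in range(1, len(x)):
        diff_prev = y1[i - 1] - y2[i - 1]
        diff_curr = y1[i] - y2[i]
        if diff_prev * diff_curr < 0:  # strict sign change between i-1 and i
            # Fraction of the interval at which the difference reaches zero.
            frac = diff_prev / (diff_prev - diff_curr)
            x_points.append(x[i - 1] + frac * (x[i] - x[i - 1]))
            y_points.append(y1[i - 1] + frac * (y1[i] - y1[i - 1]))
            crossings.append(i)
    return x_points, y_points, crossings

# A rising line and a falling line that cross midway between x = 1 and x = 2.
xs, ys, idx = find_intersections([0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0])
```
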