cognite.correlation.correlation.columns_by_max_cross_correlation(df: pandas.core.frame.DataFrame, relate_to: Union[int, str], lags: pandas.core.indexes.timedeltas.TimedeltaIndex, return_cross_correlation_df: bool = False) → Union[pandas.core.frame.DataFrame, Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]]

Find the lag of highest correlation for every column of the input DataFrame and return the related results. The operation requires a DataFrame with evenly spaced timestamps.

It is recommended to either have a lot of data in the data frame, or to use a short time frame for the lags, as the results are unstable if too few data points overlap in the time shifted time series.
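The shrinking overlap can be seen with plain pandas. The sketch below is not part of cognite.correlation; it uses an illustrative ten-point sine series and counts how many points still overlap after the series is shifted by a number of rows.

```python
import numpy as np
import pandas as pd

# Ten evenly spaced samples of one sine period (illustrative data only).
series = pd.Series(np.sin(np.linspace(0, 2 * np.pi, 10)))

overlap_by_shift = {}
correlation_by_shift = {}
for steps in (0, 3, 6, 8):
    shifted = series.shift(steps)  # the first `steps` rows become NaN
    # Count the rows where both the original and the shifted value exist.
    overlap_by_shift[steps] = int((series.notna() & shifted.notna()).sum())
    # Series.corr drops rows where either value is missing.
    correlation_by_shift[steps] = series.corr(shifted)

print(overlap_by_shift)  # {0: 10, 3: 7, 6: 4, 8: 2}
```

With only two overlapping points, the Pearson correlation is ±1 whatever the data, which is why long lags on short series can produce misleading maxima.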

Note about time lag: Time lag is defined such that, with perfect correlation, original time series + time lag = lagged time series. This means that an effect of the relate_to time series would have a positive time lag.
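The sign convention can be illustrated with a small, self-contained sketch. It does not call cognite.correlation; the helper shift_in_time and the variables cause and effect are hypothetical names introduced here. The effect series is built as a copy of the cause delayed by three hours, and the lag that maximizes the correlation comes out positive, in line with the definition above.

```python
import numpy as np
import pandas as pd


def shift_in_time(series: pd.Series, lag: pd.Timedelta) -> pd.Series:
    """Return a copy of `series` whose timestamps are moved by `lag`."""
    return pd.Series(series.to_numpy(), index=series.index + lag)


index = pd.date_range("2017-01-01", periods=200, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
cause = pd.Series(rng.normal(size=200), index=index)

# Hypothetical effect: the same signal, observed three hours later.
effect = shift_in_time(cause, pd.Timedelta(hours=3))

candidate_lags = pd.timedelta_range(
    start=pd.Timedelta(hours=-6), end=pd.Timedelta(hours=6), periods=13
)
# Series.corr aligns the two series on their shared timestamps first.
correlations = pd.Series(
    [shift_in_time(cause, lag).corr(effect) for lag in candidate_lags],
    index=candidate_lags,
)
best_lag = correlations.idxmax()
print(best_lag)
```

Here best_lag equals the three-hour delay used to build the effect: original time series + time lag = lagged time series.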

  • df (pandas.DataFrame) – Time series data
  • relate_to (Union[int, str]) – Name of column to compare others with
  • lags (pandas.TimedeltaIndex) – Pandas sequence of timedeltas for shifting the time series
  • return_cross_correlation_df (bool) – Whether or not to return the cross correlations for all columns. This is a pandas.DataFrame containing the cross correlation for all calculated lags.

Returns:

Pandas DataFrame containing results of calculations, and optionally a DataFrame containing the cross correlations for each column at all calculated lags.

Return type:

Union[pandas.DataFrame, Tuple[pandas.DataFrame, pandas.DataFrame]]


Return maximum correlations and time lags for a simple dataframe.

>>> from datetime import datetime, timedelta
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'datetime': pd.date_range(datetime(2017, 1, 1), datetime(2017, 1, 3), periods=10),
...                    'x': np.sin(np.linspace(0, 2 * np.pi, 10)),
...                    'y': np.sin(np.linspace(1, 2 * np.pi + 1, 10))}).set_index('datetime')
>>> lags = pd.timedelta_range(timedelta(days=-3), timedelta(), periods=10)
>>> columns_by_max_cross_correlation(df, 'x', lags)
cognite.correlation.correlation.cross_correlate(df: pandas.core.frame.DataFrame, relate_to_series: pandas.core.series.Series, lag_idx=0)

Calculate cross correlation for a given lag.

It is recommended to either have a lot of data in the data frame, or to use a short time frame for the lags, as the results are unstable if too few data points overlap in the time shifted time series.

  • df (pandas.DataFrame) – Time series data to correlate with some series
  • relate_to_series (pandas.Series) – Pandas Series with time series data to relate df to. Must have the same temporal spacing as df.
  • lag_idx (int) – Number of index positions to shift the DataFrame relative to the series.

Returns:

Pandas DataFrame containing the cross correlations of the columns.

Return type:



>>> df = pd.DataFrame({'x': (1, 7, 3, 5), 'y': (3, 7, 6, 4)})
>>> cross_correlate(df, df['x'], lag_idx=1)
x   -0.981981
y   -0.960769
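
For intuition, the numbers above can be reproduced with plain pandas by shifting the DataFrame one row and correlating each column with the series. This is a sketch of the idea under that assumption, not necessarily how cross_correlate is implemented.

```python
import pandas as pd

df = pd.DataFrame({'x': (1, 7, 3, 5), 'y': (3, 7, 6, 4)})
lag_idx = 1

# Move every column down by lag_idx rows; the first lag_idx rows become NaN.
shifted = df.shift(lag_idx)
# Pearson correlation of each shifted column with the unshifted 'x' series,
# computed only over the rows where both values are present.
lagged_correlation = shifted.corrwith(df['x'])
print(lagged_correlation.round(6))
```

Only three rows overlap after the shift, so these coefficients rest on very little data.
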
cognite.correlation.plot.plot_cross_correlations(cross_correlation_df: pandas.core.frame.DataFrame, cols_to_plot: Union[Iterable[str], Iterable[int]] = None, separate_plots: bool = True, mpl_args: Dict[str, Any] = None) → None

Plot cross-correlations over time lags. The cols_to_plot parameter can be either an iterable of column names to plot, or an iterable of integer indices into the sorted list of columns.

  • cross_correlation_df (pd.DataFrame) – The DataFrame returned from columns_by_max_cross_correlation (with the optional parameter return_cross_correlation_df enabled).
  • cols_to_plot (Union[Iterable[str], Iterable[int]]) – Which columns to plot or, alternatively, an iterable of indices selecting which of the ordered columns to plot.
  • separate_plots (bool) – Whether to draw each column in its own plot or keep all columns in one plot.
  • mpl_args (Dict[str, Any]) – Additional parameters for Matplotlib plotting function



Plot the cross-correlation for columns numbered 1-4 of the dataframe (0 is excluded to not plot auto-correlation).

>>> plot_cross_correlations(cross_correlation_df, range(1, 5))