DatacompyReport

Added reporting functionality for Datacompy

source

DatacompyReport

 DatacompyReport (compare_instance)

A class to modify the output of a datacompy instance and add reporting functionality

Details
compare_instance a datacompy.core.Compare instance

source

DatacompyReport.prepare_compare_results

 DatacompyReport.prepare_compare_results ()

Reformats the Datacompy output to display all column differences found for matches between the two DataFrames based on the keys compare.join_columns


source

DatacompyReport.create_excel_report

 DatacompyReport.create_excel_report (write_file:str,
                                      use_names_in_headers:bool=True)

Create Excel workbook with nicely formatted output comparing two DataFrames

Type Default Details
write_file str full path + filename where the Excel file is to be created
use_names_in_headers bool True if True, replaces df1 and df2 with Comapre.df1_name, Comapre.df2_name
Returns None Excel workbook written at write_file location

Create Test Data

data1 = """id,col1_string,col2_string,col3_int,col4_float,col5_string
010,g1,g1-duck,1,1.0,r1
110,g1,g1-dog,2,2.1,r1
210,g2,g2-duck,3,3.2,r2
310,g2,g2-dog,4,4.3,r2
410,g1,g1-goat,5,5.4,r1
510,,g1-dog,6,6.5,r1
610,g2,g2-dog,7,7.6,r2
710,g2,g2-duck,8,8.7,r2
810,g1,g1-dog,9,9.8,r1
811,g1,g1-duck,10,10.9,r1
812,g1,g1-duck,11,11.0,r1"""

data2 = """id,col1_string,col2_string,col3_int,col4_float,col6_string
010,g1,g1-duck,1,1.0,z1
110,g2,g2-duck,2,2.1,z2
210,g2,g2-duck,4,3.3,z2
310,g1,g1-dog,3,4.5,z1
410,g1,g1-dog,5,5.9,z1
510,g1,g1-dog,6,6.5,z2
610,g2,g2-goat,7,,z2
710,g2,g2-duck,8,8.7,z1
810,g1,g1-dog,9,9.8,z1
911,g2,g2-dog,10,10.91,z2
912,g2,g2-dog,11,11.01,z2
922,g1,g1-duck,12,11.1,z2
"""

df1 = pandas.read_csv(StringIO(data1))
df2 = pandas.read_csv(StringIO(data2))

Create a Datacompy Compare Instance to Compare df1 and df2

compare = datacompy.Compare(
    df1,
    df2,
    join_columns=['id', 'col1_string'],  # Key columns, can be single or list
    abs_tol=0,                           # Optional, defaults to 0
    rel_tol=0,                           # Optional, defaults to 0
    df1_name='Base',                     # Optional, defaults to 'df1'
    df2_name='Compare'                   # Optional, defaults to 'df2'
)

Create a New DataCompyReport Instance

report = DatacompyReport(compare)
report
 datacompy_report instance:
 Comparing: Base and Compare

The .compare_instance attribute is the datacompy.core.Compare object. datacompy.core.Compare attributes can be accessed directly.

for item in dir(report.compare_instance):
    if item[0] != '_':
        print(f'report.compare_instance.{item}')
report.compare_instance.abs_tol
report.compare_instance.all_columns_match
report.compare_instance.all_mismatch
report.compare_instance.all_rows_overlap
report.compare_instance.cast_column_names_lower
report.compare_instance.column_stats
report.compare_instance.count_matching_rows
report.compare_instance.df1
report.compare_instance.df1_name
report.compare_instance.df1_unq_columns
report.compare_instance.df1_unq_rows
report.compare_instance.df2
report.compare_instance.df2_name
report.compare_instance.df2_unq_columns
report.compare_instance.df2_unq_rows
report.compare_instance.ignore_case
report.compare_instance.ignore_spaces
report.compare_instance.intersect_columns
report.compare_instance.intersect_rows
report.compare_instance.intersect_rows_match
report.compare_instance.join_columns
report.compare_instance.matches
report.compare_instance.on_index
report.compare_instance.rel_tol
report.compare_instance.report
report.compare_instance.sample_mismatch
report.compare_instance.subset

View Modified Matched Differences Report

Datacompy provides most of the methods and attributes necessary to create a very useful comparison of two DataFrames. For some particular use cases though it is helpful to represent the data where matches on key columns are found but at least one difference exists (by value and/or by type) in a column they both share. Further, there is no added value reporting values in columns where the two DataFrames match for this use case so matching values are blanked out.

To create such a table: report.prepare_compare_results()

report.prepare_compare_results()
col2_string_df1 col2_string_df2 col3_int_df1 col3_int_df2 col4_float_df1 col4_float_df2
id col1_string
210 g2 3.0 4.0 3.2 3.3
410 g1 g1-goat g1-dog 5.4 5.9
610 g2 g2-dog g2-goat 7.6 NaN

Create Excel Report

To create an Excel report, use report.create_excel_report('comparison_report_test_data.xlsx').

The workbook will contain multiple sheets:

  • Row Summary - overall comparison of DataFrame1 and DataFrame2 by row
  • Column Summary - overall comparison of DataFrame1 and DataFrame2 by row
  • Columns Stats Summary - overall comparison of column metadata for DataFrame1 and DataFrame2
  • DataFrame1 Observations Only - all rows (by key as defined by self.compare_instance.join_columns) found in DataFrame 1 but not DataFrame 2
  • DataFrame2 Observations Only - all rows (by key as defined by self.compare_instance.join_columns) found in DataFrame 2 but not DataFrame 1
  • Matched Differences - all rows (by key as defined by self.compare_instance.join_columns) found in both DataFrame1 and DataFrame2 having at least one difference in a non-key field. The Matched Differences DataFrame replaces all fields where the DataFrames match with blanks by using report.prepare_compare_results()
report.create_excel_report('comparison_report_test_data.xlsx')
Excel Workbook has been created