You can install the package using the pip package manager by running `pip install ydata-profiling`.

The profiling report is written in HTML and CSS, which means a modern browser is required to view it.

There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report. The widget interface simply displays the report as a set of widgets. Additional details, including information about widget support, are available in the documentation.

The following example reports showcase the potential of the package across a wide range of datasets and data types:

- Census Income (US Adult Census data relating income to other demographic properties)
- NASA Meteorites (comprehensive set of meteorite landings: object properties and locations)
- NZA (open data from the Dutch Healthcare Authority)
- UCI Bank Dataset (marketing dataset from a bank)
- Russian Vocabulary (100 most common Russian words, showcasing Unicode text analysis)
- Website Inaccessibility (website accessibility analysis, showcasing support for URL data)
- Coal prices (simple pricing evolution dataset, showcasing the theming options)
- USA Air Quality (time-series air quality dataset EDA example)
- HCC (open healthcare dataset, showcasing a comparison between two sets of data, before and after preprocessing)

A report can also be generated from the command line, for example:

`pandas_profiling --title "Example Profiling Report" --config_file default.yaml data.csv report.html`

Additional details on the CLI are available in the documentation.
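If you want to drive the CLI from a Python script (for example, to profile several files in a batch job), the argument list can be assembled programmatically. This is a minimal sketch, assuming the `pandas_profiling` executable from the CLI example is installed, on your PATH, and accepts `--title` and `--config_file` followed by the input and output files; the helper names `build_profiling_command` and `run_profiling` are hypothetical and not part of the package.

```python
import shlex
import subprocess
from typing import List, Optional


def build_profiling_command(
    input_csv: str,
    output_html: str,
    title: str = "Example Profiling Report",
    config_file: Optional[str] = None,
    executable: str = "pandas_profiling",
) -> List[str]:
    """Assemble the argument list for the profiling CLI (hypothetical helper)."""
    cmd = [executable, "--title", title]
    if config_file is not None:
        # Only pass --config_file when a YAML configuration is supplied.
        cmd += ["--config_file", config_file]
    # Positional arguments: the input data file, then the output report file.
    cmd += [input_csv, output_html]
    return cmd


def run_profiling(cmd: List[str]) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_profiling_command("data.csv", "report.html", config_file="default.yaml")
# Print a copy-pasteable version of the command; spaces in the title get quoted.
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` (rather than a single shell string) avoids quoting problems with titles that contain spaces; `shlex.join` is only used to display the equivalent shell command.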
The report includes the following analyses and features:

- Warnings: a summary of the problems/challenges in the data that you might need to work on (missing data, inaccuracies, skewness, etc.).
- Univariate analysis: descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms.
- Multivariate analysis: correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise interactions between variables.
- Time-Series: statistical information about time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots.
- Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic).
- File and Image analysis: file sizes, creation dates, image dimensions, indication of truncated images and existence of EXIF metadata.
- Compare datasets: a one-line solution that produces a fast and complete report comparing datasets.
- Flexible output formats: all analyses can be exported to an HTML report that can be easily shared with different parties, to JSON for easy integration into automated systems, and as a widget in a Jupyter Notebook.

The report contains three additional sections:

- Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint).
- Alerts: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others).
- Reproduction: technical details about the analysis (time, version and configuration).

- Want to scale? Check the latest release with ⭐⚡ Spark support!
- Looking for how to do an EDA for time-series? Check this blog post.
- Want to compare two datasets and get a report? Check this blog post.

Spark support has been released, but we are always looking for an extra pair of hands. Check the current work in progress!

YData-profiling can be used to deliver a variety of use cases. The documentation includes guides, tips and tricks for tackling them:

- Comparing multiple versions of the same dataset
- Generating a report for a time-series dataset with a single line of code
- Tips on how to prepare data and configure ydata-profiling for working with large datasets
- Generating reports that are mindful of sensitive data in the input dataset
- Complementing the report with dataset details and column-specific data dictionaries
- Changing the appearance of the report's page and of the contained visualizations
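The time-series analysis mentioned above reports auto-correlation through ACF plots. As a rough illustration of the quantity behind such a plot, here is a minimal pure-Python sketch of the standard sample autocorrelation estimator. It is not ydata-profiling's own implementation, which may differ in details such as normalization, and the function name `autocorrelation` is hypothetical. PACF is not covered here.

```python
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for k = 0..max_lag (hypothetical helper).

    r_k = sum_{t=0}^{n-k-1} (x_t - m)(x_{t+k} - m) / sum_{t=0}^{n-1} (x_t - m)^2
    where m is the mean of the series.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(series) - 1]")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Denominator: total squared deviation over all n points.
    denom = sum(d * d for d in deviations)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator for lag k: products of deviations k steps apart (n - k pairs).
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


print(autocorrelation([1, 2, 3, 4, 5, 6], max_lag=1))  # [1.0, 0.5]
```

Dividing every lag by the same full-length denominator (the usual "biased" estimator) keeps each r_k within [-1, 1]; a seasonal series shows up as peaks in r_k at multiples of its period.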