Introduction
MiMFa RAVAR DataLab contains everything you normally need to collect, integrate, and present small to large amounts of data from different sources and in different storage formats.
In recent years, new tools have been developed that data scientists and analysts can use to collect, integrate, and analyse data. Developers in this field face the challenge of continually learning and working with new programming languages, frameworks, modules, etc. This led us to develop a flexible tool that addresses most of these problems and requirements in a single, customizable program. From the data warehousing and data management literature we derived three basic categories of activity in this field, and we developed our idea along these three distinct subject areas:
Data Extraction and Collection
Automatic extraction of data from offline and online sources, covering different formats and medium to large data volumes
- Unstructured, semi-structured and structured texts
- Medium to large data volumes
- From different formats (XLSX, DOCX, PPTX, PDF, XML, HTML, etc.)
- From different sources (local disk files, web pages, etc.)
- Through parallel processing algorithms
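
As an illustration of the extraction step described above, the following is a minimal Python sketch, not DataLab's actual implementation. It assumes a hypothetical dispatch table (`EXTRACTORS`) that maps file extensions to extractor functions and uses a thread pool to process several files concurrently. Only HTML, XML, and plain text are covered, using the Python standard library; formats such as XLSX, DOCX, or PDF would need additional libraries. All function names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
import xml.etree.ElementTree as ET


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def extract_html(path):
    parser = _TextCollector()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    return " ".join(parser.parts)


def extract_xml(path):
    root = ET.parse(str(path)).getroot()
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def extract_plain(path):
    return Path(path).read_text(encoding="utf-8").strip()


# Hypothetical dispatch table: file extension -> extractor function.
EXTRACTORS = {".html": extract_html, ".xml": extract_xml, ".txt": extract_plain}


def extract_one(path):
    extractor = EXTRACTORS.get(Path(path).suffix.lower())
    if extractor is None:
        return str(path), None  # unsupported format, left for another handler
    return str(path), extractor(path)


def extract_all(paths, max_workers=4):
    """Extract text from many files concurrently; returns {path: text or None}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(extract_one, paths))
```

Threads are a reasonable choice here because reading files is mostly I/O-bound; for CPU-heavy parsing, a process pool would be the usual alternative.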
Data Integration and Processing
Processing and integration of collected and existing data for future use (data normalization)
- Organization of small to large data volumes
- Converting heterogeneous data storage formats to a common format or structure
- Structuring data in a tagged (RIS) or columnar format
- Applying different standards to collected data for storage and exchange, such as MARC, Dublin Core, and MODS for the exchange of bibliographic and media metadata
- Detailed description of data types and other related information about the data, to support later processing
- General and subject indexing of the collected data
- Applying regular-expression patterns to find the intended data and perform operations on it
- Applying quick filtering, finding, and searching methods to data for normalization and other purposes
- Customizing search methods
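
To make the regular-expression and structuring steps above more concrete, here is a small Python sketch under stated assumptions. It assumes the input lines follow the shape "Author (Year). Title." and converts each match into a tagged RIS-style record or a column row. The pattern, the function names, and the choice of the generic `GEN` reference type are illustrative and do not come from DataLab itself.

```python
import re

# Assumed input shape: "Author (Year). Title."
CITATION = re.compile(
    r"^(?P<author>[^()]+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>.+?)\.?$"
)


def parse_citation(line):
    """Return a dict of fields, or None when the line does not match."""
    m = CITATION.match(line.strip())
    if m is None:
        return None
    return {
        "author": m["author"].strip(),
        "year": m["year"],
        "title": m["title"].strip(),
    }


def to_ris(record, ref_type="GEN"):
    """Render one record in tagged RIS style (tag, two spaces, hyphen, space)."""
    lines = [
        f"TY  - {ref_type}",
        f"AU  - {record['author']}",
        f"TI  - {record['title']}",
        f"PY  - {record['year']}",
        "ER  - ",
    ]
    return "\n".join(lines)


def to_columns(records, fields=("author", "year", "title")):
    """Render records as rows of a column-oriented table."""
    return [[r[f] for f in fields] for r in records]
```

Lines that do not match the assumed pattern return `None`, so they can be filtered out or routed to a different pattern rather than silently producing incomplete records.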
Data Presentation and Analysis
Presentation of data for analysis and reporting purposes, by deriving data marts and other derived forms
- Presenting and viewing collected and organized data in formats such as tables, text, and diagrams
- Deriving a subset or partial copy of the data (like a data mart) for a specific organizational area, application, or analysis
- Restructuring a data collection around a metadata element (e.g. a collection of customers and their purchases, organized by customer name, would be restructured into a new list of purchased items and their associated customers)
- Exporting collected and organized data in various metadata exchange formats
- Exporting matrices based on data types and measurement scales (nominal, ordinal, interval, and ratio)
- Exporting data as input for other special data analysis applications
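
The restructuring and matrix export described above can be sketched as follows. This is a minimal Python illustration that assumes the collection is held as a mapping from customer name to a list of purchased items. The function names and the sample data are hypothetical, and the matrix shown is a simple 0/1 incidence matrix, which corresponds to a nominal scale.

```python
from collections import defaultdict


def invert(purchases_by_customer):
    """Restructure {customer: [items]} into {item: [customers]}."""
    customers_by_item = defaultdict(list)
    for customer, items in purchases_by_customer.items():
        for item in items:
            if customer not in customers_by_item[item]:
                customers_by_item[item].append(customer)
    return dict(customers_by_item)


def incidence_matrix(purchases_by_customer):
    """Build a customer-by-item 0/1 matrix with sorted row and column labels."""
    customers = sorted(purchases_by_customer)
    items = sorted({i for its in purchases_by_customer.values() for i in its})
    rows = [
        [1 if item in purchases_by_customer[c] else 0 for item in items]
        for c in customers
    ]
    return customers, items, rows


# Illustrative sample data only.
data = {"Alice": ["pen", "book"], "Bob": ["book"]}
by_item = invert(data)
customers, items, rows = incidence_matrix(data)
```

With the sample data, `by_item` maps "book" to both customers and "pen" to Alice only, and `rows` holds one row per customer with one column per item.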