Analysis Report for a given text dataset sample
1.0 Data set
data_file <- "/Users/zzahir1978/Desktop/Sample data/en.sahih.txt"
##            Name Verse
##   1: Al-Fatihah     7
##   2: Al-Baqarah   286
##   3:  Ale Imran   200
##   4:   An-Nisa'   176
##   5: Al-Ma'idah   120
##  ---
## 110:    An-Nasr     3
## 111:   Al-Masad     5
## 112:  Al-Ikhlas     4
## 113:   Al-Falaq     5
## 114:     Al-Nas     6
##                  V1        V2
## 1:   Data Size (MB)      0.86
## 2:      Nos.Of Line   6249.00
## 3: Nos.Of Character 891800.00
## 4:     Nos.Of Words 158992.00
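The four summary figures above could be computed along the following lines. This is a minimal base R sketch, not the report's original code: `summarise_text_file` is a hypothetical helper, and its definitions of a word (a whitespace-separated token) and of a character (per-line count, newlines excluded) are assumptions, so totals may differ slightly from those reported.

```r
# Sketch: summarise a plain-text file (size, lines, characters, words).
# `summarise_text_file` is a hypothetical helper, not the report's own code.
summarise_text_file <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # Split on runs of whitespace and drop empty tokens (assumed word definition).
  tokens <- unlist(strsplit(lines, "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]
  data.frame(
    metric = c("Data Size (MB)", "Nos.Of Line", "Nos.Of Character", "Nos.Of Words"),
    value  = c(round(file.size(path) / 1024^2, 2),  # bytes to megabytes
               length(lines),
               sum(nchar(lines)),                   # newlines are not counted
               length(tokens))
  )
}

# Demo on a small temporary file so the block runs without the original data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha beta gamma", "delta  epsilon"), tmp)
stats <- summarise_text_file(tmp)
print(stats)
```

To apply it to the real data, pass `data_file` instead of the temporary file.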

2.0 Compute sample sizes in terms of lines
## data_size
## 1: 4374.3
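The reported value of 4374.3 equals 0.7 x 6249, so the sample appears to be a 70% share of the lines. The fraction is inferred from the numbers and is not stated in the report. A short sketch under that assumption, which also rounds the result down, since a line count must be whole:

```r
n_lines <- 6249          # "Nos.Of Line" from the data set summary
sample_fraction <- 0.7   # assumption: inferred from 4374.3 / 6249
data_size <- n_lines * sample_fraction
print(data_size)

# A sample cannot contain a fractional line, so round down before sampling.
n_sample <- floor(data_size)
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
sample_idx <- sort(sample.int(n_lines, n_sample))  # line indices to keep
```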
3.0 Text Data Analysis Results
3.1 Most frequent and least frequent words
3.1.1 Top 10 most frequent words
##       word count
##  1:  allah  2065
##  2:   will  1664
##  3: indeed  1044
##  4:   lord   670
##  5:   said   567
##  6:    say   547
##  7: people   508
##  8:   upon   443
##  9: except   333
## 10:  among   333
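A frequency table of this shape can be built with base R alone. The sketch below is illustrative and is not the report's own pipeline, whose cleaning steps are not shown in this output. `count_words` and the `stop_words` argument are hypothetical names, and the tokenisation rule (lowercase ASCII letters and apostrophes only) is an assumption.

```r
# Sketch: count word frequencies with base R.
# `count_words` and `stop_words` are hypothetical names for illustration.
count_words <- function(lines, stop_words = character(0)) {
  text <- tolower(lines)
  text <- gsub("[^a-z']+", " ", text)   # keeps ASCII letters and apostrophes only
  tokens <- unlist(strsplit(text, " +"))
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
  freq <- table(tokens)
  out <- data.frame(word = names(freq), count = as.integer(freq),
                    stringsAsFactors = FALSE)
  # Sort by descending count, breaking ties alphabetically.
  out <- out[order(-out$count, out$word), ]
  rownames(out) <- NULL
  out
}

# Demo input; these two lines are made up and are not from the data set.
demo <- c("The lord said: say the word.", "The people said the word again.")
freq <- count_words(demo, stop_words = c("the"))
head(freq, 10)
```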
3.1.2 Ten least frequent words
##          word count
##  1:    losers    31
##  2: competent    31
##  3:    former    31
##  4:     thing    31
##  5:      eyes    31
##  6: criminals    32
##  7:     wrong    32
##  8:  grateful    32
##  9: mountains    32
## 10:     gives    32
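Counts of 31 to 32 suggest that this list was drawn from a term table already limited to the more frequent words, since a full vocabulary would normally contain many words that occur only once or twice. That is an inference from the output, not something the report states. The sketch below shows how a minimum-count cutoff changes what "least frequent" returns. `least_frequent`, `min_count`, and the toy table are hypothetical.

```r
# Toy frequency table; the words and counts are illustrative only.
freq <- data.frame(
  word  = c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"),
  count = c(50L, 31L, 32L, 1L, 2L, 31L),
  stringsAsFactors = FALSE
)

# Return the n rarest words among those with at least `min_count` occurrences.
least_frequent <- function(freq, n = 10, min_count = 1) {
  kept <- freq[freq$count >= min_count, ]
  kept <- kept[order(kept$count, kept$word), ]  # ascending count, then word
  rownames(kept) <- NULL
  head(kept, n)
}

rare_all <- least_frequent(freq, n = 3)                  # no cutoff
rare_cut <- least_frequent(freq, n = 3, min_count = 31)  # hypothetical cutoff of 31
print(rare_all)
print(rare_cut)
```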
3.1.3 Plotting 10 most frequent words
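The plot itself is not reproduced in this text. As a rough stand-in, the counts from the table in section 3.1.1 can be drawn as a horizontal bar chart with base graphics. The chart type, labels, and layout are assumptions; the original figure may have used ggplot2, which appears in the session info.

```r
# Top 10 words and counts copied from the table in section 3.1.1.
top10 <- data.frame(
  word  = c("allah", "will", "indeed", "lord", "said",
            "say", "people", "upon", "except", "among"),
  count = c(2065, 1664, 1044, 670, 567, 547, 508, 443, 333, 333)
)

pdf(NULL)  # null device so the sketch runs without a display; drop it to see the plot
mids <- barplot(
  rev(top10$count),             # reversed so the most frequent word is drawn on top
  names.arg = rev(top10$word),
  horiz = TRUE, las = 1,        # horizontal bars with horizontal labels
  xlab = "Count", main = "10 most frequent words"
)
dev.off()
```

The same call works for the least frequent words by swapping in the table from section 3.1.2.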

3.1.4 Plotting 10 least frequent words

3.1.5 Creating a word cloud
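The word cloud image is not reproduced in this text. The session info lists the wordcloud and RColorBrewer packages, so a call along the following lines is plausible, although the arguments the report actually used are unknown. The sketch uses only the ten words from section 3.1.1 and an assumed palette, and it skips drawing if the package is not installed.

```r
# Top 10 words and counts copied from the table in section 3.1.1.
top10 <- data.frame(
  word  = c("allah", "will", "indeed", "lord", "said",
            "say", "people", "upon", "except", "among"),
  count = c(2065, 1664, 1044, 670, 567, 547, 508, 443, 333, 333)
)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  pdf(NULL)     # null device; remove to draw on screen
  set.seed(42)  # word placement is random, so fix an arbitrary seed
  wordcloud::wordcloud(
    words = top10$word, freq = top10$count,
    min.freq = 1, random.order = FALSE,             # largest words in the centre
    colors = RColorBrewer::brewer.pal(8, "Dark2")   # assumed palette
  )
  dev.off()
} else {
  message("Package 'wordcloud' is not installed; skipping the plot.")
}
```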

4.0 Session info
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] twitteR_1.1.9 forcats_0.5.0 stringr_1.4.0 purrr_0.3.4
## [5] tibble_3.0.3 tidyverse_1.3.0 tidyr_1.1.2 readr_1.3.1
## [9] dtplyr_1.0.1 wordcloud_2.6 RColorBrewer_1.1-2 ggthemes_4.2.0
## [13] ggplot2_3.3.2 data.table_1.13.0 knitr_1.30 dplyr_1.0.2
## [17] ngram_3.0.4 tm_0.7-7 NLP_0.2-0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 lubridate_1.7.9 assertthat_0.2.1 digest_0.6.25
## [5] slam_0.1-47 R6_2.4.1 cellranger_1.1.0 backports_1.1.10
## [9] reprex_0.3.0 evaluate_0.14 httr_1.4.2 pillar_1.4.6
## [13] rlang_0.4.7 readxl_1.3.1 rstudioapi_0.11 blob_1.2.1
## [17] rmarkdown_2.4 labeling_0.3 bit_4.0.4 munsell_0.5.0
## [21] broom_0.7.1 compiler_4.0.2 modelr_0.1.8 xfun_0.18
## [25] pkgconfig_2.0.3 htmltools_0.5.0 tidyselect_1.1.0 fansi_0.4.1
## [29] crayon_1.3.4 dbplyr_1.4.4 withr_2.3.0 grid_4.0.2
## [33] jsonlite_1.7.1 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0
## [37] magrittr_1.5 scales_1.1.1 cli_2.0.2 stringi_1.5.3
## [41] farver_2.0.3 fs_1.5.0 xml2_1.3.2 ellipsis_0.3.1
## [45] generics_0.0.2 vctrs_0.3.4 rjson_0.2.20 tools_4.0.2
## [49] bit64_4.0.5 glue_1.4.2 hms_0.5.3 parallel_4.0.2
## [53] yaml_2.2.1 colorspace_1.4-1 rvest_0.3.6 haven_2.3.1