| Title: | Decode and Validate HEIMS Data from Department of Education, Australia | 
| Version: | 0.4.0 | 
| Date: | 2018-01-25 | 
| Description: | Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the record system of the Department of Education, Australia to record enrolments and completions in Australia's higher education system, as well as a range of relevant information. For more information, including the source of the data dictionary, see http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary. | 
| Depends: | R (≥ 3.4.0), data.table | 
| Imports: | hutils, magrittr, fastmatch, bit64, lubridate | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Suggests: | testthat, fst | 
| RoxygenNote: | 6.0.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2018-01-25 10:01:10 UTC; hughp | 
| Author: | Hugh Parsonage [aut, cre] | 
| Maintainer: | Hugh Parsonage <hugh.parsonage@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2018-01-25 10:05:11 UTC | 
Browse elements for description
Description
Browse elements for description
Usage
browse_elements(pattern)
Arguments
| pattern | A case-insensitive perl expression or expressions to match in the long name of  | 
Value
A data.table of all element-long name combinations matching the perl regular expression.
Examples
browse_elements(c("ProViDer", "Maj"))
Decode HEIMS elements
Description
Decode HEIMS elements
Usage
decode_heims(DT, show_progress = FALSE, check_valid = TRUE, selector)
Arguments
| DT | A  | 
| show_progress | Display the progress of the function (which is likely to be slow on real data). | 
| check_valid | Check the variable is valid before decoding. Setting to  | 
| selector | Original HEIMS names to restrict the decoding to. Other names will be preserved. | 
Details
Each variable in DT is validated according to heims_data_dict before being decoded. Any failure stops the validation.
If DT has a key, the output will have a key, but set on the decoded columns and
the ordering will most likely change (to reflect the decoded values).
On the full HEIMS data, this function takes a long time to finish: typically on the order of 10 minutes for the enrol file.
Value
DT with the values decoded and the names renamed.
Examples
## Not run: 
# (E488 is made up so won't work if validation is attempted.)
decode_heims(dummy_enrol)
## End(Not run)
decode_heims(dummy_enrol, show_progress = TRUE, check_valid = FALSE)
Decoders
Description
Decoders
Usage
E089_decoder
E095_decoder
E306_decoder
E310_decoder
E312_decoder
E316_decoder
E329_decoder
E327_decoder
E330_decoder
E331_decoder
E337_decoder
E346_decoder
E348_decoder
E355_decoder
E358_decoder
E386_decoder
E392_decoder
E461_decoder
E463_decoder
E464_decoder
E490_decoder
U490_decoder
E551_decoder
E562_decoder
E919_decoder
E920_decoder
E922_decoder
FOE_uniter
HE_Provider_decoder
Format
An object of class data.table (inherits from data.frame) with 2 rows and 2 columns.
Dummy enrolment file
Description
A data.table of five fictitious enrolments.
Usage
dummy_enrol
Format
An object of class data.table (inherits from data.frame) with 5 rows and 56 columns.
Make HEIMS element nos human-readable
Description
Make HEIMS element nos human-readable
Usage
rename_heims(DT)
element2name(v)
Arguments
| DT | The data table with original names | 
| v | A vector of element names. | 
Details
See heims_data_dict. Note that decode_heims is generally better,
as it decodes the variable if a decoder is present in the dictionary.
element2name is the inverse of browse_elements:
given an element like E306, it returns
the name (HE_Provider_cd).
Value
DT with the new names or the vector with the names translated.
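The renaming described above can be illustrated with a plain lookup vector. This is a minimal base R sketch, not the package's implementation: `lookup` and `element2name_sketch` are hypothetical names, the only mapping taken from the text is E306 to HE_Provider_cd, and leaving unmatched names unchanged is an assumption.

```r
# Hypothetical lookup table; only E306 -> HE_Provider_cd comes from the text.
lookup <- c(E306 = "HE_Provider_cd")

element2name_sketch <- function(v) {
  out <- unname(lookup[v])
  # Assumption: names without a lookup entry are left unchanged.
  out[is.na(out)] <- v[is.na(out)]
  out
}

X <- data.frame(E306 = 1011L, other = "a")
names(X) <- element2name_sketch(names(X))
names(X)
```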
Validate HEIMS elements
Description
Return TRUE or FALSE according to whether each variable in a data.table complies with the HEIMS code limits.
Usage
validate_elements(DT, .progress_cat = FALSE)
prop_elements_valid(DT, char = FALSE)
count_elements_invalid(DT, char = FALSE)
Arguments
| DT | The data.table whose variables are to be validated. | 
| .progress_cat | Should the progress of the function be displayed on the console? If  | 
| char | Return as character vector, in particular marking – any complete or completely absent values. | 
Details
For early detection of invalid results, the type of the variable (in particular integer vs double) is considered first,
vetoing a TRUE result if different.
Value
A named logical vector, whether or not the variable complies with the style requirements. A value of NA indicates the variable
was not checked (perhaps because it is absent from heims_data_dict).
Examples
X <- data.frame(E306 = c(0, 1011, 999, 9998))
validate_elements(X)  # FALSE
prop_elements_valid(X)
X <- data.frame(E306 = as.integer(c(0, 1011, 999, 9998)))
validate_elements(X)  # TRUE
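The type-first check described in Details might look roughly like the following base R sketch. It is only an illustration: `validate_sketch` and its bounds `lower` and `upper` are hypothetical and are not taken from heims_data_dict.

```r
# Hypothetical validator: the type is checked before the values,
# so a double vector fails even if every value is within range.
validate_sketch <- function(v, lower = 0L, upper = 9999L) {
  if (!is.integer(v)) {
    return(FALSE)
  }
  all(v >= lower & v <= upper, na.rm = TRUE)
}

res_double  <- validate_sketch(c(0, 1011, 999, 9998))
res_integer <- validate_sketch(as.integer(c(0, 1011, 999, 9998)))
print(c(double = res_double, integer = res_integer))
```

Checking the type first is cheap and lets the function return early without scanning a long column.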
First levels
Description
See relevel_heims.
Usage
first_levels
Format
An object of class data.table (inherits from data.frame) with 8 rows and 2 columns.
Read raw HEIMS file
Description
Read raw HEIMS file
Usage
fread_heims(filename)
Arguments
| filename | A text-delimited file, passed to  | 
Details
The strings "", "NA", "?", ".", "*" and "**" are treated as missing, as is ZZZZZZZZZZ
(so students without a CHESSN will be marked with the integer64 missing value).
Value
A data.table with column names in ascending (lexicographical) order and
any columns starting with e will be uppercase.
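The behaviour described in Details and Value can be approximated with data.table directly. The sketch below is an assumption about how such a reader could be written, not the source of fread_heims: `read_heims_sketch` is a hypothetical name and the two-row file is made up for illustration.

```r
library(data.table)

# Hypothetical stand-in for fread_heims(); not the package's implementation.
read_heims_sketch <- function(filename) {
  DT <- fread(filename,
              header = TRUE,
              na.strings = c("", "NA", "?", ".", "*", "**", "ZZZZZZZZZZ"))
  # Upper-case any column names that start with a lower-case "e".
  old <- names(DT)
  starts_with_e <- startsWith(old, "e")
  if (any(starts_with_e)) {
    setnames(DT, old[starts_with_e], toupper(old[starts_with_e]))
  }
  # Put the columns in ascending lexicographical order.
  setcolorder(DT, sort(names(DT)))
  DT[]
}

# A made-up two-row file with one missing marker in each column.
tmp <- tempfile(fileext = ".csv")
writeLines(c("e313,E306",
             "ZZZZZZZZZZ,1011",
             "1234567890,?"),
           tmp)
out <- read_heims_sketch(tmp)
print(out)
```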
HEIMS data dictionary
Description
HEIMS data dictionary
Usage
heims_data_dict
Format
A named list each containing 5 elements:
- long_name
- a human-readable version of the variable;
- orig_name
- the element number;
- mark_missing
- a vectorized function returning TRUE on values of the variable which should be coded as NA;
- ad_hoc_prepare
- a function to apply before validation;
- validate
- a single-value function returning TRUE or FALSE on vectors which comply with the variable's coding rules.
- ad_hoc_validation_note
- If the data dictionary did not cover elements in the file, how the validate function was altered to suffer them.
- valid
- a vectorized function returning TRUE or FALSE on vectors which do not comply with the variable's coding rules.
- decoder
- A function of the data.table that decodes the variable.
- post_fst
- A function of the data.table returned by fst to be used (for example to reset attributes).
Details
Abbreviations in long_name:
- amt
- Amount 
- cd
- Code 
- det
- Detail(s) 
- FOE
- Field of education 
- Maj
- Major 
Source
http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary
Read HEIMS data from decoded fst files
Description
Read HEIMS data from decoded fst files
Usage
read_heims_fst(filename)
Arguments
| filename | File path to  | 
Value
A data.table with appropriate attributes.
Relevel categorical variables
Description
Changes categorical variables in a data.table to factors with a sensible reference level.
Usage
relevel_heims(DT)
Arguments
| DT | A  | 
Value
The same data.table with character vectors changed to factors whose first level is the level intended.
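The idea of a factor whose first level is the intended reference can be shown with base R's relevel. This is a sketch only: the values and the choice of "Male" as the reference level are made up and are not taken from first_levels.

```r
# Made-up character vector; the reference level chosen here is hypothetical.
x <- c("Female", "Male", "Male")

# factor() orders levels alphabetically; relevel() moves `ref` to the front.
f <- relevel(factor(x), ref = "Male")
levels(f)
```

Regression functions such as lm treat the first level of a factor as the baseline, which is why the choice of first level matters.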
Utility functions
Description
Only included here because of the unusual nature of heims_data_dict.
Usage
AND()
OR()
never(v)
every(v)
always(v)
is.Date(v)
is.YearMonth(v)
nth_digit_of(x, n)
between(...)
or(...)
and(...)
if_else(...)
coalesce(...)
a %fin% tbl
rm_leading_0s(v)
as.integer64(v)
is.integer64(v)
force_integer(v)
ymd(...)
Arguments
| v | A vector. | 
| x,n | vectors | 
| ... | Passed to other functions | 
| a | Element suspected to be in  | 
| tbl | A lookup table. | 
Details
nth_digit_of returns the nth digit of the number starting from the units and going up in magnitude.
Examples
nth_digit_of(503, 1) == 3
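Counting digits from the units upward can be written with integer division and the modulo operator. The following is a base R sketch under that reading of the description, not the package's source; `nth_digit_sketch` is a hypothetical name.

```r
# Hypothetical re-implementation: drop the n - 1 lowest digits with
# integer division, then keep the remaining units digit with modulo 10.
nth_digit_sketch <- function(x, n) {
  (abs(x) %/% 10^(n - 1)) %% 10
}

digits <- c(nth_digit_sketch(503, 1),
            nth_digit_sketch(503, 2),
            nth_digit_sketch(503, 3))
print(digits)  # 3 0 5
```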