semtk3 package
- semtk3.build_connection_str(name: str, triple_store_type: str, triple_store_url: str, model_graphs: List[str], data_graph: str, extra_data_graphs: List[str] = [])
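Build a SemTK connection json string
- Parameters
name¶ – name is for display only
triple_store_type¶ – “fuseki” “neptune” “virtuoso”, etc.
triple_store_url¶ – the URL e.g. “http://localhost:3030/DATASET”
model_graphs¶ – list of model graph names
data_graph¶ – the data graph name
extra_data_graphs¶ – optional list of additional data graph names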
- semtk3.build_constraint(sparql_id, operator, operand_list)
Build a constraint to be used as a query parameter
- Parameters
- Returns
the constraint
- Return type
- semtk3.build_default_connection_str(name, triple_store_type, triple_store_url)
Build a connection to the default graph only
- Parameters
name¶ – name is for display only
triple_store_type¶ – “fuseki” “neptune” “virtuoso”, etc.
triple_store_url¶ – the URL e.g. “http://localhost:3030/DATASET”
- semtk3.check_connection_up(conn_str)
Throw exception if connection triplestore(s) don’t respond OK to http GET
- Parameters
conn_str¶ – a SemTK connection json string
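As a rough sketch of how the two calls above might be combined, the helper below turns the documented exception from check_connection_up() into a boolean. The names connection_is_up and build_local_connection, the display name “demo”, and the choice of “fuseki” are illustrative assumptions. semtk3 is imported lazily so the block loads even where the package is not installed.

```python
from typing import Callable, Optional


def connection_is_up(conn_str: str,
                     check: Optional[Callable[[str], None]] = None) -> bool:
    """Return True if check(conn_str) completes, False if it raises.

    `check` defaults to semtk3.check_connection_up, which is documented
    to throw when a triple store does not respond OK to an http GET.
    """
    if check is None:
        import semtk3  # assumption: the semtk3 package is installed
        check = semtk3.check_connection_up
    try:
        check(conn_str)
        return True
    except Exception:
        return False


def build_local_connection(url: str = "http://localhost:3030/DATASET") -> str:
    """Hypothetical wrapper: default-graph connection to a local store."""
    import semtk3  # assumption: the semtk3 package is installed
    # Arguments follow the documented order: name, type, URL.
    return semtk3.build_default_connection_str("demo", "fuseki", url)
```

With the package installed and a store running, connection_is_up(build_local_connection()) would report whether the store answered.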
- semtk3.check_services()
Logs success or failure of each service
- Returns
did all pings succeed
- Return type
boolean
- semtk3.clear_graph(conn_json_str, model_or_data, index)
Clear a graph
- semtk3.combine_entities(target_uri, duplicate_uri, delete_predicates_from_target=None, delete_predicates_from_duplicate=None, conn=None)
Combine two entities. Exception on failure.
- Parameters
target_uri¶ – target instance to be combined INTO
duplicate_uri¶ – duplicate instance to be combined then removed
delete_predicates_from_target¶ – list of predicate URIs to be deleted from target
delete_predicates_from_duplicate¶ – list of predicate URIs to be deleted from duplicate
conn¶ – connection (can also be set with set_connection_override())
- semtk3.combine_entities_in_conn(same_as_class_uri=None, target_prop_uri=None, duplicate_prop_uri=None, delete_predicates_from_target=[], delete_predicates_from_duplicate=[], conn=None)
Combine entities described by SameAs instances in the data. See EntityResolution.sadl and Wiki on Entity Resolution
Every param is an unusual override, except perhaps conn. Normal usage: combine_entities_in_conn()
- Parameters
same_as_class_uri¶ – override
target_prop_uri¶ – override
duplicate_prop_uri¶ – override
delete_predicates_from_target¶ – list of property URIs to remove from target before combining
delete_predicates_from_duplicate¶ – list of property URIs to remove from duplicate before combining
conn¶ – connection
- Returns
status string
- Throws
exception with table of errors
- semtk3.combine_entities_table(csv_str, target_col_prop_dict, duplicate_col_prop_dict, delete_predicates_from_target=[], delete_predicates_from_duplicate=[], conn=None)
Combine entities described by rows in a table. Each row has col(s) to look up the target and the duplicate.
Any properties outgoing from the duplicate are ignored if they exist in the target.
All incoming properties are combined.
Deletions requested by the delete_predicates_from_* parameters occur before combining using the above rules.
“#type” may be used as a property shorthand
- Parameters
csv_str¶ – csv table of entities to combine
target_col_prop_dict¶ – dictionary describing how to look up target dict[col_name]=prop_uri
duplicate_col_prop_dict¶ – dictionary describing how to look up duplicate dict[col_name]=prop_uri
delete_predicates_from_target¶ – list of property URIs to remove from target before combining
delete_predicates_from_duplicate¶ – list of property URIs to remove from duplicate before combining
conn¶ – connection
- Returns
status string
- Throws
exception with table of errors
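The inputs to combine_entities_table() can be prepared with the standard library alone. The sketch below builds a two-column csv_str and the two lookup dictionaries. The column names, the identifier values, and the property URI are hypothetical, and the semtk3 call itself is wrapped in a function so nothing contacts a service when the block is loaded.

```python
import csv
import io

# Hypothetical property URI used to look up both entities.
ID_PROP = "http://example.org/ontology#identifier"


def make_combine_csv(pairs):
    """Build csv_str with one (target, duplicate) identifier pair per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["target_id", "duplicate_id"])
    writer.writerows(pairs)
    return buf.getvalue()


csv_str = make_combine_csv([("A-001", "A-001-dup"), ("B-002", "B-002-old")])

# dict[col_name] = prop_uri, as described in the parameter list above.
target_col_prop_dict = {"target_id": ID_PROP}
duplicate_col_prop_dict = {"duplicate_id": ID_PROP}


def run_combine(conn=None):
    """Call semtk3 lazily so this block loads without the package."""
    import semtk3  # assumption: the semtk3 package is installed
    return semtk3.combine_entities_table(
        csv_str, target_col_prop_dict, duplicate_col_prop_dict, conn=conn)
```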
- semtk3.copy_graph(from_graph: str, to_graph: str, from_server: Optional[str] = None, from_server_type: Optional[str] = None, to_server: Optional[str] = None, to_server_type: Optional[str] = None, user_name='noone', password='nopass')
Copy one graph to another, merging into the destination. Clear the “to” graph as a separate step if desired.
- Parameters
from_graph¶ – merge from this graph
to_graph¶ – merge to this graph
from_server¶ – merge from this server. if None: get from SEMTK_CONN_OVERRIDE.data[0]
from_server_type¶ – type of ‘from’ server. if None: get from SEMTK_CONN_OVERRIDE.data[0]
to_server¶ – merge to this server. if None: get from SEMTK_CONN_OVERRIDE.data[0]
to_server_type¶ – type of ‘to’ server. if None: get from SEMTK_CONN_OVERRIDE.data[0]
user_name¶ – if security needed on ‘to’ server
password¶ – if security needed on ‘to’ server
- Returns
status message string like “successfully copied uri://from into uri://to”
- Throws
exception on any error
- semtk3.count_by_id(nodegroup_id, limit_override=0, offset_override=0, runtime_constraints=None, edc_constraints=None, flags=None)
Execute a count query for a given nodegroup id
- Parameters
nodegroup_id¶ – id of nodegroup in the store
limit_override¶ – optional override of LIMIT clause
offset_override¶ – optional override of OFFSET clause
runtime_constraints¶ – optional runtime constraints built by build_constraint()
edc_constraints¶ – optional edc constraints
flags¶ – optional query flags
- Returns
results
- Return type
semtktable
- semtk3.create_nodegroup(conn_json_str, class_uri, sparql_id=None)
Create a nodegroup containing a single uri
- semtk3.delete_item_from_store(item_id, item_type)
Delete item from the store, error if it doesn’t exist.
- semtk3.delete_items_from_store(regex_str, item_type='StoredItem')
Delete matching items from the store
- semtk3.delete_nodegroup_from_store(nodegroup_id)
Delete nodegroup_id from the store; error if it doesn’t exist.
- Parameters
nodegroup_id¶ – the id
- semtk3.delete_nodegroups_from_store(regex_str)
- semtk3.download_owl(owl_file_path, conn_json_str, user_name='noone', password='nopass', model_or_data='model', conn_index=0)
Download a graph as an OWL file
- Parameters
owl_file_path¶ – path to the file
conn_json_str¶ – connection json string (defaults to the first MODEL graph in the connection)
user_name¶ – optional user name
password¶ – optional password
model_or_data¶ – optional “model” or “data” specifying which endpoint in the sparql connection, defaults to “model”
conn_index¶ – index specifying which of the model or data endpoints in the sparql connection, defaults to 0
- Returns
None - raises exception on error
- semtk3.fdc_cache_bootstrap_table(conn_json_str, spec_id, bootstrap_table, recache_after_sec)
Run an fdc cache spec
- semtk3.get_class_names(conn_json_str=None)
Get a list of class names in the ontology
- Parameters
conn_json_str¶ – optional connection json string, defaults to override
- Returns
list of full class URIs
- semtk3.get_class_template(class_uri, conn_json_str=None, id_regex='identifier')
Get class template nodegroup
- Parameters
- Returns
nodegroup json string
- semtk3.get_class_template_and_csv(class_uri, conn_json_str=None, id_regex='identifier')
Get class template nodegroup along with its CSV column names and types
- Parameters
class_uri¶ – the class whose template should be used for ingestion
conn_json_str¶ – optional connection json string, defaults to override
id_regex¶ – optional regex to identify the key data properties of classes which are the object of object properties
- Returns
(ng_json_str, “col1, col2, col3\n”, “string, int, dateTime\n”). Note that types can be space-separated complex property types.
- semtk3.get_class_template_csv(class_uri, conn_json_str=None, id_regex='identifier')
Get sample CSV that will work with class template
- Parameters
- Returns
“colname1, colname2, colname3”
- semtk3.get_constraints_by_id(nodegroup_id)
Get runtime constraints for a stored nodegroup
- Parameters
nodegroup_id¶ – the id
- Returns
columns valueId, itemType and valueType
- Return type
semtktable
- semtk3.get_filter_values_by_id(nodegroup_id, target_obj_sparql_id, override_conn_json_str=None, limit_override=None, offset_override=None, runtime_constraints=None, edc_constraints=None, flags=None)
Run a filter values query, which returns all the existing values for a given variable in the nodegroup
- Parameters
nodegroup_id¶ – the id
target_obj_sparql_id¶ – the variable to be interrogated
override_conn_json_str¶ – optional override connection json string
limit_override¶ – optional override of LIMIT clause
offset_override¶ – optional override of OFFSET clause
runtime_constraints¶ – optional runtime constraints built by build_constraint()
edc_constraints¶ – optional edc constraints
flags¶ – optional query flags
- Returns
results
- Return type
semtktable
- semtk3.get_graph_info(conn_json_str, skip_semtk_graphs=False, graph_names_only=True)
Get names and triple counts of graphs present in the triple store
- semtk3.get_instance_dictionary(max_words: int = 2, specificity_limit: int = 1, conn_json_str: Optional[str] = None) → SemtkTable
Get a table describing the URIs and their labels. Columns:
instance_uri - the URI
class_uris - instance belongs to one or more classes
label - label (or name) associated with the instance. NOT UNIQUE: see label_specificity
label_specificity - how many uris have this label
property - what prop was used to associate label with instance_uri
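Because labels are not unique, a caller will often want only the labels whose label_specificity is 1. The sketch below does that filtering on plain lists, so it makes no assumption about SemtkTable beyond the column names listed above. The helper name and the sample URIs are hypothetical.

```python
def unique_label_map(column_names, rows):
    """Map each label with label_specificity 1 to its instance_uri."""
    uri_i = column_names.index("instance_uri")
    label_i = column_names.index("label")
    spec_i = column_names.index("label_specificity")
    return {row[label_i]: row[uri_i]
            for row in rows
            # int() tolerates cells delivered as strings
            if int(row[spec_i]) == 1}


# Hand-made sample rows in the documented column order.
sample_columns = ["instance_uri", "class_uris", "label",
                  "label_specificity", "property"]
sample_rows = [
    ["uri://a", "uri://Class", "engine", "2", "uri://name"],
    ["uri://b", "uri://Class", "engine", "2", "uri://name"],
    ["uri://c", "uri://Class", "rudder", "1", "uri://name"],
]
unique = unique_label_map(sample_columns, sample_rows)
```

Against a live service the inputs would presumably come from the returned table, for example table.get_column_names() and table.get_rows(), both of which are listed under semtk3.semtktable.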
- semtk3.get_logger()
- semtk3.get_nodegroup_by_id(nodegroup_id)
Retrieve a nodegroup from the store
- Parameters
nodegroup_id¶ – the id
- Returns
a nodegroup
- Return type
json string
- semtk3.get_nodegroup_store_data()
Get list of nodegroups in the nodegroup store
- Returns
SemtkTable with columns ‘ID’, ‘comments’, ‘creationDate’, ‘creator’, ‘itemType’
- Return type
semtktable
- semtk3.get_oinfo(conn_json_str=None)
Get a table describing the ontology model
- Parameters
conn_json_str¶ – connection string of graph(s) holding the model
- Return type
semtktable
- semtk3.get_oinfo_predicate_stats(conn_json_str=None)
Get a table describing the ontology model
- Parameters
conn_json_str¶ – connection string of graph(s) holding the model
- Return type
semtktable
- semtk3.get_oinfo_uri_label_table(conn_json_str=None)
Get a table describing the ontology model
- Parameters
conn_json_str¶ – connection string of graph(s) holding the model
- Return type
semtktable
- semtk3.get_plot_spec_names_by_id(nodegroup_id)
Get available plot names for a given nodegroup id
- semtk3.get_sparqlgraph_url(host_url: str, nodegroup_id: Optional[str] = None, report_id: Optional[str] = None, runtime_constraints: Optional[List[RuntimeConstraint]] = None, run_flag: Optional[str] = None, conn_json_str: Optional[str] = None)
Get a URL for sparqlgraph with params to launch a connection, nodegroup, query
- Parameters
host_url¶ (str) – base url e.g. http://localhost:8080
nodegroup_id¶ (str) – id of nodegroup in the store to launch. By default, run the query.
report_id¶ (str) – id of report in the store to launch. By default, run the report.
runtime_constraints¶ (RuntimeConstraint) – constraints to apply to query if nodegroup_id is specified
run_flag¶ (str) – “True” or “False”, default “True”
conn_json_str¶ (str) – connection to load. Will override nodegroup_id’s.
- Returns
url
- Return type
string
- semtk3.get_store_item(item_id, item_type)
- semtk3.get_store_table(item_type='StoredItem')
Get list of everything in the store
- Parameters
item_type¶ – one of the STORE_ITEM_TYPE constants
- Returns
SemtkTable with columns ‘ID’, ‘comments’, ‘creationDate’, ‘creator’, ‘itemType’
- Return type
semtktable
- semtk3.get_table(jobid)
Get a table from an async job
- Parameters
jobid¶ – the job id
- Return type
semtktable
- semtk3.ingest_by_id(nodegroup_id, csv_str, override_conn_json_str=None)
Perform data ingestion, throwing exception on failure
- semtk3.ingest_using_class_template(class_uri, csv_str, conn_json_str=None, id_regex='identifier')
Ingest using class template, throwing exception on failure
- semtk3.main()
- semtk3.override_hosts(query_host=None, status_host=None, results_host=None, hive_host=None, oinfo_host=None, nodegroup_exec_host=None, nodegroup_host=None, utility_host=None, fdcache_host=None, ingestion_host=None)
Override the default host(s) for Semtk service(s).
- semtk3.override_ports(query_port=None, status_port=None, results_port=None, hive_port=None, oinfo_port=None, nodegroup_exec_port=None, nodegroup_port=None, utility_port=None, fdcache_port=None, ingestion_port=None)
Override the default port(s) for Semtk service(s). A port may be a number, e.g. 80 or “80”, in which case it is appended after a colon, or a context string, e.g. “/query”, which is simply appended.
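The appending rule described for override_ports() can be illustrated with a few lines of plain Python. This is only a sketch of the documented behaviour, not semtk3's own code, and the function name append_port is hypothetical.

```python
def append_port(host: str, port) -> str:
    """Illustrate the documented rule for numeric ports and context strings."""
    text = str(port)
    if text.isdigit():
        return f"{host}:{text}"  # numeric port: joined with a colon
    return f"{host}{text}"       # context string: appended as-is
```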
- semtk3.print_wait_dots(seconds)
- semtk3.query(query, conn_json_str, model_or_data='data', conn_index=0)
Run a raw SPARQL query
- Parameters
- Returns
results
- Return type
semtktable
- semtk3.query_by_id(nodegroup_id, limit_override=0, offset_override=0, runtime_constraints=None, edc_constraints=None, flags=None, query_type=None, result_type=None)
Execute the default query type for a given nodegroup id
Check the type of the result:
dict – JSON-LD results
semtk3.semtktable.SemtkTable – table results
A count query will be a SemtkTable with column name “count”
A confirm query will be a SemtkTable with column name “@message”
- Parameters
nodegroup_id¶ – id of nodegroup in the store
limit_override¶ – optional override of LIMIT clause
offset_override¶ – optional override of OFFSET clause
runtime_constraints¶ – optional runtime constraints built by build_constraint()
edc_constraints¶ – optional edc constraints
flags¶ – optional query flags
- Returns
results: dict or semtk3.semtktable.SemtkTable
- Return type
semtktable or JSON
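Since query_by_id() can return either a dict or a SemtkTable, calling code usually needs a small dispatch step. The sketch below labels a result using the rules given in the description. It relies only on SemtkTable.get_column_names(), assumes that method returns a list of strings, and uses a hypothetical helper name.

```python
def classify_result(result) -> str:
    """Label a query result as json-ld, count, confirm, or table."""
    if isinstance(result, dict):
        return "json-ld"
    names = list(result.get_column_names())
    if names == ["count"]:
        return "count"      # count query: single column named "count"
    if "@message" in names:
        return "confirm"    # confirm query: column named "@message"
    return "table"


class FakeTable:
    """Stand-in for SemtkTable so the sketch runs without a service."""

    def __init__(self, names):
        self._names = names

    def get_column_names(self):
        return self._names
```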
- semtk3.query_by_nodegroup(nodegroup_str, runtime_constraints=None, edc_constraints=None, flags=None, query_type=None, result_type=None)
Execute the default query type for a given nodegroup json string
Check the type of the result:
dict – JSON-LD results
semtk3.semtktable.SemtkTable – table results
A count query will be a SemtkTable with column name “count”
A confirm query will be a SemtkTable with column name “@message”
- semtk3.query_hive(hiveserver_host, hiveserver_port, hiveserver_database, query)
- semtk3.retrieve_from_store(regex_str, folder_path)
- semtk3.retrieve_items_from_store(regex_str, folder_path, item_type='StoredItem')
Retrieve all items matching a pattern, create store_data.csv
- semtk3.retrieve_nodegroups_from_store(regex_str, folder_path)
- semtk3.retrieve_reports_from_store(regex_str, folder_path)
Retrieve all items matching a pattern, create store_data.csv Retrieves reports and any nodegroups they use
- semtk3.select_by_id(nodegroup_id, limit_override=0, offset_override=0, runtime_constraints=None, edc_constraints=None, flags=None)
Execute a select query for a given nodegroup id
- Parameters
nodegroup_id¶ – id of nodegroup in the store
limit_override¶ – optional override of LIMIT clause
offset_override¶ – optional override of OFFSET clause
runtime_constraints¶ – optional runtime constraints built by build_constraint()
edc_constraints¶ – optional edc constraints
flags¶ – optional query flags
- Returns
results
- Return type
semtktable
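The limit_override and offset_override parameters make it possible to page through a large result. The following sketch keeps requesting pages until a short page comes back. It assumes SemtkTable.get_rows() returns a list of rows and that the nodegroup yields rows in a stable order between calls. The helper name select_all_rows is hypothetical, and the select function can be injected so the loop can be exercised without a running service.

```python
from typing import Callable, List, Optional


def select_all_rows(nodegroup_id: str,
                    page_size: int,
                    select: Optional[Callable] = None,
                    runtime_constraints=None) -> List[list]:
    """Collect rows page by page until a page shorter than page_size."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if select is None:
        import semtk3  # assumption: the semtk3 package is installed
        select = semtk3.select_by_id
    rows: List[list] = []
    offset = 0
    while True:
        table = select(nodegroup_id,
                       limit_override=page_size,
                       offset_override=offset,
                       runtime_constraints=runtime_constraints)
        page = table.get_rows()
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
```

A constraint built with build_constraint() could be passed through runtime_constraints in the same call.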
- semtk3.select_plot_by_id(nodegroup_id, plot_name)
Create a plot for a given nodegroup id
- semtk3.set_connection_override(conn_str)
Set a connection string to be used in all nodegroups
- Parameters
conn_str¶ – a SemTK connection json string
- semtk3.set_headers(headers)
- semtk3.set_host(hostUrl)
- semtk3.store_folder(folder_path)
Reads a file of the standard “store_data.csv” format:
ID,comments,creator,jsonfile, optional: type
id27,Test comments,200001111,file.json
…and saves the specified nodegroups to the store, overwriting existing if needed
- Parameters
folder_path¶ – target folder
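A folder in the layout described above can be produced with the standard library. The sketch below writes one json file per item and a matching store_data.csv with the four required columns. The helper name is hypothetical, and the empty dict stands in for a real nodegroup json, which this page does not describe.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_store_folder(folder: Path, items) -> Path:
    """Write one json file per item plus store_data.csv; return the csv path.

    `items` is an iterable of (item_id, comments, creator, nodegroup_dict).
    """
    folder.mkdir(parents=True, exist_ok=True)
    csv_path = folder / "store_data.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["ID", "comments", "creator", "jsonfile"])
        for item_id, comments, creator, nodegroup in items:
            json_name = f"{item_id}.json"
            (folder / json_name).write_text(json.dumps(nodegroup))
            writer.writerow([item_id, comments, creator, json_name])
    return csv_path


# Placeholder content: {} is not a valid nodegroup, only a stand-in.
demo_folder = Path(tempfile.mkdtemp())
demo_csv = write_store_folder(
    demo_folder, [("id27", "Test comments", "200001111", {})])
```

With real nodegroup json in place, semtk3.store_folder(str(demo_folder)) would be the call that reads this folder.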
- semtk3.store_item(item_id, comments, creator, item_json_str, item_type, overwrite_flag=False)
Saves a single item to the store, fails if item_id already exists unless overwrite_flag
- Parameters
- Returns
status
- Return type
string
- semtk3.store_nodegroup(nodegroup_id, comments, creator, nodegroup_json_str, overwrite_flag=False)
Saves a single nodegroup to the store, fails if nodegroup_id already exists unless overwrite_flag
- semtk3.store_nodegroups(folder_path)
- semtk3.upload_owl(owl_file_path, conn_json_str, user_name='noone', password='nopass', model_or_data='model', conn_index=0)
Upload an owl file to a given graph
- Parameters
owl_file_path¶ – path to the file
conn_json_str¶ – connection json string
user_name¶ – optional user name
password¶ – optional password
model_or_data¶ – optional “model” or “data” specifying which graph in the sparql connection, defaults to “model”
conn_index¶ – index specifying which of the model or data graphs in the sparql connection, defaults to 0
- Returns
message
- Return type
string
- semtk3.upload_turtle(ttl_file_path, conn_json_str, user_name, password, model_or_data='model', conn_index=0)
Upload a Turtle file
- Parameters
ttl_file_path¶ – path to the file
conn_json_str¶ – connection json string
user_name¶ – user name
password¶ – password
model_or_data¶ – optional “model” or “data” specifying which endpoint in the sparql connection, defaults to “model”
conn_index¶ – index specifying which of the model or data endpoints in the sparql connection, defaults to 0
- Returns
message
- Return type
string
Submodules
- semtk3.clients module
- semtk3.demo module
- semtk3.edcclient module
- semtk3.fdccacheclient module
- semtk3.nodegroupclient module
- semtk3.nodegroupexecclient module
NodegroupExecClient
NodegroupExecClient.USE_NODEGROUP_CONN
NodegroupExecClient.exec_async_dispatch_count_by_id()
NodegroupExecClient.exec_async_dispatch_filter_by_id()
NodegroupExecClient.exec_async_dispatch_query_by_id()
NodegroupExecClient.exec_async_dispatch_query_from_nodegroup()
NodegroupExecClient.exec_async_dispatch_raw_sparql()
NodegroupExecClient.exec_async_dispatch_select_by_id()
NodegroupExecClient.exec_async_ingest_from_csv()
NodegroupExecClient.exec_copy_graph()
NodegroupExecClient.exec_dispatch_clear_graph()
NodegroupExecClient.exec_dispatch_combine_entities()
NodegroupExecClient.exec_dispatch_combine_entities_in_conn()
NodegroupExecClient.exec_dispatch_combine_entities_table()
NodegroupExecClient.exec_get_runtime_constraints_by_id()
- semtk3.nodegroupstoreclient module
- semtk3.oinfoclient module
- semtk3.queryclient module
- semtk3.restclient module
- semtk3.resultsclient module
- semtk3.runtimeconstraint module
RuntimeConstraint
RuntimeConstraint.OP_GREATERTHAN
RuntimeConstraint.OP_GREATERTHANOREQUALS
RuntimeConstraint.OP_LESSTHAN
RuntimeConstraint.OP_LESSTHANOREQUALS
RuntimeConstraint.OP_MATCHES
RuntimeConstraint.OP_NOTMATCHES
RuntimeConstraint.OP_REGEX
RuntimeConstraint.OP_VALUEBETWEEN
RuntimeConstraint.OP_VALUEBETWEENUNINCLUSIVE
RuntimeConstraint.to_json()
- semtk3.semtkasyncclient module
SemTkAsyncClient
SemTkAsyncClient.PERCENT_INCREMENT
SemTkAsyncClient.PRINT_DOTS
SemTkAsyncClient.WAIT_MSEC
SemTkAsyncClient.exec_get_job_completion_percentage()
SemTkAsyncClient.exec_get_results_table()
SemTkAsyncClient.exec_job_status_boolean()
SemTkAsyncClient.exec_job_status_message()
SemTkAsyncClient.exec_wait_for_percent_or_msec()
SemTkAsyncClient.poll_until_success()
SemTkAsyncClient.post_async_to_json_blob()
SemTkAsyncClient.post_async_to_record_process()
SemTkAsyncClient.post_async_to_status()
SemTkAsyncClient.post_async_to_table()
SemTkAsyncClient.post_get_json_blob_results()
SemTkAsyncClient.post_get_percent_complete()
SemTkAsyncClient.post_get_status_boolean()
SemTkAsyncClient.post_get_status_message()
SemTkAsyncClient.post_get_table_results()
SemTkAsyncClient.post_wait_for_percent_or_msec()
- semtk3.semtkclient module
SemTkClient
SemTkClient.JOB_ID_KEY
SemTkClient.RESULT_TYPE_KEY
SemTkClient.WARNINGS_KEY
SemTkClient.get_simple_field()
SemTkClient.get_simple_field_int()
SemTkClient.get_simple_field_str()
SemTkClient.ping()
SemTkClient.post_to_jobid()
SemTkClient.post_to_jobid_warnings()
SemTkClient.post_to_record_process()
SemTkClient.post_to_simple()
SemTkClient.post_to_status()
SemTkClient.post_to_table()
- semtk3.semtktable module
SemtkTable
SemtkTable.create_table_dict()
SemtkTable.delete_column()
SemtkTable.get_cell()
SemtkTable.get_cell_as_date()
SemtkTable.get_cell_as_float()
SemtkTable.get_cell_as_int()
SemtkTable.get_cell_as_string()
SemtkTable.get_cell_typed()
SemtkTable.get_column()
SemtkTable.get_column_index()
SemtkTable.get_column_names()
SemtkTable.get_column_type()
SemtkTable.get_column_types()
SemtkTable.get_csv_string()
SemtkTable.get_matching_row_nums()
SemtkTable.get_matching_rows()
SemtkTable.get_num_columns()
SemtkTable.get_num_rows()
SemtkTable.get_pandas_data()
SemtkTable.get_rows()
SemtkTable.has_column()
SemtkTable.set_cell()
SemtkTable.to_dict()
SemtkTable.to_json_str()
- semtk3.sparqlconnection module
SparqlConnection
SparqlConnection.DATA
SparqlConnection.MODEL
SparqlConnection.build()
SparqlConnection.get_all_triplestore_urls()
SparqlConnection.get_graph()
SparqlConnection.get_password()
SparqlConnection.get_server_and_port()
SparqlConnection.get_server_type()
SparqlConnection.get_user_name()
SparqlConnection.to_conn_str()
- semtk3.statusclient module
- semtk3.util module