Basic SQL Server Setup
Go ahead and gather the basic details:
- Database host or IP Address
- Database port # (default is 1433)
SQL Server Specifications Required:
- SQL Server versions 2012-2019 (talk w/ support if other versions needed)
- Determine which way to connect: TLS or SSH Tunneling
- As of July 2021 only the default TLS approach is available. This is just like if you were to connect from SQL Server Management Studio for example.
- Firewalls on your database server should allow incoming connections through the public internet on your SQL Server port (typically 1433 unless your network guys have changed it). Be sure to whitelist our DataLakeHouse.io IP addresses for your database server network access.
Scroll down to ensure you review all steps, as needed...
Remember DataLakeHouse.io connects to your database instance with credentials supplied by you. We store your credentials securely with our bank-grade protocols.
- Grant the new user Permissions on Database, Schemas, Tables
- The new user needs SELECT permissions for the database, schemas, tables, or any specified columns that will be synchronized.
- We recommend granting SELECT access to everything in your specified database for this connection.
- Remember if you need to sync more databases, you need to create a separate source connection.
- Save the credentials somewhere and use them in the next steps.
- Please see other GRANT SELECT on schema and tables in our documentation, if you need to get more granular for security purposes.
- Enter your Credentials and Other Information in the Fields
- Enter in the Name/Alias field, the name you'll use within datalakehouse.io to differentiate this connection from others.
- Enter in the Target Schema Prefix field, the prefix of schema(s) that gets created on your destination target connection database for each of the schemas you load in this connection. So if your database has a schema named 'dbo' the target connection when synced will have a schema in that target database named the value of this field + '_dbo'.
- Alphanumeric characters only. Prefix must start and end with a letter but can contain an underscore(_).
- Enter in the Server/Host field, the name of the public server name or the IP Address (most customers use the IP for this field).
- Enter in the Port field, where this database is accessible and the firewall restrictions are open.
- Enter in the Database field, the name of the database to connect.
- Enter in the Username/Alias field, the username of user you created in the steps above to give access to DataLakeHouse.io.
- Leave the Auth Type field alone. It is set to password because DataLakeHouse is using TLS and requires username and password credentials to access the database.
- Enter in the Password field, the password for the user you created in the steps above.
- Click Save & Test
- This attempts to connect to your database with the credentials provided.
- A message of success or failure will be shown.
- If successful, you'll be prompted with the schema objects of the database and will need to complete the final steps for configuration shown below.
- If the test connection fails, the connection is still saved, but you will need to correct the failure based on the information provided within the error message.
This section of steps ensures you have coverage of other important steps required on your database side and in DataLakeHouse.io once you have completed the above test connection successfully.
- To capture incremental loads of the database records being synchronized and to reduce cost we use this native feature of SQL Server through Change Tracking (CT) or Change Data Capture (CDC). You could enable both if that is your desire but most companies set up one or the other based on the needs/content of the table. You must set up at least one of these on each table that you require to be synchronized.
- Determine which you will use and on which tables.
- Use CDC: typically if you want to capture incremental changes when you are synchronizing data incrementally through the day (or more inline with the frequency of data change). We recommend that you do not use the net changes option when setting up CDC on your tables using the MS SQL Server commands.
- Use Change Tracking (CT): typically when you just care about the change that happens in general during the sync frequency, or said another way, just getting the change at the end of the day.
Remember DataLakeHouse.io connects to your database instance with credentials supplied by you. We store your credentials securely with our bank-grade protocols. DataLakeHouse.io will connect through SSH Tunnel TCP/IP or via standard internet traffic. You must have enabled SQL Server configuration to provide access to your SQL Server database(s) and ensure that you have whitelisted/grantlisted our DataLakeHouse.io IP addresses that will access your SQL Server database(s).
To verify or setup port configuration on your instance via the SQL Server Configuration, please follow these instructions:
- On a standard managed SQL Server machine, access your SQL Server Configuration Manager and access the SQL Server Network Configuration.
- Expand and access "Protocols for MSSQLSERVER" from the left menu.
- Ensure that TCP/P is with an ENABLED status. (If it is disabled, right-click and select 'Enable').
- Confirm the port and the IP address by right clicking on TCP/IP > Properties.
- Click on the IP Addresses tab and scroll down to the IPAll section.
- Confirm the IP address which is usually 1433. Enter the IP Address of 1433 if not previously entered already.
- Click the "Apply" button, then click the "OK" button on the prompt.
- Return to the main area of the SQL Server Configuration Manager and expand SQL Native Client XX.X Configuration.
- Click on the Client Protocols option then in the right side ensure the TCP/IP status is set to Enabled.
- Right-click on the TCP/IP row and verify the Default Port settings. This is typically 1433, but confirm the port or enter 1433, then click the "Apply" button.
- Return to the main area of the SQL Server Configuration Manager, and in the left side find, one of the top options, "SQL Server Services".
- Right-click on "SQL Server Services," then click the "Restart" option. This will confirm all the above (especially if any settings were modified). The restart should typically be complete in under 90 seconds.
Reference for the above from Microsoft SQL Server documentation.